# chub-archive-server

Website and backend server for the Character Archive.

This is the source code for the char-archive website. The software stack consists of a Python backend, a Postgres database, a Vue.js frontend, and a Meilisearch index.

Be aware that the scrapers pull extremely questionable data from across the internet. If you run this, any shitty content they collect is hosted under your name.

No commit history is provided because this project wasn't developed for public consumption of the source.

Remember to have fun.
## About the Project

Chatbots powered by artificial intelligence have been around for decades, but only recently have they become capable of engaging in human-like interactivity. Following the release of OpenAI's GPT-3.5 in late 2022, creative individuals discovered that the AI could take on "personalities" and role-play as a character. A community formed around chatting with these "bots" and sharing the "character cards" that defined a personality. Concerned about the capabilities of the AI and the creativity of the users, the corporations that owned the AI models took steps to restrict this activity, claiming it was "out of scope" and "unsafe". The Character Archive was created to protect this creativity.
Like the website says, this project was created to archive a unique moment in AI history. It was pretty successful, too: the site was showing very strong growth, and visitors from across the world were downloading all sorts of cards. Unfortunately, I was not able to devote any more time to running a complex website with many moving parts.
## System Requirements

This server ran on this machine:
- AMD Ryzen 7 5700X 8-Core Processor
- 128GB RAM (probably only need 64GB, this was originally for Elasticsearch)
- 2x 1TB SATA SSD
- 2x 1TB Samsung 990 Pro
The database and files are stored on the 990s; the VM/CT OSes are on the SATA SSDs.
## Host Setup

This is a very, very rough guide on how to get things up and running. I ran everything on Proxmox. These are the hosts:
### Website

**char-archive**: CT. 12 cores, 13GB RAM. Runs the website and database.
- Install Postgres and Python 3.12
- Create your venv and install the requirements
- Import the database and put the files somewhere
- Figure out where the database connection config strings are (there are a few) and enter your details
- Enable and start the Systemd timers
- Enable and start the `archive-server` service
- Enable and start the `frontend-msg` service
- Deploy the image proxy Cloudflare Worker in `Workers/image-proxy`
- Install the GeoIP database (see `GeoIP.md`)
- Install Node.js version `22.15.0`
- Go to `search-parser/` and do `npm install`
- Enable and start the `search-parser.service`
- Download the latest release of cyberes/crazy-file-server. This serves the file browser and was originally built for serving the entire archive when it was just a collection of raw files. An example config file is located at `backend/crazyfs.yaml`. Don't worry about setting up Elasticsearch; it isn't needed anymore.
- Install and enable `crazyfs.service`
- Set up the nginx website. An example config is provided.
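On the database connection step: a Postgres connection string generally looks like the sketch below. I haven't verified the exact format each service expects, so check the code; every value here is a placeholder, not the project's actual config.

```python
# Hypothetical example of a libpq-style Postgres connection URI (DSN).
# All values are placeholders -- substitute your own details.
def build_dsn(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Assemble a libpq-style connection URI."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

print(build_dsn("archive", "hunter2", "127.0.0.1", "char_archive"))
# postgresql://archive:hunter2@127.0.0.1:5432/char_archive
```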
You will probably have to dive into the code to figure out how the various services work. It isn't very complicated (the most complicated part is how the individual sites are abstracted and handled), but it wasn't designed to be distributed, so there is no unified config.
I know for a fact that you will have to update the frontend code to account for your new website domain. The current code is set up with the assumption that it's running on char-archive.evulid.cc.
### Meilisearch

**char-meili**: CT. 10 cores, 34GB RAM (could probably shrink these by 50%). Runs exclusively Meilisearch.
- Install Meilisearch
- Run `python3 create-meilisearch.py`
### Proxy Router

**proxy**: CT. 4 cores, 4GB RAM. Runs a proxy router/load balancer. I don't recommend running this on a cloud VM, as you will be moving multiple terabytes of data per month.
- Set up or gain access to at least 1 proxy server. Squid works fine.
- Download the latest release from cyberes/proxy-loadbalancer/releases. I put it in `/srv/loadbalancer`.
- Follow the `README.md` file in `proxy-loadbalancer` to install it
I recommend setting this value in the config:

```yaml
thirdparty_test_urls:
  - https://rentry.org/8ygmz29h
  - https://files.catbox.moe/1hvrlj.png
  - https://gateway.chub.ai/search?excludetopics=&first=20&page=1&namespace=*&search=sex&include_forks=true&nsfw=false&nsfw_only=false&require_custom_prompt=false&require_example_dialogues=false&require_images=false&require_expressions=false&nsfl=false&asc=false&min_ai_rating=0&min_tokens=50&max_tokens=100000&chub=true&require_lore=false&exclude_mine=true&require_lore_embedded=false&require_lore_linked=false&sort=default&min_tags=2&topics=&inclusive_or=false&recommended_verified=false&require_alternate_greetings=false&count=false
```
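These URLs are presumably used to verify that each upstream proxy can actually reach the outside world. As an illustration of the idea only (this is not proxy-loadbalancer's actual code), a health check along those lines looks like this:

```python
import urllib.request

# Illustrative sketch: validate an upstream proxy against the
# thirdparty_test_urls above. Not the load balancer's real implementation.
TEST_URLS = [
    "https://rentry.org/8ygmz29h",
    "https://files.catbox.moe/1hvrlj.png",
]

def proxy_is_healthy(proxy_url: str, urls=TEST_URLS, timeout=10.0, fetch=None) -> bool:
    """Return True if every test URL returns HTTP 200 through the proxy."""
    if fetch is None:
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
        def fetch(url):
            with opener.open(url, timeout=timeout) as resp:
                return resp.status
    try:
        return all(fetch(url) == 200 for url in urls)
    except OSError:  # urllib network errors subclass OSError
        return False
```

The `fetch` parameter is just dependency injection so the logic can be exercised without network access.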
### NFS Storage Server

**char-datastore**: VM. 4 cores, 5GB RAM (could increase these by 50%). Stores the data and runs an NFS server.
My `/etc/exports` file contains this:

```
/mnt/share 10.1.0.8(rw,sync,no_subtree_check) 10.1.0.11(rw,sync,no_subtree_check)
```
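For reference, the general shape of an exports entry is `<path> <client>(<options>)`, with one `client(options)` group per allowed host. A hypothetical entry granting a whole subnet access instead of individual IPs:

```
# /etc/exports: <exported path> <client>(<options>) [<client>(<options>) ...]
/mnt/share 10.1.0.0/24(rw,sync,no_subtree_check)
```

After editing the file, run `exportfs -ra` to reload the export table without restarting the NFS server.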
The two IPs are the char-archive and the char-scraper hosts.
In your Proxmox host, you will have to mount the NFS share since you can't easily mount it in the CTs. I put this in its `/etc/fstab`:

```
10.1.0.9:/mnt/share /mnt/lxc/nfs/share nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
```

Where `10.1.0.9` is the char-datastore IP.
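A quick gloss on the notable mount options in that line (see `nfs(5)` for the authoritative details):

```
nofail        don't block the boot if the NFS server is unreachable
noatime       skip access-time updates to reduce write traffic
nolock        disable NFS file locking (NLM)
actimeo=1800  cache file attributes for 30 minutes
```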
Then give the CT access to it. For example, this is `/etc/pve/lxc/xxx.conf`:

```
mp0: /mnt/lxc/nfs/share,mp=/mnt/share
```
You may encounter issues regarding UIDs/GIDs. This can be a nightmare. The file `other/mount-nfs-shares-in-lxc.md` tries to help you, but you're on your own.
### Scraper

**char-scraper**: VM. 8 cores, 45GB RAM. Runs the web scrapers.
- Create a new non-root user
- Change to that user
- `git clone https://git.evulid.cc/cyberes/chub-archive-scraper`
- Create a venv and install the requirements
- Install the services in `systemd/` and enable the timers
The scraper is a spaghetti code disaster. It's gone through like 3 separate rewrites but has not gotten any less complicated. That said, a lot of work has been put into making sure the scrapers are reliable.
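To illustrate the kind of reliability work involved, the core pattern is wrapping flaky network calls in retries with exponential backoff. This is a generic sketch of that pattern, not the scraper's actual code:

```python
import time

def retry(fn, attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff.

    Generic illustration of the retry pattern -- not the project's real code.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The injectable `sleep` makes the helper trivial to test without waiting.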
## Data Setup
Download the last torrent and place the files on char-datastore. Then import the database. It's pretty simple.
## Cloudflare

The website uses pretty heavy Cloudflare caching. Make sure to purchase Cache Reserve. The folder `cloudflare-cache-rules/` contains screenshots of the rules you need to create.
