# chub-archive-server

Website and backend server for the Character Archive.

This is the source code for the char-archive website. The software stack consists of a Python backend, a Postgres database, a Vue.js frontend, and a Meilisearch index.

Be aware that the scrapers pull extremely questionable data from across the internet. If you run this, any shitty content they collect is hosted under your name.

No commit history is provided because this project wasn't developed for public consumption of the source.

Remember to have fun.
## About the Project

Chatbots powered by artificial intelligence have been around for decades, but only recently have they become capable of engaging in human-like interactivity. Following the release of OpenAI's GPT-3.5 in late 2022, creative individuals discovered that the AI could take on "personalities" and role-play as a character. A community formed around chatting with these "bots" and sharing the "character cards" that defined a personality. Concerned about the capabilities of the AI and the creativity of the users, the corporations that owned the AI models took steps to restrict this activity, claiming it was "out of scope" and "unsafe". The Character Archive was created to protect this creativity.
Like the website says, this project was created to archive a unique moment in AI history. It was pretty successful, too: the site was showing very strong growth, and visitors from across the world were downloading all sorts of cards. Unfortunately, I was not able to devote any more time to running a complex website with many moving parts.
## System Requirements

This server ran on this machine:
- AMD Ryzen 7 5700X 8-Core Processor
- 128GB RAM (probably only need 64GB, this was originally for Elasticsearch)
- 2x 1TB SATA SSD
- 2x 1TB Samsung 990 Pro
The database and files are stored on the 990s; the VM/CT OSes are on the SATA SSDs.
## Host Setup

This is a very, very rough guide on how to get things up and running. I ran everything on Proxmox. These are the hosts:
### Website

**char-archive**: CT. 12 cores, 13GB RAM. Runs the website and database.
- Install Postgres and Python 3.12
- Create your venv and install the requirements
- Import the database and put the files somewhere
- Figure out where the database connection config strings are (there are a few) and enter your details
- Enable and start the Systemd timers
- Enable and start the `archive-server` service
- Enable and start the `frontend-msg` service
- Deploy the image proxy Cloudflare Worker in `Workers/image-proxy`
- Install the GeoIP database (see `GeoIP.md`)
- Install Node.js version `22.15.0`
- Go to `search-parser/` and do `npm install`
- Enable and start the `search-parser.service`
- Download the latest release of cyberes/crazy-file-server. This serves the file browser and was originally built for serving the entire archive when it was just a collection of raw files. An example config file is located at `backend/crazyfs.yaml`. Don't worry about setting up Elasticsearch; it isn't needed anymore.
- Install and enable `crazyfs.service`
- Set up the nginx website. An example config is provided.
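On the database connection step: a Postgres connection string generally looks like the sketch below. I haven't verified the exact format each service expects, so check the code; every value here is a placeholder, not the project's actual config.

```python
# Hypothetical example of a libpq-style Postgres connection URI (DSN).
# All values are placeholders -- substitute your own details.
def build_dsn(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Assemble a libpq-style connection URI."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"

print(build_dsn("archive", "hunter2", "127.0.0.1", "char_archive"))
# postgresql://archive:hunter2@127.0.0.1:5432/char_archive
```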
You will probably have to dive into the code to figure out how the various services work. It isn't very complicated (the most complicated part is how the individual sites are abstracted and handled), but it wasn't designed to be distributed, so there is no unified config.
I know for a fact that you will have to update the frontend code to account for your new website domain. The current code is set up with the assumption that it's running on char-archive.evulid.cc.
### Meilisearch

**char-meili**: CT. 10 cores, 34GB RAM (could probably shrink these by 50%). Runs exclusively Meilisearch.
- Install Meilisearch
- Run `python3 create-meilisearch.py`
### Proxy Router

**proxy**: CT. 4 cores, 4GB RAM. Runs a proxy router/load balancer. I don't recommend running this on a cloud VM, as you will be moving multiple terabytes of data per month.
- Set up or gain access to at least 1 proxy server. Squid works fine.
- Download the latest release from cyberes/proxy-loadbalancer/releases. I put it in `/srv/loadbalancer`.
- Follow the `README.md` file in `proxy-loadbalancer` to install it
I recommend setting this value in the config:

```yaml
thirdparty_test_urls:
  - https://rentry.org/8ygmz29h
  - https://files.catbox.moe/1hvrlj.png
  - https://gateway.chub.ai/search?excludetopics=&first=20&page=1&namespace=*&search=sex&include_forks=true&nsfw=false&nsfw_only=false&require_custom_prompt=false&require_example_dialogues=false&require_images=false&require_expressions=false&nsfl=false&asc=false&min_ai_rating=0&min_tokens=50&max_tokens=100000&chub=true&require_lore=false&exclude_mine=true&require_lore_embedded=false&require_lore_linked=false&sort=default&min_tags=2&topics=&inclusive_or=false&recommended_verified=false&require_alternate_greetings=false&count=false
```
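These URLs are presumably used to verify that each upstream proxy can actually reach the outside world. As an illustration of the idea only (this is not proxy-loadbalancer's actual code), a health check along those lines looks like this:

```python
import urllib.request

# Illustrative sketch: validate an upstream proxy against the
# thirdparty_test_urls above. Not the load balancer's real implementation.
TEST_URLS = [
    "https://rentry.org/8ygmz29h",
    "https://files.catbox.moe/1hvrlj.png",
]

def proxy_is_healthy(proxy_url: str, urls=TEST_URLS, timeout=10.0, fetch=None) -> bool:
    """Return True if every test URL returns HTTP 200 through the proxy."""
    if fetch is None:
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
        def fetch(url):
            with opener.open(url, timeout=timeout) as resp:
                return resp.status
    try:
        return all(fetch(url) == 200 for url in urls)
    except OSError:  # urllib network errors subclass OSError
        return False
```

The `fetch` parameter is just dependency injection so the logic can be exercised without network access.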
### NFS Storage Server

**char-datastore**: VM. 4 cores, 5GB RAM (could increase these by 50%). Stores the data and runs an NFS server.
My `/etc/exports` file contains this:

```
/mnt/share 10.1.0.8(rw,sync,no_subtree_check) 10.1.0.11(rw,sync,no_subtree_check)
```
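For reference, the general shape of an exports entry is `<path> <client>(<options>)`, with one `client(options)` group per allowed host. A hypothetical entry granting a whole subnet access instead of individual IPs:

```
# /etc/exports: <exported path> <client>(<options>) [<client>(<options>) ...]
/mnt/share 10.1.0.0/24(rw,sync,no_subtree_check)
```

After editing the file, run `exportfs -ra` to reload the export table without restarting the NFS server.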
The two IPs are the char-archive and the char-scraper hosts.
In your Proxmox host, you will have to mount the NFS share since you can't easily mount it in the CTs. I put this in its `/etc/fstab`:

```
10.1.0.9:/mnt/share /mnt/lxc/nfs/share nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
```

Where `10.1.0.9` is the char-datastore IP.
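A quick gloss on the notable mount options in that line (see `nfs(5)` for the authoritative details):

```
nofail        don't block the boot if the NFS server is unreachable
noatime       skip access-time updates to reduce write traffic
nolock        disable NFS file locking (NLM)
actimeo=1800  cache file attributes for 30 minutes
```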
Then give the CT access to it. For example, this is `/etc/pve/lxc/xxx.conf`:

```
mp0: /mnt/lxc/nfs/share,mp=/mnt/share
```
You may encounter issues regarding UIDs/GIDs. This can be a nightmare. The file `other/mount-nfs-shares-in-lxc.md` tries to help you, but you're on your own.
### Scraper

**char-scraper**: VM. 8 cores, 45GB RAM. Runs the web scrapers.
- Create a new non-root user
- Change to that user
- `git clone https://git.evulid.cc/cyberes/chub-archive-scraper`
- Create a venv and install the requirements
- Install the services in `systemd/` and enable the timers
The scraper is a spaghetti code disaster. It's gone through like 3 separate rewrites but has not gotten any less complicated. That said, a lot of work has been put into making sure the scrapers are reliable.
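To illustrate the kind of reliability work involved, the core pattern is wrapping flaky network calls in retries with exponential backoff. This is a generic sketch of that pattern, not the scraper's actual code:

```python
import time

def retry(fn, attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff.

    Generic illustration of the retry pattern -- not the project's real code.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The injectable `sleep` makes the helper trivial to test without waiting.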
## Data Setup
Download the last torrent and place the files on char-datastore. Then import the database. It's pretty simple.
## Cloudflare

The website uses pretty heavy Cloudflare caching. Make sure to purchase Cache Reserve. The folder `cloudflare-cache-rules/` contains screenshots of the rules you need to create.
