Moving away from the swarm

Apr 18, 2026

So I think it's time to move away from Swarm and Ceph.

The swarm (currently) relies on Ceph, and Ceph amplifies network writes like crazy and toasts the whole dang Proxmox cluster when it does. Not optimal. I like the HA aspect, but realistically I don't know how necessary it is for my circumstances, and pragmatically I think it's costing me more availability than it provides.

Moreover, I'm getting tired of how Swarm mode doesn't support the modern Compose syntax. That doesn't seem like it's ever going to get better, so I think it's time to jump ship. (I know, I know, I probably should have known all this. Alas, I insist on learning most things the hard way.)

For the Proxmox cluster, my previous strategy of using node-local ZFS storage on the SSDs seemed to work fine. The trick is that now there are things that depend on Ceph, so I have to migrate them off Ceph first. Most obviously, everything running in the Swarm. Technically there are four active stacks, but only two that matter: GoToSocial and FreshRSS.

Moving the GoToSocial DB will definitely be the riskiest component, but I think it's workable: just pg_dump everything I need and then load it on the new instance. I have a reverse proxy handling GoToSocial ingress at the cloud, which makes cutover a cinch: shut down the existing GTS instance, get it running on the NAS, cut the cloud ingress over to the NAS container, voila. There will be downtime, but such is the nature of selfhosted stuff. (And it'll still probably be better than GitHub as of late, so there.)
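A minimal sketch of that dump-and-restore, assuming the old DB is reachable from a swarm node and the NAS Postgres answers at `nas` (all hostnames, users, and container names here are placeholders):

```shell
# Quiesce GTS so nothing writes mid-dump (container name is hypothetical)
ssh swarm-node 'docker stop gotosocial'

# Dump the old database in custom format (compressed, restorable with pg_restore)
pg_dump -h swarm-node -U gotosocial -Fc gotosocial > gotosocial.dump

# Create the database on the NAS instance and load the dump
createdb -h nas -U postgres -O gotosocial gotosocial
pg_restore -h nas -U postgres -d gotosocial --no-owner gotosocial.dump
```

After that, repointing the cloud reverse proxy at the NAS is the only remaining cutover step.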

Once the swarm is shut down, Ceph is defunct and I can free up those disks, and possibly repave the nodes to PVE 8 and join them into a cluster with the big new server.

Longer term I think what I'll do is follow the selfhosted k3s setup guide in the self-deployment book I'm reading. I think I'm finally ready to just use k8s for what it's designed for rather than trying to reinvent it from scratch. I'll still have to solve the storage problem, which I'm not thrilled about, but that's doable.

What I'm not certain how to handle is databases. I think centralizing all my postgreses (postgresi?) onto a single host is overall a good idea: instead of having to monitor and back up a dozen individual Postgres instances, I can focus on one. Migrating everything to that one, well, that'll be tricky, but I can get there.

What I don't know is where it should live. Naively, having it as a member of the cluster (whatever cluster I build) feels intuitive, but my worry is that it will introduce hidden complications: the shiny new server mostly has spinning disks for storage. I'm planning to RAIDZ a few of them, so their read/write should be better than 'normal'. I'm also planning to slap an SSD L2ARC on them as well, which should help significantly with read speed.
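For reference, a sketch of that pool layout (pool and device names are placeholders). One caveat worth noting: the SSD cache device is an L2ARC, which only accelerates reads; synchronous writes would want a separate log device (SLOG).

```shell
# Three HDDs in a raidz1 vdev: one-disk redundancy, striped reads
zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc

# Add an SSD as L2ARC (read cache); it catches spillover from the in-RAM ARC
zpool add tank cache /dev/nvme0n1

# The analogous add for sync-write latency would be a SLOG:
# zpool add tank log /dev/nvme1n1

zpool status tank
```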

I guess the DB for GoToSocial is currently running on a jank Ceph setup that seems unable to handle more than 200MB/s of IO, and it's doing fine. It's also the database that sees the most read/write in my whole stack, so maybe it's a moot point insofar as HDD vs SSD.

Which then just leaves the question of where to home this single instance. Given the storage complications of k3s, I'm tempted to just run Postgres in an LXC or a full-on VM on Proxmox. Then applications can (theoretically) just run statelessly in k3s and connect to the (single) Postgres node that's independently backed up (or even replica'd, if I feel like going hard).
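One way to wire that up, assuming the Postgres VM answers at `pg.lan` (a hypothetical name), is a Kubernetes `ExternalName` Service: apps in k3s get a stable in-cluster DNS name while the actual database lives outside the cluster.

```yaml
# Apps connect to postgres.db.svc.cluster.local:5432;
# the cluster DNS just returns a CNAME to the external VM.
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: db
spec:
  type: ExternalName
  externalName: pg.lan
```

Note that `ExternalName` is pure DNS: no proxying happens, so clients talk straight to the VM on whatever port Postgres listens on.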

That said, I already have to solve storage in some way for a bunch of the stuff I want to move to k3s anyway. Several applications expect some amount of local static storage that behaves like a local disk. I'm currently eyeing SeaweedFS and its Filer to fill that role, matching the Gluster setup I used to have a while back (and that I experimented with before moving the swarm to Ceph): make a few VMs with some disks, run SeaweedFS on 'em, expose endpoints to k3s, bam, clustered shared storage. It'll still be slow as hell over the network, but the containers shouldn't be doing anything crazy on that storage anyway.
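A single-node sketch of what that SeaweedFS layer looks like (data paths are placeholders; the ports are SeaweedFS defaults):

```shell
# Cluster coordinator; volume servers register against it
weed master -mdir=/data/master &

# Stores the actual blobs; points at the master
weed volume -dir=/data/vol -mserver=localhost:9333 &

# Filer layers a POSIX-ish directory namespace over the blob store
weed filer -master=localhost:9333 &

# VMs or k3s nodes can then mount the filer as if it were a local path
weed mount -filer=localhost:8888 -dir=/mnt/seaweed
```

In the real setup each role would live on its own VM, with a few volume servers spread across disks.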

So that's the plan:

  1. Move FreshRSS off the Swarm (and onto my NAS)
  2. Move GoToSocial off the Swarm (and onto my NAS)
  3. Shut down the swarm
  4. Migrate a bunch of containers off the current Proxmox cluster and onto the new server
  5. Shut down Ceph
  6. Repave each of the current Proxmox cluster to Proxmox 8
    1. though I hear 9 is close to release; maybe I install 9 on the server (since it's not hosting anything important right now), then bump each mini node to 9. Ideas, ideas....
  7. Spool up a Postgres server
    1. with backups
    2. and monitoring
  8. Migrate a bunch of my existing Postgres DBs that exist as companion containers in Compose stacks to the new shared instance
  9. Spool up k3s
  10. Begin migrating Compose stacks off the NAS and onto k3s
  11. Profit