How to 'scale' Infinite Scale? How to run two or more instances in a cluster WITHOUT Kubernetes

I’m on a quest to run OCIS on my homelab, which consists of a couple of Intel NUCs running Proxmox with a common Ceph cluster (RBD and CephFS) and a standalone NAS running OpenMediaVault.

I specifically do not want to run Kubernetes. The NUCs are just not powerful enough for it. I also use it heavily at work, so I know well how hard it is to maintain.

So far I have managed to run every service in Docker standalone or Swarm. Where it is possible/makes sense to run more than one replica of a service, I do so easily.

But I hit a wall with OCIS.

I tried to decipher the provided helm chart https://github.com/owncloud/ocis-charts/tree/main/charts/ocis but it’s just too complicated.

So far I have figured out that in order to run an OCIS cluster I need a couple of ‘common’ services to serve all the instances at once. Somehow I have to keep certain things in sync between nodes.

I have already configured NATS as a cache store for a couple of things (I had earlier attempts with Redis as well), and outsourced the IDP to Authelia. Quite a journey on its own.

But still, my two instances of OCIS act as separate systems:
I’m authenticated in one, but after a browser refresh suddenly I’m not.
I upload a file, and after a couple of clicks in the GUI I cannot access it anymore even though the file is there… and all sorts of other weird things.

I realize I’m still missing something, probably a lot of things.
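One thing I suspect (an assumption on my part, not verified) is that all replicas need identical secrets, otherwise a token minted by one instance gets rejected by the other. Since `ocis init` only runs on replica 1 and the generated config may not reach replica 2, I’m considering pinning them explicitly, roughly like this:

```yaml
    environment:
      # Assumption: every replica must share the same secrets so they
      # accept each other's tokens. Values below are placeholders.
      OCIS_JWT_SECRET: "same-random-secret-on-every-replica"
      OCIS_TRANSFER_SECRET: "same-random-secret-on-every-replica"
      OCIS_MACHINE_AUTH_API_KEY: "same-random-secret-on-every-replica"
```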

My setup so far:

version: "3.9"

services:
  nodes:
    image: owncloud/ocis:7.0
    hostname: ocis-{{.Task.Slot}}
    ports:
      - 9200:9200
    networks:
      - bana_global
    entrypoint:
      - /bin/sh
    # $$ escapes compose/stack interpolation so the shell (not Swarm) expands $OCIS_REPLICA
    command: ["-c", "if [ \"$$OCIS_REPLICA\" = \"1\" ]; then ocis init || true; fi; ocis server"]
    environment:
      TZ: "Europe/Budapest"
      PUID: "1000"
      PGID: "1000"
      DEMO_USERS: "false"
      OCIS_INSECURE: "true"
      PROXY_TLS: "false"
      PROXY_HTTP_ADDR: 0.0.0.0:9200
      OCIS_URL: "https://ocis.my.domain/"
      OCIS_LOG_LEVEL: debug
      OCIS_LOG_COLOR: "true"
      OCIS_LOG_PRETTY: "true"
      STORAGE_USERS_OCIS_ROOT: /ocisdata
      STORAGE_USERS_ID_CACHE_STORE: "nats-js-kv"
      STORAGE_USERS_ID_CACHE_STORE_NODES: "nats:9233"
      OCIS_REPLICA: "{{.Task.Slot}}"
      OCIS_OIDC_ISSUER: "https://authelia.my.domain"
      WEB_OIDC_CLIENT_ID: ownCloud-web
      PROXY_OIDC_REWRITE_WELLKNOWN: "true"
      PROXY_OIDC_ACCESS_TOKEN_VERIFY_METHOD: none

    volumes:
      - user-data:/ocisdata
      - ocis-data:/var/lib/ocis
    configs:
      - source: ocis-config
        target: /etc/ocis/ocis.yaml
    deploy:
      mode: replicated
      replicas: 1
      placement:
        max_replicas_per_node: 1

  nats:
    image: nats
    hostname: nats-1
    command: [
      "--jetstream",
      "--store_dir", "/data",
      "--port", "9233",
      "--cluster_name", "NATS",
      "--http_port", "8222",
      "--server_name", "nats-1"
    ]
    volumes:
      - nats_data-1:/data
    networks:
      - bana_global
    deploy:
      mode: replicated
      replicas: 1
      placement:
        max_replicas_per_node: 1

configs:
  ocis-config:
    file: ./ocis.yaml

networks:
  bana_global:
    external: true

volumes:
  user-data:
    driver_opts:
      type: "nfs"
      o: "addr=LOCALIP,rw,hard,nfsvers=4.2"
      device: ":/share/ocisdata"
  ocis-data:
    driver: local
    driver_opts:
      type: ''
      o: bind
      device: /data3/ocis
  nats_data-1:
    driver: local
    driver_opts:
      type: ''
      o: bind
      device: /data3/nats_data-1

The storage backend is twofold. User data is on my NAS (OpenMediaVault) via NFS 4.2.

I tried really hard to make PosixFS work, but failed. That’s worth a topic on its own later :slight_smile:
So currently the ‘ocis’ storage driver is in use.

System data and the NATS store are on CephFS, mounted at /data3 in my VMs.
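For reference, the CephFS mount is nothing fancy — roughly this line in /etc/fstab on each VM (monitor address and secret file are placeholders for my actual values):

```
# CephFS mounted at /data3; _netdev delays the mount until networking is up
192.168.1.10:6789:/ /data3 ceph name=admin,secretfile=/etc/ceph/admin.secret,_netdev,noatime 0 0
```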

It’s published via Nginx on my own domain, secured with Certbot:

server {
    listen 80;
    listen [::]:80;

    server_name ocis.my.domain;
    resolver 127.0.0.11 valid=10s;
    resolver_timeout 5s;

    location / {
        return 301 https://$server_name$request_uri;
    }
}

server {
    listen 443 ssl proxy_protocol;
    listen [::]:443 ssl proxy_protocol;
    http2 on;
    server_name ocis.my.domain;
    resolver 127.0.0.11 valid=30s;
    resolver_timeout 30s;

    ssl_certificate /etc/letsencrypt/live/my.domain/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/my.domain/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

    location / {
        resolver 127.0.0.11 valid=30s;
        resolver_timeout 30s;
        set $upstream_ocis http://ocis_nodes:9200;
        proxy_pass $upstream_ocis;
        proxy_set_header Host $http_host;
        proxy_hide_header Content-Security-Policy;
    }
}

Load balancing is done by Docker’s ingress network. It works for every other service.
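For completeness: the service currently uses Swarm’s default VIP endpoint mode. If sticky sessions turn out to matter for OCIS, one option I’ve considered (an untested sketch, not something I’ve confirmed OCIS needs) is switching the service to DNS round-robin so Nginx can pick and pin a task itself:

```yaml
    deploy:
      # Untested assumption: dnsrr makes the service name resolve to one
      # A record per task instead of a single virtual IP, so the reverse
      # proxy can implement its own (sticky) balancing.
      endpoint_mode: dnsrr
```

Caveat: `dnsrr` cannot be combined with ingress-published ports, so the `9200:9200` publish would have to go or be switched to `mode: host`.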

What am I missing? How do I make two or more OCIS instances work together?

There are all sorts of ‘CACHE’ environment variables. Do I need to point them all at NATS? Or just some?
In the documentation a lot of variables are marked ‘pre 5.0’, so it’s hard to figure out what I need and what I don’t.
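If it helps, this is what I’m planning to try next — the global variants, assuming they cover all services and supersede the per-service ones (that’s my reading of the docs, not confirmed):

```yaml
    environment:
      # Assumption: the global cache/persistent-store variables apply to
      # all services, so the per-service *_CACHE_STORE ones can go.
      OCIS_CACHE_STORE: "nats-js-kv"
      OCIS_CACHE_STORE_NODES: "nats:9233"
      OCIS_PERSISTENT_STORE: "nats-js-kv"
      OCIS_PERSISTENT_STORE_NODES: "nats:9233"
      # Events bus for inter-service messaging, pointed at the same NATS
      OCIS_EVENTS_ENDPOINT: "nats:9233"
```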