I’m not sure if this belongs here (as a configuration error on my part) or in the GitHub issues (as a bug), but I figured I’d try here first…
Steps to reproduce
These are the steps to reproduce from a fully clean, publicly accessible system with Docker and a few other common dependencies (like git) installed, which has access to the internet and has ports 80 and 443 exposed. I’m not sure how much of this is relevant, but here you go…
- Set up a standard Traefik configuration along the lines of this config1. I use DigitalOcean for my DNS but one could also use standalone/HTTP verification for ACME and set the DNS manually.
- Clone my configuration and checkout the relevant branch
$ git clone https://git.tams.tech/TWS/ocis-deployment.git
$ cd ocis-deployment
$ git checkout feature/office-suite
- Run the initialization commands
$ mkdir -p mounts/{config,data}
$ docker compose run init
$ sh gen-secrets.sh
$ sh dns.sh  # if using DigitalOcean DNS; requires `doctl auth init` be run once on the machine beforehand
- Start the service
$ docker compose up
To put it another way…
Since this basically amounts to “go out to some other web site and deploy the configuration”, I’ll summarize the relevant, problematic aspect of the configuration here as more generalized steps:
- Get a working OCIS deployment behind a Traefik (or other) reverse proxy
- Set the `GATEWAY_GRPC_ADDR` environment variable to `0.0.0.0:9142` on the `ocis` service
- Add a private network to connect the app provider with OCIS
- Add an app-provider configuration, as laid out in this example2 in the OCIS repo, pointing it at your already existing `ocis` container via DNS. Connect the private app-provider service and the OCIS service to the new app-provider network, not the network Traefik is connected to.
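The steps above can be sketched as a docker-compose fragment. This is only an illustration: the service names, network names (`web`, `app-provider-net`), and layout are assumptions adapted from the linked example, not the exact contents of my configuration.

```yaml
services:
  ocis:
    environment:
      # Listen on all interfaces so other containers can reach the gateway
      GATEWAY_GRPC_ADDR: "0.0.0.0:9142"
    networks:
      - web                # the network Traefik is attached to
      - app-provider-net   # private network shared with the app provider
  app-provider:
    networks:
      - app-provider-net   # deliberately NOT attached to the web network
networks:
  web:
    external: true
  app-provider-net: {}
```

The key point is the asymmetry: `ocis` sits on both networks, while `app-provider` only shares the private one with it.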
Expected behaviour
All services start and eventually stabilize to a state where they are running without error.
Actual behaviour
An error message is produced regularly (approximately every 20 seconds) for as long as the service remains up, even after all other services have restarted enough times to resolve their dependency errors.
The raw output (hard to read), followed by the same message pretty-printed:
ocis-app-provider-1 | {"level":"error","pid":1,"error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.18.0.12:9142: i/o timeout\"","time":"2023-09-19T16:25:18.957971967Z","caller":"github.com/cs3org/reva/v2@v2.16.1-0.20230911153145-a2e2320f3448/internal/grpc/services/appprovider/appprovider.go:164","message":"error registering app provider: error calling add app provider"}
{
"level": "error",
"pid": 1,
"error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.18.0.12:9142: i/o timeout\"",
"time": "2023-09-19T16:25:18.957971967Z",
"caller": "github.com/cs3org/reva/v2@v2.16.1-0.20230911153145-a2e2320f3448/internal/grpc/services/appprovider/appprovider.go:164",
"message": "error registering app provider: error calling add app provider"
}
Server configuration
Operating system: NixOS+Docker
Web server: N/A (but using Traefik reverse proxy)
Database: postgres
PHP version: N/A
ownCloud version: OCIS latest (image sha256: 8048918ad590a5d02218527abee0570e04e9172776bb90db2c6b83334565d106)
Updated from an older ownCloud or fresh install: Fresh install
Where did you install ownCloud from: docker hub
The content of config/config.php:
N/A (OCIS doesn’t have any `.php` files)
List of activated apps:
N/A
Are you using external storage, if yes which one: local
Are you using encryption: no
Are you using an external user-backend, if yes which one: no
Client configuration
Browser:
Operating system:
Logs
Web server error log
See above for relevant section
ownCloud log (data/owncloud.log)
N/A
Browser log
N/A
More information
If we inspect the `app-provider` network configuration…
$ docker inspect ocis-app-provider-1 | jq -r '.[] | .NetworkSettings.Networks | .[] | .IPAddress'
172.26.0.5
We can see the problem: the subnet that `app-provider` is trying to reach the `ocis` container on is not the network that the `app-provider` container is on. Sure enough, if we inspect the `ocis` container:
$ docker inspect ocis-ocis-1 | jq -r '.[] | .NetworkSettings.Networks | .[] | .IPAddress'
172.26.0.4
172.27.0.3
172.18.0.12
We can see that, sure, the `ocis` container is on the `app-provider-net` network, but it’s also on the `web` network, which is the subnet the `app-provider` container is trying to reach it on. This suggests that either the mDNS/service registry system3 is only reporting the IP address of the `web` network, or the client only tries the first IP it gets in response to the mDNS query and discards any other networks. I don’t really know much about how mDNS works. I did try a bit of spelunking in the code, but didn’t find anything I understood.
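To make the mismatch explicit, the same `jq` filter can be extended to print each network’s name next to its IP. The JSON below is a hand-written stub of the shape `docker inspect` returns (limited to the two networks named in this post), not captured output; on a live host you would pipe `docker inspect ocis-ocis-1` into the filter instead.

```shell
# Stub of the `docker inspect` JSON shape (abbreviated; illustration only).
sample='[{"NetworkSettings":{"Networks":{
  "app-provider-net":{"IPAddress":"172.26.0.4"},
  "web":{"IPAddress":"172.18.0.12"}}}}]'

# Print each network name next to the IP the container holds on it.
echo "$sample" | jq -r '.[] | .NetworkSettings.Networks
  | to_entries[] | "\(.key)\t\(.value.IPAddress)"'
```

Run against both containers, this makes it obvious that `172.18.0.12` is the `web` address, which `app-provider` has no route to.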
Even more frustrating, within the relevant containers, `dig` returns the IP addresses of the common subnet for both containers:
$ docker compose run -u 0 --entrypoint sh ocis
[+] Building 0.0s (0/0)
[+] Creating 1/0
✔ Container ocis-search-engine-1 Running 0.0s
[+] Building 0.0s (0/0)
# apk add --quiet bind-tools
# dig +short ocis
172.26.0.4
# dig +short app-provider
172.26.0.5
$ docker compose run -u 0 --entrypoint sh app-provider
[+] Building 0.0s (0/0)
[+] Creating 2/0
✔ Container ocis-search-engine-1 Running 0.0s
✔ Container ocis-ocis-1 Running 0.0s
[+] Building 0.0s (0/0)
# apk add --quiet bind-tools
# dig +short ocis
172.26.0.4
# dig +short app-provider
172.26.0.5
See also
The pull request in the config repo: https://git.tams.tech/TWS/ocis-deployment/pulls/1
Footnotes/relevant links
Since Discourse didn’t let me post links, I had to put them in footnotes and obfuscate them.
1: https://git.tams.tech/TWS/traefik-config
2: https://github.com/owncloud/ocis/blob/3ba6229add9edb6dc99e8733272f15accdcdbbb3/deployments/examples/ocis_wopi/docker-compose.yml#L103-L129
3: https://github.com/owncloud/ocis/blob/b0ac9840dff00a2527b2e8df86bebcd12632104c/ocis/README.md