Syncing is out of sync, and admin account has an odd problem


#1

Apologies for a long-winded explanation. I am facing an OC server that is degrading/dying slowly over time. We don't know what caused it, and we're not even sure of what is happening.

I've checked previous threads, but the closest thing I can find dates back to 2014 with a "file blacklist feature", which prevented certain files from syncing when there was poor connectivity. This was for an old OC version, hopefully superseded now. We have very good connectivity and any large file network issue should be resolved within minutes, if any.

Full story below. Thanks for your patience:

Steps to reproduce
1. We have not done anything specific that we can pinpoint. Several changes have occurred in the last few weeks, any of which could be a cause of the problems we are seeing
2. We set up a basic Cloudflare account recently to use with our WordPress website (OC was set up and running smoothly much earlier). Because the OC URL shares the same domain root, the OwnCloud web interface is also affected by CF. It didn't seem to break anything at the time. I cannot tell if CF affects desktop clients, but I am assuming it does since the URLs are similar.
3. The hosting admin told us to set PHP at 5.6 instead of 5.4, because it was creating issues with WordPress plugins. He did that for us and we have not noticed any clear differences.
4. While trying to resolve the "security setup & warnings" that appear on the Admin webpage, I added the following lines to config.php, below the "end of config" line:

'trashbin_retention_obligation' => 'auto, 20',
'versions_retention_obligation' => 'auto, 10',
'check_for_working_htaccess' => true,
'log_rotate_size' => 10000000,
'enable_previews' => false,
'filelocking.enabled' => true,
'filelocking.ttl' => 600,
);

OC seemed to continue working normally, and the immediate effect was to create a new logfile, as the old one was over 100MB
The warnings on admin page were still present after making these entries. I can't show you what the errors are because the script is now broken (see below). They were all related to https security and absence of mem cache, and have been present since day one (warnings, not errors), so I am assuming these warnings are not related to the issue described here.
5. Admin web page has a notice that I should upgrade to 9.1.5, which I have NOT done yet. The button to fetch the upgrade is not functioning (unresponsive). I am happy to try and force a manual upgrade if someone can confirm it is a good idea, despite the problems described below...

Expected behaviour
Tell us what should happen:
1) Files sync between clients and server, at a pace more or less linearly related to file size and internet line speed, in the chrono sequence in which they are created, modified or deleted
2) Admin webpage should display a dynamic report at login, under "Security & setup warnings"
3) Admin file pages should show all the directories and files which are shared with admin user account

Actual behaviour
Tell us what happens instead
1) OC was working fine for roughly 10 weeks. I have learned to monitor and clear the common "file lock" issue, and the DB was stable and smooth.
Recently, we started collaborating on a 12MB PowerPoint file. Not huge, but not tiny either. As there were lots of modifications (on the same PC), the keyboard operator had to press "save" quite frequently to safeguard our efforts.
After a few hours of working, users went back to their own machines, saw the same file synced on each PC, and assumed everything was fine. In fact, none of the synced copies were current: they were many saves behind, with no simple means to detect this. Upon opening, it was obvious the contents were wrong. After a quick discussion, we concluded that OC had not finished flushing the changes from the source PC to the server, and from the server to the destination PCs. So we waited an hour. OC stopped syncing after only a few minutes (green indicator on desktop client). But we realized the last file synced was not the last version.
We went to each machine to check. The original file on the source PC was correct and the server version was up to date, but the destination PCs were not yet current. So we decided to make a copy of the original file on the source PC, to see if that would sync in a more clear-cut way. Copy/paste, and within 5 minutes everyone had the latest file on their machines - but the old copy was still not synced. We decided to sleep on it.
This situation has been going from bad to worse day by day. We now have about 10 copies of the file, all of them out of sync between server and destination clients - except the very last copy/paste, which is good. Surprisingly, the desktop client now flashes the sync popup at random, to inform us that it has synced the Nth earliest copy of the problem file... several days after creation, and it is still NOT the last version visible on the web client. So we have out-of-sync copies that still sync, but much more slowly than more recent copies.

This is only for the files that we know about. We don't know when this started, and we don't know if other files are suffering from same problem.
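One way to check whether two copies of a file really match, independent of what the sync indicator says, is to compare checksums. A minimal Python sketch, assuming each copy can be read as a local file; the function names are illustrative and not part of ownCloud:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files have byte-identical content."""
    return file_sha256(path_a) == file_sha256(path_b)
```

Running `file_sha256` on the source PC and on a destination PC and comparing the two digests would show whether a given copy is stale, without having to open the presentation.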

Sometimes the desktop client will show errors such as

[...the item is not synced because of previous error.....different e-tag for resuming...]

I can't copy the exact messages as they disappear too fast, at random.

Pressing F12 on win desktop client currently produces a worrying log, with recurring entries such as

04-21 00:49:12:259 4052 OCC::Folder::slotRunEtagJob: * Trying to check "https://cerulean.asia/owncloud/remote.php/webdav/CC Business" for changes via ETag check. (time since last sync: 2877 s)
04-21 00:49:12:260 4052 OCC::FolderMan::slotRunOneEtagJob: Scheduling "https://cerulean.asia/owncloud/remote.php/webdav/CCCloud" to check remote ETag

with a long list of OC root directories below, with all the same error message. The time counter keeps increasing by 6 seconds every 6 seconds...A lot of lines...!
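Since the log scrolls too fast to read, the "time since last sync" counter can be pulled out per folder with a short script. A minimal Python sketch, assuming the log lines have exactly the shape of the two quoted above; the regular expression is written against those samples only and may need adjusting for other client versions:

```python
import re

# Matches lines like:
#   ... Trying to check "<url>" for changes via ETag check. (time since last sync: 2877 s)
ETAG_CHECK = re.compile(
    r'Trying to check "(?P<url>[^"]+)" for changes via ETag check\. '
    r"\(time since last sync: (?P<seconds>\d+) s\)"
)


def seconds_since_last_sync(log_lines):
    """Map each folder URL to the largest 'time since last sync' value seen."""
    latest = {}
    for line in log_lines:
        match = ETAG_CHECK.search(line)
        if match:
            url = match.group("url")
            seconds = int(match.group("seconds"))
            latest[url] = max(seconds, latest.get(url, 0))
    return latest
```

Feeding it the lines of a saved client log gives one number per sync folder, which makes it easier to see which folders have gone longest without a sync.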

2) I tried to diagnose what was going on, and added a few lines in config.php, as indicated above.
This has not done any visible change to the system. The DB had only 3 lines that were "file-locked", which I cleared.
Today, as I compare again the files on web client / admin account, I see that the Admin user CANNOT see any file or directory (normally Admin has access to and owns everything, shared with regular users).
The web client works normally, except that there are no files visible! If I login as a regular user, I can see files and directories, normally.
Something must be wrong with the Admin user account, but we have not touched this at all and no one plays with the DB, except me to delete file locks once a week.

3) Digging further, I went onto the web admin page (server status), and the "security & setup" section shows a never-ending "wait" icon, instead of churning for a few seconds and producing a report of the usual warnings. Even after waiting an hour, there is no warning report produced.
As a minor point, the button to upgrade to 9.1.5 is not functional. I am putting this down to the previous script, which must still be running and somehow preventing the button code from executing properly.

Server configuration
Operating system: Linux
Web server: Apache
Database: MySQL
PHP version: 5.6
ownCloud version (see ownCloud admin page): 9.1.4
Updated from an older ownCloud or fresh install: not sure what was the first install, but definitely 9.1.X
Special configuration (external storage, external authentication, reverse proxy, server-side-encryption): n/a

ownCloud log (data/owncloud.log)

Please paste possible errors in the following code block, see https://central.owncloud.org/t/how-to-find-webserver-or-oc-logfile-enable-php-logfile/808 for more info

The first 100MB log file contains a million entries identical to this:

{"reqId":"WLJLnufwZNZ5Ddq0gXaejQAAAUw","remoteAddr":"119.76.173.83","app":"PHP","message":"PHP Startup: Unable to load dynamic library '\/opt\/alt\/php56\/usr\/lib64\/php\/modules\/snmp.so' - \/opt\/alt\/php56\/usr\/lib64\/php\/modules\/snmp.so: cannot open shared object file: No such file or directory at Unknown#0","level":3,"time":"2017-02-26T03:29:34+00:00","method":"PROPFIND","url":"\/owncloud\/remote.php\/webdav\/Pateo%20Archives","user":"God"}
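With a log this size, a summary is easier to work with than the raw file. A minimal Python sketch that counts entries per level, app and message, assuming every line of the log is one JSON object with the `level`, `app` and `message` fields seen in the entry above; the function name and the 80-character truncation are arbitrary choices:

```python
import json
from collections import Counter


def summarize_log(lines):
    """Count log entries per (level, app, start of message)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            # Keep track of lines that are not valid JSON instead of failing.
            counts[("?", "?", "unparseable line")] += 1
            continue
        if not isinstance(entry, dict):
            counts[("?", "?", "unparseable line")] += 1
            continue
        message = str(entry.get("message", ""))[:80]
        counts[(entry.get("level"), entry.get("app"), message)] += 1
    return counts
```

Passing an open file handle for data/owncloud.log and calling `.most_common(5)` on the result would list the five most frequent messages, so any rare entry hidden among the repeated PHP startup lines stands out.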

The new 17kb server logfile (after I modified config.php) is normal, with just a few "normal" entries, except one line:

{"reqId":"WPdbGUvdQGhNk1DzVvGHwgAAAEI","remoteAddr":"2400:8901::f03c:91ff:fee2:109a","app":"cron","message":"An exception occurred while executing 'UPDATE `oc_appconfig` SET `configvalue` = ? WHERE (`appid` = ?) AND (`configkey` = ?) AND (`configvalue` <> ?)' with params [\"2\", \"backgroundjob\", \"lastjob\", \"2\"]:\n\nSQLSTATE[HY000]: General error: 2013 Lost connection to MySQL server during query","level":4,"time":"2017-04-19T12:42:05+00:00","method":"GET","url":"\/owncloud\/cron.php","user":"--"}

It doesn't look good, but I have no idea what it means or what to do about it :frowning:
MySQL is running fine and I can run SQL scripts.

Integrity status for oC9+

Login as admin user into your ownCloud and access
https://cerulean.asia/owncloud/index.php/settings/integrity/failed
No errors have been found.

Conclusion: I don't even know where to start.
I can't describe the problem better than "syncing old files out of sync but new files in sync" plus "the admin user account has some strange problems with web admin and web file list".
My users are losing confidence in OC at frightening speed.
The desktop clients are all showing green stale status, which is highly irritating to end-users who can see that OC has no idea that the config is very ill.

I feel very bad about all this. I would expect OC to at least flag that there is an unknown condition that requires immediate admin attention, and I would expect that OC desktop clients should refuse to continue to sync any new files until that condition is removed, to avoid propagating bad data.

I'm happy to test any idea and very grateful to anyone who can help.
Gus


#2

Forgot to add, the admin page states that "Last cron job execution: seconds ago" so that portion is fine


#3

The first step is to get rid of Cloudflare.

CF can be used with a standard web application like WordPress, but not with an advanced web application like ownCloud, which relies on WebDAV, custom headers and techniques like chunking, where CF interferes with the responses.

https://doc.owncloud.org/server/latest/admin_manual/search.html?q=cloudflare

Most issues are probably gone if you don't use CF.

Additional issues might be caused by a strange / non-default PHP configuration. This can be seen in the logfile, for example:

PHP Startup: Unable to load dynamic library

which shows that there are issues within your PHP environment itself.


#4

Thank you for the good advice.
Stopping CF and clearing all the cache has resolved the web client /admin account issues. I am back to where I was before, for that part.

1) I am not 100% clear on the OC sync status:
Can I safely assume that the file replication errors will fix themselves for all affected files and clients, over the next few hours?
If not, how can I force a sync on all clients? (the server copies are good)

Regarding the other problems listed above and your replies, if you feel the discussion is off-topic then I will create new posts:
2) The Win desktop client log (F12) is still showing the same errors as yesterday. Here is a sample of what loops every 30 seconds, even 10 mins after stopping CF. I have no idea what it means or what to do about it:

04-21 15:22:07:753 4052 OCC::Folder::slotRunEtagJob: * Trying to check "https://cerulean.asia/owncloud/remote.php/webdav/CC Business" for changes via ETag check. (time since last sync: 4745 s)  <<== MY EDIT: NOTE THE TIME!!!
04-21 15:22:07:754 4052 OCC::FolderMan::slotRunOneEtagJob: Scheduling "https://cerulean.asia/owncloud/remote.php/webdav/CCCloud" to check remote ETag
04-21 15:22:07:755 4052 OCC::AbstractNetworkJob::start: !!! OCC::RequestEtagJob created for "https://cerulean.asia/owncloud" + "/CCCloud" "OCC::Folder"
04-21 15:22:08:109 4052 OCC::FolderMan::slotRunOneEtagJob: Scheduling "https://cerulean.asia/owncloud/remote.php/webdav/Pateo Project Archives" to check remote ETag
04-21 15:22:08:111 4052 OCC::AbstractNetworkJob::start: !!! OCC::RequestEtagJob created for "https://cerulean.asia/owncloud" + "/Pateo Project Archives" "OCC::Folder"
04-21 15:22:08:375 4052 OCC::FolderMan::slotRunOneEtagJob: Scheduling "https://cerulean.asia/owncloud/remote.php/webdav/CC Business" to check remote ETag
04-21 15:22:08:376 4052 OCC::AbstractNetworkJob::start: !!! OCC::RequestEtagJob created for "https://cerulean.asia/owncloud" + "/CC Business" "OCC::Folder"

Any ideas?

3) The new, 2nd OC log file does not show any PHP errors like before (maybe this is why the hosting admin asked me to switch to PHP 5.6?), but the new log has one new and bad entry, only:

{"reqId":"WPm3x9-2W5JGEgsbDaeSVwAAAMc","remoteAddr":"2400:8901::f03c:91ff:fee2:109a","app":"core","message":"Invalid request to occ controller. Details: \"Web executor is not allowed to run from a host 2400:8901::f03c:91ff:fee2:109a\"","level":2,"time":"2017-04-21T07:42:00+00:00","method":"POST","url":"\/owncloud\/index.php\/occ\/config:list","user":"--"}

Is this someone trying to hack into my OC server, without success? If yes, security is working and we can ignore it.

4) I am a bit puzzled by the OC server warnings that have re-appeared after stopping CF:

The "X-XSS-Protection" HTTP header is not configured to equal to "1; mode=block". This is a potential security or privacy risk and we recommend adjusting this setting.
The "X-Content-Type-Options" HTTP header is not configured to equal to "nosniff". This is a potential security or privacy risk and we recommend adjusting this setting.
No memory cache has been configured. To enhance your performance please configure a memcache if available.

The .htaccess file under OC has those values set correctly (below / default settings). I added the lines at the bottom, which successfully removed the corresponding warning. And I've cleared all the caches that I know about.
Is this an OC or an Apache issue?

<IfModule mod_headers.c>
  <IfModule mod_setenvif.c>
    <IfModule mod_fcgid.c>
       SetEnvIfNoCase ^Authorization$ "(.+)" XAUTHORIZATION=$1
       RequestHeader set XAuthorization %{XAUTHORIZATION}e env=XAUTHORIZATION
    </IfModule>
    <IfModule mod_proxy_fcgi.c>
       SetEnvIfNoCase Authorization "(.+)" HTTP_AUTHORIZATION=$1
    </IfModule>
  </IfModule>
  <IfModule mod_env.c>
    # Add security and privacy related headers
    Header set X-Content-Type-Options "nosniff"
    Header set X-XSS-Protection "1; mode=block"
    Header set X-Robots-Tag "none"
    Header set X-Frame-Options "SAMEORIGIN"
    Header set X-Download-Options "noopen"
    Header set X-Permitted-Cross-Domain-Policies "none"
    SetEnv modHeadersAvailable true
  </IfModule>
  # Add cache control for CSS and JS files
  <FilesMatch "\.(css|js)$">
    Header set Cache-Control "max-age=7200, public"
  </FilesMatch>
</IfModule>
<IfModule mod_php5.c>
  php_value upload_max_filesize 513M
  php_value post_max_size 513M
  php_value memory_limit 512M
  php_value mbstring.func_overload 0
  php_value always_populate_raw_post_data -1
  php_value default_charset 'UTF-8'
  php_value output_buffering 0
  <IfModule mod_env.c>
    SetEnv htaccessWorking true
  </IfModule>
</IfModule>
<IfModule mod_php7.c>
  php_value upload_max_filesize 513M
  php_value post_max_size 513M
  php_value memory_limit 512M
  php_value mbstring.func_overload 0
  php_value default_charset 'UTF-8'
  php_value output_buffering 0
  <IfModule mod_env.c>
    SetEnv htaccessWorking true
  </IfModule>
</IfModule>
<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteRule .* - [env=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
  RewriteRule ^\.well-known/host-meta /public.php?service=host-meta [QSA,L]
  RewriteRule ^\.well-known/host-meta\.json /public.php?service=host-meta-json [QSA,L]
  RewriteRule ^\.well-known/carddav /remote.php/dav/ [R=301,L]
  RewriteRule ^\.well-known/caldav /remote.php/dav/ [R=301,L]
  RewriteRule ^remote/(.*) remote.php [QSA,L]
  RewriteRule ^(?:build|tests|config|lib|3rdparty|templates)/.* - [R=404,L]
  RewriteCond %{REQUEST_URI} !^/.well-known/acme-challenge/.*
  RewriteRule ^(?:\.|autotest|occ|issue|indie|db_|console).* - [R=404,L]
</IfModule>
<IfModule mod_mime.c>
  AddType image/svg+xml svg svgz
  AddEncoding gzip svgz
</IfModule>
<IfModule mod_dir.c>
  DirectoryIndex index.php index.html
</IfModule>
AddDefaultCharset utf-8
Options -Indexes
<IfModule pagespeed_module>
  ModPagespeed Off
</IfModule>
#### DO NOT CHANGE ANYTHING ABOVE THIS LINE ####
ErrorDocument 403 /owncloud/core/templates/403.php
ErrorDocument 404 /owncloud/core/templates/404.php
## Added by Gus on 17 April 2017
<IfModule mod_headers.c>
  Header always set Strict-Transport-Security "max-age=15552000; includeSubDomains"
</IfModule>
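To see whether the values from .htaccess actually reach the browser, the response headers can be compared against what the file sets. A minimal Python sketch using only the standard library; the expected values are copied from the `Header set` lines above, while the function names and the idea of a HEAD request against the OC base URL are assumptions for illustration:

```python
import urllib.request

# Values taken from the "Header set" lines in the .htaccess above.
EXPECTED_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-XSS-Protection": "1; mode=block",
    "X-Robots-Tag": "none",
    "X-Frame-Options": "SAMEORIGIN",
    "X-Download-Options": "noopen",
    "X-Permitted-Cross-Domain-Policies": "none",
}


def fetch_headers(url):
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        # Repeated header lines collapse to the last value in a plain dict.
        return dict(response.headers.items())


def header_mismatches(headers, expected=EXPECTED_HEADERS):
    """Return {header name: actual value or None} for headers that differ."""
    actual = {name.lower(): value.strip() for name, value in headers.items()}
    problems = {}
    for name, wanted in expected.items():
        got = actual.get(name.lower())
        if got is None or got.lower() != wanted.lower():
            problems[name] = got
    return problems
```

Calling `header_mismatches(fetch_headers(url))` with the OC address would list the headers that are missing or different. If headers set in .htaccess do not arrive, that would point at the web server or something in front of it rather than at OC itself, though this sketch cannot tell which.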

5) Is it safe to upgrade to 9.1.5 now, or should I wait until all the above is fixed?

6) My PHP configuration is out of the box, I don't know how to change anything about PHP. Where should I look? Ask the hosting admins? Force another PHP version (5.4) and back to 5.6 to overwrite any settings? Ask another support group (Apache?)? Other?

One suggestion: since the web admin page is able to query the web server and figure out that some html settings are not ideal, the script could also search for the string "cloudflare" present anywhere in the code returned by the server, and add a warning in the status list that CF is injecting code?
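As a rough illustration of that suggestion, here is a minimal Python sketch that looks at response headers rather than page code. It relies on Cloudflare-proxied responses carrying a `Server: cloudflare` header and a `CF-RAY` header; the function name is hypothetical and this is not an existing ownCloud check:

```python
def looks_like_cloudflare(headers):
    """Guess whether a response passed through Cloudflare, based on headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    server = lowered.get("server", "")
    # Either marker is treated as sufficient evidence.
    return "cloudflare" in server.lower() or "cf-ray" in lowered
```

The function takes any mapping of header names to values, so it could be fed the headers of a request to the server's own public URL and used to add a line to the warnings list.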

Thank you very much for the help and patience.