Linux 2.2.4 client consistently fails on large (1m+) file trees


#1

I'm running FC25 (Fedora 25) with the Linux sync client 2.2.4. I have around 360GB in more than 1 million files to sync. I thought everything had synced over to ownCloud over time, but since I added another 30-40GB this past year, I've noticed that the sync client never completes anymore (and now I'm concerned that it never did!).

I always get an error during the tree-scan phase. Turning on debugging produces multiple log files, each ending with errno 5 (after which the client closes the SQLite DB). For 3-4 attempts the client hits errno 5 but continues (starting another log file); after 3-4 of these errors (and hence 3-4 log files of 500MB-1GB each), the sync client errors out and stops syncing.

I have this issue on two Linux clients (my two laptops). The ownCloud and PHP server logs show no errors that I could find, although I am raising the log level further. I recently increased the PHP memory limit and some other PHP parameters, but the behavior is the same. I have also deleted the local SQLite cache file in the hope of fixing this.

At this point it seems I can no longer trust the sync and may need to find another ownCloud-like solution, but I was wondering whether this is a common problem with the Linux client. I have searched for similar reports involving ownCloud, Linux and large trees, but have not found much on this topic.

The ownCloud log files are quite verbose. The errno 5 "interruption" occurs on random files that the client is scanning, and there is no apparent pattern in which files are affected. I have run the sync client manually many times to generate log files to check this.
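On Linux, errno 5 is EIO ("Input/output error"). To check the "no pattern" observation across several debug logs, a small script can pull out every line that mentions errno 5 and count which paths recur. This is a minimal sketch; the log-line format is an ASSUMPTION (it expects text like `errno 5` or `errno=5`, with the affected path in double quotes on the same line), so `ERRNO5_RE` and `PATH_RE` will likely need adjusting to the real client log format.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List

# ASSUMPTION: matches "errno 5", "errno=5", "errno: 5" (case-insensitive),
# but not "errno 50". Adjust to the actual client log format.
ERRNO5_RE = re.compile(r"errno\s*[=:]?\s*5\b", re.IGNORECASE)
# ASSUMPTION: the affected path appears in double quotes on the same line.
PATH_RE = re.compile(r'"([^"]+)"')


def errno5_paths(lines: Iterable[str]) -> List[str]:
    """Return the quoted path (or the raw line if none) for each errno 5 line."""
    hits = []
    for line in lines:
        if ERRNO5_RE.search(line):
            m = PATH_RE.search(line)
            hits.append(m.group(1) if m else line.strip())
    return hits


def tally(log_files: List[str]) -> Counter:
    """Count how often each path is hit across all given log files."""
    counts: Counter = Counter()
    for name in log_files:
        with open(name, errors="replace") as fh:
            counts.update(errno5_paths(fh))
    return counts


if __name__ == "__main__":
    # Usage: python3 errno5_tally.py client-debug-1.log client-debug-2.log ...
    for path, n in tally(sys.argv[1:]).most_common(20):
        print(f"{n:4d}  {path}")
```

If every path shows a count of 1 across many runs, that supports the "random files" observation and points away from specific problem files.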

Thoughts?


#2

How much RAM do you have on the Linux desktop? You are using the 64-bit client, right? Please monitor RAM usage during the sync run, ideally on both sides, although the server side should be fine.
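One way to do the client-side monitoring is to log the resident memory of the sync process at a fixed interval, so that a failure can be matched against memory use at that moment. A minimal sketch, Linux only, that reads the `VmRSS` line from `/proc/<pid>/status`; the 5-second interval and passing the PID on the command line (found, for example, with `pgrep owncloud`) are arbitrary choices for illustration.

```python
import sys
import time
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (resident memory, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None  # no VmRSS line, e.g. a zombie or kernel thread


def monitor(pid: int, interval_s: float = 5.0) -> int:
    """Print RSS every interval_s seconds until the process exits; return peak kB."""
    peak = 0
    while True:
        try:
            with open(f"/proc/{pid}/status") as fh:
                rss = parse_vmrss_kb(fh.read())
        except FileNotFoundError:
            break  # the process has exited
        if rss is None:
            break
        peak = max(peak, rss)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp}  RSS {rss / 1024:.0f} MiB  (peak {peak / 1024:.0f} MiB)", flush=True)
        time.sleep(interval_s)
    return peak


if __name__ == "__main__":
    # Usage (Linux only): python3 rss_monitor.py <pid of owncloud or owncloudcmd>
    if len(sys.argv) > 1:
        monitor(int(sys.argv[1]))
```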


#3

It's a 16GB i7 laptop. I'll keep an eye on it. It is the 64-bit client.

Right now I am trying to run owncloudcmd on smaller parts of my directory tree.

With client logging enabled, I am seeing sync issues that leave the server and client inconsistent.

The smaller owncloudcmd runs complete without errors more often than the full sync client does, but for some files I have removed from my laptop I can see the "Remove" instruction in the log, yet the server still keeps the directory/file. This does not happen every time, only in what seem to be random directories, although they all contain a large number of files.

I did get a new error, though: "Failed to query the 'inode' for file ...." It names a directory that I had removed with "rm -rf ..." before the sync.

I also see that when I run the identical owncloudcmd a second time, it again requests removal of the files I deleted. When I check the "raw" ownCloud files on my server, the files were not deleted at all, which is why they still appear in the browser view of the directory I am testing with.

Once in a while, a folder I have deleted (and that has a "remove" instruction in the log) does get removed on the server, but only after multiple syncs. It's as if the server receives the remove instruction but does not consistently execute it.

I have the feeling some kind of concurrency issue is going on: if the client "fails" early, perhaps some pending operations are never completed on the server. Perhaps there is also some kind of queue that keeps processing after the client finishes, so I am seeing delayed actions that confuse me. If that's true, there could be a race condition between deletes that take a long time to process and a sync run started too soon on the same directory where those deletes are happening.


#4

It's definitely a latency/queue-type problem. I ran some more tests.

In general, deletes take a very long time. I ran some deletes from the browser view of my account: roughly 10-15 directories with thousands of files each. Deleting those folders took over an hour. This may be due to my slow server disk setup (6TB 5400rpm drives in a software RAID). I also have the trash bin enabled, so everything gets moved to the trash.

While the request was running, the browser showed a spinning icon for each folder being deleted. By looking at the actual data directory on the ownCloud server, I could see the folders slowly being deleted in batches: a few directories at a time would be removed, and periodically the browser would catch up and show those directories as gone.

During this time, I ran an owncloudcmd sync on that directory. A file that had been removed from the ownCloud server disk, but not yet from the web display, started to reappear on my laptop. I also cancelled the spinning icons in the web UI by refreshing the browser, and noticed that this left some directories that had already been deleted on disk still showing in the web display. Trying to "re-delete" these directories resulted in an error in the browser, but they were then removed from the display, so the web listing was correct in the end.

So it appears that delete requests on slow-to-delete directories can create conditions that lead to inconsistency. This also suggests that if the standard client errors out for any reason (say, due to an inconsistency), it could cause further inconsistency. I never lost a file in all of this, but I did end up with inconsistent client and server states that could not be straightened out even after multiple sync attempts and required manual intervention. This also explains why some files would reappear even after I thought I had deleted them.

It's also quite possible that the sync client exits too early and does not allow the server operations to complete properly, much like refreshing the browser in the middle of a very long delete request. The client does seem to try to stay running for long operations, but another operation from the web client or an error could interrupt it. Also, the web client (backed by the ownCloud DB) appears to run several minutes behind the actual file operations: a folder I deleted on the laptop and synced to the server did not show up as deleted in the browser UI for several minutes, even though the ownCloud filesystem on the server showed the folder gone.
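The hypothesis here (the on-disk delete runs ahead of the index that the web UI reflects, and an interrupted delete leaves the two permanently out of step) can be illustrated with a toy model. This is a hypothetical sketch of the behaviour described in this post, not ownCloud's actual implementation; `ToyServer`, the batch size and the one-step index lag are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyServer:
    """HYPOTHETICAL model, not ownCloud code: deletes hit the disk in batches,
    and the index (what the web UI / sync listing sees) lags one batch behind."""
    disk: Set[str] = field(default_factory=set)        # files physically in the data dir
    index: Set[str] = field(default_factory=set)       # files the DB / web UI reports
    pending: List[str] = field(default_factory=list)   # queued deletes
    unindexed: List[str] = field(default_factory=list) # gone from disk, still indexed

    def add(self, *paths: str) -> None:
        self.disk.update(paths)
        self.index.update(paths)

    def request_delete(self, *paths: str) -> None:
        self.pending.extend(paths)

    def step(self, batch: int = 2) -> None:
        # The index only catches up with the previous batch (the lag) ...
        self.index.difference_update(self.unindexed)
        # ... while the next batch is removed from disk.
        self.unindexed = self.pending[:batch]
        del self.pending[:batch]
        self.disk.difference_update(self.unindexed)

    def interrupt(self) -> None:
        # e.g. the browser is refreshed mid-delete: queued work is dropped and
        # nothing reconciles the index with the disk afterwards.
        self.pending.clear()
        self.unindexed = []

    def ghosts(self) -> Set[str]:
        """Entries the index still lists that no longer exist on disk."""
        return self.index - self.disk


if __name__ == "__main__":
    s = ToyServer()
    s.add("a", "b", "c", "d", "e")
    s.request_delete("a", "b", "c", "d", "e")
    s.step()
    print(sorted(s.ghosts()))  # ['a', 'b']: a sync run now would see stale entries
    s.step()
    s.interrupt()
    print(sorted(s.ghosts()))  # ['c', 'd']: these stay listed until a rescan
```

In this model the stale entries after `interrupt()` only go away if something rebuilds the index from the disk, which is roughly the role a rescan would play.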

The only really serious issue for me is that all of this could leave a file on my laptop that is not on the server, and I could have a hard time tracking down the specifics. The client UI displays fairly useless information when an error occurs, although it is now clear that when I get the "cannot open dir" error in the UI client, it must be because of an inconsistent state between my laptop, the ownCloud database (which probably powers the web interface) and the ownCloud files on the server filesystem.
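To track down files that exist on the laptop but not on the server (or vice versa), one option is to compare the local sync folder with the matching folder in the server's data directory (in ownCloud this is under `data/<user>/files/`). A minimal sketch, assuming both trees are readable from one machine (for example via an sshfs mount); the `IGNORE_PREFIXES` names for the client's own bookkeeping files are an ASSUMPTION and may need adjusting for your client version.

```python
import os
import sys
from pathlib import Path
from typing import Set, Tuple

# ASSUMPTION: the client's own journal/log files in the local sync folder;
# they never exist on the server, so they are skipped. Adjust as needed.
IGNORE_PREFIXES = ("._sync_", ".owncloudsync.log")


def relative_files(root: str) -> Set[str]:
    """All files under root, as '/'-separated paths relative to root."""
    base = Path(root)
    out = set()
    for dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            if name.startswith(IGNORE_PREFIXES):
                continue
            out.add((Path(dirpath) / name).relative_to(base).as_posix())
    return out


def compare_trees(local_root: str, server_root: str) -> Tuple[Set[str], Set[str]]:
    """Return (files only on the laptop, files only on the server)."""
    local = relative_files(local_root)
    server = relative_files(server_root)
    return local - server, server - local


if __name__ == "__main__":
    # Usage: python3 tree_diff.py ~/ownCloud/some/dir /mnt/server/data/<user>/files/some/dir
    if len(sys.argv) == 3:
        only_local, only_server = compare_trees(sys.argv[1], sys.argv[2])
        for p in sorted(only_local):
            print("laptop only:", p)
        for p in sorted(only_server):
            print("server only:", p)
```

This only compares names, not contents or modification times; it answers "what is missing where", which is the question raised above.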

I reviewed a few other directories with large numbers of files and, indeed, there were files that had still not been synced to the server after several months, because the sync process failed earlier in the filesystem scan. The client probably needs some randomness injected so that it does not follow the same scan path each time; that way, even when errors occur, successive runs would cover more of the tree.
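The randomized-scan idea can be sketched as a directory walk that shuffles the order in which subdirectories and files are visited. This only illustrates the suggestion; it is not how the ownCloud client's scan actually works. The point is that if a run aborts partway through, successive runs will have covered different parts of the tree.

```python
import os
import random
from typing import Iterator, Optional


def shuffled_walk(root: str, seed: Optional[int] = None) -> Iterator[str]:
    """Yield every file path under root, visiting directories and files in a
    random order that changes from run to run (or is fixed by seed)."""
    rng = random.Random(seed)
    for dirpath, dirnames, filenames in os.walk(root):
        # With topdown=True (the default), reordering dirnames in place changes
        # the order in which os.walk descends into the subdirectories.
        rng.shuffle(dirnames)
        rng.shuffle(filenames)
        for name in filenames:
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    import sys
    # Usage: python3 shuffled_walk.py <dir>  (prints the first 10 files visited)
    if len(sys.argv) > 1:
        for i, path in enumerate(shuffled_walk(sys.argv[1])):
            if i >= 10:
                break
            print(path)
```

Every run still visits every file exactly once; only the order differs, so a run that stops at the same error count reaches a different subset each time.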

The size and nature of the files I sync may simply not be a good match for ownCloud, given my hardware/software configuration and how I use the web and laptop clients.


#5

Do you use SQLite on the server? A real database is recommended.

How did you upload all the files in the first place? If you just dropped them manually into the data folder (not via the ownCloud client, the web interface or WebDAV), ownCloud doesn't notice the new files and only picks them up through your "random" noise in these folders:


#6

I placed them using the sync client to start with, but it took a few tries to get them all there. I'm using MySQL.

For one of my directories with a lot of files, the files appear in the server data dir, but not all of them appear in the web UI, even after waiting an hour. The server data dir was initially missing a bunch of files, so I ran owncloudcmd to sync them. They now appear in the server data dir but not in the web UI. I had refreshed the browser a few times while the sync was running, so I wonder if that broke whatever needs to be updated for them to appear in the web UI. There's even a folder that I deleted, saw disappear from the server data dir, and then saw reappear in both the web UI and the data dir.


#7

A bit more on the inconsistency between my laptop, the web UI and the server's data files.

Some of my directories have files that start with . (hidden files).

After a long session trying to understand why the files listed on my laptop and in the server data files (even after running files:scan) did not match what was in the web UI, I opened the web UI in three different browsers. All browsers reported, in small text below the file list, X dirs and Y files, which looked correct. But the file list itself did not show X dirs and Y files, and each browser displayed something different: Chrome showed only 3 folders, Epiphany (the GNOME browser) showed 5 folders and several files, and Firefox showed something else again.

Huh?

Wondering why, I thought for fun I would turn on "show hidden files" in the web UI (in the settings, lower left for me). Enabling it in all the browsers made the web UI show all the folders and files consistently. I think that when a hidden file sorts to the top of the file list and showing hidden files is off, the rest of the files are not displayed consistently.
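To check the "X dirs and Y files" summary against what is actually in a folder, one can count a single folder's entries both with and without dot-files and compare each count to the web UI. A minimal sketch; whether the web UI's summary line includes hidden entries is not something this thread establishes, which is why both counts are printed.

```python
import os
from typing import Tuple


def count_entries(path: str, include_hidden: bool) -> Tuple[int, int]:
    """Count the immediate subdirectories and files in one folder (like a
    single file-list view), optionally skipping names that start with '.'."""
    dirs = files = 0
    with os.scandir(path) as it:
        for entry in it:
            if not include_hidden and entry.name.startswith("."):
                continue
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
            else:
                files += 1  # anything that is not a directory, symlinks included
    return dirs, files


if __name__ == "__main__":
    import sys
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    for show in (False, True):
        d, f = count_entries(folder, include_hidden=show)
        label = "with hidden" if show else "without hidden"
        print(f"{label}: {d} dirs, {f} files")
```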


#8

The default MySQL settings are probably not the best for running ownCloud. Have you done any optimization? (Check the forum; there are tools like tuning-primer.sh.) Check your log files as well; they should show whether you have problems with your database or anything else.

If the files in the file system differ from what the web interface shows, you can also rescan all the files manually:
sudo -u www-data php /path/to/owncloud/occ files:scan --all
where www-data is the web server user. It is probably different on FC25 (Fedora's httpd usually runs as apache).

Try to separate the issues. If you can reproduce the hidden-files/file-listing problem on a new user account with few files and folders (where everything is properly indexed), report it separately on the bug tracker at github.com/owncloud/core/issues.


#9

That's a good point. I haven't looked at the DB layer yet; I'll do that. I have run the scan before, but I'll run it a few more times. I'll see if I can reproduce the web UI issue reliably, but showing hidden files resolved the display issues in all three browsers immediately, which was good.