MOVE: ownCloud's worst nightmare!

Imagine you convinced your family early on to use ownCloud to store their personal data. They have been using the instant upload feature of the mobile apps to sync pictures taken with their smartphones for years. The upload folder probably contains thousands or tens of thousands of memorable family shots and events. Having them all in one folder on the desktop is kinda unhandy, so someone decides to use the explorer to move all of them to a /Pictures/Family/My Smartphone/ subfolder. The ownCloud desktop client actually detects the change as a move operation and correctly sends a single MOVE WebDAV request to the server.

This is where your nightmare begins. Let me tell you some of the ways this can go horribly wrong.

What could possibly be so problematic with a move operation, you ask? On your desktop, it was an atomic operation, right? So why does it take so long for ownCloud? For ownCloud, the MOVE is an atomic operation as well, at least on a POSIX file system. But in addition to moving the files in the file system, ownCloud also needs to update the database. If the source and target path are located on the same virtual storage, ownCloud can reduce this to a single UPDATE for the move in the database and another UPDATE of the mtime and etag in the parent folders.
What is a virtual storage? You can think of it as a partition on your hard disk. ownCloud creates a tree of mount points for every user, and a share is represented as a new virtual storage as well. When a move crosses virtual storages, ownCloud has to update its metadata for all affected files and change the storage id in the database, even if the source and target folder reside on the same partition.
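
To make the "same virtual storage" check a bit more tangible, here is a minimal sketch in Python. It is not ownCloud's actual code: the mount table, the storage ids and both function names are made up for illustration. It only shows the idea of resolving a path to the storage behind the longest matching mount point.

```python
# Hypothetical mount table for one user: mount point -> storage id.
# The ids are invented for this sketch.
MOUNTS = {
    "/": "home::alice",
    "/Shared/Vacation": "shared::42",
}


def resolve_storage(mounts, path):
    """Return the storage id of the longest mount point that contains path."""
    candidates = [
        mount for mount in mounts
        if path == mount or path.startswith(mount.rstrip("/") + "/")
    ]
    # The root mount "/" matches every absolute path, so this list is
    # never empty as long as the table contains "/".
    return mounts[max(candidates, key=len)]


def is_single_update_move(mounts, source, target):
    """True if both paths live on the same virtual storage."""
    return resolve_storage(mounts, source) == resolve_storage(mounts, target)
```

With this table, moving /a.jpg to /Pictures/Family/a.jpg stays inside one storage, while moving it into /Shared/Vacation/ crosses into the storage of the share and hits the expensive path.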

We could add optimizations to reduce the number of UPDATEs needed for that by recursively descending the tree and

  1. doing a single UPDATE per folder using the parent id or
  2. collecting all affected file ids and using the IN clause (with a chunk size of 999, the default limit on bound parameters in older SQLite versions) or
  3. using database platform specific SQL that is meant to manipulate hierarchical trees.
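
Option 2 could look roughly like the following sketch. It uses Python's sqlite3 module and a heavily simplified, hypothetical filecache table with only fileid and storage columns; the function names are invented. One of the 999 parameter slots is reserved for the new storage id, so each chunk carries at most 998 file ids.

```python
import sqlite3

MAX_PARAMS = 999  # bound-parameter limit of older SQLite builds


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def change_storage(conn, file_ids, new_storage):
    """Set the storage id of all given file ids, one UPDATE per chunk."""
    ids_per_chunk = MAX_PARAMS - 1  # one slot is used by new_storage
    updated = 0
    with conn:  # commits on success, rolls back if an UPDATE raises
        for chunk in chunked(file_ids, ids_per_chunk):
            placeholders = ",".join("?" * len(chunk))
            cursor = conn.execute(
                "UPDATE filecache SET storage = ? "
                f"WHERE fileid IN ({placeholders})",
                [new_storage, *chunk],
            )
            updated += cursor.rowcount
    return updated
```

For 2000 file ids this issues three UPDATEs instead of 2000, all inside one transaction.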

Even if we were to use database platform specific SQL, one of the two operations can go wrong: either the move on the disk or the database operation. In that case we have to roll back the other operation. We also have to decide which operation to execute first to properly implement a rollback.
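
One possible ordering, sketched in Python under a few assumptions: the filecache table here is a hypothetical one with just a path column, and move_with_rollback is an invented name. The database change is made first but left uncommitted, then the file is renamed, and only then is the transaction committed. Each step knows how to undo the other one.

```python
import os
import sqlite3


def move_with_rollback(conn, source, target):
    """Rename a file and its (hypothetical) filecache row together."""
    try:
        # Step 1: change the database inside an open transaction.
        conn.execute(
            "UPDATE filecache SET path = ? WHERE path = ?", (target, source)
        )
        # Step 2: move the file on disk.
        os.rename(source, target)
    except Exception:
        # Either step failed: drop the uncommitted database change.
        conn.rollback()
        raise
    try:
        # Step 3: make the database change permanent.
        conn.commit()
    except Exception:
        # The commit failed: move the file back to where it was.
        os.rename(target, source)
        raise
```

Note what this does not solve: if the process is killed between step 2 and step 3, nobody is left to run the rollback, and disk and database disagree.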

Even if we assume that both operations will complete successfully (in a distributed system … right … I know … stay with me), a misconfiguration of the PHP timeouts can easily kill the PHP script execution and leave you without a way to actually do the rollback if one of the operations takes too long.

Why would the move in the file system be a problem for ownCloud? Because the move operation might become a copy and delete if source and target path are located on different NFS servers. Now a simple move can easily take minutes. Maybe your LDAP server or your database or even a firewall somewhere in between does not like an idle TCP connection and silently kills it. Or your PHP script is killed by SELinux or AppArmor.
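
The fallback from move to copy and delete can be sketched like this in Python. It is an illustration for single files only, with an invented function name, and not what ownCloud's PHP code does line by line: a rename across file systems fails with EXDEV, and at that point the cheap atomic operation turns into a full copy of the data.

```python
import errno
import os
import shutil


def move_file(source, target):
    """Move one file; return which strategy was needed."""
    try:
        # Same file system: a cheap, atomic rename.
        os.rename(source, target)
        return "rename"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Different file systems: every byte has to be copied, which
    # can take minutes for large files and is no longer atomic.
    shutil.copy2(source, target)
    os.remove(source)
    return "copy+delete"
```

If the process dies between the copy and the delete, both copies exist. If it dies during the copy, the target is incomplete.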

Scared yet? What if I tell you that updating a file involves copying it to the versions storage? A copy may not be as bad as a move, but the timeouts are still dangerous. What if I tell you that deleting a file involves a move operation to the trash?

Now, to make this less scary: if anything goes wrong, you can rescan the files with occ files:scan and the files will reappear. The worst that can happen is that you lose the metadata associated with the files because their file ids changed. That means shares, tags and comments may have to be recreated from a backup. You do have a database backup, right? RIGHT?

Oh, and in case the move of that virtual machine image across NFS servers flips a bit, you can always restore it from your backup, right? RIGHT?

Enough with the horrors. What can we do about it?

The problem is that the filecache is not a cache but an index that has to be kept in sync with the storage, and that breaks the atomicity of MOVE operations. We should store the file id inside the storage. Or even use native storage capabilities to implement shares, comments, tags, trash and versions. That is what reva is about: integrating tightly with the storage. We will still need a cache to speed up some of the operations, like looking up a file by its id. But that can then be a real cache, not a misnamed index in a database.

Sleep well!