Server-side basic deduplication with fslint and hard links - any risks?




I've experimented with my home ownCloud 9 setup using fslint (findup -m) to replace duplicates of files in owncloud/data/(*)/files/ with hardlinks.
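For anyone who wants to see roughly what that step amounts to, here is a minimal Python sketch of the general technique: group files by size, confirm matches with a hash, then swap each duplicate for a hard link. This is not fslint's actual implementation, and `hardlink_duplicates`, `file_digest` and the `.dedup-tmp` suffix are names I made up for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root, dry_run=True):
    """Find identical files under root and replace duplicates with hard links.

    Returns a list of (kept_path, replaced_path) pairs. With dry_run=True
    nothing on disk is changed.
    """
    # Hard links cannot cross filesystems, so group by device as well as size.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone
            st = os.stat(path)
            if st.st_size > 0:  # skip empty files
                by_size[(st.st_dev, st.st_size)].append(path)

    merged = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Same size is only a hint; compare content hashes to be sure.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for group in by_hash.values():
            keep = group[0]
            for dup in group[1:]:
                if os.stat(dup).st_ino == os.stat(keep).st_ino:
                    continue  # already hard-linked together
                if not dry_run:
                    # Create the link under a temporary name, then rename it
                    # over the duplicate so the path never goes missing.
                    tmp = dup + ".dedup-tmp"  # hypothetical suffix
                    os.link(keep, tmp)
                    os.replace(tmp, dup)
                merged.append((keep, dup))
    return merged
```

Calling `hardlink_duplicates(path)` with the default `dry_run=True` only reports what would be merged. One thing to keep in mind: hard-linked names share a single inode, so owner, permissions and mtime are shared between the "copies" as well.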

Testing looks good, and everything behaves as I hoped. Thanks to ownCloud's copy-on-write routine when a file is uploaded, modifying one of the duplicates breaks the hard link: the file modified on the client gets a new inode number on the server, the other copy remains unmodified, and the two rightly exist as distinct files server-side. Awesome - exactly what I want.
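To illustrate why the link breaks, here is a small Python sketch of the filesystem behaviour I'm relying on. It assumes the upload is written to a new file and then renamed into place (the `.part` name below is just illustrative); I haven't checked that every ownCloud code path works that way. The `in_place=True` branch shows the case to worry about: anything that writes into the existing file would change both "copies".

```python
import os
import tempfile


def demo(in_place):
    """Modify one of two hard-linked files and report what happens to the other.

    Returns (still_same_inode, content_of_other_copy).
    """
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        with open(a, "w") as f:
            f.write("original")
        os.link(a, b)  # b is now a second name for the same inode as a

        if in_place:
            # Writing into the existing file touches the shared inode.
            with open(a, "w") as f:
                f.write("modified")
        else:
            # Write a new file, then rename it over a: a gets a new inode,
            # b keeps pointing at the old one.
            tmp = os.path.join(d, "a.txt.part")  # illustrative temp name
            with open(tmp, "w") as f:
                f.write("modified")
            os.replace(tmp, a)

        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        with open(b) as f:
            other_copy = f.read()
        return same_inode, other_copy


print(demo(in_place=False))  # (False, 'original'): link broken, b untouched
print(demo(in_place=True))   # (True, 'modified'): b changed along with a
```

So the approach is only as safe as the assumption that files are always replaced rather than edited in place.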

But before I go and do this in a production environment, are there any good reasons not to do this?



Manual modifications within the datadir have never been supported in ownCloud. As most people will probably never say "yeah, do this", you need to test thoroughly on your own whether these hardlinks cause issues.

I'd rather go with a filesystem which can do or support file deduplication.


Ok thanks, yeah I understand.

Basically I'm relying on ownCloud doing its "copy-on-write" routine when a file is modified on a client and uploaded to the server - so with hard links this will effectively be "copy-on-write link breaking" (to borrow the term from VServer).

Testing so far hasn't revealed any problems, I'm continuing to test (while the site I need to do this for continues to fill its disc...!)

It's either this or btrfs. At least hardlinking doesn't involve moving masses of data around due to a reformat.


An update ... after my own testing revealed no issues, I've run fslint on the ownCloud server-side folders of three other users (together, as they have a lot of the same large files between them).
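To keep an eye on how many files are still linked over time, something like the following Python sketch could be run against the data folder. `hardlink_savings` is a made-up helper name, and it only counts links it can see under the given root, so treat the number as an estimate.

```python
import os
import stat
from collections import defaultdict


def hardlink_savings(root):
    """Report regular files under root that share an inode.

    Returns (shared, saved_bytes), where shared maps (device, inode) to the
    list of paths sharing it, and saved_bytes is the space that would be
    needed if every extra path were a separate full copy.
    """
    paths_by_inode = defaultdict(list)
    size_by_inode = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, etc.
            key = (st.st_dev, st.st_ino)
            paths_by_inode[key].append(path)
            size_by_inode[key] = st.st_size

    shared = {k: v for k, v in paths_by_inode.items() if len(v) > 1}
    saved_bytes = sum(size_by_inode[k] * (len(v) - 1) for k, v in shared.items())
    return shared, saved_bytes
```

If the number of shared inodes drops between runs, that would just be files having been modified and the links broken, which is the expected behaviour.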

I'll run with that for three months or so and report back whether I hit any issues. Started in early December 2016. So far, so good.

I've ruled out ZFS's deduplication, as its in-band solution is too memory-intensive, although I may move to ZFS in time for its other features and combine it with the above if it works. I haven't done any testing on btrfs (which does out-of-band dedupe).

edit: everything I read about btrfs says it is slow! Will probably try to avoid it.