I'm a student and PHP developer from Slovakia (Central Europe). For my master's thesis project I chose to improve the ownCloud sync process. As I read some time ago, ownCloud always syncs the whole modified file and not only the parts of the file that changed.
My questions for more experienced ownCloud developers are:

* Do you think it's possible to improve the file sync process? If so, do you think diff synchronisation is the right thing to do?
* Where (in the code) is the file synchronisation algorithm implemented?
Current work on ownCloud sync performance is oriented towards bundling multiple small-file create/update/mkdir/move/delete operations into bundled requests. This makes sense when synchronising many small files in one sync, or when syncing over a higher-latency connection. This work is currently in progress on both the client and the server side. You can read about that here:
The above improvement is called Bundling. If you are interested, I could send you a presentation I gave about it at the ownCloud conference. It is already implemented and currently under testing. That way you could get a sense of how to approach an improvement to the sync algorithm.
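Purely to illustrate the bundling idea, here is a rough sketch of packing several small-file operations into one HTTP round trip. The endpoint name and the JSON payload layout are assumptions made up for this sketch, not the actual ownCloud bundling API:

```python
# Conceptual sketch of request bundling: several small-file operations are
# packed into one HTTP request instead of one request per file.
# The endpoint ("/remote.php/dav/bundle") and the JSON layout are hypothetical.
import json
import urllib.request

def send_bundle(server, auth_header, operations):
    """operations: list of dicts like {"method": "PUT", "path": "...", "data": b"..."}."""
    body = json.dumps([
        {
            "method": op["method"],
            "path": op["path"],
            # File contents are inlined here only to keep the sketch short;
            # a real implementation would use a multipart body instead.
            "data": op.get("data", b"").decode("latin-1"),
        }
        for op in operations
    ]).encode("utf-8")
    req = urllib.request.Request(
        server + "/remote.php/dav/bundle",          # hypothetical bundle endpoint
        data=body,
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())              # per-operation results
```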
The improvement you are talking about is called Delta Sync. However, it is not easy to find a good use case for delta sync: office documents, for example, are basically zipped XML, so each time you change them, large parts of their content change. It is hard to find file types for which delta syncing makes sense (and it is very expensive).
What we are currently looking at is an improvement called Dynamic Chunking (not to be confused with deduplication using dynamic chunking). Basically, the concept is this: in the current implementation, big files, e.g. 100 MB, are split into smaller fixed-size pieces of 10 MB, called chunks. The problem is that this is a fixed value, which is not appropriate for all types of networks (WiFi versus fibre-to-the-home versus LAN). For WiFi small chunks make sense, while for fast networks very big chunks make sense.
I am not sure how familiar you are with TCP, but the concept there is the same: it is called Additive Increase/Multiplicative Decrease (AIMD) congestion control. This behaviour is also known as probing for bandwidth.
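As a rough sketch of how an AIMD-style probe could drive the chunk size (the constants and the class are illustrative assumptions, not the planned ownCloud implementation):

```python
# Illustrative AIMD-style chunk sizing: grow the chunk additively while
# transfers finish quickly, shrink it multiplicatively after a slow or
# failed transfer. All constants here are made-up defaults.
class ChunkSizer:
    def __init__(self, start=1 * 1024 * 1024, step=1 * 1024 * 1024,
                 minimum=1 * 1024 * 1024, maximum=100 * 1024 * 1024,
                 target_seconds=10.0):
        self.size = start
        self.step = step                    # additive increase amount
        self.minimum = minimum
        self.maximum = maximum
        self.target_seconds = target_seconds

    def on_chunk_done(self, duration_seconds, ok=True):
        if ok and duration_seconds < self.target_seconds:
            # Additive increase: probe for more bandwidth.
            self.size = min(self.size + self.step, self.maximum)
        else:
            # Multiplicative decrease: back off after a slow or failed chunk.
            self.size = max(self.size // 2, self.minimum)
        return self.size

# Usage: ask the sizer before uploading each chunk.
sizer = ChunkSizer()
print(sizer.on_chunk_done(3.2))     # fast chunk -> chunk size grows
print(sizer.on_chunk_done(25.0))    # slow chunk -> chunk size halves
```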
A longer discussion about delta syncing is available in the client issue tracker. The largest use cases for most people requesting such a delta sync were backups, videos, TrueCrypt containers and so on.
@gabbi7 If you want to give it a try, read all the conversations at ownCloud about that and get a list of candidate file types for delta sync, along with the granularity of the delta, e.g. hashes over blocks of 10% of the file size.
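To make that granularity idea concrete, here is a small sketch of producing a hash list where the block size is a fraction of the file size (10% in the example); the function name and the choice of SHA-1 are assumptions for illustration only:

```python
import hashlib
import os

def block_hashes(path, granularity=0.10, min_block=1 * 1024 * 1024):
    """Return (block_size, [hash, ...]) for a file, where the block size is
    roughly `granularity` of the file size (but never below `min_block`)."""
    file_size = os.path.getsize(path)
    block_size = max(int(file_size * granularity), min_block)
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            hashes.append(hashlib.sha1(block).hexdigest())
    return block_size, hashes
```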
You will need to add this to the capabilities on the server, the same way I did it for bundling (this is why I suggest having a look at my work, since I did the same thing); see the sketch below.
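The capability entry and the client-side check could look something like the sketch below; the key names ("deltasync", "granularity", "max_change_ratio", "file_types") are hypothetical and only meant to mirror how other capabilities are advertised:

```python
# Hypothetical capabilities entry advertised by a server-side delta-sync
# plugin, plus the client-side check for it. All key names are assumptions
# made up for this sketch.
EXAMPLE_CAPABILITIES = {
    "files": {
        "deltasync": {
            "enabled": True,
            "granularity": 0.10,          # hash block size as a fraction of file size
            "max_change_ratio": 0.50,     # above this, fall back to a normal upload
            "file_types": [".vdi", ".pst", ".tc", ".mp4"],
        }
    }
}

def deltasync_capability(capabilities):
    """Return the delta-sync settings if the server advertises them, else None."""
    entry = capabilities.get("files", {}).get("deltasync")
    if entry and entry.get("enabled"):
        return entry
    return None
```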
On the client, we already know which files are NEW and which files are to UPDATE. (In my case I detect that a file is new and that bundling is supported; in your case you would detect that it is an UPDATE and that the file type is listed in the delta-sync capabilities.) See the sketch below.
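A sketch of that decision on the client (the job names are placeholders, and the real sync engine is C++/Qt; this only shows the logic, reusing deltasync_capability() from the sketch above):

```python
import os

def pick_upload_job(local_path, is_new_file, capabilities):
    """Decide which upload job to schedule for a changed file.
    `capabilities` is the parsed server capabilities dict from the sketch above."""
    delta = deltasync_capability(capabilities)
    extension = os.path.splitext(local_path)[1].lower()
    if is_new_file or delta is None or extension not in delta["file_types"]:
        return "PUTFileJob"        # placeholder for the normal (chunked) upload
    return "DeltaUploadJob"        # placeholder for the new delta-sync job
```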
You would need to create a new job, DeltaUpload, which first does a GET to the server's DeltaSync plugin to obtain the list of hashes at a specific granularity for a specific file; while you wait for it, you also hash the file on the client. If you cannot identify a clear delta, or the change exceeds a certain share of the file, say 50%, you continue with a normal Upload. If the change is smaller than that threshold (which should also be obtained from the capabilities), you continue with delta sync.
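A minimal sketch of that comparison step, reusing block_hashes() from the sketch above (how the server's hash list is fetched is left out, since that endpoint is hypothetical):

```python
def changed_blocks(local_path, server_hashes, granularity=0.10):
    """Compare local block hashes against the server's hash list and return
    (block_size, indices of blocks that differ or exist only locally).
    The same granularity is assumed on both client and server."""
    block_size, local_hashes = block_hashes(local_path, granularity)
    changed = [
        index for index, local_hash in enumerate(local_hashes)
        if index >= len(server_hashes) or local_hash != server_hashes[index]
    ]
    return block_size, changed

def should_delta_sync(changed_count, total_blocks, max_change_ratio=0.50):
    """Fall back to a normal upload when too much of the file has changed."""
    return total_blocks > 0 and changed_count / total_blocks <= max_change_ratio
```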
When that asynchronous step is finished, you continue and issue one or more multipart POSTs (since the file can be chunked), indicating which file and which offsets to update; see the sketch below.
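To finish the sketch, the changed blocks could then be sent with their byte offsets. The endpoint and the offset header below are invented for illustration, and the sketch sends one POST per block instead of a single multipart body purely to keep it short:

```python
import urllib.request

def upload_changed_blocks(server, auth_header, remote_path, local_path,
                          block_size, changed):
    """Send only the changed blocks, each with an explicit byte offset so a
    server-side plugin could patch the file in place. The endpoint and the
    OC-DeltaSync-Offset header are hypothetical."""
    with open(local_path, "rb") as f:
        for index in changed:
            offset = index * block_size
            f.seek(offset)
            data = f.read(block_size)
            req = urllib.request.Request(
                server + "/index.php/apps/deltasync/" + remote_path,   # hypothetical
                data=data,
                headers={
                    "Authorization": auth_header,
                    "Content-Type": "application/octet-stream",
                    "OC-DeltaSync-Offset": str(offset),                 # hypothetical
                },
                method="POST",
            )
            urllib.request.urlopen(req).close()
```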