Improving data-sync process

expert
client-development

#1

Hi,

I'm a student and PHP developer from Slovakia (Central Europe). For my master's thesis project I chose to improve the ownCloud sync process. From what I read some time ago, ownCloud always syncs the whole modified file, not only the changed parts of the file.

My questions for more experienced ownCloud developers are:
* Do you think it's possible to improve the file sync process? If so, do you think diff synchronisation is the right thing to do?
* Where (in the code) is the file synchronisation algorithm implemented?

Thanks.


#2

Development of ownCloud sync performance optimization is currently focused on bundling multiple small-file create/update/mkdir/move/delete operations into combined requests. This makes sense when synchronising many small files in one sync run, or when syncing over a higher-latency link. This work is currently in progress on both the client and the server side. You can read about that here:

The above improvement is called Bundling. If you are interested, I could send you a presentation I gave about it at the ownCloud conference. It is already implemented and currently under test. That way you could get a sense of how to make an improvement to the sync algorithm.

About the improvement you are talking about: it is called Delta Sync. However, it is not easy to find a use case for delta sync. For example, documents are basically XML, which means that each time you change them their whole content changes. It is hard to find file types for which delta syncing makes sense (and it is computationally expensive).

What we are currently looking at is an improvement called Dynamic Chunking (not to be confused with deduplication using dynamic chunking). Basically, the concept is this: in the current implementation, big files, e.g. 100 MB, are split into smaller 10 MB chunks. The problem is that this is a fixed value, which is not appropriate for all types of networks (WiFi versus fibre-to-the-home versus LAN). On WiFi, small chunks make sense, while on fast networks very big chunks make sense.

I am not sure how familiar you are with TCP, but the concept there is the same. It is called Additive Increase/Multiplicative Decrease (AIMD) congestion control. This behaviour is also called probing for bandwidth:

https://en.wikipedia.org/wiki/TCP_congestion_control#Additive_increase.2Fmultiplicative_decrease

I would look for something similar, but for synchronisation of files. :slight_smile:
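To make the idea concrete, here is a minimal sketch of what AIMD-style probing could look like applied to chunk sizes. All names, constants, and thresholds here are illustrative assumptions for this sketch, not the actual ownCloud client implementation:

```python
# Illustrative AIMD-style adaptation of the sync chunk size
# (assumed constants, not real ownCloud values).
MIN_CHUNK = 1 * 1024 * 1024        # 1 MiB floor
MAX_CHUNK = 100 * 1024 * 1024      # 100 MiB ceiling
STEP = 5 * 1024 * 1024             # additive-increase step
TARGET_SECONDS = 10.0              # acceptable time per chunk transfer

def next_chunk_size(current: int, transfer_seconds: float, failed: bool) -> int:
    """Additive increase / multiplicative decrease, as in TCP.

    Grow the chunk size by a fixed step while transfers succeed quickly;
    halve it when a transfer fails or takes too long (probing for bandwidth).
    """
    if failed or transfer_seconds > TARGET_SECONDS:
        return max(MIN_CHUNK, current // 2)   # multiplicative decrease
    return min(MAX_CHUNK, current + STEP)     # additive increase

# A fast transfer grows the chunk; a failure halves it back down.
size = next_chunk_size(10 * 1024 * 1024, 2.0, failed=False)   # 15 MiB
size = next_chunk_size(size, 0.0, failed=True)                # 7.5 MiB
```

On a fast LAN the chunk size would climb toward the ceiling; on flaky WiFi the halving would keep it small, which is exactly the behaviour the fixed 10 MB value cannot give you.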


#3

Summarizing the above: you should not focus on improving sync by definition. The improvement should follow the Design Thinking paradigm.

Start from Empathize, not Implement :>


#4

A longer discussion about delta syncing is available at the client issue tracker. The largest use cases for most people requesting such a delta sync were backups, videos, TrueCrypt containers and so on.


#5

@gabbi7 If you want to give it a try, read all the conversations at ownCloud about that and compile a list of candidate file types for delta sync, along with the granularity of the delta, e.g. hashes over 10% of the file size.

You will need to add this to the capabilities on the server, the same way I did it for bundling (this is why I suggest you have a look at my work, since I did the same thing)
\/


github.com/owncloud/core/pull/25760/files#diff-920833889854d937ebeda804d3eb5b19R1

On the client, we are already aware of which files are NEW and which files are to UPDATE:
(In my case I detect that the file is new and bundling is supported; in your case you would detect that it is an UPDATE and the file type is in the delta-sync capabilities)
\/

github.com/owncloud/client/pull/5155/files#diff-7e5082f89a138020f2b1d37fc97d17dbR386

You would need to create a new job, DeltaUpload, which first does a GET to the server's DeltaSync plugin, giving you the list of hashes at a specific granularity for a specific file; while you wait for it, you do the same hashing on the client. If there is no change at all, or the change exceeds e.g. 50% of the file, you continue with a normal upload. If the change is smaller than that threshold (the threshold should also be obtained from the capabilities), you continue with delta sync.
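A rough sketch of that client-side decision, assuming fixed-size blocks and a per-block hash list from the server. All function names here are hypothetical, not the real ownCloud client/server API:

```python
# Sketch of the delta-sync decision: hash the local file at the agreed
# granularity, compare against the (assumed) server hash list, and fall
# back to a normal upload when too much of the file changed.
import hashlib

def block_hashes(data: bytes, block_size: int) -> list[str]:
    """SHA-1 hash per fixed-size block -- the 'granularity' of the delta."""
    return [hashlib.sha1(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def changed_blocks(local: list[str], remote: list[str]) -> list[int]:
    """Indices where local and remote hashes differ (or one side is missing)."""
    n = max(len(local), len(remote))
    return [i for i in range(n)
            if i >= len(local) or i >= len(remote) or local[i] != remote[i]]

def use_delta_sync(local: list[str], remote: list[str],
                   threshold: float = 0.5) -> bool:
    """Delta sync pays off only when some, but less than `threshold`,
    of the blocks changed; otherwise do a normal upload."""
    n = max(len(local), len(remote))
    if n == 0:
        return False
    ratio = len(changed_blocks(local, remote)) / n
    return 0 < ratio < threshold
```

The 0.5 default stands in for the threshold that, as described above, would really come from the server capabilities.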

When that asynchronous work is finished, you continue and do a POST, or several POSTs (since the file can be chunked), with a multipart body saying which file and which offset to update.
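As an illustration of the offset-based update step (a hypothetical helper, not ownCloud code): given the indices of the changed fixed-size blocks, the (offset, payload) pairs such POSTs would carry could be built like this:

```python
# Illustrative helper: turn changed-block indices into (byte offset,
# block payload) pairs for offset-based update requests.
def build_patches(data: bytes, changed: list[int], block_size: int):
    """One (offset, payload) pair per changed block of the local file."""
    return [(i * block_size, data[i * block_size:(i + 1) * block_size])
            for i in changed]

# Usage: patch blocks 1 and 3 of a 4-block, 1024-byte file.
patches = build_patches(bytes(1024), [1, 3], 256)
# -> [(256, <256-byte payload>), (768, <256-byte payload>)]
```

Each pair then maps onto one multipart part saying which offset of which file to overwrite.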
\/

github.com/owncloud/client/pull/5155/files#diff-498a298e0777de544a6af0e8d6b09a46R900

I would first give it a try only with big files, which are already chunked, so that you actually do delta sync on chunks :>


#6

Klaas had a blog article on that a while ago, and you have already got other tips here.
A design and a prototype would certainly be great achievements!

Looking forward to hearing more from you!


#7

About delta sync in ownCloud, from the CS3 conference last year:

http://www.video.ethz.ch/events/2016/cs3/day1/bb22b7ee-febf-4b25-a101-08ce462de736.html

http://www.video.ethz.ch/events/2016/cs3/day1/0f26775b-d638-4863-a793-595d42ba6d03.html


#8

I hope this feature gets added.


#9

@konkwest It is undergoing formal protocol specification and consultation with customers now.