Upload, chunking and big files

Hello everyone,

After reading the big file upload documentation, I noticed that the upload behavior was not exactly as described. My goal is to improve the overall upload speed and to understand exactly what can be tweaked.

Environment setup

  • NGINX 1.16.1
  • PHP 7.3 FPM
  • Redis for local cache and locking
  • MySQL 5.7
  • upload_max_filesize and post_max_size both at 2 MB (for testing), max_execution_time at 10 s (see the php.ini sketch after this list)
  • ownCloud 10.3.2 installed from a tar.gz
  • Master Key encryption
  • The datadirectory is on NFS
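
For reference, here is a minimal sketch of the PHP settings used for these tests; the php.ini path and service name are assumptions (Debian-style PHP-FPM layout), adjust them to your distribution.

  # Assumed path for the PHP-FPM php.ini (Debian/Ubuntu layout); adjust as needed.
  grep -E '^(upload_max_filesize|post_max_size|max_execution_time)' /etc/php/7.3/fpm/php.ini
  # Expected values for this test setup:
  #   upload_max_filesize = 2M
  #   post_max_size = 2M
  #   max_execution_time = 10
  # Reload PHP-FPM after changing them (service name is an assumption):
  systemctl reload php7.3-fpm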

Definition

Let’s define a few words:

  • home datadirectory: the directory where your files are stored, based on the datadirectory configuration path. For the user toto, it might be /var/owncloud/data/toto/files.
  • upload datadirectory: the directory where your files are uploaded when chunking, based on the datadirectory configuration path. For the user toto, it might be /var/owncloud/data/toto/uploads. Once the upload is over and the chunks are put together, the file is moved to the home datadirectory.

Testing and file upload behavior

These tests have been made from both the web interface and the desktop client (version 2.6.1 as of now) and gave me the same results.

  • When uploading a file < 10 MB, there is no chunking and the file is uploaded straight into your home datadirectory. At first it is suffixed with .ocTransferIdxxx.part, then renamed to the name it has on your local machine. This means you don't use twice the size of the uploaded file.

  • When uploading a file >= 10 MB, chunking kicks in. The file is sliced into 10 MB chunks and each chunk is uploaded as a file named only with integers, such as 1521355039 for the desktop client or web-file-upload-4b9c392260a3ce2c0287017cf41e3033-1580807743651 from the web interface, inside the upload datadirectory (see the sketch after this list). Once the upload is over, the "processing files" step occurs: it moves the assembled file from the upload datadirectory to the home datadirectory. By the way, the file is suffixed with .ocTransferIdxxxx.part while it is copied into the home datadirectory. This time you use up to twice the size of the uploaded file.

  • Uploading 100 MB without tmpfs (test repeated 3 times):

overall upload time: ends at 29 s
overall processing files time: ends at 43 s => 14 s of processing
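
To make the two locations easier to picture, here is a purely hypothetical snapshot taken during a chunked upload; it is based only on the naming described above, and the exact paths, file names and transfer id are illustrative.

  # Chunks land in the upload datadirectory, named only with integers:
  ls /var/owncloud/data/toto/uploads
  #   1521355039
  #   1521355040
  #   ...
  # During "processing files", the assembled file appears in the home
  # datadirectory with a temporary .ocTransferIdxxxx.part suffix:
  ls /var/owncloud/data/toto/files
  #   bigfile.bin.ocTransferId1234567890.part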

Observations

  • It seems ownCloud is not using sys_temp_dir or upload_tmp_dir but its own temporary directory, named here the upload datadirectory. In my case it sits right in the real data directory, which is simply an NFS share and therefore rather slow.

  • I’ve noticed that upload_max_filesize and post_max_size do not matter, whether you upload from the web interface or from the desktop client. I set 2 MB and I could upload 1 GB without any errors.

  • Moreover, it never triggered the 10 s PHP timeout either. Well, I guess each chunk is uploaded in less than 10 s.

  • If an upload fails, the upload datadirectory can keep the failed upload's files, and they never get erased. I haven't checked the documentation yet on how to clean this up.

Further testing

After reading some documentation, I noticed you can define dav.chunk_base_dir, which is actually the upload datadirectory!

So I created a tmpfs to increase the upload speed of chunked files; note that it won't improve the upload of files < 10 MB since they don't go through the same process. Here is a sketch of the setup.
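
A minimal sketch of what that looks like, assuming a 4 GB tmpfs, a www-data web server user and an occ binary at /var/www/owncloud/occ (all assumptions); the dav.chunk_base_dir key itself comes from the documentation, and editing config.php directly achieves the same thing.

  # Create and mount a tmpfs for the chunk directory (size is an example).
  mkdir -p /mnt/owncloud-chunks
  mount -t tmpfs -o size=4G tmpfs /mnt/owncloud-chunks
  chown www-data:www-data /mnt/owncloud-chunks
  # Point ownCloud's chunk directory at it, i.e. the equivalent of
  # 'dav.chunk_base_dir' => '/mnt/owncloud-chunks' in config.php:
  sudo -u www-data php /var/www/owncloud/occ config:system:set dav.chunk_base_dir --value="/mnt/owncloud-chunks"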

  • Uploading 100 MB with tmpfs (test repeated 3 times):

overall upload time: ends at 26 s
overall processing files time: ends at 34 s => 8 s of processing

In my case, the improvement is mostly in the copy from the upload datadirectory to the home datadirectory, since the data no longer goes through NFS twice. A "small" improvement of 3 s is noticed on the upload itself. The overall speed improvement is ~10 s for a 100 MB file.

Observations

  • It’s possible to improve the overall upload process with a tmpfs, BUT dedicating 200 GB of RAM to this is not really a solution (2 x 4 GB x 25 users = 200 GB of required temp space), a bit too much IMHO.

Conclusion / Questions

I’m still confused about the big file upload :smile:. I'm not saying I did, but it's possible I misconfigured something, so if anyone is interested, please do some testing; I'd like some feedback.

  • What’s the deal with dav.chunk_base_dir versus sys_temp_dir/upload_tmp_dir?

  • When is it useful to set a high value for upload_max_filesize or post_max_size in ownCloud? Has anyone noticed this too? For example, in my case I set 2 MB for both, and when uploading a 100 MB file I got 10 HTTP requests uploading the 10 MB chunks; PHP should have stopped processing them. By the way, if you try uploading 1 GB when you only have a 512 MB tmpfs, the upload process goes on but fails at the very end (that's expected, but ownCloud should detect that dav.chunk_base_dir is too small).

  • How do you clean up the upload datadirectory / dav.chunk_base_dir of failed upload attempts?

  • Is it possible to increase the upload and processing files speed?

Thanks, and sorry it's quite long.

1 Like

Is it possible to increase the upload and processing files speed?
Using the client, you can increase the chunk size. If there is only one chunk, there is no processing time needed, at least in my observation.
https://doc.owncloud.com/desktop/advanced_usage/configuration_file.html
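
For example (a sketch I have not verified): in the desktop client's owncloud.cfg (on Linux typically ~/.config/ownCloud/owncloud.cfg; stop the client before editing), that page describes a chunkSize value in bytes under [General], so 100 MiB chunks would be:

  [General]
  chunkSize=104857600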

2 Likes

Hello @cresse

Thanks, that's interesting, although it seems you can't change the chunk size on the server side, so for web users it's still 10 MB.

1 Like

When you upload a file, Apache (and nginx, I guess) usually stores the file in memory if it's small. If it's big (more than 2 MB, maybe?), it uses temporary files to store it.
Once Apache has the whole uploaded file, it gives PHP control over it, so ownCloud can move it wherever it wants.
In the case of chunks, the process is the same for Apache; it just considers each chunk as a whole file.
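
Since you are on nginx, the relevant knobs there are client_body_buffer_size (how large a request body can get before nginx spills it to a temporary file) and client_body_temp_path (where that temporary file goes); client_max_body_size must also allow at least one chunk. A sketch with illustrative values only (paths and sizes are assumptions):

  # Put these in the server/location block that serves ownCloud.
  client_max_body_size    16m;
  client_body_buffer_size 2m;
  client_body_temp_path   /var/lib/nginx/client_body;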

There is the occ dav:cleanup-chunks command for that. You can set it up in cron to run periodically, according to your needs, for example as shown below.
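
A sketch of such a cron entry; the occ path and web server user are assumptions, adjust them to your installation.

  # Run the chunk cleanup nightly as the web server user, e.g. via
  # `crontab -u www-data -e`:
  0 3 * * * php /var/www/owncloud/occ dav:cleanup-chunks >/dev/null 2>&1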

Other than that, it's nearly impossible to provide a proper way to clean up, mainly because the client could re-upload the failed chunk at any time, but the server has no idea when that will be: it could be in 5 minutes, in 30 minutes, in 8 hours, or more.

2 Likes

Thanks for the occ dav:cleanup-chunks command. Actually, when running it, it gives me an error:

In Directory.php line 345:
                                   
  [Sabre\DAV\Exception\Forbidden] 

I’ll check what’s going on.

Well, it depends. There could be a checksum check when there is a new upload of the same file at the same location, to clean up the old chunks (in case the previous upload failed). Also, there is a difference between the directory for files uploaded through the web interface and through the desktop client. The desktop client keeps track of the failed upload, or of what's left to upload, while the web interface can't, so at least the desktop client could notify the server to clean up.

1 Like