Issue with versioning on Google Drive

Hello,

I've got an issue using Google Drive (GD) as external storage. Every time I modify a file stored on GD, whether through the OC web interface or the OC Windows SyncClient, the file disappears from the OC web client and is then deleted from the local PC drive by the Windows client.

After some investigation I found that the file still exists on Google Drive, but twice under the same file name. It seems to me that OC breaks GD's version management. The activity log in GD shows that the modified file was uploaded with a temporary file name and was renamed after the upload completed.

I'm not 100% sure whether I configured the new app connection wrongly, as the available tutorials for connecting GD are not 100% consistent.

I checked local storage as well as Dropbox for the same issue, but everything works fine there.

Steps to reproduce
1. Not sure what to describe here. The issue is 100% reproducible for any file that already exists on GD and is modified using OC
2. Any newly created file that does not yet exist on GD is uploaded correctly to GD
3. When a newly uploaded file is modified, the same issue affects that file afterwards

Expected behaviour
I would expect OC to support GD's versioning, meaning OC just adds a new version to the existing file. If this is not possible for whatever reason, I would expect the old file on GD to be deleted before the uploaded temp file is renamed.

Actual behaviour
After changing a file in the OC web client, or locally on a Windows PC connected to OC through the Windows OC SyncClient, an additional file with the same file name is created in the GD storage. If OC finds two or more files with the same name in the same GD folder, no file is shown in the OC web client and the local files on the PC are deleted by the OC SyncClient.

Server configuration
Operating system: Linux
Web server: Apache
Database: MySQL
PHP version: 5.2.17
ownCloud version (see ownCloud admin page): 9.1.0
Updated from an older ownCloud or fresh install: NO
ownCloud log (data/owncloud.log, see https://central.owncloud.org/t/how-to-find-webserver-or-oc-logfile-enable-php-logfile/808):
I just renamed the log file and reproduced the issue described above. The newly created log file shows only an issue with a locked file on Dropbox, but nothing related to GD

Special configuration (external storage, external authentication, reverse proxy, server-side-encryption): GD, Dropbox and local storage used, nothing special on top of this

Integrity status for oC9+

No errors have been found.

ownCloud doesn't support Google Drive's versioning.

The problem with Google Drive's versioning is that it creates several files with the exact same file name. ownCloud cannot work with a remote folder containing several files with the exact same names, so the GDrive external storage module will ignore such files. This is why you see them disappear.
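To illustrate the effect, here is a minimal Python sketch, not the actual ownCloud code. It assumes a folder listing of (file ID, name) pairs in which IDs are unique but names are not; the function name `visible_names` and the sample data are made up for the example. Any name that occurs more than once is dropped from the listing.

```python
from collections import Counter


def visible_names(remote_entries):
    """Return the names that occur exactly once in a folder listing.

    remote_entries is a list of (file_id, name) pairs. This models a
    remote folder where IDs are unique but names are not (an assumption
    made for this sketch).
    """
    counts = Counter(name for _file_id, name in remote_entries)
    return sorted(name for name, count in counts.items() if count == 1)


# Hypothetical listing: "report.txt" exists twice under different IDs.
folder = [
    ("id-1", "report.txt"),
    ("id-2", "report.txt"),
    ("id-3", "notes.txt"),
]

print(visible_names(folder))  # ['notes.txt']
```

Both copies of `report.txt` vanish from the result, which matches the symptom described in the report.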

Now in the case of versioning, one could say that maybe ownCloud should only access the latest version of a file. Maybe. But it's not that easy: it seems that GDrive also allows you to simply upload several files with the same name even if they aren't really related or versions of each other. In this case, how do we decide which file to use?

Hi PVince81,

I understand what you're saying. But in this case the OC interface to GD definitely makes no sense, as it cannot be used for changing existing files. Or can I configure GD to disable this behaviour? I do not need versioning, but I do need R/W access.

I'm surprised that no one else has reported this as an issue.

Rgds,
Armin

Using GDrive read-write from inside ownCloud works fine.

I guess the reason we didn't see more reports is because people are using GDrive only as a container to extend their OC storage space.

In your use case you want to be able to edit files on GDrive and for some reason it creates duplicates.
Last time I checked, editing office documents wouldn't create duplicates, so not sure.

It might be possible to add a switch to the GDrive storage in OC to make it use the most recent file when duplicates are found and hope all goes well.
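A rough sketch of what such a switch could do, in Python for illustration only. The entry fields (`id`, `name`, `modified`) and the function name are assumptions for this example, not the GDrive storage code in OC.

```python
def pick_most_recent(remote_entries):
    """Map each file name to the entry with the latest modification time.

    remote_entries is a list of dicts with the hypothetical keys
    "id", "name" and "modified" (a Unix timestamp).
    """
    chosen = {}
    for entry in remote_entries:
        current = chosen.get(entry["name"])
        if current is None or entry["modified"] > current["modified"]:
            chosen[entry["name"]] = entry
    return chosen


# Hypothetical listing with one duplicated name.
folder = [
    {"id": "id-1", "name": "report.txt", "modified": 1470000000},
    {"id": "id-2", "name": "report.txt", "modified": 1470003600},
    {"id": "id-3", "name": "notes.txt", "modified": 1469990000},
]

latest = pick_most_recent(folder)
print(latest["report.txt"]["id"])  # id-2
```

This only resolves the name clash by timestamp. It cannot tell whether the duplicates are really versions of the same document, which is the open question raised earlier in the thread.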

Hi PVince81,

That might be a solution. Not sure, but maybe another option would be not to use the temp name and to upload the file directly with the original file name. What I noticed is that you do not see multiple files in the Google Drive web interface: it just generates a new version when a file is uploaded with the same file name. OC uses the temp file name until the upload is completed and then renames the file, which, as far as I understand, is the root cause of the breakage.

I'm not very familiar with the OC PHP code, but I have some experience with PHP in general. If you can give me a hint where I can temporarily disable the use of the temp name, I could test whether this would be a potential solution.

Is the temp name generation for GD independent of the other interfaces, or do all interfaces use the same functions, so that changing this for GD would probably cause another issue with Dropbox?

Rgds,
Armin

Hi PVince81,

I already found the place in the PHP code that generates this
temp name. Strictly speaking it is not a temp name, it's a .part name. It
depends on the function needsPartFile() in Connector/Sabre/File.php. For
test purposes I just changed this function so that it always returns
false.

After this change, everything works fine. When changing e.g. a
short text file in OC, a new version is generated on GD. That means
GD versioning still works fine with OC, and the issue is the
ocTransferID...part extension on the file name and the rename
afterwards. Versioning breaks if you upload the file under a different
file name and rename it afterwards. Then it is not part of the
versioning container, and a separate file with the same name is
created on GD.
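To make the difference concrete, here is a toy Python model of what I observed. It is only a sketch based on my tests, not the GD API or the OC code: the class `ToyDrive` and its methods are made up, and it assumes that a write to an existing name adds a revision while a rename never checks whether the new name is already taken.

```python
import itertools


class ToyDrive:
    """Toy model of a store where IDs are unique and names are not."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.files = {}  # file_id -> {"name": str, "revisions": [bytes]}

    def _find(self, name):
        return [fid for fid, f in self.files.items() if f["name"] == name]

    def write(self, name, data):
        """Add a revision if the name exists, otherwise create a new file."""
        matches = self._find(name)
        if matches:
            self.files[matches[0]]["revisions"].append(data)
            return matches[0]
        fid = next(self._ids)
        self.files[fid] = {"name": name, "revisions": [data]}
        return fid

    def rename(self, old, new):
        """Rename without checking whether the new name is already taken."""
        fid = self._find(old)[0]
        self.files[fid]["name"] = new


def names(drive):
    return sorted(f["name"] for f in drive.files.values())


# Direct write: one file that carries two revisions.
direct = ToyDrive()
direct.write("a.txt", b"v1")
direct.write("a.txt", b"v2")

# Part file plus rename: two separate files with the same name.
via_part = ToyDrive()
via_part.write("a.txt", b"v1")
via_part.write("a.txt.part", b"v2")
via_part.rename("a.txt.part", "a.txt")

print(names(direct))    # ['a.txt']
print(names(via_part))  # ['a.txt', 'a.txt']
```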

I'm still not sure whether I'm the only one with this problem, but I
think this could be a bug in the current version, because I already get
the problem when testing with tiny text files created and modified in
OC only. Such a file already disappears after being modified once,
without ever being touched in the GD web client directly. I also tested
with different file extensions and it never worked. So I'm not sure why
this function returns 'true' for tiny files at all. That sounds more
like a bug to me.

As the function needsPartFile() seems to be used for all connections,
I'm not sure about its intention or the potential side effects of my
modification. Therefore I reverted it to the original version. It would
be good if an expert on the team could look into this in detail. The
comments in that function suggest that someone already had in mind to
make this dependent on the storage:

// TODO: in the future use ChunkHandler provided by storage
// and/or add method on Storage called "needsPartFile()"

Thanks a lot for your support

Cheers,
Armin

Thanks for your testing. The purpose of the part file is to be able to overwrite the target file with a single rename at the end. If OC wrote directly to the final file, that file could be accessed from outside OC, and a reader would see partially written content. So the idea is to first write the full file and then rename it to quickly overwrite the original one (atomic rename). The partial content thing might depend on the storage type, not sure if GDrive is clever enough to prevent this when not working with part files.
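For reference, the general pattern looks roughly like this on a local filesystem. This is a Python sketch of the technique, not the OC implementation; the helper name `write_via_part_file` is made up. `os.replace` overwrites the target in a single step, and on POSIX that step is atomic when both paths are on the same filesystem.

```python
import os
import tempfile


def write_via_part_file(target_path, data):
    """Write data to '<target>.part', then rename it over the target."""
    part_path = target_path + ".part"
    with open(part_path, "wb") as part:
        part.write(data)
        part.flush()
        os.fsync(part.fileno())  # make sure the content is on disk first
    # Readers see either the old file or the new one, never a partial one.
    os.replace(part_path, target_path)


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "doc.txt")
    write_via_part_file(target, b"old content")
    write_via_part_file(target, b"new content")
    with open(target, "rb") as handle:
        result = handle.read()
    leftovers = os.listdir(folder)

print(result)     # b'new content'
print(leftovers)  # ['doc.txt']
```

The pattern relies on the rename replacing the existing target. On a storage where names are not unique, a rename alone does not give that guarantee.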

I'm surprised that editing even simple text files in OC doesn't properly overwrite the target file. Last time I tested this a few months ago it worked. Either GDrive changed the behavior or some option is missing in the final rename. So maybe you're right, it could be a bug in the GDrive storage impl on the OC side.

If GDrive is clever enough not to expose a partially written file to the user until the upload is done, then part files could be disabled for GDrive on the storage level (need to look into the code to see how to do this).
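Conceptually it could look like the following Python sketch. All class and method names here are hypothetical and only illustrate the idea of a per-storage switch; this is not the existing OC API.

```python
class Storage:
    """Hypothetical base storage: part files are used by default."""

    def __init__(self):
        self.operations = []  # records calls so the flow can be inspected

    def needs_part_file(self):
        return True

    def write(self, path, data):
        self.operations.append(("write", path))

    def rename(self, source, target):
        self.operations.append(("rename", source, target))


class VersioningStorage(Storage):
    """Hypothetical storage that hides unfinished uploads by itself."""

    def needs_part_file(self):
        return False


def upload(storage, path, data):
    """Upload through a part file only if the storage asks for one."""
    if storage.needs_part_file():
        part_path = path + ".part"
        storage.write(part_path, data)
        storage.rename(part_path, path)
    else:
        storage.write(path, data)
    return storage.operations


print(upload(Storage(), "a.txt", b"x"))
print(upload(VersioningStorage(), "a.txt", b"x"))
```

The first call goes through `a.txt.part` plus a rename, the second writes straight to `a.txt`. The upload code stays the same for every storage and only the storage decides.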

Are you able to test with needsPartFile set to false and then upload/overwrite a huge file in GDrive, and while the transfer is happening, download that file off GDrive directly to see whether you get a partial file or the old file? Considering that GDrive already has some kind of versioning, it is very likely that it is clever enough. Then based on this information we can look into disabling part files + final rename for GDrive only.

O.k., I've just tested with the modified needsPartFile as false. I tested three different use cases:

  1. Uploading a new, not yet existing file (1MB as well as 10MB) using the ownCloud Windows SyncClient V2.2.2. In parallel I had the GD web client open. The new file first appeared in the web client at the moment the SyncClient completed the job.

  2. Uploading a small text file. Afterwards I changed it in my local Windows sync folder. While the SyncClient was syncing, I tried to access the file in the GD web client and kept getting the first version of the file until the job was completed.

  3. Same as 2., but uploading the replacement via the OC web client instead of the SyncClient. Same behaviour.

Looks like everything is working fine with these settings and you will never see a partial file during upload.

Maybe one correction to my previous post: when using the embedded text editor of OC, modifying the file does indeed work fine. It went wrong when using the upload function of the OC web client. So forget about this.

Can I leave File.php with this modification, or would that have a side effect on local storage or Dropbox?

Rgds,
Armin

Thanks for testing.

The web UI upload and text editor might not make use of part files as far as I remember.

If you leave the change in File.php it will also affect your local storage, meaning that you might observe partially written files before they are done. However, for the local storage there is file locking in place that prevents OC from opening or downloading a partially written file. The problem with part files is really only with external storages. So you might be fine.
Not sure about Dropbox though, it is likely that it works like GDrive.

I raised https://github.com/owncloud/core/issues/25826 in core to make it possible for storages to disable part files. Once this API is available, we could make GDrive and Dropbox disable them.

And for the versioning bug, I raised https://github.com/owncloud/core/issues/25827