oC 10 filecache issues on big filesets


#1

Hi,

I am trying to deploy an oC instance at work, but within about two days our DB almost exploded (the image isn't far from reality).

In fact, we want to use it solely to map our shared FS, which is about 1 PB.

I quickly stopped it while it was indexing the whole thing, purged everything, deactivated versioning and launched it anew. Now it doesn't grow as fast as before, but still:

19-Feb-12_15 : 7625244672 : 7.2G
20-Feb-09_24 : 10359930880 : 9.7G
22-Feb-09_30 : 12398362624 : 12G
23-Feb-09_37 : 15267266560 : 15G
28-Feb-11_01 : 15976103936 : 15G
01-Mar-09_03 : 17599299584 : 17G
02-Mar-10_08 : 19218300928 : 18G
02-Mar-13_50 : 20019412992 : 19G
05-Mar-09_00 : 23920115712 : 23G

With just 5 users on it, who each created one or two shares and then stopped using this oC instance, the oc_filecache table kept growing: +4 GB over last weekend without anyone logging in even once.
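
For reference, the average growth rate can be worked out from the samples above. This is only a small sketch: it assumes the timestamps are day-month-hour_minute, that the year is 2018, and that growth is roughly linear between the first and last sample.

```python
from datetime import datetime

# Samples copied from the list above: "day-month-hour_minute : bytes : human-readable".
SAMPLES = """\
19-Feb-12_15 : 7625244672 : 7.2G
20-Feb-09_24 : 10359930880 : 9.7G
22-Feb-09_30 : 12398362624 : 12G
23-Feb-09_37 : 15267266560 : 15G
28-Feb-11_01 : 15976103936 : 15G
01-Mar-09_03 : 17599299584 : 17G
02-Mar-10_08 : 19218300928 : 18G
02-Mar-13_50 : 20019412992 : 19G
05-Mar-09_00 : 23920115712 : 23G
"""

GIB = 1024 ** 3


def parse_sample(line, year=2018):
    """Turn one sample line into (datetime, size_in_bytes). The year is an assumption."""
    stamp, size, _human = (part.strip() for part in line.split(" : "))
    when = datetime.strptime(f"{year}-{stamp}", "%Y-%d-%b-%H_%M")
    return when, int(size)


def growth_per_day(samples):
    """Average growth in bytes per day between the first and the last sample."""
    (t0, s0), (t1, s1) = samples[0], samples[-1]
    days = (t1 - t0).total_seconds() / 86400
    return (s1 - s0) / days


samples = [parse_sample(line) for line in SAMPLES.splitlines() if line.strip()]
rate = growth_per_day(samples)
print(f"average growth: {rate / GIB:.2f} GiB/day")
```

That works out to a bit more than 1 GiB per day on average, with only 5 test users.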

The thing is, I have 500+ users, so occ files:scan --all can't be used: it would take far too long for too little gain (I am actually trying it on my 5 test users and it's already too long).

I guess setting up a cron job with a "truncate table" wouldn't be recommended?

Does anyone have information / advice / a rope? (Sorry, bad joke, but I'm kind of desperate about the next meeting with my boss on that topic.)

Thanks in advance!

Regards,
Laurent.


#2

Hi, can you explain what exactly your use case for ownCloud is, i.e. what are you trying to do with it?

Also, some information about your setup, the version of ownCloud and the log file would be helpful.


#3

Of course, sorry about that.

I use oC 10.0.4.

As mentioned before, we have 1 PB of shared storage which is available only to our own people (LDAP login, local or VPN connection).

I need oC not for normal use (drive-like, sync, etc.), but to share with non-LDAP customers and/or short-term partners.

As for owncloud.log, it contains only this (I just went through the whole thing):

{"reqId":"ohSgZBveghRYJ4Q09FBb","level":3,"time":"2018-03-02T16:31:47+00:00","remoteAddr":"139.165.108.132","user":"--","app":"PHP","method":"GET","url":"\/owncloud\/cron.php","message":"Undefined index: size at \/var\/www\/owncloud\/lib\/private\/Files\/Cache\/Scanner.php#424"}
{"reqId":"ohSgZBveghRYJ4Q09FBb","level":3,"time":"2018-03-02T16:31:47+00:00","remoteAddr":"139.165.108.132","user":"--","app":"PHP","method":"GET","url":"\/owncloud\/cron.php","message":"Undefined index: size at \/var\/www\/owncloud\/lib\/private\/Files\/Cache\/Scanner.php#421"}
{"reqId":"ohSgZBveghRYJ4Q09FBb","level":3,"time":"2018-03-02T16:31:47+00:00","remoteAddr":"139.165.108.132","user":"--","app":"PHP","method":"GET","url":"\/owncloud\/cron.php","message":"Undefined index: size at \/var\/www\/owncloud\/lib\/private\/Files\/Cache\/Scanner.php#424"}
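
In case it helps to check this on a bigger log, here is a minimal sketch of how the log can be summarized. It assumes one JSON object per line, as in the entries above; the `summarize` helper and the way the file path is passed on the command line are just illustrative choices.

```python
import json
import sys
from collections import Counter


def summarize(lines):
    """Count (app, message) pairs across JSON log lines, skipping unparsable ones."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON line, ignore it
        if not isinstance(entry, dict):
            continue
        counts[(entry.get("app", "?"), entry.get("message", "?"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_log.py /path/to/owncloud.log
    with open(sys.argv[1], encoding="utf-8") as log:
        for (app, message), n in summarize(log).most_common():
            print(f"{n:6d}  [{app}]  {message}")
```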


#4

Hi,

Even when files located on external storages do not have any metadata associated with them (versions, tags, comments, etc.), these files are still referenced in the oc_filecache table.

This is because every external storage mount has its own entry in the oc_storages and oc_mounts tables.

This means that indexing of those files is necessary and cannot be avoided.

Possible solution: Disable the external storage in ownCloud. We can provide you with the information on how to delete it from your database if it is feasible for you.


#5

Like I said, its only purpose is to share things which ARE on an external storage.

So, with what you told me in mind: is there a typical cache size to expect?
Something like "you have mounted a drive with 100 GB of data, so your oc_filecache table will take about 10% of that".

If I could predict, even roughly, how much our petabyte will generate, that would be just what I need (some figures to give to my boss).


#6

Sorry, I have no idea.

I have asked my colleague, but it might take some time until he is able to answer.

But in short: the more files there are, the bigger the file cache gets.

The way I picture it: think of a telephone book with the numbers of all the people in one city. The book is very thick, but it doesn't contain the actual people, just the "metadata" (telephone number and name).


#7

Do you require those users to access the shared storage with their own accounts? Could you use a service account to connect to the storage, so that all users access the shared storage through that service account instead of their own?

Although the real storage is shared, each user can have his own view of the storage (some files might not be visible, or depending on the user, the real path might be different). This means that each user will have his own metadata for the storage.
If you don't want this, as admin, you can set up a storage configuration and set a fixed account for all the users to access through it (or maybe just some of them). The obvious drawback is that you won't be able to track who created or modified a file in the external storage because everyone will be using the same account.
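
To get a feeling for the difference, here is a rough back-of-the-envelope sketch. It assumes the file cache holds one row per file or folder for each user's view of the storage, and that the average size of a row has been measured on a test instance (table size divided by row count). All numbers below are hypothetical placeholders, not measurements.

```python
GIB = 1024 ** 3


def estimate_filecache_bytes(entries_per_view, views, bytes_per_row):
    """Rough estimate: rows = entries visible in one view * number of views."""
    return entries_per_view * views * bytes_per_row


# Hypothetical inputs, to be replaced with measured values.
ENTRIES_PER_VIEW = 2_000_000   # files + folders one user can see (assumption)
BYTES_PER_ROW = 500            # average row size incl. indexes (assumption)

per_user = estimate_filecache_bytes(ENTRIES_PER_VIEW, views=500, bytes_per_row=BYTES_PER_ROW)
service = estimate_filecache_bytes(ENTRIES_PER_VIEW, views=1, bytes_per_row=BYTES_PER_ROW)

print(f"500 per-user views : {per_user / GIB:.1f} GiB")
print(f"1 service account  : {service / GIB:.1f} GiB")
```

The point is only that the estimate depends on the number of entries and the number of views, not on the number of bytes stored, so a "percentage of the data size" rule would not hold.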


#8

Sadly, it is a must-have: I have to keep the restrictions on each sub-fileset of the shared drive.

From /home/mass/ we split into ./themes/, which are subdivided into ./teams/, each of those having different credentials (500+ users, 10+ themes and 30+ teams can be quite a pain to handle).