Elasticsearch indexing problem


On our local ownCloud server we have several SMB shares connected. The shares are mounted through a single dedicated user account that has access to them. Everything is working fine, and the Windows domain users authenticate via LDAP against a domain controller. We have installed Elasticsearch and the full text search plugin, and both are running.
After indexing for a long time for the first user, we got this message: Lost connection to LDAP-Server.

The index is built and everything seems fine. But we are not able to finish the indexing, because it always stops after the first LDAP user in our user list. How can we solve this?

Yes, there are about 7 million files on the SMB shares, so it takes time, I know. But it will never finish if it always stops after the first LDAP user on our ownCloud.

root@owncloud10:~# curl -s 'http://localhost:9200/_cat/indices?v' | head -5
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .geoip_databases ZvX1QzZASZ2x7y705hv2eg 1 0 34 29 31.3mb 31.3mb
green open oc-ochp4vti2nql-relv2 cPud-FS4Qg-KiALzfNk9TA 1 0 0 0 227b 227b
green open oc-ochp4vti2nql vOCLf6enQHGRMl_FqcPnYg 1 0 2530948 399711 129.1gb 129.1gb

As a workaround, try to create the index one user at a time. If that works, especially for the user causing trouble, you can create a bash script like

# userlist: a file with one user ID per line
for user in $(cat userlist); do
  ./occ search:index:create "$user"
done
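
If one user keeps failing (for example with the LDAP error above), a slightly more defensive variant of the loop may help: it carries on with the next user and writes the failed ones to a file. This is only a sketch. The file names users.txt and failed_users.txt, the OCC variable and the index_users function are made up for illustration, and it assumes that occ exits with a non-zero status when indexing a user fails.

```shell
# Sketch: index users one at a time and keep going when one of them fails.
# Assumptions (not from the thread): the user list is a file with one user ID
# per line, and occ returns a non-zero exit status on failure.

OCC="${OCC:-./occ}"   # how to call occ; adjust to your installation

index_users() {       # index_users <userlist_file> <failed_log_file>
  local userlist="$1" failed_log="$2" user
  : > "$failed_log"                        # start with an empty failure log
  while IFS= read -r user || [ -n "$user" ]; do
    [ -z "$user" ] && continue             # skip blank lines
    # stdin is redirected so occ cannot swallow the rest of the user list
    if ! $OCC search:index:create "$user" < /dev/null; then
      echo "$user" >> "$failed_log"        # remember the user and move on
    fi
  done < "$userlist"
}

# Usage: index_users users.txt failed_users.txt
```

Afterwards failed_users.txt contains the users that need another run, so the rest of the list is not blocked by one problematic account.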

Thank you @jvillafanez. I’ll first try to build the index for the first user. There must be about 5 million files in the share. Since you know the full text search well: there are about 100 users, and they all see the same SMB shares. How is the index built? Does the full text search index all the shares again for every user, even if the shares have already been indexed for another user? I don’t understand how the full text search handles this. The users don’t add their own files to their personal folders, only to the SMB shares.

Files are indexed based on the internal ownCloud fileid.

The fileid depends on how the storage is configured.
If all the users are using the same smb account, the fileids will be the same, so there won’t be duplicates in elasticsearch.
If the users are using different accounts, the fileids will be different (we can’t ensure that all users will see the same files), so there will be duplicates in elasticsearch.
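
To make the two cases concrete, here is a toy model in shell. It is not how the search app stores anything: the "index" is just a directory with one entry per fileid, and the fileids and the path are made up.

```shell
# Toy model only: the "index" is a directory with one entry per fileid.
# The fileids below are invented; real ones are assigned by ownCloud.
index_doc() {                    # index_doc <index_dir> <fileid> <path>
  printf '%s\n' "$3" > "$1/$2"   # same fileid -> same entry, overwritten
}

count_docs() {                   # count_docs <index_dir>
  ls "$1" | wc -l | tr -d ' '
}

shared=$(mktemp -d)              # all users mount the share with one SMB account
index_doc "$shared" 1001 "share/report.docx"    # seen while indexing user A
index_doc "$shared" 1001 "share/report.docx"    # seen again for user B
echo "shared account: $(count_docs "$shared") document(s)"

separate=$(mktemp -d)            # each user mounts with their own SMB account
index_doc "$separate" 1001 "share/report.docx"  # user A's storage
index_doc "$separate" 2001 "share/report.docx"  # user B's storage, new fileid
echo "separate accounts: $(count_docs "$separate") document(s)"
```

With the shared account the same file ends up as one entry; with separate accounts it ends up as two.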

Note that, if I remember correctly, external storages aren’t indexed by default, and you need to enable that option.