Desktop client unrecoverably falls back to user re-login prompt when OAuth2 refresh fails — multiple paths to finishFailedRefresh, no proactive token refresh

Environment

  • Server: ownCloud 10.16.0.0 (community), Debian 13 / nginx / PHP 7.4-fpm / MariaDB 10.11

  • Client: owncloud-client v7.2.0 (Linux, Windows, macOS — same behavior)

  • Apps: oauth2 (default), twofactor_totp (enforced), brute_force_protection, user_external (IMAP backend)

  • Topology: clients reach the server through a reverse proxy (nginx-based, trusted_proxies configured); ~589 users; subpath URL (/owncloud)

Related prior thread

This problem has been discussed before in the context of a server-side workaround, in the thread "OAuth valid token time". The discussion there ended at “patch the constant in the server code”; this post picks up where that one stopped and looks at why the desktop client cannot recover even when the server-side refresh token is still valid.

Symptom

After deploying OAuth2 with the default apps/oauth2/lib/Db/AccessToken.php EXPIRATION_TIME = 3600, desktop clients prompt users to re-authenticate approximately every hour during active use. With 2FA TOTP enforced, every prompt requires both password and a 6-digit code — practically unusable at scale.

The community workaround from the prior thread above is to patch EXPIRATION_TIME = 604800 (7 days) in the server code. This works (we confirmed it) but it is a workaround on the server side that hides client-side problems rather than fixing them.

This post is an attempt to document those client-side problems precisely, with code references, so they can be tracked and fixed upstream.

Investigation — code paths in owncloud/client that lead to a user-visible prompt

All references are to owncloud/client@master (tag v7.2.0).

1. invalid_grant / invalid_request clears the refresh token unconditionally.

src/gui/creds/oauth.cpp:540-545:

if (!errorString.isEmpty()) {
    if (errorString == QLatin1String("invalid_grant") || errorString == QLatin1String("invalid_request")) {
        newRefreshToken.clear();
    } else {
        qCWarning(lcOauth) << "Error while refreshing the token:" << errorString << ...;
    }
}

A refreshFinished is then emitted with an empty refresh token, and Credentials::handleRefreshSuccess reacts:

if (refreshToken.isEmpty()) {
    qCWarning(lcCredentials) << "Refresh job succeeded but refreshToken is empty -> log out";
    finishFailedRefresh();
    return;
}

invalid_grant is reasonable to honour as logout, but invalid_request is not the same class of error — it usually means a malformed request, a transient routing issue, or the server momentarily not recognising the request shape. Treating it as “permanently log this user out” is too aggressive.
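To make the distinction concrete, here is a minimal sketch of how the two error codes could be separated. It is plain standard C++ rather than Qt, and RefreshFailure and classifyRefreshError are hypothetical names used only for illustration; neither exists in owncloud/client. The rule it encodes is the one argued above: only invalid_grant, which RFC 6749 section 5.2 defines as the grant or refresh token being invalid, expired, or revoked, is treated as permanent.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical classification of OAuth2 token-endpoint errors (not client API).
enum class RefreshFailure {
    Permanent, // the refresh token itself is unusable: clear it and ask the user
    Transient, // keep the refresh token and retry later
};

// Only invalid_grant says the refresh token is invalid, expired, or revoked.
// Every other error code, including invalid_request, keeps the token.
constexpr RefreshFailure classifyRefreshError(std::string_view oauthError)
{
    return oauthError == "invalid_grant" ? RefreshFailure::Permanent
                                         : RefreshFailure::Transient;
}

// Usage sketch: the two codes that oauth.cpp currently treats alike now diverge.
inline void checkClassification()
{
    assert(classifyRefreshError("invalid_grant") == RefreshFailure::Permanent);
    assert(classifyRefreshError("invalid_request") == RefreshFailure::Transient);
}
```

With a split like this, newRefreshToken.clear() would only run in the Permanent branch, and the Transient branch would fall through to the existing warning and retry handling.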

2. Network errors during refresh: 3 retries × 30 s, then log out.

src/gui/creds/credentials.cpp:37-38:

constexpr int TokenRefreshMaxRetries = 3;
constexpr std::chrono::seconds TokenRefreshDefaultTimeout = 30s;

handleRefreshError retries for HostNotFoundError, TimeoutError, OperationCanceledError, TemporaryNetworkFailureError, ConnectionRefusedError, …, but caps at 3 attempts. After that:

if (_tokenRefreshRetriesCount >= TokenRefreshMaxRetries) {
    qCWarning(lcCredentials) << "Too many failed refreshes" << _tokenRefreshRetriesCount << "-> log out";
    finishFailedRefresh();
    ...
}

90 seconds of transient backend trouble (502/504 from a reverse proxy, php-fpm pool exhaustion, captive portal recovery, VPN reconnect) is enough to permanently log a user out, with no exponential back-off and no later self-healing.
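For comparison, a back-off schedule that keeps retrying without hammering the server could look like the sketch below. The constants and the function name are assumptions made for this post (a 30 s base to match the current timeout, and an arbitrary 10 minute cap); nothing here is taken from the client code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical back-off parameters; these names do not exist in owncloud/client.
constexpr std::chrono::seconds BackoffBase{30};
constexpr std::chrono::seconds BackoffCap{600};

// Delay before retry number `attempt` (0-based): 30 s, 60 s, 120 s, ...
// doubling each time and capped at 10 minutes.
inline std::chrono::seconds backoffDelay(int attempt)
{
    assert(attempt >= 0);
    // Bound the shift so the multiplication below cannot overflow.
    const int shift = std::min(attempt, 16);
    const auto factor = std::chrono::seconds::rep{1} << shift;
    return std::min(BackoffBase * factor, BackoffCap);
}
```

Under these assumed values the client would still be retrying every 10 minutes after the first half hour of an outage, instead of having given up after 90 seconds.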

3. slotInvalidCredentials has zero grace once the refresh token is gone.

src/gui/accountstate.cpp:528-549:

if (creds->ready())
    creds->invalidateToken();
if (creds->refreshAccessToken())
    return;
creds->askFromUser();
setState(AskingCredentials);

Credentials::refreshAccessToken returns false immediately if _refreshToken.isEmpty(). So once any of the failure paths above has wiped the refresh token, every subsequent 401 jumps straight to the login prompt — no retry, no waiting, no chance to recover when the network/server is back.

4. OpenIdConfig is fetched on every refresh and is required to succeed.

AccountBasedOAuth::refreshAuthentication is gated on fetchWellKnown. If well-known endpoint discovery fails (subpath mismatch after a server-side config change, a reverse-proxy 404, IdP downtime), the refresh request never even runs and the client ends up in path 2.

Root cause hypothesis (combined)

The default 1-hour EXPIRATION_TIME for OAuth2 access tokens means every active client enters the refresh flow at least once per hour. At our scale (589 users), that is a roughly uniform stream of refresh requests against /oauth2/api/v1/token.

Any individual refresh has a non-zero probability of hitting one of the failure paths above:

  • a momentary 5xx from the reverse proxy or backend

  • a slow php-fpm response that exceeds the 30 s timeout

  • brute_force_protection rate-limiting the shared corporate egress IP under load

  • a captive portal / VPN reconnect at exactly the wrong moment

  • a race when the client fires two concurrent refresh requests (file activity + connection validator)

Once one of those flips a user into finishFailedRefresh, the refresh token is wiped and from that point on the client cannot self-heal — every subsequent 401 goes to askFromUser instantly (path 3).

Multiplied by up to 168 hourly refresh cycles per user per week (24 × 7 for a client that stays online), even a low per-refresh failure rate produces a near-constant trickle of users getting prompted.
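The arithmetic behind that claim can be sketched as follows. The per-refresh failure probability is not something we measured; the 0.1 % used in the note below is purely illustrative, and the calculation assumes refresh cycles fail independently of each other, which real outages do not strictly respect.

```cpp
#include <cassert>
#include <cmath>

// Probability that at least one of `refreshes` refresh cycles ends in
// finishFailedRefresh, given a per-cycle failure probability `p`.
// Assumes the cycles are independent (a simplification).
inline double probAnyFailure(double p, int refreshes)
{
    assert(p >= 0.0 && p <= 1.0 && refreshes >= 0);
    return 1.0 - std::pow(1.0 - p, refreshes);
}
```

With an assumed p of 0.001, probAnyFailure(0.001, 168) comes out at roughly 15 % per user per week for an always-on client, while probAnyFailure(0.001, 1), the 7-day-token case, stays at 0.1 %.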

Why the 7-day server-side patch hides the problem

Extending EXPIRATION_TIME to 604800 cuts the number of refresh attempts per user by up to 168× (604800 / 3600). Statistically, far fewer users hit any of the failure paths, and the refresh that does happen tends to occur at client start time (a more controlled moment than mid-sync).

It does not fix:

  • The “first failure is permanent” property of finishFailedRefresh clearing the refresh token.

  • The lack of distinction between recoverable (network error, invalid_request) and unrecoverable (invalid_grant, explicit token revocation) errors.

  • The lack of a proactive refresh ahead of expiry, which would also smooth out the synchronised flash-crowd on the token endpoint.
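As a rough illustration of the last point, the delay before a proactive refresh could be computed like this. The function name and the jitter parameter are assumptions of this sketch: the margin corresponds to refreshing some minutes before expiry, and the jitter is an optional per-client offset added here to spread clients apart.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using std::chrono::seconds;

// Hypothetical helper: how long after receiving a token to schedule the refresh.
// expiresIn: the expires_in value from the token response.
// margin:    how far ahead of expiry to refresh.
// jitter:    per-client offset, e.g. a random value in [0, margin), chosen by the caller.
inline seconds proactiveRefreshDelay(seconds expiresIn, seconds margin, seconds jitter)
{
    assert(margin >= seconds{0} && jitter >= seconds{0});
    // Never return a negative delay: a very short-lived token is refreshed at once.
    return std::max(expiresIn - margin - jitter, seconds{0});
}
```

For the default one-hour token and a five-minute margin, proactiveRefreshDelay(seconds{3600}, seconds{300}, seconds{0}) yields 3300 s, which leaves five minutes of retry budget before any request would see a 401.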

Proposed improvements (client side)

  1. Proactive refresh at e.g. expires_in - 5 min instead of waiting for a 401. Eliminates the lock-step expiration of all clients and lets the client retry well before the user notices.

  2. Differentiate error classes: only invalid_grant (or explicit invalid_token introspection) should clear the refresh token. invalid_request, network errors, 5xx, and timeouts should keep the refresh token and back off.

  3. Exponential back-off after the 3-retry budget, instead of going straight to logout. Hold the refresh token, keep retrying every few minutes silently, only ask the user after e.g. an hour of sustained failure.

  4. Keep the refresh token when finishFailedRefresh runs for a recoverable error, so that slotInvalidCredentials can give the next 401 at least one full refresh attempt before prompting, even if the previous attempt failed.

  5. Single-flight refresh (mutex/promise) so concurrent jobs share one refresh in-flight rather than racing.
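Point 5 can be sketched without any threading primitives if we assume, as a simplification, that all credential handling runs on a single event loop. The class below is hypothetical and uses only the standard library; it is a callback-queue variant of the mutex/promise idea rather than a drop-in for the Qt code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-flight helper. Assumes single-threaded (event loop) use,
// so no locking is done here.
class SingleFlightRefresh
{
public:
    using Callback = std::function<void(const std::string &accessToken)>;

    // Registers a waiter for the next token. Returns true only for the first
    // waiter, which is then responsible for starting the network request.
    bool join(Callback cb)
    {
        assert(cb != nullptr);
        _waiters.push_back(std::move(cb));
        return _waiters.size() == 1;
    }

    // Delivers the result to every waiter and clears the in-flight state, so
    // the next call to join() starts a fresh refresh.
    void complete(const std::string &accessToken)
    {
        auto waiters = std::move(_waiters);
        _waiters.clear();
        for (auto &cb : waiters)
            cb(accessToken);
    }

private:
    std::vector<Callback> _waiters;
};
```

A sync job and the connection validator would both call join(); only the caller that gets true back sends the request to the token endpoint, and both receive the new token through complete().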

Workaround we are running

Code patch on the server:

sed -i 's|public const EXPIRATION_TIME = 3600;|public const EXPIRATION_TIME = 604800;|' \
/var/www/owncloud/apps/oauth2/lib/Db/AccessToken.php
systemctl restart php7.4-fpm

An idempotent re-apply script is needed because occ upgrade overwrites the patch.

Refresh tokens in oc_oauth2_refresh_tokens do not expire (the table has no expires column), so the 7-day access token is fully usable: clients still refresh silently in the background once a week.

Question to maintainers

Is there an appetite to address points 1–5 in owncloud/client? The 7-day workaround is widespread but it is a server-side mask over real client-side fragility, and admins without code-patching access to the oauth2 app are stuck.

Happy to file a GitHub issue against owncloud/client with the same content if that is preferred over the forum.