AEGIS — Ankorstore’s platform authentication system

Jonathan Vuillemin
Ankorstore Tech Blog
11 min read · Mar 10, 2023


📘 Introduction

Ankorstore scaled up dramatically in late 2021 and 2022, which led to a proliferation of teams, projects, and features.

To manage this growth from the platform’s perspective, we needed to consider solutions to avoid incurring obvious and expensive technical debt.

Authentication was a prime example: we needed to avoid building, maintaining, and integrating separate authentication systems for our different services.

Thus, the AEGIS project was born with the idea of creating a common, unified, and easy-to-use platform authentication system.

You can find more information about the origins of the AEGIS name on Wikipedia.

⚙️ Technical overview

Before diving into the technical details of AEGIS, let’s review some of the core concepts to help with our foundational understanding.

☁️ Cloud native

AEGIS has been made with (and for) cloud-native technologies.

It was built:

  • mainly using Go
  • to operate on a Kubernetes cluster
  • to work specifically with an Envoy-based service mesh (such as Istio).

📦 Modular

AEGIS is not a typical monolithic API authentication gateway.

Instead, it is a collection of platform components, working together, including an auth proxy, an auth agent, an auth server, and a set of Envoy filters.

  • Auth proxy is the heart of AEGIS. It is a lightweight gRPC application written in Go that acts as the external processor for the Envoy ext-proc filter applied to the Istio ingress gateway.
  • Auth agent is a lightweight HTTP application, also written in Go. It offers shared authentication endpoints for web applications and handles OAuth2 flows on their behalf.
  • Auth server is an HTTP application written in PHP. It serves internal and external OAuth2 flows, and exposes JSON Web Key Set (JWKS) endpoints.

🚀 Performance / scalability

AEGIS was designed to be fast, resilient, and highly scalable.

  • Auth proxy is involved in every request made to the platform (applied at ingress gateway level). As it is a stateless Go application, it is very resilient and scalable, allowing quick reactions to network traffic peaks.
  • Auth agent is involved in authentication state changes (e.g. going from guest to user). This component is also built with Go for the same reasons.
  • Auth server is built with PHP, but boosted by RoadRunner, a high-performance PHP application server written in Go.

🧪 Advanced Istio / Envoy usage

AEGIS uses one of the most compelling features of an Envoy-based service mesh: the ability to intercept and modify traffic.

This approach enables common logic, such as authentication, to be offloaded from the applications, and to be handled and shared at the platform level through the Envoy listeners / filters mechanism.

Envoy JWT authn filter

The Envoy JWT authn filter is particularly useful for offloading JWT authentication logic from backend applications when placed on Istio proxy sidecars (Envoy).

  • 1 — The client performs an OAuth2 flow against an authorization server to obtain an access token (JWT).
  • 2 — The client calls a protected application endpoint, providing the JWT in the Authorization header.
  • 3 — The application’s proxy sidecar (Envoy) intercepts the request and validates the JWT against the JWKS fetched and cached from the auth server, as well as against additional configured criteria (issuer, audience, etc.). If validation fails, the proxy immediately sends a 401 response to the client without calling the application. Otherwise, the request, along with the validated JWT, is propagated to the application container.

Envoy ext-proc filter

The Envoy ext-proc filter can be particularly handy when placed at the Istio ingress gateway (Envoy) level. It allows delegation of request and response mutations to an external gRPC server (also known as an external processor).

This filter uses a bidirectional (BiDi) gRPC connection, in which Envoy (acting as gRPC client) streams request/response parts (headers, body, trailers) to the external processor (gRPC server).

The external processor is responsible for deciding how to mutate or interrupt the request/response (such as modifying a header, altering the body, or even sending an immediate response with a specific status code), which is then streamed back to Envoy to be applied before propagating the request/response.

In other words, a gRPC application provides mutations that enable you to shape or interrupt requests and responses according to your needs.

  • 1 — The client sends a request to an application endpoint.
  • 2 — The Istio ingress gateway, which uses the Envoy ext-proc filter, intercepts the request and streams its parts (headers, body, trailers) to an external processor over a BiDi gRPC stream.
  • 3 — The external processor decides to stream back request mutations to be applied by Envoy before propagating the request to the application.
  • 4 — The mutated request is sent to the application.
  • 5 — After processing, the application sends a response.
  • 6 — The Istio ingress gateway (with the ext-proc filter) intercepts the response and streams its parts (headers, body, trailers) over the same BiDi gRPC connection, which is kept alive.
  • 7 — The external processor decides to stream back response mutations, which Envoy will apply before propagating the response to the client.
  • 8 — The mutated response is sent to the client.
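On the processor side, the steps above boil down to one loop: for each message Envoy streams, answer with either mutations or an immediate response. The Go sketch below models that loop with channels standing in for the BiDi gRPC stream. The message types, the `x-demo-*` headers, and the deny rule are hypothetical simplifications; a real external processor implements the `ExternalProcessor` service from Envoy's go-control-plane module instead.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the ext-proc messages. The real protobuf
// types live in Envoy's go-control-plane module (envoy/service/ext_proc/v3).
type phase int

const (
	requestHeaders phase = iota
	responseHeaders
)

// processingRequest is one message streamed by Envoy (the gRPC client).
type processingRequest struct {
	Phase   phase
	Headers map[string]string
}

// processingResponse is the external processor's answer for that message.
type processingResponse struct {
	SetHeaders      map[string]string // header mutations for Envoy to apply
	ImmediateStatus int               // non-zero: Envoy replies directly, without calling upstream
}

// process answers every message received on the same long-lived stream:
// request headers first (steps 2-3), then response headers (steps 6-7).
func process(in <-chan processingRequest, out chan<- processingResponse) {
	defer close(out)
	for msg := range in {
		switch msg.Phase {
		case requestHeaders:
			if msg.Headers["x-demo-deny"] == "true" { // hypothetical rule
				out <- processingResponse{ImmediateStatus: 403}
				continue
			}
			out <- processingResponse{SetHeaders: map[string]string{"x-demo-request": "mutated"}}
		case responseHeaders:
			out <- processingResponse{SetHeaders: map[string]string{"x-demo-response": "mutated"}}
		}
	}
}

func main() {
	in, out := make(chan processingRequest), make(chan processingResponse)
	go process(in, out)
	in <- processingRequest{Phase: requestHeaders, Headers: map[string]string{}}
	fmt.Println(<-out) // request mutation
	in <- processingRequest{Phase: responseHeaders}
	fmt.Println(<-out) // response mutation
	close(in)
}
```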

You can find a simple ext-proc demo project (Envoy + Go external processor) on GitHub.

🛡️ Authentication mechanisms

AEGIS provides two authentication mechanisms:

  • Classic OAuth2 authentication (for APIs)
  • Token handler OAuth2 authentication (cookie-based, for web applications)

🔒 Classic OAuth2 authentication

This authentication mechanism relies on the well-known and robust OAuth2 standard.

It is mainly designed to protect backend API endpoints in a “classic” way, using JWT access tokens granted by the AEGIS auth server during OAuth2 flows.

Workflow

  • 1 — The client performs an OAuth2 flow against the AEGIS authentication server to obtain a JWT access token.
  • 2 — The client calls an application 1 endpoint, providing the JWT access token in the Authorization request header.
  • 3 — The Istio ingress gateway, with the Envoy ext-proc filter, intercepts the request and streams the request headers, including the JWT access token, to the AEGIS auth proxy over a BiDi gRPC stream.
  • 4 — The AEGIS auth proxy attempts to validate the JWT access token (using the JWKS cached from the AEGIS auth server). If validation fails, the auth proxy instructs the Envoy front proxy to reply immediately with a 401 (the application is never called). If validation succeeds, the auth proxy lets the request continue through the Envoy filter chain.
  • 5 — The request is propagated to the application 1 pod, where the proxy sidecar JWT authn filter validates the JWT access token. If validation fails, the Envoy sidecar immediately replies with a 401 error and does not call application 1. If validation is successful, the request is forwarded to the application 1 container.
  • 6 — Application 1 makes an internal call to application 2 and forwards the JWT access token as a header. On the application 2 pod, the proxy sidecar JWT authn filter validates the JWT access token. If validation fails, the Envoy sidecar immediately replies with a 401 error and application 2 is not called. If validation is successful, the request is forwarded to the application 2 container.
  • 7 — Application 1 finishes processing and sends a response.
  • 8 — The Istio ingress gateway propagates the response to the client (AEGIS auth proxy is inactive in this case).

Benefits

There are several benefits to using this mechanism:

  • It relies on the OAuth2 standard, making it highly interoperable with external clients.
  • Backend applications are authentication agnostic, as they don’t have to issue, maintain, or validate tokens.
  • Backend applications just need to read an already validated JWT payload to get the authentication context, such as token claims like the user ID.
  • If a request fails authentication, backend applications are not even called, since AEGIS handles the 401 responses; this preserves backend resources for valid requests.
  • Backend applications are also protected from internal calls.

🍪 Token handler OAuth2 authentication

This is the most interesting of the AEGIS authentication mechanisms.

It relies on the token handler pattern: web applications do not handle authentication concerns directly. Instead, they rely on an authentication agent placed on the platform side, which negotiates the OAuth2 flows on their behalf.

This mechanism uses one of the most common and robust ways to carry authentication data: cookies.

As a result, web applications receive security cookies that they just need to forward with each call to benefit from an authenticated context.

In AEGIS’s case, the security cookies are:

  • Secure (only sent over HTTPS)
  • HttpOnly (inaccessible to client-side scripts)
  • encrypted (server-side, with AES, by the AEGIS auth proxy)
  • paired with a CSRF token (as per OWASP recommendations), required for any request other than GET, HEAD, or OPTIONS
  • containing JWT access tokens (as well as other contextual information) obtained from OAuth2 flows

Behind the scenes, this mechanism relies on the robustness of OAuth2, thanks to the AEGIS auth agent handling OAuth2 flows on behalf of the web application.

Workflows

Below are simplified details about some of the most relevant AEGIS token handler workflows.

Security cookie verification

AEGIS offers a guest mode, in which web applications are also protected by security cookies. However, these cookies are not linked to a specific user.

The security cookie verification workflow applies to all incoming requests, whether from a guest or user, that have security cookies and are destined for a protected application endpoint.

  • 1 — The browser calls a protected application 1 endpoint using a guest or user security cookie (and a CSRF token if non-GET, HEAD, or OPTIONS method).
  • 2 — The Istio ingress gateway, with the Envoy ext-proc filter, intercepts the request and streams the request headers to the AEGIS auth proxy (external processor) over a BiDi gRPC stream.
  • 3 — The AEGIS auth proxy validates the CSRF token, decrypts the security cookie (AES secret key), extracts the JWT access token from the cookie, validates it against the AEGIS auth server JWKS, and finally adds it to the Authorization request header. If any of these steps fails, an immediate 401 response is sent to the browser. Otherwise, the request is propagated.
  • 4 — The request is propagated to the application 1 pod. Here, the proxy sidecar JWT authn filter validates the JWT access token. If validation fails, the sidecar immediately replies with a 401 error and does not call the application 1 container. If validation is successful, the request is forwarded to the application 1 container.
  • 5 — Application 1 makes an internal call to application 2 and forwards the JWT access token header. On the application 2 pod, the proxy sidecar JWT authn filter validates the JWT access token. If validation fails, the sidecar immediately replies with a 401 error and does not call the application 2 container. If validation is successful, the request is forwarded to the application 2 container.
  • 6 — Application 1 finishes processing and sends a response.
  • 7 — The Istio ingress gateway propagates the response to the client (AEGIS auth proxy is inactive in this case).

User login / logout

The AEGIS auth agent is available throughout the platform via Istio virtual services. This provides shared authentication endpoints at the platform level that any web application can use.

Backend applications are not involved in this process, which allows them to remain authentication-agnostic.

  • 1 — When a user logs in or out, the browser sends a request to the AEGIS authentication agent. This request includes the security cookie (either guest or user), the related CSRF token, and the user’s credentials (if attempting to log in).
  • 2 — The Istio ingress gateway, with the Envoy ext-proc filter, intercepts the request and streams the request headers to the AEGIS auth proxy (external processor) over a BiDi gRPC stream.
  • 3 — The AEGIS auth proxy validates the CSRF token if needed, decrypts the security cookie (AES secret key), extracts the JWT access token from the cookie, validates it against the AEGIS auth server JWKS, and finally adds it to the Authorization request header. If any of these steps fails, an immediate 401 response is sent to the browser. Otherwise, the request is propagated.
  • 4 — The request is sent to the AEGIS auth agent, along with the validated JWT as a request header.
  • 5 — The AEGIS auth agent requests a new JWT access token from the AEGIS auth server using OAuth2: for a login, a user token is requested (using the provided user credentials); for a logout, a guest token is requested.
  • 6 — The AEGIS auth agent generates a response with the new JWT in a dedicated response header.
  • 7 — The Istio ingress gateway, with the Envoy ext-proc filter, intercepts the response and streams the response headers to the AEGIS auth proxy using the same BiDi gRPC connection (which is kept alive).
  • 8 — The AEGIS auth proxy retrieves the new JWT and encrypts it with AES (as the owner of the secret key), along with some other contextual data, to create a new security cookie, which it adds to the response Set-Cookie header. A new CSRF token is also added in a dedicated header.
  • 9 — The Istio ingress gateway propagates the response to the browser with a new security cookie and a related CSRF token.
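Step 5 is where the auth agent speaks plain OAuth2. A minimal Go sketch of the token request body follows. The grant types are assumptions (the RFC 6749 `password` grant for user credentials, `client_credentials` for the guest token), as are the `client_id` value and the omission of client authentication.

```go
package main

import (
	"fmt"
	"net/url"
)

// tokenRequestForm builds the body the agent could POST to the auth server's token
// endpoint. Grant types are assumptions: "password" for a login with user
// credentials, "client_credentials" for the guest token requested on logout.
// Client authentication (e.g. HTTP Basic with a client secret) is omitted.
func tokenRequestForm(action, clientID, username, password string) (url.Values, error) {
	form := url.Values{"client_id": {clientID}}
	switch action {
	case "login":
		form.Set("grant_type", "password")
		form.Set("username", username)
		form.Set("password", password)
	case "logout":
		form.Set("grant_type", "client_credentials")
	default:
		return nil, fmt.Errorf("unknown action %q", action)
	}
	return form, nil
}

func main() {
	form, _ := tokenRequestForm("logout", "aegis-agent", "", "")
	fmt.Println(form.Encode()) // client_id=aegis-agent&grant_type=client_credentials
}
```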

Benefits

There are several benefits to using this mechanism:

  • It relies on the common and robust usage of secure cookies (combined with CSRF tokens).
  • AEGIS auth agent endpoints are shared at the platform level and can be used by any web application.
  • Web applications are authentication agnostic: they do not need to fetch, store, or provide tokens on calls, but can instead just use the provided cookies.
  • Backend applications are authentication agnostic as well: they do not need to issue, maintain, or validate tokens. They only need to read the already validated JWT payload to get the authentication context (token claims like user ID, etc.).
  • Backend applications are not even called if a request fails authentication: AEGIS handles the 401 responses, preserving backend resources for valid requests.
  • Backend applications are also protected from internal calls.

📕 Conclusion

AEGIS was designed to completely offload authentication concerns from both the frontend and backend sides. It also avoids the technical debt of maintaining several authentication systems.

Since AEGIS operates at the platform level, its components (auth proxy, agent, and server) are available to any application running on the platform service mesh, allowing those applications to remain authentication agnostic and focus on their business logic.

In addition to classic JWT-based authentication, AEGIS supports cookie-based authentication (via the token handler pattern), one of the most robust, secure, and battle-tested ways to hold authentication details, while still relying on the robustness of OAuth2 behind the scenes.

In terms of performance, AEGIS takes between a few microseconds and a few milliseconds (including both request and response mutations) to validate regular security cookies, which makes it almost unnoticeable for our applications.

AEGIS is helping us in our technical growth, allowing us to focus on our main goal: producing value for our customers 🚀.
