SPEKTRA Edge IAM Authentication

Understanding the SPEKTRA Edge IAM authentication.

The authentication module lives in the SPEKTRA Edge repository, in the file iam/auth/authenticator.go. It also provides an authentication function (grpc_auth.AuthFunc) that is passed to the gRPC server. If you open the main.go of any API Server runtime, you should find code like:

grpcserver.NewGrpcServer(
  authenticator.AuthFunc(),
  commonCfg.GetGrpcServer(),
  log,
)

This function is invoked by the gRPC interceptors, so authentication happens outside any server middleware. As a result, the context object associated with the request already contains the AuthToken object, as defined in iam/auth/types/auth_token.go, before any server code processes the call.

Authenticator Tasks

Authenticator uses HTTP headers to find out who is making a call. The primary header in use is authorization. It should contain a value of the form Bearer <AccessToken>, and the token part is what we need.

For now, ignore x-goten-original-auth and additional access tokens; they are described in the Distributed Authorization Process section.

Usually, we expect just a single token in authorization, based on which authentication happens.
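Extracting the token part from the header value can be sketched as follows. This is not the code from authenticator.go; the function name bearerToken and the error messages are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// bearerToken extracts the token part from an authorization header value
// of the form "Bearer <AccessToken>". The scheme is compared
// case-insensitively, which is an assumption of this sketch.
func bearerToken(headerValue string) (string, error) {
	scheme, token, found := strings.Cut(strings.TrimSpace(headerValue), " ")
	if !found || !strings.EqualFold(scheme, "Bearer") {
		return "", errors.New("authorization header is not a Bearer token")
	}
	token = strings.TrimSpace(token)
	if token == "" {
		return "", errors.New("empty bearer token")
	}
	return token, nil
}

func main() {
	tok, err := bearerToken("Bearer abc.def.ghi")
	fmt.Println(tok, err)

	_, err = bearerToken("Basic dXNlcg==")
	fmt.Println(err)
}
```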

Authenticator delegates access token identification to a module called AuthInfoProvider. It returns the Principal object, as defined in iam/auth/types/principal.go. Under the hood, it can be ServiceAccount, User, or Anonymous.

We will dive into it in the AuthInfoProvider for authentication section below, but for now let’s touch another important topic regarding authentication.

When AuthInfoProvider returns principal data, Authenticator also needs to validate that all claims are as expected; see addKeySpecificClaims. Of these claims, the most important is the audience. We need to protect against cases where someone obtains an access token and tries to pose as the given user in some other service. For this reason, look at the expected audience for User:

claims.Audience = []string{a.cfg.AccessTokenAudience}

This AccessTokenAudience is equal to the audience specific to SPEKTRA Edge, for example https://apis.edgelq.com, as configured for some of our services. If the expected audience does not match what is actually in the claims, it may be because someone obtained this token and is using it in a different service. The same applies to us: we receive access tokens from users, so if we knew where else a user has an account, we could try to log in there with their token. Checking the audience prevents this.

However, note that the audience is one global value for the whole SPEKTRA Edge platform. One service on SPEKTRA Edge can therefore connect to another service on SPEKTRA Edge with this access token and successfully pose as the user. As long as services on SPEKTRA Edge can trust each other, this is not an issue. For untrusted third-party services it may be a problem: if a user sends a request to them, they can potentially take the token and use it in other SPEKTRA Edge services. In the future, we may provide additional claims, but this means the user will need to ask the jwks provider for an access token for a specific SPEKTRA Edge service, perhaps for each of them or for groups of them.

In API Server config, see Authenticator settings, field accessTokenAudience.
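The user audience check amounts to an exact match between the configured value and one of the values in the token's audience claim. A minimal sketch, assuming the claim has already been decoded into a string slice; the function name audienceMatches is hypothetical and the real validation may be stricter:

```go
package main

import "fmt"

// audienceMatches reports whether the expected audience appears in the
// token's "aud" claim, which may carry several values.
func audienceMatches(tokenAudiences []string, expected string) bool {
	for _, aud := range tokenAudiences {
		if aud == expected {
			return true
		}
	}
	return false
}

func main() {
	// Example value of accessTokenAudience taken from the text above.
	expected := "https://apis.edgelq.com"

	fmt.Println(audienceMatches([]string{"https://apis.edgelq.com"}, expected))
	// A token issued for some other, unrelated audience is rejected.
	fmt.Println(audienceMatches([]string{"https://other.example.com"}, expected))
}
```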

The situation is a bit easier with ServiceAccounts: when they send a request to a service, the audience contains the endpoint of the specific service they call. Therefore, if they send requests to devices, devices won’t be able to reuse the token to send requests to, let’s say, applications. API keys may be a problem, since they are global for the whole SPEKTRA Edge, but using this method is the user’s choice, and it was insisted on as the less “complicated” option. It should be fine as long as the API key is handled strictly.

In API Server config, see Authenticator settings, field serviceAccountIdTokenAudiencePrefixes. This is a list of prefixes from which the audience can start.
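A prefix-based audience check could look like the sketch below. The function name, the rule that a single matching audience is enough, and the sample prefix values are all assumptions; only the idea of a list of allowed prefixes comes from the configuration field above.

```go
package main

import (
	"fmt"
	"strings"
)

// audienceHasAllowedPrefix reports whether at least one audience value starts
// with one of the configured prefixes. Empty prefixes are ignored, because an
// empty prefix would match every audience.
func audienceHasAllowedPrefix(tokenAudiences, allowedPrefixes []string) bool {
	for _, aud := range tokenAudiences {
		for _, prefix := range allowedPrefixes {
			if prefix != "" && strings.HasPrefix(aud, prefix) {
				return true
			}
		}
	}
	return false
}

func main() {
	// Hypothetical prefix list for a devices API server.
	prefixes := []string{"https://devices.edgelq.com"}

	fmt.Println(audienceHasAllowedPrefix([]string{"https://devices.edgelq.com/v1"}, prefixes))
	// A token minted for a different (hypothetical) endpoint does not pass.
	fmt.Println(audienceHasAllowedPrefix([]string{"https://applications.edgelq.com"}, prefixes))
}
```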

AuthInfoProvider for Authentication

AuthInfoProvider is a common module for both the authenticator and the authorizer. You can see it in the SPEKTRA Edge repo, file iam/auth/auth_info_provider.go. For the authenticator, only one method matters: GetPrincipal.

Inside GetPrincipal of AuthInfoProvider we still don’t get the full principal. The reason is that getting a principal is tricky: if AuthInfoProvider is running on the IAM Server, it may use the local database. If it is part of a different server, it needs to ask the IAM Server for principal data. Since it can’t fully get the principal on its own, it does what it can:

  • First, we check the ID of the key from the authorization token.
  • If the ID is equal to the ServiceAccountKey ID that the Server instance itself uses, it means the service is calling itself. Perhaps it is a controller trying to connect to the Server instance. In this case, we just return “us”. This is a helpful trick when a service is bootstrapping for the first time: the API Server may not be listening on the port yet, or the database may have missing records.
  • Mostly, however, AuthInfoProvider is one giant cache object, and this includes storage of principals. Caching principals locally, with a long-lived cache, significantly lowers pressure on IAM and reduces latencies.
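The steps above can be sketched as a small resolver: short-circuit on the server's own key, then consult a cache, and only then ask a provider. All type and function names here are simplified, hypothetical stand-ins (the real PrincipalProvider interface and Principal type have different shapes), and cache expiry is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Principal is a stand-in for the type in iam/auth/types/principal.go.
type Principal struct{ Name string }

// PrincipalProvider is a simplified stand-in for the interface named in the text.
type PrincipalProvider interface {
	GetPrincipal(keyID string) (*Principal, error)
}

// cachingResolver mimics the lookup order described above.
type cachingResolver struct {
	selfKeyID string
	self      *Principal
	provider  PrincipalProvider

	mu    sync.Mutex
	cache map[string]*Principal
}

func newCachingResolver(selfKeyID string, self *Principal, p PrincipalProvider) *cachingResolver {
	return &cachingResolver{selfKeyID: selfKeyID, self: self, provider: p, cache: map[string]*Principal{}}
}

// Resolve returns the principal that owns the given key ID.
func (r *cachingResolver) Resolve(keyID string) (*Principal, error) {
	// Step 1: the server is calling itself, so no lookup is needed.
	if keyID == r.selfKeyID {
		return r.self, nil
	}
	// Step 2: try the local cache.
	r.mu.Lock()
	p, ok := r.cache[keyID]
	r.mu.Unlock()
	if ok {
		return p, nil
	}
	// Step 3: cache miss, ask the provider (local or remote) and remember the result.
	p, err := r.provider.GetPrincipal(keyID)
	if err != nil {
		return nil, err
	}
	r.mu.Lock()
	r.cache[keyID] = p
	r.mu.Unlock()
	return p, nil
}

// mapProvider is a fake provider that counts how often it is asked.
type mapProvider struct {
	principals map[string]*Principal
	calls      int
}

func (m *mapProvider) GetPrincipal(keyID string) (*Principal, error) {
	m.calls++
	p, ok := m.principals[keyID]
	if !ok {
		return nil, errors.New("unknown key " + keyID)
	}
	return p, nil
}

func main() {
	provider := &mapProvider{principals: map[string]*Principal{"key-1": {Name: "users/alice"}}}
	r := newCachingResolver("self-key", &Principal{Name: "serviceAccounts/self"}, provider)
	for _, id := range []string{"self-key", "key-1", "key-1"} {
		p, err := r.Resolve(id)
		if err != nil {
			fmt.Println(id, "error:", err)
			continue
		}
		fmt.Println(id, "->", p.Name)
	}
	fmt.Println("provider calls:", provider.calls)
}
```

A production cache would also need expiry or invalidation, so that revoked keys do not stay valid indefinitely.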

AuthInfoProvider uses the PrincipalProvider interface to get actual instances. There are two providers:

  • LocalPrincipalProvider in iam/auth/internal/local_principal_provider.go
  • RemotePrincipalProvider in iam/auth/internal/remote_principal_provider.go

Local providers must be used only by IAM Servers; all others must use the remote option. Let’s start with the remote. If you check GetPrincipal of RemotePrincipalProvider, you can see that it just connects to the IAM service and uses the GetPrincipal method, which is defined in the API skeleton file. For the ServiceAccount type, however, it first needs to fetch the project resource to figure out in which regions the ServiceAccountKey is available.

It is worth mentioning that services are not supposed to trust each other. This also means IAM does not necessarily trust services requesting access to user information, even if they have a token. Perhaps possession of the authorization token should be enough for IAM to return user information, but in GetPrincipalRequest we also require information about which service is asking. IAM will validate whether this service is allowed to see the given principal.

Now jump to a different part of the code and see the GetPrincipal implementation in the file iam/server/v1/authorization/authorization_service.go. The file name may be a bit misleading, but this service has actions used for both authentication and authorization. It may be worth moving this action to a different API group and deprecating the current action declaration, but that is a side note for me to correct.

Implementation of GetPrincipal will stay, so you should see what happens under the hood.

In IAM Server, GetPrincipal uses the PrincipalProvider that it gets from AuthInfoProvider! Therefore, AuthInfoProvider on a server other than IAM will first try its cache; in case of a miss, it will ask the remote PrincipalProvider. RemotePrincipalProvider will send GetPrincipalRequest to IAM, which then checks LocalPrincipalProvider, so we land in LocalPrincipalProvider anyway.

Before jumping into LocalPrincipalProvider, see the rest of the GetPrincipal server implementation. Inside, we check the User or ServiceAccount data and iterate over the metadata.services.allowed_services slice. If it contains the service provided in GetPrincipalRequest, this service is allowed to see the given principal, so we can safely return it. This field is automatically updated when a User/ServiceAccount gets access to a new service (or has access revoked). We work on this principle: if a User/ServiceAccount is a participant in a service, they must be able to see each other.
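The visibility check can be pictured with the sketch below. The struct layout mirrors the metadata.services.allowed_services path from the text, but the Go type names, the function name, and the sample service names are hypothetical.

```go
package main

import "fmt"

// The nested structs mirror the metadata.services.allowed_services field path.
type ServicesInfo struct {
	AllowedServices []string
}

type Metadata struct {
	Services ServicesInfo
}

// User is a simplified stand-in for the IAM User resource.
type User struct {
	Name     string
	Metadata Metadata
}

// visibleTo reports whether the requesting service may see this user, that is,
// whether the service is listed in the user's allowed services.
func visibleTo(u User, requestingService string) bool {
	for _, svc := range u.Metadata.Services.AllowedServices {
		if svc == requestingService {
			return true
		}
	}
	return false
}

func main() {
	alice := User{
		Name: "users/alice",
		Metadata: Metadata{Services: ServicesInfo{
			AllowedServices: []string{"services/devices.edgelq.com"},
		}},
	}
	fmt.Println(visibleTo(alice, "services/devices.edgelq.com"))
	fmt.Println(visibleTo(alice, "services/unknown.example.com"))
}
```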

Now, you can jump into the GetPrincipal code for LocalPrincipalProvider. It has separate paths for users and service accounts, but they are generally similar. We get the User or ServiceAccount from the local database if possible (not all regions may have the ServiceAccount). If we need to make any “Save” (users!), it has to be in the primary region, because this is where users are supposed to be saved.

Distributed Authorization Process

Imagine User X asked devices.edgelq.com to assign a ServiceAccount to some Device. The user sends a request to the Devices service, with the authorization header containing the User X access token. Devices will successfully authenticate and authorize the user. However, ServiceAccount belongs to IAM, so devices.edgelq.com needs to ask iam.edgelq.com to provide the ServiceAccount. When devices.edgelq.com sends a request to iam.edgelq.com, the authorization header will not carry the access token of the user. It will carry an access token of the ServiceAccount used by devices.edgelq.com. This is always true: the authorization header must contain the token of the entity sending the current request. However, devices.edgelq.com may store the original access token in the x-goten-original-auth header, which is an array of tokens. In theory, the authorization header could also hold many tokens, but this does not work on all Ingresses.

In EnvRegistry we have Dial* methods with and without the FCtx suffix. Those with the suffix copy all HTTP headers with the x- prefix. They also copy authorization into x-goten-original-auth; if the latter is already present, the token is appended to it. The current authorization is then cleared to make room for the new one.
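The forwarding rules can be sketched with plain maps standing in for request metadata. This is not the EnvRegistry code: the function name forwardHeaders is hypothetical, header keys are assumed to be lowercase already (as gRPC metadata keys are), and the outgoing authorization is simply left unset so that the caller's own credentials can fill it in.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	authorizationKey = "authorization"
	originalAuthKey  = "x-goten-original-auth"
)

// forwardHeaders builds the outgoing headers from the incoming ones:
// every x- prefixed header is copied, and the incoming authorization values
// are appended to x-goten-original-auth. The outgoing map deliberately has no
// authorization entry.
func forwardHeaders(incoming map[string][]string) map[string][]string {
	outgoing := make(map[string][]string)
	for key, values := range incoming {
		if strings.HasPrefix(key, "x-") {
			// Appending to a nil slice copies the values, so the incoming
			// map is never modified.
			outgoing[key] = append(outgoing[key], values...)
		}
	}
	if auth := incoming[authorizationKey]; len(auth) > 0 {
		outgoing[originalAuthKey] = append(outgoing[originalAuthKey], auth...)
	}
	return outgoing
}

func main() {
	incoming := map[string][]string{
		"authorization":         {"Bearer user-token"},
		"x-goten-original-auth": {"Bearer earlier-token"},
		"x-request-id":          {"42"},
		"content-type":          {"application/grpc"},
	}
	for key, values := range forwardHeaders(incoming) {
		fmt.Println(key, values)
	}
}
```

Tokens already present in x-goten-original-auth come first, followed by the token from the current hop, so the order reflects the call chain.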

It is up to the service to decide whether to forward HTTP headers or not. Some work is still needed in EnvRegistry, though: the caller should be able to customize which headers are passed from the current context to the next and how. For current needs, it is sufficient.

Authorization in this context has issues with the audience claim, though: when we forward authorization tokens to an entirely different service, the audience may not be the one that service expects.

By default, we use just Dial without FCtx. We have two known cases where the FCtx variant is used:

  • MultiRegion routing

    This is when requests need to be forwarded to other regions or split across many regions.

  • When Constraint Store sends EstablishReferences to another service

    This is because saved resources may hold references to resources in other services. The problem here is that we assume the Service may not be allowed to establish references (lack of attach checks). The user may have attach permissions, though, so we send two authorization tokens.

Of the two cases above, authorization and audience validation work well for the first one, because we forward within the same service. EstablishReferences is a more difficult topic: we will probably need to ensure that the Service always has attach permissions, without relying on the user. However, we will need to refactor attach permissions so that there is just one per resource type. Along with this, we need to fix conditions so they can apply to attach checks. Right now they simply don’t work.