SPEKTRA Edge IAM Authentication
The authentication module lives in the SPEKTRA Edge repository, in the file
iam/auth/authenticator.go. It also provides an authentication function
(grpc_auth.AuthFunc) that is passed to the gRPC server. In the main.go of
any API Server runtime, you should find code like:
grpcserver.NewGrpcServer(
authenticator.AuthFunc(),
commonCfg.GetGrpcServer(),
log,
)
This function is used by gRPC interceptors, so authentication is done
outside any server middleware. As a result, the context object associated
with the request will contain the AuthToken object, as defined in
iam/auth/types/auth_token.go, before any server processes the call.
Authenticator Tasks
Authenticator uses HTTP headers to find out who is making a call.
The primary header in use is authorization. It should contain the value
Bearer <AccessToken>, and the token part is what we want.
For now, ignore x-goten-original-auth and additional access tokens,
which are described in the Distributed Authorization Process section.
Usually, we expect just a single token in authorization, based on which
authentication happens.
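Extracting the token part from the header value can be sketched as follows. The function name and error messages are illustrative, and treating the Bearer scheme name as case-insensitive is an assumption of this sketch, not something confirmed by the authenticator code.

```go
package main

import (
	"errors"
	"strings"
)

// bearerToken extracts <AccessToken> from a "Bearer <AccessToken>"
// header value. It is an illustrative helper, not the real parser.
func bearerToken(headerValue string) (string, error) {
	scheme, token, found := strings.Cut(strings.TrimSpace(headerValue), " ")
	// Assumption: the scheme name is matched case-insensitively.
	if !found || !strings.EqualFold(scheme, "Bearer") {
		return "", errors.New("authorization header is not a bearer token")
	}
	token = strings.TrimSpace(token)
	if token == "" {
		return "", errors.New("bearer token is empty")
	}
	return token, nil
}
```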
Authenticator delegates access token identification to a module called
AuthInfoProvider. It returns the Principal object, as defined in
iam/auth/types/principal.go. Under the hood, it can be a ServiceAccount,
User, or Anonymous.
We will dive into it in the AuthInfoProvider for Authentication section below, but for now let’s touch on another important topic regarding authentication.
When AuthInfoProvider returns principal data, Authenticator also needs
to validate that all claims are as expected; see addKeySpecificClaims.
Of these claims, the most important is the audience. We need to
protect against cases where someone obtains an access token and tries
to pose as the given user in some other service. For this reason, look at
the expected audience for User:
claims.Audience = []string{a.cfg.AccessTokenAudience}
This AccessTokenAudience is equal to the audience specific to SPEKTRA Edge,
for example https://apis.edgelq.com, as configured for some of our services.
If the expected audience does not match what is actually in the claims,
it may be because someone obtained this token and is using it in a different
service. Consider our own position: we receive access tokens from users, so
if we knew where else a user has an account, we could try to log in there
with their token. Checking the audience prevents this. However, note that
the audience is one global value for the whole SPEKTRA Edge platform. One
service on SPEKTRA Edge can therefore connect to another service on
SPEKTRA Edge and use this access token, successfully posing as the user. As
long as services on SPEKTRA Edge can trust each other, this is not an issue.
For untrusted third-party services it may be a problem: if a user sends a
request to them, they can potentially take the token and use it in other
SPEKTRA Edge services. In the future, we may provide additional claims, but
that means the user will need to ask the JWKS provider for an access token
for a specific SPEKTRA Edge service, perhaps for each of them, or for groups
of them.
In the API Server config, see the Authenticator settings, field accessTokenAudience.
The situation is a bit easier with ServiceAccounts: when they send a request to a service, the audience contains the endpoint of the specific service they call. Therefore, if they send requests to devices, devices won’t be able to reuse the token to send requests to, let’s say, applications. API keys may be a problem, because they are global for the whole SPEKTRA Edge, but using this method is the user’s choice, and it was requested as the less “complicated” option. It should be fine as long as the API key is handled strictly.
In the API Server config, see the Authenticator settings, field
serviceAccountIdTokenAudiencePrefixes. This is a list of prefixes
with which the audience can start.
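The two audience rules can be sketched side by side: an exact match against the configured accessTokenAudience for users, and a prefix match against serviceAccountIdTokenAudiencePrefixes for service accounts. The function names are hypothetical, and both the "any claimed audience may match" rule and the rejection of empty prefixes are assumptions of this sketch rather than confirmed behavior of the authenticator.

```go
package main

import "strings"

// userAudienceOK reports whether the configured access token audience
// appears among the audiences claimed by the token.
func userAudienceOK(claimed []string, accessTokenAudience string) bool {
	for _, aud := range claimed {
		if aud == accessTokenAudience {
			return true
		}
	}
	return false
}

// serviceAccountAudienceOK reports whether any claimed audience starts
// with one of the configured prefixes. Empty prefixes are skipped,
// because they would match every audience.
func serviceAccountAudienceOK(claimed, prefixes []string) bool {
	for _, aud := range claimed {
		for _, prefix := range prefixes {
			if prefix != "" && strings.HasPrefix(aud, prefix) {
				return true
			}
		}
	}
	return false
}
```

With a prefix list that contains only the devices endpoint, a token issued for applications is rejected by devices, which is the property described above.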
AuthInfoProvider for Authentication
AuthInfoProvider is a common module for both the authenticator and the
authorizer. You can see it in the SPEKTRA Edge repo, file
iam/auth/auth_info_provider.go.
For the authenticator, only one method counts: GetPrincipal.
Inside GetPrincipal of AuthInfoProvider we still don’t get the full
principal. The reason is that getting a principal is tricky: if
AuthInfoProvider is running on the IAM Server, it may use the local
database. If it is part of a different server, it needs to ask
the IAM Server for principal data. Since it can’t fully get the principal
by itself, it does what it can:
- First, we check the ID of the key from the authorization token.
- If the ID is equal to the ServiceAccountKey ID that the Server instance itself uses, it means that the service is calling itself. Perhaps it is a controller trying to connect to the Server instance. If this is the case, we just return “us”. This is a helpful trick when a service is bootstrapping for the first time: the API Server may not be listening on the port yet, or the database may have missing records.
- Mostly, however, AuthInfoProvider is one giant cache object, and this includes storage of principals. Caching principals locally, with a long-term cache, significantly lowers pressure on IAM and reduces latency.
AuthInfoProvider uses the PrincipalProvider interface to get actual
instances. There are two providers:
- LocalPrincipalProvider in iam/auth/internal/local_principal_provider.go
- RemotePrincipalProvider in iam/auth/internal/remote_principal_provider.go
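The lookup order described above (own key first, then cache, then the provider) can be sketched like this. All types here are simplified, hypothetical stand-ins: the real Principal and PrincipalProvider have different signatures, and the real cache has expiry handling that this sketch omits.

```go
package main

import (
	"errors"
	"sync"
)

// Principal is a simplified stand-in for iam/auth/types/principal.go.
type Principal struct {
	Name string
}

// PrincipalProvider is a simplified stand-in for the real interface.
type PrincipalProvider interface {
	GetPrincipal(keyID string) (*Principal, error)
}

// authInfoProvider is a hypothetical, stripped-down AuthInfoProvider.
type authInfoProvider struct {
	ownKeyID string     // key ID used by this server instance itself
	self     *Principal // returned when the service calls itself
	provider PrincipalProvider

	mu    sync.Mutex
	cache map[string]*Principal // no expiry in this sketch
}

func newAuthInfoProvider(ownKeyID string, self *Principal, p PrincipalProvider) *authInfoProvider {
	return &authInfoProvider{
		ownKeyID: ownKeyID,
		self:     self,
		provider: p,
		cache:    make(map[string]*Principal),
	}
}

// GetPrincipal resolves a key ID: own key, then cache, then provider.
func (a *authInfoProvider) GetPrincipal(keyID string) (*Principal, error) {
	if keyID == "" {
		return nil, errors.New("missing key ID")
	}
	if keyID == a.ownKeyID {
		// Bootstrapping shortcut: no database or remote call needed.
		return a.self, nil
	}
	a.mu.Lock()
	cached, ok := a.cache[keyID]
	a.mu.Unlock()
	if ok {
		return cached, nil
	}
	p, err := a.provider.GetPrincipal(keyID)
	if err != nil {
		return nil, err
	}
	a.mu.Lock()
	a.cache[keyID] = p
	a.mu.Unlock()
	return p, nil
}

// countingProvider is a toy provider that records how often it is used.
type countingProvider struct {
	calls int
}

func (c *countingProvider) GetPrincipal(keyID string) (*Principal, error) {
	c.calls++
	return &Principal{Name: "principal-for-" + keyID}, nil
}
```

In this sketch, whether the provider field holds a local or a remote implementation is invisible to the caller, which mirrors why the same module can run both inside and outside IAM.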
The local provider must be used only by IAM Servers; all others must use
the remote option. Let’s start with the remote one. If you check
GetPrincipal of RemotePrincipalProvider, you can see that it
just connects to the IAM service and uses the GetPrincipal method, which
is defined in the API skeleton file. For the ServiceAccount type, however,
it first needs to fetch a project resource to figure out in which regions
the ServiceAccountKey is available.
It is worth mentioning that services are not supposed to trust each other. This also means IAM does not necessarily trust services requesting access to user information, even if they have a token. Perhaps access to the authorization token should be enough for IAM to return user information, but in GetPrincipalRequest we also require information about which service is asking. IAM will validate whether this service is allowed to see the given principal.
You should now jump into a different part of the code: see the GetPrincipal
implementation in the file
iam/server/v1/authorization/authorization_service.go.
The file name may be a bit misleading, but this service has actions used
for both authentication and authorization. It may be worth moving
GetPrincipal to a different API group and deprecating the current action
declaration, but that is a side note for me to correct.
The implementation of GetPrincipal will stay, so you should see what happens under the hood.
In the IAM Server, GetPrincipal uses the PrincipalProvider that it gets from
AuthInfoProvider! Therefore, AuthInfoProvider on a server other than IAM
will try to use its cache; in case of a miss, it will ask the
RemotePrincipalProvider. RemotePrincipalProvider will send a
GetPrincipalRequest to IAM, which then checks LocalPrincipalProvider, so we
land in LocalPrincipalProvider anyway.
Before jumping into LocalPrincipalProvider, see the rest of the GetPrincipal
server implementation. Inside, we check the User or ServiceAccount data
and iterate over the metadata.services.allowed_services slice. If it
contains the service provided in GetPrincipalRequest, it means that this
service is allowed to see the given principal, so we can return it
safely. This field is automatically updated when a User/ServiceAccount gets
access to a new service (or has access revoked). We work on this principle:
if a User/ServiceAccount is a participant in a service, they must be able
to see each other.
Now you can jump into the GetPrincipal code for LocalPrincipalProvider.
It has separate paths for users and service accounts, but they are generally
similar. We get the User or ServiceAccount from the local database
if possible (not all regions may have the ServiceAccount). If we need to
make any “Save” (users!), it has to happen in the primary region, because
this is where users are supposed to be saved.
Distributed Authorization Process
Imagine User X asks devices.edgelq.com to assign a ServiceAccount to
some Device. User X sends a request to the Devices service, with an
authorization header containing the User X access token. Devices will
successfully authenticate and authorize the user. However, ServiceAccount
belongs to IAM, therefore devices.edgelq.com will need to ask
iam.edgelq.com to provide the ServiceAccount. When devices.edgelq.com
sends a request to iam.edgelq.com, the authorization header will not
have the access token of the user. It will have the access token of the
ServiceAccount used by devices.edgelq.com. This is always true: the
authorization header must contain the token of the entity sending the
current request. However, devices.edgelq.com may store the original access
token in the x-goten-original-auth header. It is an array of tokens. In
theory, the authorization header may also carry many tokens, but this does
not work on all Ingresses.
In EnvRegistry we have Dial* methods with and without the FCtx suffix.
Those with the suffix copy all HTTP headers with the x- prefix.
They also copy authorization into x-goten-original-auth. If the latter
is already present, the value is appended. The current authorization is
cleared to make room for the new one.
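The header handling described above can be sketched with plain maps. This is not the EnvRegistry code: the forwardHeaders name is hypothetical, header keys are assumed to be lowercase already (as gRPC metadata keys are), and appending the current authorization value after the existing original tokens is an assumption about ordering.

```go
package main

import "strings"

const (
	authorizationHeader = "authorization"
	originalAuthHeader  = "x-goten-original-auth"
)

// forwardHeaders builds the headers for an outgoing call from the
// headers of the incoming call. Keys are assumed to be lowercase.
func forwardHeaders(incoming map[string][]string) map[string][]string {
	outgoing := make(map[string][]string)

	// Copy every x- prefixed header, including any original tokens
	// that were already forwarded by a previous hop.
	for key, values := range incoming {
		if strings.HasPrefix(key, "x-") {
			outgoing[key] = append([]string(nil), values...)
		}
	}

	// Move the caller's token into the original-auth list.
	if auth := incoming[authorizationHeader]; len(auth) > 0 {
		outgoing[originalAuthHeader] = append(outgoing[originalAuthHeader], auth...)
	}

	// authorization itself is intentionally left unset, so the sending
	// service can attach its own token.
	return outgoing
}
```

After this step the sending service attaches its own ServiceAccount token as authorization, so the receiver sees who is calling now and, separately, on whose behalf.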
It is up to the service to decide whether to forward HTTP headers or not. Some work is still needed in EnvRegistry: the caller should be able to customize which headers are passed from the current context to the next, and how. For current needs, though, it is sufficient.
Authorization in this context has issues with the audience claim, though: when we forward authorization tokens to an entirely different service, the audience may not be the one we expect.
By default, we use just Dial without FCtx. We have two known cases where the FCtx variant is used:
- MultiRegion routing
  This is when requests need to be forwarded to other regions or split across many regions.
- When Constraint Store sends EstablishReferences to another service
  This is because saved resources have references to other services. The problem here is that we assume the Service may not be allowed to establish references (lack of attach checks). The user may have attach permissions though, so we send two authorization tokens.
Of the two cases above, authorization and audience validation work well for the first one, because we forward within the same service. EstablishReferences is a more difficult topic: we will probably need to ensure that the Service always has attach permissions, without relying on the user. However, we will need to refactor attach permissions so that there is just one per resource type. Along with this, we need to fix conditions so that they can apply to attach checks. Right now they simply don’t work.