SPEKTRA Edge IAM Service Design
Understanding the SPEKTRA Edge IAM service design.
The iam.edgelq.com service is one of the central parts of the SPEKTRA Edge
platform. It is in charge of authentication and authorization, and it
enables multi-tenant and multi-service environments. By default, Goten does
not come with authentication or authorization features; IAM fills this
gap and allows services to work with each other without mutual trust.
IAM concepts, Actors, Role management, and Tenant management should
already be known from the user guide. Customizations and basic usage of
the Authenticator, Authorizer, and Authorization middleware should be known
from the developer guide. This document goes into more detail about what
happens in the unique components provided by the IAM service. It is
assumed that the reader already understands the general, standard structure
of the IAM codebase, located in the iam directory of the SPEKTRA Edge
repository.
We will focus here less on the IAM service itself and more on what
IAM provides. Authenticator and Authorizer are modules provided by IAM,
but they are linked in during each server compilation. Therefore, each API
server of any backend service has them in its runtime.
1 - SPEKTRA Edge IAM Principals
Understanding the SPEKTRA Edge IAM principals.
In IAM, we identify two types of principals that can be uniquely
identified and authenticated:
- Users
- ServiceAccounts
They have very different authentication methods. First, IAM does not
manage users much: a third party is responsible for the actual management
of Users. When a user sends a request to a service, the request carries
an authorization token with some claims. API servers must use the jwks
endpoint, which provides a JSON Web Key Set, to verify the signature
of the access token. Verification ensures we can trust the claims stored
in the token. The claims contain further details, such as the unique User
identifier, which we can use to fetch the User resource from IAM.
As of now, we use the third-party Auth0 service for users. It creates and
signs the access tokens that SPEKTRA Edge receives, and it gives us the jwks
endpoint from which we get public keys for verification. Token signing,
rotation, and user list management are all handled by Auth0, although IAM
has several methods that connect to Auth0 for management purposes.
When a user joins the system, it is not IAM that is notified first.
The request goes to Auth0, where the data is created. The record in IAM is
created later, when the user starts interacting with SPEKTRA Edge. User
resources may get created in IAM during the first authentication. They may
also be saved or updated when RefreshUserFromIdToken is called.
ServiceAccounts, on the other hand, are typically managed by SPEKTRA Edge,
or by any other entity that creates ServiceAccounts in the IAM service. This
is how it is done:
- A ServiceAccount is created by some client. Without a ServiceAccountKey,
  though, it is not usable.
- A ServiceAccountKey is created by a client. Regarding the public-private
  key pair, there are two options:
  - The client generates both the private and the public key. It sends
    CreateServiceAccountKey with the public key only, so IAM never
    sees the private key. The client is fully responsible for
    securing it. This is the recommended procedure.
  - It is possible to ask IAM to create the public-private key pair
    during ServiceAccountKey creation. In this case, IAM saves only
    the public key in the database and returns the private key in the
    response. The client is still fully responsible for securing it.
    This method only saves the client from generating the keys.
In summary, ServiceAccounts are the responsibility of the clients, who
need to secure private keys. Those private keys are then later used to
create access tokens. During authentication, the backend service will
grab the public key from IAM.
2 - SPEKTRA Edge IAM Authentication
Understanding the SPEKTRA Edge IAM authentication.
The module for authentication is in the SPEKTRA Edge repository, file
iam/auth/authenticator.go. It also provides an authentication
function (grpc_auth.AuthFunc) that is passed to the grpc server.
If you look at the main.go of any API Server runtime, you should find code
like:
grpcserver.NewGrpcServer(
    authenticator.AuthFunc(),
    commonCfg.GetGrpcServer(),
    log,
)
This function is used by grpc interceptors, so authentication is done
outside any server middleware. As a result, the context object associated
with the request will contain the AuthToken object, as defined in
iam/auth/types/auth_token.go, before any server processes the call.
Authenticator Tasks
Authenticator uses HTTP headers to find out who is making a call.
The primary header in use is authorization. It should contain
the Bearer <AccessToken> value, and we want the token part.
For now, ignore x-goten-original-auth and additional access tokens,
which are described in the Distributed Authorization Process section.
Usually, we expect just a single token in authorization, based on which
authentication happens.
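Extracting the token part from the authorization header can be sketched as follows. This is an illustrative helper only; bearerToken is a hypothetical name, and the real Authenticator may parse the header differently (for example, when several tokens are present).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// bearerToken extracts the token part from an "authorization" header value
// of the form "Bearer <AccessToken>". The scheme is matched case-insensitively.
func bearerToken(header string) (string, error) {
	parts := strings.Fields(header)
	if len(parts) != 2 || !strings.EqualFold(parts[0], "Bearer") {
		return "", errors.New("authorization header must be: Bearer <AccessToken>")
	}
	return parts[1], nil
}

func main() {
	tok, err := bearerToken("Bearer abc.def.ghi")
	fmt.Println(tok, err)
}
```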
Authenticator delegates access token identification to a module called
AuthInfoProvider. It returns the Principal object, as defined in
iam/auth/types/principal.go. Under the hood, it can be a ServiceAccount,
a User, or Anonymous.
We will dive into it in the AuthInfoProvider for Authentication
section below, but for now let’s touch on another important topic regarding
authentication.
When AuthInfoProvider returns principal data, Authenticator also needs
to validate that all claims are as expected; see addKeySpecificClaims.
Of these claims, the most important is the audience. We need to
protect against cases where someone gets an access token and tries
to pose as the given user in some other service. For this reason, look at
the expected audience for a User:
claims.Audience = []string{a.cfg.AccessTokenAudience}
This AccessTokenAudience is equal to the audience specific to SPEKTRA Edge,
for example https://apis.edgelq.com, as configured for some of our services.
If this expected audience does not match what is actually in the claims,
it may be because someone obtained this token and is using it in a different
service. Consider our own position: we receive access tokens from users, so
if we knew where else a user has an account, we could try to log in there
with the token. To prevent this, we check the audience. However, note that
the audience is one global value for the whole SPEKTRA Edge platform. So, one
service on SPEKTRA Edge can connect to another service on SPEKTRA Edge and
use this access token, successfully posing as the user. As long as services
on SPEKTRA Edge can trust each other, it is not an issue. For untrusted
third-party services, it may be a problem: if a user sends a request to them,
they can potentially take the token and use it in other SPEKTRA Edge
services. In the future, we may provide additional claims, but it means that
the user will need to ask the token issuer (the jwks provider) for an access
token for a specific SPEKTRA Edge service, and perhaps for each of them, or
for groups of them.
In API Server config, see Authenticator settings, field accessTokenAudience.
The situation is a bit easier with ServiceAccounts: when they send a request
to a service, the audience contains the endpoint of the specific service
they call. Therefore, if they send requests to devices, devices won’t
be able to reuse the token to send requests to, let’s say, applications.
API keys may be a problem, because they are global for the whole SPEKTRA
Edge, but it is the user’s choice to use this method, which was requested
as a less “complicated” one. It should be fine as long as the API key is
used with care.
In API Server config, see Authenticator settings, field
serviceAccountIdTokenAudiencePrefixes. This is a list of prefixes
with which the audience can start.
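The two audience rules described above (exact match for Users, prefix match for ServiceAccounts) can be sketched as follows. The function names and the example prefix are hypothetical; only the https://apis.edgelq.com value comes from the text, and the real validation lives in the Authenticator.

```go
package main

import (
	"fmt"
	"strings"
)

// userAudienceOK reports whether any audience claim equals the single
// expected audience (the accessTokenAudience setting).
func userAudienceOK(claimed []string, expected string) bool {
	for _, aud := range claimed {
		if aud == expected {
			return true
		}
	}
	return false
}

// serviceAccountAudienceOK reports whether any audience claim starts with one
// of the allowed prefixes (the serviceAccountIdTokenAudiencePrefixes setting).
func serviceAccountAudienceOK(claimed, prefixes []string) bool {
	for _, aud := range claimed {
		for _, p := range prefixes {
			if strings.HasPrefix(aud, p) {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(userAudienceOK([]string{"https://apis.edgelq.com"}, "https://apis.edgelq.com"))
	// The prefix and endpoint below are made-up illustration values.
	fmt.Println(serviceAccountAudienceOK(
		[]string{"https://devices.example.test/some.Service"},
		[]string{"https://devices.example.test"},
	))
}
```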
AuthInfoProvider for Authentication
AuthInfoProvider is a common module for both authenticator and authorizer.
You can see it in the SPEKTRA Edge repo, file iam/auth/auth_info_provider.go.
For the authenticator, only one method counts: GetPrincipal.
Inside GetPrincipal of AuthInfoProvider we still don’t get the full
principal. The reason is that getting a principal is tricky: if
AuthInfoProvider is running on the IAM Server, then it may use the local
database. If it is part of a different server, then it will need to ask
the IAM Server for the principal data. Since it can’t fully get the
principal by itself, it does what it can:
- First, we check the ID of the key from the authorization token.
- If the ID is equal to the ServiceAccountKey ID that the Server instance
  itself uses, it means that the service is calling itself. Perhaps it is a
  controller trying to connect to the Server instance. If this is the case,
  we just return “us”. This is a helpful trick when a service is bootstrapping
  for the first time: the API Server may not be listening on the port yet, or
  the database may have missing records.
- Mostly, however, AuthInfoProvider is one giant cache object, and this
  includes storage of principals. Caching principals locally, with a
  long-term cache, significantly lowers pressure on IAM and reduces latencies.
AuthInfoProvider uses the PrincipalProvider interface to get actual
instances. There are two providers:
- LocalPrincipalProvider in iam/auth/internal/local_principal_provider.go
- RemotePrincipalProvider in iam/auth/internal/remote_principal_provider.go
The local provider must be used only by IAM Servers; all others must use
the remote option. Let’s start with the remote. If you check
GetPrincipal of RemotePrincipalProvider, you can see that it
just connects to the IAM service and uses the GetPrincipal method, which
is defined in the API skeleton file. For the ServiceAccount type, however,
it first needs to fetch a project resource, to figure out in which regions
the ServiceAccountKey is available.
It is worth mentioning that services are not supposed to trust each other.
This also means IAM does not necessarily trust services requesting access
to user information, even if they have a token. Perhaps access to the
authorization token should be enough for IAM to return user information,
but in GetPrincipalRequest we also require information about which service
is asking. IAM will validate whether this service is allowed to see the given
principal.
Now jump to a different part of the code and see the GetPrincipal
implementation in file iam/server/v1/authorization/authorization_service.go.
The file name may be a bit misleading, but this service has actions used
for both authentication and authorization. It may be worth moving them to
a different API group and deprecating the current action
declaration, but that is a side note for a future correction.
The implementation of GetPrincipal will stay, so you should see what happens
under the hood.
In the IAM Server, GetPrincipal uses the PrincipalProvider that it gets from
AuthInfoProvider! Therefore, AuthInfoProvider on a server other than
IAM will try to use the cache and, in case of a miss, it will ask the remote
PrincipalProvider. RemotePrincipalProvider will send GetPrincipalRequest
to IAM, which then checks LocalPrincipalProvider, so we will land in
LocalPrincipalProvider anyway.
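The lookup order described in this section (own key first, then cache, then the PrincipalProvider) can be sketched as below. All types here are simplified stand-ins invented for illustration; the real Principal and PrincipalProvider definitions live under iam/auth and have different signatures. The sketch is not concurrency-safe and has no cache expiry.

```go
package main

import (
	"errors"
	"fmt"
)

// Principal is a simplified stand-in for the real principal type.
type Principal struct{ Name string }

// PrincipalProvider is a simplified stand-in for the local/remote providers.
type PrincipalProvider interface {
	GetPrincipal(keyID string) (*Principal, error)
}

// mapProvider plays the role of a remote or local provider and counts calls.
type mapProvider struct {
	data  map[string]*Principal
	calls int
}

func (m *mapProvider) GetPrincipal(keyID string) (*Principal, error) {
	m.calls++
	if p, ok := m.data[keyID]; ok {
		return p, nil
	}
	return nil, errors.New("principal not found")
}

// cachingLookup mimics the lookup order: self, then cache, then provider.
type cachingLookup struct {
	selfKeyID string
	self      *Principal
	cache     map[string]*Principal
	provider  PrincipalProvider
}

func (c *cachingLookup) GetPrincipal(keyID string) (*Principal, error) {
	if keyID == c.selfKeyID {
		return c.self, nil // the server is calling itself, no I/O needed
	}
	if p, ok := c.cache[keyID]; ok {
		return p, nil // cache hit
	}
	p, err := c.provider.GetPrincipal(keyID) // cache miss: ask the provider
	if err != nil {
		return nil, err
	}
	c.cache[keyID] = p
	return p, nil
}

func main() {
	provider := &mapProvider{data: map[string]*Principal{"key-1": {Name: "user-x"}}}
	lookup := &cachingLookup{
		selfKeyID: "self-key",
		self:      &Principal{Name: "this-service"},
		cache:     map[string]*Principal{},
		provider:  provider,
	}
	for i := 0; i < 2; i++ {
		p, err := lookup.GetPrincipal("key-1")
		fmt.Println(p.Name, err, "provider calls:", provider.calls)
	}
}
```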
Before jumping into LocalPrincipalProvider, see the rest of the GetPrincipal
server implementation. Inside, we check the User or ServiceAccount data
and iterate over the metadata.services.allowed_services slice. If it
contains the service provided in GetPrincipalRequest, it means that this
service is allowed to see the given principal, so we can return it
safely. This field is automatically updated when a User/ServiceAccount gets
access to a new service (or has access revoked). We work on this principle:
if a User/ServiceAccount is a participant in a service, the two must be able
to see each other.
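The visibility rule can be sketched as a simple membership test. The function name and the plain-string representation of service names are assumptions made for illustration; the real field holds resource references and the real check is part of the GetPrincipal server implementation.

```go
package main

import "fmt"

// canSeePrincipal reports whether the requesting service appears in the
// principal's allowed services list (metadata.services.allowed_services).
func canSeePrincipal(allowedServices []string, requestingService string) bool {
	for _, s := range allowedServices {
		if s == requestingService {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"iam.edgelq.com", "devices.edgelq.com"}
	fmt.Println(canSeePrincipal(allowed, "devices.edgelq.com"))
	fmt.Println(canSeePrincipal(allowed, "custom.example.test"))
}
```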
Now, you can jump into the GetPrincipal code for LocalPrincipalProvider.
It has separate paths for users and service accounts, but they are generally
similar. We get the User or ServiceAccount from the local database
if possible (not all regions may have the ServiceAccount). If we need to make
any “Save” (users!), it has to be in the primary region, because this is
where users are supposed to be saved.
Distributed Authorization Process
Imagine User X asked devices.edgelq.com to assign a ServiceAccount to
some Device. The user sends a request to the Devices service, with an
authorization header containing the User X access token. Devices will
successfully authenticate and authorize the user. However, ServiceAccount
belongs to IAM, therefore devices.edgelq.com will need to ask
iam.edgelq.com to provide the ServiceAccount. When devices.edgelq.com
sends a request to iam.edgelq.com, the authorization header will not
have the access token of the user. It will have the access token of the
ServiceAccount that is used by devices.edgelq.com. This is
always true: the authorization header must contain the token of the entity
sending the current request. However, devices.edgelq.com may store the
original access token in the x-goten-original-auth header. It is an array
of tokens. In theory, the authorization header may also hold many tokens,
but this does not work on all Ingresses.
In EnvRegistry we have Dial* methods with and without the FCtx suffix.
Those with the suffix copy all HTTP headers with the x- prefix.
They also copy authorization into x-goten-original-auth. If the latter
is already present, the value is appended. The current authorization is
cleared to make room for the new one.
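The header handling described above can be sketched on a plain map. This is only an illustration of the described behavior, not the EnvRegistry code: forwardHeaders is a hypothetical name, header keys are assumed to be lower-case already, and whether stored values keep the Bearer prefix is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	authHeader     = "authorization"
	origAuthHeader = "x-goten-original-auth"
)

// forwardHeaders builds outgoing headers from incoming ones:
//   - headers with the "x-" prefix are copied,
//   - incoming authorization values are appended to x-goten-original-auth,
//   - authorization itself is left out so the caller can set its own token.
func forwardHeaders(in map[string][]string) map[string][]string {
	out := make(map[string][]string)
	for k, v := range in {
		if strings.HasPrefix(k, "x-") {
			out[k] = append(out[k], v...)
		}
	}
	if auth := in[authHeader]; len(auth) > 0 {
		out[origAuthHeader] = append(out[origAuthHeader], auth...)
	}
	return out
}

func main() {
	in := map[string][]string{
		"authorization":         {"Bearer user-token"},
		"x-goten-original-auth": {"Bearer older-token"},
		"x-request-id":          {"42"},
		"content-type":          {"application/grpc"},
	}
	fmt.Println(forwardHeaders(in))
}
```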
It is up to the service to decide whether to forward HTTP headers or
not. Some work is still needed in EnvRegistry, though: the caller should be
able to customize which headers are passed from the current context
to the next and how, but for current needs it is sufficient.
Authorization in this context has issues with the audience claim, though:
when we forward authorization tokens to an entirely different service, the
audience may not be the one that service expects.
By default, we use just Dial without FCtx. We have two known cases where
the FCtx variant is used:
- MultiRegion routing.
  This is when requests need to be forwarded to other regions or split
  across many regions.
- When Constraint Store sends EstablishReferences to another service.
  This is because saved resources hold references to resources in other
  services. The problem here is that we assume the Service may not be
  allowed to establish references (lack of attach permissions). The
  user may have attach permissions though, so we send two authorization
  tokens.
Of the two cases above, authorization and audience validation work well
for the first one, because we forward within the same service.
EstablishReferences is a more difficult topic: we will probably need to
ensure that the Service always has attach permissions, without relying on
the user. However, we will need to refactor attach permissions so there is
just one per resource type. With this, we need to fix conditions so they
can apply to attach checks. Right now they simply don’t work.
3 - SPEKTRA Edge IAM Authorization
Understanding the SPEKTRA Edge IAM authorization.
Authorization happens in its dedicated server middleware, see any generated
one, like
https://github.com/cloudwan/edgelq/blob/main/devices/server/v1/device/device_service.pb.middleware.authorization.go.
As Authorization middleware is assumed to be after multi-region routing,
we can assume that IAM Service from local region holds all resources required
to execute authorization locally, specifically: RoleBindings, Roles,
Conditions.
Note that IAM itself does not execute any authorization; the middleware is
generated for each service. We have an Authorizer module that is
compiled with all API servers for all services. What IAM provides is
a list of Roles, RoleBindings, and Conditions. Other services are
allowed to get them, but evaluation happens on the side of the server
handling the request.
The authorizer rarely needs to ask IAM for any data; whenever possible, it
does no I/O. It relies on a RAM cache to store IAM resources internally.
Therefore, checks are typically evaluated fast. Resource field conditions
are more problematic: if we have them, we need to get the current
resources from the database. For attach permissions, we may need to fetch
them from other services.
Authorization middleware is generated per each method, but the pattern
is always the same:
- We create a BulkPermissionCheck object, where we collect all permissions
  we want to check for this action. It is defined in the
  iam/auth/types/bulk_permission_check.go file.
- The Authorizer module, defined in the iam/auth/authorizer.go file, checks
  whether the passed BulkPermissionCheck is all good and the authenticated
  user is authorized for the requested permissions. Some checks may be
  optional, like read checks for specific fields.
When we collect permissions for BulkPermissionCheck, we add:
- The main permission for a method. The resource name (or parent) is taken
  from the request object, as indicated by requestPaths in the specification,
  or customized via the proto file.
- If we have a writing request (like Create or Update) that
  sets references to other resources, we need to add attach permission
  checks. Resource names are taken from the referenced objects, not
  from the referencing resource the user tries to write to.
- Optional read/set permissions if some resource fields are restricted.
For authorization object strings we pass either collection names or
specific resources.
Every permission must be accompanied by some resource or collection name
(parent). Refer to the IAM user specification. In this document, we map
specifications to code and explain details.
Within the Authorizer module, defined in iam/auth/authorizer.go,
we split all checks by the main IAM scopes it recognizes: Service,
Organization, Project, or System. Next, we delegate permission checks
to AuthInfoProvider. It generates a list of PermissionGrantResult
instances relevant for all PermissionCheck instances. The relationship
between these two types is many-to-many. A single Grant (assigned via
RoleBinding) can hold multiple permissions, and a user may have many
RoleBindings, each with different Grants: more than one Grant may give
access to the same permission.
If AuthInfoProvider notices that some PermissionCheck has an unconditional
PermissionGrantResult, it skips the rest. However, if there are conditions
attached, it is possible that some will fail while others succeed.
This is why we need multiple PermissionGrantResult instances per single
PermissionCheck: if at least one is successful, the PermissionCheck passes.
It works like an OR operator. Within a single PermissionGrantResult, however,
all conditions must evaluate positively.
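The evaluation rule (OR across grant results, AND across conditions within one result) can be sketched as below. GrantResult and Condition are simplified stand-ins invented for this illustration, not the real PermissionGrantResult type.

```go
package main

import "fmt"

// Condition is a stand-in for an already-prepared condition evaluation.
type Condition func() bool

// GrantResult is a stand-in for PermissionGrantResult: a grant plus its conditions.
type GrantResult struct {
	Conditions []Condition
}

// grantHolds is true when every condition holds (AND).
// A result without conditions is unconditional and always holds.
func grantHolds(r GrantResult) bool {
	for _, c := range r.Conditions {
		if !c() {
			return false
		}
	}
	return true
}

// checkPasses is true when at least one grant result holds (OR).
func checkPasses(results []GrantResult) bool {
	for _, r := range results {
		if grantHolds(r) {
			return true
		}
	}
	return false
}

func main() {
	yes := func() bool { return true }
	no := func() bool { return false }
	results := []GrantResult{
		{Conditions: []Condition{yes, no}}, // fails: one condition is false
		{Conditions: []Condition{yes}},     // holds
	}
	fmt.Println(checkPasses(results))
}
```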
Therefore, once AuthInfoProvider matches PermissionGrantResult instances
with PermissionCheck ones, we must evaluate conditions (if any). One
popular condition type we use is ResourceFieldCondition. To evaluate
this kind, we fetch resources from the local database, other services,
and other regions. To make this check as efficient as possible,
the authorizer iterates through all possible conditions and collects
all resources it needs to fetch. It fetches them in bulk, connecting to
other services if necessary (attach permission cases). For this reason,
the PermissionCheck object has a reference field: it will contain the
resolved resources, so all conditions have easy access to them in case
they need them. If the service receives a PermissionDenied error when
checking other services, then PermissionDenied is forwarded to the user
with information that the service itself cannot see the resources. It may
indicate an issue with the metadata.services.allowed_services field, such
as a missing entry.
On their own, conditions are simple, they execute fast, without any I/O
work. We just check requests/resolved resources and verify whether
specified conditions apply, according to IAM Role Grant conditions.
AuthInfoProvider for Authorizer
AuthInfoProvider gets only a set of checks grouped by IAM scope
(a project, an organization, a service, or the system if none of
the former). As per the IAM specification, the service scope inherits
all RoleBindings from the project that owns the service. If we need
to validate permissions in the project scope, we must also accept
RoleBindings from the parent organization (if set) and the full ancestry
path. RoleBindings in the system scope are valid in all scopes. Moreover,
even a single principal may have multiple member IDs (the native one with
email, then domain, then allAuthenticatedUsers, allUsers). This creates
lots of potential RoleBindings to check. Furthermore, we should be
aware that Authorizer is part of all API servers! As SPEKTRA Edge provides
a framework for building third-party services, they can’t trust each
other. Therefore, the AuthInfoProvider of any service can only
ask for RoleBindings that the service is allowed to see (according to
metadata.services.allowed_services).
The IAM Controller copies organization-level RoleBindings to child
sub-organizations and projects, but we don’t copy (at least not yet)
RoleBindings from a service project to a service. We also don’t copy
system-level RoleBindings to all existing projects and organizations.
It should typically stay that way, because system-level role bindings
are rather internal and should not leak to organization/project admins.
The module for copying RoleBindings is in file
iam/controller/v1/iam_scope/org_rbs_copier.go. It also handles changes
in the parent organization field.
During authorization, AuthInfoProvider must list and fetch all
RoleBindings for each memberId/IAM scope combination. It must
also fetch only role bindings relevant to the current service.
We first try the local cache and, in case of a miss,
we ask IAM. This is why in CheckPermissions we grab all possible
RoleBindings. We filter out RoleBindings by subScope or role ID
later on. We try to strip all unnecessary fields, to ensure
AuthInfoProvider can hold (RAM-based cache!) as much data as possible.
Additionally, we try to use integer identifiers for roles and
permission names.
To hold RoleBindings per member ID, we may need about two KiB
of data on average. If we also cache the principal, let’s say four. With
one MiB we could hold data for 256 principals, so 256 MiB can hold
about 65K principals. Let’s divide by two for a safety margin. As
a result, we can expect 256 MiB to hold tens of thousands of active
users. This is why AuthInfoProvider caches all RoleBindings a principal
can have in each scope. We extract data from IAM only when the cache
expires, for new principals, or when the server starts up for the first
time. This is also why GetAssignments (a method of the RoleBindings store)
looks the way it does.
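The sizing estimate above can be reproduced with a few lines of arithmetic. The four KiB per principal and the safety divisor of two are the rough figures from the text, not measured values, and principalsThatFit is a hypothetical helper.

```go
package main

import "fmt"

const (
	kib = 1024
	mib = 1024 * kib
)

// principalsThatFit estimates how many principals fit into cacheBytes, given
// an average per-principal footprint and a safety divisor.
func principalsThatFit(cacheBytes, bytesPerPrincipal, safetyDivisor int) int {
	return cacheBytes / bytesPerPrincipal / safetyDivisor
}

func main() {
	fmt.Println(principalsThatFit(1*mib, 4*kib, 1))   // 256
	fmt.Println(principalsThatFit(256*mib, 4*kib, 1)) // 65536
	fmt.Println(principalsThatFit(256*mib, 4*kib, 2)) // 32768
}
```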
When we have all RoleBindings for the relevant members and the relevant IAM
scope, we can iterate PermissionCheck (object + permission) against all
assignments. If many assignments match the given PermissionCheck, then
PermissionCheck will have multiple Results (variable).
RoleBindings (converted to RoleAssignment for slimmer RAM usage) are
matched with permissions as follows:
- They match if they have owned_objects that match the object name in the
  PermissionCheck.
- If the above fails, we check if the Role pointed to by the RoleBinding
  has any Grants containing the permission specified in PermissionCheck.
- If there are any such Grants, we need to check if the subScope matches
  (if it is specified). PermissionCheck contains the IAM scope and sub-scope
  forming a full object name. It allows us to have granularity on
  specific resources.
- If we find a Grant matching the PermissionCheck, we store it in Results.
  Note that a Grant can carry conditions, but we haven’t evaluated them yet.
Thanks to the cache, I/O work by AuthInfoProvider is practically
non-existent; typically it can quickly provide a list of assigned
permissions with a list of conditions.
ConditionChecker for Authorizer
Each PermissionCheck can have multiple results, which can contribute to
allowed Permissions. If a result item has no conditions, then we can
assume the permissions are granted. If it has conditions, all of them must
evaluate successfully, so the Authorizer code iterates over them.
ConditionChecker is implemented in file
iam/auth/internal/condition_checker.go. We have 3 condition types:
- checking by resource field, function checkByResourceField
- checking by request field, function checkByRequestField
- checking by CEL condition, function checkByCELCondition
  (will be retired though).
Resource conditions are the most popular, and for good reason: they are
simple and can handle at least full CRUD, and often custom functions too.
For example, suppose we want to give certain users access to devices
if the field path satisfies metadata.annotations.key = value:
- CreateDeviceRequest will be forbidden if this field path with a given
value is not specified in the resource body.
- UpdateDeviceRequest will be forbidden if we are trying to update this
field path to a different value or if the current resource stored in
the database does not match.
- DeleteDeviceRequest checks if the Device in the database matches.
- Get/BatchGetDevice(s): resources are fetched from the database and the
  condition is checked.
- WatchDevice is also checked when the stream starts: we grab the resource
  from the database and evaluate it.
- ListDevices and WatchDevices have a Filter field, so we don’t need to
grab anything from DB.
- If there are custom methods, we can still get resources from DB and
check if the field path is fine.
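The Create, Update, and Delete rules from the list above can be sketched as follows. Device here is a stand-in type carrying only annotations, the function names are hypothetical, and the update check models the resulting full state of the resource rather than a field mask; the real logic lives in checkByResourceField.

```go
package main

import "fmt"

// Device is a stand-in resource that carries only metadata annotations.
type Device struct {
	Annotations map[string]string
}

// matches reports whether the device satisfies metadata.annotations.<key> = <value>.
func matches(d *Device, key, value string) bool {
	if d == nil {
		return false
	}
	v, ok := d.Annotations[key]
	return ok && v == value
}

// allowCreate: the resource body in the request must satisfy the condition.
func allowCreate(body *Device, key, value string) bool {
	return matches(body, key, value)
}

// allowUpdate: both the stored resource and the resulting state must satisfy it.
func allowUpdate(stored, updated *Device, key, value string) bool {
	return matches(stored, key, value) && matches(updated, key, value)
}

// allowDelete: the stored resource must satisfy the condition.
func allowDelete(stored *Device, key, value string) bool {
	return matches(stored, key, value)
}

func main() {
	mine := &Device{Annotations: map[string]string{"team": "blue"}}
	other := &Device{Annotations: map[string]string{"team": "red"}}
	fmt.Println(allowCreate(mine, "team", "blue"))
	fmt.Println(allowUpdate(mine, other, "team", "blue"))
	fmt.Println(allowDelete(other, "team", "blue"))
}
```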
We also support attach permissions with resource field conditions, if
necessary, we fetch resources from other services. Fetching is done
before condition evaluations.
A smaller weakness is the need for extra reads from the database.
The object may be stored in Redis, which perhaps gives a faster
answer, but the request still goes through the network stack. Perhaps another
RAM-based cache could be used for storage, but invalidation may be
a problem if we want to include List queries. For resource updates,
we need to invalidate both the previous and the new state, and a Firestore
watch shows us only the new state. Mongo may be more beneficial in this case,
especially if we consider the fact that it has active watches for all
collections (!!!). It may work especially well for collections that are
updated infrequently.
Checks by request are simpler and typically aimed at custom methods.
Checks by CEL condition are used less and less in v1,
but may still have some special use cases if a yaml (protobuf) declaration
is not enough. They use conditions with bodies specified in the
iam.edgelq.com/Condition resource. ConditionChecker uses
AuthInfoProvider to grab Conditions from IAM.
4 - SPEKTRA Edge IAM Cache Invalidation
Understanding the SPEKTRA Edge IAM cache invalidation.
AuthInfoProvider relies on RAM cache for low latency processing.
The problem is with invalidation. To achieve a long-living cache,
we need real-time invalidation straight from the database.
This is why each “store” module in AuthInfoProvider has one or more
goroutines using a real-time watch. When some object is updated, we may
need to update or invalidate the cache. If access to IAM is broken for
a prolonged time, the store invalidates the whole cache and retries.
Invalidation of principals is done using the WatchPrincipals method.
This allows IAM to ensure that only selected (allowed) principals are
seen by a service.
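A watch-driven cache of this kind can be sketched as below. Every name in the sketch is hypothetical, the watch is modeled as a Go channel, and the retry after a broken watch is left out; the real stores in AuthInfoProvider are more involved.

```go
package main

import (
	"fmt"
	"sync"
)

type changeKind int

const (
	updated changeKind = iota
	deleted
)

// change is a stand-in for a single real-time watch event.
type change struct {
	kind  changeKind
	name  string
	value string
}

// store is a stand-in for one cached collection inside AuthInfoProvider.
type store struct {
	mu    sync.RWMutex
	items map[string]string
}

func newStore() *store { return &store{items: map[string]string{}} }

func (s *store) get(name string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.items[name]
	return v, ok
}

// apply updates or removes a single cached entry based on a watch event.
func (s *store) apply(c change) {
	s.mu.Lock()
	defer s.mu.Unlock()
	switch c.kind {
	case updated:
		s.items[c.name] = c.value
	case deleted:
		delete(s.items, c.name)
	}
}

// reset drops the whole cache.
func (s *store) reset() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items = map[string]string{}
}

// run consumes events until the channel is closed (modeling a lost watch),
// then invalidates everything. Re-establishing the watch is omitted here.
func (s *store) run(changes <-chan change) {
	for c := range changes {
		s.apply(c)
	}
	s.reset()
}

func main() {
	s := newStore()
	changes := make(chan change, 2)
	changes <- change{kind: updated, name: "roleBindings/a", value: "v1"}
	changes <- change{kind: deleted, name: "roleBindings/a"}
	close(changes) // simulates the watch stream ending
	s.run(changes)
	_, ok := s.get("roleBindings/a")
	fmt.Println("cached after watch ended:", ok)
}
```

In a server, run would typically execute in its own goroutine per store, which is why the sketch guards the map with a mutex.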
5 - SPEKTRA Edge Multi-Service Authorization
Understanding the SPEKTRA Edge multi-service authorization.
The main authorization happens when the user sends a request to a service:
the authorization is located at the front. However, sometimes a service
executing a request needs to send further requests to other services.
One common example is the EstablishReferences call in the Schema Mixin
service. It is assumed that services don’t trust each other, and it shows
here too. Even if, let’s say, the devices service allows UpdateDevice, IAM
needs to check on its own whether UpdateDevice can update the reference in
the field spec.service_account (a field in the Device resource, pointing to
a ServiceAccount from IAM). We use the fact that establishing cross-region
and cross-service references requires a call to EstablishReferences.
We even have special authorization for that: see file
mixins/schema/server/v1/resource_shadow/resource_shadow_service_custom_auth.go.
In this file, we check referenced resources and try to see if the reference
is allowed for the service making the call, or for the user originally making
the request. In the future, we may stop relying on the original user and
require that the service has access to referenced resources.
It typically should be the case: a ServiceAccount pointed to by a Device
should be owned by devices (metadata.services.owning_service). The same goes
for logging or monitoring buckets. We may need proper attach permission
checks for resources first, and support for resource field conditions!
Other than that, subsequent service-to-service calls are treated
separately: the called service verifies the calling service.
6 - SPEKTRA Edge E-mail Sender
Understanding the SPEKTRA Edge e-mail sender system.
Another third-party service we use apart from Auth0 is Sendgrid. You should
see its config in iam/config/apiserver.proto. It is a second service
for emails: Auth0 itself is used for emails too, such as account
verification. After all, Users are stored in the Auth0 service; IAM just
gets copies. However, invitations (ProjectInvitations and
OrganizationInvitations) are sent using Sendgrid. See the
iam/invitationpublisher directory.
7 - SPEKTRA Edge Multi-Service Environment Safety
Understanding the SPEKTRA Edge multi-service environment safety.
IAM needs to ensure safety not only between tenants (Organizations,
Projects) but also between Services. For this reason RoleBindings are
also scoped per Service. There is however a problem with services we need
to solve:
- Organizations and Projects can enable services they use, and if
they do, they should be able to use these Services. IAM must
ensure that the organization/project admin cannot enable services
if they don’t have permission to. IAM must ensure the service does
not have access to organizations/projects which don’t enable a particular
service.
- If the Organization or Project enables Service, then the Service should
be able to access the Project/Organization.
- Ideally, the Service should be able to freely access all resources
a Project or Organization has, as long as those resources are defined
by that service. For example, service devices.edgelq.com must always
be able to access any instance of devices.edgelq.com/Device.
However, other services should have limited access to
devices.edgelq.com/Device collection. It’s called limited access
to Projects/Organizations: Project/org admins should be able to
regulate which resources are accessible.
- Some Services may be “private”.
- If services are using/importing each other, they need some limited
access to each other.
Private services are protected by attach permissions, so project/org
admins can’t just enable any Service: enabling one requires updating the
list of references to Services, after all.
Those things are defined in IAM fixtures, see iam/fixtures/v1/iam_roles.yaml.
First, see this role: services/iam.edgelq.com/roles/service-user. It
gives access to Service data and the ability to attach it. If the project
admin is granted this role in a service scope, they can enable that service.
Next, the Service should be able to access all relevant
resources that projects or organizations have, without specifying the exact
instance. This is why we have the IAM role
services/iam.edgelq.com/roles/base-edgelq-service, which grants access
to all resources across orgs/projects, as long as certain conditions are
met. Note that we don’t give any create permissions. It would be wrong,
because the Service could start creating resources with the proper
metadata.services field, without checking if the project/org even uses
the service. It is not an issue for non-creating permissions. To allow
services to create project/organization scope resources, we have the
services/iam.edgelq.com/roles/service-to-project-access and
services/iam.edgelq.com/roles/service-to-org-access roles. RoleBindings
for these roles are created dynamically by the IAM controller when
the Project/Organization enables some service. This code is located
in iam/controller/v1/iam_scope.
We also need to regulate service-to-service access. By default, this is
not allowed. However, if one service imports or uses another, we enable
their access to each other. Roles for these scenarios are in
iam/fixtures/v1/per_service_roles.yaml. Roles with the IDs
importing-service-access and imported-service-access are granted to
the importing and imported services. Note that the access is not
symmetrical, and it does not need to be. For example, if one service
imports another, then EstablishReferences is only needed in one
direction. Roles with the ID service-to-service-std-access are used for
minimal standard access.
All RoleBindings that regulate access between services, and between
services and projects/organizations, are called “Service RoleBindings”.
They are created dynamically by the IAM Controller when a service is
created/updated, or when an organization/project enables a service.
The module responsible for these RoleBindings is in the file
iam/controller/v1/iam_scope/service_rbs_syncer.go:
- makeDesiredRbsForSvc computes the desired Service RoleBindings for
each Service.
- makeDesiredRbsForOrg computes the desired Service RoleBindings for
each Organization.
- makeDesiredRbsForProject computes the desired Service RoleBindings
for each Project.
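To illustrate what working from a "desired" set of RoleBindings can look like, here is a minimal sketch of a desired-versus-existing comparison. It is not the actual syncer code: the RoleBinding struct, the binding names, and the diffRoleBindings helper are hypothetical, and the sketch ignores updates to bindings whose content changed.

```go
package main

import (
	"fmt"
	"sort"
)

// RoleBinding is a simplified, hypothetical stand-in for the IAM resource.
type RoleBinding struct {
	Name   string // hypothetical unique name of the binding
	Role   string // role granted by the binding
	Member string // principal that receives the role
}

// diffRoleBindings compares the desired set with the existing set, both
// keyed by name. It returns the names to create and the names to delete,
// each sorted so the output is deterministic.
func diffRoleBindings(desired, existing map[string]RoleBinding) (toCreate, toDelete []string) {
	for name := range desired {
		if _, ok := existing[name]; !ok {
			toCreate = append(toCreate, name)
		}
	}
	for name := range existing {
		if _, ok := desired[name]; !ok {
			toDelete = append(toDelete, name)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	// Hypothetical names; real Service RoleBindings are named differently.
	desired := map[string]RoleBinding{
		"rb-devices-project-x": {
			Name:   "rb-devices-project-x",
			Role:   "services/iam.edgelq.com/roles/service-to-project-access",
			Member: "devices.edgelq.com",
		},
	}
	existing := map[string]RoleBinding{
		"rb-stale": {Name: "rb-stale"},
	}
	toCreate, toDelete := diffRoleBindings(desired, existing)
	fmt.Println("create:", toCreate)
	fmt.Println("delete:", toDelete)
}
```

Keying both sets by name keeps the comparison linear in the number of bindings. A real syncer would also have to detect bindings that exist but differ from the desired content.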
Note the convention for mixin services: each service has its own copy of
the mixin permissions, like
services/<service>/permissions/resourceShadows.listMetaOwnees.
This is because all services have their own schema mixins. Note that
RoleBindings for those “per service roles” are located in the root scope,
see the function makeDesiredRbsForSvc in the file
iam/controller/v1/iam_scope/service_rbs_syncer.go. The reason is that
ResourceShadow is a “root” resource (its name pattern is
resourceShadows/{resourceShadow}, not something like
services/{service}/resourceShadows/{resourceShadow}). Perhaps it could
have been service-scoped, but the root pattern is kept for continuity
with v1alpha2. Also, CLI commands would become less intuitive. To enable
per-service access nonetheless, permissions are defined per service. If
we create services/<service>/permissions/resourceShadows.listMetaOwnees
per service and create a root-scope RoleBinding containing this
permission, in effect it is granted for that specific service only, not
for all.
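The effect of per-service permission names can be shown with a small sketch. It assumes, for illustration only, that a permission check is an exact name match against the permissions a root-scope RoleBinding carries. The helpers permissionName, permissionService, and allows are hypothetical and are not part of the IAM codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// permissionName builds a per-service permission name following the
// services/<service>/permissions/<resource>.<verb> convention.
func permissionName(service, resource, verb string) string {
	return fmt.Sprintf("services/%s/permissions/%s.%s", service, resource, verb)
}

// permissionService extracts the owning service from a permission name.
// It returns false when the name does not follow the convention.
func permissionService(name string) (string, bool) {
	parts := strings.Split(name, "/")
	if len(parts) != 4 || parts[0] != "services" || parts[2] != "permissions" {
		return "", false
	}
	return parts[1], true
}

// allows reports whether the granted permissions contain the required one.
// Assumption: a simplified exact-match check, not the real Authorizer logic.
func allows(granted []string, required string) bool {
	for _, g := range granted {
		if g == required {
			return true
		}
	}
	return false
}

func main() {
	// Permissions carried by a hypothetical root-scope RoleBinding.
	granted := []string{
		permissionName("devices.edgelq.com", "resourceShadows", "listMetaOwnees"),
	}

	forDevices := permissionName("devices.edgelq.com", "resourceShadows", "listMetaOwnees")
	forIam := permissionName("iam.edgelq.com", "resourceShadows", "listMetaOwnees")

	fmt.Println(allows(granted, forDevices)) // same service: granted
	fmt.Println(allows(granted, forIam))     // other service: not granted

	svc, ok := permissionService(forDevices)
	fmt.Println(svc, ok)
}
```

Because the service identifier is part of the permission name itself, the binding can live in the root scope without opening ResourceShadow access for every service.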
8 - SPEKTRA Edge IAM Principal Tracking
Understanding the SPEKTRA Edge principal tracking.
ServiceAccounts are project-scoped resources, but in theory, they can be
granted roles in other projects and organizations too. Users are, in IAM
terms, global resources, not necessarily bound to any organizational entity.
They can, however, join any project or organization.
Members (Users or ServiceAccounts) are associated with
projects/organizations via RoleBinding resources. Organizational role
bindings are copied to downstream child projects/organizations by
the IAM Controller (iam/controller/v1/iam_scope/org_rbs_copier.go).
If you visit iam/server/v1/role_binding/role_binding_service.go, you
should note that, for each written or deleted RoleBinding, we manage
a MemberAssignment resource. See iam/proto/v1/member_assignment.proto
for more details; the role of this resource is described there.
In general, one instance of MemberAssignment is created for each
scope/member combination. This internal resource facilitates tracking
of members in organizational entities.
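The "one MemberAssignment per scope/member combination" rule can be sketched as a counter keyed by that combination. This is an illustration only: the assignmentTracker type and its methods are hypothetical, and counting bindings per key is an assumed bookkeeping strategy, not a description of how role_binding_service.go does it.

```go
package main

import "fmt"

// assignmentKey identifies one MemberAssignment: a scope/member combination.
type assignmentKey struct {
	Scope  string // e.g. a project or organization name
	Member string // e.g. a User or ServiceAccount identifier
}

// assignmentTracker counts how many RoleBindings back each assignment.
// Hypothetical helper used only for this illustration.
type assignmentTracker struct {
	bindings map[assignmentKey]int
}

func newAssignmentTracker() *assignmentTracker {
	return &assignmentTracker{bindings: make(map[assignmentKey]int)}
}

// onRoleBindingCreated records a new RoleBinding. It returns true when this
// is the first binding for the combination, so an assignment must be created.
func (t *assignmentTracker) onRoleBindingCreated(scope, member string) bool {
	key := assignmentKey{Scope: scope, Member: member}
	t.bindings[key]++
	return t.bindings[key] == 1
}

// onRoleBindingDeleted records a removed RoleBinding. It returns true when
// the last binding for the combination is gone, so the assignment can go too.
func (t *assignmentTracker) onRoleBindingDeleted(scope, member string) bool {
	key := assignmentKey{Scope: scope, Member: member}
	if t.bindings[key] == 0 {
		return false // nothing tracked for this combination
	}
	t.bindings[key]--
	if t.bindings[key] == 0 {
		delete(t.bindings, key)
		return true
	}
	return false
}

func main() {
	tracker := newAssignmentTracker()
	// Two bindings for the same member in the same (hypothetical) project.
	fmt.Println(tracker.onRoleBindingCreated("projects/x", "user:alice"))
	fmt.Println(tracker.onRoleBindingCreated("projects/x", "user:alice"))
	// The assignment disappears only with the last binding.
	fmt.Println(tracker.onRoleBindingDeleted("projects/x", "user:alice"))
	fmt.Println(tracker.onRoleBindingDeleted("projects/x", "user:alice"))
}
```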
Members can see a list of their projects/organizations via
ListMyProjects/ListMyOrganization calls. To make such calls possible,
we needed the MemberAssignment helper collection. We also copy many
project/organization fields directly to MemberAssignment instances.
Therefore, project/organization filter/orderBy/fieldMask/cursor objects
can mostly be translated to MemberAssignment ones. To make this work,
MemberAssignment is a regional but globally synced resource (its copies
are spread through all IAM regions). Regional status ensures that
each region is responsible for tracking members in local
organizations/projects individually. IamDbController syncs all created
copies across all regions, so each region knows the full list of
projects/organizations in which a given member participates.
When project/organization fields (like title) change, the IAM Controller
is responsible for propagating the change to all MemberAssignment
instances. The implementation is in the file
iam/controller/v1/mem_assignments_meta_updater.go.
9 - SPEKTRA Edge Principal Service Access
Understanding the SPEKTRA Edge principal service access.
A RoleBinding does not only bind a User/ServiceAccount with
a project/organization. It also binds a member with a service. For
example, a devices manager Role in project X would bind a given member
not only with project X but also with the devices.edgelq.com service
(and applications & secrets, since those are related). Each Role has
a populated metadata.services field, which points to the services where
the Role is relevant. A RoleBinding also has metadata.services populated,
and it contains the combined services from the Role and the parent object
(organization, project, or service).
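The "combined services" of a RoleBinding can be pictured as a set union. The sketch below assumes the combination is a plain de-duplicated union. The combineServices function is hypothetical, the sorting is only there to make the output stable, and other.example.com is a placeholder service name.

```go
package main

import (
	"fmt"
	"sort"
)

// combineServices returns the de-duplicated union of the services listed on
// a Role and on the parent object of a RoleBinding, sorted alphabetically.
func combineServices(roleServices, parentServices []string) []string {
	seen := make(map[string]struct{})
	for _, s := range roleServices {
		seen[s] = struct{}{}
	}
	for _, s := range parentServices {
		seen[s] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Services where the Role is relevant (placeholder values).
	roleServices := []string{"devices.edgelq.com", "other.example.com"}
	// Services taken from the parent object (placeholder values).
	parentServices := []string{"iam.edgelq.com", "devices.edgelq.com"}

	fmt.Println(combineServices(roleServices, parentServices))
}
```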
When a RoleBinding is created, IAM internally creates a MemberAssignment
instance for each unique member/scope combination, and this
MemberAssignment has a “scope” field pointing to the project or
organization. There is more to it, however: IAM also creates additional
MemberAssignment objects whose “scope” field points to a Service. These
Service-level MemberAssignment instances are used to track the services
in which a given Member (User or ServiceAccount) participates.
The IAM Controller has a dedicated module
(iam/controller/v1/iam_scope/service_users_updater.go), which ensures
that the field metadata.services is in sync (for Users, ServiceAccounts,
and ServiceAccountKeys). It does so by watching MemberAssignment changes
and building a “summary” of the services in use. If it notices that some
user/service account has inaccurate data, it issues an update request
to IAM. Each region of IAM is responsible for watching local members,
but it can access all MemberAssignment instances since those are
synced globally.
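The "summary" step can be sketched as follows: collect the Service-scoped assignments of one member and compare the result with the services currently stored on that member. Everything here is an assumption made for illustration: the MemberAssignment struct is reduced to two fields, the services/<service> scope format and the member identifiers are guessed, and servicesInUse and needsUpdate are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// MemberAssignment is a reduced, hypothetical view of the IAM resource.
type MemberAssignment struct {
	Scope  string // assumed format: projects/..., organizations/..., services/...
	Member string // assumed opaque member identifier
}

const servicePrefix = "services/"

// servicesInUse returns the sorted, de-duplicated list of services in which
// the given member has a Service-scoped MemberAssignment.
func servicesInUse(assignments []MemberAssignment, member string) []string {
	seen := make(map[string]struct{})
	for _, a := range assignments {
		if a.Member != member || !strings.HasPrefix(a.Scope, servicePrefix) {
			continue
		}
		seen[strings.TrimPrefix(a.Scope, servicePrefix)] = struct{}{}
	}
	out := make([]string, 0, len(seen))
	for s := range seen {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

// needsUpdate reports whether the stored services differ from the desired
// ones. Both lists are compared as sets and are assumed to hold no duplicates.
func needsUpdate(current, desired []string) bool {
	if len(current) != len(desired) {
		return true
	}
	cur := append([]string(nil), current...)
	des := append([]string(nil), desired...)
	sort.Strings(cur)
	sort.Strings(des)
	for i := range cur {
		if cur[i] != des[i] {
			return true
		}
	}
	return false
}

func main() {
	assignments := []MemberAssignment{
		{Scope: "projects/x", Member: "user:alice"},
		{Scope: "services/devices.edgelq.com", Member: "user:alice"},
		{Scope: "services/iam.edgelq.com", Member: "user:bob"},
	}
	desired := servicesInUse(assignments, "user:alice")
	fmt.Println(desired)
	// The member currently has no services recorded, so an update is due.
	fmt.Println(needsUpdate(nil, desired))
}
```

Comparing before writing matters in this sketch because an update request is only worth sending when the stored list is actually inaccurate.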
Keeping the metadata.services field of all Users/ServiceAccounts in sync
serves two purposes:
- It ensures that a given member can access Service-related data.
- It ensures that the given Service can access member data (via GetPrincipal).
If you check the file iam/fixtures/v1/iam_role_bindings.yaml, you should
notice the special RoleBinding roleBindings/services-participant. It is
a root-level RoleBinding given to all authenticated members, granting the
services/iam.edgelq.com/roles/selected-service-user role. This role is
a multi-service one. If you look at its contents in
iam/fixtures/v1/iam_roles.yaml, you should see that it gives many
read/attach permissions to its holder in a specified list of services.
In the RoleBinding yaml definition, note that this list of services comes
from the principal metadata object. This is how principals get automatic
access to a Service.
The role services/iam.edgelq.com/roles/selected-service-user is similar
to services/iam.edgelq.com/roles/selected-user. The latter should be
used on the service level to provide access to that single service to
someone who has no other access there. The former has an internal
purpose: it gives access to many services at once and is only assigned to
members who already have some access to the specified services. It just
ensures they can access service meta-information.