SPEKTRA Edge IAM Authorization
Authorization happens in a dedicated server middleware. For an example, see any generated one, like https://github.com/cloudwan/edgelq/blob/main/devices/server/v1/device/device_service.pb.middleware.authorization.go.
Because the Authorization middleware is assumed to run after multi-region routing, we can assume that the IAM Service in the local region holds all resources required to execute authorization locally, specifically: RoleBindings, Roles, and Conditions.
Note that IAM itself does not execute any Authorization; the middleware is generated for each service. We have an Authorizer, a module that is compiled with all API servers for all services. What IAM provides is a list of Roles, RoleBindings, and Conditions. Other services are allowed to get them, but evaluation happens on the proper server side.
The authorizer rarely needs to ask IAM for any data; where possible, it works without I/O. It relies on a RAM cache to store IAM resources internally, so checks are typically evaluated fast. Resource field conditions are more problematic: if we have them, we need to get the current resources from the database. For attach permissions, we may need to fetch them from other services.
Authorization middleware is generated for each method, but the pattern is always the same:
- We create a BulkPermissionCheck object, where we collect all permissions we want to check for this action. It is defined in the iam/auth/types/bulk_permission_check.go file.
- The Authorizer module, defined in the iam/auth/authorizer.go file, checks whether the passed BulkPermissionCheck is satisfied, that is, whether the authenticated user is authorized for the requested permissions. Some checks may be optional, like read checks for specific fields.
When we collect permissions for BulkPermissionCheck, we add:
- The main permission for a method. The resource name (or parent) is taken from the request object, as indicated by requestPaths in the specification, or customized via the proto file.
- If we have a writing request (like Create or Update) and we are setting references to other resources, we need to add attach permission checks. Resource names are taken from the referenced objects, not from the referencing resource the user tries to write to.
- Optional read/set permissions if some resource fields are restricted. For authorization object strings we pass either collection names or specific resources.
Every permission must be accompanied by some resource or collection name (parent). Refer to the IAM user specification. In this document, we map specifications to code and explain details.
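The collection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the generated middleware: the struct fields, the Add method, CreateRequest, CollectCreateChecks, and the permission strings are all hypothetical, and the real definitions live in iam/auth/types/bulk_permission_check.go.

```go
package main

import "fmt"

// PermissionCheck pairs one permission with the object it applies to.
// Hypothetical shape; the real types live in iam/auth/types.
type PermissionCheck struct {
	Permission string
	Object     string // resource name or collection (parent) name
	Optional   bool   // true for checks that may fail without denying the call
}

// BulkPermissionCheck collects every check needed for a single action.
type BulkPermissionCheck struct {
	Checks []PermissionCheck
}

// Add appends one check to the bulk.
func (b *BulkPermissionCheck) Add(permission, object string, optional bool) {
	b.Checks = append(b.Checks, PermissionCheck{
		Permission: permission,
		Object:     object,
		Optional:   optional,
	})
}

// CreateRequest is a hypothetical write request: Parent is the collection the
// new resource goes into, References are names of resources it points at.
type CreateRequest struct {
	Parent     string
	References []string
}

// CollectCreateChecks gathers the main permission plus one attach check per
// referenced resource. Permission names are supplied by the caller.
func CollectCreateChecks(req CreateRequest, createPerm, attachPerm string) *BulkPermissionCheck {
	bulk := &BulkPermissionCheck{}
	bulk.Add(createPerm, req.Parent, false)
	for _, ref := range req.References {
		// The object is the referenced resource, not the one being written.
		bulk.Add(attachPerm, ref, false)
	}
	return bulk
}

func main() {
	req := CreateRequest{
		Parent:     "projects/demo",
		References: []string{"projects/demo/things/t1"},
	}
	bulk := CollectCreateChecks(req, "example.create", "example.attach")
	for _, c := range bulk.Checks {
		fmt.Printf("%s on %s (optional=%v)\n", c.Permission, c.Object, c.Optional)
	}
}
```

Every check carries an object name, matching the rule that a permission is always accompanied by a resource or collection name.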
Within the Authorizer module, defined in iam/auth/authorizer.go, we split all checks by the main IAM scopes it recognizes: Service, Organization, Project, or System. Next, we delegate permission checks to AuthInfoProvider. It generates a list of PermissionGrantResult instances relevant for all PermissionCheck instances. The relationship between these two types is many-to-many. A single Grant (assigned via RoleBinding) can hold multiple permissions, and a user may have many RoleBindings, each with different Grants, so more than one Grant may give access to the same permission.
If AuthInfoProvider notices that some PermissionCheck has an unconditional PermissionGrantResult, it skips the rest. However, if there are conditions attached, some may fail while others succeed. This is why we need multiple PermissionGrantResult instances per single PermissionCheck: if at least one is successful, the PermissionCheck passes. Across results it works like an OR operator. Within a single PermissionGrantResult, however, all conditions must evaluate positively.
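The OR-across-results, AND-within-a-result rule can be sketched in a few lines. The types below are simplified stand-ins (a condition is modelled as a plain function, and Holds and CheckPasses are hypothetical names), so treat this as an illustration of the logic rather than the Authorizer code.

```go
package main

import "fmt"

// Condition is a stand-in for an already-prepared condition evaluation.
type Condition func() bool

// PermissionGrantResult is a simplified model of one matching Grant.
type PermissionGrantResult struct {
	Conditions []Condition // empty means the grant is unconditional
}

// Holds reports whether every condition of this result evaluates to true.
func (r PermissionGrantResult) Holds() bool {
	for _, cond := range r.Conditions {
		if !cond() {
			return false
		}
	}
	return true
}

// CheckPasses reports whether at least one result holds (logical OR).
// With no results at all, the check fails.
func CheckPasses(results []PermissionGrantResult) bool {
	for _, r := range results {
		if r.Holds() {
			return true
		}
	}
	return false
}

func main() {
	pass := func() bool { return true }
	fail := func() bool { return false }
	results := []PermissionGrantResult{
		{Conditions: []Condition{pass, fail}}, // fails: one condition is false
		{Conditions: []Condition{pass}},       // holds
	}
	fmt.Println(CheckPasses(results))
}
```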
Therefore, once AuthInfoProvider matches PermissionGrantResult instances with PermissionCheck ones, we must evaluate conditions (if any). One popular condition type we use is ResourceFieldCondition. To evaluate this kind, we fetch resources from the local database, other services, and other regions. To make this check as cheap as possible, the authorizer iterates through all possible conditions and collects all resources it needs to fetch. It fetches them in bulk, connecting to other services if necessary (attach permission cases). For this reason, the PermissionCheck object has a reference field that holds the resolved resources, so all conditions have easy access to them when needed. If the service receives a PermissionDenied error when checking other services, then PermissionDenied is forwarded to the user with information that the service itself cannot see the resources. This may indicate that the metadata.services.allowed_services field is missing or incomplete.
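The collect-then-fetch-in-bulk idea can be illustrated as below. This is only a sketch: Resource, BulkFetcher, NeedsResource, Resolved, and ResolveResources are hypothetical names, and the real authorizer additionally routes fetches to other services and regions.

```go
package main

import "fmt"

// Resource is a simplified stand-in for a fetched resource body.
type Resource map[string]string

// PermissionCheck carries a slot for the resolved resource so that
// conditions can read it later without further I/O.
type PermissionCheck struct {
	Object        string
	NeedsResource bool // true when some condition needs the stored resource
	Resolved      Resource
}

// BulkFetcher fetches many resources in one call (hypothetical signature).
type BulkFetcher func(names []string) (map[string]Resource, error)

// ResolveResources collects distinct names, fetches them once, and attaches
// the results to the checks that asked for them.
func ResolveResources(checks []*PermissionCheck, fetch BulkFetcher) error {
	seen := make(map[string]bool)
	var names []string
	for _, c := range checks {
		if c.NeedsResource && !seen[c.Object] {
			seen[c.Object] = true
			names = append(names, c.Object)
		}
	}
	if len(names) == 0 {
		return nil // nothing to fetch, no I/O at all
	}
	fetched, err := fetch(names)
	if err != nil {
		return err // e.g. a PermissionDenied from another service
	}
	for _, c := range checks {
		if c.NeedsResource {
			c.Resolved = fetched[c.Object]
		}
	}
	return nil
}

func main() {
	checks := []*PermissionCheck{
		{Object: "projects/demo/devices/d1", NeedsResource: true},
		{Object: "projects/demo/devices/d1", NeedsResource: true},
		{Object: "projects/demo", NeedsResource: false},
	}
	fetch := func(names []string) (map[string]Resource, error) {
		out := make(map[string]Resource)
		for _, n := range names {
			out[n] = Resource{"name": n}
		}
		return out, nil
	}
	if err := ResolveResources(checks, fetch); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(checks[0].Resolved["name"])
}
```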
On their own, conditions are simple and execute fast, without any I/O work. We just check the request and the resolved resources, and verify whether the specified conditions apply, according to the IAM Role Grant conditions.
AuthInfoProvider for Authorizer
AuthInfoProvider gets only a set of checks grouped by IAM scope (a project, an organization, a service, or the system if none of the former). As per the IAM specification, the service scope inherits all RoleBindings from the project that owns the service. If we need to validate permissions in the project scope, we must also accept RoleBindings from the parent organization (if set) and the full ancestry path. RoleBindings in the system scope are valid in all scopes. Moreover, even a single principal may have multiple member IDs (the native one with email, then domain, then allAuthenticatedUsers, allUsers). This creates lots of potential RoleBindings to check. Furthermore, we should be aware that the Authorizer is part of all API servers! As SPEKTRA Edge provides a framework for building 3rd party services, these services can’t trust each other. Therefore, the AuthInfoProvider of any service can only ask for RoleBindings that the service is allowed to see (according to metadata.services.allowed_services).
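To get a feel for how quickly the number of lookups grows, the sketch below enumerates member ID and scope combinations for one principal. The member ID prefixes ("user:", "domain:"), the SystemScope placeholder, and all function names are assumptions made for illustration, not identifiers taken from the IAM code.

```go
package main

import (
	"fmt"
	"strings"
)

// SystemScope is a placeholder name for the system scope (assumption).
const SystemScope = "system"

// MemberIDs lists the member IDs one principal may match. The "user:" and
// "domain:" prefixes are an assumed format.
func MemberIDs(email string) []string {
	ids := []string{"user:" + email}
	if at := strings.LastIndex(email, "@"); at >= 0 && at < len(email)-1 {
		ids = append(ids, "domain:"+email[at+1:])
	}
	return append(ids, "allAuthenticatedUsers", "allUsers")
}

// ScopeChain returns the checked scope, its ancestors, and the system scope,
// since RoleBindings from all of them apply.
func ScopeChain(scope string, ancestors []string) []string {
	chain := append([]string{scope}, ancestors...)
	return append(chain, SystemScope)
}

// Lookup is one member/scope pair for which RoleBindings must be known.
type Lookup struct{ Member, Scope string }

// Lookups builds the cross product of members and scopes.
func Lookups(members, scopes []string) []Lookup {
	out := make([]Lookup, 0, len(members)*len(scopes))
	for _, s := range scopes {
		for _, m := range members {
			out = append(out, Lookup{Member: m, Scope: s})
		}
	}
	return out
}

func main() {
	members := MemberIDs("alice@example.com")
	scopes := ScopeChain("projects/demo", []string{"organizations/child", "organizations/root"})
	fmt.Println(len(Lookups(members, scopes)), "member/scope lookups")
}
```

Even this small example, with four member IDs and four scopes, yields sixteen combinations, which is why caching matters so much here.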
The IAM Controller copies organization-level RoleBindings to child sub-organizations and projects, but we don’t copy (at least not yet) RoleBindings from a service project to a service. We also don’t copy system-level RoleBindings to all existing projects and organizations. It should typically stay that way, because system-level role bindings are rather internal and should not leak to organization/project admins. The module for copying RoleBindings is in the file iam/controller/v1/iam_scope/org_rbs_copier.go. It also handles changes in the parent organization field.
During authorization, AuthInfoProvider must list and fetch all RoleBindings for each memberId/IAM scope combination. It must also fetch only role bindings relevant to the current service. We first try the local cache; in case of a miss, we ask IAM. This is why in CheckPermissions we grab all possible RoleBindings. We filter out RoleBindings by subScope or role ID later on. We try to strip all unnecessary fields, to ensure AuthInfoProvider can hold (RAM-based cache!) as much data as possible. Additionally, we try to use integer identifiers for roles and permission names.
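The cache-first lookup can be sketched as a small read-through cache keyed by member ID and scope. AssignmentCache and its loader signature are hypothetical; expiry and locking, which a real in-RAM cache shared across requests would need, are left out to keep the sketch short.

```go
package main

import "fmt"

// cacheKey identifies one member ID within one IAM scope.
type cacheKey struct{ Member, Scope string }

// AssignmentCache is a minimal read-through cache. It has no expiry and is
// not safe for concurrent use.
type AssignmentCache struct {
	entries map[cacheKey][]string
	load    func(member, scope string) ([]string, error) // asks IAM on a miss
}

// NewAssignmentCache builds an empty cache around the given loader.
func NewAssignmentCache(load func(member, scope string) ([]string, error)) *AssignmentCache {
	return &AssignmentCache{entries: make(map[cacheKey][]string), load: load}
}

// Get returns cached assignments, calling the loader only on a miss.
func (c *AssignmentCache) Get(member, scope string) ([]string, error) {
	key := cacheKey{Member: member, Scope: scope}
	if cached, ok := c.entries[key]; ok {
		return cached, nil
	}
	loaded, err := c.load(member, scope)
	if err != nil {
		return nil, err
	}
	c.entries[key] = loaded
	return loaded, nil
}

func main() {
	calls := 0
	cache := NewAssignmentCache(func(member, scope string) ([]string, error) {
		calls++
		return []string{"roles/example"}, nil
	})
	for i := 0; i < 2; i++ {
		if _, err := cache.Get("user:alice@example.com", "projects/demo"); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Println("backend calls:", calls)
}
```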
To hold RoleBindings per member ID, we may need around 2 KiB of data on average. If we also cache the principal, let’s say 4 KiB. With 1 MiB we could then hold data for 256 principals, and 256 MiB can hold about 65K principals. Let’s divide by two for a safety margin. As a result, we can expect 256 MiB to hold tens of thousands of active users. This is why AuthInfoProvider caches all RoleBindings a principal can have in each scope. We extract data from IAM only when the cache expires, for new principals, or when the server starts up for the first time. This is also why GetAssignments (a method of the RoleBindings store) looks the way it does.
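The capacity estimate above is plain integer arithmetic and can be reproduced as follows. The 4 KiB per principal figure is the rough estimate from the text, and PrincipalCapacity is a helper name introduced only for this illustration.

```go
package main

import "fmt"

const (
	KiB = 1024
	MiB = 1024 * KiB
)

// PrincipalCapacity estimates how many principals fit into cacheBytes when
// each needs bytesPerPrincipal, divided by a safety margin.
func PrincipalCapacity(cacheBytes, bytesPerPrincipal, safetyDivisor int) int {
	if bytesPerPrincipal <= 0 || safetyDivisor <= 0 {
		return 0
	}
	return cacheBytes / bytesPerPrincipal / safetyDivisor
}

func main() {
	fmt.Println(PrincipalCapacity(1*MiB, 4*KiB, 1))   // 256
	fmt.Println(PrincipalCapacity(256*MiB, 4*KiB, 1)) // 65536
	fmt.Println(PrincipalCapacity(256*MiB, 4*KiB, 2)) // 32768
}
```

With the safety margin applied, 256 MiB comes out at roughly 32K principals, which is the "tens of thousands of active users" figure.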
When we have all RoleBindings for the relevant members and the relevant IAM scope, we can iterate PermissionCheck (object + permission) against all assignments. If many assignments match the given PermissionCheck, then PermissionCheck will have multiple Results (variable).
RoleBindings (converted to RoleAssignment for slimmer RAM usage) are matched with permissions as follows:
- They match if they have owned_objects that match the object name in the PermissionCheck.
- If the above fails, we check if the Role pointed to by the RoleBinding has any Grants containing the permission specified in PermissionCheck.
- If there are any such Grants, we need to check if the subScope matches (if it is specified). PermissionCheck contains an IAM scope and a sub-scope, together forming a full object name. This gives us granularity down to specific resources.
- If we find a Grant matching PermissionCheck, we store it in Results. Note that a Grant can carry conditions, but we haven’t evaluated them yet.
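The matching steps above can be sketched as a single loop. Several details here are assumptions made for the sake of a runnable example: the struct shapes, treating an owned_objects match as an unconditional result, and matching sub-scopes by equality or path prefix. The actual rules are in the AuthInfoProvider code.

```go
package main

import (
	"fmt"
	"strings"
)

// Grant is a simplified Role Grant; conditions are kept as opaque names.
type Grant struct {
	Permissions []string
	SubScope    string // optional; empty means the whole scope
	Conditions  []string
}

// RoleAssignment is a slimmed-down RoleBinding plus the Grants of its Role.
type RoleAssignment struct {
	OwnedObjects []string
	Grants       []Grant
}

// PermissionCheck is one permission checked against scope + sub-scope.
type PermissionCheck struct {
	Permission string
	Scope      string
	SubScope   string
	Results    []Grant
}

// Object joins scope and sub-scope into the full object name.
func (c *PermissionCheck) Object() string {
	if c.SubScope == "" {
		return c.Scope
	}
	return c.Scope + "/" + c.SubScope
}

func contains(items []string, want string) bool {
	for _, it := range items {
		if it == want {
			return true
		}
	}
	return false
}

// subScopeMatches uses equality or path-prefix matching (assumption).
func subScopeMatches(grantSub, checkSub string) bool {
	return grantSub == "" || grantSub == checkSub ||
		strings.HasPrefix(checkSub, grantSub+"/")
}

// Match appends every matching Grant to check.Results. Conditions are
// carried along but not evaluated here.
func Match(check *PermissionCheck, assignments []RoleAssignment) {
	for _, a := range assignments {
		if contains(a.OwnedObjects, check.Object()) {
			// Assumption: ownership yields an unconditional result.
			check.Results = append(check.Results, Grant{Permissions: []string{check.Permission}})
			continue
		}
		for _, g := range a.Grants {
			if contains(g.Permissions, check.Permission) && subScopeMatches(g.SubScope, check.SubScope) {
				check.Results = append(check.Results, g)
			}
		}
	}
}

func main() {
	check := &PermissionCheck{Permission: "example.get", Scope: "projects/demo", SubScope: "devices/d1"}
	Match(check, []RoleAssignment{
		{Grants: []Grant{{Permissions: []string{"example.get"}, SubScope: "devices", Conditions: []string{"cond-1"}}}},
	})
	fmt.Println(len(check.Results), "matching grant(s)")
}
```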
Thanks to the cache, I/O work by AuthInfoProvider is practically non-existent. It can typically provide the list of assigned permissions, together with their conditions, quickly.
ConditionChecker for Authorizer
Each PermissionCheck can have multiple results, each of which can contribute to the allowed permissions. If a result item has no conditions, we can assume the permission is granted. If it has conditions, then all of them must evaluate successfully, so we iterate over them in the Authorizer code.
ConditionChecker is implemented in the file iam/auth/internal/condition_checker.go. We have 3 condition types:
- checking by resource field, function checkByResourceField
- checking by request field, function checkByRequestField
- checking by CEL condition, function checkByCELCondition (will be retired though).
Resource conditions are the most popular, and for good reason: they are simple and can handle at least full CRUD, and often custom functions too. For example, suppose we want to grant certain users access to devices only if the field path satisfies metadata.annotations.key = value:
- CreateDeviceRequest will be forbidden if this field path with a given value is not specified in the resource body.
- UpdateDeviceRequest will be forbidden if we are trying to update this field path to a different value or if the current resource stored in the database does not match.
- DeleteDeviceRequest checks if the Device in the database matches.
- For Get/BatchGetDevice(s), resources are extracted from the database and the condition is checked.
- WatchDevice is also checked when the stream starts: we grab the resource from the database and evaluate it.
- ListDevices and WatchDevices have a Filter field, so we don’t need to grab anything from DB.
- If there are custom methods, we can still get resources from DB and check if the field path is fine.
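The Create, Update, and Delete cases from the list above can be sketched as follows. Device and ResourceFieldCondition here are simplified, hypothetical stand-ins, and the update case assumes that "updated" is the full resulting resource state, which ignores field masks. It is an illustration of the rule, not the checkByResourceField implementation.

```go
package main

import "fmt"

// Device is a stand-in resource with only the annotations we care about.
type Device struct {
	Annotations map[string]string
}

// ResourceFieldCondition requires metadata.annotations[Key] == Value.
type ResourceFieldCondition struct {
	Key   string
	Value string
}

// Matches reports whether the device carries the required annotation.
func (c ResourceFieldCondition) Matches(d *Device) bool {
	if d == nil {
		return false
	}
	v, ok := d.Annotations[c.Key]
	return ok && v == c.Value
}

// AllowCreate checks the resource body sent in the request.
func (c ResourceFieldCondition) AllowCreate(body *Device) bool {
	return c.Matches(body)
}

// AllowUpdate requires both the stored state and the new state to match,
// so a user can neither touch foreign devices nor move one out of reach.
func (c ResourceFieldCondition) AllowUpdate(stored, updated *Device) bool {
	return c.Matches(stored) && c.Matches(updated)
}

// AllowDelete checks the state currently stored in the database.
func (c ResourceFieldCondition) AllowDelete(stored *Device) bool {
	return c.Matches(stored)
}

func main() {
	cond := ResourceFieldCondition{Key: "team", Value: "blue"}
	mine := &Device{Annotations: map[string]string{"team": "blue"}}
	other := &Device{Annotations: map[string]string{"team": "red"}}

	fmt.Println("create mine:", cond.AllowCreate(mine))
	fmt.Println("retag mine to red:", cond.AllowUpdate(mine, other))
	fmt.Println("delete other:", cond.AllowDelete(other))
}
```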
We also support attach permissions with resource field conditions. If necessary, we fetch resources from other services. Fetching is done before condition evaluation.
A smaller weakness is the need for extra reads from the database. The object may be stored in Redis, which perhaps gives a faster answer, but the request still goes through the network stack. Perhaps another RAM-based cache could be used for storage, but invalidation may be a problem if we want to include List queries. For resource updates, we need to invalidate both the previous and the new state, and a Firestore watch shows us only the new state. Mongo may be more beneficial in this case, especially if we consider the fact that it has active watches for all collections (!!!). This may work especially for collections that are not updated frequently.
Checks by request field are simpler and typically aimed at custom methods.
Checks by CEL condition are used less and less in v1, but may still have some special use cases if the yaml (protobuf) declaration is not enough. They use conditions with bodies specified in the iam.edgelq.com/Condition resource. ConditionChecker uses AuthInfoProvider to grab Conditions from IAM.