SPEKTRA Edge IAM Authorization

Understanding the SPEKTRA Edge IAM authorization.

Authorization happens in a dedicated server middleware; see any generated one, for example https://github.com/cloudwan/edgelq/blob/main/devices/server/v1/device/device_service.pb.middleware.authorization.go.

Because the authorization middleware runs after multi-region routing, we can assume that the IAM service in the local region holds all resources required to execute authorization locally, specifically RoleBindings, Roles, and Conditions.

Note that IAM itself does not execute any authorization; the middleware is generated for each service. The Authorizer is a module compiled into every API server of every service. What IAM provides are the Roles, RoleBindings, and Conditions. Other services are allowed to get them, but evaluation happens on the server side of the service handling the request.

The Authorizer rarely needs to ask IAM for data; where possible, it works without any I/O. It relies on a RAM cache to store IAM resources internally, so checks are typically evaluated fast. Resource field conditions are more problematic: if we have them, we need to get the current resources from the database, and for attach permissions, we may need to fetch them from other services.

Authorization middleware is generated for each method, but the pattern is always the same:

We create a BulkPermissionCheck object, where we collect all permissions we want to check for this action. It is defined in the iam/auth/types/bulk_permission_check.go file.
The Authorizer module, defined in the iam/auth/authorizer.go file, checks whether the passed BulkPermissionCheck is satisfied, that is, whether the authenticated user is authorized for all requested permissions. Some checks may be optional, like read checks for specific fields.

When we collect permissions for BulkPermissionCheck, we add:

The main permission for the method. The resource name (or parent) is taken from the request object, as indicated by requestPaths in the specification, or customized via the proto file.
For write requests (like Create or Update) that set references to other resources, attach permission checks. The resource names are taken from the referenced objects, not from the referencing resource the user is trying to write to.
Optional read/set permissions if some resource fields are restricted. As authorization objects, we pass either collection names or specific resource names.
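The collection step can be sketched in Go as follows. This is a minimal illustration, not the generated middleware: PermissionCheck, BulkPermissionCheck, Reference, CreateRequest, collectCreateChecks and the permission strings are hypothetical simplifications of the types in iam/auth/types/bulk_permission_check.go.

```go
package main

// Hypothetical, simplified stand-ins for the types in
// iam/auth/types/bulk_permission_check.go. Permission strings are placeholders.

// PermissionCheck pairs a permission with the object it applies to.
type PermissionCheck struct {
	Permission string
	Object     string // resource name or collection (parent) name
	Optional   bool   // optional checks (restricted fields) do not deny the request
}

// BulkPermissionCheck gathers every check needed for one request.
type BulkPermissionCheck struct {
	Checks []PermissionCheck
}

func (b *BulkPermissionCheck) Add(permission, object string, optional bool) {
	b.Checks = append(b.Checks, PermissionCheck{Permission: permission, Object: object, Optional: optional})
}

// Reference is a reference to another resource set by a write request.
type Reference struct {
	Kind string // type of the referenced resource
	Name string // name of the referenced resource
}

// CreateRequest is a hypothetical Create request.
type CreateRequest struct {
	Parent           string      // collection the new resource is created in
	References       []Reference // references set in the resource body
	RestrictedFields []string    // restricted fields the request sets
}

// collectCreateChecks adds the main, attach, and optional checks.
func collectCreateChecks(req CreateRequest) BulkPermissionCheck {
	var b BulkPermissionCheck
	// Main permission: the object comes from the request (here, the parent).
	b.Add("devices.create", req.Parent, false)
	// Attach permissions: the object is the referenced resource,
	// not the resource being written.
	for _, ref := range req.References {
		b.Add(ref.Kind+".attach", ref.Name, false)
	}
	// Optional set permissions for restricted fields.
	for _, field := range req.RestrictedFields {
		b.Add("devices.set."+field, req.Parent, true)
	}
	return b
}
```

For a Create request that references one other resource and sets one restricted field, this yields three checks: the main one on the parent collection, an attach check on the referenced resource, and an optional set check.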

Every permission must be accompanied by a resource or collection name (parent); refer to the IAM user specification. In this document, we map the specification to code and explain the details.

Within the Authorizer module, defined in iam/auth/authorizer.go, we split all checks by the main IAM scopes it recognizes: Service, Organization, Project, or System. Next, we delegate the permission checks to AuthInfoProvider, which generates a list of PermissionGrantResult relevant to all PermissionCheck instances. The relationship between these two types is many-to-many: a single Grant (assigned via a RoleBinding) can hold multiple permissions, and a user may have many RoleBindings, each with different Grants, so more than one Grant may give access to the same permission.
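Splitting checks by scope might look roughly like the sketch below. It assumes, purely for illustration, that the scope can be read from the first two segments of an object name (a projects/, organizations/ or services/ prefix), with everything else falling back to the system scope; ScopeKind, ScopeKey, scopeOf and groupByScope are hypothetical names, and plain object strings stand in for PermissionCheck instances.

```go
package main

import "strings"

// ScopeKind lists the IAM scopes the Authorizer recognizes.
type ScopeKind int

const (
	SystemScope ScopeKind = iota
	ServiceScope
	OrganizationScope
	ProjectScope
)

// ScopeKey identifies one IAM scope, e.g. {ProjectScope, "projects/p1"}.
type ScopeKey struct {
	Kind ScopeKind
	Name string
}

// scopeOf derives the scope from the first two name segments
// (an assumption made for this sketch).
func scopeOf(object string) ScopeKey {
	parts := strings.SplitN(object, "/", 3)
	if len(parts) >= 2 && parts[1] != "" {
		switch parts[0] {
		case "projects":
			return ScopeKey{ProjectScope, "projects/" + parts[1]}
		case "organizations":
			return ScopeKey{OrganizationScope, "organizations/" + parts[1]}
		case "services":
			return ScopeKey{ServiceScope, "services/" + parts[1]}
		}
	}
	// Anything not under a project, organization or service is system scope.
	return ScopeKey{SystemScope, ""}
}

// groupByScope splits checked objects so that each scope can be handed
// to AuthInfoProvider separately.
func groupByScope(objects []string) map[ScopeKey][]string {
	groups := make(map[ScopeKey][]string)
	for _, obj := range objects {
		key := scopeOf(obj)
		groups[key] = append(groups[key], obj)
	}
	return groups
}
```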

If AuthInfoProvider notices that a PermissionCheck has an unconditional PermissionGrantResult, it skips the rest. However, if conditions are attached, some may fail while others succeed. This is why we need multiple PermissionGrantResult instances per PermissionCheck: if at least one succeeds, the PermissionCheck passes, which works like an OR operator. Within a single PermissionGrantResult, all conditions must evaluate positively.
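The OR-of-ANDs rule can be expressed compactly. In this hypothetical sketch a condition is reduced to a pre-bound func() bool, and PermissionGrantResult keeps only its conditions; the real type carries IAM conditions that ConditionChecker evaluates.

```go
package main

// Condition is a hypothetical, already-bound condition.
type Condition func() bool

// PermissionGrantResult keeps only the conditions of one matching Grant.
type PermissionGrantResult struct {
	Conditions []Condition
}

// grantResultPasses: every condition in one result must hold (AND).
// A result without conditions passes immediately.
func grantResultPasses(r PermissionGrantResult) bool {
	for _, c := range r.Conditions {
		if !c() {
			return false
		}
	}
	return true
}

// permissionCheckPasses: at least one result must pass (OR). Returning on
// the first passing result is what lets an unconditional result skip the rest.
func permissionCheckPasses(results []PermissionGrantResult) bool {
	for _, r := range results {
		if grantResultPasses(r) {
			return true
		}
	}
	return false
}
```

With no results at all, the check is denied.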

Therefore, once AuthInfoProvider matches PermissionGrantResult instances with PermissionCheck ones, we must evaluate the conditions (if any). One popular condition type is ResourceFieldCondition. To evaluate it, we fetch resources from the local database, other services, and other regions. To make this as efficient as possible, the Authorizer iterates through all conditions and collects all resources it needs to fetch. It then fetches them in bulk, connecting to other services if necessary (attach permission cases). For this reason, the PermissionCheck object has a reference field that holds the resolved resources, so all conditions can access them easily if needed. If the service receives a PermissionDenied error when querying other services, the PermissionDenied is forwarded to the user with information that the service itself cannot see the resources. This may indicate a problem with a missing metadata.services.allowed_services field.
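A sketch of the collect-then-fetch-in-bulk step, assuming a hypothetical Fetcher callback that stands in for batch gets against the local database or other services, and a simplified Resolved map playing the role of the reference field on PermissionCheck. Duplicate names are fetched only once, and a fetch error (such as PermissionDenied from another service) is returned unchanged so it can be forwarded to the user.

```go
package main

// PermissionCheck is a hypothetical simplification: NeedsFetch marks checks
// whose conditions need the resource, Resolved is the reference field.
type PermissionCheck struct {
	Object     string
	NeedsFetch bool
	Resolved   map[string]string // field path -> value of the resolved resource
}

// Fetcher stands in for a bulk get against the database or another service.
type Fetcher func(names []string) (map[string]map[string]string, error)

// resolveResources collects every name needed by any check, fetches them in
// one batch, and attaches the results back to the checks.
func resolveResources(checks []*PermissionCheck, fetch Fetcher) error {
	seen := map[string]bool{}
	var names []string
	for _, c := range checks {
		if c.NeedsFetch && !seen[c.Object] {
			seen[c.Object] = true
			names = append(names, c.Object)
		}
	}
	if len(names) == 0 {
		return nil // nothing to fetch, no I/O at all
	}
	fetched, err := fetch(names)
	if err != nil {
		return err // e.g. PermissionDenied from another service is forwarded
	}
	for _, c := range checks {
		if c.NeedsFetch {
			c.Resolved = fetched[c.Object]
		}
	}
	return nil
}
```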

On their own, conditions are simple: they execute fast, without any I/O. We just check the request and the resolved resources and verify whether the specified conditions apply, according to the IAM Role Grant conditions.

AuthInfoProvider for Authorizer

AuthInfoProvider gets only a set of checks grouped by IAM scope (a project, an organization, a service, or the system if none of the above). As per the IAM specification, the service scope inherits all RoleBindings from the project that owns the service. If we need to validate permissions in the project scope, we must also accept RoleBindings from the parent organization (if set) and from its full ancestry path. RoleBindings in the system scope are valid in all scopes. Moreover, a single principal may have multiple member IDs (the native one with email, then domain, then allAuthenticatedUsers and allUsers). This creates many potential RoleBindings to check. Furthermore, keep in mind that the Authorizer is part of all API servers! As SPEKTRA Edge provides a framework for building third-party services, services can’t trust each other. Therefore, the AuthInfoProvider of whichever service it runs in can only ask for RoleBindings that this service is allowed to see (according to metadata.services.allowed_services).
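The fan-out of member IDs and scopes can be sketched as below. The user:/domain: member ID prefixes, the ScopeTree parent map (service to owning project, project to organization, organization to parent organization) and the empty string standing for the system scope are assumptions made for illustration; filtering by metadata.services.allowed_services is omitted.

```go
package main

import "strings"

// memberIDs expands a user principal into every member ID that RoleBindings
// may reference. The "user:" and "domain:" prefixes are assumed formats.
func memberIDs(email string) []string {
	ids := []string{"user:" + email}
	if at := strings.LastIndex(email, "@"); at >= 0 {
		ids = append(ids, "domain:"+email[at+1:])
	}
	return append(ids, "allAuthenticatedUsers", "allUsers")
}

// ScopeTree is hypothetical scope metadata: child scope -> parent scope.
type ScopeTree struct {
	Parent map[string]string
}

// relevantScopes returns the scope, all its ancestors and finally the
// system scope (""), whose RoleBindings apply everywhere.
func (t ScopeTree) relevantScopes(scope string) []string {
	var out []string
	for s := scope; s != ""; s = t.Parent[s] {
		out = append(out, s)
	}
	return append(out, "")
}

// lookupKey is one (member ID, scope) pair whose RoleBindings are needed.
type lookupKey struct{ Member, Scope string }

// lookupKeys builds the full cross product of member IDs and scopes.
func lookupKeys(email, scope string, t ScopeTree) []lookupKey {
	var keys []lookupKey
	for _, s := range t.relevantScopes(scope) {
		for _, m := range memberIDs(email) {
			keys = append(keys, lookupKey{m, s})
		}
	}
	return keys
}
```

For a service owned by a project in one organization, a single user already produces 4 member IDs times 4 scopes, that is 16 lookups, which is why caching matters.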

The IAM Controller copies organization-level RoleBindings to child sub-organizations and projects, but we don’t copy (at least yet) RoleBindings from the service project to the service. We also don’t copy system-level RoleBindings to existing projects and organizations. This should typically stay that way, because system-level RoleBindings are mostly internal and should not leak to organization or project admins. The module that copies RoleBindings is in the file iam/controller/v1/iam_scope/org_rbs_copier.go; it also handles changes of the parent organization field.

During authorization, AuthInfoProvider must list and fetch all RoleBindings for each memberId/IAM scope combination. It must also fetch only RoleBindings relevant to the current service. We first try the local cache; on a miss, we ask IAM. This is why CheckPermissions grabs all possible RoleBindings and filters them by subScope or role ID only later on. We strip all unnecessary fields so that AuthInfoProvider can hold as much data as possible in its RAM-based cache. Additionally, we use integer identifiers for roles and permission names where possible.

Holding the RoleBindings of one member ID takes roughly 2 KiB of data on average; for a whole principal, let’s say 4 KiB. One MiB can then hold data for 256 principals, and 256 MiB for about 65K principals. Dividing by two for a safety margin, we can expect 256 MiB to hold tens of thousands of active users. This is why AuthInfoProvider caches all RoleBindings a principal can have in each scope. We fetch data from IAM only when the cache expires, for new principals, or when the server starts up for the first time. This also explains the shape of GetAssignments (a method of the RoleBindings store).
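The capacity estimate above, written out with the rough 4 KiB per principal figure:

```latex
\begin{aligned}
\text{principals per MiB} &= \frac{1\,\text{MiB}}{4\,\text{KiB}} = \frac{1024\,\text{KiB}}{4\,\text{KiB}} = 256 \\
\text{principals in } 256\,\text{MiB} &= 256 \times 256 = 65\,536 \\
\text{with a } 2\times \text{ safety margin} &= 65\,536 / 2 = 32\,768
\end{aligned}
```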

When we have all RoleBindings for the relevant members and IAM scope, we can match each PermissionCheck (object + permission) against all assignments. If many assignments match a given PermissionCheck, it will have multiple entries in its Results variable.

RoleBindings (converted to RoleAssignment for slimmer RAM usage) are matched with permissions if:

they have owned_objects that match the object name in the PermissionCheck.
if the above fails, we check whether the Role pointed to by the RoleBinding has any Grants containing the permission specified in the PermissionCheck.
if there are such Grants, we check whether the subScope matches (if it is specified). PermissionCheck contains the IAM scope and sub-scope, which together form the full object name; this gives us granularity down to specific resources.
if we find a Grant matching the PermissionCheck, we store it in Results. Note that a Grant can carry conditions, but they are not evaluated yet at this point.
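Matching one assignment against one PermissionCheck can be sketched as below. Grant, Role, RoleAssignment, PermissionCheck and match are hypothetical simplifications; permissions use integer IDs, as mentioned earlier. Two further assumptions are made only for illustration: an owned-object match yields an unconditional result, and subScope matching is an exact comparison.

```go
package main

// Grant lists permission IDs, an optional subScope and its conditions.
type Grant struct {
	Permissions []uint32
	SubScope    string   // empty: applies to the whole IAM scope
	Conditions  []string // condition identifiers, not evaluated here
}

type Role struct{ Grants []Grant }

// RoleAssignment is a slim RoleBinding.
type RoleAssignment struct {
	RoleID       uint32
	OwnedObjects []string
}

// PermissionCheck: Scope and SubScope together form the full object name.
type PermissionCheck struct {
	Permission uint32
	Scope      string // e.g. "projects/p1"
	SubScope   string // e.g. "devices/d1"
	Results    []Grant
}

func (c *PermissionCheck) object() string {
	if c.SubScope == "" {
		return c.Scope
	}
	return c.Scope + "/" + c.SubScope
}

func containsPerm(perms []uint32, p uint32) bool {
	for _, x := range perms {
		if x == p {
			return true
		}
	}
	return false
}

// match appends every matching grant to c.Results.
func match(c *PermissionCheck, a RoleAssignment, roles map[uint32]Role) {
	// Owned objects first (assumed to grant access unconditionally).
	for _, owned := range a.OwnedObjects {
		if owned == c.object() {
			c.Results = append(c.Results, Grant{})
			return
		}
	}
	// Otherwise, look for grants of the role that contain the permission.
	for _, g := range roles[a.RoleID].Grants {
		if !containsPerm(g.Permissions, c.Permission) {
			continue
		}
		if g.SubScope != "" && g.SubScope != c.SubScope {
			continue // subScope specified but does not match
		}
		c.Results = append(c.Results, g) // conditions are evaluated later
	}
}
```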

Thanks to the cache, AuthInfoProvider does practically no I/O and can typically return the list of assigned permissions, together with their conditions, quickly.

ConditionChecker for Authorizer

Each PermissionCheck can have multiple results, any of which can grant the permission. If a result has no conditions, the permission is granted. If it has conditions, all of them must evaluate successfully; the Authorizer code iterates over them.

ConditionChecker is implemented in the file iam/auth/internal/condition_checker.go. We have three condition types:

checking by resource field, function checkByResourceField
checking by request field, function checkByRequestField
checking by CEL condition, function checkByCELCondition (to be retired).
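The dispatch over the three condition types might be sketched as follows. Condition, Fields and evaluate are hypothetical: messages are flattened into field path to value maps, and CEL evaluation is deliberately left out rather than approximated.

```go
package main

import "errors"

// ConditionKind mirrors the three checkers named above.
type ConditionKind int

const (
	ByResourceField ConditionKind = iota
	ByRequestField
	ByCELCondition
)

// Condition is a hypothetical equality condition on one field path.
type Condition struct {
	Kind      ConditionKind
	FieldPath string // e.g. "metadata.annotations.key"
	Value     string
}

// Fields is a flattened message: field path -> value.
type Fields map[string]string

// evaluate dispatches on the condition type; no I/O happens here, because
// resources were already resolved before evaluation.
func evaluate(c Condition, request, resolvedResource Fields) (bool, error) {
	switch c.Kind {
	case ByResourceField:
		v, ok := resolvedResource[c.FieldPath]
		return ok && v == c.Value, nil
	case ByRequestField:
		v, ok := request[c.FieldPath]
		return ok && v == c.Value, nil
	case ByCELCondition:
		return false, errors.New("CEL conditions are not covered by this sketch")
	default:
		return false, errors.New("unknown condition kind")
	}
}
```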

Resource conditions are the most popular, for good reason: they are simple and can handle at least full CRUD, and often custom methods too. For example, suppose we want to give certain users access to devices only if the field path metadata.annotations.key equals value:

CreateDeviceRequest is forbidden if the resource body does not set this field path to the given value.
UpdateDeviceRequest is forbidden if we try to update this field path to a different value, or if the current resource stored in the database does not match.
DeleteDeviceRequest checks whether the Device in the database matches.
For GetDevice and BatchGetDevices, resources are fetched from the database and the condition is checked.
WatchDevice is also checked when the stream starts: we grab the resource from the database and evaluate it.
ListDevices and WatchDevices have a Filter field, so we don’t need to grab anything from the database.
For custom methods, we can still get resources from the database and check whether the field path matches.
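For the Device annotation example, the per-method rules can be sketched like this. Device, AnnotationCondition and the allow helpers are hypothetical; the sketch assumes updates carry the full resource body (no field mask) and models a List/Watch filter as a simple map of equality constraints.

```go
package main

// Device keeps only the field the condition looks at.
type Device struct {
	Annotations map[string]string
}

// AnnotationCondition: metadata.annotations[Key] must equal Value.
type AnnotationCondition struct{ Key, Value string }

func (c AnnotationCondition) holds(d *Device) bool {
	return d != nil && d.Annotations[c.Key] == c.Value
}

// allowCreate: the new resource body must carry the required value.
func (c AnnotationCondition) allowCreate(body *Device) bool {
	return c.holds(body)
}

// allowUpdate: the stored resource must match, and the update must not
// change the field to a different value.
func (c AnnotationCondition) allowUpdate(stored, body *Device) bool {
	return c.holds(stored) && c.holds(body)
}

// allowStored covers Delete, Get, BatchGet, Watch start and custom methods:
// every resource fetched from the database must match.
func (c AnnotationCondition) allowStored(stored ...*Device) bool {
	for _, d := range stored {
		if !c.holds(d) {
			return false
		}
	}
	return true
}

// allowList: the filter itself must pin the field to the required value,
// so nothing is fetched from the database.
func (c AnnotationCondition) allowList(filter map[string]string) bool {
	v, ok := filter["metadata.annotations."+c.Key]
	return ok && v == c.Value
}
```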

We also support attach permissions with resource field conditions; if necessary, we fetch resources from other services. Fetching is done before conditions are evaluated.

A minor weakness is the need for extra database reads. The object may be stored in Redis, which may answer faster, but the request still goes through the network stack. Another RAM-based cache could be used for storage, but invalidation may be a problem if we want to cover List queries: for resource updates, we need to invalidate both the previous and the new state, and a Firestore watch shows only the new state. Mongo may be more beneficial in this case, especially since it has active watches for all collections. This may work especially for infrequently updated collections.

Checks by request field are simpler and typically aimed at custom methods.

Checks by CEL condition are used less and less in v1, but may still have special use cases where a YAML (protobuf) declaration is not enough. They use conditions whose bodies are specified in the iam.edgelq.com/Condition resource. ConditionChecker uses AuthInfoProvider to fetch Conditions from IAM.