Registering your Service to the SPEKTRA Edge platform

How to register your service to the SPEKTRA Edge platform.

While Goten provides a framework for building services, SPEKTRA Edge provides a ready environment with a set of common, pre-defined services. This document describes the specific registrations required from the developer; other services can and typically should be used with the standard API approach.

Integration with SPEKTRA Edge is enforced or strongly recommended on multiple levels:

  • Your service needs to register itself in meta.goten.com, otherwise it simply cannot work.
  • Your resources model must be organized around the following top resources:
    • meta.goten.com/Service
    • iam.edgelq.com/Organization
    • iam.edgelq.com/Project
      • For multi-tenant services, you need to declare your own Project resource in the api-skeleton.
  • You need to follow the authentication & authorization model of iam.edgelq.com.
  • Although you may skip it, it is highly recommended to use audit.edgelq.com to audit the usage of your service, and monitoring.edgelq.com to track its usage. Your service’s activities in core SPEKTRA Edge are monitored by those services.
  • If a service needs to control the number of resources created, limits.edgelq.com is highly recommended.

The above list contains the mandatory or highly recommended registrations, but practically all services are at your disposal. SPEKTRA Edge also provides edge devices with their own OS, where you can deploy your agent applications. Hardware and containers are managed using the devices.edgelq.com and applications.edgelq.com services.

An example service with a high level of registration: https://github.com/cloudwan/inventory-manager-example

This provides more insights into how custom services can be integrated with core SPEKTRA Edge services.

Fixtures controller

Before jumping into the individual SPEKTRA Edge registrations, let's cover one element common to all of them: the fixtures controller.

The fixtures controller is responsible for creating & updating resources in various services that are needed for:

  • Correct operation of a SPEKTRA Edge Service. Example: the iam.edgelq.com service needs a list of permissions from each service, describing what users can do in that service. If permissions are not maintained in IAM, SPEKTRA Edge cannot help with Authorization, rendering the Service non-operable. As part of bootstrapping a Service, Permission fixtures must be submitted by the interested Service.

  • Correct operation of a Project or Organization.

    Example: The user who created a given Project/Organization automatically gets an administrator RoleBinding resource in the created Project or Organization. Without it, the creator of a Project/Organization would not be able to access their entity. It would render it non-operable.

Some fixtures are a bit more dynamic. For example, when an existing Project enables a particular service, that Service automatically gets a RoleBinding in the project, which allows the Service to manage its own resources there. Without it, the Service would not be able to serve the project, rendering it non-operable.

Those cases are handled by the fixtures controller; by convention, the fixtures controller is part of the controller runtime.

Be aware that the fixtures controller does not only keep resources in sync by creating/updating them. It also detects UNNEEDED fixtures: resources that exist but are not defined, which are then deleted. This garbage cleanup is necessary, as leftover resources also have the potential to make the Service/Project/Organization non-operable and full of errors.

The fixtures controller works in this way: it computes a DESIRED set of resources, then uses CRUD to get the observed state, compares it with the desired one, and finally executes a set of Create/Update/Delete calls as necessary. If there is a dynamic change in the desired state, the controller computes & executes a new set of commands. If there is a dynamic change in the observed state, the fixtures controller will attempt to fix it.
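
Conceptually, this reconciliation can be pictured with a small sketch (an illustration only, not the actual Goten fixtures controller code; the Fixture type and names here are made up):

package main

import "fmt"

// Fixture is a simplified stand-in for a real resource (illustration only).
type Fixture struct {
    Name string
    Spec string
}

// reconcile compares the desired set against the observed set and returns
// the operations the fixtures controller would need to execute.
func reconcile(desired, observed map[string]Fixture) (creates, updates, deletes []string) {
    for name, want := range desired {
        got, exists := observed[name]
        switch {
        case !exists:
            creates = append(creates, name)
        case got.Spec != want.Spec:
            updates = append(updates, name)
        }
    }
    for name := range observed {
        if _, wanted := desired[name]; !wanted {
            // Present in the observed state but not desired: garbage, delete it.
            deletes = append(deletes, name)
        }
    }
    return creates, updates, deletes
}

func main() {
    desired := map[string]Fixture{"roles/admin": {Name: "roles/admin", Spec: "v2"}}
    observed := map[string]Fixture{
        "roles/admin": {Name: "roles/admin", Spec: "v1"},
        "roles/old":   {Name: "roles/old", Spec: "v1"},
    }
    fmt.Println(reconcile(desired, observed)) // [] [roles/admin] [roles/old]
}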

Fixtures are a set of YAML files in the fixtures directory. They are either completely static or templated (they contain <VARIABLE> elements). Templated fixtures are created FOR EACH Project, Organization, or Service (typically, but not limited to those). Those “for each” fixtures provide a source of dynamic updates to the desired state.

Fixtures are built into the controller image during compilation. The config file then decides the rest, for example how variables are resolved. See a basic fixtures controller config in: https://github.com/cloudwan/inventory-manager-example/blob/master/config/controller.proto.

For fixtures, it is necessary to import the access package for every related resource type. For example, see https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go, and find the import types needed by the fixtures controller. This list must include all resource types the controller can create as fixtures OR requires as inputs (via the forEach directive in the config file).

During each registration, we will discuss the various fixtures:

  • For IAM registration, we will define some static fixtures
  • For adding projects, we will show examples of “synchronized collections”, which are dynamic fixtures.
  • For Monitoring registration, we will show some static and dynamic (per project) fixtures.
  • For Logging registration, we will show again some per-project fixtures.
  • For Limits, we use plans as static fixtures.

What your service is authorized to do

Your service uses an IAM ServiceAccount, which will have its own assigned RoleBindings. For any SPEKTRA Edge-based service, you will be allowed to:

  • Do anything in your service namespace: services/{service}/. Your ServiceAccount will be marked as the owner, so you will be able to do anything there. This applies to resources under your Service scope in all services, including core SPEKTRA Edge.
  • Do anything in the root scope (/), AS LONG AS permissions are related to your service. So for example, if your ServiceAccount wants to execute some Create for a resource belonging to your service, it will be able to. But not in other services, and especially not in core SPEKTRA Edge services.
  • Your ServiceAccount can do some things in projects/$SERVICE_PROJECT_ID, depending on the role assigned when you made the initial service reservation, as described in the preparation section.
  • For core SPEKTRA Edge services, you will have read access to all resources in the root scope (/), as long as they satisfy the following filter condition: metadata.services.allowedServices CONTAINS $YOUR_SERVICE.
  • For core SPEKTRA Edge services, you will have write access to all resources in the root scope (/), as long as they satisfy the following filter condition: metadata.services.owningService == $YOUR_SERVICE.
  • You will be able to create resources in projects that enable your service in their enabled_services field, but for core SPEKTRA Edge services' resources you will have to specify metadata.services.owningService = $YOUR_SERVICE.

Keep these rough permissions in mind when you start making requests from your service. These limitations are reflected in various examples (for example, when you create a ServiceAccount for a project, you need to specify proper metadata.services).

IAM registration

Introduction

The iam.edgelq.com service handles all actor collections (Users, ServiceAccounts, Groups), tenants (Organizations, Projects), and permission-related resources (Permissions, Roles), and finally binds actors with permissions within tenant scopes (RoleBindings).

The only tenant-type resource not in iam.edgelq.com is Service, which resides in meta.goten.com. It is still treated as a kind of tenant from the IAM point of view.

The primary point of registration between IAM and any SPEKTRA Edge-based service is permission-related. Permissions are generated for all services, for each API method (in any API group). The typical permission name format is: services/$SERVICE_NAME/permissions/$COLLECTION_NAME.$ACTION_VERB. If a method has no resource configured in the API skeleton (no opResourceInfo value!), then the permission name has this format: services/$SERVICE_NAME/permissions/$ACTION_VERB.

The variable $SERVICE_NAME is naturally a service name in domain format, $COLLECTION_NAME is the lowerPluralCamelJson form of the resource collection (examples: roleBindings, devices…), and finally $ACTION_VERB is equal to the verb value of the method in the api-skeleton file. For example, the action CreateRoleBinding operates on the roleBindings collection, the verb is create, and the service where the action is defined is iam.edgelq.com. Therefore, the permission name is services/iam.edgelq.com/permissions/roleBindings.create.
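
As a quick illustration of this naming scheme (a sketch only; the real permission names are produced by the IAM protoc plugin, not by this helper):

package main

import "fmt"

// permissionName builds a permission name following the convention described
// above. collection may be empty for methods without opResourceInfo.
func permissionName(service, collection, verb string) string {
    if collection == "" {
        return fmt.Sprintf("services/%s/permissions/%s", service, verb)
    }
    return fmt.Sprintf("services/%s/permissions/%s.%s", service, collection, verb)
}

func main() {
    // CreateRoleBinding in iam.edgelq.com:
    fmt.Println(permissionName("iam.edgelq.com", "roleBindings", "create"))
    // Output: services/iam.edgelq.com/permissions/roleBindings.create
}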

Another popular permission type is the “attach” kind. Even if the permission holder can create/update a resource, when that resource has references to different ones, authorization must also validate that the actor can create the reference relationship. For example, the caller can create a RoleBinding thanks to the services/iam.edgelq.com/permissions/roleBindings.create permission, but referencing a Role requires that the holder also has the permission services/iam.edgelq.com/roles!attach.

You should already be familiar with the IAM model from its README.

What is provided during generation

During Service code generation, the IAM protoc plugin analyzes a service and collects all permissions that need to exist. It creates a file in the fixtures directory named <service_short_name>.pb.permissions.yaml. Apart from that, it also generates Authorization middleware specifically for your server.

Authorization middleware extracts WHAT for each call:

  • Collection (typically parent field) for collection type methods (isCollection = true in API-skeleton)
  • Resource name (typically name field) for single resource non-collection type methods (isCollection and isPlural = false)
  • Resource names (typically the names field) for plural non-collection methods (BatchGet, for example; isCollection is false, isPlural is true).

To get this WHAT, it by default uses the values provided in the API skeleton: the opResourceInfo.requestPaths param in an Action declaration (note that CRUD methods have implicit built-ins). It gets the authenticated principal from the current context object (associated with the call) and attaches the permission related to the current call. It then uses the generic Authorizer component to verify whether the request should pass or be denied.
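
Conceptually, the generated middleware does something like the sketch below (all type, interface, and method names here, such as Authorizer and Check, are hypothetical illustrations under the above assumptions, not the actual generated code):

package authzsketch

import "context"

// ListDevicesRequest is a toy request type for an isCollection method.
type ListDevicesRequest struct{ Parent string }

// Authorizer stands in for the generic component that checks permissions.
type Authorizer interface {
    Check(ctx context.Context, scope, permission string) error
}

type authorizationMiddleware struct {
    authorizer Authorizer
}

func (m *authorizationMiddleware) authorizeListDevices(ctx context.Context, req *ListDevicesRequest) error {
    // WHAT: for an isCollection method, the scope comes from the "parent" field.
    // WHO: the authenticated principal is already attached to ctx by earlier middleware.
    return m.authorizer.Check(ctx, req.Parent, "services/custom.edgelq.com/permissions/devices.list")
}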

Minimal registration required from developers

This whole registration works almost out of the box. The minimal elements to provide are:

  • Developers need to create an appropriate main.go file for the server, with Auth-related modules. In the constructor for the main service server handlers, Authorization middleware must be added to the chain, all according to the InventoryManager example.
  • Developers are highly recommended to write their own role fixtures for their service (a static fixture). Roles are necessary to bind users with permissions and should be well thought out. InventoryManager has basic roles for users and examples of limited roles for the agent application, with access to clearly defined resources within the tenant project. Although a generated fixture called <service_short_name>.pb.default.roles.yaml is provided, the roles there are very limited and usually a “bad guess”. Usually, we create a file called <service_short_name>_roles.yaml for manually written ones.
  • Developers must configure at minimum two fixture files: <service_short_name>_roles.yaml (or <service_short_name>.pb.default.roles.yaml), then <service_short_name>.pb.permissions.yaml.

The fixtures controller registration requires two parts. First, in the main.go file for the controller, it is required to import github.com/cloudwan/edgelq/iam/access/v1/permission and github.com/cloudwan/edgelq/iam/access/v1/role. Those packages contain modules used by the fixtures controller framework provided by Goten/SPEKTRA Edge. The fixtures controller analyzes YAML files and tries to find the associated types in the global registry; without these imports, the program will crash.

Second, in the config file of the controller, you need to define the fixture file paths. You can copy-paste them from the inventory manager example, like:

fixtureNodes:
  global:
    manifests:
    - file: "/etc/lqd/fixtures/v1/inventory_manager.pb.permissions.yaml"
      groupName: "inventory-manager.edgelq.com/Permissions/CodeGen"
      parent: "services/inventory-manager.edgelq.com"
    - file: "/etc/lqd/fixtures/v1/inventory_manager_roles.yaml"
      groupName: "inventory-manager.edgelq.com/Roles"
      parent: "services/inventory-manager.edgelq.com"

It will be mentioned in the deployment document, but by convention, the fixtures directory is placed in the /etc/lqd path.

Two notes:

  1. groupName is mandatory and generally should be unique. This helps when there is more than one fixture file for the same resource type, to ensure they don’t clash. Still, resource names must also be unique.
  2. The parent field is mandatory in this particular case too; with it, the fixtures controller gets a guarantee that all Roles and Permissions have exactly the same parent resource, services/inventory-manager.edgelq.com (in this case). Note that a Service only has access to the scopes it owns. Without this parent value specified, we would get a PermissionDenied error. We would also get a PermissionDenied error if, in the fixture file, we attempted to create a Role or Permission with a different parent.

Using this example, we should clarify yet another thing: the fixtures controller not only creates/updates resources that are defined in the fixtures, it also DELETES those that are not defined within fixtures. This is why we have groupName and parent. For example, if there were a Role whose groupName equals inventory-manager.edgelq.com/Roles and whose parent equals services/inventory-manager.edgelq.com, but it does not exist within the fixture file /etc/lqd/fixtures/v1/inventory_manager_roles.yaml, it WOULD BE DELETED. This is why the groupName and parent params play an important role here, and why we would get PermissionDenied without parent. The fixtures controller always gets the observed state to compare against the desired one. This observed state is obtained using regular CRUD, and this is why we need to specify a parent for Roles/Permissions: the service will not be authorized if it tries to get resources from ALL services.

So far we have explained the mandatory part of IAM registration. The first common additional registration, although a very small one, is to declare some actions of a Service public. An example is here: https://github.com/cloudwan/inventory-manager-example/blob/master/fixtures/v1/inventory_manager_role_bindings.yaml

We grant some public role to all authenticated users, regardless of who they are (but they are users of our service). This requires a separate fixture entry and an import of the RoleBinding access package in main.go.

More advanced IAM registration

In this area, two extra things are offered:

  1. IAM provides a way to OVERRIDE the generated Authorization middleware. Developers can define additional protobuf files with special annotations in their proto/$VERSION directory, which will be merged over the generated/assumed defaults.
  2. Some fields in resources can be considered sensitive from a reading or writing perspective. Developers can define custom IAM permissions that must be owned to write to or read from them. Permissions and protected fields can be defined in protobuf files.

Starting with the first part, overriding Authorization defaults: by convention, we create an authorization.proto file along with the others. Some simple examples follow.

The example service provides a first basic case: to disable Authorization altogether for a given action, you just need to set the skip_authorization annotation flag for a specific method, in a specific API group. Since this example is a little too simple, more interesting examples from Audit and IAM are provided instead.

For example, take the ListActivityLogs method:

{
  name : "ListActivityLogs"
  action_checks : [ {
    field_paths : [ "parents" ]
    permission : "activityLogs.list"
  } ]
}

There is an important detail with this particular method: SPEKTRA Edge code generation supports collection, single-resource, or multi-resource request types. However, ListActivityLogsRequest has a plural parents field, because we enable users to query multiple collections at once. This is a kind of “plural collection” type, but such an annotation does not exist in the api-skeleton. There is, however, some level of enhancement: we can explicitly tell IAM to use the “parents” field path, and it will authorize all individual paths from this field. If the user does not have access to any one of the parents, they will receive a PermissionDenied error.

It is also possible to provide multiple field paths (but only one will be used).

Another interesting example is CreateProject:

{
  name : "CreateProject"
  action_checks : [ {
    field_paths : [ "project.parent_organization" ]
    permission : "projects.create"
  } ]
}

In the api-skeleton, Project and Organization are both “top” resources. Their name patterns are: projects/{project} and organizations/{organization}. Judging by these, creating a project should require permission on the system level, and the same for an organization. However, in practice we want projects to be final tenants and organizations to be intermediaries. Note that Organization and Project resources have a parent_organization field. Especially for Organization resources, it is not possible to specify that the parent of an Organization is an “Organization”; the name pattern cannot be like organizations/{organization}/organizations/{organization}/.... Therefore, from a naming perspective, both projects and organizations are considered “top” resources. However, when it comes to creation, the IAM Authorization middleware should make an exception and take the authorization scope object (WHERE) from a different field path; in the case of CreateProject, it must be project.parent_organization. This changes the generated Authorization code for CreateProject, and permission is required in the parent organization scope instead.

To declare sensitive fields in resources, it is necessary to use the annotations.iam.auth_checks annotation. There are no examples in InventoryManager at the moment, but there are some in secrets.edgelq.com:

As of now, there is:

option (annotations.iam.auth_checks) = {
  read_checks : [
    {permission : "mask_encrypted_data" paths : "enc_data"},
    {permission : "secrets.sensitiveData" paths : "data"}
  ]
};

Note that you also need to include the edgelq/iam/annotations/iam.proto import in the resource proto file.

When a secret is being read, additional permissions may be checked:

  • services/secrets.edgelq.com/permissions/mask_encrypted_data: if denied, the enc_data field path will be cleared from the response object.
  • services/secrets.edgelq.com/permissions/secrets.sensitiveData: if denied, the data field path will be cleared from the response object.

These read checks apply to all methods whose responses contain resource bodies; therefore, even UpdateSecret or CreateSecret responses would have fields cleared. Mostly, however, this is used to clear values from List/Search/Get/BatchGet responses.
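
Conceptually, the effect of read_checks can be pictured with this sketch (the Secret struct and function here are illustrative only; the real generated code operates on protobuf field paths and the Authorizer):

package secretsmasksketch

// Secret is a toy stand-in for the real resource (illustration only).
type Secret struct {
    EncData map[string][]byte
    Data    map[string]string
}

// maskSecretForReader clears sensitive fields the caller is not allowed to read.
func maskSecretForReader(s *Secret, canReadEncData, canReadData bool) {
    if !canReadEncData {
        s.EncData = nil // permission mask_encrypted_data denied
    }
    if !canReadData {
        s.Data = nil // permission secrets.sensitiveData denied
    }
}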

The set_checks param works just like read_checks, but in reverse: it guards writing to the specified field paths.

Note that you can specify multiple paths.

Users are generally free to pick any permission name for set/read checks, but it is recommended to follow the secrets.sensitiveData style rather than mask_encrypted_data.

The full document about IAM-related protobuf annotations can be accessed here: https://github.com/cloudwan/edgelq/blob/main/iam/annotations/iam.proto.

Adding projects (tenants) to the service

For multi-tenant cases, it is recommended to copy Project resources from iam.edgelq.com into the 3rd party service. You need to declare your own Project resource in the api-skeleton. This copying, or syncing, was already mentioned in a few places in the developer guide as collection synchronization.

A service based on SPEKTRA Edge should copy only those projects that enable that particular service (in the enabled_services list). Note that services based on SPEKTRA Edge can only see (filter) the projects/organizations that are using them.

Once the project's copy is in the service database, the project is assumed to be able to use that service. If a project removes the service from its enabled list, its copy is removed from the service database (garbage collection).

An example of this registration is in InventoryManager. The integration steps are as follows.

Let’s copy and paste part of the config and discuss it more:

fixtureNodes:
  global:
    manifests:
    - file: "/etc/lqd/fixtures/v1/inventory_manager_project.yaml"
      groupName: "inventory-manager.edgelq.com/Projects"
      createForEach:
      - kind: iam.edgelq.com/Project
        version: v1
        filter: "enabledServices CONTAINS \"services/inventory-manager.edgelq.com\""
        varRef: project
      withVarReplacements:
      - placeholder: <project>
        value: $project.Name.ProjectId
      - placeholder: <multiRegionPolicy>
        value: $project.MultiRegionPolicy
      - placeholder: <metadataLabels>
        value: $project.Metadata.Labels
      - placeholder: <metadataServices>
        value: $project.Metadata.Services
      - placeholder: <title>
        value: $project.Title

As always, we need to provide file and groupName variables. Note that the resource we are creating in this fixture belongs to our service: inventory-manager.edgelq.com/Project. Because it is ours, the service does not need an additional parent or filter to be authorized correctly, so those parameters are not necessary here.

We have some new elements, though; the first is the createForEach directive. It instructs the controller to create the fixtures defined in the mentioned file for each combination of input resources. In this case, we have one input resource, of type iam.edgelq.com/Project, in version v1. Our service cannot list all IAM projects, but it can list those that enable our service, therefore we pass the proper filter param. Besides, we should create project copies only for projects interested in our service anyway. Each instance of iam.edgelq.com/Project is remembered as the project variable (as indicated by varRef).

When fixtures are evaluated from the file /etc/lqd/fixtures/v1/inventory_manager_project.yaml for each IAM project, all variables need to be replaced so that the final YAML is produced. The example above should be relatively self-explanatory. You may note, however, that you can extract IDs from names and take full objects (fixture variables are not limited to primitives), maps, or slices.
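
For intuition, the substitution roughly behaves like the following sketch (an illustration only; the real controller resolves structured values such as whole objects and maps, not just strings, and the project data below is made up):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Hypothetical iam.edgelq.com/Project instance captured as the "project" variable.
    projectName := "projects/seattle-port"
    projectTitle := "Seattle Port"

    // Fragment of a fixture template, as it might appear in inventory_manager_project.yaml.
    template := "name: projects/<project>\ntitle: <title>"

    rendered := strings.NewReplacer(
        "<project>", strings.TrimPrefix(projectName, "projects/"), // $project.Name.ProjectId
        "<title>", projectTitle, // $project.Title
    ).Replace(template)

    fmt.Println(rendered)
    // name: projects/seattle-port
    // title: Seattle Port
}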

There is, however, one more important aspect: project admins cannot by default add your service to their enabled list. This prevents attaching a private service to a project against the service maintainer's wishes. To allow someone to create/update a project/organization using your service, you will need to create a RoleBinding:

cuttle iam create role-binding \
  --service $YOUR_SERVICE \
  --role 'services/iam.edgelq.com/service-user' \
  --member $ADMIN_OF_ORGS_AND_PROJECTS

From now on, the provided user can create new organizational entities that use your service.

Audit registration

Overview

SPEKTRA Edge provides a LogsExporter component, which is part of observability. It records selected API calls (unary and streams) and submits them to audit.edgelq.com. All activity or resource change logs are classified as service, organization, or project scoped. Out of these three, the service scope is the default, used when the method call is classified as neither project nor organization scoped.

Scope classification is relatively simple: when a unary request arrives, the logs exporter analyzes the request, extracts the resource name(s) and collection, and decides the scope of the request (project, organization, or service). Resource change logs are submitted just before the transaction is concluded; if the logs cannot be sent, the transaction fails. This ensures that we always track at least resource change logs. Activity logs are submitted within seconds after the request finishes, which allows some degree of message loss. In practice, it does not happen often.
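
Conceptually, the scope decision can be pictured like this (a simplification; the real exporter works from the extracted resource names and collections):

package main

import (
    "fmt"
    "strings"
)

// auditScope returns the scope a log would be classified under, based on the
// resource name extracted from the request. Simplified illustration only.
func auditScope(resourceName string) string {
    switch {
    case strings.HasPrefix(resourceName, "projects/"):
        return "project"
    case strings.HasPrefix(resourceName, "organizations/"):
        return "organization"
    default:
        return "service" // default when neither project nor organization applies
    }
}

func main() {
    fmt.Println(auditScope("projects/tenant1/devices/dev0")) // project
    fmt.Println(auditScope("services/custom.edgelq.com"))    // service
}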

For streams, Audit examines client and server messages before deciding what the activity logs should look like.

Resource change logs are submitted based on the transaction lifespan, regardless of the gRPC method streaming kind.

Minimal registration

Audit requires minimal effort from developers to include in its default form. They just need to add a little initialization to the main.go file of the server runtime, as in the example InventoryManager service. You can see it in https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagerserver/main.go.

Find the following strings:

  • NewAuditStorePlugin must be added to the store handle. It is a plugin that observes changes in the DB.
  • InitAuditing initializes the Audit Logs exporter that your server will use. You need to pass all relevant (code-generated) handlers.

Audit handlers are code-generated based on method annotations (therefore, the api-skeleton normally decides). The defaults are the following:

  • Api-skeleton annotations opResourceInfo.requestPaths and opResourceInfo.responsePaths are used to determine what field paths in request/response objects contain values that would be interesting from an Audit point of view.
  • Audit by default focuses on auditing all writing calls. It checks the api-skeleton annotation withStoreHandle in each action. If the transaction type is SNAPSHOT or MANUAL, the call will be logged, otherwise not.
  • By default, activity logs will always be classified as some kind of write. Other kinds require manual configuration.

From this point on, Audit will work, and service developers will be able to query for logs from their service. Let’s discuss the possible customizations.

Customizations on the proto files level

The full set of proto customizations can be found here: https://github.com/cloudwan/edgelq/blob/main/audit/annotations/audit.proto. You will need to include the edgelq/audit/annotations/audit.proto import to use any audit annotations.

The most common customization is the categorization of write activities for a resource. Activity logs have categories: Operations, Creations, Deletions, Spec Updates, State Updates, Meta Updates, Internal, Rejected, Client and Server errors, Reads.

Note that there are quite a few write categories: creations, deletions, and three different update kinds. Creations and deletions are easy to classify, but updates are not so much. When a resource is updated, the Audit Logs exporter compares the old and new objects and determines which fields changed and which did not. To determine the update kind, it needs to know which fields are related to spec, which to state, and which to meta. This has to be defined within the resource protobuf definition.

It is like the following:

message ResourceName {
  option (ntt.annotations.audit.fields) = {
    spec_fields : [
      "name",
      "spec_field",
      "other_spec_field"
    ]
    state_fields : [ "some_state_field" ]
    meta_fields : [ "metadata", "other_meta" ]
    hidden_fields : [ "sensitive_field", "too_big_field" ]
  };
}

We must classify all fields. Normally, we put “name” as a spec field and “metadata” as a meta field. Other choices are up to the developer. On top of spec/state/meta, we can also hide some fields from Audit entirely (especially if they are sensitive, or big and we want to minimize log sizes).

Note that hidden_fields can also be defined for any message, including request/response objects. An example from SPEKTRA Edge: https://github.com/cloudwan/edgelq/blob/main/common/api/credentials.proto. See the annotations for ServiceAccount: we are hiding private key objects, for example, as these would be too sensitive to include in Audit logs. Be aware of what is being logged!

You can define field specifications at the resource level, or on any nested object too.

Going back to update requests: spec updates take precedence over state, and state over meta. Therefore, if we detect an update that modifies one meta, two state, and one spec field, the update is classified as a spec update.
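
A sketch of that precedence rule (illustration only, not the actual exporter code):

package main

import "fmt"

// classifyUpdate picks the Activity log category for an update, given which
// classified fields changed. Spec wins over state, state wins over meta.
func classifyUpdate(changedSpec, changedState, changedMeta bool) string {
    switch {
    case changedSpec:
        return "Spec Updates"
    case changedState:
        return "State Updates"
    case changedMeta:
        return "Meta Updates"
    default:
        return "unclassified (illustration stops here)"
    }
}

func main() {
    // One meta, two state, and one spec field changed: classified as a spec update.
    fmt.Println(classifyUpdate(true, true, true)) // Spec Updates
}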

Another customization developers may find useful is the ability to attach labels to activity/resource change logs. Those logs can be queried (filtered) by service, method name, API version, the resource name/type on which the method operates (or which changed), category, and request ID… However, resource change and activity logs also have a “labels” field, which is a generic map of strings. It can hold any labels defined by developers. The most common way of defining labels is in request/response objects:

message ActionNameRequest {
  option (ntt.annotations.audit.fields) = {
    labels : [
      { path : "field_a", key: "label_a" },
      { path : "field_b", key: "label_b" }
    ]
    
    promoted_labels : [
      { label_keys : [ "label_a" ] }
    ]
  };
  
  string field_a = 1;
  
  string field_b = 2;
}

With this, you can start querying Activity logs like:

{parents: ["projects/tenant1"], filter: "service.name = \"custom.edgelq.com\" AND labels.field_a = \"V1\""}

The query above will also be optimized (an index will be created according to the promoted_labels value).

Note that each promoted label set also requires the service name and parent to be indexed!

Apart from field customization, developers can customize how the Audit Logs Exporter handles method calls. We typically create the file auditing.proto in the proto/$VERSION directory of a given service. There we declare the file-level annotation ntt.annotations.audit.service_audit_customizations.

Examples in SPEKTRA Edge:

Starting with the devices service, take for example the ProvisionDeviceViaPolicy method of ProvisioningPolicyService. As of now, we have annotations like:

{
  name : "ProvisionDeviceViaPolicy"
  activity_type : WriteType
  response_resource_field_paths : [ "device.name" ]
}

The ProvisionDeviceViaPolicy method has this in the api-skeleton:

actions:
- name: ProvisionDeviceViaPolicy
  verb: provision_device_via_policy
  withStoreHandle:
    readOnly: false
    transaction: SNAPSHOT

By default, opResourceInfo has these values for the action:

opResourceInfo:
  name: ProvisioningPolicy # Because this action is defined for this resource!
  isCollection: false      # Default is false
  isPlural: false          # Default is false
  # For single resource non-collection requests, defaults for paths are determined like below:
  requestPaths:
    resourceName: [ "name" ]
  responsePaths: {}

You can find request/response object definitions in: https://github.com/cloudwan/edgelq/blob/main/devices/proto/v1/provisioning_policy_custom.proto

This method primarily operates on the ProvisioningPolicy resource, and the exact resource can be extracted from the “name” field in the request. By default, Audit would decide that the primary resource for Activity logs for these calls is ProvisioningPolicy. The following Audit specification would be implicitly assumed:

{
  name : "ProvisionDeviceViaPolicy"
  activity_type : WriteType                 # Because withStoreHandle api-skeleton annotation tells it is a SNAPSHOT
  request_resource_field_paths : [ "name" ] # Because this is what requestPaths api-skeleton annotation tells us.
}

However, we know that this method takes the ProvisioningPolicy object, but creates a Device resource, and the response object contains the Device instance. To ensure that the field resource.name in Activity logs points to a Device, not ProvisioningPolicy, we write that response_resource_field_paths should point to device.name.

To still be able to query Activity logs by ProvisioningPolicy, we also attach an annotation to the request object:

option (annotations.audit.fields) = {
  labels : [ {key : "provisioning_policy_name" path : "name"} ]
};

This is one example of modifying the default behavior.

We can also disable auditing for particular methods entirely. Again, in auditing.proto for the Devices service you may see:

{
  name : "DeviceService"
  methods : [ {name : "UpdateDevice" disable_logging : true} ]
},

The reason in this case is that, as of now, all devices send UpdateDevice every minute. To avoid too many requests to Audit, we have disabled this for now, until a solution is found (perhaps this part is already gone from the devices auditing file).

In the auditing.proto file for the Proxies service ( https://github.com/cloudwan/edgelq/blob/main/proxies/proto/v1/auditing.proto ), you may see something different too:

{
  name : "BrokerService"
  methods : [
    {name : "Connect" activity_type : OperationType},
    {name : "Listen" activity_type : OperationType}
  ]
}

In the Broker API in the api-skeleton, you can see that Connect and Listen are streaming calls: Listen is used by an edge agent to provide access to other actors, and Connect is used by an actor to connect to an edge agent. Those calls are non-writing and therefore would not be audited by default. To force auditing and classify them as the Operation kind, we specify this directly in the auditing file.

A final example that is good to see, is the auditing file for monitoring: https://github.com/cloudwan/edgelq/blob/main/monitoring/proto/v4/auditing.proto.

First, you can see that we classify some resources as INTERNAL types, like RecoveryStoreShardingInfo. It means that any writes to these resources are not classified as writes, but as “internal”. This changes the category in Activity logs, making them easier to filter out. Finally, we enable read auditing for the ListTimeSeries call:

{
  name : "TimeSerieService"
  methods : [ {
    name : "ListTimeSeries"
    scope_field_paths : [ "parent" ]
    activity_type : ReadType
    disable_logging : false
  } ]
}

Before finishing, it is worth mentioning that we have some extra customizations in the code for ListTimeSeries calls.

Customizations of Audit in Golang code

There is a package github.com/cloudwan/edgelq/common/serverenv/auditing with some functions that can be used.

The most common examples can be summarized like this:

package some_server

import (
    "google.golang.org/grpc/status"

    "github.com/cloudwan/edgelq/common/serverenv/auditing"
)

func (srv *CustomMiddleware) SomeStreamingCall(
    stream StreamName,
) error {
    ctx := stream.Context()

    firstRequestObject, err := stream.Recv()
    if err != nil {
        return status.Errorf(status.Code(err), "Error receiving first client msg: %s", status.Convert(err).Message())
    }

    // Let's assume that the request contains a project ID, but it is somehow encoded AND not available
    // from a field in a straightforward way. Because of this, we cannot provide a protobuf annotation.
    // We can do it from code, however:
    projectId := firstRequestObject.ExtractProjectId()
    auditing.SetCustomScope(ctx, "projects/"+projectId) // Now we ensure this is where the log parent is.

    // We can also set some custom labels, because these were not available as any direct fields.
    // However, to have it working, we still need to declare the labels in protobuf:
    //
    // message StreamNameRequest {
    //  option (ntt.annotations.audit.fields) = {
    //    labels : [ { key: "custom_label" } ]
    //  };
    // }
    //
    // Note we specify only the key, not the path! But if we do this, we can then do:
    auditing.SetCustomLabel(ctx, "custom_label", firstRequestObject.ComputeSomething())

    // Now, we want to inform the Audit Logs Exporter that this stream is exportable. If we did not do this,
    // then Audit would export Activity logs only AFTER THE STREAM FINISHES (when this function exits!). If this
    // stream is long-running (like several minutes, or maybe hours), then that may not be the best option.
    // It would be better to send Activity logs NOW. However, be aware that you should not make
    // any SetCustomLabel or SetCustomScope calls after exporting the stream - the activity log is "concluded"
    // and labels can no longer be modified. New activity log events may still be appended for each
    // client and server message though!
    auditing.MarkStreamAsExportable(ctx)

    firstServerMsg := srv.makeFirstResp(stream, firstRequestObject)
    if err = stream.Send(firstServerMsg); err != nil {
        return status.Errorf(status.Code(err), "Error sending first server msg: %s", status.Convert(err).Message())
    }

    // There may be multiple Recv/Send calls here ...

    return nil
}

By default, Activity logs record all client/server messages; each is represented as an Activity Log Event object appended to the existing Activity Log. This may not always be the best choice if objects are large. For example, for ListTimeSeries, which is audited, we don’t need responses: the request object contains elements like filter or parent, so we can predict/check what data was returned from monitoring. In such a case, we can disable appending to the ActivityLog (also, ListTimeSeriesResponse can be very large!):

func (r *ListTimeSeriesResponse) AuditShouldRecord() bool {
	return false
}

The AuditShouldRecord function can be defined for any request/response object. The Audit Logs Exporter checks whether they implement this method and acts accordingly.

We can also sample logs; we do this for ListTimeSeries. Since this method is executed quite often, we don’t want too many activity logs for it. We implemented the following functions for the request object:

func (r *ListTimeSeriesRequest) ShouldSample(
	ctx context.Context,
	sampler handlers.Sampler,
) bool {
	return sampler.ShouldSample(ctx, r)
}

func (r *ListTimeSeriesRequest) SamplingKey() string {
	// ... Compute value and return
}

First, we need to implement ShouldSample, which receives the default sampler. If ShouldSample returns true, the activity is logged. The default sampler requires the object to implement SamplingKey() string. It ensures that “new” requests are logged, rather than requests similar to previous ones (at least until the TTL expires or the cache loses the entry).
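
For intuition, a sampling key for some hypothetical request type could be computed like this (the QueryItemsRequest type is made up; this is not the actual ListTimeSeries implementation):

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// QueryItemsRequest is a hypothetical request type used only for illustration.
type QueryItemsRequest struct {
    Parent string
    Filter string
}

// SamplingKey groups similar requests together: requests with the same parent
// and filter share a key, so only the first of them (per TTL) produces an activity log.
func (r *QueryItemsRequest) SamplingKey() string {
    sum := sha256.Sum256([]byte(r.Parent + "|" + r.Filter))
    return hex.EncodeToString(sum[:])
}

func main() {
    r := &QueryItemsRequest{Parent: "projects/tenant1", Filter: "metric.type = \"cpu\""}
    fmt.Println(r.SamplingKey())
}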

Also, if some streaming calls are heavy (like downloading a multi-GB image), make sure these requests/responses are not logged at all! Otherwise, Audit storage may grow very large.

Monitoring registration (and usage notes)

Monitoring is a simpler case than IAM or Audit. Unlike them, it does not integrate at the protobuf level and does not inject any code. The common registration is via metric/resource descriptors, followed by periodic time series submission.

It is up to the service to decide whether it needs numeric time-series data with aggregations. If it does, then service developers need to:

  • Declare MonitoredResourceDescriptor instances via a fixture file. Those resources are defined for the whole service.
  • Declare MetricDescriptor instances via a fixture file. Those resources must be created for each project using the service.

With the descriptors created by the fixtures controller, clients can start submitting time series via CreateTimeSeries calls. It is recommended to use the cached client from Monitoring: https://github.com/cloudwan/edgelq/blob/main/monitoring/metrics_client/v4/tsh_cached_client.go

This is typically used by agents running on edge devices; it is the responsibility of service developers to write the relevant code. It is good to follow the InventoryManager example.

Fixture files for this example service can be found here:

Notes:

  • For MetricDescriptors, it is mandatory to provide a value for metadata.services. The reason is that a project is a separate entity from a Service and can enable/disable the services it uses. Given its limited access, the service should declare ownership of the metric descriptors it creates in a project.
  • As of now, in this example, the fixtures controller will forbid modifications of MetricDescriptors by project admins; for example, if they add some label or index, the changes will be reverted to match the fixtures. However, in the future we plan to provide some flexibility to mix user changes with fixtures. This can enable use cases like additional indices usable by specific projects only, allowing per-tenant customizations. This is a good reason to keep MetricDescriptors defined per project rather than per service.
  • Because metric descriptors are created per each project, we call them dynamic fixtures.

The main.go file for the controller will need to import the relevant Go packages from Monitoring. An example is in https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go.

Packages needed are:

  • github.com/cloudwan/edgelq/monitoring/access/v4/metric_descriptor
  • github.com/cloudwan/edgelq/monitoring/access/v4/monitored_resource_descriptor

In this config file ( https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/controller-config.yaml ) we can find the usage of these two fixture files. Note that MonitoredResourceDescriptor instances are declared with a parent. This, again, as in the IAM registration, ensures that the fixtures controller only gets the observed state from this particular sub-collection. MetricDescriptor resources don’t specify the parent field (we have multiple projects!). Therefore, we must provide a different mechanism to ensure we only fetch metric descriptors we can access. We do this with the filter param: we filter by the metadata.services.owningService value. This way we are guaranteed to see only resources we have write access to.

Another notable element for MetricDescriptors is how we filter the input projects:

createForEach:
- kind: inventory-manager.edgelq.com/Project
  version: v1
  filter: multiRegionPolicy.defaultControlRegion="$myRegionId"
  varRef: project

First, we use inventory-manager.edgelq.com/Project instances, not iam.edgelq.com/Project. This way we can be sure we don’t get PermissionDenied (it is our service, after all), and we can skip the enabledServices CONTAINS filter.

Another notable element is the filter: we get projects from our region only. It is recommended to create per-project fixtures this way in a multi-region environment. If our service runs in many regions, each region will take its share of projects.

The last element is where the variable $myRegionId comes from. It is defined in the main.go file for the controller; take a look at the example: https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go.

In the versioned constructor, you can find the following:

vars := map[string]interface{}{
    "myRegionId": envRegistry.MyRegionId(),
}

This is an example of passing custom variables to the fixtures controller.

A simplified example of a client submitting metrics can be found in the keepSendingConnectivityMetrics function: https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/simple-agent-simulator/agent.go

Usage registration

The monitoring.edgelq.com service, apart from being an optional registration target, already has a specific built-in registration. We are talking here about usage metrics:

  • Number of open calls being currently processed and not concluded (more useful for long-running streams!)
  • Request and response byte sizes (uncompressed protobufs)
  • Call durations, in the form of Distributions, to catch all individual values.
  • Database read and write counts.
  • Database resource counters (but these are limited only to those tracked by Limits service).

SPEKTRA Edge platform creates metric descriptors for each service separately in this fixture file:

Resource descriptors are also defined per service:

This way, we can have separate resource types like:

  • custom.edgelq.com/server
  • another.edgelq.com/server
  • etc.

From these fixtures, you can learn what metrics your backend service will be submitting to monitoring.edgelq.com.

Notable things:

  • All usage metrics go to your service project, the one the service belongs to (along with its ServiceAccount).
  • To track usage by each tenant project, all metric descriptors have a user_project_id label. It will contain the project ID (without the projects/ prefix) to which a call is accounted.
  • The user project ID label for a call is computed based on the requestPaths annotation applied to the request!

To ensure the backend sends usage metrics, it is necessary to initialize this in the main.go file. For example, for InventoryManager, the server main.go has an InitServerUsageReporter call; find it in https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagerserver/main.go. When constructing a store, you need to add a store and cache plugin, NewUsageStorePlugin. You can grep for this string in the main.go file as well.

This describes the minimum registration needed from the developer.

Some coding customization is available, though: it is possible to customize how user_project_id is extracted. By default, the usage component uses auto-generated method descriptors (in client packages), which are generated based on requestPaths in the API skeleton. It is possible to customize this by implementing additional functions on the generated objects. An example can be found here: https://github.com/cloudwan/edgelq/blob/main/monitoring/client/v4/time_serie/time_serie_service_descriptors.go.

For a client message handle, we can define the UsageOverrideExtractUserProjectIds function, which extracts from the request object the project ID that the usage is accounted to. If possible, however, it is better to stick to the defaults from the api-skeleton.

Logging registration

Logging registration is another optional one and is even simpler than monitoring. It is recommended to use logging.edgelq.com if there is a need for non-numerical, time-series-like data (logs).

The service developer needs to:

  • Define fixtures with LogDescriptor instances to be created per project (optionally for the service or organization). Defining them per project may in the future enable some per-project customizations.
  • The controller main.go file will, as usual, need the relevant Go package (here it is github.com/cloudwan/edgelq/logging/access/v1/log_descriptor).
  • Complete the fixtures configuration in the controller config.
  • Use the logging API from the edge agent runtime (or any runtime if needed; edge agents are just the most typical).

In InventoryManager we have an example:

It is similar to monitoring but simpler.

Limits registration

The limits.edgelq.com service allows limiting the number of resources that can be created in a Project, to avoid system overload or because of contractual agreements.

Limitations:

  • Only resources under projects can be limited.
  • A Limit object is created per unique combination of Project, Region, and Resource type.

Therefore, when integrating with Limits, it is highly recommended (again) to work primarily with Projects, and then model resources keeping in mind that only their total count (in a region) is limited. For example, we can’t limit the number of “items in an array in a resource”. If we need to, we should create a child resource type instead, so that the number of such resources that can be created in a project/region is limited.

With those preconditions, the remaining steps are rather simple to follow; we will go through them one by one.

First, we need to define service plans. It is necessary to provide default plans for organizations and projects too. This should again be done with fixtures, as in this example: https://github.com/cloudwan/inventory-manager-example/blob/master/fixtures/v1/inventory_manager_plans.yaml.

As always, this requires importing the relevant package in main.go and an entry in the config file, as in https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/controller-config.yaml.

The service plan will be assigned to the service automatically during initial bootstrapping by limits.edgelq.com. Organization plans will be used at least by “top” organizations (those without parent organizations); they will have one of the organization plans assigned. From this point, organizations can either define their own plans or continue using the defaults provided by the service via fixtures.

When someone creates a resource under a project, the server needs to check whether it exceeds the project's limit; if it does, the server must reject the call with a ResourceExhausted error. Similarly, when a resource is deleted, limit usage should decrease. This must happen at the Store level, not in the API server: resources can often be created or deleted not via the standard Create/Delete calls, but via custom methods, so we need to track each Save/Delete call at the store level. SPEKTRA Edge already provides the relevant modules, though. If you look at the file here: https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagerserver/main.go, you should notice that, when we construct a store (via NewStoreBuilder), we add a relevant plugin (find NewV1ResourceAllocatorStorePlugin). It injects the necessary behavior: it checks the local limit tracker and ensures its value stays in sync. Version ‘v1’ corresponds to the limits service version, not the 3rd party service.
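
Conceptually, the check performed by the store plugin at resource creation looks like the sketch below (an illustration only; the real plugin provided by the SPEKTRA Edge framework tracks and synchronizes usage with limits.edgelq.com, and the types here are made up):

package main

import (
    "fmt"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// limitTracker is a toy stand-in for the local limit tracker kept per
// project/region/resource-type combination.
type limitTracker struct {
    limit int64
    usage int64
}

// allocate reserves room for one more resource or rejects with ResourceExhausted,
// which is what the caller sees when the project limit is reached.
func (t *limitTracker) allocate(resourceType string) error {
    if t.usage+1 > t.limit {
        return status.Errorf(codes.ResourceExhausted,
            "limit of %d reached for %s", t.limit, resourceType)
    }
    t.usage++
    return nil
}

func main() {
    t := &limitTracker{limit: 1}
    fmt.Println(t.allocate("inventory-manager.edgelq.com/Device")) // <nil>
    fmt.Println(t.allocate("inventory-manager.edgelq.com/Device")) // rpc error: ... ResourceExhausted
}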

There is also a need to maintain synchronization between the SPEKTRA Edge-based service using Limits and limits.edgelq.com itself. Ultimately, it is in limits.edgelq.com where limit configuration happens. For this reason, the service using Limits is required to expose an API that Limits can understand. This is why, in the main.go file for the server runtime, you can find the limits mixin server instantiation (find the NewLimitsMixinServer call). It needs to be included.

Also, for limit synchronization, we need a controller module provided by the SPEKTRA Edge framework. By convention, this is part of the business logic controller. You can find an example here: https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go

Find the NewLimitsMixinNodeManager call - this must be included, and the created manager must be run along with others.

The limits mixin node manager needs its entry in the controller config, as in https://github.com/cloudwan/inventory-manager-example/blob/master/config/controller.proto.

There is one very common customization related to limits registration only. By default, if the limits service is enabled, ALL resources under projects are tracked. Sometimes this is not intended, and some resources should not be limited. As of now, we can handle this via code: we need to provide a function for the resource allocator.

We have an example in InventoryManager again: https://github.com/cloudwan/inventory-manager-example/blob/master/resource_allocator/resource_allocator.go.

In this example, we create an allocator that does not count usage if the resource type is ReaderAgent. It is also possible to filter out specific fields and so on. This function is called for any creation, update (in case a resource switches between counted and non-counted for some reason!), or deletion.

This ResourceAllocator is used in main.go of the server runtime; we pass it to the store plugin.