Registering your Service to the SPEKTRA Edge platform
While goten provides a framework for building services, SPEKTRA Edge provides a ready environment with a set of common, pre-defined services. This document describes the specific registrations a developer needs to perform; other services can, and typically should, be used through the standard API approach.
Integration with SPEKTRA Edge is practically enforced/recommended on multiple levels:
- Your service needs to register itself in meta.goten.com, otherwise it simply cannot work.
- Your resource model must be organized around the following top resources:
  - meta.goten.com/Service
  - iam.edgelq.com/Organization
  - iam.edgelq.com/Project
- For multi-tenancy, you need to declare your own Project resource in the api-skeleton.
- You need to follow the authentication & authorization model of iam.edgelq.com.
- Although you may skip it, it is highly recommended to use audit.edgelq.com to audit the usage of your service, and monitoring.edgelq.com to track its usage. Your service's activities in core SPEKTRA Edge are monitored by those services.
- If a service needs to control the amount of resources, limits.edgelq.com is highly recommended.
The above list contains mandatory or highly recommended registrations, but practically all services are at your disposal. SPEKTRA Edge also provides edge devices with their own OS, where you can deploy your agent applications. Hardware and containers are managed using the services devices.edgelq.com and applications.edgelq.com.
Example of a service with a high level of registration: https://github.com/cloudwan/inventory-manager-example
It provides more insight into how custom services can be integrated with core SPEKTRA Edge services.
Fixtures controller
Before jumping into SPEKTRA Edge registration, one common element of all registrations is the fixtures controller.
The fixtures controller is responsible for creating & updating resources in various services that are needed for:
- Correct operation of a SPEKTRA Edge Service. Example: the service iam.edgelq.com needs a list of permissions from each service, describing what users can do in that service. If permissions are not maintained in IAM, SPEKTRA Edge cannot perform authorization properly, which renders the Service non-operable. As part of bootstrapping a Service, Permission fixtures must therefore be submitted by the interested Service.
- Correct operation of a Project or Organization. Example: the user who created a given Project/Organization automatically gets an administrator RoleBinding resource in the created Project or Organization. Without it, the creator would not be able to access their entity, which would render it non-operable.
Some fixtures are a bit more dynamic. For example, when an existing Project is enabling some particular service, then a given Service automatically gets RoleBinding in a project, which allows the Service to manage its resources that are associated with the Service. Without it, Service would not be able to provide services to a project, rendering it non-operable.
Those cases are handled by the fixtures controller. By convention, the fixtures controller is part of the controller runtime.
Be aware that the fixtures controller does not only create and update resources. It also detects resources that exist but are not defined in any fixture, and deletes them. This cleanup is necessary because, under the right conditions, such leftover resources can also make the Service/Project/Organization non-operable and full of errors.
The Fixtures Controller works in this way: It computes a DESIRED set of resources. Then it uses CRUD to get the observed state, and compares it with desired, finally executes a set of Create/Update/Delete calls as necessary. If there is a dynamic change in the desired state, the controller computes & executes a new set of commands. If there is a dynamic change in the observed state, the fixtures controller will attempt to fix it.
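The desired-versus-observed comparison described above can be sketched in Go as follows. This is only an illustration under simplifying assumptions: resources are modelled as a map from name to a serialized body, and the names Plan and Reconcile are hypothetical, not part of the Goten or SPEKTRA Edge API.

```go
package main

import (
	"fmt"
	"sort"
)

// Plan lists the resource names a single reconciliation pass would touch.
// Hypothetical type, used only for this illustration.
type Plan struct {
	Create, Update, Delete []string
}

// Reconcile compares the desired and observed states. Both are modelled as
// name -> serialized body, which is an assumption made for brevity.
func Reconcile(desired, observed map[string]string) Plan {
	var p Plan
	for name, want := range desired {
		got, exists := observed[name]
		switch {
		case !exists:
			// Defined in fixtures but missing in the service.
			p.Create = append(p.Create, name)
		case got != want:
			// Present, but drifted from the fixture definition.
			p.Update = append(p.Update, name)
		}
	}
	for name := range observed {
		if _, wanted := desired[name]; !wanted {
			// Exists but is not defined in any fixture.
			p.Delete = append(p.Delete, name)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	desired := map[string]string{"roles/admin": "v2", "roles/viewer": "v1"}
	observed := map[string]string{"roles/admin": "v1", "roles/stale": "v1"}
	fmt.Printf("%+v\n", Reconcile(desired, observed))
}
```

Running the same comparison again after every change in either state is what lets the controller react to dynamic updates and repair drift.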
Fixtures are a set of YAML files in the fixtures directory. They are either completely static or templated (they contain <VARIABLE> elements). Templated fixtures are created FOR EACH Project, Organization, or Service (typically, but not limited to these). Those “for each” fixtures provide a source of dynamic updates to the desired state.
Fixtures are built into the controller image during compilation. The config file then decides the rest, such as how variables are resolved. See the basic fixtures controller config in: https://github.com/cloudwan/inventory-manager-example/blob/master/config/controller.proto.
For every resource type used in fixtures, it is necessary to include the access package for that resource. For example, see https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go, and find the imported types needed by the fixtures controller. This list must include all resource types that the fixtures controller can create OR requires (via the forEach directive in the config file).
During each registration, we will discuss various fixtures:
- For IAM registration, we will define some static fixtures
- For adding projects, we will show examples of “synchronized collections”, dynamic fixture examples.
- For Monitoring registration, we will show some static and dynamic (per project) fixtures.
- For Logging registration, we will show again some per-project fixtures.
- For Limits, we use plans as static fixtures.
What your service is authorized to do
Your service uses an IAM ServiceAccount, which has its own assigned RoleBindings. For any SPEKTRA Edge-based service, you will be allowed to:
- Do anything in your service namespace: services/{service}/. Your ServiceAccount will be marked as owner, so you will be able to do anything there. This applies to Service resources in all services, including core SPEKTRA Edge.
- Do anything in the root scope (/), AS LONG AS the permissions are related to your service. For example, if your ServiceAccount wants to execute a Create for a resource belonging to your service, it will be able to. It cannot do so in other services, and especially not in core SPEKTRA Edge services.
- Do some things in projects/$SERVICE_PROJECT_ID, depending on the role you were assigned when making the initial service reservation, as described in the preparation section.
- Read all resources of core SPEKTRA Edge services in the root scope (/), as long as they satisfy the following filter condition: metadata.services.allowedServices CONTAINS $YOUR_SERVICE.
- Write all resources of core SPEKTRA Edge services in the root scope (/), as long as they satisfy the following filter condition: metadata.services.owningService == $YOUR_SERVICE.
- Create resources in projects that list your service in their enabled_services field. For resources of core SPEKTRA Edge services, they have to specify metadata.services.owningService = $YOUR_SERVICE.
These rough permissions must be remembered when you start making requests from your service. Those limitations are reflected in various examples (for example, when you create a ServiceAccount for a project, you need to specify proper metadata.services).
IAM registration
Introduction
Service iam.edgelq.com handles all actor collections (Users, ServiceAccounts, Groups), tenants (Organizations, Projects), and permission-related resources (Permissions, Roles). Finally, it binds actors with permissions within tenant scopes (RoleBinding).
The only tenant-type resource not in iam.edgelq.com is Service, which resides in meta.goten.com. It is still treated as a kind of tenant from the IAM point of view.
The primary point of registration between IAM and any SPEKTRA Edge-based service is permission-related. Permissions are generated for all services, for each API method (any API group). The typical format of a permission name is the following: services/$SERVICE_NAME/permissions/$COLLECTION_NAME.$ACTION_VERB. If a method has no resource configured in the API skeleton (no opResourceInfo value!), then the permission name has this format: services/$SERVICE_NAME/permissions/$ACTION_VERB.
Variable $SERVICE_NAME is naturally a service name in domain format, $COLLECTION_NAME is the resource collection name in lowerPluralCamelJson format (examples: roleBindings, devices…), and $ACTION_VERB is equal to the value of verb of the method in the api-skeleton file. For example, the action CreateRoleBinding operates on the roleBindings collection, the verb is create, and the service where the action is defined is iam.edgelq.com. Therefore, the permission name is services/iam.edgelq.com/permissions/roleBindings.create.
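The naming rule above can be expressed as a small Go helper. This is a sketch for illustration only; PermissionName is a hypothetical function, not something generated by Goten or provided by IAM.

```go
package main

import "fmt"

// PermissionName builds a permission name following the format described
// above. An empty collection models a method with no opResourceInfo value.
// Hypothetical helper, not part of the SPEKTRA Edge API.
func PermissionName(service, collection, verb string) string {
	if collection == "" {
		return fmt.Sprintf("services/%s/permissions/%s", service, verb)
	}
	return fmt.Sprintf("services/%s/permissions/%s.%s", service, collection, verb)
}

func main() {
	// Prints: services/iam.edgelq.com/permissions/roleBindings.create
	fmt.Println(PermissionName("iam.edgelq.com", "roleBindings", "create"))
}
```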
Another popular permission type is the “attach” kind. Even if the permission holder can create/update a resource, when that resource has references to other resources, authorization must also validate that the actor can create each reference relationship. For example, the caller can create a RoleBinding thanks to the services/iam.edgelq.com/permissions/roleBindings.create permission, but the reference to a Role requires that the holder also has the permission services/iam.edgelq.com/roles!attach.
You should already be familiar with the IAM model from its README.
What is provided during generation
During Service code generation, the IAM protoc plugin analyzes a service and collects all permissions that need to exist. It creates a file in the fixtures directory, with the name <service_short_name>.pb.permissions.yaml. Apart from that, it also generates Authorization middleware specifically for your server.
Authorization middleware extracts WHAT for each call:
- Collection (typically the parent field) for collection type methods (isCollection = true in API-skeleton)
- Resource name (typically the name field) for single resource non-collection type methods (isCollection and isPlural = false)
- Resource names (typically the names field) for plural non-collection methods (for example BatchGet: isCollection is false, isPlural is true).
To get this WHAT, it uses by default the values provided in the API skeleton: the opResourceInfo.requestPaths param in an Action declaration. Note that CRUD has implicit built-ins. The middleware gets the authenticated principal from the current context object (associated with the call) and attaches the permission related to the current call. It uses the generic Authorizer component to verify whether the request should pass or be denied.
Minimal registration required from developers
This whole registration works almost out of the box. The minimal steps are:
- Developers need to create an appropriate main.go file for the server, with Auth-related modules. In the constructor for the main service server handlers, Authorization middleware must be added to the chain, all according to the example InventoryManager.
- Developers are highly recommended to write their own role fixtures for their service (static fixture). Roles are necessary to bind users with permissions, and should be well thought out. Inventory manager has basic roles for users and specific limited role examples for the agent application, with access to clearly defined resources within a tenant project. Although there is a provided fixture called <service_short_name>.pb.default.roles.yaml, its roles are very limited and usually a “bad guess”. Usually, we create a file called <service_short_name>_roles.yaml for manually written ones.
- Developers must configure at minimum two fixture files: <service_short_name>_roles.yaml (or <service_short_name>.pb.default.roles.yaml), then <service_short_name>.pb.permissions.yaml.
Fixture controller registration requires two parts. First, in the main.go file for a controller, it is required to import github.com/cloudwan/edgelq/iam/access/v1/permission and github.com/cloudwan/edgelq/iam/access/v1/role. Those packages contain modules that are used by the fixtures controller framework provided by Goten/SPEKTRA Edge. The fixtures controller analyzes YAML files and tries to find the associated types in the global registry. Without these imports, the program will crash.
Second, in a config file of the controller, you need to define fixture file paths. You can copy-paste them from the inventory manager example, like:
fixtureNodes:
  global:
    manifests:
    - file: "/etc/lqd/fixtures/v1/inventory_manager.pb.permissions.yaml"
      groupName: "inventory-manager.edgelq.com/Permissions/CodeGen"
      parent: "services/inventory-manager.edgelq.com"
    - file: "/etc/lqd/fixtures/v1/inventory_manager_roles.yaml"
      groupName: "inventory-manager.edgelq.com/Roles"
      parent: "services/inventory-manager.edgelq.com"
It will be mentioned in the deployment document, but by convention, the fixtures directory is placed in the /etc/lqd path.
Two notes:
- groupName is mandatory and generally should be unique. This helps in case there is more than one fixture file for the same resource type, to ensure they don’t clash. Still, resource names also must be unique.
- The parent field is mandatory in this particular case too. Here, the fixtures controller gets a guarantee that all Roles and Permissions have the same parent resource, called exactly services/inventory-manager.edgelq.com (in this case). Note that a Service only has access to scopes it owns. Without this parent value specified, we would get a PermissionDenied error. We would also get a PermissionDenied error if, in the fixture file, we attempted to create a Role or Permission with a different parent.
Using this example, we should clarify yet another thing: the fixtures controller not only creates/updates resources that are defined in the fixtures. It also DELETES those that are not defined within fixtures. This is why we have groupName and parent. For example, if there was a Role whose groupName is equal to inventory-manager.edgelq.com/Roles and whose parent is equal to services/inventory-manager.edgelq.com, and it did not exist within the fixture file /etc/lqd/fixtures/v1/inventory_manager_roles.yaml, it WOULD BE DELETED. This is why the groupName and parent params play an important role here, and why we would get PermissionDenied without a parent. The fixtures controller always gets the observed state to compare against the desired one. This observed state is obtained using regular CRUD, which is why we need to specify a parent for Roles/Permissions: the service will not be authorized if it tries to get resources from ALL services.
So far we explained the mandatory part of IAM registration. The first common additional registration, although a very small one, is to declare some actions of a Service public. An example is here: https://github.com/cloudwan/inventory-manager-example/blob/master/fixtures/v1/inventory_manager_role_bindings.yaml
We are granting some public role to all authenticated users, regardless of who they are (but they are users of our service). This requires a separate entry in fixtures and an import in main.go for RoleBinding (access packages).
More advanced IAM registration
In this topic, two extra things are offered:
- IAM provides a way to OVERRIDE generated Authorization middleware. Developers can define additional protobuf files with special annotations in their proto/$VERSION directory, which will be merged on top of the generated/assumed defaults.
- Some fields in resources can be considered sensitive from a reading or writing perspective. Developers can define custom IAM permissions that must be held to write to or read from them. Permissions and protected fields can be defined in protobuf files.
Let's start with the first part, overriding Authorization defaults. By convention, we create an authorization.proto file along with the others. Some simple examples:
- https://github.com/cloudwan/inventory-manager-example/blob/master/proto/v1/authorization.proto
- https://github.com/cloudwan/edgelq/blob/main/iam/proto/v1/authorization.proto
- https://github.com/cloudwan/edgelq/blob/main/audit/proto/v1/authorization.proto
The example service provides a first basic example: to disable Authorization altogether for a given action, you just need to provide a skip_authorization annotation flag for a specific method, in a specific API group. Since this example is a little too simplified, examples for Audit and IAM were provided as being more interesting.
For example, take the ListActivityLogs method:
{
  name : "ListActivityLogs"
  action_checks : [ {
    field_paths : [ "parents" ]
    permission : "activityLogs.list"
  } ]
}
There is an important problem with this particular method: SPEKTRA Edge code generation supports the collection, single resource, or multi-resource request types. However, in ListActivityLogsRequest we have a plural parents field, because we are enabling users to query from multiple collections at once. This is a kind of isPluralCollection type, but such an annotation does not exist in api-skeleton. There is, however, some level of enhancement: we can explicitly tell IAM to use the “parents” field path, and it will authorize all individual paths from this field. If the user lacks access to any one of the parents, they will receive a PermissionDenied error.
There is also the possibility to provide multiple field paths (but only one will be used).
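The all-or-nothing check over a plural parents field can be sketched as below. This is not the generated middleware: AuthorizeAll, ErrPermissionDenied, and the granted lookup table are hypothetical stand-ins for the real Authorizer, and the full permission name is constructed from the naming format described earlier.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPermissionDenied stands in for the PermissionDenied status (hypothetical).
var ErrPermissionDenied = errors.New("permission denied")

// AuthorizeAll checks the permission in every scope taken from the plural
// field. A single missing grant denies the whole request. The granted table
// (scope -> permission -> allowed) is an assumption replacing a real Authorizer.
func AuthorizeAll(scopes []string, permission string, granted map[string]map[string]bool) error {
	for _, scope := range scopes {
		// Indexing a missing scope yields a nil map, which reads as false.
		if !granted[scope][permission] {
			return fmt.Errorf("%w: %s in %s", ErrPermissionDenied, permission, scope)
		}
	}
	return nil
}

func main() {
	const perm = "services/audit.edgelq.com/permissions/activityLogs.list"
	granted := map[string]map[string]bool{
		"projects/tenant1": {perm: true},
	}
	// Allowed: the caller holds the permission in the only requested parent.
	fmt.Println(AuthorizeAll([]string{"projects/tenant1"}, perm, granted))
	// Denied: projects/tenant2 is not covered by any grant.
	fmt.Println(AuthorizeAll([]string{"projects/tenant1", "projects/tenant2"}, perm, granted))
}
```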
Another interesting example is CreateProject:
{
  name : "CreateProject"
  action_checks : [ {
    field_paths : [ "project.parent_organization" ]
    permission : "projects.create"
  } ]
}
In api-skeleton, Project and Organization are both “top” resources. Their name patterns are: projects/{project} and organizations/{organization}. Judging by these, creating a project should require permission at the system level, and the same goes for an organization. However, in practice we want projects to be final tenants and organizations to be intermediaries. Note that Organization and Project resources have a parent_organization field. Especially for organization resources, it is not possible to specify that the parent of an Organization is “Organization”. The name pattern cannot be like: organizations/{organization}/organizations/{organization}/....
Therefore, from a naming perspective, both projects and organizations are considered to be “top” resources. However, when it comes to creation, IAM Authorization middleware should make an exception and take the authorization scope object (WHERE) from a different field path. In the case of CreateProject, it must be project.parent_organization. This changes the generated Authorization code for CreateProject, and the permission is required in the parent organization scope instead.
To declare sensitive fields in resources, it is necessary to use annotations.iam.auth_checks annotations. There are currently no examples in InventoryManager, but there are some in secrets.edgelq.com. As of now, there is:
option (annotations.iam.auth_checks) = {
  read_checks : [
    {permission : "mask_encrypted_data" paths : "enc_data"},
    {permission : "secrets.sensitiveData" paths : "data"}
  ]
};
Note that you also need to include the edgelq/iam/annotations/iam.proto import in the resource proto file.
When the secret is being read, additional permissions may be checked:
- services/secrets.edgelq.com/permissions/mask_encrypted_data: if denied, field path enc_data will be cleared from the response object.
- services/secrets.edgelq.com/permissions/secrets.sensitiveData: if denied, field path data will be cleared from the response object.
Those read checks apply to all methods that contain resource bodies in response, therefore, even UpdateSecret or CreateSecret responses would have fields cleared. However, it will mostly be used to clear values from List/Search/Get/BatchGet responses.
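The effect of read checks on a response can be sketched as follows. This is a simplified model, not the IAM implementation: the resource is a flat map of top-level fields, and ReadCheck and ApplyReadChecks are hypothetical names.

```go
package main

import "fmt"

// ReadCheck mirrors one read_checks entry: a permission guarding field paths.
// Hypothetical type for illustration.
type ReadCheck struct {
	Permission string
	Paths      []string
}

// ApplyReadChecks clears, in place, every guarded field whose permission the
// caller does not hold. Only top-level fields are modelled here (assumption).
func ApplyReadChecks(resource map[string]string, checks []ReadCheck, held map[string]bool) {
	for _, check := range checks {
		if held[check.Permission] {
			continue // Caller may read these fields; leave them untouched.
		}
		for _, path := range check.Paths {
			delete(resource, path)
		}
	}
}

func main() {
	secret := map[string]string{"name": "s1", "enc_data": "abc", "data": "plain"}
	checks := []ReadCheck{
		{Permission: "services/secrets.edgelq.com/permissions/mask_encrypted_data", Paths: []string{"enc_data"}},
		{Permission: "services/secrets.edgelq.com/permissions/secrets.sensitiveData", Paths: []string{"data"}},
	}
	// The caller holds only the first permission, so "data" is cleared.
	held := map[string]bool{"services/secrets.edgelq.com/permissions/mask_encrypted_data": true}
	ApplyReadChecks(secret, checks, held)
	fmt.Println(secret)
}
```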
The set_checks param is just like read_checks, but works in reverse. Note that you can specify multiple paths.
Users are generally free to pick any permission name for set/read checks, but it is recommended to follow the style of secrets.sensitiveData rather than mask_encrypted_data.
For the full reference on IAM-related protobuf annotations, see: https://github.com/cloudwan/edgelq/blob/main/iam/annotations/iam.proto.
Adding projects (tenants) to the service
For multi-tenant cases, it is recommended to copy Project resources from iam.edgelq.com into the 3rd party service. You need to declare a Project resource yourself in api-skeleton. This copying, or syncing, was already mentioned in some places in the developer guide as collection synchronization.
A service based on SPEKTRA Edge should copy only those projects that enable that particular service (in their enabled_services list). Note that services based on SPEKTRA Edge can only filter for projects/organizations that are using those particular services themselves.
Once the project instance copy is in the service database, it is assumed that the project is now able to use that service. If a project removes the service from its enabled list, then its copy is removed from the service database (garbage collection).
An example of registration is in InventoryManager. Integration steps:
- In API-skeleton, we add the Project resource.
- We need a special fixture for a project, like here: https://github.com/cloudwan/inventory-manager-example/blob/master/fixtures/v1/inventory_manager_project.yaml
- In the main.go file for the controller application, we need to import two modules: github.com/cloudwan/edgelq/iam/access/v1/project and github.com/cloudwan/inventory-manager-example/access/v1/project (this is for the inventory manager example; each service, of course, needs its own import).
- Finally, in the controller config file, we need to set up the appropriate manifest, as in this example: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/controller-config.yaml
Let’s copy and paste part of the config and discuss it more:
fixtureNodes:
  global:
    manifests:
    - file: "/etc/lqd/fixtures/v1/inventory_manager_project.yaml"
      groupName: "inventory-manager.edgelq.com/Projects"
      createForEach:
      - kind: iam.edgelq.com/Project
        version: v1
        filter: "enabledServices CONTAINS \"services/inventory-manager.edgelq.com\""
        varRef: project
      withVarReplacements:
      - placeholder: <project>
        value: $project.Name.ProjectId
      - placeholder: <multiRegionPolicy>
        value: $project.MultiRegionPolicy
      - placeholder: <metadataLabels>
        value: $project.Metadata.Labels
      - placeholder: <metadataServices>
        value: $project.Metadata.Services
      - placeholder: <title>
        value: $project.Title
As always, we need to provide the file and groupName variables. Note that the resource we are creating in this fixture belongs to our service: inventory-manager.edgelq.com/Project. Because it is ours, the service does not need an additional parent or filter to be authorized correctly, so those parameters are not necessary here.
We have some new elements though. The first is the createForEach directive. It instructs the controller to create the fixtures defined in the mentioned file for each combination of input resources. In this case, we have one input resource, and its type is iam.edgelq.com/Project, in version v1. Our service cannot list all IAM projects, but it can list them if they enable our service, therefore we are passing the proper filter param. Besides, we should create project copies only for projects interested in our service anyway. Each instance of iam.edgelq.com/Project is remembered as the project variable (as indicated by varRef).
When fixtures are evaluated from the file /etc/lqd/fixtures/v1/inventory_manager_project.yaml for each IAM project, we need to replace all variables so that the final YAML is produced. The example above should be relatively self-explanatory. You may note, however, that you can extract IDs from names and take full objects, maps, or slices (fixture variables are not limited to primitives).
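The placeholder substitution performed for each matching project can be approximated with the Go standard library. This is only a sketch: Render is a hypothetical helper, the template text is made up for the example, and it handles string values only, whereas the real fixtures controller also accepts objects, maps, and slices.

```go
package main

import (
	"fmt"
	"strings"
)

// Render substitutes every placeholder in a fixture template with its value.
// Hypothetical helper; string values only (a simplification).
func Render(template string, vars map[string]string) string {
	pairs := make([]string, 0, len(vars)*2)
	for placeholder, value := range vars {
		pairs = append(pairs, placeholder, value)
	}
	// strings.NewReplacer expects alternating old/new arguments.
	return strings.NewReplacer(pairs...).Replace(template)
}

func main() {
	// Made-up template, standing in for a templated fixture file.
	template := "- name: projects/<project>\n  title: <title>\n"
	// One render per project returned by the createForEach query.
	for _, p := range []map[string]string{
		{"<project>": "tenant1", "<title>": "Tenant One"},
		{"<project>": "tenant2", "<title>": "Tenant Two"},
	} {
		fmt.Print(Render(template, p))
	}
}
```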
There is, however, one more important aspect: project admins cannot by default add your service to their enabled list. This prevents a private service from being attached to a project against the service maintainer’s wishes. To allow someone to create/update a project/organization using your service, you will need to create a RoleBinding:
cuttle iam create role-binding \
--service $YOUR_SERVICE \
--role 'services/iam.edgelq.com/service-user' \
--member $ADMIN_OF_ORGS_AND_PROJECTS
From now on, the given user can create new organizational entities that use your service.
Audit registration
Overview
SPEKTRA Edge provides a LogsExporter component, which is part of observability. It records selected API calls (unary and streams), and submits them to audit.edgelq.com. All activity or resource change logs are classified as service, organization, or project scoped. Of these three, service scope is the default, used when the method call was classified as neither project nor organization.
Scope classification is relatively simple: when a unary request arrives, the logs exporter analyzes the request, extracts the resource name(s) and collection, and decides the scope of the request (project, organization, or service). Resource change logs are submitted just before the transaction is concluded. If the logs cannot be sent, the transaction fails. This ensures that resource change logs, at least, are always tracked. Activity logs are submitted within a matter of seconds after the request finishes, which allows for some degree of lost messages. In practice, it does not happen often.
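One way to picture the scope decision is a classification by resource name prefix. The sketch below assumes that the scope follows directly from the leading projects/... or organizations/... segment of the name, which is a simplification of whatever the logs exporter actually does; ClassifyScope is a hypothetical function.

```go
package main

import (
	"fmt"
	"strings"
)

// ClassifyScope returns "project", "organization", or "service" for a
// resource name or collection. Prefix matching is an assumption made for
// illustration; service scope is the fallback, as described above.
func ClassifyScope(resourceName string) string {
	segments := strings.Split(resourceName, "/")
	if len(segments) >= 2 && segments[1] != "" {
		switch segments[0] {
		case "projects":
			return "project"
		case "organizations":
			return "organization"
		}
	}
	return "service"
}

func main() {
	for _, name := range []string{
		"projects/tenant1/devices/dev1",
		"organizations/org1",
		"services/custom.edgelq.com/permissions/devices.create",
	} {
		fmt.Printf("%s -> %s\n", name, ClassifyScope(name))
	}
}
```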
For streams, Audit examines client and server messages before deciding what the activity logs should look like.
Resource change logs are submitted based on transaction lifespan regardless of grpc method streaming kinds.
Minimal registration
Audit requires minimal effort from developers to include in its default form. They just need to put a little initialization in the main.go file for the server runtime, as in the example InventoryManager service. You can see it in https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagerserver/main.go.
Find the following strings:
- NewAuditStorePlugin must be added to the store handle. It is a plugin that observes changes in the DB.
- InitAuditing is necessary to initialize the Audit Logs exporter that your server will use. You need to pass all relevant handlers (code-generated).
Audit handlers are code generated based on method annotations (therefore, the API skeleton decides normally). There are the following defaults:
- Api-skeleton annotations opResourceInfo.requestPaths and opResourceInfo.responsePaths are used to determine which field paths in request/response objects contain values that are interesting from an Audit point of view.
- Audit by default focuses on auditing all writing calls. It checks the api-skeleton annotation withStoreHandle in each action. The call is logged only if the transaction type is SNAPSHOT or MANUAL.
- By default, activity log types are always classified as some kind of write. Other kinds require manual configuration.
From this point on, Audit will work, and service developers will be able to query for logs from their service. Let’s discuss the list of possible customizations.
Customizations on the proto files level
Generally, full proto customizations can be found here: https://github.com/cloudwan/edgelq/blob/main/audit/annotations/audit.proto
You will need to include the edgelq/audit/annotations/audit.proto import to use any audit annotations.
The most common customization is the categorization of write activities for a resource. Activity logs have categories: Operations, Creations, Deletions, Spec Updates, State Updates, Meta Updates, Internal, Rejected, Client and Server errors, Reads.
Note that there are quite a few write categories: creations, deletions, and three different update kinds. Creations and deletions are easy to classify, but updates are less so. When a resource is updated, the Audit Logs exporter compares the old and new objects and determines which fields changed and which did not. To determine the update kind, it needs to know which fields are related to spec, which to state, and which to meta. This has to be defined within the resource protobuf definition.
It is like the following:
message ResourceName {
  option (ntt.annotations.audit.fields) = {
    spec_fields : [
      "name",
      "spec_field",
      "other_spec_field"
    ]
    state_fields : [ "some_state_field" ]
    meta_fields : [ "metadata", "other_meta" ]
    hidden_fields : [ "sensitive_field", "too_big_field" ]
  };
}
We must classify all fields. Normally, we put “name” as a spec field, and “metadata” as a meta field. Other choices are up to the developer. On top of spec/state/meta, we can also hide some fields from Audit entirely (especially if they are sensitive, or big and we want to minimize log sizes).
Note that hidden_fields can also be defined for any message, including request/response objects. An example from SPEKTRA Edge: https://github.com/cloudwan/edgelq/blob/main/common/api/credentials.proto. See the annotations for ServiceAccount: we are hiding private key objects, for example, as these would be too sensitive to include in Audit logs. Be aware of what is being logged!
You can define field specifications on the resource level, or any nested object too.
Going back to update requests: a spec update takes precedence over a state update, and state over meta. Therefore, if we detect an update that modifies one meta, two state, and one spec field, the update is classified as a spec update.
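The precedence rule can be written down as a short Go sketch. The names Category, FieldSpec, and ClassifyUpdate are hypothetical, and matching changed paths by their top-level field is an assumption; the real exporter works from the audit annotations on the resource.

```go
package main

import (
	"fmt"
	"strings"
)

// Category is the update kind assigned to an activity log (hypothetical type).
type Category string

const (
	SpecUpdate  Category = "SpecUpdate"
	StateUpdate Category = "StateUpdate"
	MetaUpdate  Category = "MetaUpdate"
	NoUpdate    Category = "" // No classified field changed.
)

// FieldSpec mirrors the spec_fields/state_fields/meta_fields annotation lists.
type FieldSpec struct {
	Spec, State, Meta []string
}

func contains(fields []string, name string) bool {
	for _, f := range fields {
		if f == name {
			return true
		}
	}
	return false
}

// ClassifyUpdate applies the precedence spec > state > meta to the changed
// field paths. Paths are matched by top-level field name (assumption).
func ClassifyUpdate(changedPaths []string, fs FieldSpec) Category {
	var spec, state, meta bool
	for _, path := range changedPaths {
		top := strings.SplitN(path, ".", 2)[0]
		switch {
		case contains(fs.Spec, top):
			spec = true
		case contains(fs.State, top):
			state = true
		case contains(fs.Meta, top):
			meta = true
		}
	}
	switch {
	case spec:
		return SpecUpdate
	case state:
		return StateUpdate
	case meta:
		return MetaUpdate
	}
	return NoUpdate
}

func main() {
	fs := FieldSpec{
		Spec:  []string{"name", "spec_field", "other_spec_field"},
		State: []string{"some_state_field"},
		Meta:  []string{"metadata", "other_meta"},
	}
	changed := []string{"metadata.labels", "some_state_field", "spec_field"}
	fmt.Println(ClassifyUpdate(changed, fs)) // Prints: SpecUpdate
}
```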
Another customization developers may find useful is the ability to attach labels to activity/resource change logs. Those logs can be queried (filtered) by service, method name, API version, resource name/type on which the method operates (or which changed), category, and request ID. However, you may notice that resource change and activity logs also have a “labels” field, which is a generic map of strings. This can hold any labels defined by developers. The most common way of defining labels is in request/response objects:
message ActionNameRequest {
  option (ntt.annotations.audit.fields) = {
    labels : [
      { path : "field_a", key: "label_a" },
      { path : "field_b", key: "label_b" }
    ]
    promoted_labels : [
      { label_keys : [ "label_a" ] }
    ]
  };
  string field_a = 1;
  string field_b = 2;
}
With this, you can start querying Activity logs like:
{parents: ["projects/tenant1"], filter: "service.name = \"custom.edgelq.com\" AND labels.label_a = \"V1\""}
The query above will also be optimized (an index will be created, according to the promoted_labels value).
Note that each promoted label set also requires the service name and parent to be indexed!
Apart from field customization, developers can customize how the Audit Logs Exporter handles method calls. We typically create the file auditing.proto in the proto/$VERSION directory for a given service. There we declare the file-level annotation ntt.annotations.audit.service_audit_customizations.
Examples in SPEKTRA Edge:
- https://github.com/cloudwan/edgelq/blob/main/devices/proto/v1/auditing.proto
- https://github.com/cloudwan/edgelq/blob/main/proxies/proto/v1/auditing.proto
- https://github.com/cloudwan/edgelq/blob/main/monitoring/proto/v4/auditing.proto
Let's start with the Devices service, for example the ProvisionDeviceViaPolicy method in ProvisioningPolicyService. As of now, we have annotations like:
{
  name : "ProvisionDeviceViaPolicy"
  activity_type : WriteType
  response_resource_field_paths : [ "device.name" ]
}
Method ProvisionDeviceViaPolicy has in api-skeleton:
actions:
- name: ProvisionDeviceViaPolicy
  verb: provision_device_via_policy
  withStoreHandle:
    readOnly: false
    transaction: SNAPSHOT
By default, opResourceInfo has these values for the action:
opResourceInfo:
  name: ProvisioningPolicy # Because this action is defined for this resource!
  isCollection: false # Default is false
  isPlural: false # Default is false
  # For single resource non-collection requests, defaults for paths are determined like below:
  requestPaths:
    resourceName: [ "name" ]
  responsePaths: {}
You can find request/response object definitions in: https://github.com/cloudwan/edgelq/blob/main/devices/proto/v1/provisioning_policy_custom.proto
This method primarily operates on the ProvisioningPolicy resource, and the exact resource can be extracted from the “name” field in the request. By default, Audit would decide that the primary resource for Activity logs for these calls is ProvisioningPolicy. The following Audit specification would be implicitly assumed:
{
name : "ProvisionDeviceViaPolicy"
activity_type : WriteType # Because withStoreHandle api-skeleton annotation tells it is a SNAPSHOT
request_resource_field_paths : [ "name" ] # Because this is what requestPaths api-skeleton annotation tells us.
}
However, we know that this method takes the ProvisioningPolicy object, but
creates a Device resource, and the response object contains the Device
instance. To ensure that the field resource.name
in Activity logs points
to a Device, not ProvisioningPolicy, we write that
response_resource_field_paths
should point to device.name
.
To still be able to query Activity logs by ProvisioningPolicy, we also attach an annotation to the request object:
option (annotations.audit.fields) = {
labels : [ {key : "provisioning_policy_name" path : "name"} ]
};
This is one example modification of default behavior.
We can also disable auditing for particular methods entirely. Again, in
auditing.proto
for the Devices service you may see:
{
name : "DeviceService"
methods : [ {name : "UpdateDevice" disable_logging : true} ]
},
The reason is that, as of now, every device sends UpdateDevice once a minute. To avoid flooding Audit with requests, logging for this method is disabled until a better solution is found (by the time you read this, the entry may already be gone from the auditing file for Devices).
In the auditing.proto
file for the Proxies service (
https://github.com/cloudwan/edgelq/blob/main/proxies/proto/v1/auditing.proto
), you may see something different too:
{
name : "BrokerService"
methods : [
{name : "Connect" activity_type : OperationType},
{name : "Listen" activity_type : OperationType}
]
}
In the Broker API in the api-skeleton, you can see that Connect and Listen are streaming calls. Listen is used by an Edge agent to provide access to other actors, and Connect is used by an actor to connect to an Edge agent. Those calls are non-writing and therefore would not be audited by default. To force auditing and classify them as the Operation kind, we specify this directly in the auditing file.
A final example worth looking at is the auditing file for Monitoring: https://github.com/cloudwan/edgelq/blob/main/monitoring/proto/v4/auditing.proto
First, you can see that we are classifying some resources as INTERNAL
types, like RecoveryStoreShardingInfo
. It means that any writes to these
resources are not classified as writes, but as “internal”. This changes
the category in Activity logs, making them easier to filter out. Finally,
we enable read auditing for the ListTimeSeries call:
{
name : "TimeSerieService"
methods : [ {
name : "ListTimeSeries"
scope_field_paths : [ "parent" ]
activity_type : ReadType
disable_logging : false
} ]
}
Before finishing, it is worth mentioning that we also have some extra customizations in the code for ListTimeSeries calls.
Customizations of Audit in Golang code
There is a package github.com/cloudwan/edgelq/common/serverenv/auditing
with some functions that can be used.
Most common examples can be summarized like this:
package some_server
import (
"google.golang.org/grpc/status"
"github.com/cloudwan/edgelq/common/serverenv/auditing"
)
func (srv *CustomMiddleware) SomeStreamingCall(
stream StreamName,
) error {
ctx := stream.Context()
firstRequestObject, err := stream.Recv()
if err != nil {
return status.Errorf(status.Code(err), "Error receiving first client msg: %s", status.Convert(err).Message())
}
// Let's assume that the request contains a project ID, but it is somehow encoded AND not available
// from a field in a straight way. Because of this, we cannot provide protobuf annotation. We can
// do this from code however:
projectId := firstRequestObject.ExtractProjectId()
auditing.SetCustomScope(ctx, "projects/" + projectId) // Now we ensure this is where log parent is.
// We can set also some custom labels, because these were not available as any direct fields.
// However, to have it working, we will still need to declare labels in protobuf:
//
// message StreamNameRequest {
// option (ntt.annotations.audit.fields) = {
// labels : [ { key: "custom_label" } ]
// };
// }
//
// Note we specify only key, not path! But if we do this, we can then do:
auditing.SetCustomLabel(ctx, "custom_label", firstRequestObject.ComputeSomething())
// Now, we want to inform Audit Logs Exporter that this stream is exportable. If we did not do this,
// then Audit would export Activity logs only AFTER STREAM FINISHES (this function exits!). If this
// stream is long-running (like several minutes, or maybe hours), then it may not be the best option.
// It would be better to send Activity logs NOW. However, be aware that you should not call
// any SetCustomLabel or SetCustomScope calls after exporting stream - activity logs are "concluded"
// and labels can no longer be modified. New activity log events may be still being appended for each
// client and server message though!
auditing.MarkStreamAsExportable(ctx)
firstServerMsg := srv.makeFirstResp(stream, firstRequestObject)
if err = stream.Send(firstServerMsg); err != nil {
return status.Errorf(status.Code(err), "Error sending first server msg: %s", status.Convert(err).Message())
}
// There may be multiple Recv/Send here ...
return nil
}
By default, Activity logs record all client/server messages. Each message becomes
an Activity Log Event object appended to the existing Activity Log. This may
not always be the best choice if objects are large. For example, for
ListTimeSeries
, which is audited, we don’t need responses. The request
object contains elements like filter or parent, so we can predict/check
what data was returned from monitoring. In such a case, we can disable
appending responses to the Activity Log (also, ListTimeSeriesResponse can be very large!):
func (r *ListTimeSeriesResponse) AuditShouldRecord() bool {
return false
}
The function AuditShouldRecord
can be defined for any request/response
object. Audit Logs Exporter will examine if they implement this method to
act accordingly.
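The check described above is the usual Go optional-interface pattern. The following is a minimal sketch of how such a check could look, not the actual exporter code. The names auditRecordable and shouldRecord are hypothetical, and the two message types are empty stand-ins for the generated ones. Only the AuditShouldRecord method comes from the text above.

```go
package main

import "fmt"

// auditRecordable is a hypothetical name for the optional interface.
// Only the method AuditShouldRecord comes from the documentation.
type auditRecordable interface {
	AuditShouldRecord() bool
}

// shouldRecord reports whether msg should be appended to the Activity Log.
// Messages that do not implement the method are recorded by default.
func shouldRecord(msg interface{}) bool {
	if r, ok := msg.(auditRecordable); ok {
		return r.AuditShouldRecord()
	}
	return true
}

// Empty stand-ins for the generated request/response types.
type ListTimeSeriesRequest struct{}
type ListTimeSeriesResponse struct{}

// The response opts out of recording, as in the snippet above.
func (r *ListTimeSeriesResponse) AuditShouldRecord() bool { return false }

func main() {
	fmt.Println(shouldRecord(&ListTimeSeriesRequest{}))  // true
	fmt.Println(shouldRecord(&ListTimeSeriesResponse{})) // false
}
```

Because the method is looked up with a type assertion, messages that do not define it need no changes and keep the default behavior.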
We can also sample logs, which we do for ListTimeSeries. Since this method is executed quite often, we don’t want too many Activity logs for it. We implemented the following functions for the request object:
func (r *ListTimeSeriesRequest) ShouldSample(
ctx context.Context,
sampler handlers.Sampler,
) bool {
return sampler.ShouldSample(ctx, r)
}
func (r *ListTimeSeriesRequest) SamplingKey() string {
// ... Compute value and return
}
First, we need to implement ShouldSample, which receives the default sampler.
If ShouldSample returns true, then the activity is logged. The default
sampler requires the object to implement SamplingKey() string. It
ensures that only “new” requests are logged, not ones similar to those seen before
(at least until the TTL expires or the cache loses the entry).
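To illustrate the idea of key-based sampling with a TTL, here is a simplified sketch. It is not the sampler shipped with SPEKTRA Edge: the type ttlSampler, its constructor, and the listRequest type are hypothetical, and the context argument of the real ShouldSample is left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// keyed is implemented by requests that can be sampled by key.
type keyed interface {
	SamplingKey() string
}

// ttlSampler is a hypothetical sampler: it returns true only the first
// time a key is seen within the TTL window.
type ttlSampler struct {
	mu   sync.Mutex
	ttl  time.Duration
	seen map[string]time.Time // key -> time the key was last logged
	now  func() time.Time     // clock, replaceable for demos and tests
}

func newTTLSampler(ttl time.Duration) *ttlSampler {
	return &ttlSampler{ttl: ttl, seen: map[string]time.Time{}, now: time.Now}
}

// ShouldSample reports whether the request should be logged.
func (s *ttlSampler) ShouldSample(r keyed) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	key, now := r.SamplingKey(), s.now()
	if last, ok := s.seen[key]; ok && now.Sub(last) < s.ttl {
		return false // a similar request was logged recently
	}
	s.seen[key] = now
	return true
}

// listRequest is a stand-in for a request object such as ListTimeSeriesRequest.
type listRequest struct{ parent, filter string }

func (r listRequest) SamplingKey() string { return r.parent + "|" + r.filter }

func main() {
	clock := time.Unix(0, 0)
	s := newTTLSampler(time.Minute)
	s.now = func() time.Time { return clock }

	req := listRequest{parent: "projects/p1", filter: "cpu"}
	fmt.Println(s.ShouldSample(req)) // true: first time this key is seen
	fmt.Println(s.ShouldSample(req)) // false: same key within the TTL
	clock = clock.Add(2 * time.Minute)
	fmt.Println(s.ShouldSample(req)) // true: the TTL has expired
}
```

The choice of sampling key decides what counts as a "similar" request. In this sketch, two requests with the same parent and filter share one key.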
Also, if some streaming calls are heavy (like downloading a multi-GB image), make sure these requests/responses are not logged at all! Otherwise, the volume of data stored in Audit may grow excessively.
Monitoring registration (and usage notes)
Monitoring is a somewhat simpler case than IAM or Audit. Unlike them, it does not integrate on a protobuf level and does not inject any code. The common registration is via metric/resource descriptors, followed by periodic time series submission.
It is up to the service to decide whether it needs numeric time-series data with aggregations. If it does, then service developers need to:
- Declare MonitoredResourceDescriptor instances via fixtures file. Those resources are defined for the whole service.
- Declare MetricDescriptor instances via fixture file. Those resources must be created per each project using a service.
With descriptors created by the fixtures controller, clients can start
submitting time series via CreateTimeSeries
calls. It is recommended to use
the cached client from Monitoring:
https://github.com/cloudwan/edgelq/blob/main/monitoring/metrics_client/v4/tsh_cached_client.go
This is typically used by agents running on edge devices. It is the responsibility of service developers to write the relevant code. The InventoryManager example is a good reference.
Fixture files for this example service can be found here:
- https://github.com/cloudwan/inventory-manager-example/blob/master/fixtures/v1/inventory_manager_monitored_resource_descriptors.yaml
- https://github.com/cloudwan/inventory-manager-example/blob/master/fixtures/v1/inventory_manager_metric_descriptors.yaml
Notes:
- For MetricDescriptors, it is mandatory to provide a value for metadata.services. The reason is that the project is a separate entity from a Service, and can enable/disable the services it uses. Given limited access, the service should declare ownership of the metric descriptors it creates in a project.
- As of now, in this example, the fixtures controller forbids modifications of MetricDescriptors by project admins. For example, if they add a label or index, the changes will be reverted to match the fixtures. However, in the future, we plan to allow some flexibility in mixing user changes with fixtures. This can enable use cases such as additional indices that are usable for specific projects only, which allows per-tenant customizations. This is a good reason to keep MetricDescriptors defined per project rather than per service.
- Because metric descriptors are created per each project, we call them dynamic fixtures.
The main.go
file for a controller needs to import the relevant Go packages
from Monitoring. An example is in
https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go.
Packages needed are:
github.com/cloudwan/edgelq/monitoring/access/v4/metric_descriptor
github.com/cloudwan/edgelq/monitoring/access/v4/monitored_resource_descriptor
In this config file (
https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/controller-config.yaml
) we can find usage of these two fixture files. Note that
MonitoredResourceDescriptor instances are declared with a parent.
As in IAM registration, this ensures that the fixtures
controller only gets the observed state from this particular sub-collection.
MetricDescriptor resources do not specify the parent field (we have multiple
projects!). Therefore, we must use a different mechanism to ensure we only
fetch the metric descriptors we have access to. We do this with the filter
param:
We filter by metadata.services.owningService
value. This way we guarantee
to see resources we have write access to.
Another notable element for MetricDescriptors is how we filter input projects:
createForEach:
- kind: inventory-manager.edgelq.com/Project
  version: v1
  filter: multiRegionPolicy.defaultControlRegion="$myRegionId"
  varRef: project
First, we use inventory-manager.edgelq.com/Project instances, not iam.edgelq.com/Project. This way we can be sure we do not get PermissionDenied (it is our service, after all). We can also skip the enabledServices CONTAINS filter this way.
Another notable element is the filter: we get projects from our region only. It is recommended to create per-project fixtures this way in a multi-region environment. If our service runs in many regions, then each region will take its share of projects.
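The effect of this filter can be pictured with a small sketch. It only models the selection logic in plain Go and is not how the fixtures controller is implemented. The project struct, the function projectsForRegion, and the region names are assumptions made for illustration.

```go
package main

import "fmt"

// project is a simplified stand-in for a Project resource.
type project struct {
	Name                 string
	DefaultControlRegion string
}

// projectsForRegion keeps only the projects whose default control region
// matches myRegionID, mirroring the createForEach filter above.
func projectsForRegion(all []project, myRegionID string) []project {
	var out []project
	for _, p := range all {
		if p.DefaultControlRegion == myRegionID {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	all := []project{
		{Name: "projects/a", DefaultControlRegion: "us-west2"},
		{Name: "projects/b", DefaultControlRegion: "japaneast"},
		{Name: "projects/c", DefaultControlRegion: "us-west2"},
	}
	// A controller running in us-west2 creates fixtures for a and c only.
	for _, p := range projectsForRegion(all, "us-west2") {
		fmt.Println(p.Name)
	}
}
```

Since every project has exactly one default control region, each project is handled by the controller of exactly one region, and no two regions compete over the same per-project fixtures.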
The last element is where the variable $myRegionId
comes from. This is
defined in the main.go
file for the controller. Take a look at
the example:
https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go
In the versioned constructor, you can find the following:
vars := map[string]interface{}{
"myRegionId": envRegistry.MyRegionId(),
}
This is an example of passing some custom variables to the fixture controller.
A simplified example of a client submitting metrics can be found
in the function keepSendingConnectivityMetrics
:
https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/simple-agent-simulator/agent.go
Usage registration
Apart from being an optional registration target, monitoring.edgelq.com already has a built-in registration for every service: usage metrics. These are:
- Number of open calls being currently processed and not concluded (more useful for long-running streams!)
- Request and response byte sizes (uncompressed protobufs)
- Call durations, in the form of Distributions, to catch all individual values.
- Database read and write counts.
- Database resource counters (but these are limited only to those tracked by Limits service).
SPEKTRA Edge platform creates metric descriptors for each service separately in this fixture file:
Resource descriptors are also defined per service:
This way, we can have separate resource types like:
- custom.edgelq.com/server
- another.edgelq.com/server
- etc.
From these fixtures, you can learn what metrics your backend service will be submitting to monitoring.edgelq.com.
Notable things:
- All usage metrics go to your service project, where the service belongs (along with its ServiceAccount).
- To track usage by each tenant project, all metric descriptors have a user_project_id label. It contains the ID of the project (without the projects/ prefix) to which a call is attributed.
- User project ID labels for calls are computed based on the requestPaths object in requests!
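As an illustration of what ends up in the user_project_id label, the sketch below derives a project ID from a resource name. It is an assumption about the general shape of the computation, not the code used by the usage component, and the function name userProjectID is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// userProjectID extracts the project ID (without the "projects/" prefix)
// from a resource name such as "projects/p1/devices/d1". It returns false
// when the name does not belong to a project.
func userProjectID(name string) (string, bool) {
	const prefix = "projects/"
	if !strings.HasPrefix(name, prefix) {
		return "", false
	}
	rest := name[len(prefix):]
	if i := strings.IndexByte(rest, '/'); i >= 0 {
		rest = rest[:i] // drop everything after the project segment
	}
	if rest == "" {
		return "", false
	}
	return rest, true
}

func main() {
	id, ok := userProjectID("projects/p1/devices/d1")
	fmt.Println(id, ok) // p1 true
	_, ok = userProjectID("organizations/o1")
	fmt.Println(ok) // false
}
```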
To ensure the backend sends usage metrics, it is necessary to include this
in the main.go
file. For example, for Inventory Manager, in server
main.go
we have an InitServerUsageReporter
call, find it in
https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagerserver/main.go.
When constructing a store, you need to add a store and cache plugin,
NewUsageStorePlugin
. You can grep this string in the main.go
file
as well.
This covers the minimum registration required from the developer.
Some customization in code is available, though. It is possible
to customize how user_project_id
is extracted. By default, the usage
component uses auto-generated method descriptors (in client
packages),
which are generated based on requestPaths
in API skeletons. It is
possible to customize this by implementing additional functions on the generated
objects. An example can be found here:
https://github.com/cloudwan/edgelq/blob/main/monitoring/client/v4/time_serie/time_serie_service_descriptors.go.
For a client msg handle, we can define
the UsageOverrideExtractUserProjectIds
function, then from a request
object extract the project ID where usage goes. If possible, however, it is
better to stick with the defaults derived from the api-skeleton.
Logging registration
Logging registration is also optional and is even simpler than monitoring. It is recommended to use logging.edgelq.com if there is a need for non-numerical, time-series-like data (logs).
The service developer needs to:
- Define fixtures with LogDescriptor instances to be created per each project (optionally for service or organization). Defining per project may enable in the future some per-project customizations.
- Import the relevant Go package in the main.go file for the controller, as usual (currently github.com/cloudwan/edgelq/logging/access/v1/log_descriptor).
- Complete the configuration of fixtures in the controller config.
- Use logging API from Edge agent runtime (or even any runtime if they want/need it, edge agents are just the most typical).
In InventoryManager we have an example:
- Fixtures:
- Config for the controller can be found here: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/controller-config.yaml
- A client usage example can be found in the function sendExampleCreateLogs, which is in this file:
It is similar to monitoring but simpler.
Limits registration
The limits.edgelq.com service allows limiting the number of resources that can be created in a Project, either to avoid system overload or because of contractual agreements.
Limitations:
- Only resources under projects can be limited
- Limit object is created per unique combination of Project, Region, and Resource type.
Therefore, when integrating with limits, it is highly recommended (again) to work primarily with Projects, and to model resources keeping in mind that only their total count (in a region) is limited. For example, we cannot limit the number of “items in an array in a resource”. If we need to, we should create a child resource type and limit how many of these can be created in a project/region.
With those preconditions, the remaining steps are rather simple to follow. We will go through them one by one.
First, we need to define service plans. It is necessary to provide default plans for organizations and projects too. This should be done again with fixtures, as we have in this example: https://github.com/cloudwan/inventory-manager-example/blob/master/fixtures/v1/inventory_manager_plans.yaml.
As always, this requires importing the relevant package in main.go
, and
an entry in the config file, as in
https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/controller-config.yaml.
The service plan will be assigned automatically to the service during initial bootstrapping by limits.edgelq.com. Organization plans will be used at least by “top” organizations (those without parent organizations), which will have one of the organization plans assigned. From this point, organizations can either define their own plans or continue using the defaults provided by a service via fixtures.
When someone creates a resource under a project, the server needs to check
whether the creation would exceed the limit. If it would, the server must reject
the call with a ResourceExhausted
error. Similarly, when the resource is
deleted, limit usage should decrease. This must happen at the Store level,
not in the API server, because resources are often created or deleted through
custom methods rather than the standard Create/Delete calls. We need to track each
Save/Delete call at the store level. SPEKTRA Edge already provides the relevant
modules, though. If you look at the file here:
https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagerserver/main.go,
you should notice that, when we construct a store (via NewStoreBuilder
),
we are adding a relevant plugin (find NewV1ResourceAllocatorStorePlugin
).
It injects the necessary behavior: it checks the local limit tracker and ensures
its value is in sync. Version ‘v1’ corresponds to the limits service version,
not the 3rd party service version.
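To make the store-level behavior more concrete, below is a simplified, in-memory sketch of a limit tracker. It is not the plugin shipped with SPEKTRA Edge: the limitTracker type, its methods, and the region and resource names are hypothetical, and errResourceExhausted stands in for a gRPC ResourceExhausted status so that the example needs only the standard library. The sketch is also not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// errResourceExhausted stands in for a gRPC ResourceExhausted status.
var errResourceExhausted = errors.New("resource exhausted")

// limitKey mirrors the rule that a limit is unique per project,
// region, and resource type.
type limitKey struct {
	Project      string
	Region       string
	ResourceType string
}

// limitTracker is a hypothetical in-memory tracker of limits and usage.
type limitTracker struct {
	limits map[limitKey]int64
	usage  map[limitKey]int64
}

func newLimitTracker() *limitTracker {
	return &limitTracker{limits: map[limitKey]int64{}, usage: map[limitKey]int64{}}
}

// onCreate is meant to run for every newly saved resource, regardless of
// which API method triggered the save.
func (lt *limitTracker) onCreate(k limitKey) error {
	limit, tracked := lt.limits[k]
	if !tracked {
		return nil // no limit configured for this key
	}
	if lt.usage[k] >= limit {
		return fmt.Errorf("%w: %s limit of %d reached in %s/%s",
			errResourceExhausted, k.ResourceType, limit, k.Project, k.Region)
	}
	lt.usage[k]++
	return nil
}

// onDelete releases one unit of usage when a resource is deleted.
func (lt *limitTracker) onDelete(k limitKey) {
	if lt.usage[k] > 0 {
		lt.usage[k]--
	}
}

func main() {
	lt := newLimitTracker()
	k := limitKey{Project: "projects/p1", Region: "us-west2", ResourceType: "Site"}
	lt.limits[k] = 1

	fmt.Println(lt.onCreate(k) == nil)                           // true
	fmt.Println(errors.Is(lt.onCreate(k), errResourceExhausted)) // true
	lt.onDelete(k)
	fmt.Println(lt.onCreate(k) == nil) // true
}
```

Hooking the check into save and delete, rather than into individual API handlers, is what lets custom methods that create or delete resources be counted without extra code.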
There is also a need to maintain synchronization between SPEKTRA Edge-based
service using Limits and limits.edgelq.com itself. Ultimately, it is
limits.edgelq.com where limit configuration is happening. For this reason,
it is required that the service using Limits exposes an API that Limits can
understand. This is why, in the main.go
file for a server runtime, you can
find the limits mixin server instantiation (look for the NewLimitsMixinServer
call). It needs to be included.
Also, for limit synchronization, we need a controller module provided by the SPEKTRA Edge framework. By convention, this is a part of the business logic controller. You can find an example here: https://github.com/cloudwan/inventory-manager-example/blob/master/cmd/inventorymanagercontroller/main.go
Find the NewLimitsMixinNodeManager
call - this must be included, and
the created manager must be run along with others.
Limits mixin node manager needs its entry in controller config, as in https://github.com/cloudwan/inventory-manager-example/blob/master/config/controller.proto.
There is one very common customization for limits registration. By default, if the limits service is enabled, ALL resources under projects are tracked. This is not always intended, as some resources should not be limited. As of now, we can exclude them in code by providing a function for the resource allocator.
We have an example in InventoryManager again: https://github.com/cloudwan/inventory-manager-example/blob/master/resource_allocator/resource_allocator.go.
In this example, we are creating an allocator that does not count usage
if the resource type is ReaderAgent
. It is also possible to filter out
specific fields and so on. This function is called for any creation, update
(in case the resource switches between counted and non-counted for some reason!),
or deletion.
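The reason the function is also called on updates can be seen in the following sketch. It is not the InventoryManager allocator. The resource struct, the Ephemeral field, the Site type name, and the functions counted and usageDelta are hypothetical. Only the rule that ReaderAgent is not counted comes from the example above.

```go
package main

import "fmt"

// resource is a simplified stand-in for a stored resource.
type resource struct {
	Type      string
	Ephemeral bool // hypothetical field used to show field-based filtering
}

// counted returns 1 if the resource counts toward its limit, else 0.
// A nil resource means "does not exist" and is never counted.
func counted(r *resource) int64 {
	if r == nil || r.Type == "ReaderAgent" || r.Ephemeral {
		return 0
	}
	return 1
}

// usageDelta returns the change in limit usage for a store operation:
// before is nil on creation and after is nil on deletion.
func usageDelta(before, after *resource) int64 {
	return counted(after) - counted(before)
}

func main() {
	site := &resource{Type: "Site"}
	agent := &resource{Type: "ReaderAgent"}

	fmt.Println(usageDelta(nil, site))  // 1: counted resource created
	fmt.Println(usageDelta(nil, agent)) // 0: ReaderAgent is never counted
	fmt.Println(usageDelta(&resource{Type: "Site", Ephemeral: true}, site)) // 1: update made it counted
	fmt.Println(usageDelta(site, nil)) // -1: counted resource deleted
}
```

Treating creation, update, and deletion as one before/after comparison keeps the usage counter correct even when an update changes whether a resource is counted.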
This ResourceAllocator is used in the main.go
file of the server runtime,
where we pass it to the store plugin.