Operating your Service
1 - Deploying your Service
Once the service is developed well enough, you can deploy it. Here is a quick visual recap of a regional deployment:

The large block on the right/top side (most of the image) is the SPEKTRA Edge-based service. Below, you have various applications (web browsers or Edge agents) that can communicate with your service or with the core SPEKTRA Edge services (left).
Service backend deployment is what we focus on in this part. The blue parts in this block are the elements you had to develop: the three binaries we discussed in this guide.
You will need to set up the Networking & Ingress elements. Inside the cluster, you will need Deployments for API servers, controllers, and db-controllers, as well as:
- A database, of course. Core SPEKTRA Edge provides a database for logging and monitoring metrics, but for document storage a NoSQL database is needed. We typically recommend MongoDB as a cloud-agnostic option, but Firestore may also be available on GCP.
- A Redis instance, needed by the Node Managers of all Controllers (sharding!). Although the arrows are missing, Redis can optionally also be used as a cache for the DB.
If you deploy in a Kubernetes environment, it is highly recommended to use a HorizontalPodAutoscaler for the Deployments.
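A minimal sketch of such an autoscaler, assuming a Deployment named inventory-manager-api-server (the name and thresholds below are illustrative, not taken from the example repository):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inventory-manager-api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inventory-manager-api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70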
In the Inventory Manager example, which we will talk about, we assume everything runs on a Kubernetes cluster. Configuring kubectl is assumed to be general knowledge and is not SPEKTRA Edge specific. Refer to the online documentation for how to create a Kubernetes cluster and how to configure kubectl.
In the future, we may ship edgelq-lite images though, with instructions for local Kubernetes deployment.
Building images
We use docker build to ship images for backend services. We will use dockerfiles from Inventory Manager as examples.
You will need to build 4 images:
- API Server (that you coded)
- API Server Envoy proxy (part of API Server)
- Controller
- DbController
When making an API Server, each Pod must contain 2 containers. One is the server image, which handles all gRPC calls. But, as we mentioned many times, we also need to support:
- webGRPC, so web browsers can access the server too, not just native gRPC clients
- REST API, for those who prefer this way of communication
This is handled by the Envoy proxy (https://www.envoyproxy.io/), which provides ready-to-use images. With the proper config, it handles webGRPC pretty much out of the box. The REST API requires a little more work. We need to come back to the regenerate.sh file, as in the InventoryManager example (https://github.com/cloudwan/inventory-manager-example/blob/master/regenerate.sh).
Find the following part:
protoc \
-I "${PROTOINCLUDE}" \
"--descriptor_set_out=${INVENTORYMANAGERROOT}/proto/inventory_manager.pb" \
"--include_source_info" \
"--include_imports" \
"${INVENTORYMANAGERROOT}"/proto/v1/*_service.proto \
"${DIAGNOSTICSPATH}"/proto/v1/*_service.proto
This generates a file inventory_manager.pb, which contains the service descriptors from all files in the service, plus optionally diagnostics (part of the SPEKTRA Edge repository) - include it if you want the gRPC health check to be available from REST.
This generated pb file must be passed to the created envoy proxy image. See the docker file for this: https://github.com/cloudwan/inventory-manager-example/blob/master/build/serviceproxy.dockerfile
We require the argument SERVICE_PB_FILE, which must point to that pb file. During image building, it will be copied to /var/envoy. This concludes the process of building an envoy proxy for a service.
The remaining three images can often be constructed with the same dockerfile. For InventoryManager, we have: https://github.com/cloudwan/inventory-manager-example/blob/master/build/servicebk.dockerfile
This example, however, is quite generic and may fit many services. We have two docker stages there. The first is for building - we use an image with the desired Golang version already installed, ensuring some build dependencies. This build stage must copy the code repository and execute the build of the main binary. Notice also the FIXTURES_DIR param, which MAY contain the path to the fixtures directory of your service. It must be passed when building controller images, not necessarily for server/db-controller ones.
In the second docker stage (service), we construct a simple image with a minimal environment, the runtime binary, and optionally the fixtures directory (/etc/lqd/fixtures).
For a reference on how the variables may be populated, see the skaffold file example (we use skaffold for our builds). It is a good tool that we recommend, though probably not strictly mandatory: https://github.com/cloudwan/inventory-manager-example/blob/master/skaffold.yaml.
Note that we are passing the .gitconfig file there. This is mandatory to access private repositories (your service may be private; also, at the moment of this writing, goten and edgelq are also private!). You may also see the main README for SPEKTRA Edge: https://github.com/cloudwan/edgelq/blob/main/README.md, with more info about building. Since the process may be the same, you may need to configure your own .gitconfig.
Note that skaffold can be configured to push images to Azure, GCP, AWS, you name it.
Cluster preparedness
In your cluster, you need to prepare some machines that will host:
- API Server with envoy proxy
- Controller
- DbController
- Redis instance
You can also deploy MongoDB inside the cluster, or use a managed service like MongoDB Atlas. If you use a managed cloud, MongoDB Atlas can deploy instances running in the same data center as your cluster.
When you get the MongoDB instance, remember its endpoint and get an authentication certificate. It is required to give admin privileges to the Mongo user: it will not only need to make reads/writes of regular resources, but also create databases and collections, configure these collections, and create and manage indices (derived from the proto declarations). This requires full access. It is recommended to keep MongoDB closed and available from your cluster only!
An authentication certificate will be needed later during deployment, so keep it - as a PEM file.
If you use Firestore instead of MongoDB, you will need a service account that is also an admin in Firestore and has access to index management. You will need to get Google credentials and remember the Google project ID.
Networking
When you made a reservation for the SPEKTRA Edge service domain (Service project and service domain name), you reserved the domain name of your service in the SPEKTRA Edge namespace, but it is not an actual networking domain. For example, iam.edgelq.com is the name of a Service object in meta.goten.com, but this name is universal, shared by all production, staging, and development environments. To reach IAM, you will have a specific endpoint for a specific environment. For example, one common staging environment we have has the domain stg01b.edgelq.com - and the IAM endpoint is iam.stg01b.edgelq.com.
Therefore, if you reserved custom.edgelq.com on the SPEKTRA Edge platform, you may want to have a domain like someorg.com. Then, optionally, you may have subdomains defined per various env types:
- dev.someorg.com, and the full endpoint may be custom.dev.someorg.com for the development custom.edgelq.com service
- stg.someorg.com, and the full endpoint may be custom.stg.someorg.com for the staging custom.edgelq.com service
- someorg.com, and the full endpoint may be custom.someorg.com for the production custom.edgelq.com service
You will need to purchase the domain separately, and this domain can be used for potentially many environments and applications reserved on the SPEKTRA Edge platform (custom, custom2, another…). You may host them on a single cluster as well.
Once you purchase, let's say, someorg.com, and decide you want to use stg.someorg.com for staging environments, you will need to configure at least 2 endpoints for each SPEKTRA Edge service. One endpoint is a global one, the other one is a regional one.
Since SPEKTRA Edge is multi-region in its core, it is required to provide these two endpoints. Suppose you have the custom.edgelq.com service reserved on the SPEKTRA Edge platform, and you bought someorg.com; you will need the following endpoints:
- custom.someorg.com - the global endpoint for your service
- custom.<REGION>.someorg.com - the regional endpoint for your service in a specified region
If your service is single-regional, then you will need two endpoints in total. If you have 2 regions, then you will need three endpoints, and so on.
To recap so far:
- You will need to reserve a SPEKTRA Edge domain name (like custom.edgelq.com) on the SPEKTRA Edge platform. Then you may reserve more, like another.edgelq.com. Those will be just resources on the SPEKTRA Edge platform.
- You will need to purchase a domain from a proper provider (like someorg.com), then optionally configure more subdomains to accommodate more env types if needed.
- You will need to configure a global endpoint for each service (like custom.someorg.com, another.someorg.com).
- You will need to configure a regional endpoint for each region (like custom.eastus2.someorg.com, another.eastus2.someorg.com).
Note that the domain for global endpoints here is someorg.com; for eastus2, it is eastus2.someorg.com.
Even if you don't intend to have more than one region, it is required to have a regional domain - you can just use a CNAME record to point it at the same place.
Let’s move to the public IPs part.
Regional and global domains must be resolved into public IP addresses you own/rent. Note that regional endpoints must resolve to different IP addresses. The global endpoint may:
- Use a separate IP address from the regional ones. This separate IP address should be an anycast. It should still route the traffic to the nearest regional cluster.
- Use a DNS solution and allow the global domain to be resolved into one of the regional IP addresses according to the best local performance.
For a single-regional setup, you may make regional and global domains use the same IP address, and make a CNAME record.
Meaning, if you have endpoints:
- custom.someorg.com, another.someorg.com: these need to resolve to a single IP address. This IP address may be different from, or equal to, one of the regional ones.
- custom.eastus2.someorg.com, another.eastus2.someorg.com: those are regional endpoints and need a single regional IP address. If you have more regions, then each requires a different IP address.
For each region, you will need a separate cluster deployment. Inside each cluster, you will need an Ingress object with all the necessary certificates (see the sketch below).
The networking setup is up to service maintainers; it may vary significantly depending on the cloud provider or on-premise setup. The parts required from SPEKTRA Edge's point of view are around the domain names.
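As an illustration only - assuming an NGINX-style Ingress controller, the examples namespace, and the Kubernetes Service and endpoint names used elsewhere in this guide (all hypothetical here) - a regional Ingress could look roughly like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inventory-manager-ingress
  namespace: examples
  annotations:
    # gRPC/HTTP2 towards the backend requires TLS to the Envoy sidecar.
    nginx.ingress.kubernetes.io/backend-protocol: "GRPCS"
spec:
  tls:
  - hosts:
    - inventory-manager.eastus2.someorg.com
    - inventory-manager.someorg.com
    secretName: inventory-manager-public-cert   # public certificate (e.g. LetsEncrypt)
  rules:
  - host: inventory-manager.eastus2.someorg.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: inventory-manager-service
            port:
              number: 443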
Config files preparation
With images constructed, you need to prepare the following config files:
- API Server config
- Envoy proxy
- Controller
- Db Controller
As the Inventory Manager example uses Kubernetes declarations, this may influence some aspects of the config files! You will see some variables here and there. Refer to this file for more explanation along the way: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/env.properties
API Server
Example of API Server config for Inventory Manager: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/api-server-config.yaml
The proto-model can be found here: https://github.com/cloudwan/inventory-manager-example/blob/master/config/apiserver.proto
Review this config file along with this document.
From the top, by convention, we start with sharding information. We use ring size 16 as the standard; others are optional. You need to use the same naming conventions. Note that:
- byName is mandatory ALWAYS
- byProjectId is mandatory because in InventoryManager we use Project related resources
- byServiceId is mandatory because in InventoryManager we use Service related resources
- byIamScope is mandatory because we use byProjectId or byServiceId.
Below you have a "common" config, which applies to servers, controllers, and db-controllers, although some elements are specific to only one kind. There, we specify the gRPC server config (the most important part is the port, of course). There is an experimental websockets part (for bidi-streaming support for web browsers exclusively). It needs to run on a separate port, but the underlying libraries/techniques are experimental and may or may not work. You may skip this if you don't need bidi-streaming calls for web browsers.
After grpcServer, you can see the databases (dbs) part. Note the namespace convention:
- The part envs/$(ENV_NAME)-$(EDGELQ_REGION) ensures that we may potentially run a single database for various environments on a single cluster. We adopted this from development environments, but you may skip this part entirely if you are certain you will run only a single environment in a single cluster.
- The second part, inventory-manager/v1-1, first specifies the application (if you have multiple SPEKTRA Edge apps), then the version and revision (v1-1). "v1" refers to the API version of the service, then "-1" refers to the revision part. If there is a completely new API version, we will need to synchronize (copy) databases during an upgrade. The revision part, -1, is there because there is also the possibility of an internal database format upgrade without API changes.
Other notable parts of the database config (a Mongo sketch follows below):
- We used the "mongo" backend.
- We must specify an API version matching this DB.
- You will need to provide the MONGO_ENDPOINT variable; Mongo deployment is not covered in this example.
- Note that the URL contains /etc/lqd/mongo/mongodb.pem. As of now, this file must be mounted on the pod during startup. In the future, it may be provided in different ways though.
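A rough sketch of such a Mongo entry, mirroring the Firestore variant shown below - the keys under mongo and the connection URL format are illustrative assumptions, so copy the exact schema from the linked api-server-config.yaml:
dbs:
- namespace: "envs/$(ENV_NAME)-$(EDGELQ_REGION)/inventory-manager/v1-1"
  backend: "mongo"
  apiVersion: "v1"
  connectionPoolSize: $(INVENTORY_MANAGER_DB_CONN_POOL_SIZE)
  mongo:
    # Illustrative only: a connection URL pointing at $(MONGO_ENDPOINT) and
    # referencing the PEM file mounted at /etc/lqd/mongo/mongodb.pem.
    endpoint: "mongodb://$(MONGO_ENDPOINT)/?tls=true&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"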
Instead of Mongo, you may also configure firestore:
dbs:
- namespace: "envs/$(ENV_NAME)-$(EDGELQ_REGION)/inventory-manager/v1-1"
backend: "firestore"
apiVersion: "v1"
connectionPoolSize: $(INVENTORY_MANAGER_DB_CONN_POOL_SIZE)
firestore:
projectId: "$(GCP_PROJECT_ID)"
credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
Of course, you will need to have these credentials and use them later in deployment.
Later, you have the dbCache configuration. We only support Redis for now. Note also the endpoint - for deployments like this, it should be some internal endpoint available only inside the cluster.
Further on, you have the authenticator part. The values AUTH0_TENANT, AUTH0_CLIENT_ID, and EDGELQ_DOMAIN must match those provided by the SPEKTRA Edge cluster you are deploying for. But you need to pay more attention to the serviceAccountIdTokenAudiencePrefixes value. There, you need to provide all private and public endpoints your service may encounter. The example provides:
- one private endpoint visible inside the Kubernetes cluster only (the one ending in -service)
- the public regional endpoint
- the public global endpoint
Public endpoints must match those configured during the Networking stage!
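To make that concrete, a sketch of that part of the config, using the endpoint names from this guide - the exact value format (scheme, port) is an assumption here, so copy it from the linked api-server-config.yaml:
authenticator:
  # Auth0 tenant, client ID and SPEKTRA Edge domain settings are omitted in this sketch.
  serviceAccountIdTokenAudiencePrefixes:
  - "https://inventory-manager-service"              # private, in-cluster endpoint
  - "https://inventory-manager.eastus2.someorg.com"  # public regional endpoint
  - "https://inventory-manager.someorg.com"          # public global endpoint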
After authenticator, you have the observability settings. You can configure the logger, Audit, and Usage there. The last two use audit.edgelq.com and monitoring.edgelq.com. You can also add a tracing deployment. As of now, it can work with Jaeger and Google Tracing (GCP only).
Stackdriver example (note that you are responsible for providing the Google credentials path):
observability:
tracing:
exporter: "stackdriver"
sample_probability: 0.001
stackdriver:
projectId: "$(GCP_PROJECT_ID)"
credentialsFilePath: "/etc/lqd/gcloud/google-credentials.json"
Jaeger part - BUT as of now it has hardcoded endpoints:
- agentEndpointURI = "jaeger-agent:6831"
- collectorEndpointURI = "http://jaeger-collector:14268/api/traces"
observability:
tracing:
exporter: "jaeger"
sample_probability: 0.001
This means you will need to deploy Jaeger manually. Furthermore, you should be careful with sampling - some low value is preferred, but it makes it an unsuitable tool for bug hunting. SPEKTRA Edge currently uses obsolete tracing instrumentation, but the proper one is on the roadmap. With this, the example will be enhanced.
After observability, you should see clientEnvironment. This used to be responsible for connecting with other services; it was taking the domain part and pre-pending short service names. With a multi-domain environment, this is however obsolete. It is there for compatibility reasons and should point to your domain. It may be dropped in the future. The replacement is envRegistry, which is just below.
The env registry config (envRegistry) is one of the more important parts. You need to specify the current instance type and the region information: which region is the current deployment's, and which is the default one for your service. The default one must be the first you deploy your service to. The sub-param service must be the same as the service domain name you reserved on the SPEKTRA Edge platform. Then you must provide the global and regional (for this region) endpoints of your service. You may provide a private regional endpoint along with localNetworkId. The latter param should have a value of your own choice; it is not equal to any resource ID created anywhere. It must only be the same in all config files of all runtimes running on the same cluster, so they know they can safely use the private endpoint (for performance reasons). Finally, scoreCalculator and location are used for multi-region middleware routing: if it detects a request that needs to be routed somewhere else, but "somewhere else" may be more than one region, it will use these options to pick the best one.
The next part, bootstrap, is necessary to configure EnvRegistry in the first place; it must point to the meta service endpoint, from which information about the whole SPEKTRA Edge environment will be obtained.
The last common config parts are:
- disableAuth: you should leave it false here, but you may set it to true for some local debugging.
- disableLimits: an old option used in the past for development; it typically needs to be false. It has no effect if limits integration was not done for a service.
- enableStrictNaming: enables strict IDs (32 chars max per ID; only a-z, 0-9, - and _ are allowed). This must always be true. The option exists only because of legacy SPEKTRA Edge environments.
- avoidResourceCreationOverride: if true, then an attempt to send a Create request for an existing resource results in an AlreadyExists error. This must always be true. The option exists only because of legacy SPEKTRA Edge environments.
- allowNotFoundOnResourceDeletion: if true, then an attempt to send a Delete request for a non-existing resource results in a NotFound error. This must always be true. The option exists only because of legacy SPEKTRA Edge environments.
The param nttCredentialsFile is a very important one: it must contain the path to the NTT credentials file you obtained when reserving the service on the SPEKTRA Edge platform.
Envoy proxy
Example of the Envoy proxy config for Inventory Manager: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/envoy.yaml
From a protocol point of view, the task of the envoy proxy is to:
- Passthrough gRPC traffic
- Convert webGRPC calls (made by web browsers) to gRPC ones.
- Convert REST API (HTTP 1.1) calls to gRPC ones.
It also adds a TLS layer between the Ingress and the API Server! Note that when a client outside the cluster communicates with your service, it does not connect with the service directly, but with the Ingress Controller sitting at the entry to your cluster. This Ingress handles TLS with the client, but a separate TLS connection to the API server is also required. The Ingress maintains two connections: one to the end client and the other to the API server. The Envoy proxy, sitting in the same Pod as the API Server, handles the upstream part of this TLS. Note that in envoy.yaml you have the /etc/envoy/pem/ directory with TLS certs. You will need to provision them separately, in addition to the public certificate for the Ingress.
Refer to the envoy proxy documentation for these files. From SPEKTRA Edge's point of view, you may copy and paste this file from service to service. You will need, though, to:
- Replace all “inventory-manager” strings with proper service.
- Configure REST API transcoding on a case-by-case basis.
For this REST API, see the following config part:
- name: envoy.filters.http.grpc_json_transcoder
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
proto_descriptor: /var/envoy/inventory_manager.pb
services:
- ntt.inventory_manager.v1.ProjectService
- ntt.inventory_manager.v1.DeviceModelService
- ntt.inventory_manager.v1.DeviceOrderService
- ntt.inventory_manager.v1.ReaderAgentService
- ntt.inventory_manager.v1.RoomService
- ntt.inventory_manager.v1.SiteService
- ntt.mixins.diagnostics.v1.UtilityService
print_options:
add_whitespace: false
always_print_primitive_fields: true
always_print_enums_as_ints: false
preserve_proto_field_names: false
- name: envoy.filters.http.grpc_web
- name: envoy.filters.http.router
If you come back to the Building images documentation part for the envoy proxy, you can see that we created the inventory_manager.pb file, which we included during the build process. We need to ensure this file is referenced in our envoy.yaml file and that all services are listed. For your service, find all services and put them in this list. You can find them in the protobuf files. As of now, the Utility service offers just this one API group.
If you study envoy.yaml as well, you should see that it has two listeners:
- On port 8091 we listen for websockets (experimental; you should omit this if you don't need bidi-streaming support for web browsers over websockets).
- On port 8443 we serve the rest of the protocols (gRPC, webGRPC, REST API).
It forwards (proxies) traffic to the following ports (the clusters setting):
- 8080 for gRPC
- 8092 for websockets-grpc
Note that those numbers match those in the API server config file! But when you configure the Kubernetes Service, you will need to use the envoy ports.
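As a sketch only (the real manifest is part of the kustomization described later; the names, labels, and websocket port number here are illustrative), the Kubernetes Service in front of the Pod should target the Envoy listeners:
apiVersion: v1
kind: Service
metadata:
  name: service          # becomes inventory-manager-service after the kustomize namePrefix
spec:
  selector:
    app: api-server      # illustrative label
  ports:
  - name: https
    port: 443
    targetPort: 8443     # Envoy listener serving gRPC, webGRPC and REST API
  - name: websockets
    port: 8091
    targetPort: 8091     # experimental websocket listener; omit if unused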
Controller
Look at the Inventory Manager example: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/controller-config.yaml
The proto-model can be found here: https://github.com/cloudwan/inventory-manager-example/blob/master/config/controller.proto
The top part, serverEnvironment, is very similar (actually the same) to the commonConfig part in the API server config; we just specify fewer options, AND instanceType in envRegistry needs to specify a different value (CONTROLLER). We don't specify databases, grpc servers, cache, or authenticator, and observability is smaller.
The next part, nodeRegistry, is required. This specifies the Redis instance that will be used by controller nodes to detect each other. Make sure to provide a unique namespace; don't copy and paste it carelessly to different controllers if you have more service backends! A sketch follows below.
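Purely as an illustration - the exact key names under nodeRegistry are not reproduced in this guide, so treat the shape below as hypothetical and copy the real one from the linked controller-config.yaml:
nodeRegistry:
  redis:
    endpoint: "redis:6379"   # internal Redis endpoint, illustrative
  # Must be unique per controller kind, so different backends do not collide.
  namespace: "envs/$(ENV_NAME)-$(EDGELQ_REGION)/inventory-manager/controller"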
Next, businessLogicNodes is required if you have a business logic controller in use. It is relatively simple: typically we need to provide just the node's name (for Redis registration purposes) and, most importantly, the sharding ring. It must match some value in the API server config. You can specify the number of (virtual) nodes that will fit into a single runtime process.
The param limitNodes is required if you use the limits integration; you should just copy-paste those values, with the rings specified as in the example.
Finally, fixtureNodes were discussed in the SPEKTRA Edge registration doc, so we can skip them here.
Db controller
Look at the Inventory Manager example: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/db-controller-config.yaml
The proto-model can be found here: https://github.com/cloudwan/inventory-manager-example/blob/master/config/dbcontroller.proto
The top part, serverEnvironment, is very similar to those in the API server and controller. Unlike the server, it does not have the server or authenticator parts. But it has the database and cache options, because those are needed for database upgrades and multi-region syncing. The param instanceType in envRegistry must be equal to DB_CONTROLLER, but otherwise, all is the same.
It needs a nodeRegistry config, because it uses sharding with the other db-controllers in the same region and service.
The nodesCfg config is standard and must be used as in the example.
TLS
Let’s start with the TLS part.
There are two encrypted connections:
- Between end client and Ingress (Downstream for Ingress, External)
- Between Ingress and API Server (via Envoy - Upstream for Ingress, Internal).
It means we have separate connections, and each one needs encryption. For the external connection, we need a certificate that is public and signed by a trusted authority. There are many ways to obtain it; on clouds, we can likely get managed certificates, or optionally use the LetsEncrypt service (cloud-agnostic). It is up to service developers to decide how to get them. They need to issue certificates for the regional and global endpoints. Refer to the LetsEncrypt documentation for how to set it up with Ingress if you need it, along with your choice of Ingress in the first place.
For the internal certificate, for connections to the API Server Envoy runtime, we need just a self-signed certificate. If we are in a Kubernetes cluster and we have a ClusterIssuer for self-signed certs, we can make (assuming the Inventory Manager service, the examples namespace, and eastus2 as the region ID we used):
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: inventory-manager.eastus2.examples-cert
namespace: examples
spec:
secretName: inventory-manager.eastus2.examples-cert
duration: 87600h # 10 years
renewBefore: 360h # 15 days
privateKey:
algorithm: RSA
size: 2048
usages:
- server auth
- digital signature
- key encipherment
dnsNames:
- "inventory-manager.examples.svc.cluster.local"
- "inventory-manager.examples.pod.cluster.local"
- "inventory-manager.eastus2.examples.dev04.nttclouds.co"
- "inventory-manager.examples.dev04.nttclouds.co"
issuerRef:
name: selfsigned-clusterissuer
kind: ClusterIssuer
Note that you need the selfsigned-clusterissuer component ready, but there are examples on the internet of how to make a cluster issuer like that.
With the created Certificate, you can get pem/crt files:
kubectl get secret "inventory-manager.eastus2.examples-cert" --namespace examples -o json | jq -r '.data."tls.key"' | base64 --decode > "./server-key.pem"
kubectl get secret "inventory-manager.eastus2.examples-cert" --namespace examples -o json | jq -r '.data."tls.crt"' | base64 --decode > "./server.crt"
You will need these TLS files for the upstream connection - keep them.
Deployment manifests
For the Inventory Manager example, we should start examining the deployment from the kustomization file: https://github.com/cloudwan/inventory-manager-example/blob/master/deployment/kustomization.yaml
This contains the full deployment (except secret files and the Ingress object); you may copy, understand, and modify its contents for your case. Ingress requires additional configuration.
Images
In the given example, the code contains my development image registry, so you will need to replace it with your images. Otherwise it is straightforward to understand.
Resources - Deployments and main Service
We have full yaml deployments for all runtimes - note that the apiserver.yaml file has a Deployment with 2 containers, one for the API Server and the other for the Envoy proxy.
All deployments have relevant pod auto-scalers (except Redis, to avoid synchronization issues across pods). You may, though, also deploy Redis as a managed service; in that case, just replace the endpoint in the yaml config files for the API server, controller, and db-controller!
In this file, you also have a Service object at the bottom that exposes two ports. One is https (443), which redirects traffic to the envoy proxy on 8443; it serves gRPC, grpc-web, and REST API. The other is experimental, for websockets only, and may be omitted. This is the Service you will need to provide to the Ingress to have a full setup. When you construct an Ingress, you will need to redirect traffic to the "inventory-manager-service" k8s Service (but replace the inventory-manager- prefix with something valid for you). If you ask why, since metadata.name is service, the reason is the following line in kustomization.yaml:
namePrefix: inventory-manager-
This is pre-pended to all resource names in this directory.
When adopting these files, you need to:
- Replace the "inventory-manager-" prefix in all places with a valid value for your service.
- Fix the container image names (inventorymanagerserverproxy, inventorymanagerserver, inventorymanagercontroller, inventorymanagerdbcontroller) in the yaml files AND in kustomization.yaml - the images should point to your image registry (see the sketch below)!
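A minimal sketch of that images section of kustomization.yaml, with a hypothetical registry and tag:
images:
- name: inventorymanagerserver
  newName: myregistry.example.com/custom-server
  newTag: "1.0.0"
- name: inventorymanagerserverproxy
  newName: myregistry.example.com/custom-server-proxy
  newTag: "1.0.0"
- name: inventorymanagercontroller
  newName: myregistry.example.com/custom-controller
  newTag: "1.0.0"
- name: inventorymanagerdbcontroller
  newName: myregistry.example.com/custom-db-controller
  newTag: "1.0.0"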
Config generator, configuration, and vars
In the kustomization, you should see a config map generator that loads config maps for all 4 images. However, we also need to take care of all the variables using the $(VAR_NAME) format. First, we declare configurations pointing to params.yaml. Then we declare a full list of vars. These will be populated with the config map generator:
- name: examplesenv
envs:
- env.properties
And now we can use config files for replacements.
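For orientation only, a rough sketch of how the configurations, vars, and the generated config map tie together - the var name and objref fields below are illustrative, so copy the exact entries from the real kustomization.yaml:
configurations:
- params.yaml
vars:
- name: ENV_NAME
  objref:
    apiVersion: v1
    kind: ConfigMap
    name: examplesenv
  fieldref:
    fieldpath: data.ENV_NAME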
Secrets recap
The secretGenerator param in kustomization.yaml should recap all the secret files we need (a sketch follows below):
- We have 2 TLS files for the self-signed certificate, for the internal connection between the Ingress and the API Server Envoy.
- We have credentials for MongoDB. These must be obtained for Mongo. You may opt for Firestore if you can and prefer it, in which case you need to replace them with Google creds.
- Finally, we have the NTT credentials. These must have been obtained when you initially reserved the Service on the SPEKTRA Edge platform, using the UI or cuttle - see Setting up Environment.
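A hypothetical sketch of such a secretGenerator - the secret names and file paths are illustrative and must match what the Deployments actually mount:
secretGenerator:
- name: server-tls
  files:
  - certs/server.crt
  - certs/server-key.pem
- name: mongodb-credentials
  files:
  - certs/mongodb.pem
- name: ntt-credentials
  files:
  - ntt-credentials.json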
2 - Migrating your Service to New Version
When we talk about service versioning, we don't mean simple cases like:
- Adding a new field to an existing request/response/resource (as long as it does not destroy old logic).
- Removing a field from an existing request/response/resource (as long as the backend service does not need it).
These operations can be done simply in the protobuf files:
message SomeRequest {
string old_field_a = 1;
int32 old_field_b = 2;
// We can add new field just by assigning new proto wire number to it:
repeated int32 new_array_field = 3;
}
This is an example of how to properly remove a field:
message SomeRequest {
// It is a former old_field_a. This ensures that if there are new fields,
// they wont take '1' ID.
reserved 1;
int32 old_field_b = 2;
repeated int32 new_array_field = 3;
}
In protobuf, fields are identified by their numbers, and the format is designed to simplify adding/removing elements. When we remove a field, we should just mark its number as reserved. The backend service will then ignore the field with ID "1", even if the client app is still sending it. Marking the number reserved also ensures that developers do not introduce yet another field reusing a previously discarded proto number (while old clients may still be running), avoiding clashes.
What else can be done in a backward-compatible manner:
- You can add a new resource type entirely
- You can add a new API group or action in the API-skeleton.
- You can add a new parent to an existing resource, unless it is the first parent (previously it was a root). Note this does not extend to scope attributes! Adding them is a breaking change.
You can even:
- Rename ENUM values in protobuf (as long as you don't change their numbers)
- Rename field names in protobuf (we support this for MongoDB only though!)
These renamings will break code if someone updates the library version, but the API will stay compatible, including the database format if you use MongoDB.
If all your worries are about the cases mentioned above, you can stop here and just modify your service normally, in the current API-skeleton and current proto files.
Here, we discuss API breaking changes, like:
- Adding the first parent to an existing resource (Supported, but may require some little tricks)
- Changing resource name (Supported)
- Replacing field type from one to another, or splitting/merging fields somehow (Supported, but can be improved).
- Merging two resource types into one or splitting one resource type into two (NOT YET Supported).
- Adding new scope attribute, for example, previously non-regional resources now can be regional (Supported).
- One resource instance may be split into many, or the other way around (with some hacks; we can provide tips on how to do it).
- When we upgrade the version of the imported service, it is considered a breaking change (Supported BUT has traps hidden, needs improvement).
These things are hard - while in Goten we strive to provide a framework for them, there is still much to do. Some of those cases are not yet supported; they are only planned.
Therefore, you need to know that some things WILL BE changed in future Goten releases to improve the versioning experience. We will try to do this before any actual 3rd party needs serious versioning; as of this moment, versioning was needed only internally, and there are no official 3rd party services yet (we have not even released 3rd parties to production as of this writing).
We don't have "example" versioning for Inventory Manager; there is only one version there. But we can show you the last simple versioning example in the core SPEKTRA Edge services: the secrets service, where we upgraded v1alpha2 to v1: https://github.com/cloudwan/edgelq/tree/main/secrets. You may read this document while observing how it was done in Secrets.
How does it work
When we have a breaking change, the service backend actually "doubles" its size: it starts to offer a new API while the old one is still running. Essentially, it exposes two APIs. Since the version number is present in ALL URL paths, the gRPC server can support all of them at once. However, you will need to somehow maintain 2 instances. The new API you maintain and develop normally; the old API may need bug fixes only, no new development.
Once you upgrade your server, your service will have the following way of processing requests in OLD API:
- The client sends a request/stream call in the old API. It reaches envoy, which passes it through, or converts it to gRPC if needed, as it has always done (no change here). The request/stream reaches your service.
- The server gets a request/stream in the old API. It goes through interceptors as normal (including Authentication), since interceptors are common for all calls, regardless of method and version. This is the same as in the old processing.
- The first difference is the first middleware. New API requests get normally to the first middleware: multi-region routing. BUT old API requests instead hit the TRANSFORMER middleware. During this process, all requests are converted to the new API version, or you can provide your own handling. The transformer middleware then passes the request in the new format to the multi-region routing middleware of the NEW server. When the multi-region middleware of the new server returns a response, it is converted back to the OLD version by the transformer middleware of the old server. Streams are also converted - the old stream is wrapped with a transforming stream that converts all requests/responses on the fly. Again, the transformer middleware does all the conversions.
Note the significance of this transformer - basically, all old API requests are treated as new API requests. When they access the database in read or write mode, they operate on new resource instances. The database upgrade is a separate thing to consider, and it will be described later in this document, in the Upgrade process.
There are some notable notes about observability modules:
- Usage component hidden in the framework will still count usage using the old version, despite transformer middleware. It is helpful because we can easily check if someone is using an old API.
- Audit will be altered significantly. Resource change logs will be reported only using the new API (unfortunately for projects using the old version perhaps). But Activity Logs will contain requests/responses in the older format.
Audit is very tricky - once the format of a request/response/resource is saved in the Audit storage, it is there. Audit does not know about service versioning and does not know how to transform between versions. It is assumed that projects/organizations may switch to the new API on their own. If they use the old version, Activity logs will use the old format, and they will see this format. Resource change logs will require further work. Once the project/organization switches, they should be aware of both versions and therefore can read both formats.
Defining new API-skeleton and prototyping versioning
Breaking changes cannot normally be accepted - therefore, we track versions in api-skeletons. We always must provide the currentVersion param. Suppose we have the v1 version, and now we want v2. First, we need to open the api-skeleton-v1.yaml file and provide the following param:
name: somename.edgelq.com
proto:
# Rest of the fields are omitted...
package:
currentVersion: v1
nextVersion: v2
We must at least indicate what the next version is. In the regenerate.sh file, we need to actually call bootstrap two times:
goten-bootstrap -i "${SERVICEPATH}/proto/api-skeleton-v1.yaml" \
-o "${SERVICEPATH}/proto" \
-n "${SERVICEPATH}/proto/api-skeleton-v2.yaml" [... OLD imports here...]
goten-bootstrap -i "${SERVICEPATH}/proto/api-skeleton-v2.yaml" \
-o "${SERVICEPATH}/proto" [... NEW imports here...]
# Your life will be easier if you also format them:
clang-format-12 -i "${SERVICEPATH}"/proto/v1/*.proto
clang-format-12 -i "${SERVICEPATH}"/proto/v2/*.proto
Note that, when we call bootstrap for the older file, we must provide a path to the new one. The new api-skeleton file must be written like a new file; there should be no annotations or traces of the old API-skeleton (other than accommodating what is possible to support in the old API).
During version upgrades, we can (and it is highly recommended to) upgrade the versions of the services we import. This can be done only in the context of the upgraded API version.
This describes the minimal updates to an old api-skeleton file. However, we can achieve some level of customization of the versioning by modifying the old api-skeleton.
We can define extra instructions for versioning. For resources, we can:
resources:
- name: OldResourceName
versioning:
# This can be omitted, if we don't change resource name, or we want to discontinue resource.
replacement: NewResourceName
# In practice, I don't know any cases where below options were actually needed by us, but we
# potentially can opt out from some automatic versioning...
# With this, Goten will not provide automatic versioning of create request at all. This is more likely
# to be needed by developers, if there is some special handling there.
skipTransformersBasicActions:
- CreateOldResourceName
# Old store access by default will always try to support all store operations on old API resources, it provides
# automatic conversion. But you can opt out here:
skipAccessTransformer: true
# You can skip OldResourceNameChange objects automatic conversion... it will render Watch methods
# non-working though... I consider personally it may be even removed as an option.
skipResourceChangeTransformers: true
For actions in the API-skeleton, if we want to change their names, we can point this out to the Goten compiler using the API-skeleton again (the old API-skeleton file):
actions:
- name: OldActionName
versioning:
# This can be omitted, if we don't change action name, or we want to discontinue action at all.
# NewApiGroupName may be omitted if this is same resource/api group as before.
replacement: NewApiGroupName/NewActionName
Let's quickly review what was done for the Secrets service (v1alpha2 to v1) upgrade. This is the v1alpha2 api-skeleton: https://github.com/cloudwan/edgelq/blob/main/secrets/proto/api-skeleton-v1alpha2.yaml.
Note that nextVersion points to v1. We did not do any customizations here; it was not needed. Then we defined the v1 api-skeleton: https://github.com/cloudwan/edgelq/blob/main/secrets/proto/api-skeleton-v1.yaml.
What did we change in a breaking way?
- The Secret resource is now regional. Therefore, if we had a resource like projects/p0/secrets/s0, it is now projects/p0/regions/some/secrets/s0.
We need to think about how to handle this kind of change - what is "some"? How do we convert GET and BatchGet requests, and how do we convert existing resources or handle List requests using filter fields? We have LIST WHERE parent = projects/p0, which now needs to be LIST WHERE parent = projects/p0/regions/- or maybe LIST WHERE parent = projects/p0/regions/some? Also, if there is another service importing us, and they upgrade the version of the secrets they import, how is this handled?
We used a trick here: we know that, during the upgrade of Secrets from v1alpha2 to v1, all our environments are single-region. Therefore, we can assume that the region is some constant value. We provide this in the transformer converting a secret reference to the new format. All old clients keep using secrets from the existing single region, while new clients in new regions use the new API only (required). The same trick can be done for services that started single-region but have second thoughts when going multi-region.
We also added a CryptoKey resource, but that is non-breaking; this new resource type is available only in the new API anyway. In the regenerate.sh file, we added a second call to goten-bootstrap: https://github.com/cloudwan/edgelq/blob/main/secrets/regenerate.sh.
Versioning on proto annotations level
Once you have the new API-skeleton, provided the necessary changes to the old API-skeleton, modified the calls to goten-bootstrap, and finally called goten-bootstrap for BOTH API-skeletons, you will have generated:
- A full set of proto files in the proto/$NEW_VERSION directory. You will need to fill in all request/response/resource bodies as normal. This is not covered here; you will probably need to copy contents from the old files to the new ones and make modifications where necessary.
- In the proto/$OLD_VERSION directory, you should discover a new file: <service_short_name>_versioning.proto.
You should examine it briefly. There is a file-level annotation describing the versioning of this service:
option (goten.annotations.service_versioning) = {
// We will have more methods generated, for each API group, for each method...
methods : [{
original_method : "$OLD_API_GROUP/$OLD_METHOD"
replacement : "$NEW_API_GROUP/$NEW_METHOD"
}]
// Again, we may have many proto objects provided, but template for single one.
// Object may be an instance of request, response, resource, or anything else!
//
// For any object NOT mentioned here, the following default is assumed, provided that
// new object is found somewhere in new API proto package:
//
// {
// object: $OBJECT_NAME
// replacement: $OBJECT_NAME
// transformation_direction: BIDIRECTIONAL
// }
objects : [
{
// We can assume that old and new object name usually are same, but not always.
object : "$OLD_OBJECT_NAME"
replacement : "$NEW_OBJECT_NAME"
// To reduce generated transformers code, we can use FROM_NEW_TO_OLD or FROM_OLD_TO_NEW.
// This is used typically for responses/requests objects. We will need to convert old API
// request to new API, but never other way around. Therefore, no need for extra generation.
// DISABLED should be used to explicitly disable conversion of particular object.
// BIDIRECTIONAL should be used by resources and all sub-types they use.
transformation_direction : BIDIRECTIONAL // OR may be FROM_NEW_TO_OLD, FROM_OLD_TO_NEW, DISABLED
// These options below probably should be considered obsolete and not used!
// If this is true, then field path helper objects are not transformed...
// If you don't understand, probably you dont need this option.
skip_field_path_transformers : false
// Skip generation of transformer for Store access.
skip_resource_access_transformer : true
}
]
};
This versioning file is generated only once, based on the api-skeleton; it is assumed that the developer may modify it manually. If you make further changes to the api-skeleton and have no manual modifications in this file, you should delete it first so it is regenerated.
Once you have filled in all the proto files of the new API and ensured you are happy with the versioning in general, you should further modify the regenerate.sh file: you must include a new protoc plugin in the list, PLUS add the list of new proto files as input!
protoc \
-I "${PROTOINCLUDE}" \
"--goten-go_out=:${GOGENPATH}" \
"--goten-validate_out=${GOGENPATH}" \
"--goten-object_out=:${GOGENPATH}" \
"--goten-resource_out=:${GOGENPATH}" \
"--goten-store_out=datastore=firestore:${GOGENPATH}" \
"--goten-client_out=${GOGENPATH}" \
"--goten-access_out=${GOGENPATH}" \
"--goten-server_out=lang=:${GOGENPATH}" \
"--goten-cli_out=${GOGENPATH}" \
"--edgelq-doc_out=service=${SERVICE_SHORT_NAME}:${SERVICEPATH}/docs/apis" \
"--ntt-iam_out=lang=:${GOGENPATH}" \
"--ntt-audit_out=:${GOGENPATH}" \
"--goten-versioning_out=:${GOGENPATH}" \
"${SERVICEPATH}"/proto/v1/*.proto "${SERVICEPATH}"/proto/v2/*.proto
There are 2 additions:
- You must have "--goten-versioning_out=:${GOGENPATH}" in the list!
- Instead of just "${SERVICEPATH}"/proto/v1/*.proto, you also MUST include the new version proto files: "${SERVICEPATH}"/proto/v2/*.proto.
When you generate pb file for REST API descriptors, you also need to provide two directories now:
protoc \
-I "${PROTOINCLUDE}" \
"--descriptor_set_out=${SERVICEPATH}/proto/${SERVICE_SHORT_NAME_LOWER_CASE}.pb" \
"--include_source_info" \
"--include_imports" \
"${SERVICEPATH}"/proto/v1/*_service.proto \
"${SERVICEPATH}"/proto/v2/*_service.proto \
"${DIAGNOSTICSPATH}"/proto/v1/*_service.proto
With the new pb file, to enable the REST API for both versions, you will need to modify envoy.yaml and list the API services of both versions for this transcoding (a sketch follows below). Unfortunately, envoy is not able to figure this out itself. You may need to maintain multiple envoy.yaml files - for backends with the new version and for backends without it.
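A sketch of the updated grpc_json_transcoder services list, using hypothetical v1/v2 package names (take the actual service names from your protobuf files):
- name: envoy.filters.http.grpc_json_transcoder
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
    proto_descriptor: /var/envoy/custom.pb
    services:
    - ntt.custom.v1.SomeResourceService
    - ntt.custom.v2.SomeResourceService
    - ntt.mixins.diagnostics.v1.UtilityService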
This is all regarding the regenerate.sh file.
Let’s have a quick view of the Secrets versioning we described before. Here you can see the versioning proto file: https://github.com/cloudwan/edgelq/blob/main/secrets/proto/v1alpha2/secrets_versioning.proto. Then again see the regenerate file, with extra protoc calls and more files provided: https://github.com/cloudwan/edgelq/blob/main/secrets/regenerate.sh.
During this upgrade, we also bumped the diagnostics mixin API, but it’s not important here.
Overview of generated code and implementation
Once you regenerate the service, you will have "double" the code size. Several directories of your service repository will have two subdirectories, for example v1 and v2. Those directories are access, audithandlers, cli, client, fixtures, proto, resources, server, and store.
The directories for the new version you should treat as an already known topic; it is the task of the older version to know how to transform to the new version, not the other way around. In this regard, you should first provide an implementation in the new version directories: resources, client, server, etc. You may start by copying handwritten Go files from the old version directories to the new ones, then make all the necessary modifications. Ideally, you should have the new version fully developed first, without looking at the old one (apart from keeping in mind that you will later need to provide transformers for compatibility). Do not touch the cmd/ directory yet; it's the last part you should work on.
Versioning module and transformers
When you have the new version, you may first look at all the new files that appeared for the old version. First, look at the new directory created: versioning/v1 (if v1 is the old version). It will have several subdirectories, for all resources and API groups. API groups may be a little less visible at first, because for each resource we have an implicit API group sharing the same name. But if you examine the files, you should see the following pattern:
versioning/:
$OLD_VERSION/
$API_GROUP_NAME/
<api_name>_service.pb.transformer
$RESOURCE_NAME/
<resource_name>.pb.access.go
<resource_name>.pb.transformer.go
<resource_name>_change.pb.transformer.go
Since each resource has an API group with the same name, you will often see a directory with four generated files. You can look around the versioning for secrets, since it is simple: https://github.com/cloudwan/edgelq/tree/main/secrets/versioning/v1alpha2.
All files ending with pb.transformer.go are standard transformer files. They contain one transformer struct definition per protobuf object defined in the corresponding proto file. Therefore, the <api_name>_service.pb.transformer.go files contain transformers for requests and responses, the <resource_name>.pb.transformer.go files contain transformers for resources, and <resource_name>_change.pb.transformer.go for change objects.
Let’s start with the resource transformer, for the Secret resource: https://github.com/cloudwan/edgelq/blob/main/secrets/versioning/v1alpha2/secret/secret.pb.transformer.go.
Note that we have an interface first, and then a default implementation of that interface. The main parts to look at:
var (
registeredSecretTransformer SecretTransformer
)
func SetSecretTransformer(transformer SecretTransformer) {
...
}
func GetSecretTransformer() SecretTransformer {
...
}
type SecretTransformer interface {
...
}
type secretTransformer struct{}
We have a global transformer (for the package), and we can get/set it via functions. There is a reason for that, which will be explained shortly.
If you look at the interface though, you will see transformer functions for Secret resources in versions v1 and v1alpha2. Additionally, you will also see functions for transforming all “helper” objects - name, reference, field path, field mask, filter, field path value, etc. All those functions are also doubled for full bidirectional support. Still, they concentrate on a single object.
Before we jump to some transformation examples, let's recap one thing about Golang: it has interfaces, and you can "cast" an implementing struct into an interface, but you don't have polymorphism. Suppose you define a struct "inheriting" another one and "overriding" one of its methods; let's call the method A. Now, imagine that the parent struct has a method called B, which calls A internally. With polymorphism, your implementation of A would be called - but not in Golang. Therefore, let's see the current function for transforming a Secret resource from v1alpha2 to v1:
func (t *secretTransformer) SecretToV1(
ctx context.Context,
src *secret.Secret,
) (*v1_secret.Secret, error) {
if src == nil {
return nil, nil
}
dst := &v1_secret.Secret{}
trName, err := GetSecretTransformer().SecretNameToV1(ctx, src.GetName())
if err != nil {
return nil, err
}
dst.Name = trName
dst.EncData = src.GetEncData()
dst.Data = src.GetData()
dst.Metadata = src.GetMetadata()
return dst, nil
}
If we subclassed secretTransformer and overrode SecretNameToV1, then inside SecretToV1 the old implementation would still be called if the code were written like:
trName, err := t.SecretNameToV1(ctx, src.GetName())
Since this is not desired, we decided to always get the globally registered transformer when calling other transformer functions, including our own. Therefore, transformers use a global registry (although they are still packaged). There may have been another solution perhaps, but it works fine.
When you want to override the transformer, you need to create another file and implement this transformer, inheriting first from the base one. You should implement only the minimal required overrides. Your custom transformer will need to be exported.
If you look at other files across the Secrets versioning (the other transformers, not the pb.access.go file!), you should see that they implement much smaller interfaces - usually just objects back and forth. Resources are those with the largest number of methods, but they follow the same principles.
Overall, you should notice that there is some hierarchy in these transformation calls.
For example, SecretToV1 needs SecretNameToV1, because the name field is part of the resource. SecretNameToV1 actually needs SecretReferenceToV1. Then SecretFieldMaskToV1 needs SecretFieldPathToV1. Next, SecretFilterToV1 needs SecretFieldPathValueToV1, etc.
Filters and field masks are especially important - transformations like ListSecretsRequestToV1 rely on them! In other words, if we have some special conversion of a specific field path within the resource, and we want to support filter conversions (and field masks), then we need to override the relevant transformer functions:
- <Name>To<Version> for the object transformation itself. We need to convert the fields that cannot be code-generated.
- <Name>FieldPathTo<Version> for field mask transformations; we need to provide the mapping for field paths that were not auto-generated.
- <Name>FieldPathValueTo<Version> for filter COMPARE conditions, for non-auto-generated field path values.
- <Name>FieldPathArrayOfValuesTo<Version> for filter IN conditions (!), for non-auto-generated field path values.
- <Name>FieldPathArrayItemValueTo<Version> for filter CONTAINS conditions, if the field needing special treatment is an array and code-gen was not available.
Filters are pretty complex; they are, after all, a set of conditions, and each condition is a combination of some field path value with an operator!
For secrets, we did not change any fields; we changed just the name field patterns by adding a region segment. Because of this, we need to override only the Reference and ParentReference transformers (for both versions). Name transformers call the reference ones, so we skipped them. WARNING: to be honest, it should be the other way around - the name is basic, the reference is on top. This is one of the versioning parts that will be subject to change, at least while versioning is only used by our team and no 3rd party services exist. The ParentReference type is also considered obsolete and will be removed entirely.
What is at least good is that those Reference/Name transformers will be used by resource transformers, filters, requests in all CRUD, etc.
Also, our transformer functions will be used by all resources having references to the Secret resource! This means that if we have resources like:
message OtherResource {
string secret_ref = 1 [(goten.annotations.type).reference = {
resource: "secrets.edgelq.com/Secret"
target_delete_behavior : BLOCK
}];
}
If this resource belongs to a service that is upgrading the Secrets version it imports, the maintainers of that service do not have to worry about the transformation at all. Instead, they just need to import our versioning package, and it is done for them.
Still, there are some areas for improvement. Note that field changes within resources, if they are breaking changes, require plenty of work - up to five transformer functions (those field paths, field path values for filters…), and even 10, because we need bidirectional transformation. In the future, we will have a special transformation function mapping one field path to another with value transformation - for both directions - two functions in total. Then all those transformer functions will be used.
When it comes to transformers, the code-gen compiler will try to match fields in the following way: if they share the same type (like int32 to int32, string to string, repeated string to repeated string) and proto wire number, then we have a match. Fields are allowed to change names. Any number/type change requires a transformation.
Reference/name transformations require the same underlying type and name pattern.
Using the transformers, we can construct access objects, like in https://github.com/cloudwan/edgelq/blob/main/secrets/versioning/v1alpha2/secret/secret.pb.access.go.
It takes the access interface of the new objects and wraps it to provide the old ones.
Transformers provide some flexibility in transforming objects into larger or smaller ones, but they lack plenty of abilities. You cannot convert one object into two or more, or the other way around. Access to the database during transformation was possible in the past, but so far it has not been necessary and, more problematically, it is prone to bugs. The roadmap now predicts different mechanisms, and it is advised to provide the transformations that are possible: they should convert one item to another, and any "sub" item should be delegated to another transformer.
Once you have all transformers for the given version, it is highly recommended to wrap their initialization in a single module. For example, for secrets we have https://github.com/cloudwan/edgelq/blob/main/secrets/versioning/v1alpha2/secrets/registration.go.
We import all versioning packages there. If registration happens in a Go init function, a "dummy" import with an underscore "_" is enough. Otherwise, we need a registration function with arguments. Any runtime that needs those transformers must call this whole-service register function. Those runtimes are the server and dbController of the versioned service AND all servers/dbControllers of importing services. For example, the service applications.edgelq.com imports secrets.edgelq.com, so its server and dbController will need to load the secrets versioning modules.
Store transforming
It may be useful to have a handle to the new store and "cast" it to the old one. This way you can interact with the new database data via the old API. Goten generates a structure that provides exactly that. For secrets, you can see it here: https://github.com/cloudwan/edgelq/blob/main/secrets/store/v1alpha2/secrets/secrets.pb.transformer.go. It takes the interface of the new store and provides the old one, using the generated transformers from the versioning packages.
This is an extra file Goten provides in older version packages. Normally, if you have good transformers, this does not need any extra work.
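For intuition, this is the general shape of such a wrapper, as a hedged conceptual sketch; the identifiers below are illustrative only, not the generated API (check the linked secrets.pb.transformer.go for the real names):
// Conceptual sketch only; identifiers are illustrative, not the generated Goten API.
package storesketch

// OldSecret and NewSecret stand in for the v1alpha2 and v1 resource types.
type OldSecret struct{ Name string }
type NewSecret struct{ Name string }

// NewStore stands in for the new (v1) store handle interface.
type NewStore interface {
	GetSecret(name string) (*NewSecret, error)
}

// OldStore stands in for the old (v1alpha2) store interface that older code expects.
type OldStore interface {
	GetSecret(name string) (*OldSecret, error)
}

// storeTransformer implements OldStore on top of NewStore.
type storeTransformer struct{ inner NewStore }

func (t *storeTransformer) GetSecret(name string) (*OldSecret, error) {
	// The generated code would first upgrade the name to the new pattern
	// (secrets added a region segment) and then downgrade the fetched resource
	// with the versioning transformers; here we only copy the name.
	res, err := t.inner.GetSecret(name)
	if err != nil {
		return nil, err
	}
	return &OldSecret{Name: res.Name}, nil
}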
Server transformer middleware
Server transformer middleware may be considered a final part of API transformation. It receives requests in the old format, transforms them into new ones, and passes them to the new middleware chain. It uses transformer objects from versioning packages.
Goten generates this middleware automatically in each of the server packages. For the secrets service you have:
- https://github.com/cloudwan/edgelq/blob/main/secrets/server/v1alpha2/project/project_service.pb.middleware.transformer.go
- https://github.com/cloudwan/edgelq/blob/main/secrets/server/v1alpha2/secret/secret_service.pb.middleware.transformer.go
On top of those, there is a glue of transformer middlewares. Once you have this glue, you may provide a constructor for the server object as simple as the one in this example: https://github.com/cloudwan/edgelq/blob/main/secrets/server/v1alpha2/secrets/secrets.go.
There, we pass the new service handlers object and wrap it with a transformer accepting the older API. See the function NewTransformedSecretsServer. This is a very simple example: when you have a new server object, just wrap it with transformers.
If you wonder why we left NewSecretsServer, which returns the old server, we will explain this when we talk about the need to run 2 versions. This is important: when you create a constructor for old server handlers that wraps a new server, you must leave the old constructor in place.
If you look at the generated transformers, you may see that everything is wrapped in "transformation sessions". Those are used by Audit, which needs to be notified about every converted message. If you are curious, check https://github.com/cloudwan/goten/blob/main/runtime/versioning/transformation_session.go, and see the ApiCommunicationTransformationObserver interface. It allows interested parties to observe whether any cross-version transformation happened.
If you were able to provide full versioning with transformers only, the main work concludes here. If, however, you need some extra IO work, need to split requests, or want to do anything more complicated, you may want to either:
- Disable server transformations, for example by disabling them in the api-skeletons. You can check (read again) about skipTransformersBasicActions. Then you can implement your own transforming actions for the transformation middleware.
- Amend the transformer middleware by providing additional custom middleware in front of the generated one, or after it if you prefer.
In your transformer middleware, you may also use a store object to extract additional data from a database, but it should be done in NO-TRANSACTION, read-only mode.
If you use transformations, you need to wrap them with functions from the Goten module:
- WithUnaryRequestTransformationSession
- WithUnaryResponseTransformationSession
- WithStreamClientMsgTransformationSession
- WithStreamServerMsgTransformationSession
These are defined in https://github.com/cloudwan/goten/blob/main/runtime/versioning/transformation_session.go.
Note that, by leveraging custom transformer middlewares, you may even construct the "server" instance differently. Going back to the "server" construction with a transformer (https://github.com/cloudwan/edgelq/blob/main/secrets/server/v1alpha2/secrets/secrets.go), it does not necessarily need to be as simple as:
func NewTransformedSecretsServer(
	newServer v1server.SecretsServer,
) SecretsServer {
	return WithTransformerMiddleware(newServer)
}
Instead, you can get a store handle for the new database, the API server config, the authInfoProvider, and so on. Then, you may construct the server handler chain in the following way:
- Old API middleware for multi-region routing
- Old API middleware for authorization
- Old API middleware for transaction
- Old API outer middleware
- Transformation middleware to the new API - with special customizations
- New API custom middleware (if present)
- New API core server
Inside transformer middleware, you are guaranteed to be in a transaction. This may enable new cases, like splitting one Update request (for old API) into multiple Updates (for new API).
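As a hedged, conceptual sketch of that idea (the types below are illustrative only; the real request and resource types would come from your generated packages), a custom transformer middleware step could split one old-API Update into two new-API Updates, relying on the surrounding transaction to keep them atomic:
// Illustrative sketch only: old resource A is split into new resources B and C.
package splitsketch

type OldA struct{ Name, Data string }
type NewB struct{ Name, Data string }
type NewC struct{ Name, Data string }

// newAPI stands in for the new-version server/store the middleware calls into.
type newAPI interface {
	UpdateB(b *NewB) error
	UpdateC(c *NewC) error
}

// updateA handles an old-API Update of A by issuing two new-API Updates.
// Because transformer middleware runs inside a transaction, both updates
// commit or roll back together.
func updateA(api newAPI, a *OldA) error {
	if err := api.UpdateB(&NewB{Name: a.Name, Data: a.Data}); err != nil {
		return err
	}
	return api.UpdateC(&NewC{Name: a.Name, Data: a.Data})
}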
In the future, this approach may even become the recommended one, with new Goten/SPEKTRA Edge upgrades. Note that if you have changed, let's say, a resource name, permission names from the new and old APIs may be incompatible. You can make sure your roles have permissions for both cases, but it will be more difficult once we update our IAM to let project admins have their own roles! It would be unreasonable to expect project admins to update their roles for new permissions, or to upgrade them automatically, since roles are stored in the IAM database.
Note that you can easily wrap the new store handle into the old one using the store transformer (from the store package we mentioned!).
If you need to, you can take the wrapped old store handle and construct the old API server exactly as it was before, using only its own middleware, without the transformer one. Transformations will then happen only at the store level, perhaps sparing you some tricky custom methods.
There is practically no cost in constructing two "server" objects for the new and old APIs; they are rather stateless, light objects. They are not actual servers, just sets of server handlers. If you use the old authorization middleware, however, make sure the permission names are the same, or that you added the old permissions to the new roles too! This way the new roles can handle both the new and old APIs. Authorization does not necessarily care about versioning there.
Custom middlewares are powerful: they can execute extra IO work, split requests, or even change one request entirely into a completely different, unexpected type. However, there are limitations:
- You still need good transformers for full resource bodies, without the context of requests/responses. The reason is that, during the upgrade process, resource transformers are used to convert one version into the other!
- References are still tricky. You need to consider that other services (or even your own) have references to resources with tricky name transformations. When those other services upgrade their resources, they will need some instruction on how to convert the problematic references. For example, if you have resource A that you want to split into B and C, then a resource D referring to A may suddenly need two references, to B and C, in the next version. Also, filter conditions like WHERE ref_to_a = "$value" may need to be transformed into something like WHERE ref_to_b = "$value1" AND ref_to_c = "$value2".
Fixtures
With multiple versions, you will see multiple directories for fixtures. When building images, you should include both in your controller’s final image.
When you create fixtures, especially roles, consider whether they will work for both the new and old APIs. For example, if some action changed its name, or if a resource changed its name, your role for the new API should have permissions for both the older and newer names.
This may be problematic if project admins define their own roles (which will be supported in the future). In this case, we may recommend using the older API authorization in the first place; the transformer middleware should come after it.
When you write new fixtures, you should avoid updating the old fixtures! Old things must stay as they are!
Ensuring server and controller can work on two versions
With new and old APIs implemented, we need to handle the following:
- main.go files, with support for both versions
- fixtures for the new and old versions
Once you have the "code" in place, you need to acknowledge the fact that, right now, the old API is running in your environments, with old resources and an old database. Once you push your images, they will inherit the old database, which is incompatible with the new one. The safest process is to keep the old database as it is AND prepare a new database in its own namespace. Without revealing all the details yet: your service backend will use two databases to achieve a smooth transition.
When your new servers/controllers start, they first need to check which version is "operating now". The first time they do that, they will see that the database is old, so the new servers cannot run yet. Instead, they will need to run the old server handlers, the old controller, and the old db-controller. This is why, when you make a new server, you will need to do plenty of copying and pasting.
While your backend service upgrades the database, it will keep serving the old content in the old way. Once the database upgrade finishes, your backend service will flip the primary database version. It will offer the new API in its full form, and the old API will be transformed to the new API on the fly by the servers. This ensures you don't necessarily need to maintain the old database anymore.
Now you should look carefully at the main.go files for the secrets service:
- https://github.com/cloudwan/edgelq/blob/main/secrets/cmd/secretsserver/main.go
- https://github.com/cloudwan/edgelq/blob/main/secrets/cmd/secretsdbcontroller/main.go
- https://github.com/cloudwan/edgelq/blob/main/secrets/cmd/secretscontroller/main.go
Server
Starting with the server, note that we are using the vrunner object, which gets a function for constructing different servers depending on whether the old or new API version is currently "active". In this case, runV1Alpha2MainVersionServer is of course the old one. If you look at runV1Alpha2MainVersionServer, you should note:
- The v1alpha2 store handle is constructed normally.
- The v1 store handle is constructed in read-only mode.
- We construct two multi-region policy store handlers. The old one is constructed as we normally would.
- The v1alpha2 server uses the OLD handlers constructor, without wrapping the new server at all!
- The v1 server is constructed like a standalone one, but uses the read-only store handle. This means that all write requests will fail, but it is already possible to "read" immediately. Read requests may not, however, return valid data yet.
It is assumed that, at the moment of the server upgrade, no client should be using the new version yet. Therefore, the API server will serve the old API as normal, and the old database will be written to and read from. A background database upgrade will be running, and read requests will gradually become more consistent with reality.
In this example, please ignore NewMetaMixinServer (old schema-mixin) and NewLimitsMixinServer. During this particular upgrade of secrets, we also upgraded the schema mixin (former meta-mixin) and the limits mixin. In the case of 3rd party services, if you use just the v1 schema and limits mixins for both versions, construct the same instances as always, but give them the old API store.
For example, if you had some custom service using limits and schema mixin in v1, and you upgraded from v1 to v2, you should construct the following servers when running in v1 mode:
- Limits mixin in v1, with access to v1 store handle, and multi-region policy for v1 version.
- Schema mixin in v1, with access to v1 store handle.
- Your service in v1, with access to v1 store handle.
- Your service in v2, with access to v2 read-only store handle, and multi-region policy for v2.
When we upgrade the mixins themselves, we will describe the upgrade procedure, but nothing is currently planned on the roadmap.
Your service will detect automatically when the switch happens. In that case, the old server is canceled, and vrunner automatically calls the constructor for the new version server. In the case of secrets, that is runV1MainVersionServer.
If you look at this constructor, we build a read-write v1 store handle and discard the old store entirely. Now the limits and schema mixins use the new store handle, and the old API server is a wrapped version of the new one. We still serve the old API, but the database has switched completely.
We also need to prepare an API-server config file that supports two database versions. This is the snippet for the secrets service:
dbs:
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v1"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1alpha2"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v2"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
Note that we have 2 database entries, with different namespaces and different apiVersion values assigned! For historical reasons, there is a mismatch between the version in the namespace and the apiVersion, so don't worry about this part (the v1 DB namespace serves the v1alpha2 API, and the v2 namespace serves the v1 API).
Controller
Like with the server, in the controller runtime we also use the vrunner object, as you can see in the secrets controller main.go. If the old version is active, it will run just the old controller, with the old fixtures. This means that, once you upgrade your images, your controller should run like it always did. However, when a version switch is detected, the old controller is canceled and a new one is deployed in its place.
Note that the config object has 2 fixture sets: v1alpha2 and v1. If you look at the config file: https://github.com/cloudwan/edgelq/blob/main/secrets/config/controller.proto, you will also see 2 fixture configs accordingly. Any multi-version service should have this.
It also means that new fixtures for projects and your service will only be deployed when the actual version changes.
For your config file, ensure you provide two fixture sets for both versions.
Db-Controller
Now, the db-controller is quite different from the controller and the server. You don't have any vrunner. Instead, you should see that we are calling NewVersionedStorage twice, for the different databases. We even pass both to dbSyncerCtrlManager. You should be aware of the multiple tasks happening in the db syncer controller module:
- It handles multi-region syncing.
- It handles search db syncing if you use a different search backend than a primary database.
- It handles database upgrades too!
We don't use vrunner in the db-controller, because it is already used by db-syncer-ctrl and db-constraint-ctrl internally. They switch automatically when the version changes, so it's not visible in the main.go file.
When the db-controller starts and detects that the old version is active, it will continue executing the regular tasks for the old service. However, in the background, it will start copying the database from the old namespace to the new one!
The config file for the db-controller needs, like the server instance, two entries for dbs. This is a snippet from our secrets:
dbs:
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v1"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1alpha2"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
  disabled: $(V1_ALPHA2_DB_DISABLED)
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v2"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
The new element not present on the server side is disabled: $(V1_ALPHA2_DB_DISABLED). If you look back at the db-controller main.go though, you should see the following code:
func main() {
	...
	var v1Alpha2Storage *node.VersionedStorage
	dbSyncingCtrlCfg := db_syncing_ctrl.NewDefaultControllerNodeConfig()
	if serverEnvCfg.DbVersionEnabled("v1alpha2") || envRegistry.MyRegionalDeploymentInfo().GetCurrentVersion() == "v1alpha2" {
		...
		dbSyncingCtrlCfg.EnableDowngradeDbSyncing = serverEnvCfg.DbVersionEnabled("v1alpha2")
	}
}
It means that (see the sketch below):
- If the currently detected version is v1alpha2, the old one, then the second boolean check passes and we add the v1alpha2 storage regardless of serverEnvCfg.DbVersionEnabled("v1alpha2"). However, if that call returns false, then dbSyncingCtrlCfg.EnableDowngradeDbSyncing is false.
- If the current version is v1 and serverEnvCfg.DbVersionEnabled("v1alpha2") returns false, then the v1alpha2 storage is not visible at all anymore.
We will discuss this when talking about the Upgrade process.
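To make the branching above easier to follow, here is an illustrative reduction of that if-statement to plain booleans (a sketch only; oldDBEnabled stands for serverEnvCfg.DbVersionEnabled("v1alpha2"), and currentIsOld for the envRegistry current-version check):
// Illustrative reduction of the main.go decision shown above; not actual edgelq code.
package dbctrlsketch

// wantOldStorage mirrors the if-statement: the old storage is attached when the old
// DB is enabled in config OR the currently active version is still the old one;
// downgrade (new -> old) syncing is enabled only when the old DB is enabled in config.
func wantOldStorage(oldDBEnabled, currentIsOld bool) (attachOldStorage, enableDowngradeSync bool) {
	if oldDBEnabled || currentIsOld {
		attachOldStorage = true
		enableDowngradeSync = oldDBEnabled
	}
	return attachOldStorage, enableDowngradeSync
}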
Versioning transformers
If you looked carefully enough, you should have noticed the following lines in the two main.go files for secrets (server and db-controller):
import (
	vsecrets "github.com/cloudwan/edgelq/secrets/versioning/v1alpha2/secrets"
)

func main() {
	...
	vsecrets.RegisterCustomSecretsTransformers(envRegistry.MyRegionId())
	...
}
This import is necessary for the correct working of the server and the db-controller. The former needs the transformers for API transformation, the latter for the database upgrade. If you don't have any custom transformers and rely only on the init function, you will at least need a "dummy" import:
import (
	_ "github.com/cloudwan/edgelq/secrets/versioning/v1alpha2/secrets"
)
For non-dummy imports, the transformers will also be needed by all importing services. For example, since the service applications.edgelq.com imports secrets.edgelq.com, we also had to load the same versioning transformers in its main.go files:
- https://github.com/cloudwan/edgelq/blob/main/applications/cmd/applicationsserver/main.go
- https://github.com/cloudwan/edgelq/blob/main/applications/cmd/applicationsdbcontroller/main.go
Note that we are calling vsecrets.RegisterCustomSecretsTransformers(envRegistry.MyRegionId()) there too! This is necessary to transform references to Secret resources! When you upgrade imported services, make sure to import their transformers.
Upgrading process
By now you should know that, when you upgrade the images, your service backend will continue operating on the old API version and the old database, while the db-controller quietly upgrades the database in the background by copying data from one namespace to another.
The information about which version is active comes from the meta.goten.com service. Each Deployment resource has a field called currentVersion. This also means that each region controls its own version, and you need to run the upgrade process for all regions of a service (Deployment).
Therefore, we focus here on a single region only. First, you pick a region to upgrade, upload the images, and restart the backend services to use them. They will start by serving the old version and begin upgrading the database.
But they won't switch on their own; they will just sync the database, then keep syncing it forever, for every write request happening on the old version. To trigger an upgrade with the version switch, you should send the BeginUpgrade request to the Meta service. For example, if you are upgrading service custom.edgelq.com in region us-west2 from v1 to v2, you may use cuttle:
cuttle meta begin-upgrade deployment \
--name 'services/custom.edgelq.com/deployments/us-west2' \
--total-shards-count 16 \
--target-version 'v2'
The total shards count, 16 here, comes from the byName shard count you have in the db-controller (see the db-controller config, sharding settings). These values must match. In the future, we may provide sharding info via meta service resources rather than config files. Ring size 16 is the current standard.
You may find the request proto definition here: https://github.com/cloudwan/goten/blob/main/meta-service/proto/v1/deployment_custom.proto.
Once you start upgrading, monitor services/custom.edgelq.com/deployments/us-west2 with periodic GET or WATCH requests.
The upgrade_state field of the deployment will then be updated, and you should see data like this (other fields are omitted):
{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v1",
  "upgradeState": {
    "targetVersion": "v2",
    "pendingShards": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "state": "INITIAL_SYNCING"
  }
}
This initial syncing state may be a bit misleading, because the initial syncing already started automatically; this time, however, the db-controller is reporting its progress. For each completed shard, it will update:
{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v1",
  "upgradeState": {
    "targetVersion": "v2",
    "readyShards": [0, 2],
    "pendingShards": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "state": "INITIAL_SYNCING"
  }
}
Once all shards move to ready, then the state will change and all ready shards become pending again:
{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v1",
  "upgradeState": {
    "targetVersion": "v2",
    "pendingShards": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "state": "SWITCHING"
  }
}
When this happens, API servers will reject write requests. This ensures that the db-controller does not need to play a catch-up game with writes that may still be happening; instead, it can focus on stabilizing the database and finishing the remaining writes.
At this point, around 99.5% of the data should already be in the new database. Initial syncing completes when the db-controller has reached parity at least for a moment. Active writes may still make it unsafe to switch databases, so it is necessary to disable writes for a moment, up to 1 minute. Of course, reading, and writing to other services, continue as usual, so the disruption should be relatively minimal.
Pending shards will start moving to ready. Once all of them are moved, you should see:
{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v2"
}
This concludes the upgrade process. All backend runtimes will automatically switch to the new version.
If you believe the db-controller is stuck, check its logs: if there is a bug, it may be crashing, which requires a fix or a rollback, depending on the environment type. If everything looks fine, it may have hit a deadlock, which happened in our case in some dev environments. This upgrade mechanism is still being worked on, but restarting the db-controller normally fixes the issue, and it continues the upgrade without any problems. So far we have upgraded a couple of environments like this without breaking anything, but still be careful, as it is an experimental feature. The worst case, however, should be averted thanks to the database namespace separation; other means of db upgrades are riskier.
Let’s talk about rollback options.
First, note that there is another task the db-controller will start in the background, depending on the settings, ONCE it switches from the old database to the new one: it will start syncing from the new database to the old one, in the reverse direction than before. This may be beneficial if you need to revert after several days and want to keep the updates made in the new database. If this is not desired, and you prefer a quick rollback by just updating the pods to the old images, you can set the disabled field in the db-controller config:
dbs:
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v1"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1alpha2"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
  ## If this value is "true", then, if this API version is inactive, DbController WILL NOT try to sync updates
  ## from the new database to the old one. The old DB will fall further behind the new version with each day the new version is active.
  disabled: $(V1_ALPHA2_DB_DISABLED)
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v2"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
Note that we have particular code in db-controller (again):
func main() {
	...
	var v1Alpha2Storage *node.VersionedStorage
	dbSyncingCtrlCfg := db_syncing_ctrl.NewDefaultControllerNodeConfig()
	if serverEnvCfg.DbVersionEnabled("v1alpha2") || envRegistry.MyRegionalDeploymentInfo().GetCurrentVersion() == "v1alpha2" {
		...
		dbSyncingCtrlCfg.EnableDowngradeDbSyncing = serverEnvCfg.DbVersionEnabled("v1alpha2")
	}
}
If serverEnvCfg.DbVersionEnabled("v1alpha2") returns false, then either the db-controller will not even get access to the old database, or, if the current version is v1alpha2, dbSyncingCtrlCfg.EnableDowngradeDbSyncing will be false when the switch happens. This ensures that the db-controller will not start syncing in the reverse new -> old direction after the switch, which may make a quick rollback safer, without using a database backup.
Perhaps it is best to start this way: disable the old version when it is inactive, and make the upgrade. Then check whether everything is fine; if there is an emergency requiring a rollback, you can deploy the old pods and apply the update quickly.
If you do that, remember however to send an UpdateDeployment request to meta.goten.com to ensure the currentVersion field points back to the old version.
If everything is good, however, you may optionally enable new -> old DB syncing in the config, in case a rollback is needed after some number of days and you are confident enough that it won't corrupt the old database.
Other upgrade information:
- SearchDB, if present, is automatically synced during the upgrade, but it may lag a bit behind (a matter of seconds).
- For resources not owned by the local database (read copies resulting from multi-region setups), the db-controller will not attempt to sync old -> new. Instead, it will just send watch requests to the other regions, separately for the old and new APIs. Copies are made asynchronously and don't influence the "switch".
- Meta owner references, throughout all services, will be updated asynchronously once the service switches to the new version. Unlike hard schema references pointing to our service, meta owner references are assumed to be owned by the service they point to, and they must use the version currently used by that service/region.