Migrating your Service to a New Version

How to migrate your service to the new version.

When we talk about Service versioning, we don’t mean simple cases like adding new elements:

  • Adding a new field to an existing request/response/resource, as long as it does not break old logic.
  • Removing a field from an existing request/response/resource, as long as the backend service does not need it.

These operations can be done directly in the protobuf files.

message SomeRequest {
  string old_field_a = 1;
  
  int32 old_field_b = 2;
  
  // We can add a new field just by assigning a new proto wire number to it:
  repeated int32 new_array_field = 3;
}

This is an example of how to properly remove a field:

message SomeRequest {
  // This is the former old_field_a. Reserving the number ensures that new fields
  // won't take the '1' ID.
  reserved 1;
  
  int32 old_field_b = 2;
  
  repeated int32 new_array_field = 3;
}

In protobuf, fields are identified by their numbers, which is designed to simplify adding and removing elements. When we remove a field, we should mark its number as reserved. The backend service will then ignore the field with ID 1, even if the client app is still sending it. Marking the number as reserved also ensures that no developer introduces yet another field reusing a previously discarded proto number while old clients may still be running, which would cause clashes.
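
To see concretely that only field numbers, never names, travel on the wire, here is a small self-contained Go demonstration using the protowire package (this snippet is purely illustrative and not part of any generated service code):

package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
)

func main() {
	// Encode field number 1 as a length-delimited (bytes/string) value.
	var buf []byte
	buf = protowire.AppendTag(buf, 1, protowire.BytesType)
	buf = protowire.AppendString(buf, "old_field_a value")

	// Decode it back: only the number and wire type identify the field.
	num, typ, n := protowire.ConsumeTag(buf)
	val, _ := protowire.ConsumeString(buf[n:])
	fmt.Println(num, typ, val) // 1 2 old_field_a value
}

A renamed field that keeps number 1 decodes identically, which is why renames are wire-compatible, and why reusing a reserved number could silently misinterpret payloads from old clients.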

What else can be done in a backward-compatible manner:

  • You can add a new resource type entirely
  • You can add a new API group or action in the API-skeleton.
  • You can add a new parent to an existing resource, unless it is the first parent (previously the resource was a root). Note this does not extend to scope attributes! Adding them is a breaking change.

You can even:

  • Rename enum values in protobuf (as long as you don't change their numbers)
  • Rename fields in protobuf (though we support this for MongoDB only!)

These renames will break compiled code for anyone who updates the library version, but the API will stay compatible, including the database format if you use MongoDB.

If your concern is limited to the cases mentioned above, you can stop here and just modify your service normally, in the current API-skeleton and current proto files.

Here, we discuss API breaking changes, like:

  • Adding a first parent to an existing resource (Supported, but may require some small tricks).
  • Changing a resource name (Supported).
  • Changing a field's type, or splitting/merging fields (Supported, but can be improved).
  • Merging two resource types into one or splitting one resource type into two (NOT YET supported).
  • Adding a new scope attribute, for example, previously non-regional resources becoming regional (Supported).
  • Splitting one resource instance into many, or the other way around (possible with some hacks; we can provide tips on how to do it).
  • Upgrading the version of an imported service, which is itself considered a breaking change (Supported, BUT has hidden traps and needs improvement).

These things are hard. While we strive to provide a framework for them in Goten, there is still much to do, and some of those cases are not yet supported at all; they exist only in the plans.

Therefore, be aware that some things WILL BE changed in future Goten releases to improve the versioning experience. We will try to do this before any actual 3rd party needs serious versioning; as of this writing, versioning has been needed only internally, and there are no official 3rd party services yet (we have not even released 3rd parties to production).

We don't have an example of versioning for Inventory Manager; it has only one version. However, we can show the last simple versioning example in the core SPEKTRA Edge services: the Secrets service, where we upgraded v1alpha2 to v1: https://github.com/cloudwan/edgelq/tree/main/secrets. You may follow this document while observing how it was done in Secrets.

How does it work

When we introduce a breaking change, the service backend effectively "doubles" in size: it starts to offer a new API while the old one is still running. Essentially, it exposes two APIs. Since the version number is part of ALL URL paths, the gRPC server can support all of them at once. However, you will need to maintain two API versions: the new API you maintain and develop normally, while the old API should need bug fixes only, with no new development.

Once you upgrade your server, your service will process requests in the OLD API in the following way:

  • The client sends a request/stream call in the old API. It reaches envoy, which passes it through, or converts it to gRPC if needed, as it always did (no change here). The request/stream reaches your service.
  • The server gets the request/stream in the old API. It goes through interceptors as normal (including Authentication), since interceptors are common for all calls, regardless of method and version. This is the same as the old processing.
  • The first difference is the first middleware. New API requests proceed normally to the first middleware: multi-region routing. BUT old API requests instead hit the TRANSFORMER middleware. There, all requests are converted to the new API version (or you can provide your own handling). The transformer middleware then passes the request, in the new format, to the multi-region routing middleware of the NEW server. When the multi-region middleware of the new server returns a response, the transformer middleware of the old server converts it back to the OLD version. Streams are converted too: the old stream is wrapped with a transforming stream that converts all requests/responses on the fly. Again, the transformer middleware does all the conversions.

Note the significance of this transformer: essentially, all old API requests are treated as new API requests. When they access the database in read or write mode, they operate on new resource instances. The database upgrade is a separate thing to consider, described later in this document, in the Upgrading process section.
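
As a rough illustration, this is approximately what the generated transformer middleware does for a unary call (this is a sketch only: the transformer getters, method names, and package aliases below follow the patterns described later in this document, but the exact generated code differs and additionally opens a transformation session):

// Old-API (v1alpha2) GetSecret handler bridging to the new (v1) server chain.
func (m *transformerMiddleware) GetSecret(
	ctx context.Context,
	oldReq *v1alpha2pb.GetSecretRequest, // hypothetical package aliases
) (*v1alpha2pb.Secret, error) {
	// Convert the old request to the new API version.
	newReq, err := GetGetSecretRequestTransformer().GetSecretRequestToV1(ctx, oldReq)
	if err != nil {
		return nil, err
	}
	// Pass it to the multi-region routing middleware of the NEW server.
	newResp, err := m.newServer.GetSecret(ctx, newReq)
	if err != nil {
		return nil, err
	}
	// Convert the response back to the old version.
	return GetSecretTransformer().SecretToV1Alpha2(ctx, newResp)
}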

There are some notable remarks about the observability modules:

  • The Usage component hidden in the framework will still count usage against the old version, despite the transformer middleware. This is helpful, because we can easily check whether someone is still using the old API.
  • Audit is altered significantly. Resource change logs will be reported only using the new API (unfortunately, perhaps, for projects using the old version), but Activity Logs will contain requests/responses in the old format.

Audit is very tricky: once the format of a request/response/resource is saved in the Audit storage, it stays there. Audit does not know about service versioning and does not know how to transform between versions. It is assumed that projects/organizations may switch to new APIs on their own. As long as they use the old version, Activity Logs will use the old format, and that is the format they will see; resource change logs will require further work. Once the project/organization switches, they should be aware of both versions and therefore can read both formats.

Defining new API-skeleton and prototyping versioning

Breaking changes cannot normally be accepted; therefore, we track versions in api-skeletons, and we must always provide the currentVersion param. Suppose we have the v1 version, and now we want v2. First, we need to open the api-skeleton-v1.yaml file and provide the following params:

name: somename.edgelq.com
proto:
  # Rest of the fields are omitted...
  package:
    currentVersion: v1
    nextVersion: v2

We must at least indicate what the next version is. In the regenerate.sh file, we need to call goten-bootstrap twice:

goten-bootstrap -i "${SERVICEPATH}/proto/api-skeleton-v1.yaml" \
  -o "${SERVICEPATH}/proto" \
  -n "${SERVICEPATH}/proto/api-skeleton-v2.yaml" [... OLD imports here...]

goten-bootstrap -i "${SERVICEPATH}/proto/api-skeleton-v2.yaml" \
  -o "${SERVICEPATH}/proto" [... NEW imports here...]

# Your life will be easier if you also format them:
clang-format-12 -i "${SERVICEPATH}"/proto/v1/*.proto
clang-format-12 -i "${SERVICEPATH}"/proto/v2/*.proto

Note that when we call bootstrap for the older file, we must provide the path to the new one. The new api-skeleton must be written like a fresh file; there should be no annotations or traces of the old API-skeleton in it (other than accommodating what is possible to keep supporting the old API).

During version upgrades, we can (and it is highly recommended to) upgrade the versions of the services we import. This can be done only in the context of the upgraded API version.

This describes the minimal updates to the old api-skeleton file. However, we can achieve some level of customization of the versioning by modifying the old api-skeleton further.

We can define extra instructions for versioning. For resources, we can specify:

resources:
- name: OldResourceName
  versioning:
    # This can be omitted if we don't change the resource name, or if we want to discontinue the resource.
    replacement: NewResourceName

    # In practice, we don't know of any cases where the options below were actually needed,
    # but we can potentially opt out of some automatic versioning...

    # With this, Goten will not provide automatic versioning of the create request at all. This is more
    # likely to be needed by developers if there is some special handling there.
    skipTransformersBasicActions:
    - CreateOldResourceName

    # The old store access will by default try to support all store operations on old API resources; it
    # provides automatic conversion. You can opt out here:
    skipAccessTransformer: true

    # You can skip the automatic conversion of OldResourceNameChange objects... it will render Watch
    # methods non-working though... personally, I consider that this option may even be removed.
    skipResourceChangeTransformers: true

For actions, if we want to change their names, we can point this out to the Goten compiler using the API-skeleton again (the old API-skeleton file):

actions:
- name: OldActionName
  versioning:
    # This can be omitted if we don't change the action name, or if we want to discontinue the action entirely.
    # NewApiGroupName may be omitted if this is the same resource/API group as before.
    replacement: NewApiGroupName/NewActionName

Let's quickly review what was done for the Secrets service (v1alpha2 -> v1) upgrade. This is the v1alpha2 api-skeleton: https://github.com/cloudwan/edgelq/blob/main/secrets/proto/api-skeleton-v1alpha2.yaml.

Note that nextVersion points to v1. We did not make any customizations here; they were not needed. Then we defined the v1 api-skeleton: https://github.com/cloudwan/edgelq/blob/main/secrets/proto/api-skeleton-v1.yaml.

What we changed in a breaking way:

  • The Secret resource is now regional. Therefore, if we had a resource like projects/p0/secrets/s0, it is now projects/p0/regions/some/secrets/s0.

We need to think about how to handle this kind of change. What is some? How do we convert Get and BatchGet requests? How do we convert existing resources, or handle List requests using filter fields? We have LIST WHERE parent = projects/p0, which now needs to become LIST WHERE parent = projects/p0/regions/- or maybe LIST WHERE parent = projects/p0/regions/some. Also, if there is another service importing us, and they upgrade the version of the Secret they import, how is this handled?

We used a trick here: we know that, during the upgrade of Secrets from v1alpha2 to v1, all our environments are single-region. Therefore, we can assume that the region is some constant value, and we provide it in the transformer converting Secret references to the new format. All old clients will keep using Secrets from the existing single regions, while new clients in new regions will be using the new API only (as required). The same trick can be used by services that started single-region but have second thoughts when going multi-region.

We also added a CryptoKey resource, but that is non-breaking; this new resource type is available only in the new API anyway. In the regenerate.sh file, we added a second call to goten-bootstrap: https://github.com/cloudwan/edgelq/blob/main/secrets/regenerate.sh.

Versioning on proto annotations level

Once you have the new API-skeleton, have provided the necessary changes to the old API-skeleton, have modified the calls to goten-bootstrap, and finally have called goten-bootstrap for BOTH API-skeletons, you will have generated:

  • A full set of proto files in the proto/$NEW_VERSION directory. You will need to fill in all request/response/resource bodies as normal. This is not covered here; you will probably need to copy contents from the old files to the new ones and make modifications where necessary.
  • In the proto/$OLD_VERSION directory, you should discover a new file: <service_short_name>_versioning.proto.

You should examine it briefly. There is a file-level annotation describing the versioning of this service:

option (goten.annotations.service_versioning) = {
  // There will be an entry here for each method of each API group...
  methods : [{
    original_method : "$OLD_API_GROUP/$OLD_METHOD"
    replacement : "$NEW_API_GROUP/$NEW_METHOD"
  }]
  
  // Again, many proto objects may be listed; below is the template for a single one.
  // An object may be an instance of a request, response, resource, or anything else!
  //
  // For any object NOT mentioned here, the following default is assumed, provided that
  // new object is found somewhere in new API proto package:
  //
  // {
  //  object: $OBJECT_NAME
  //  replacement: $OBJECT_NAME
  //  transformation_direction: BIDIRECTIONAL
  // }
  objects : [
    {
      // Old and new object names are usually the same, but not always.
      object : "$OLD_OBJECT_NAME"
      replacement : "$NEW_OBJECT_NAME"
      
      // To reduce the generated transformers code, we can use FROM_NEW_TO_OLD or FROM_OLD_TO_NEW.
      // This is typically used for request/response objects: we need to convert old API
      // requests to the new API, but never the other way around, so there is no need for extra generation.
      // DISABLED should be used to explicitly disable conversion of a particular object.
      // BIDIRECTIONAL should be used by resources and all sub-types they use.
      transformation_direction : BIDIRECTIONAL // or FROM_NEW_TO_OLD, FROM_OLD_TO_NEW, DISABLED
      
      // The options below should probably be considered obsolete and not be used!
      // If this is true, then field path helper objects are not transformed...
      // If you don't understand this, you probably don't need this option.
      skip_field_path_transformers : false

      // Skip generation of transformer for Store access.
      skip_resource_access_transformer : true
    }
  ]
};

This versioning file is generated only once, based on the api-skeleton; it is assumed that the developer may modify it manually. If you make further changes to the api-skeleton and you have no manual modifications in this file, you should delete it first, so it gets regenerated.

Once you have filled in all the proto files in the new API and ensured you are happy with the versioning in general, you should further modify the regenerate.sh file: you must include the new versioning plugin in the protoc call, PLUS add the new proto files to the input!

protoc \
    -I "${PROTOINCLUDE}" \
    "--goten-go_out=:${GOGENPATH}" \
    "--goten-validate_out=${GOGENPATH}" \
    "--goten-object_out=:${GOGENPATH}" \
    "--goten-resource_out=:${GOGENPATH}" \
    "--goten-store_out=datastore=firestore:${GOGENPATH}" \
    "--goten-client_out=${GOGENPATH}" \
    "--goten-access_out=${GOGENPATH}" \
    "--goten-server_out=lang=:${GOGENPATH}" \
    "--goten-cli_out=${GOGENPATH}" \
    "--edgelq-doc_out=service=${SERVICE_SHORT_NAME}:${SERVICEPATH}/docs/apis" \
    "--ntt-iam_out=lang=:${GOGENPATH}" \
    "--ntt-audit_out=:${GOGENPATH}" \
    "--goten-versioning_out=:${GOGENPATH}" \
    "${SERVICEPATH}"/proto/v1/*.proto "${SERVICEPATH}"/proto/v2/*.proto

There are 2 additions:

  • You must have "--goten-versioning_out=:${GOGENPATH}" in the list!
  • Apart from "${SERVICEPATH}"/proto/v1/*.proto, you also MUST include the new version proto files: "${SERVICEPATH}"/proto/v2/*.proto.

When you generate the pb file for the REST API descriptors, you also need to provide both version directories now:

protoc \
    -I "${PROTOINCLUDE}" \
    "--descriptor_set_out=${SERVICEPATH}/proto/${SERVICE_SHORT_NAME_LOWER_CASE}.pb" \
    "--include_source_info" \
    "--include_imports" \
    "${SERVICEPATH}"/proto/v1/*_service.proto \
    "${SERVICEPATH}"/proto/v2/*_service.proto \
    "${DIAGNOSTICSPATH}"/proto/v1/*_service.proto

With the new pb file, to enable the REST API for both versions, you will need to modify envoy.yaml and provide the full list of API services for transcoding. Unfortunately, envoy is not able to figure this out itself. You may need to maintain multiple envoy.yaml files: one for backends with the new version, and one for backends without it.

This is all regarding the regenerate.sh file.

Let's have a quick look at the Secrets versioning we described before. Here you can see the versioning proto file: https://github.com/cloudwan/edgelq/blob/main/secrets/proto/v1alpha2/secrets_versioning.proto. Then see again the regenerate file, with the extra protoc plugin and more files provided: https://github.com/cloudwan/edgelq/blob/main/secrets/regenerate.sh.

During this upgrade, we also bumped the diagnostics mixin API, but it’s not important here.

Overview of generated code and implementation

Once you regenerate the service, the code size will "double". Several directories of your service repository will have two subdirectories, for example v1 and v2. Those directories are access, audithandlers, cli, client, fixtures, proto, resources, server, and store.

Treat the directories for the new version as already known topics; it is the task of the older version to know how to transform to the new version, not the other way around. In this regard, you should first provide an implementation in the new version directories: resources, client, server, etc. You may start by copying the handwritten Go files from the old version directories to the new ones, then make all necessary modifications. Ideally, you should have the new version fully developed first, without looking at the old one (apart from keeping in mind that you will later need to provide transformers for compatibility). Do not touch the cmd/ directory yet; it is the last part you should work on.

Versioning module and transformers

When you have the new version, you may first look at all the new files that appeared for the old version. Start with the newly created directory: versioning/v1 (if v1 is the old version). It will have several subdirectories, one per resource and API group. API groups may be a little less visible at first, because for each resource we have an implicit API group sharing the same name. But if you examine the files, you should see the following pattern:

versioning/:
  $OLD_VERSION/
    $API_GROUP_NAME/
      <api_name>_service.pb.transformer.go
    $RESOURCE_NAME/
      <resource_name>.pb.access.go
      <resource_name>.pb.transformer.go
      <resource_name>_change.pb.transformer.go

Since each resource has an implicit API group with the same name, you will often see a directory with four generated files. You can look around the versioning for Secrets, since it is simple: https://github.com/cloudwan/edgelq/tree/main/secrets/versioning/v1alpha2.

All files ending with pb.transformer.go are standard transformer files. They contain one transformer struct definition per protobuf object defined in the corresponding proto file. Therefore, <api_name>_service.pb.transformer.go files contain transformers for requests and responses, <resource_name>.pb.transformer.go files contain transformers for resources, and <resource_name>_change.pb.transformer.go files for change objects.

Let’s start with the resource transformer, for the Secret resource: https://github.com/cloudwan/edgelq/blob/main/secrets/versioning/v1alpha2/secret/secret.pb.transformer.go.

Note that we first have an interface, and then a default implementation of that interface. The main parts to look at:

var (
    registeredSecretTransformer SecretTransformer
)

func SetSecretTransformer(transformer SecretTransformer) {
    ...
}

func GetSecretTransformer() SecretTransformer {
    ...
}

type SecretTransformer interface {
	...
}

type secretTransformer struct{}

We have a global transformer (for the package), and we can get/set it via functions. There is a reason for that, which will be explained shortly.

If you look at the interface, you will see transformer functions for the Secret resource between versions v1alpha2 and v1. Additionally, you will see functions for transforming all the "helper" objects: name, reference, field path, field mask, filter, field path value, etc. All those functions are doubled for full bidirectional support. Still, each concentrates on a single object.

Before we jump to some transformation examples, let's recap one thing about Golang: it has interfaces, and you can "cast" an implementing struct into an interface, but you don't have polymorphism. Suppose you define a struct "inheriting" (embedding) another one and "overriding" one of its methods; let's call the method A. Now imagine that the parent struct has a method B, which calls A internally. With polymorphism, your implementation of A would be called, but not in Golang. Here is a minimal, self-contained illustration:
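
// A runnable demonstration that Go method calls on an embedded struct are not virtual.
package main

import "fmt"

type base struct{}

func (b base) A() string { return "base A" }

// B calls A on the base receiver; there is no dynamic dispatch to overrides.
func (b base) B() string { return "B uses " + b.A() }

type derived struct{ base }

// This "override" shadows base.A for derived values, but base.B cannot see it.
func (d derived) A() string { return "derived A" }

func main() {
	d := derived{}
	fmt.Println(d.A()) // derived A
	fmt.Println(d.B()) // B uses base A - NOT the override
}

With this limitation in mind, let's see the current function for transforming the Secret resource from v1alpha2 to v1: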

func (t *secretTransformer) SecretToV1(
    ctx context.Context,
    src *secret.Secret,
) (*v1_secret.Secret, error) {
	if src == nil {
		return nil, nil
	}
	dst := &v1_secret.Secret{}
	trName, err := GetSecretTransformer().SecretNameToV1(ctx, src.GetName())
	if err != nil {
		return nil, err
	}
	dst.Name = trName
	dst.EncData = src.GetEncData()
	dst.Data = src.GetData()
	dst.Metadata = src.GetMetadata()
	return dst, nil
}

If we embedded secretTransformer and overrode SecretNameToV1, then SecretToV1 would still call the old implementation if the code were written like:

trName, err := t.SecretNameToV1(ctx, src.GetName())

Since this is not desired, we decided to always get the globally registered transformer when calling other transformer functions, including our own. Therefore, transformers use a global registry (although they are still per-package). Perhaps there could have been another solution, but this one works fine.

When you want to override a transformer, you need to create another file and implement the transformer there, embedding the base one first. You should implement only the minimal required set of methods; your custom transformer will need to be exported. A sketch follows below.
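
For illustration, here is roughly what the region-pinning trick from the Secrets upgrade could look like as a custom transformer (this is a sketch only: it assumes the file lives in the same versioning package as the generated code, the method signature is inferred by analogy from the generated file above, and parseV1SecretReference is a hypothetical stand-in for the generated parsing helper):

// CustomSecretTransformer embeds the generated base and overrides only what is needed.
type CustomSecretTransformer struct {
	secretTransformer // generated default implementation

	regionID string // constant region injected into v1 references
}

func NewCustomSecretTransformer(regionID string) SecretTransformer {
	return &CustomSecretTransformer{regionID: regionID}
}

// SecretReferenceToV1 rewrites projects/p0/secrets/s0 into
// projects/p0/regions/<regionID>/secrets/s0, then parses it as a v1 reference.
func (t *CustomSecretTransformer) SecretReferenceToV1(
	ctx context.Context,
	src *secret.Reference,
) (*v1_secret.Reference, error) {
	if src == nil {
		return nil, nil
	}
	name := strings.Replace(src.String(), "/secrets/",
		"/regions/"+t.regionID+"/secrets/", 1)
	// Hypothetical helper standing in for the generated reference parser:
	return parseV1SecretReference(name)
}

The custom transformer is then installed via SetSecretTransformer, typically from the whole-service registration function mentioned later in this section.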

If you look at the other files across the Secrets versioning (the other transformers, not the pb.access.go file!), you should see that they implement much smaller interfaces: usually just one object back and forth. Resources have the largest number of methods, but they follow the same principles.

Overall, you should notice that there is some hierarchy in these transformation calls.

For example, SecretToV1 needs SecretNameToV1, because the name field is part of the resource. SecretNameToV1 actually needs SecretReferenceToV1. Then SecretFieldMaskToV1 needs SecretFieldPathToV1. Next, SecretFilterToV1 needs SecretFieldPathValueToV1 etc.

Filters and field masks are especially important: transformations like ListSecretsRequestToV1 rely on them! In other words, if we have some special conversion of a specific field path within the resource, and we want to support filter conversions (and field masks), then we need to override the relevant transformer functions:

  • <Name>To<Version>

    for object transformation itself. We need to convert fields that cannot be code-generated.

  • <Name>FieldPathTo<Version>

    for field mask transformations, we need to provide mapping for field paths that were not auto-generated.

  • <Name>FieldPathValueTo<Version>

    for filter COMPARE conditions, for non-auto generated field path values.

  • <Name>FieldPathArrayOfValuesTo<Version>

    for filter IN conditions (!), for non-auto generated field path values.

  • <Name>FieldPathArrayItemValueTo<Version>

    for filter CONTAINS conditions, if the field needing special treatment is an array and code generation was not available.

Filters are pretty complex; after all, they are a set of conditions, and each condition is a combination of some field path value with an operator!

For Secrets, we did not change any fields; we changed just the name field pattern, by adding a region segment. Because of this, we needed to override only the Reference and ParentReference transformers (for both directions). Name transformers call the reference ones, so we skipped them. WARNING: to be honest, it should be the other way around; the name is the basic object, and the reference sits on top of it. This is one of the versioning parts that will be subject to change, at least while versioning is used only by our team and no 3rd party services exist. The ParentReference type is also considered obsolete and will be removed entirely.

What is good, at least, is that those Reference/Name transformers are used by the resource transformers, filters, requests in all CRUD methods, etc.

Also, our transformer functions will be used by all resources having references to the Secret resource! This means that if we have a resource like:

message OtherResource {
  string secret_ref = 1 [(goten.annotations.type).reference = {
    resource: "secrets.edgelq.com/Secret"
    target_delete_behavior : BLOCK
  }];
}

and this resource belongs to a service that is upgrading its imported Secrets version, the maintainers of that service do not have to worry about the transformation at all. They only need to import our versioning package, and it will be done for them.

Still, there are some areas for improvement. Note that field changes within resources, if they are breaking changes, require plenty of work: up to five transformer functions (those field paths, field path values for filters…), and even ten, because we need bidirectional transformation. In the future, we will have a special transformation function mapping one field path to another with value transformation, for both directions, so two functions in total; they will then be used by all those transformer functions.

When it comes to transformers, the code-gen compiler will try to match fields in the following way: if they share the same type (like int32 to int32, string to string, repeated string to repeated string) and proto wire number, then we have a match. Fields are allowed to change names. Any number/type change requires a custom transformation.

Reference/name transformations, similarly, require the same underlying resource type and name pattern for automatic matching.

Using transformers, we can construct access objects, like the one in https://github.com/cloudwan/edgelq/blob/main/secrets/versioning/v1alpha2/secret/secret.pb.access.go: it takes the access interface of the new objects and wraps it to provide the old one.

Transformers provide some flexibility in transforming objects into larger or smaller ones, but they lack plenty of abilities. You cannot convert one object into two or more, or the other way around. Access to the database during transformation was possible in the past, but it was so far unnecessary and, more problematically, prone to bugs. The roadmap now predicts different mechanisms, and it is advised to provide only the transformations that are possible: they should convert one item to another, and any "sub" item should be delegated to another transformer.

Once you have all transformers for the given version, it is highly recommended to wrap their initialization in a single module. For example, for secrets we have https://github.com/cloudwan/edgelq/blob/main/secrets/versioning/v1alpha2/secrets/registration.go.

There, we import all the versioning packages. If a package registers via the Go init function, we can "dummy" import it with an underscore "_". Otherwise, we need a registration function with arguments. Any runtime that needs those transformers will need to call this whole-service registration function. Those runtimes are the server and db-controller of the versioned service AND all servers/db-controllers of importing services. For example, the service applications.edgelq.com imports secrets.edgelq.com, so its server and db-controller need to load the Secrets versioning modules.
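
A minimal sketch of such a registration module, modeled on the Secrets layout (the real file differs; NewCustomSecretTransformer is the hypothetical constructor from the earlier sketch):

// versioning/v1alpha2/secrets/registration.go (sketch)
package secrets

import (
	vsecret "github.com/cloudwan/edgelq/secrets/versioning/v1alpha2/secret"
)

// RegisterCustomSecretsTransformers installs all custom transformers of this
// service version. The region ID is needed by the Secret reference transformer.
func RegisterCustomSecretsTransformers(regionID string) {
	vsecret.SetSecretTransformer(vsecret.NewCustomSecretTransformer(regionID))
}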

Store transforming

It may be useful to have a handle to the new store and "cast" it to the old one. This way, you can interact with the new database data via the old API. Goten generates a structure that provides exactly that. For Secrets, you can see it here: https://github.com/cloudwan/edgelq/blob/main/secrets/store/v1alpha2/secrets/secrets.pb.transformer.go. It takes the interface of the new store and provides the old one, using the generated transformers from the versioning packages.

This is an extra file Goten provides in older version packages.

Normally, if you have good transformers, this does not need any extra work.
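
As a rough usage sketch (the constructor and method names here are assumptions modeled on the generated file above):

// Obtain an old-API (v1alpha2) view over the new (v1) store handle.
oldStore := v1alpha2store.NewTransformedSecretsStore(newStore)

// Reads hit the new database, but return v1alpha2 resources:
oldSecret, err := oldStore.GetSecret(ctx, oldRef)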

Server transformer middleware

The server transformer middleware may be considered the final part of the API transformation. It receives requests in the old format, transforms them into new ones, and passes them to the new middleware chain. It uses the transformer objects from the versioning packages.

Goten generates this middleware automatically in each of the old version's server packages, together with a glue combining the transformer middlewares of all API groups. For the Secrets service, you can find both in the server/v1alpha2 directory.

Once you have this glue, you may provide a constructor for the server object as simple as in this example: https://github.com/cloudwan/edgelq/blob/main/secrets/server/v1alpha2/secrets/secrets.go.

There, we pass the new service handlers object and wrap it with the transformer accepting the older API; see the function NewTransformedSecretsServer. This is a very simple example: when you have the new server object, just wrap it with the transformers.

If you wonder why we kept NewSecretsServer, which returns the old server, we will explain this when we talk about the need to run two versions. This is important: when you create a constructor for old server handlers that wraps the new server, you must leave the old constructor in place.

If you look at the generated transformers, you may see that everything is wrapped in "transformation sessions". Those are used by Audit, which needs to be notified about every converted message. If you are curious, check https://github.com/cloudwan/goten/blob/main/runtime/versioning/transformation_session.go and see the ApiCommunicationTransformationObserver interface. It allows interested parties to observe whether any version conversion took place.

If you were able to provide full versioning with transformers only, you can conclude the main work here. If, however, you need some extra IO work, need to split requests, or do anything more complicated, you may want to either:

  • Disable server transformations

    for example by disabling them in the api-skeleton! Check (read again) the description of skipTransformersBasicActions. Then you can implement your own transforming actions for the transformation middleware.

  • Amend the transformer middleware by providing an additional custom middleware in front of the generated one, or after it if you prefer.

In your transformer middleware, you may also use a store object to extract additional data from a database, but it should be done in NO-TRANSACTION, read-only mode.

If you use custom transformations, you need to wrap them with functions from the Goten module:

  • WithUnaryRequestTransformationSession
  • WithUnaryResponseTransformationSession
  • WithStreamClientMsgTransformationSession
  • WithStreamServerMsgTransformationSession

These are defined in https://github.com/cloudwan/goten/blob/main/runtime/versioning/transformation_session.go.

Note that, by leveraging custom transformer middlewares, you may even construct the "server" instance differently. Let's go back to the "server" construction with a transformer (https://github.com/cloudwan/edgelq/blob/main/secrets/server/v1alpha2/secrets/secrets.go); it does not necessarily need to be as simple as:

func NewTransformedSecretsServer(
    newServer v1server.SecretsServer,
) SecretsServer {
	return WithTransformerMiddleware(newServer)
}

Instead, you can get a store handle for the new database, the API server config, the authInfoProvider, and so on. Then, you may construct the server handlers chain in the following way:

  • Old API middleware for multi-region routing
  • Old API middleware for authorization
  • Old API middleware for transactions
  • Old API outer middleware
  • Transformer middleware to the new API, with special customizations
  • New API custom middleware (if present)
  • New API core server

Inside the transformer middleware, you are then guaranteed to be in a transaction. This may enable new cases, like splitting one Update request (in the old API) into multiple Updates (in the new API).
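
A purely illustrative assembly of such a chain (every constructor name below is hypothetical; the real generated names differ):

// Builds old-API server handlers around the new server, inside-out,
// matching the chain listed above.
func NewCustomOldAPIServer(
	oldStore oldstore.Store, // old-API view over the new DB, via the store transformer
	newServer v1server.SecretsServer,
) SecretsServer {
	var srv SecretsServer
	srv = newCustomTransformerMiddleware(newServer) // old -> new API boundary
	srv = newOuterMiddleware(srv)                   // old API outer middleware
	srv = newTransactionMiddleware(oldStore, srv)   // old API transactions
	srv = newAuthorizationMiddleware(srv)           // old API authorization
	srv = newMultiRegionRoutingMiddleware(srv)      // old API multi-region routing
	return srv
}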

In fact, in the future this may become the recommended approach in the first place, with new Goten/SPEKTRA Edge upgrades. Note that if you have changed, let's say, a resource name, the permission names from the new and old APIs may be incompatible. You can make sure your roles have permissions for both cases, but it will be more difficult once we update our IAM to let users define their own roles! It would be unreasonable to expect project admins to update their roles for the new permissions, or to upgrade them automatically, since roles are stored in the IAM database.

Note that you can easily wrap the new store handle into the old one using the store transformer (from the store package we mentioned!).

If you need to, you can take the wrapped old store handle and construct the old API server completely as it was before, using its own middlewares only, without the transformer one. Then transformations will happen only at the store level, sparing you perhaps some tricky custom methods.

There is practically no cost in constructing two "server" objects for the new and old APIs; those are rather stateless, light objects. They are not actual servers, just sets of server handlers. If you use the old authorization middleware, however, make sure the permission names are the same, or that you granted the old permissions to the new roles too! This way new roles can handle both the new and old APIs. Authorization does not necessarily care about versioning there.

Custom middlewares are powerful, with the possibility of executing extra IO work, splitting requests, or maybe even changing one request entirely into a completely different, unexpected type. However, there are limitations:

  • You still need good transformers for full resource bodies, without the context of requests/responses. The reason is that, during the upgrade process, the resource transformers are used to convert one into the other!

  • References are still tricky. You need to consider that other services (or even your own) have references to resources with tricky name transformations. When those other services upgrade their resources, they will need some instruction on how to convert the problematic references. For example, if you have resource A, which you want to split into B and C, then perhaps resource D referring to A will suddenly need two references, to B and C, in the next version.

    Also, filter conditions like WHERE ref_to_a = "$value" may need to be transformed into something like WHERE ref_to_b = "$value1" AND ref_to_c = "$value2".

Fixtures

With multiple versions, you will see multiple directories for fixtures. When building images, you should include both in your controller’s final image.

When you create fixtures, especially roles, consider whether they will work for both the new and old APIs. For example, if some action or resource changed its name, your role for the new API should have permissions for both the older and newer names.

It may be problematic if project admins define their own roles (which will be supported in the future). In this case, we may recommend using the older API Authorization in the first place, with the transformer middleware placed after it.

When you write new fixtures, you should avoid updating the old ones! Old things must stay as they are!

Ensuring server and controller can work on two versions

With new and old APIs implemented, we need to handle the following:

  • main.go files - with support for both versions
  • fixtures for new and old versions.

Once you have the "code" in place, you need to acknowledge the fact that, right now, the old API is running in your environments, with old resources and an old database. Once you push your new images, they will inherit the old database, which is incompatible with the new one. The safest process is to keep the old database as it is AND prepare a new database in its own namespace. Without revealing all the details yet: your service backend will be using two databases to achieve a smooth transition.

When your new servers/controllers start, they first need to check which version is "operating now". The first time they do that, they will see that the database is the old one, so the new servers cannot run yet. Instead, they will need to run the old server handlers, the old controller, and the old db-controller. This is why, when you make a new server, you will need to do plenty of copying and pasting.

While your backend service upgrades the database, it will keep serving the old content in the old way. Once the database upgrade finishes, your backend service will flip the primary database version. It will offer the new API in its full form, and the old API will be transformed to the new one on the fly by the servers. From then on, you don't necessarily need to maintain the old database anymore.

Now, let's look carefully at the main.go files for the Secrets service:

Server

Starting with the server, note that we use the vrunner object, which gets a function for constructing a different server depending on whether the old or new API version is currently "active". In this case, runV1Alpha2MainVersionServer is of course the old one. If you look at runV1Alpha2MainVersionServer, you should note:

  • The v1alpha2 store handle is constructed normally.
  • The v1 store handle is constructed in read-only mode.
  • We construct two multi-region policy store handlers. The old one is constructed as we normally would.
  • The v1alpha2 server uses the OLD handlers constructor, without wrapping the new server at all!
  • The v1 server is constructed like a standalone one, but uses the read-only store handle. It means that all write requests will fail, but it is already possible to "read" immediately. Read requests, however, may not return valid data yet.

It is assumed that, at the moment of the server upgrade, no client should be using the new version yet. Therefore, the API server serves the old API as normal, and the old database is written to and read from. A background database upgrade will be running, and read requests in the new API will gradually become more consistent with reality.

In this example, please ignore NewMetaMixinServer (the old schema-mixin) and NewLimitsMixinServer: during this particular upgrade of Secrets, we also upgraded the schema mixin (former meta-mixin) and the limits mixin. In the case of 3rd party services, if you use just the v1 schema and limits mixins for both of your versions, simply construct the same instances as always, but give them the old API store.

For example, if you had some custom service using the limits and schema mixins in v1, and you upgraded from v1 to v2, you should construct the following servers when running in v1 mode (a sketch follows the list):

  • Limits mixin in v1, with access to v1 store handle, and multi-region policy for v1 version.
  • Schema mixin in v1, with access to v1 store handle.
  • Your service in v1, with access to v1 store handle.
  • Your service in v2, with access to v2 read-only store handle, and multi-region policy for v2.
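
A heavily simplified sketch of that construction (all names here are hypothetical and abbreviated; the real main.go wires many more dependencies):

// Runs while the OLD (v1) version is still active.
func runV1MainVersionServer(ctx context.Context, deps *commonDeps) error {
	v1Store := deps.newStore(ctx, "v1")         // read-write, old database
	v2Store := deps.newReadOnlyStore(ctx, "v2") // read-only, new database still syncing

	limitsSrv := newLimitsMixinServer(v1Store) // mixins stay on the old store
	schemaSrv := newSchemaMixinServer(v1Store)
	v1Srv := newV1Server(v1Store) // old handlers, NOT wrapping the new server
	v2Srv := newV2Server(v2Store) // standalone; writes fail until the switch

	return serveAll(ctx, limitsSrv, schemaSrv, v1Srv, v2Srv)
}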

When we upgrade the mixins themselves, we will describe the upgrade procedure; nothing is predicted on the roadmap yet, though.

Your service will automatically detect when the switch happens. In that case, the old server is canceled, and vrunner automatically calls the constructor for the new version server. In the case of Secrets, that is runV1MainVersionServer.

In this constructor, we build a read-write v1 store handle and discard the old store entirely. Now the limits and schema mixins have to use the new store handle, and the old API server is a wrapped version of the new one. We still serve the old API, but the database has switched completely.

We will also need to prepare the API server config file to support two database versions. This is the snippet for the Secrets service:

dbs:
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v1"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1alpha2"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v2"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"

Note that we have two database entries, with different namespaces and different apiVersion values assigned! For historical reasons, we have a mismatch between the version in the namespace and the apiVersion, so don't worry about this part (the v1 DB namespace holds the v1alpha2 API, and the v2 DB namespace holds the v1 API).

Controller

As with the server, in the controller runtime we also use the vrunner object; you can see it in the Secrets controller main.go. If the old version is active, it runs just the old controller, with old fixtures. This means that once you upgrade your images, your controller should run as it always did. However, when a version switch is detected, the old controller is canceled and a new one is deployed in its place.

Note that the config object has two fixture sets: v1alpha2 and v1. If you look at the config file (https://github.com/cloudwan/edgelq/blob/main/secrets/config/controller.proto), you will also see two fixture configs accordingly. Any multi-version service should have this.

It also means that new fixtures for projects and your service will only be deployed when the actual version changes.

For your config file, ensure you provide two fixture sets for both versions.

Db-Controller

Now, the db-controller is quite different from the controller and server. You don't have any vrunner there. Instead, you should see that we call NewVersionedStorage twice, for the different databases, and we pass both to dbSyncerCtrlManager. You should be aware of the multiple tasks happening in the db-syncer controller module:

  • It handles multi-region syncing.
  • It handles search DB syncing if you use a different search backend than the primary database.
  • It handles database upgrades too!

We don't use vrunner in the db-controller because it is already used internally by db-syncer-ctrl and db-constraint-ctrl. They switch automatically on a version change, so it is not visible in the main.go file.

When the db-controller starts and detects that the old version is active, it will continue executing the regular tasks for the old service. However, in the background, it will start copying the database from the old to the new namespace!

The config file for the db-controller will need, like the server one, two entries for dbs. This is a snippet from our Secrets:

dbs:
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v1"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1alpha2"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"
  disabled: $(V1_ALPHA2_DB_DISABLED)
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v2"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"

The new element, not present on the server side, is disabled: $(V1_ALPHA2_DB_DISABLED). If you look back at the db-controller's main.go, you should see the following code:

func main() {
    ...

    var v1Alpha2Storage *node.VersionedStorage
    dbSyncingCtrlCfg := db_syncing_ctrl.NewDefaultControllerNodeConfig()
    if serverEnvCfg.DbVersionEnabled("v1alpha2") || envRegistry.MyRegionalDeploymentInfo().GetCurrentVersion() == "v1alpha2" {
		...

        dbSyncingCtrlCfg.EnableDowngradeDbSyncing = serverEnvCfg.DbVersionEnabled("v1alpha2")
    }
}

It means that:

  • If the currently detected version is v1alpha2, the old one, then the second boolean check passes, and we add the v1alpha2 storage regardless of serverEnvCfg.DbVersionEnabled("v1alpha2"). However, if that call returns false, then dbSyncingCtrlCfg.EnableDowngradeDbSyncing is false.
  • If the current version is v1 and serverEnvCfg.DbVersionEnabled("v1alpha2") returns false, then the v1alpha2 storage is not visible at all anymore.

We will discuss this when talking about the upgrading process.

Versioning transformers

If you looked carefully enough, you should have noticed the following lines in the two main.go files for Secrets (server and db-controller):

import (
    vsecrets "github.com/cloudwan/edgelq/secrets/versioning/v1alpha2/secrets"
)

func main() {
	...

    vsecrets.RegisterCustomSecretsTransformers(envRegistry.MyRegionId())
	
	...
}

This import is necessary for the correct operation of the server and db-controller: the former needs the transformers for the API transformation, the latter for the database upgrade. If you don't have any custom transformers and rely just on init functions, you will at least need to make a "dummy" import:

import (
    _ "github.com/cloudwan/edgelq/secrets/versioning/v1alpha2/secrets"
)

For non-dummy imports, the transformers are also needed by all importing services. For example, since applications.edgelq.com imports secrets.edgelq.com, we also had to load the same versioning transformers in its main.go files. Note that we call vsecrets.RegisterCustomSecretsTransformers(envRegistry.MyRegionId()) there too! This is necessary to transform references to Secret resources. When you upgrade imported services, make sure to import their transformers.

Upgrading process

By now you should know that, when you upgrade the images, your service backend will continue operating on the old API version and old database, but the db-controller will quietly be upgrading the database by copying data from one namespace to the other.

The information about which version is active comes from the meta.goten.com service: each Deployment resource has a field called currentVersion. This also means that each region controls its own version, and you need to run the upgrade process for all regions of a service (one per Deployment).

Here, therefore, we focus on a single region. First, you pick a region to upgrade, upload the images, and restart the backend services to use them. They will keep serving the old version at first, and start upgrading the database.

But they won't switch on their own; they will just sync the database, then keep syncing forever, for every write request happening in the old version. To trigger an upgrade with the version switch, you should send the BeginUpgrade request to the Meta service. For example, if you are upgrading the service custom.edgelq.com in region us-west2, you may use cuttle. Let us assume you are upgrading from v1 to v2:

cuttle meta begin-upgrade deployment \
  --name 'services/custom.edgelq.com/deployments/us-west2' \
  --total-shards-count 16 \
  --target-version 'v2'

The total shards count, value 16, comes from the number of byName shards you have in the db-controller; see the db-controller config, sharding settings. These values must be the same. In the future, we may provide the sharding info via Meta service resources rather than config files. Ring size 16 is the current standard.

You may find the request proto definition here: https://github.com/cloudwan/goten/blob/main/meta-service/proto/v1/deployment_custom.proto.

Once you start the upgrade, monitor services/custom.edgelq.com/deployments/us-west2 with periodic GET or WATCH requests. The upgradeState field of the Deployment will be updated, and you should see data like this (other fields omitted):

{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v1",
  "upgradeState": {
    "targetVersion": "v2",
    "pendingShards": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "state": "INITIAL_SYNCING"
  }
}

The INITIAL_SYNCING state may be a bit misleading, because initial syncing starts automatically anyway; this time, however, the db-controller is reporting the progress. For each completed shard, it will update:

{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v1",
  "upgradeState": {
    "targetVersion": "v2",
    "readyShards": [0, 2],
    "pendingShards": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "state": "INITIAL_SYNCING"
  }
}

Once all shards move to ready, the state changes and all ready shards become pending again:

{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v1",
  "upgradeState": {
    "targetVersion": "v2",
    "pendingShards": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "state": "SWITCHING"
  }
}

When this happens, API servers reject write requests. This ensures that the db-controller does not need to play a catch-up game with writes that may still be happening; instead, it can focus on stabilizing the database and finishing the remaining writes.

At this point, something like 99.5% of the data should already be in the new database. Initial syncing completes when the db-controller has reached parity at least for a moment. Active writes may make it unsafe to switch databases, though, so it is necessary to disable writes for a moment, up to one minute. Of course, reading from and writing to other services continues as usual, so the disruption should be relatively minimal.

Pending shards will start moving to ready. Once all of them are moved, you should see:

{
  "name": "services/custom.edgelq.com/deployments/us-west2",
  "currentVersion": "v2"
}

This concludes the upgrade process. All backend runtimes will automatically switch to the new version.

If you believe the db-controller is stuck, check the logs: if there is any bug, it may be crashing, which requires a fix or a rollback, depending on the environment type. If everything looks fine, it may have deadlocked, which happened in our case in some dev environments. This upgrade mechanism is still a work in progress, but restarting the db-controller normally fixes the issue, and it continues the upgrade without any problems. So far we have upgraded a couple of environments like this without breaking anything, but still be careful, as it is an experimental feature. The worst case, however, should be averted thanks to the database namespace separation; other means of db upgrades are riskier.

Let’s talk about rollback options.

First, note that there is another task the db-controller will start in the background, depending on the settings, ONCE it switches from the old database to the new one: it will start syncing from the new database to the old, in the reverse direction than before. This may be beneficial if you need to revert after several days and want to keep the updates made in the new database. If this is not desired, and you prefer a quick rollback by just updating pods to the old images, you can modify the db-controller config field disabled:

dbs:
- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v1"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1alpha2"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"

  ## If this value is "true", then, while this API version is inactive, the db-controller WILL NOT try to sync updates
  ## from the new database to the old one. The old DB will fall further behind the new version with each day the new version is active.
  disabled: $(V1_ALPHA2_DB_DISABLED)

- namespace: "envs/$(LQDENV)-$(EDGELQ_REGION)/secrets/v2"
  backend: "$(DB_BACKEND)"
  apiVersion: "v1"
  connectionPoolSize: $(SECRETS_DB_CONN_POOL_SIZE)
  mongo:
    endpoint: "mongodb+srv://$(MONGO_DOMAIN)/?authSource=%24external&authMechanism=MONGODB-X509&retryWrites=true&w=majority&tlsCertificateKeyFile=/etc/lqd/mongo/mongodb.pem"
  firestore:
    projectId: "$(FIRESTORE_GCP_PROJECT_ID)"
    credentialsFilePath: "/etc/lqd/gcloud/db-google-credentials.json"

Note that we have this particular code in the db-controller (again):

func main() {
    ...

    var v1Alpha2Storage *node.VersionedStorage
    dbSyncingCtrlCfg := db_syncing_ctrl.NewDefaultControllerNodeConfig()
    if serverEnvCfg.DbVersionEnabled("v1alpha2") || envRegistry.MyRegionalDeploymentInfo().GetCurrentVersion() == "v1alpha2" {
		...

        dbSyncingCtrlCfg.EnableDowngradeDbSyncing = serverEnvCfg.DbVersionEnabled("v1alpha2")
    }
}

If serverEnvCfg.DbVersionEnabled("v1alpha2") returns false, then either the db-controller will not even get access to the old database, or, if the current version is still v1alpha2, dbSyncingCtrlCfg.EnableDowngradeDbSyncing will be false when the switch happens. This ensures that the db-controller will not start syncing in the reverse, new -> old direction after the switch. This may make a quick rollback safer, without using a database backup.

Perhaps it is best to start this way: disable the old version when inactive, and make the upgrade. Then check if everything is fine; if there is an emergency requiring a rollback, you can deploy the old pods and apply updates quickly. If you do that, remember, however, to send an UpdateDeployment request to meta.goten.com to ensure the currentVersion field points to the old version. If everything is good, you may optionally enable new -> old DB syncing in the config, in case a rollback is needed after some number of days and you are confident enough that it won't corrupt the old database.

Other upgrade information:

  • The search DB, if present, is automatically synced during the upgrade, but it may lag a bit behind (a matter of seconds).
  • For resources not owned by the local database (meaning read copies resulting from the multi-region setup), the db-controller will not attempt to sync old -> new. Instead, it will just send watch requests to the other regions, separately for the old and new APIs. Copies are made asynchronously and do not influence the "switch".
  • Meta owner references, throughout all services, are updated asynchronously once the service switches to the new version. Unlike hard schema references pointing to our service, meta owner references are assumed to be owned by the service they point to, and they must use the version currently used by that service/region.