Goten Organization

Understanding the Goten directory structure and libraries.

In the SPEKTRA Edge repository, we have directories for each service:

edgelq/
 applications/
 audit/
 devices/
 iam/
 limits/
 logging/
 meta/
 monitoring/
 proxies/
 secrets/
 ztp/

All these service names end with the .edgelq.com suffix, except meta: its full name is meta.goten.com. The reason is that the core of this service is not in the SPEKTRA Edge repository; it lives in the Goten repository:

goten/
 meta-service/

This is where meta’s api-skeleton, Protocol Buffers files, almost all code-generated modules, and the server implementation live. We talk about the meta service first because it illustrates the difference between SPEKTRA Edge and Goten.

Goten is called a framework for SPEKTRA Edge, but this framework has two main tool sets:

  1. Compiler

    It takes the schema of your service, and generates all the boilerplate code.

  2. Runtime

    Runtime libraries, which are referenced by generated code, are used heavily throughout all services based on Goten.

Goten provides its own schema language on top of Protocol Buffers: it introduces the concepts of service packages (with versions), API groups, actions, and resources. Unlike raw Protocol Buffers, we get a full-blown schema with references that can point across regions and services, and those services can also vary in version. Resources can reference each other, and services can import each other.

Goten balances code generation against runtime libraries operating on “resources” or “methods”. It is the usual tradeoff between performance, type safety, code size, maintainability, and readability.

If you look at the meta-service, you will see that it has four resource types:

  1. Region
  2. Service
  3. Resource
  4. Deployment

This is pretty much exactly what Goten provides. To be a robust framework and deliver on its promises of multi-region, multi-service, and multi-version support, Goten needs the concept of a service that contains information about regions, services, and so on.

SPEKTRA Edge provides various services on a higher level, but Goten provides the baseline for them and allows relationships between them. The only reason the “meta” directory also exists in SPEKTRA Edge is that the meta service needs extra SPEKTRA Edge integration, like the authorization layer. In the SPEKTRA Edge repo, we have additional components added to meta, and finally the meta main.go files. If you look at the files the meta service has in the SPEKTRA Edge repo (for the v1 version, not v1alpha2), you will see that the edgelq version wraps what Goten provides. We also have a “v1alpha2” service (a full one), from the times before we moved meta to Goten. Back then, SPEKTRA Edge was overriding half of the functionality provided by Goten, and it was heading in a terrible direction from there.

Goten Directory Structure

As a framework, goten provides:

  1. Modules related to service schema and prototyping (API skeleton and proto files).
  2. Compilers that generate code based on the schema.
  3. Runtime libraries linked during compilation.

For schema & prototyping, we have directories:

  • schemas

    This directory contains the generated JSON schema for api-skeleton files. It is generated from the annotations/bootstrap.proto file.

  • annotations

    Protobuf is already a kind of language for building APIs, but Goten provides a higher-level one. This directory contains various proto options (extra decorations) that enhance the standard protobuf language. There is one exceptional file though: bootstrap.proto, which DOES NOT define any options; instead, it describes the api-skeleton schema in protobuf. The file in the schemas directory is just a compilation of this file. The annotations directory also contains generated Golang code describing those proto options; you can normally ignore it.

  • types

    Contains a set of reusable protobuf messages used by services built with Goten; for example, the “Meta” object (file types/meta.proto) is used in almost every resource type. The difference between annotations and types is that annotations describe options we can attach to files/proto messages/fields/enums etc., while types contains just reusable objects/enums. Apart from that, each proto file has its compiled Golang objects in the relevant directory.

  • contrib/protobuf/google

    This directory, as far as I understand, allows us to avoid downloading the full protobuf dependencies from Google; it contains just the bits we decided to take. The api subdirectory maps to our annotations, and type to types. There is a weird exception, because distribution.proto matches the type directory more than api in this manner, but let it be. Perhaps it can be deleted entirely, as I am not sure where we use it, if at all. However, I told you one small lie: SPEKTRA Edge contributors DO HAVE to download some protocol buffers (mentioned in the scripts directory). The problem is that this downloaded library is more lightweight and does not contain the types we put in contrib/protobuf/google.

All the above directories can be considered a form of Goten-protobuf language that you should know from the developer guide.

For compilers (code-generators), we have directories:

  • compiler

    Each subdirectory (well, almost each) contains a specific compiler that generates one part of the files Goten generates as a whole. For example, compiler/server generates server middleware.

  • cmd

    Goten does not ship any runnable binary of its own. This directory provides the main.go files for all compilers (code generators) Goten has.

Compilers generate code you should already know from the developer guide as well.

Runtime libraries essentially have a single directory:

  • runtime

    Contains various modules for clients, servers, controllers… Each will be talked about separately in various topic-oriented documents.

  • Compiled types

    The types/meta/ and types/multi_region_policy/ directories may be considered part of the runtime; they map to the types objects. In other words, while a resource proto schema imports goten/types/meta.proto, the generated code refers to the Go package goten/types/meta/, as sketched below.
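As a small illustration (the Go import path and the getter below are assumptions based on the layout described here, not copied from generated code), a helper operating on the shared Meta object imports the compiled Go package rather than the proto file:

import (
  // The resource proto imports "goten/types/meta.proto"; generated Go code
  // imports the compiled package instead (path assumed for illustration).
  metapb "github.com/cloudwan/goten/types/meta"
)

// HasOwners shows the compiled Meta object in use; the owner_references field
// is listed later in this document, its getter name is assumed here.
func HasOwners(m *metapb.Meta) bool {
  return len(m.GetOwnerReferences()) > 0
}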

In the developer guide, we briefly mentioned some base runtime types but treated them as black boxes; in this document set, we will dive in.

Other directories in Goten:

  • example

    Contains some typical services developed on Goten, but without SPEKTRA Edge. Currently, their only purpose is to run some integration tests.

  • prototests

    Contains just some basic tests of the base types extended by Goten; it does not go as deep as the tests in the example directory.

  • meta-service

    It contains the full meta service without SPEKTRA Edge components and main files. It is supposed to be wrapped by Goten users, SPEKTRA Edge in our case.

  • scripts

    Contains one-off scripts for installing development tools, reusable scripts used by other scripts, and the regeneration script (regenerate.sh) that regenerates files in the current goten directory.

  • src

    This directory name is the most confusing here. It does not contain anything for the framework. It contains Java code generated from the annotations and types directories in Goten, built with the local pom.xml file. This Java module is just an importable dependency, so Java code can use the protobuf types defined by Goten. We have some Java code in the SPEKTRA Edge repository, so for this purpose Goten ships a small Java package.

  • tools

    Just some dummy imports to ensure they are present in go.mod/go.sum files in goten.

  • webui

    A generic UI for Goten services; note that it lies abandoned, as our front-end teams no longer develop generic UIs and focus on specialized ones only.

Regarding files other than the obvious ones:

  • pom.xml

    This is for building a Java package containing Goten protobuf types.

  • sdk-config.yaml

    This is used to generate the goten-sdk repository (the public one), since goten itself is private. Nobody wants to keep manually copying public files from goten to goten-sdk, so this does it for us.

  • tools.go

    It just ensures we have deps in go.mod. Unsure why it is separated from the tools directory.

SPEKTRA Edge Directory Structure

The SPEKTRA Edge repository is the home of all core SPEKTRA Edge services and of the adaptation of meta.goten.com, meaning that its subdirectories should be familiar, and you should navigate their code well enough, since they are “typical” Goten-built services.

We also have a common directory; its more important elements are:

  • api and rpc

    Those directories contain extra reusable protobuf types. You will most likely interact with api.ServiceAccount (not to be confused with the iam.edgelq.com/ServiceAccount resource)!

  • cli_configv1, cli_configv2

    The second directory is used by the cuttle CLI utility and will be needed by all cuttle-style CLIs built by 3rd parties.

  • clientenv

    It contains an obsolete config for the client environment, but its gRPC dialers and auth clients (for user authentication) are still in use. It needs some cleanup.

  • consts

    It has a set of various common constants in SPEKTRA Edge.

  • doc

    It wraps protoc-gen-goten-doc with additional functionality, to display needed permissions for actions.

  • fixtures_controller

    It is the full fixtures controller module.

  • serverenv

    It contains a common set of utilities for backend runtimes provided by SPEKTRA Edge (typically servers, but some elements are used by controllers too).

  • widecolumn

    It contains a storage alternative to the Goten store for some advanced cases; it will get a separate design document.

Other directories:

  • healthcheck

    It contains a simple image that polls health checks of core SPEKTRA Edge services.

  • mixins

    It contains a set of mixins; they will be discussed in separate topics.

  • protoc-gen-npm-apis

    It is a TypeScript compiler for the frontend team, maintained by the backend. You should read more about compilers here.

  • npm

    It is where code generated by protoc-gen-npm-apis goes.

  • scripts

    A set of common scripts; developers must primarily learn to use regenerate-all.sh whenever they change any api-skeleton or proto file.

  • src

    It contains some “soon-to-be legacy” Java-generated code for the Monitoring Pipeline, which will get separate documents.

1 - Goten Server Library

Understanding the Goten server library.

The server should more or less be already known from the developer guide. We will provide some missing bits here only.

When we talk about servers, we can distinguish:

  • gRPC Server instance that is listening on a TCP port.
  • Server handler sets that implement some Service GRPC interface.

To underline what I mean, look at the following code snippet from IAM:

grpcServer := grpcserver.NewGrpcServer(
  authenticator.AuthFunc(),
  commonCfg.GetGrpcServer(),
  log,
)

v1LimMixinServer := v1limmixinserver.NewLimitsMixinServer(
  commonCfg,
  limMixinStore,
  authInfoProvider,
  envRegistry,
  policyStore,
)
v1alpha2LimMixinServer := v1alpha2limmixinserver.NewTransformedLimitsMixinServer(
  v1LimMixinServer,
)
schemaServer := v1schemaserver.NewSchemaMixinServer(
  commonCfg,
  schemaStore,
  v1Store,
  policyStore,
  authInfoProvider,
  v1client.GetIAMDescriptor(),
)
v1alpha2MetaMixinServer := metamixinserver.NewMetaMixinTransformerServer(
  schemaServer,
  envRegistry,
)
v1Server := v1server.NewIAMServer(
  ctx,
  cfg,
  v1Store,
  authenticator,
  authInfoProvider,
  envRegistry,
  policyStore,
)
v1alpha2Server := v1alpha2server.NewTransformedIAMServer(
  cfg,
  v1Server,
  v1Store,
  authInfoProvider,
)

v1alpha2server.RegisterServer(
  grpcServer.GetHandle(),
  v1alpha2Server,
)
v1server.RegisterServer(grpcServer.GetHandle(), v1Server)

metamixinserver.RegisterServer(
  grpcServer.GetHandle(),
  v1alpha2MetaMixinServer,
)
v1alpha2limmixinserver.RegisterServer(
  grpcServer.GetHandle(),
  v1alpha2LimMixinServer,
)
v1limmixinserver.RegisterServer(
  grpcServer.GetHandle(),
  v1LimMixinServer,
)
v1schemaserver.RegisterServer(
  grpcServer.GetHandle(),
  schemaServer,
)
v1alpha2diagserver.RegisterServer(
  grpcServer.GetHandle(),
  v1alpha2diagserver.NewDiagnosticsMixinServer(),
)
v1diagserver.RegisterServer(
  grpcServer.GetHandle(),
  v1diagserver.NewDiagnosticsMixinServer(),
)

There, the instance called grpcServer is an actual gRPC Server instance listening on a TCP port. If you dive into this implementation, you will notice we are constructing an EdgelqGrpcServer structure. It may actually consist of two port-listening instances:

  • googleGrpcServer *grpc.Server, which is initialized with a set of unary and stream interceptors, optional TLS.
  • websocketHTTPServer *http.Server, which is initialized only if the websocket port was set. It delegates handling to improbableGrpcwebServer, which uses googleGrpcServer.

This Google server is the primary one and handles regular gRPC calls. The reason for the additional HTTP server is that we need to support web browsers, which cannot support native gRPC protocol. Instead:

  • grpcweb is needed to handle unary and server-streaming calls.
  • websockets are needed for bidirectional streaming calls.

Additionally, we have REST API support…

We have an envoy proxy sidecar, a separate container running next to the server instance. It handles all REST API traffic, converting it to native gRPC. It converts grpcweb into native gRPC too, but has issues with websockets. For this reason, we added a Golang HTTP server with an improbable gRPC-web instance. This improbable gRPC-web instance can handle both grpcweb and websockets, but we use it for websockets only, since that is what the envoy proxy is missing.

In theory, the improbable web server would be able to handle ALL protocols, but there is a drawback: for native gRPC calls it is less performant than the native gRPC server (and its ServeHTTP path is less maintained). It is recommended to keep them separate, so we stick with two ports. We may have some opportunity to remove the envoy proxy, though.
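For orientation, below is a minimal sketch of this two-listener setup using the improbable-eng grpc-web wrapper. The ports, TLS, and interceptor wiring are placeholders, not the actual EdgelqGrpcServer code.

package main

import (
  "net"
  "net/http"

  "github.com/improbable-eng/grpc-web/go/grpcweb"
  "google.golang.org/grpc"
)

func main() {
  // Native gRPC server: regular gRPC plus envoy-translated REST and grpcweb.
  grpcSrv := grpc.NewServer( /* common unary + stream interceptors, optional TLS */ )

  // The same server wrapped for websocket-based (bidirectional) browser streams.
  wrapped := grpcweb.WrapServer(grpcSrv, grpcweb.WithWebsockets(true))
  wsSrv := &http.Server{
    Addr:    ":8443", // websocket port; in EdgelqGrpcServer it is optional
    Handler: wrapped,
  }
  go func() { _ = wsSrv.ListenAndServe() }()

  lis, err := net.Listen("tcp", ":6443")
  if err != nil {
    panic(err)
  }
  _ = grpcSrv.Serve(lis)
}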

Returning to the googleGrpcServer instance, we have all stream/unary interceptors that are common for all calls, but this does not implement the actual interface we expect from gRPC servers. Each service version provides a complete interface to implement. For example, see the IAMServer interface in this file: https://github.com/cloudwan/edgelq/blob/main/iam/server/v1/iam/iam.pb.grpc.go.

Those server interfaces are in files ending with pb.grpc.go.

To have a full server, we need to combine the gRPC Server instance for SPEKTRA Edge (EdgelqGrpcServer) with, let’s make up a name for it, a business-logic server instance (a set of handlers). In the iam.pb.grpc.go file, this business-logic instance is iamServer. Going back to the main.go snippet provided above, we are registering eight business-logic servers (handler sets) on the provided *grpc.Server instance. As long as the paths are unique across all of them, it is fine to register as many as we want. Typically, we must include the primary service in all its versions, then all mixins in all their versions.

Those business logic servers provide code-generated middleware, typically executed in this order:

  • Multi-region routing middleware (may redirect processing somewhere else, or split across many regions).
  • Authorization middleware (may use a local cache, or send a request to IAM to obtain fresh role bindings).
  • Transaction middleware (configures access to the database for snapshot transactions and establishes a new session).
  • Outer middleware, which provides validation, and common outer operations for certain CRUD requests. For example, for update calls, it will ensure the resource exists and apply an update mask to achieve the final resource to save.
  • Optional custom middleware and server code - which are responsible for final execution.

Transaction middleware may also repeat the execution of all inner middleware and the core server, if the transaction needs to be repeated. A conceptual sketch of this composition follows.
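The sketch below is only conceptual; the real middlewares are code-generated with per-method signatures, and the types and helpers here are simplified stand-ins.

package main

import (
  "context"
  "errors"
)

// handler stands in for one generated method of a business-logic server.
type handler func(ctx context.Context) error

// Placeholder layers; the real ones are code-generated per method.
func withRouting(next handler) handler { return next }
func withAuthz(next handler) handler   { return next }
func withOuter(next handler) handler   { return next }

var errTxAborted = errors.New("snapshot transaction aborted")

// withTransaction sketches the transaction middleware: when a snapshot
// transaction aborts, it re-runs every layer below it (outer middleware,
// custom middleware, core server), so all inner logic must be repeatable.
func withTransaction(next handler) handler {
  return func(ctx context.Context) error {
    for attempt := 0; attempt < 3; attempt++ {
      if err := next(ctx); !errors.Is(err, errTxAborted) {
        return err
      }
    }
    return errTxAborted
  }
}

func main() {
  coreServer := func(ctx context.Context) error { return nil }
  // Composition follows the order listed above:
  // routing -> authorization -> transaction -> outer -> custom/core.
  full := withRouting(withAuthz(withTransaction(withOuter(coreServer))))
  _ = full(context.Background())
}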

There are also “initial handlers” in the generated pb.grpc.go files. For example, see this file: https://github.com/cloudwan/edgelq/blob/main/iam/server/v1/group/group_service.pb.grpc.go. There, _GroupService_GetGroup_Handler is an example for unary calls, and _GroupService_WatchGroup_Handler is an example for streaming calls.

It is worth mentioning how interceptors play with middleware and these “initial handlers”. Let’s copy and paste interceptors from the current edgelq/common/serverenv/grpc/server.go file:

grpc.StreamInterceptor(grpc_middleware.ChainStreamServer(
    grpc_ctxtags.StreamServerInterceptor(),
    grpc_logrus.StreamServerInterceptor(
      log,
      grpc_logrus.WithLevels(codeToLevel),
    ),
    grpc_recovery.StreamServerInterceptor(
      grpc_recovery.WithRecoveryHandlerContext(recoveryHandler),
    ),
    RespHeadersStreamServerInterceptor(),
    grpc_auth.StreamServerInterceptor(authFunc),
    PayloadStreamServerInterceptor(log, PayloadLoggingDecider),
    grpc_validator.StreamServerInterceptor(),
)),
grpc.UnaryInterceptor(grpc_middleware.ChainUnaryServer(
    grpc_ctxtags.UnaryServerInterceptor(),
    grpc_logrus.UnaryServerInterceptor(
      log,
      grpc_logrus.WithLevels(codeToLevel),
    ),
    grpc_recovery.UnaryServerInterceptor(
      grpc_recovery.WithRecoveryHandlerContext(recoveryHandler),
    ),
    RespHeadersUnaryServerInterceptor(),
    grpc_auth.UnaryServerInterceptor(authFunc),
    PayloadUnaryServerInterceptor(log, PayloadLoggingDecider),
    grpc_validator.UnaryServerInterceptor(),
)),

Unary requests are executed in the following way:

  • The function _GroupService_GetGroup_Handler is called first! It calls the first interceptor, but before that, it creates a handler that wraps the first middleware and passes it to the interceptor chain.
  • The first interceptor is: grpc_ctxtags.UnaryServerInterceptor(). It calls the handler passed, which is the next interceptor.
  • The next interceptor is grpc_logrus.UnaryServerInterceptor and so on. At some point, we are calling the interceptor executing authentication.
  • The last interceptor (grpc_validator.UnaryServerInterceptor()) finally calls the handler created by _GroupService_GetGroup_Handler.
  • The first middleware is called. The call is executed through the middleware chain and may reach the core server, but it may also return earlier.
  • Interceptors unwrap in reverse order.

You can see how this works if you look at the ChainUnaryServer implementation; a conceptual sketch is below.
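This is a simplified illustration of what such chaining effectively builds, not the go-grpc-middleware source.

import (
  "context"

  "google.golang.org/grpc"
)

// chainUnary nests interceptors so that interceptors[0] runs first and the
// final handler (the one created by _GroupService_GetGroup_Handler, wrapping
// the first middleware) runs last.
func chainUnary(
  interceptors []grpc.UnaryServerInterceptor,
  info *grpc.UnaryServerInfo,
  final grpc.UnaryHandler,
) grpc.UnaryHandler {
  h := final
  for i := len(interceptors) - 1; i >= 0; i-- {
    ic, next := interceptors[i], h
    h = func(ctx context.Context, req interface{}) (interface{}, error) {
      return ic(ctx, req, info, next)
    }
  }
  return h // calling this enters interceptors[0] first
}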

Streaming calls are a bit different because we start from the interceptors themselves:

  • The gRPC Server instance takes the function _GroupService_WatchGroup_Handler and casts it into the grpc.StreamHandler type.
  • The grpc.StreamHandler object, which is the handler for our method, is passed to the interceptor chain. During the chaining process, grpc.StreamHandler is wrapped with all streaming interceptors, starting from the last. Therefore, the innermost StreamHandler is _GroupService_WatchGroup_Handler.
  • grpc_ctxtags.StreamServerInterceptor() is the entry point! It then invokes the next interceptors, and we go further and further, until we reach _GroupService_WatchGroup_Handler, which is called by the last stream interceptor, grpc_validator.StreamServerInterceptor().
  • Middlewares are executed in the same way as always.

See the ChainStreamServer implementation if you don’t believe it.

In total, this should give an idea of how the server works and what the layers are.

2 - Goten Controller Library

Understanding the Goten controller library.

You should know about controller design from the developer guide. Here we give a small recap of the controller with tips about code paths.

The controller framework is part of the wider Goten framework. It has annotation and compiler parts; you can read more about them in the Goten compiler documentation. For now, in this place, we will talk just about the generated controllers and the controller runtime.

There are some runtime elements for all controller components (NodeManager, Node, Processor, Syncer…) in the runtime/controller directory in the Goten repo: https://github.com/cloudwan/goten/tree/main/runtime/controller.

In config.proto, we have the node registry access config and node manager configs, which you should already know from the controller/db-controller config proto files.

Node managers are a bit more interesting. As said in the Developer Guide, we scale horizontally by adding more nodes. To increase the chance of a fairer workload distribution, we often have more than one Node instance per type in a single pod. We organize them with Node Managers; see the file runtime/controller/node_management/manager.go.

Each Node must implement:

type Node interface {
  Run(ctx context.Context) error
  UpdateShardRange(ctx context.Context, newRange ShardRange)
}

The Node Manager component creates at startup as many Nodes as specified in the config. Next, it runs all of them, but they do not yet get any share of shards, so they are idle. Managers register all nodes in the registry, where the node IDs from all pods are collected. The registry is responsible for returning the shard range assigned to each node. Whenever a pod dies or a new one is deployed, the node registry notifies the manager about the new shard ranges per Node. The manager then notifies the relevant Node via the UpdateShardRange call, roughly as sketched below.
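For illustration, here is a minimal Node satisfying the interface above; the ShardRange shape and the channel-based handoff are assumptions, not the actual Goten implementation.

import "context"

type ShardRange struct{ FromShard, ToShard int64 } // placeholder shape

type exampleNode struct {
  ranges chan ShardRange
}

func (n *exampleNode) Run(ctx context.Context) error {
  for {
    select {
    case <-ctx.Done():
      return ctx.Err()
    case newRange := <-n.ranges:
      // Drop work for shards we no longer own and start watching newRange.
      _ = newRange
    }
  }
}

func (n *exampleNode) UpdateShardRange(ctx context.Context, newRange ShardRange) {
  select {
  case n.ranges <- newRange:
  case <-ctx.Done():
  }
}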

The Redis-based registry uses periodic polling, so in theory there is a chance of two controllers executing the same work for a couple of seconds. It could probably be improved, but we design controllers around the observed/desired state, and duplicating the same request may produce some temporary warning errors that should be harmless. Still, it is a field for improvement.

See the NodeRegistry component (in file registry.go, we use Redis).

Apart from the node managers directory in runtime/controller, you can see the processor package. Its more notable elements are:

  • The Runner module, which is the processor runner goroutine. It is the component that executes all events in a thread-safe manner, but developers must not do any IO in it.
  • The Syncer module, which is generic and based on interfaces, although we generate type-safe wrappers in all controllers. It is quite large: it consists of the Desired/Observed state objects (file syncer_states.go), an updater that operates on its own goroutine (file syncer_updater.go), and finally the central Syncer object, defined in syncer.go. It compares the desired vs. observed state and pushes updates to the syncer updater.
  • In synchronizable we have structures responsible for propagating sync/lostSync events across Processor modules, so ideally developers don’t need to handle them themselves.

The Syncer is fairly complex; it needs to handle failures/recoveries, resets, and bursts of updates. Note that it does not use Go channels because:

  • They have limited (fixed) capacity, which is not nice considering we do IO work there.
  • Maps are better when there are multiple updates to a single resource, because they allow merging multiple events (overwriting previous ones); channels would force consuming at least all items from the queue. A small sketch of this idea follows.
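The snippet below illustrates the map-based idea (the names and the DesiredState type are illustrative, not the actual Syncer internals): multiple events for the same resource collapse into a single pending entry.

import "sync"

type DesiredState struct{ /* computed desired resource */ }

type pendingUpdates struct {
  mu    sync.Mutex
  items map[string]*DesiredState // keyed by resource name
}

func (p *pendingUpdates) push(name string, state *DesiredState) {
  p.mu.Lock()
  defer p.mu.Unlock()
  if p.items == nil {
    p.items = map[string]*DesiredState{}
  }
  p.items[name] = state // overwrites a not-yet-flushed event for the same name
}

// drain hands all pending updates to the updater goroutine in one batch.
func (p *pendingUpdates) drain() map[string]*DesiredState {
  p.mu.Lock()
  defer p.mu.Unlock()
  out := p.items
  p.items = map[string]*DesiredState{}
  return out
}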

3 - Goten Data Store Library

Understanding the Goten data store library.

The developer guide gives some examples of simple interaction with the Store interface, but hides all implementation details, which we will cover here now, at least partially.

The store should provide:

  • Read and write access to resources according to the resource Access interface, plus transactions, which guarantee that resources read from the database (or queried collections) will not change before the transaction is committed. This is provided by the core store module, described in this doc.
  • Transparent cache layer, reducing pressure on the database, managed by “cache” middleware, described in this doc.
  • Transparent constraint layer handling references to other resources, and handling blocking references. This is a more complex topic, and we will discuss this in different documents (multi-region, multi-service, multi-version design).
  • Automatic resource sharding by various criteria, managed by store plugins, covered in this doc.
  • Automatic resource metadata updates (generation, update time…), managed by store plugins, covered in this doc.
  • Observability is provided automatically (we will come back to it in the Observability document).

The above list should at least give an idea that interface calls may often be complex and require interactions with various components using IO operations! In general, a call to the Store interface may involve:

  • Calling underlying database (mongo, firestore…), for transactions (write set), non-cached reads…
  • Calling cache layer (redis), for reads or invalidation purposes.
  • Calling other services or regions in case of references to resources in other services and regions. This will not be covered by this document, but by the multi-region, multi-service, multi-version design documents.

Store implementation resides in Goten, here: https://github.com/cloudwan/goten/tree/main/runtime/store.

The primary file is store.go, with the following interfaces:

  • Store is the public store interface for developers.
  • Backend and TxSession are to be implemented by specific backend implementations like Firestore and Mongo. They are not exposed to end-service developers.
  • SearchBackend is like Backend, but just for search, which is often provided separately. Example: Algolia, but in the future we may introduce Mongo combining both the search and regular backend implementations.

The store is actually a “middleware” chain, like a server. In the file store.go, we have the store struct type, which wraps the backend and provides the first, core implementation of the Store interface. This wrapper:

  • Adds tracing spans for all operations.
  • For transactions, stores an observability tracker in the current ctx object.
  • Invokes all relevant store plugin functions, so custom code can be injected apart from “middlewares”.
  • Accumulates resources to save/delete and does not trigger updates immediately; they are executed at the end of the transaction.

You can consider it equivalent to a server core module (in the middleware chain).

To study the store, you should at least check the implementation of WithStoreHandleOpts.

  • You can see that plugins are notified about new and finished transactions.
  • Function runCore is a RETRY-ABLE function that may be invoked again for the aborted transaction. However, this can happen only for SNAPSHOT transactions. This also implies that all logic within a transaction must be repeatable.
  • runCore executes a function passed to the transaction. In terms of server middleware chains, it means we are executing outer + custom middleware (if present) and/or core server.
  • Store plugins are notified when a transaction is attempted (perhaps again), and get a chance to inject logic just before committing. They also have a chance to cancel the entire operation.
  • You should also note that the Store Save/Delete implementations do not apply any changes to the backend directly. Instead, creations, updates, and deletions are accumulated and passed in a batch commit inside WithStoreHandleOpts, roughly as in the sketch below.
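The following is a rough sketch of this flow; the function and helper names are simplified stand-ins, not the real goten/runtime/store API.

import (
  "context"
  "errors"
)

var errAborted = errors.New("snapshot tx aborted")

// Simplified stand-ins for plugin notifications and the batch commit.
func notifyPluginsTxStarted(ctx context.Context)         {}
func notifyPluginsTxFinished(ctx context.Context)        {}
func commitAccumulatedChanges(ctx context.Context) error { return nil }
func isSnapshotAborted(err error) bool                   { return errors.Is(err, errAborted) }

func withStoreHandleOpts(ctx context.Context, fn func(ctx context.Context) error) error {
  notifyPluginsTxStarted(ctx)        // plugins see the new transaction
  defer notifyPluginsTxFinished(ctx) // ...and its completion
  for {
    err := fn(ctx) // outer + custom middleware and/or core server (runCore)
    if err == nil {
      // Saves/deletes accumulated during fn are committed as one batch;
      // plugins get a last chance to inject logic or cancel.
      err = commitAccumulatedChanges(ctx)
    }
    if !isSnapshotAborted(err) {
      return err
    }
    // Aborted SNAPSHOT transaction: repeat fn, which must therefore be repeatable.
  }
}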

Notable things for Save/Delete implementations:

  • They don’t apply any changes yet; they are just added to the change set to be applied (inside WithStoreHandleOpts).
  • For Save, we fetch the current resource from the database; this is how we detect whether it is an update or a creation.
  • For Delete, we also get the current object state, so we know the full resource body we are about to delete.
  • Store plugins get a chance to see created/updated/deleted resource bodies. For updates, we can see before/after.

To see the plugin interface, check the plugin.go file. Some simple store plugins you could check are those in the store_plugins directory:

  • metaStorePlugin in meta.go must always be the first store plugin inserted. It ensures the metadata object is initialized and tracks the last update.
  • You should also see the sharding plugins (by_name_sharding.go and by_service_id_sharding.go).

Multi-region plugins and design will be discussed in another document.

Store Cache middleware

The core store module, as described in store.go, is wrapped with cache “middleware”, see subdirectory cache, file cached_store.go, which implements the Store interface and wraps the lower level:

  • WithStoreHandleOpts decorates the function passed to it to include a cache session restart, in case we have writes that invalidate the cache. After the inner WithStoreHandleOpts finishes, we push the invalidated objects to the worker, which will either invalidate them or mark itself as bad if the invalidation fails.
  • All read requests (Get, BatchGet, Query, Search) first try to get data from the cache and fall back to the inner store in case of failure, cache miss, or a non-cacheable request.
  • Struct cachedStore implements not only the Store interface but the store plugin interface as well. In the constructor NewCachedStore, you can see that it adds itself as a plugin. The reason is that cachedStore is interested in the created/updated (pre + post) and deleted resource bodies. Save provides only the current resource body, and Delete provides only the name to delete. To utilize the fact that the core store already extracts the “previous” resource state, we implement cachedStore as a plugin.

Note that watches are non-cacheable. The cached store also needs a separate backend; as of now, we support a Redis implementation only.

The reason we invalidate references/query groups after the transaction concludes (WithStoreHandleOpts) is that we want the new changes to be already in the database. If we invalidate after the writes, then when the cache is refreshed, it will contain the data from after the transaction. This is one safeguard, but it is not sufficient yet.

The cache is written to during non-transactional reads (gets or queries). If the results were not in the cache, we fall back to the internal store, using the main database. With the results obtained, we save them in the cache, but this is a bit more subtle:

  • When we first try to READ from the cache but face a cache MISS, we write a “reservation indicator” for the given cache key.
  • When we get results from the actual database, we have fresh results… but there is a small chance that a write transaction has just finished and invalidated the cache (deleted keys).
  • The cache backend writer must update the cache only if the data was not invalidated: if the reservation indicator was not deleted, then no write transaction happened, and we can safely update the cache.

This reservation is not done in cached_store.go; it is required behavior from the backend, see the store/cache/redis/redis.go file. It uses SETXX (set only if the key exists) when updating the cache, meaning we write only if the reservation marker is present. This behavior is the second safeguard for a valid cache. A simplified sketch of both safeguards follows.
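The illustration below uses go-redis; the key naming, TTLs, and exact flow are assumptions, and the real logic lives in the cache backend and cached_store.go.

import (
  "context"
  "time"

  "github.com/redis/go-redis/v9"
)

const reserved = "__reserved__" // reservation marker value (illustrative)

func cachedGet(ctx context.Context, rdb *redis.Client, key string, fetchFromDB func() (string, error)) (string, error) {
  // Safeguard 1 happens elsewhere: a concluded transaction deletes the key.
  if val, err := rdb.Get(ctx, key).Result(); err == nil && val != reserved {
    return val, nil // cache hit
  }
  // Cache miss: reserve the key (only if nothing is there yet).
  rdb.SetNX(ctx, key, reserved, time.Minute)

  val, err := fetchFromDB() // read from the main database
  if err != nil {
    return "", err
  }
  // Safeguard 2: write only if the key still exists (XX). If a write
  // transaction invalidated it in the meantime, this update does nothing.
  rdb.SetXX(ctx, key, val, 24*time.Hour)
  return val, nil
}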

The remaining issue may potentially be with 2 reads and one writing transaction:

  • The first read request faces a cache miss and makes a reservation.
  • The first read request gets old data from the database.
  • A write transaction concludes, overwriting the old data and deleting the reservation.
  • The second read also faces a cache miss and makes a reservation.
  • The second read gets new data from the database.
  • The second read updates the cache with the new data.
  • The first request updates the cache with the old data, because the key exists (Redis only supports an “if key exists” condition)!

This is a known scenario that can cause an issue. It relies, however, on the first read request being suspended for quite a long time, long enough for the transaction to conclude and the invalidation to happen (which occurs with an extra delay after the write), and on a full flow of another read request fitting in between. As of now, the probability is comparable to winning the lottery several times in a row, so we still allow a long-lived cache. The cache update happens in the code just after getting the results from the database, so the first read flow would have to be suspended by the CPU scheduler for a very long time and then starved a bit.

It might be better if we found a Redis alternative that can do a proper compare-and-swap: a cache update could then only happen for the reservation key, and this key would have to be unique across read requests. The first request would write only if the cache contained the reservation key with the unique ID relevant to the first request. If it contained full data or the wrong ID, it would mean another read updated the reservation. If some read has a cache miss but sees a reservation mark, it must skip the cache update.

The cached store relies on the ResourceCacheImplementation interface, which is implemented by code generation; see any <service>/store/<version>/<resource> directory, where there is a cache implementation in a dedicated file, generated based on the cache annotations passed in a resource.

Using a centralized cache (Redis), we can support very long cache lifetimes, lasting even days.

Resource Metadata

Each resource has a metadata object, as defined in https://github.com/cloudwan/goten/blob/main/types/meta.proto.

The following fields are managed by store modules:

  • create_time, update_time, and delete_time. The first two are updated by the Meta store plugin; delete is a bit special, since we don’t yet have a soft-delete function. We have asynchronous deletion instead, and it is handled by the constraint store layer, which is not covered by this document.
  • resource_version is updated by Meta store plugin.
  • shards are updated by various store plugins, but can accept client sharding too (as long as they don’t clash).
  • syncing is provided by a store plugin, it will be described in multi-region, multi-service, multi-version design doc.
  • lifecycle is managed by a constraint layer, again, it will be described in multi-region, multi-service, multi-version design doc.

Users can manage exclusively: tags, labels, annotations, and owner_references, although the last one may be managed by services when creating lower-level resources for themselves.

The services field is often a mix: each resource kind may apply its own rules. The Meta service populates this field itself. For IAM, it depends on the kind: for example, Roles and RoleBindings inspect their contents and decide which services own them and which can read them. When a 3rd party service creates some resource in core SPEKTRA Edge, it must annotate it with its service. For some resources, like Device in devices.edgelq.com, it is the client who decides which services can read it.

The generation field is almost dead, as is uuid. We may, however, fix this at some point. Originally, Meta was copied from Kubernetes, and not all the fields were implemented.

Auxiliary search functionality

The store can provide Search functionality if this is configured. By default, FailedPrecondition will be returned if no search backend exists. As of now, the only backend we support is Algolia, but we may add Mongo as well in the future.

If you check the implementation of Search in store.go and cache/cached_store.go, it is pretty much like List, but allows additional search phrases.

Since the search database is additional to the main one, there is a problem to resolve: syncing from the main database to search. This is an asynchronous process, and a Search query right after Save/Delete is not guaranteed to be accurate. Algolia says it may even take minutes in some cases. Plus, this synchronization must not be allowed within transactions, because there is a chance the search backend accepts the updates but the primary database does not.

The design decisions regarding search:

  • Updates to the search backend are happening asynchronously after the Store’s successful transaction.
  • Search backend needs separate cache keys (they are prefixed), to avoid mixing.
  • Updates to the search backend must be retried in case of failures because we cannot allow the search to stay out of sync for too long.
  • Because of the potentially long search updates and their asynchronous nature, we decided that search writes are NOT executed by Store components at all! The store only performs search queries.
  • We dedicated a separate SearchUpdater interface (See store/search_updater.go file) for updating the Search backend. It is not a part of the Store!
  • The SearchUpdater module is used by db-controllers, which observe changes on the Store in real-time, and update the search backend accordingly, taking into account potential failures, writes must be retried.
  • Cache for search backend needs invalidation too. Therefore, there is a store/cache/search_updater.go file too, which wraps the inner SearchUpdater for the specific backend.
  • To summarize: the Store (used by Server modules) makes search queries; the DbController, using SearchUpdater, makes the writes and invalidates the search cache. A schematic sketch follows.
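The sketch below shows the db-controller side of this split; the ResourceChange type and the updater method names are assumptions, only the SearchUpdater split itself comes from the text above.

import (
  "context"
  "time"
)

type ResourceChange struct {
  Name     string
  Deleted  bool
  Resource interface{}
}

type SearchUpdater interface {
  OnUpdated(ctx context.Context, resource interface{}) error
  OnDeleted(ctx context.Context, name string) error
}

// syncSearch watches store changes and keeps retrying search-backend writes,
// so search does not stay out of sync for too long.
func syncSearch(ctx context.Context, changes <-chan ResourceChange, updater SearchUpdater) {
  for change := range changes {
    for {
      var err error
      if change.Deleted {
        err = updater.OnDeleted(ctx, change.Name)
      } else {
        err = updater.OnUpdated(ctx, change.Resource)
      }
      if err == nil {
        break
      }
      time.Sleep(time.Second) // back off and retry; writes must not be dropped
    }
  }
}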

Other useful store interface wrappers

To get an entirely read-only store, use the NewReadOnlyStore wrapper in with_read_only.go.

Normally, the store interface rejects even reads when no transaction was set (WithStoreHandleOpts was not used). This is to prevent people from using the DB after forgetting to set a transaction explicitly. It can be relaxed by using the WithAutomaticReadOnlyTx wrapper in auto_read_tx_store.go.

To also be able to write to the database without a transaction set explicitly via WithStoreHandleOpts, it is possible to use the WithAutomaticTx wrapper in auto_tx_store.go, but it is advised to consider other approaches first.
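A usage-style sketch combining these wrappers (the constructor signatures are assumed; only the wrapper names come from the files above):

db := store.WithAutomaticReadOnlyTx(store.NewReadOnlyStore(baseStore))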

Db configuration and store handle construction

Store handle construction and database configuration are separated.

The store needs configuration because:

  • Collections may need pre-initialization.
  • Store indices may need configuration too.

Configuration tasks are performed by db-controller runtimes by convention. Typically, in main.go files we have something like:

senvstore.ConfigureStore(
    ctx,
    serverEnvCfg,
    v1Desc.GetVersion(),
    v1Desc,
    schemaclient.GetSchemaMixinDescriptor(),
    v1limmixinclient.GetLimitsMixinDescriptor(),
)
senvstore.ConfigureSearch(ctx, serverEnvCfg, v1Desc)

The store is configured given the main service descriptor plus all the mixins, so they can configure additional collections. If the search feature is used, it needs a separate configuration.

Configuration functions are in the edgelq/common/serverenv/store/configurator.go file, and they refer to further files in goten:

  • goten/runtime/store/db_configurator.go
  • goten/runtime/store/search_configurator.go

Configuration therefore happens at db-controller startup, separately from store handle construction.

Then, the store handle itself is constructed in the server and db-controller runtimes. It is done by the builder from the edgelq repository; see the edgelq/common/serverenv/store/builder.go file. If you have seen any server initialization (main.go) file, you can see how the store builder constructs the “middlewares” (WithCacheLayer, WithConstraintLayer) and adds plugins executing various functions, roughly as in the sketch below.
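A schematic sketch of that construction (only WithCacheLayer and WithConstraintLayer are named in the text; the remaining builder calls are assumed for illustration):

storeHandle := senvstore.NewBuilder(serverEnvCfg).
  WithCacheLayer().      // cache "middleware" (cached_store.go) backed by Redis
  WithConstraintLayer(). // reference/lifecycle constraint layer
  Build(ctx)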