Goten Organization
Understanding the Goten directory structure and libraries.
In the SPEKTRA Edge repository, we have directories for each service:
edgelq/
applications/
audit/
devices/
iam/
limits/
logging/
meta/
monitoring/
proxies/
secrets/
ztp/
All names of these services end with the .edgelq.com suffix, except meta.
The full name of the meta service is meta.goten.com. The reason is that
the core of this service is not in the SPEKTRA Edge repository; it is in the
Goten repository:
goten/
meta-service/
This is where meta's api-skeleton, Protocol Buffers files, almost all of
the code-generated modules, and the server implementation live. We talk
about the meta service first because it illustrates the difference between
SPEKTRA Edge and Goten.
Goten is called a framework for SPEKTRA Edge, and this framework has two main
tool sets:
- Compiler: it takes the schema of your service and generates all the
  boilerplate code.
- Runtime: runtime libraries, referenced by the generated code, used
  heavily throughout all services based on Goten.
Goten provides its own schema language on top of Protocol Buffers: it
introduces the concepts of service packages (with versions), API groups,
actions, and resources. Unlike raw Protocol Buffers, we have a full-blown
schema with references that can point across regions and services, and those
services can also vary in version. Resources can reference each other, and
services can import each other.
Goten balances code generation against runtime libraries operating on
generic "resources" or "methods". This is the usual tradeoff between
performance, type safety, code size, maintainability, and readability.
If you look at the meta-service, you will see that it has four resource
types:
- Region
- Service
- Resource
- Deployment
This is pretty much exactly what Goten provides. To be a robust framework
and to deliver on its promises (multi-region, multi-service, and
multi-version), Goten needs the concept of a service that contains
information about regions, services, and so on.
SPEKTRA Edge provides various higher-level services, but Goten provides
the baseline for them and enables relationships between them. The only
reason the "meta" directory also exists in SPEKTRA Edge is that the meta
service needs extra SPEKTRA Edge integration, such as the authorization
layer. In the SPEKTRA Edge repo we add additional components to meta, and
finally we have the meta main.go files. If you look at the files the meta
service has in the SPEKTRA Edge repo (for the v1 version, not v1alpha2), you
will see that the edgelq version wraps what Goten provides. We also have
a "v1alpha2" service (a full one) from the time before we moved meta
to Goten. Back then, SPEKTRA Edge was overriding half of the functionality
provided by Goten, and it was heading in a bad direction.
Goten Directory Structure
As a framework, Goten provides:
- modules related to service schema and prototyping (api-skeleton and
  proto files),
- compilers that generate code based on the schema,
- runtime libraries linked during compilation.
For schema & prototyping, we have the following directories:
- schemas: contains the generated JSON schema for api-skeleton files.
  It is generated from the file annotations/bootstrap.proto.
- annotations: Protobuf is already a kind of language for building APIs,
  but Goten is said to provide a higher-level one. This directory contains
  various proto options (extra decorations) that enhance the standard
  protobuf language. There is one exceptional file though: bootstrap.proto,
  which DOES NOT define any options; instead, it describes the api-skeleton
  schema in protobuf. The file in the schemas directory is just a compilation
  of this file. The annotations directory also contains generated Golang code
  describing those proto options; normally you can ignore it.
- types: contains a set of reusable protobuf messages used by services built
  on Goten; for example, the "Meta" object (file types/meta.proto) is used in
  almost every resource type. The difference between annotations and types is
  that annotations describe options we can attach to files/proto
  messages/fields/enums etc., while types contain just reusable
  objects/enums. Apart from that, each proto file has its compiled Golang
  objects in the relevant directory.
- contrib/protobuf/google: this directory, as far as I understand, allows us
  to avoid downloading the full protobuf deps from Google; it has just the
  bits we decided to take. The subdirectory api maps to our annotations, and
  type to types. There is a weird exception, because distribution.proto
  matches the type directory more than api in this manner, but let it be.
  Perhaps it can be deleted entirely, as I am not sure where we use it, if
  at all. However, I told you one small lie: SPEKTRA Edge contributors DO
  HAVE to download some protocol buffers (mentioned in the scripts
  directory). The problem is that this downloaded library is more
  lightweight and does not contain the types we put in
  contrib/protobuf/google.
All the above directories can be considered a form of Goten-protobuf language that you should know from the developer guide.
For compilers (code generators), we have the following directories:
- compiler: each subdirectory (well, almost) contains a specific compiler
  that generates some set of the files that Goten as a whole generates.
  For example, compiler/server generates server middleware.
- cmd: Goten does not come with any runtime on its own. This directory
  provides the main.go files for all compilers (code generators) Goten has.
Compilers generate code you should already know from the developer guide
as well.
Runtime libraries have the following directories:
- runtime: contains various modules for clients, servers, controllers…
  Each will be talked about separately in various topic-oriented documents.
- Compiled types: types/meta/ and types/multi_region_policy/ may be
  considered part of the runtime; they map to the types objects. You may
  say that, while a resource proto schema imports goten/types/meta.proto,
  the generated code will refer to the Go package goten/types/meta/.
In the developer guide, we had brief mentions of some base runtime types,
but we were treating them as black boxes, while in this document set we
will dive in.
Other directories in Goten:
- example: contains some typical services developed on Goten, but without
  SPEKTRA Edge. Their current purpose is only to run some integration
  tests though.
- prototests: contains just some basic tests over the base types extended
  by Goten, but does not delve as deep as the tests in the example
  directory.
- meta-service: contains the full meta service without SPEKTRA Edge
  components and main files. It is supposed to be wrapped by Goten users,
  SPEKTRA Edge in our case.
- scripts: contains one-off scripts for installing development tools,
  reusable scripts for other scripts, and the regeneration script that
  regenerates files from the current goten directory (regenerate.sh).
- src: this directory name is the most confusing here. It does not contain
  anything for the framework. It contains generated Java code for the
  annotations and types directories in Goten, generated for the local
  pom.xml file. This Java module is just an import dependency, so that Java
  code can use the protobuf types defined by Goten. We have some Java code
  in the SPEKTRA Edge repository, so for this purpose we keep a small Java
  package in Goten.
- tools: just some dummy imports to ensure they are present in the
  go.mod/go.sum files in goten.
- webui: some generic UI for a Goten service, but note that this lies
  abandoned, as our front-end teams no longer develop a generic UI,
  focusing on specialized ones only.
Regarding files other than the obvious ones:
- pom.xml: used for building the Java package containing Goten protobuf
  types.
- sdk-config.yaml: used to generate the goten-sdk repository (the public
  one), since goten itself is private. Nobody wants to keep manually copying
  public files from goten to goten-sdk, so this is done for us.
- tools.go: just ensures we have the deps in go.mod. Unsure why it is
  separated from the tools directory.
SPEKTRA Edge Directory Structure
SPEKTRA Edge is the home repository for all core SPEKTRA Edge services and
for the adaptation of meta.goten.com, meaning that its subdirectories should
be familiar, and you should be able to navigate their code well enough,
since they are "typical" Goten-built services.
We also have a common directory with, among others, the following more
important elements:
- api and rpc: these directories contain extra reusable protobuf types.
  You will most likely interact with api.ServiceAccount (not to be confused
  with the iam.edgelq.com/ServiceAccount resource)!
- cli_configv1, cli_configv2: the second directory is used by the cuttle
  CLI utility and will be needed for all cuttles for 3rd parties.
- clientenv: contains obsolete config for the client env, but its grpc
  dialers and authclients (for user authentication) are still in use.
  Needs some cleanup.
- consts: a set of various common constants in SPEKTRA Edge.
- doc: wraps protoc-gen-goten-doc with additional functionality, to
  display the permissions needed for actions.
- fixtures_controller: the full fixtures controller module.
- serverenv: contains a common set for backend runtimes provided by
  SPEKTRA Edge (typically the server, but some elements are used by
  controllers too).
- widecolumn: contains a storage alternative to the Goten store for some
  advanced cases; we will have a separate design document for this.
Other directories:
- healthcheck: contains a simple image that polls the health checks of
  core SPEKTRA Edge services.
- mixins: contains the set of mixins; they will be discussed in separate
  topics.
- protoc-gen-npm-apis: a TypeScript compiler for the frontend team,
  maintained by the backend. You should read more about compilers here.
- npm: where the code generated by protoc-gen-npm-apis goes.
- scripts: a set of common scripts; developers must primarily learn to use
  regenerate-all.sh whenever they change any api-skeleton or proto file.
- src: contains some "soon legacy" Java-generated code for the Monitoring
  Pipeline, which will get separate documents.
1 - Goten Server Library
Understanding the Goten server library.
The server should already be more or less known from the developer guide.
We will only provide some missing bits here.
When we talk about servers, we can distinguish:
- the gRPC server instance that is listening on a TCP port,
- the server handler sets that implement some service gRPC interface.
To illustrate what I mean, look at the following code snippet from IAM:
grpcServer := grpcserver.NewGrpcServer(
authenticator.AuthFunc(),
commonCfg.GetGrpcServer(),
log,
)
v1LimMixinServer := v1limmixinserver.NewLimitsMixinServer(
commonCfg,
limMixinStore,
authInfoProvider,
envRegistry,
policyStore,
)
v1alpha2LimMixinServer := v1alpha2limmixinserver.NewTransformedLimitsMixinServer(
v1LimMixinServer,
)
schemaServer := v1schemaserver.NewSchemaMixinServer(
commonCfg,
schemaStore,
v1Store,
policyStore,
authInfoProvider,
v1client.GetIAMDescriptor(),
)
v1alpha2MetaMixinServer := metamixinserver.NewMetaMixinTransformerServer(
schemaServer,
envRegistry,
)
v1Server := v1server.NewIAMServer(
ctx,
cfg,
v1Store,
authenticator,
authInfoProvider,
envRegistry,
policyStore,
)
v1alpha2Server := v1alpha2server.NewTransformedIAMServer(
cfg,
v1Server,
v1Store,
authInfoProvider,
)
v1alpha2server.RegisterServer(
grpcServer.GetHandle(),
v1alpha2Server,
)
v1server.RegisterServer(grpcServer.GetHandle(), v1Server)
metamixinserver.RegisterServer(
grpcServer.GetHandle(),
v1alpha2MetaMixinServer,
)
v1alpha2limmixinserver.RegisterServer(
grpcServer.GetHandle(),
v1alpha2LimMixinServer,
)
v1limmixinserver.RegisterServer(
grpcServer.GetHandle(),
v1LimMixinServer,
)
v1schemaserver.RegisterServer(
grpcServer.GetHandle(),
schemaServer,
)
v1alpha2diagserver.RegisterServer(
grpcServer.GetHandle(),
v1alpha2diagserver.NewDiagnosticsMixinServer(),
)
v1diagserver.RegisterServer(
grpcServer.GetHandle(),
v1diagserver.NewDiagnosticsMixinServer(),
)
There, the instance called grpcServer is an actual gRPC server instance
listening on a TCP port. If you dive into its implementation, you will
notice we are constructing an EdgelqGrpcServer structure. It may actually
consist of two listening instances:
- googleGrpcServer *grpc.Server, which is initialized with a set of
  unary and stream interceptors and optional TLS.
- websocketHTTPServer *http.Server, which is initialized only if
  the websocket port was set. It delegates handling to
  improbableGrpcwebServer, which uses googleGrpcServer.
This Google server is the primary one and handles regular gRPC calls.
The reason for the additional HTTP server is that we need to support
web browsers, which cannot support native gRPC protocol. Instead:
- grpcweb is needed to handle unary and server-streaming calls.
- websockets are needed for bidirectional streaming calls.
Additionally, we have REST API support: an envoy proxy sidecar, a separate
container running next to the server instance, handles all REST API calls,
converting them to native gRPC. It converts grpcweb into native gRPC too,
but has issues with websockets. For this reason, we added a Golang HTTP
server with an improbable gRPC-web instance. This improbable instance can
handle both grpcweb and websockets, but we use it for websockets only,
since that is what the envoy proxy is missing.
In theory, an improbable web server would be able to handle ALL the
protocols, but there is a drawback: native gRPC calls would be less
performant than with the native gRPC server (and ServeHTTP is less
maintained). It is recommended to keep them separate, so we stick with two
ports. We may have an opportunity to remove the envoy proxy though.
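For illustration, here is a minimal sketch of such a two-listener setup
using the improbable grpc-web wrapper; the port numbers and the way options
are wired are assumptions for illustration only, the real construction lives
in EdgelqGrpcServer:

package main

import (
    "net"
    "net/http"

    "github.com/improbable-eng/grpc-web/go/grpcweb"
    "google.golang.org/grpc"
)

func main() {
    // Native gRPC server: handles regular gRPC traffic (and grpcweb via the
    // envoy sidecar).
    grpcSrv := grpc.NewServer( /* unary/stream interceptors, TLS options... */ )

    go func() {
        lis, err := net.Listen("tcp", ":6000") // assumed gRPC port
        if err != nil {
            panic(err)
        }
        if err := grpcSrv.Serve(lis); err != nil {
            panic(err)
        }
    }()

    // Improbable grpc-web wrapper: used only for websocket-based streaming,
    // which the envoy sidecar cannot translate.
    wrapped := grpcweb.WrapServer(grpcSrv, grpcweb.WithWebsockets(true))
    httpSrv := &http.Server{
        Addr:    ":6001", // assumed websocket port
        Handler: wrapped,
    }
    if err := httpSrv.ListenAndServe(); err != nil {
        panic(err)
    }
}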
Returning to the googleGrpcServer instance: it has all the stream/unary
interceptors that are common to all calls, but it does not implement the
actual interface we expect from gRPC servers. Each service version provides
a complete interface to implement. For example, see the IAMServer interface
in this file:
https://github.com/cloudwan/edgelq/blob/main/iam/server/v1/iam/iam.pb.grpc.go.
Those server interfaces are in files ending with pb.grpc.go.
To have a full server, we need to combine the gRPC server instance for
SPEKTRA Edge (EdgelqGrpcServer) with, let's make up a name for it, a
business-logic server instance (a set of handlers). In this iam.pb.grpc.go
file, this business-logic instance is iamServer. Going back to the main.go
snippet provided above, we are registering eight business-logic servers
(handler sets) on the provided *grpc.Server instance.
As long as the paths are unique across all of them, it is fine to register
as many as we want. Typically, we must include the primary service in all
its versions, plus all mixins in all their versions.
Those business-logic servers provide code-generated middleware, typically
executed in this order:
- Multi-region routing middleware (may redirect processing somewhere else,
  or split it across many regions).
- Authorization middleware (may use a local cache, or send a request to
  IAM to obtain fresh role bindings).
- Transaction middleware (configures access to the database for snapshot
  transactions and establishes a new session).
- Outer middleware, which provides validation and common outer operations
  for certain CRUD requests. For example, for update calls, it will ensure
  the resource exists and apply the update mask to produce the final
  resource to save.
- Optional custom middleware and the server code, which are responsible for
  the final execution.
The transaction middleware may also repeat the execution of all inner
middleware and the core server if the transaction needs to be retried, as
sketched below.
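To make the retry behavior concrete, here is a heavily simplified,
hypothetical sketch of a transaction-style middleware wrapping the next
layer; the handler type and withTxRetry are made up for illustration and do
not reflect the generated middleware's real shape:

package middleware

import (
    "context"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// handler stands for the next layer in the chain (outer middleware, custom
// middleware, or the core server). It is a made-up type.
type handler func(ctx context.Context, req interface{}) (interface{}, error)

// withTxRetry wraps the next layer and re-runs it when the storage layer
// aborts a snapshot transaction. This mirrors the idea that everything
// below the transaction middleware must be repeatable.
func withTxRetry(next handler, maxAttempts int) handler {
    return func(ctx context.Context, req interface{}) (interface{}, error) {
        var lastErr error
        for attempt := 0; attempt < maxAttempts; attempt++ {
            resp, err := next(ctx, req)
            if err == nil {
                return resp, nil
            }
            if status.Code(err) != codes.Aborted {
                return nil, err // not a transaction conflict, give up
            }
            lastErr = err // aborted snapshot transaction: retry
        }
        return nil, lastErr
    }
}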
There are also "initial handlers" in the generated pb.grpc.go files. For
example, see this file:
https://github.com/cloudwan/edgelq/blob/main/iam/server/v1/group/group_service.pb.grpc.go.
There, _GroupService_GetGroup_Handler is an example for unary calls, and
_GroupService_WatchGroup_Handler is an example for streaming calls.
It is worth mentioning how interceptors play with middleware and these
“initial handlers”. Let’s copy and paste interceptors from the current
edgelq/common/serverenv/grpc/server.go
file:
grpc.StreamInterceptor(grpc_middleware.ChainStreamServer(
grpc_ctxtags.StreamServerInterceptor(),
grpc_logrus.StreamServerInterceptor(
log,
grpc_logrus.WithLevels(codeToLevel),
),
grpc_recovery.StreamServerInterceptor(
grpc_recovery.WithRecoveryHandlerContext(recoveryHandler),
),
RespHeadersStreamServerInterceptor(),
grpc_auth.StreamServerInterceptor(authFunc),
PayloadStreamServerInterceptor(log, PayloadLoggingDecider),
grpc_validator.StreamServerInterceptor(),
)),
grpc.UnaryInterceptor(grpc_middleware.ChainUnaryServer(
grpc_ctxtags.UnaryServerInterceptor(),
grpc_logrus.UnaryServerInterceptor(
log,
grpc_logrus.WithLevels(codeToLevel),
),
grpc_recovery.UnaryServerInterceptor(
grpc_recovery.WithRecoveryHandlerContext(recoveryHandler),
),
RespHeadersUnaryServerInterceptor(),
grpc_auth.UnaryServerInterceptor(authFunc),
PayloadUnaryServerInterceptor(log, PayloadLoggingDecider),
grpc_validator.UnaryServerInterceptor(),
)),
Unary requests are executed in the following way:
- The function _GroupService_GetGroup_Handler is called first! It calls
  the first interceptor, but before that, it creates a handler that wraps
  the first middleware and passes it to the interceptor chain.
- The first interceptor is grpc_ctxtags.UnaryServerInterceptor(). It calls
  the handler passed to it, which is the next interceptor.
- The next interceptor is grpc_logrus.UnaryServerInterceptor, and so on.
  At some point, we call the interceptor executing authentication.
- The last interceptor (grpc_validator.UnaryServerInterceptor()) finally
  calls the handler created by _GroupService_GetGroup_Handler.
- The first middleware is called. The call is executed through the
  middleware chain and may reach the core server, but it may also return
  earlier.
- The interceptors unwrap in reverse order.
You can see how this is called if you look at the ChainUnaryServer
implementation. The toy example below reproduces this wrapping order.
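A toy, self-contained reproduction of that wrapping order; the logging and
auth interceptors here are stand-ins, not the real ones from the chain
above:

package main

import (
    "context"
    "fmt"

    "google.golang.org/grpc"
)

// logging and auth are stand-ins for the real interceptors in the chain.
func logging(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
    next grpc.UnaryHandler) (interface{}, error) {
    fmt.Println("-> logging")
    resp, err := next(ctx, req)
    fmt.Println("<- logging")
    return resp, err
}

func auth(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
    next grpc.UnaryHandler) (interface{}, error) {
    fmt.Println("-> auth")
    resp, err := next(ctx, req)
    fmt.Println("<- auth")
    return resp, err
}

func main() {
    // The innermost handler plays the role of the wrapped middleware chain
    // created inside _GroupService_GetGroup_Handler.
    core := func(ctx context.Context, req interface{}) (interface{}, error) {
        fmt.Println("   middleware chain / core server")
        return "ok", nil
    }
    info := &grpc.UnaryServerInfo{FullMethod: "/GroupService/GetGroup"}

    // Chain manually: logging wraps auth, auth wraps the core handler.
    chained := func(ctx context.Context, req interface{}) (interface{}, error) {
        return logging(ctx, req, info, func(ctx context.Context, req interface{}) (interface{}, error) {
            return auth(ctx, req, info, core)
        })
    }
    _, _ = chained(context.Background(), "request")
    // Output order: -> logging, -> auth, core, <- auth, <- logging
}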
Streaming calls are a bit different, because we start from the interceptors
themselves:
- The gRPC server instance takes the function _GroupService_WatchGroup_Handler
  and casts it into the grpc.StreamHandler type.
- This grpc.StreamHandler, which is the handler for our method, is passed
  to the interceptor chain. During the chaining process, the
  grpc.StreamHandler is wrapped with all the streaming interceptors,
  starting from the last one. Therefore, the innermost StreamHandler will
  be _GroupService_WatchGroup_Handler.
- grpc_ctxtags.StreamServerInterceptor() is the entry point! It then
  invokes the next interceptor, and we go further and further, until we
  reach _GroupService_WatchGroup_Handler, which is called by the last
  stream interceptor, grpc_validator.StreamServerInterceptor().
- Middlewares are executed in the same way as always.
See the ChainStreamServer implementation if you don't believe it.
In total, this should give an idea of how the server works and what the
layers are.
2 - Goten Controller Library
Understanding the Goten controller library.
You should know about controller design from the
developer guide.
Here we give a small recap of the controller with tips about code paths.
The controller framework is part of the wider Goten framework. It has
annotations and compiler parts; you can read more about them in the Goten
compiler documentation. For now, we will talk just about the generated
controllers.
There are some runtime elements for all controller components (NodeManager,
Node, Processor, Syncer…) in the runtime/controller directory in the Goten
repo: https://github.com/cloudwan/goten/tree/main/runtime/controller.
In config.proto, we have the node registry access config and the node
manager configs, which you should already know from the
controller/db-controller config proto files.
Node managers are a bit more interesting. As was said in the developer
guide, we scale horizontally by adding more nodes. To have more nodes in
a single pod, which increases the chance of a fairer workload distribution,
we often have more than one Node instance per type. We organize them with
Node Managers; see the file runtime/controller/node_management/manager.go.
Each Node must implement:
type Node interface {
Run(ctx context.Context) error
UpdateShardRange(ctx context.Context, newRange ShardRange)
}
The Node Manager component creates on startup as many Nodes as specified in
its config. Next, it runs all of them, but they do not yet get any share of
the shards; therefore, they are idle. Managers register all nodes in the
registry, where all node IDs across all pods are collected.
The registry is responsible for returning the shard range assigned to each
node. Whenever a pod dies or a new one is deployed, the node registry
notifies the manager about the new shard ranges per Node. The manager then
notifies the relevant Node via the UpdateShardRange call.
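A minimal, hypothetical Node implementation illustrating this contract could
look like the sketch below; ShardRange's fields and the internal bookkeeping
are made up, only the two methods come from the interface above:

package controller

import (
    "context"
    "sync"
)

// ShardRange is assumed here to carry the lowest and highest shard assigned
// to this node; the real type lives in Goten's controller runtime.
type ShardRange struct {
    Lowest, Highest int64
}

// demoNode satisfies the Node interface shown above.
type demoNode struct {
    mu     sync.Mutex
    shards ShardRange
}

func (n *demoNode) Run(ctx context.Context) error {
    // A real node would watch resources and process events belonging to its
    // shard range here; this sketch only blocks until cancellation.
    <-ctx.Done()
    return ctx.Err()
}

// UpdateShardRange is called by the Node Manager whenever the registry
// reassigns shards (for example, a pod died or a new one was deployed).
func (n *demoNode) UpdateShardRange(ctx context.Context, newRange ShardRange) {
    n.mu.Lock()
    defer n.mu.Unlock()
    n.shards = newRange
}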
The registry for Redis uses periodic polling; therefore, in theory, there is
a chance that two controllers execute the same work for a couple of seconds.
It would be better to improve this, but we design controllers around the
observed/desired state, and duplicating the same request may produce some
temporary warning errors, which should be harmless. Still, it is a field
for improvement.
See the NodeRegistry component (in the file registry.go; we use Redis).
Apart from the node managers directory in runtime/controller, you can see
the processor package. The more notable elements there are:
- The Runner module, which is the processor runner goroutine. It is the
  component for executing all events in a thread-safe manner, but
  developers must not do any IO in it.
- The Syncer module, which is generic and based on interfaces, although we
  generate type-safe wrappers in all controllers. It is quite large: it
  consists of the Desired/Observed state objects (file syncer_states.go),
  an updater that operates on its own goroutine (file syncer_updater.go),
  and finally the central Syncer object, defined in syncer.go. It compares
  the desired vs the observed state and pushes updates to the syncer
  updater.
- In synchronizable we have structures responsible for propagating
  sync/lostSync events across Processor modules, so ideally developers do
  not need to handle them themselves.
The Syncer is fairly complex; it needs to handle failures/recoveries,
resets, and bursts of updates. Note that it does not use Go channels (see
the sketch below), because:
- Channels have a limited (fixed) capacity. This is not nice, considering
  that IO work happens there.
- Maps are better when there are multiple updates to a single resource,
  because they allow merging multiple events (overwriting previous ones).
  Channels would force consuming at least all the items from the queue.
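A tiny illustration of this map-based coalescing with made-up types (this
is not the actual Syncer code):

package main

import (
    "fmt"
    "sync"
)

// pendingUpdates coalesces events per resource name: a newer desired state
// simply overwrites the older one, so a burst of updates to the same
// resource results in a single write when the worker picks it up.
type pendingUpdates struct {
    mu      sync.Mutex
    updates map[string]string // resource name -> latest desired state (illustrative)
}

func (p *pendingUpdates) push(name, desired string) {
    p.mu.Lock()
    defer p.mu.Unlock()
    p.updates[name] = desired // overwrites any previous pending event
}

// drain hands the merged batch to the worker goroutine doing IO.
func (p *pendingUpdates) drain() map[string]string {
    p.mu.Lock()
    defer p.mu.Unlock()
    out := p.updates
    p.updates = map[string]string{}
    return out
}

func main() {
    p := &pendingUpdates{updates: map[string]string{}}
    p.push("devices/device-1", "v1")
    p.push("devices/device-1", "v2") // merged: only "v2" survives
    fmt.Println(p.drain())           // map[devices/device-1:v2]
}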
3 - Goten Data Store Library
Understanding the Goten data store library.
The developer guide gives some examples of simple interaction with
the Store interface, but hides all implementation details, which we
will cover here now, at least partially.
The store should provide:
- Read and write access to resources according to the resource Access
  interface, with transactions that guarantee that resources read from the
  database (or queried collections) will not change before the transaction
  is committed. This is provided by the core store module, described in
  this doc.
- A transparent cache layer, reducing pressure on the database, managed
  by the "cache" middleware, described in this doc.
- A transparent constraint layer handling references to other resources,
  including blocking references. This is a more complex topic, and we will
  discuss it in different documents (multi-region, multi-service,
  multi-version design).
- Automatic resource sharding by various criteria, managed by store
  plugins, covered in this doc.
- Automatic resource metadata updates (generation, update time…),
  managed by store plugins, covered in this doc.
- Observability, provided automatically (we will come back to it in the
  Observability document).
The above list should at least give an idea that interface calls may often
be complex and require interactions with various components via IO
operations! In general, a call to the Store interface may involve:
- calling the underlying database (mongo, firestore…), for transactions
  (write set) and non-cached reads,
- calling the cache layer (redis), for reads or invalidation purposes,
- calling other services or regions in case of references to resources in
  other services or regions. This is not covered by this document, but by
  the multi-region, multi-service, multi-version design documents.
The store implementation resides in Goten, here:
https://github.com/cloudwan/goten/tree/main/runtime/store.
The primary file is store.go, with the following interfaces:
- Store is the public store interface for developers.
- Backend and TxSession are to be implemented by specific backend
  implementations like Firestore and Mongo. They are not exposed to
  end-service developers.
- SearchBackend is like Backend, just for search, which is often provided
  separately (example: Algolia), but in the future we may introduce a Mongo
  implementation combining both the search and the regular backend.
The store is actually also a "middleware" chain, like a server. In the file
store.go we have the store struct type, which wraps the backend and provides
the first, core implementation of the Store interface. This wrapper:
- adds tracing spans for all operations,
- for transactions, stores an observability tracker in the current ctx
  object,
- invokes all the relevant store plugin functions, so custom code can be
  injected apart from "middlewares",
- accumulates resources to save/delete and does not trigger updates
  immediately; they are executed at the end of the transaction.
You can consider it the equivalent of a server core module (in the
middleware chain).
To study the store, you should at least check the implementation of
WithStoreHandleOpts:
- You can see that plugins are notified about new and finished
  transactions.
- The function runCore is a RETRY-ABLE function that may be invoked again
  for an aborted transaction. However, this can happen only for SNAPSHOT
  transactions. This also implies that all logic within a transaction must
  be repeatable (see the sketch after this list).
- runCore executes the function passed to the transaction. In terms of
  server middleware chains, it means we are executing the outer + custom
  middleware (if present) and/or the core server.
- Store plugins are notified when a transaction is attempted (perhaps
  again), and they get a chance to inject logic just before committing.
  They also have a chance to cancel the entire operation.
- You should also note that the Store Save/Delete implementations do not
  apply any changes to the backend. Instead, creations, updates, and
  deletions are accumulated and passed in a batch commit inside
  WithStoreHandleOpts.
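For illustration only, the sketch below shows the shape of business logic
passed into a transaction; the Store interface, the WithStoreHandleOpts
signature, and the Device accessors here are all hypothetical stand-ins,
the only point being that the function body must be repeatable:

package example

import "context"

// Hypothetical, minimal stand-ins for the real Goten store types; only the
// retry semantics matter here.
type Device struct{ Name, DisplayName string }

type Store interface {
    // WithStoreHandleOpts runs fn inside a transaction; for SNAPSHOT
    // transactions fn may be invoked multiple times if the commit aborts.
    WithStoreHandleOpts(ctx context.Context, fn func(ctx context.Context) error) error
    GetDevice(ctx context.Context, name string) (*Device, error)
    SaveDevice(ctx context.Context, d *Device) error
}

// renameDevice shows the shape of business logic passed to the transaction:
// it must be repeatable, since an aborted snapshot transaction re-runs it.
func renameDevice(ctx context.Context, s Store, name, newDisplayName string) error {
    return s.WithStoreHandleOpts(ctx, func(ctx context.Context) error {
        d, err := s.GetDevice(ctx, name)
        if err != nil {
            return err
        }
        d.DisplayName = newDisplayName
        // Save is only accumulated here; the actual write happens in the
        // batch commit at the end of WithStoreHandleOpts.
        return s.SaveDevice(ctx, d)
    })
}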
Notable things about the Save/Delete implementations:
- They do not apply any changes yet; the changes are just added to the
  change set to be applied (inside WithStoreHandleOpts).
- For Save, we extract the current resource from the database, and this is
  how we detect whether it is an update or a creation.
- For Delete, we also get the current object state, so we know the full
  resource body we are about to delete.
- Store plugins get a chance to see the created/updated/deleted resource
  bodies. For updates, they can see both before and after.
To see the plugin interface, check the plugin.go file. Some simple store
plugins you could check are those in the store_plugins directory:
- metaStorePlugin in meta.go must always be the first store plugin
  inserted. It ensures the metadata object is initialized and tracks the
  last update.
- You should also see the sharding plugins (by_name_sharding.go and
  by_service_id_sharding.go); a conceptual sketch follows below.
Multi-region plugins and their design will be discussed in another document.
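To illustrate what such a sharding plugin conceptually does, here is a
self-contained sketch of by-name sharding; the plugin hook name, the
resource shape, and the ring size are made up, the real interface is
defined in plugin.go:

package shardingsketch

import (
    "context"
    "hash/fnv"
)

// resource is a stand-in for a generated resource with a metadata object.
type resource struct {
    Name   string
    Shards map[string]int64
}

// byNameSharding mimics the idea behind by_name_sharding.go: before a save,
// derive a stable shard number from the resource name and store it in the
// metadata shards, so controllers can split work by shard ranges.
type byNameSharding struct {
    ringSize int64 // e.g. 16; made up for illustration
}

func (p byNameSharding) OnPreSave(ctx context.Context, res *resource) {
    h := fnv.New64a()
    _, _ = h.Write([]byte(res.Name))
    if res.Shards == nil {
        res.Shards = map[string]int64{}
    }
    res.Shards["byName"] = int64(h.Sum64() % uint64(p.ringSize))
}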
Store Cache middleware
The core store module, as described for store.go, is wrapped with the cache
"middleware"; see the subdirectory cache, file cached_store.go, which
implements the Store interface and wraps the lower level:
- WithStoreHandleOpts decorates the function passed to it to include a
  cache session restart, in case we have writes that invalidate the cache.
  After the inner WithStoreHandleOpts finishes, we need to push the
  invalidated objects to the worker. It will either invalidate them, or
  mark itself as bad if the invalidation fails.
- All read requests (Get, BatchGet, Query, Search) first try to get data
  from the cache and pass the call to the inner store in case of failure,
  a cache miss, or a non-cacheable request.
- The struct cachedStore implements not only the Store interface but the
  store plugin interface as well. In the constructor NewCachedStore you can
  see that it adds itself as a plugin. The reason is that cachedStore is
  interested in the created/updated (pre + post) and deleted resource
  bodies. Save provides only the current resource body, and Delete provides
  only the name to delete. To utilize the fact that the core store already
  extracts the "previous" resource state, we implement cachedStore as a
  plugin.
Note that watches are non-cacheable. The cached store also needs a separate
backend; as of now we support a Redis implementation only.
The reason we invalidate references/query groups after the transaction
concludes (WithStoreHandleOpts) is that we want the new changes to already
be in the database. If we invalidate after the writes, then when the cache
is refreshed, it will be refreshed with the data from after the transaction.
This is one safeguard, but it is not sufficient on its own.
The cache is written to during non-transactional reads (gets or queries).
If the results were not in the cache, we fall back to the inner store,
using the main database. With the results obtained, we save them in the
cache, but this is a bit less simple:
- When we first try to READ from the cache but face a cache MISS, we write
  a "reservation indicator" for the given cache key.
- When we get the results from the actual database, we have fresh results…
  but there is a small chance that a write transaction is in progress, or
  has just finished and invalidated the cache (deleted keys).
- The cache backend writer must update the cache only if the data was not
  invalidated: if the reservation indicator was not deleted, then no write
  transaction happened, and we can safely update the cache.
This reservation is not done in cached_store.go; it is required behavior of
the backend, see the store/cache/redis/redis.go file. It uses SET with the
XX option when updating the cache, meaning we write only if the key already
exists (the reservation marker is present). This behavior is the second
safeguard for a valid cache.
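A minimal sketch of this reservation pattern with go-redis; the key naming,
the placeholder value, the TTLs, and the use of SET NX for the reservation
are assumptions, the real logic lives in store/cache/redis/redis.go:

package cachesketch

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

const reservation = "__reserved__" // assumed placeholder value

// onCacheMiss writes a reservation marker for the key. A concurrent write
// transaction invalidates the cache by deleting keys, which also deletes
// this marker.
func onCacheMiss(ctx context.Context, rdb *redis.Client, key string) error {
    return rdb.SetNX(ctx, key, reservation, time.Hour).Err()
}

// storeDBResult updates the cache only if the key still exists (SET with
// XX), i.e. the reservation was not wiped out by an invalidation in the
// meantime.
func storeDBResult(ctx context.Context, rdb *redis.Client, key string, data []byte) error {
    return rdb.SetXX(ctx, key, data, time.Hour).Err()
}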
The remaining issue may potentially occur with two reads and one writing
transaction:
- The first read request faces a cache miss and makes a reservation.
- The first read request gets the old data from the database.
- The transaction concludes, overwriting the old data and deleting the
  reservation.
- The second read also faces a cache miss and makes a reservation.
- The second read gets the new data from the database.
- The second read updates the cache with the new data.
- The first request updates the cache with the old data, because the key
  exists (Redis only supports an "if key exists" condition)!
This is a known scenario that can cause an issue; however, it relies on the
first read request being suspended for quite a long time, allowing a
transaction to conclude and invalidate the cache (which happens with an
extra delay after the write), and furthermore on a full flow of another
read request in the meantime. As of now, the probability may be comparable
to winning the lottery several times in a row, so we still allow for a
long-lived cache. The cache update happens in the code just after getting
the results from the database, so the first read flow would have to be
suspended by the CPU scheduler for a very long time and then starved a bit.
It would be better if we found a Redis alternative that can do a proper
compare-and-swap: the cache update could then only happen for the
reservation key, and this key would have to be unique across read requests.
That means the first request's data would only be written if the cache
contains the reservation key with the unique ID relevant to the first
request. If it contains full data or the wrong ID, it means another read
updated the reservation. If some read has a cache miss but sees a
reservation mark, then it must skip updating the cache.
The cached store relies on the ResourceCacheImplementation interface, which
is implemented by code generation; see any
<service>/store/<version>/<resource> directory, where there is a cache
implementation in a dedicated file, generated based on the cache
annotations given in a resource.
Using a centralized cache (Redis), we can support very long-lived cache
entries, lasting even days.
Each resource has a metadata object, as defined in
https://github.com/cloudwan/goten/blob/main/types/meta.proto.
The following fields are managed by store modules:
- create_time, update_time and delete_time: the first two are updated by
  the Meta store plugin; delete is a bit special, since we do not yet have
  a soft-delete function. We have asynchronous deletion, and this is
  handled by the constraint store layer, not covered by this document.
- resource_version is updated by the Meta store plugin.
- shards are updated by various store plugins, but can accept client
  sharding too (as long as the keys do not clash).
- syncing is provided by a store plugin; it will be described in the
  multi-region, multi-service, multi-version design doc.
- lifecycle is managed by the constraint layer; again, it will be described
  in the multi-region, multi-service, multi-version design doc.
Users can manage exclusively: tags, labels, annotations, and
owner_references, although the last one may be managed by services when
creating lower-level resources for themselves.
The services field is often a mix: each resource type may apply its own
rules. The Meta service populates this field itself. For IAM, it depends on
the kind: for example, Roles and RoleBindings inspect their contents and
decide which services own them and which can read them. When a 3rd party
service creates some resource in core SPEKTRA Edge, it must annotate its
service. For some resources, like Device in devices.edgelq.com, it is the
client who decides which services can read it.
The generation field is almost dead, as is uuid. We may however fix this at
some point. Originally, Meta was copied from Kubernetes, and not all of its
fields were implemented.
Auxiliary search functionality
The store can provide Search functionality if it is configured. By default,
FailedPrecondition is returned if no search backend exists. As of now, the
only backend we support is Algolia, but we may add Mongo as well in the
future.
If you check the implementation of Search in store.go and
cache/cached_store.go, it is pretty much like List, but it allows an
additional search phrase.
Since the search database is additional to the main one, there is a problem
to resolve: syncing from the main database to search. This is an
asynchronous process, and a Search query right after Save/Delete is not
guaranteed to be accurate. Algolia says it may even take minutes in some
cases. Furthermore, this synchronization must not be done within
transactions, because there is a chance the search backend accepts the
updates while the primary database does not.
The design decisions regarding search:
- Updates to the search backend happen asynchronously after the Store's
  successful transaction.
- The search backend needs separate cache keys (they are prefixed), to
  avoid mixing.
- Updates to the search backend must be retried in case of failures,
  because we cannot allow the search to stay out of sync for too long.
- Because of the potentially long search updates and their asynchronous
  nature, we decided that search writes are NOT executed by Store
  components at all! The store does only search queries.
- We dedicated a separate SearchUpdater interface (see the
  store/search_updater.go file) to updating the search backend. It is not
  a part of the Store!
- The SearchUpdater module is used by db-controllers, which observe changes
  on the Store in real time and update the search backend accordingly,
  taking potential failures into account; writes must be retried (see the
  sketch below).
- The cache for the search backend needs invalidation too. Therefore, there
  is a store/cache/search_updater.go file as well, which wraps the inner
  SearchUpdater for the specific backend.
- To summarize: the Store (used by Server modules) makes search queries,
  while the DbController, using SearchUpdater, makes the writes and
  invalidates the search cache.
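For illustration, a self-contained sketch of the db-controller side of this
contract; the searchUpdater interface and the change type below are
hypothetical, only the retry-until-success behavior reflects the design
above:

package searchsync

import (
    "context"
    "time"
)

// searchUpdater is a made-up, minimal version of the SearchUpdater idea:
// something that pushes a resource change into the search backend.
type searchUpdater interface {
    ApplyChange(ctx context.Context, resourceName string, body []byte) error
}

// change represents one observed Store change (hypothetical shape).
type change struct {
    ResourceName string
    Body         []byte
}

// syncLoop consumes observed changes and keeps retrying each write until it
// succeeds, so the search backend cannot silently stay out of sync.
func syncLoop(ctx context.Context, updater searchUpdater, changes <-chan change) {
    for {
        select {
        case <-ctx.Done():
            return
        case c := <-changes:
            for {
                if err := updater.ApplyChange(ctx, c.ResourceName, c.Body); err == nil {
                    break
                }
                select {
                case <-ctx.Done():
                    return
                case <-time.After(time.Second): // back off before retrying
                }
            }
        }
    }
}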
Other useful store interface wrappers
To get an entirely read-only database, use the NewReadOnlyStore wrapper in
with_read_only.go.
Normally, the store interface will reject even reads when no transaction
was set (WithStoreHandleOpts was not used). This is to prevent people from
using the DB after forgetting to set a transaction explicitly. It can be
relaxed by using the WithAutomaticReadOnlyTx wrapper in
auto_read_tx_store.go.
To also be able to write to the database without a transaction set
explicitly via WithStoreHandleOpts, it is possible to use the
WithAutomaticTx wrapper in auto_tx_store.go, but it is advised to consider
other approaches first.
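As a rough illustration of the wrapper idea only (not the real
NewReadOnlyStore code, and with a drastically reduced stand-in Store
interface):

package storewrappers

import (
    "context"
    "errors"
)

// Store is a drastically reduced stand-in for the real Goten Store
// interface.
type Store interface {
    Get(ctx context.Context, name string) ([]byte, error)
    Save(ctx context.Context, name string, body []byte) error
}

// readOnlyStore mirrors the idea behind NewReadOnlyStore: pass reads
// through, reject every write.
type readOnlyStore struct{ inner Store }

func NewReadOnlyStore(inner Store) Store { return readOnlyStore{inner: inner} }

func (s readOnlyStore) Get(ctx context.Context, name string) ([]byte, error) {
    return s.inner.Get(ctx, name)
}

func (s readOnlyStore) Save(ctx context.Context, name string, body []byte) error {
    return errors.New("store is read-only")
}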
Db configuration and store handle construction
Store handle construction and database configuration are separated. The
store needs configuration because:
- collections may need pre-initialization,
- store indices may need configuration too.
By convention, the configuration tasks are executed by the db-controller
runtimes. Typically, in the main.go files we have something like:
senvstore.ConfigureStore(
ctx,
serverEnvCfg,
v1Desc.GetVersion(),
v1Desc,
schemaclient.GetSchemaMixinDescriptor(),
v1limmixinclient.GetLimitsMixinDescriptor(),
)
senvstore.ConfigureSearch(ctx, serverEnvCfg, v1Desc)
The store is configured after being given the main service descriptor plus
all the mixins, so they can configure additional collections. If the search
feature is used, it needs a separate configuration.
The configuration functions are in the
edgelq/common/serverenv/store/configurator.go file, and they refer to
further files in goten:
- goten/runtime/store/db_configurator.go
- goten/runtime/store/search_configurator.go
Configuration therefore happens at db-controller startup, separately from
the store handle construction.
Then, the store handle itself is constructed in the server and
db-controller runtimes. It is done by the builder from the edgelq
repository; see the edgelq/common/serverenv/store/builder.go file. If you
have seen any server initialization (main.go) file, you can see how the
store builder constructs the "middlewares" (WithCacheLayer,
WithConstraintLayer) and adds plugins executing various functions.