1 - Schema Mixin
Understanding the schema mixin implementation.
Mixins are special kinds of services that are meant to be
mixed/blended into proper services. Like any service, they have
an api-skeleton, protobuf files, resources, and server handlers.
What they don’t get is independent deployment. They don’t exist
in the Meta Service registry. Instead, their resources and API groups
are mixed into those of proper services.
Moreover, for schema mixins we do not validate references to
other resources; they are excluded from this mechanism, and it’s
up to the developer to keep them valid.
The Goten repository provides the schema mixin under runtime/schema-mixin.
If you look at this mixin service, you will see that it has a ResourceShadow
resource. By mixing the schema mixin into, say, the Meta service, which
formally has four resource types and four API groups, we get a total
Meta service with:
- Resources: Region, Service, Deployment, Resource, ResourceShadow
- API Groups: Region, Service, Deployment, Resource, ResourceShadow
(CRUD plus custom actions).
If you inspect the Meta service database, you will have five collections
(unless there are more mixins).
See api-skeleton:
https://github.com/cloudwan/goten/blob/main/runtime/schema-mixin/proto/api-skeleton-v1.yaml.
By requiring that ALL services attach the schema mixin to themselves, we can
guarantee that all services can access each other via the schema mixin. This
is one of the key ingredients of Goten’s protocol. Some common service is
always needed: to enable circular communication between two services
that cannot possibly know each other’s schemas, they need some kind of common
protocol.
Take a look at the resource_shadow.proto file. Just a note: you can
ignore target_delete_behavior; it is there more for informative purposes,
as Goten does not provide schema management for mixins. ResourceShadow
is a very special kind of resource: it exists for every other resource
in a deployment (except other mixins’ resources). To illustrate, let’s
take a look at the list of resources that may exist in the Deployment
of the Meta service in region us-west2:
- regions/us-west2 (Kind: meta.goten.com/Region)
- services/meta.goten.com (Kind: meta.goten.com/Service)
- services/meta.goten.com/resources/Region (Kind: meta.goten.com/Resource)
- services/meta.goten.com/resources/Deployment (Kind: meta.goten.com/Resource)
- services/meta.goten.com/resources/Service (Kind: meta.goten.com/Resource)
- services/meta.goten.com/resources/Resource (Kind: meta.goten.com/Resource)
- services/meta.goten.com/deployments/us-west2 (Kind: meta.goten.com/Deployment)
If those resources exist in the database for meta.goten.com in
us-west2, then the ResourceShadow collection will have the following
resources:
- resourceShadows/regions/us-west2
- resourceShadows/services/meta.goten.com
- resourceShadows/services/meta.goten.com/resources/Region
- resourceShadows/services/meta.goten.com/resources/Deployment
- resourceShadows/services/meta.goten.com/resources/Service
- resourceShadows/services/meta.goten.com/resources/Resource
- resourceShadows/services/meta.goten.com/deployments/us-west2
Basically it’s a one-to-one mapping, with the following exceptions:
- if there are other mixin resources, they don’t get ResourceShadows.
- synced read-only copies from other regions do not get ResourceShadows.
For example, resource regions/us-west2 will exist in region us-west2,
and resourceShadows/regions/us-west2 will also exist in us-west2.
But if regions/us-west2 is copied to other regions, like eastus2,
then resourceShadows/regions/us-west2 WILL NOT exist in eastus2.
This makes ResourceShadows rather “closed” within their Deployment.
ResourceShadow instances are created/updated along with the resource they
represent, during each transaction. This ensures that they are always
in sync with the resource. They contain all references to other resources,
and all back-reference source deployments. The reason we store back-reference
deployments, not an exact list of resources, is that the full list would
be massive: imagine a Project instance and 10000 Devices pointing
to it. If those devices are spread across, say, four regions, the
ResourceShadow for the Project will have just four back-reference sources,
which is far more manageable.
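To make this concrete, here is a minimal sketch of the information a ResourceShadow carries; the field names are illustrative stand-ins, not the actual generated Goten types (see resource_shadow.proto for those):

```go
// Illustrative sketch only; the real message lives in resource_shadow.proto.
type resourceShadowSketch struct {
	// Mirrors the represented resource name, e.g. "regions/us-west2".
	Name string
	// All references this resource holds to other resources.
	References []string
	// Deployments (Service + Region pairs) owning at least one resource
	// that points back at this one. Storing deployments rather than exact
	// resource names keeps the list small: 10000 Devices spread across
	// four regions collapse into just four entries here.
	BackReferenceSourceDeployments []string
}
```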
Now, with ResourceShadows, we can provide the abstraction needed to
facilitate communication between services. However, note that we don’t
use standard CRUD at all (for shadows). We did in the past, but
the problem with CRUD requests is that they don’t contain an “API Version”
field. For example, we have the secrets.edgelq.com service in versions
v1alpha2 and v1. In the older version, the Secret resource has
the name pattern projects/{project}/secrets/{secret}. With the v1
upgrade, the name pattern changed to
projects/{project}/regions/{region}/secrets/{secret}. Note that
this means the ResourceShadow name changes too!
Suppose there are services S1 and S2. S1 imports secrets in v1alpha2,
and S2 imports secrets in v1. Suppose both S1 and S2 want to create
resources concerning some Secret instance. In this case, they would
use the schema-mixin API and give conflicting resource shadow names,
but this conflict arises from the version difference, not from a bug:
S1 would try to establish a reference to the shadow for
projects/{project}/secrets/{secret}, while S2 would use the version
with the region.
This problem repeats across the whole CRUD for ResourceShadow, so we don’t
use it. Instead, we developed a bunch of custom actions you can see in
the api-skeleton of the schema mixin, like EstablishReferences, ConfirmBlockades,
etc. All those requests contain a version field, and the API Server can
use versioning transformers to convert names between versions.
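To picture what such a transformer has to do for shadow names, here is a hypothetical sketch of the v1alpha2-to-v1 Secret name conversion described above; the real transformers are generated by Goten, and the region to inject comes from context (here it is simply a parameter):

```go
import (
	"fmt"
	"strings"
)

// secretNameToV1 is a hypothetical illustration of converting
// "projects/{project}/secrets/{secret}" (v1alpha2) into
// "projects/{project}/regions/{region}/secrets/{secret}" (v1).
func secretNameToV1(name, region string) (string, error) {
	parts := strings.Split(name, "/")
	if len(parts) != 4 || parts[0] != "projects" || parts[2] != "secrets" {
		return "", fmt.Errorf("not a v1alpha2 Secret name: %q", name)
	}
	return fmt.Sprintf("projects/%s/regions/%s/secrets/%s",
		parts[1], region, parts[3]), nil
}
```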
Now, coming back to the custom actions for ResourceShadows: it is
recommended to read the api-skeleton along with the protobuf files
containing the request objects! We described the flow of how references
are established when API Servers handle writing requests; this is where
the schema-mixin API is in use.
EstablishReferences is used by Store modules in API Servers when they
save resources with cross-region/service references. It is called
DURING the transaction of the Store in the API Server. It ensures that
the referenced resources will not be deleted for the next few minutes,
by creating tentative blockades in the ResourceShadow instances on the
other side. You may check the implementation in the Goten repo, file
runtime/schema-mixin/server/v1/resource_shadow/resource_shadow_service.go.
When the transaction concludes, the Deployment will asynchronously send
ConfirmBlockades to remove the tentative blockade from the referenced
ResourceShadow in the target Service. It will leave a back-reference
source there, though!
For deletion requests, the API Server must call CheckIfResourceIsBlocked
before proceeding with the resource deletion. It must also block the
deletion if there are tentative blockades in the ResourceShadow.
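A guard along these lines must fail when the shadow still records blocking back-references or tentative blockades; a minimal sketch with illustrative types, not the real schema-mixin handler:

```go
import "errors"

// Illustrative shadow state inspected before a deletion may proceed.
type shadowBlockadeState struct {
	// Back-reference sources whose references would block deletion
	// (as opposed to those handled by cascade deletion or unset).
	blockingBackRefSources []string
	// Tentative blockades created by in-flight EstablishReferences calls.
	tentativeBlockades int
}

var errResourceBlocked = errors.New("resource is blocked and cannot be deleted")

// checkIfResourceIsBlocked mimics what the real action has to verify.
func checkIfResourceIsBlocked(s shadowBlockadeState) error {
	if len(s.blockingBackRefSources) > 0 || s.tentativeBlockades > 0 {
		return errResourceBlocked
	}
	return nil
}
```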
We also described Meta owner flows with three cases.
When a Meta Ownee Deployment tries to confirm the meta owner, it must use
the ConfirmMetaOwner call to the Meta Owner Deployment instance. If all is
fine, we get a successful response. If there is a version mismatch,
the Meta Ownee Deployment will send an UpgradeMetaOwnerVersion request
to itself (its API Server), so that the meta owner reference finally
reaches the desired state. If ConfirmMetaOwner discovers that the
Meta Owner does not confirm ownership, the Meta Ownee Deployment
should use the RemoveMetaOwnerReference call.
When it is the Meta Owner Deployment that needs to initiate actions
(cases two and three), it needs to use ListMetaOwnees to get the
meta ownees. When relevant, it will call UpgradeMetaOwnerVersion
or RemoveMetaOwnerReference, depending on why we are iterating
the meta ownees.
In the asynchronous deletions handling we described, the
most important schema-mixin API action is WatchImportedServiceDeletions.
This is a real-time watch subscription with versioning support. For
example, if we have Services S1 and S2 importing secrets.edgelq.com
in versions v1alpha2 and v1, then when some Secret is deleted, a separate
WatchImportedServiceDeletionsResponse is sent to the S1 and S2 Deployments,
each containing the shadow ID of the Secret in the version that the
Service desires (the name pattern contains the region in v1 only).
In the deletion flow, we also use CheckIfHasMetaOwnee
and CheckIfResourceHasDeletionSubscriber. These methods are generally
used when waiting for back-references to be deleted.
Since the schema-mixin Server is mixed into the proper service, it means
we can also access the original resources from the Store interface! All in
all, the schema mixin is a powerful utility for Goten-as-a-protocol cases.
We still need CRUD on ResourceShadows, because:
- The Update, Delete, and Watch functions are used within the Deployment
itself (where we know all runtimes use the same version).
- Debugging: developers can use read requests when some bug
needs investigation.
3 - Constraint Store
Understanding the constraint store.
As was said, the Store is a series of middlewares, like the Server, but
the base document in the Contributor guide only showed the core and
cache layers. An additional layer is Constraints; you can see it in
the Goten repo, runtime/store/constraints/constraint_store.go.
It focuses mostly on decorating the Save/Delete methods. When saving,
it grabs the current ResourceShadow instance for the saved resource.
Then it ensures the references are up to date. Note that it calls
the processUpdate function, which repopulates the shadow instance.
For each new reference that was not there before, it will need to
connect with the relevant Deployment and confirm the relationship.
All new references are grouped into Service & Region buckets. For
each foreign Service or Region, it sends an EstablishReferences call.
It needs to consider versioning too, because shadow names may change.
Note that we have a “Lifecycle” object, where we store flags
indicating whether asynchronous tasks are pending on the resource. The
PENDING state shows that there are some asynchronous tasks left to execute.
The EstablishReferences method is not called for local references.
Instead, at the end of a transaction, preCommitExec is called
to connect with local resources within that single transaction. This is
the most optimal option, and the only one possible: imagine that in
a single transaction we create resources A and B, where A has a
reference to B. If we used EstablishReferences, it would
fail, because B does not exist yet. By skipping this call for
local resources, we fix this problem.
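The grouping just described can be sketched as follows; the types are simplified stand-ins, not the actual constraints-layer code:

```go
// Simplified stand-ins for the real Goten structures.
type deploymentKey struct {
	Service string // e.g. "meta.goten.com"
	Region  string // e.g. "us-west2"
}

type resourceRef struct {
	Owner deploymentKey
	Name  string
}

// groupNewReferences buckets references absent before this Save. Foreign
// buckets each get an EstablishReferences call DURING the transaction;
// local references are deferred to preCommitExec, so that resources
// created within the same transaction can reference each other.
func groupNewReferences(local deploymentKey, refs []resourceRef) (
	foreign map[deploymentKey][]string, localRefs []string) {
	foreign = map[deploymentKey][]string{}
	for _, r := range refs {
		if r.Owner == local {
			localRefs = append(localRefs, r.Name)
			continue
		}
		foreign[r.Owner] = append(foreign[r.Owner], r.Name)
	}
	return foreign, localRefs
}
```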
When deleting, the Constraint store layer uses processDeletion,
where we need to check that the resource is not blocked. We may also
need to iterate over the other back-reference sources (foreign Deployments).
When we do, we must account for versioning, because other Deployments
may use a lower version of our API, resulting in different resource
shadow names.
For deletion, we may also trigger synchronous cascade deletions
(or unsets).
Also, note that there is something additional about deletions: they
may delete the actual resource instance (unless we have a case like
the async deletion annotation), but they won’t delete the ResourceShadow
instance. Instead, they set the deletion time and put the Lifecycle into
the DELETING state. This is a special signal that will be distributed
to all Deployments that have resources with references pointing at the
deleted resource. This is how they execute any cascade deletions
(or unsets). Only when the back-references are cleared can the
ResourceShadow be finally deleted.
This is the last layer in Store objects, along with cache and core.
Now you should see in full how the Store actually works, what
it does, and what it interacts with (the actual database, the local cache,
AND other Deployments). Using the schema-mixin API, it achieves
a “global” database across services, regions, and versions.
4 - Database Constraint Controller
Understanding the database constraint controller.
Each db-controller instance consists mainly of two Node manager
modules. One is the DbConstraint Controller. Its tasks include the
execution of all asynchronous tasks related to the local database
(Deployment). There are three groups of tasks:
- Handling of owned (by Deployment) resources in PENDING state (Lifecycle)
- Handling of owned (by Deployment) resources in DELETING state (Lifecycle)
- Handling of all subscribed (from current and each foreign Deployment)
resources in the DELETING state (Lifecycle)
The module is found in the Goten repository, module
runtime/db_constraint_ctrl. As with any other controller, it uses
a Node Manager instance. This Node Manager, apart from running Nodes,
must also keep a map of interesting Deployments! What does that mean?
We know that iam.edgelq.com imports meta.goten.com. Suppose
we have regions us-west2 and eastus2. In that case, the Deployment
of iam.edgelq.com in the us-west2 region will need to remember
four Deployment instances:
- meta.goten.com in us-west2
- meta.goten.com in eastus2
- iam.edgelq.com in us-west2
- iam.edgelq.com in eastus2
This map is useful for the 3rd task group: handling of subscribed resources
in the deleting state. As IAM imports meta and no other service, and
because IAM resources can reference each other, we can deduce the following:
resources of iam.edgelq.com in region us-west2 can only reference
resources from meta.goten.com and iam.edgelq.com, and only from
regions us-west2 and eastus2. If we need to handle the cascade deletions
(or unsets), then we need to watch these Deployments. See the file
node_manager.go in db_constraint_ctrl; we are utilizing EnvRegistry
to get dynamic updates about interesting Deployments. In the function
createAndRunInnerMgr we use the ServiceDescriptor instance to get
information about the Services we import; this is how we know which
Deployments we need to watch.
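In other words, the watched set is roughly the cross product of the imported Services (plus the current one) and the regions the current Service is deployed in. A minimal sketch under that assumption (simplified types, not the actual node_manager.go code):

```go
type deployment struct {
	Service string
	Region  string
}

// watchedDeployments sketches the deduction from the iam.edgelq.com
// example: the current Service plus every Service it imports, in every
// region the current Service runs in.
func watchedDeployments(current string, imported, regions []string) []deployment {
	services := append([]string{current}, imported...)
	var out []deployment
	for _, svc := range services {
		for _, region := range regions {
			out = append(out, deployment{Service: svc, Region: region})
		}
	}
	return out
}
```

Calling watchedDeployments("iam.edgelq.com", []string{"meta.goten.com"}, []string{"us-west2", "eastus2"}) yields exactly the four Deployments listed above.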
As you can see, we utilize EnvRegistry to initiate DbConstraintCtrl
correctly in the first place, and then to maintain it. We also handle
version switches: if one happens, we stop the current inner node manager
and deploy a new one.
When we watch other deployments, we are interested only in schema
references, not meta ones. Meta references are more difficult to predict,
because services don’t need to import each other. For this reason,
responsibility for managing meta owner references is split between the
Deployments on both sides, Meta Owner and Meta Ownee, as described
by the flows.
The most important files in the runtime/db_constraint_ctrl/node
directory are:
- owned_deleting_handler.go
- owned_pending_handler.go
- subscribed_deleting_handler.go
Those files handle all the asynchronous tasks described by many
of the flows: establishing references to other resources
(confirming/removing expired tentative blockades), managing meta
owner references, and cascade deletions or unsets. I tried
to document the steps they take and why, so refer to the code for more
information.
Other notable elements in this module:
- For subscribed deleting resource shadows, we have a wrapped watcher,
which uses a different method than the standard WatchResourceShadows.
The reason is that other Deployments may vary in the API versions
they support. We use the dedicated schema-mixin API method,
WatchImportedServiceDeletions.
- Subscribed deleting resource shadow events are sent to a common
channel (in the controller_node.go file), but they are still grouped
per Deployment (along with tasks).
Note that this module is also responsible for upgrading meta owner
references after a Deployment upgrades its current version field! This is
an asynchronous process, executed by owned_pending_handler.go,
function executeCheckMetaOwnees.
5 - Database Syncer Controller
Understanding the database syncer controller.
Another big db-controller module is the DbSyncer Controller. In the
Goten repository, see the runtime/db_syncing_ctrl module. It is
responsible for:
- Maintaining the metadata.syncing field when the corresponding
MultiRegionPolicy changes.
- Syncing resources from other Deployments in the same Service
into the current local database (read copies).
- Syncing resources from other Deployments and the current Deployment
into the Search storage.
- Upgrading the local Deployment’s database.
It mixes multi-version and multi-region features, but the reason is
that we share many common structures and patterns regarding
db-syncing here. Version syncing is still copying from one database to
another, even if it is a bit special, since we need to “modify”
the resources we are copying.
This module is interested in dynamic Deployment updates, but only for
the current Service. See the node_manager.go file. We utilize EnvRegistry
to get the current setup. Normally we initiate the inner node manager
when we get a SyncEvent, but then we support dynamic updates via
DeploymentSetEvent and DeploymentRemovedEvent. We just need to verify
that the Deployment belongs to our Service. If it does, it means something
changed there and we should refresh. Perhaps we could extract the “previous”
state, but it is fine to make a NOOP refresh too. Anyway, we need to ensure
that the Node is aware of all foreign Deployments, because those are
potential candidates to sync from.
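A rough sketch of this event handling; the event kinds and manager methods are illustrative stand-ins, not the actual node_manager.go API:

```go
type envEventKind int

const (
	syncEvent envEventKind = iota
	deploymentSetEvent
	deploymentRemovedEvent
)

type envEvent struct {
	kind              envEventKind
	deploymentService string // Service owning the affected Deployment
}

type syncerNodeManager struct {
	currentService string
}

func (m *syncerNodeManager) onEnvRegistryEvent(e envEvent) {
	switch e.kind {
	case syncEvent:
		// First full setup: initiate the inner node manager.
		m.startInnerNodeManager()
	case deploymentSetEvent, deploymentRemovedEvent:
		if e.deploymentService != m.currentService {
			return // only Deployments of our own Service matter here
		}
		// Something changed in our Service's Deployment set; a NOOP
		// refresh is acceptable, so just rebuild the candidate list.
		m.refreshForeignDeployments()
	}
}

func (m *syncerNodeManager) startInnerNodeManager()     { /* omitted */ }
func (m *syncerNodeManager) refreshForeignDeployments() { /* omitted */ }
```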
Now let’s dive into a single Node instance. DbSyncingCtrl can be quite
complex, even though all it does is copy resource instances across
databases. First, check the ControllerNode struct in the
controller_node.go file, which represents a single Node responsible
for copying data. Breaking it down:
- It may have two instances of VersionedStorage: one for the older and one
for the newer API. Generally, we support only the last two versions for
DbSyncer; more should not be needed, and it would make the already
complex structure even more difficult. This is necessary for database
upgrades.
- We have two instances of syncingMetaSet, one per versioned storage.
Those contain SyncingMeta objects per multi-region policy-holder and
resource type pair. An instance of syncingMetaSet is used by
localDataSyncingNode instances. To be honest, if ControllerNode had
just one localDataSyncingNode object, not many, then syncingMetaSet
would be part of it!
- We then have the rangedLocalDataNodes and rangedRemoteDataNodes maps.
Now, the localDataSyncingNode object is responsible for:
- Maintaining metadata.syncing; it must use the passed syncingMetaSet
instance for real-time updates.
- Syncing local resources to the Search storage.
- Upgrading the local database.
Then, remoteDataSyncingNode is responsible for:
- Syncing resources from other Deployments in the same Service into
the current local database (read copies).
- Syncing resources from other Deployments into the Search storage.
For each foreign Deployment, we will have a separate remoteDataSyncingNode
instance.
It is worth asking why we have maps of syncing nodes
(local and remote) per shard range. The reason is that we split them
to have at most ten shards each; often we may still end up with maps of
a single sub-shard range. Why ten? Because in Firestore, which is a
supported database, we can pass a maximum of ten shard numbers in a
single request (filter)! Therefore, we need to make separate watch
queries, and it’s easier to separate the nodes then. Now we can guarantee
that a single local/remote node will be able to send a query successfully
to the backend. However, because of this split, we needed to move
syncingMetaSet away from localDataSyncingNode and put it directly
in ControllerNode.
Since syncingMetaSet is separated, let’s describe what it does
first. Basically, it observes all multi-region policy-holders a Service
uses and computes SyncingMeta objects per policy-holder/resource type pair.
For example, Service iam.edgelq.com has resources belonging to Service,
Organization, and Project, so it watches these three resource types. Service
devices.edgelq.com only uses Project, so it watches Project instances,
and so on. It uses the ServiceDescriptor passed in the constructor to
detect all policy-holders.
When syncingMetaSet runs, it collects a first snapshot of all SyncingMeta
instances and then maintains it. It sends events to subscribers in real time
(see ConnectSyncingMetaUpdatesListener). This module is not responsible
for updating the metadata.syncing field yet, but it is an important
first step. It triggers localDataSyncingNode when a new
SyncingMeta is detected, so that it can run its updates.
The next important module is the resVersionsSet object, defined in the
file res_versions_set.go. It is a central component in both local and
remote nodes, so it is worth explaining how it works.
This set contains all resource names with their versions, in a tree
structure. By version, I don’t mean the API version of the resource, but
the literal resource version; we have a field in metadata for that,
metadata.resource_version. This value is a string, but it can contain
only an integer, which increments with every update. This is the basis for
comparing resources across databases. How do we know that? Well, if we
have the “main” database owning a resource, we know that it contains the
newest version; the field metadata.resource_version is the highest
there. However, we have other databases… for example, the search database
may be separate, like Algolia. In that case, metadata.resource_version
may be lower there. We also have a syncing database (for example, across
regions): the database in another region, which gets just read-only copies,
can also at best match the origin database. resVersionsSet has the
following important functions:
- SetSourceDbRes and DelSourceDbRes are called by the original database
owning the resource.
- SetSearchRes and DelSearchRes are called by the search database.
- SetSyncDbRes and DelSyncDbRes are called by the syncing database
(for example, cross-region syncing).
- CollectMatchingResources collects all resource names matched by a
prefix. This is used by metadata.syncing updates: when a policy-holder
resource updates its MultiRegionPolicy, we need to collect
all resources subject to it!
- CheckSourceDbSize is necessary for Firestore, which is known to be
able to “lose” some deletions. If the size is incorrect, we need
to reset the source DB (original) and provide a snapshot.
- SetSourceDbSyncFlag is used by the original DB to signal that it
has supplied all updates to resVersionsSet and now continues with
real-time updates only.
- Run: resVersionsSet is used in a multi-threaded environment, so it runs
on a separate goroutine and uses Go channels for synchronization, with
callbacks where necessary.
resVersionsSet also supports listeners where necessary: it triggers when
the source DB updates/deletes a resource, or when the syncing database
reaches equivalence with the original database. We don’t provide similar
signals for the search DB, simply because we don’t need them… but we do
for the syncing DB. We will explain why later.
Now let’s talk about local and remote nodes, starting with local.
See the local_data_syncing_node.go file, which constructs all the modules
responsible for the mentioned tasks. First, analyze the
newShardRangedLocalDataSyncingNode constructor up to the
if needsVersioning condition, where we create the modules for database
versioning. Before this condition, we create the modules for Search
DB syncing and metadata.syncing maintenance. Note how we use
the activeVsResVSet object (of type resVersionsSet): we connect it to
the search syncer and syncing meta updater modules. For
each resource type, we create an instance of the source db watcher,
which gets access to the resource versions set. It should be clear now:
the source DB, which is our local Deployment’s, keeps updating
activeVsResVSet, which in turn passes updates to activeVsSS and
activeVsMU. We also connect activeVsMU to activeVsSyncMS,
so we have the two signal sources necessary for maintaining the
metadata.syncing object.
So, you should know now that:
- search_syncer.go is used to synchronize the Search database, for local
resources in this case.
- syncing_meta_updater.go is used to synchronize the metadata.syncing
field for all local resources.
- base_syncer.go is the common implementation underlying search_syncer.go,
but is not limited to it.
Let’s dive deeper and explain the synchronization protocol between
source and destination. Maybe you noticed: why does sourceDbWatcher
contain two watchers, one for live data and one for a snapshot? Also,
why is there a wait before running the snapshot? Did you see that in the
OnInitialized function of localDataSyncingNode, we run
a snapshot only when we have received a sync signal? There are reasons
for all of that. Let’s discuss the design here.
When the DbSyncingCtrl node instance is initiated for the first time,
or when the shard range changes, we need to re-download all resources
from the current or foreign database, compare them with the synced
database, and execute the necessary creations, updates, and deletions.
Moreover, we need to ask for a snapshot of the data on the destination
database. This may take time, we don’t know how much, but downloading
potentially millions of items may not be the fastest operation. It means
that whenever there are changes in nodes (upscaling, downscaling, reboots,
whatever), we would need to suspend database syncing, perhaps for a
minute, maybe longer; is there even an upper limit? If we don’t sync fast,
this lag becomes quite visible to users. It is better if we start separate
watchers for live data directly. Then we will be syncing from the live
database to the destination (like the search db), providing almost immediate
sync most of the time. In the meantime, we collect a snapshot of the data
from the destination database. See the base_syncer.go file, and see
function synchronizeInitialData. When we are done with initialization,
we trigger a signal that notifies the relevant instance
(local or remote syncing node). In the file local_data_syncing_node.go,
function OnInitialized, we check if all components are ready, and
then we run RunOrResetSnapshot for our source db watchers. This is when
the full snapshot is done, and if there were any “missing” updates
during the handover, we execute them. Ideally, there won’t be any;
the live watcher goes back by one minute when it starts watching, so
some updates may even be repeated! But it’s still necessary to provide
these guarantees, of course. I hope this explains the protocol:
- Live data is immediately copied from the source to the
destination database…
- In the meantime, we collect a snapshot of the destination database…
- And when that snapshot is collected, we start the snapshot from
the source database…
- We execute anything missing and continue with live data only.
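A condensed sketch of that handover; the types here are hypothetical stand-ins for the live QueryWatcher stream and the snapshot routines:

```go
type resourceUpdate struct {
	Name    string
	Version int64
	Deleted bool
}

// runWithHandover copies live updates immediately, while the destination
// snapshot is being collected; only then is the source snapshot replayed
// against it, executing any creations, updates, or deletions the live
// stream may have missed.
func runWithHandover(
	live <-chan resourceUpdate,
	applyToDest func(resourceUpdate),
	collectDestSnapshot func() map[string]int64,
	replaySourceSnapshot func(destVersions map[string]int64),
) {
	go func() {
		for u := range live { // live syncing starts right away
			applyToDest(u)
		}
	}()
	dest := collectDestSnapshot() // may take a while for large datasets
	replaySourceSnapshot(dest)    // reconcile; live updates continue after
}
```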
Another reason why we have this design, and why we use QueryWatcher
instances (and not Watchers), is simple: RAM. DbSyncingCtrl needs to
watch practically all database updates and needs the full resource
bodies. Note that we are also using access.QueryWatcher instances in
sourceDbWatcher. QueryWatcher is a lower-level object compared to
the plain Watcher: it can’t support multiple queries, and it
does not handle resets or snapshot size checks (Firestore only).
This is also a reason why in ControllerNode we have a map of
localDataSyncingNode instances per shard range… A Watcher would
be able to split queries and hide this complexity. But QueryWatcher
has benefits:
- It does not store watched resources in its internal memory!
Imagine millions of resources whose whole resource bodies are kept
by a Watcher instance in RAM. That goes in the wrong direction;
DbSyncingCtrl is supposed to be slim. In resVersionsSet we only
keep version numbers and resource names, in tree form. We also try to
compress all syncer modules into one place, so syncingMetaUpdater
and searchUpdater live together. If there is some update, we don’t
need to split it further and increase pressure on the infrastructure.
This concludes the local data syncing node discussion in terms of
MultiRegion replication and Search db syncing for LOCAL nodes.
We will describe remote data syncing nodes later in this doc. However,
let’s continue with the local data syncing node and talk about its other
task: database upgrades.
The localDataSyncingNode object now actually needs to consider four
databases (at maximum):
- The local database for the currently active API version (1)
- The local database for the API version we sync to (2)
- The local Search database for the currently active API version (3)
- The local Search database for the API version we sync to (4)
Let’s introduce the terms Active database and Syncing database. When
we are upgrading to a new API version, the Active database contains
old data and the Syncing database contains new data. When we are
synchronizing in the other direction, for rollback purposes (just in
case?), the Active database contains new data and the Syncing database
old data.
And extra syncingMetaUpdaters:
- syncingMetaUpdater for the currently active version (5)
- syncingMetaUpdater for the synced version (6)
We need sync connections:
- Point 1 to Point 2 (the most important one for the database upgrade)
- Point 1 to Point 3
- Point 2 to Point 4
- Point 1 to Point 5 (plus an extra signal input from the active
syncingMetaSet instance)
- Point 2 to Point 6 (plus an extra signal input from the syncing
syncingMetaSet instance)
This is insane and probably needs careful code writing, which is sometimes
lacking here. We will need to carefully add some tests and try to
put some extra polish on the code, but the deadline was the deadline.
Go back to the function newShardRangedLocalDataSyncingNode in
local_data_syncing_node.go, and see the line with if needsVersioning
and below. This constructs the extra elements. First, note that we create
a syncingVsResVSet object, another resVersionsSet. This set
is responsible for syncing between the syncing database and
the search store. It is also used to keep signaling the syncing
version to syncingMetaUpdater. But I see now this was a mistake,
because we don’t need this element: it is enough for
the Active database to keep running its syncingMetaUpdater. We
know that those updates will be reflected in the syncing database,
because we already sync in this direction! We will, however, need
to keep the second, additional Search database syncing: when we
finish upgrading the database to the new version, we don’t want to start
with an empty search store! This would not go unnoticed.
Therefore, we have this database search syncing for the “Syncing database”
too.
But let’s focus on the most important bit: the actual database upgrade,
from the Active to the Syncing local main storage. Find the function called
newResourceVerioningSyncer, and see where it is called. It receives
access to the syncing database, and it gets access to the
node.activeVsResVSet object, which contains resources from
the active database. The object responsible for upgrading
resources is resourceVersioningSyncer, in the file
resource_versioning_syncer.go. It works like the other “syncers” and
inherits from the base syncer, but it also needs to transform resources.
It uses transformers from the versioning packages. When it uses
resVersionsSet, it calls SetSyncDbRes and DelSyncDbRes,
to compare with the original database. We can safely require that
metadata.resource_version stays the same between the old and new
resource instances; the transformation cannot change it. Because syncDb
and searchDb are different, we are fine with having the search syncer and
the versioning syncer use the same resource versions set.
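That invariant can be expressed as a simple guard; a minimal sketch, where the transformer stands in for the generated versioning transformers:

```go
import "fmt"

type versionedResource struct {
	Name            string
	ResourceVersion string // metadata.resource_version, unchanged by upgrades
}

// upgradeResource applies an API-version transformation and enforces that
// metadata.resource_version survives it, since that field is the basis
// for comparing the active and syncing databases.
func upgradeResource(
	old versionedResource,
	transform func(versionedResource) versionedResource,
) (versionedResource, error) {
	upgraded := transform(old)
	if upgraded.ResourceVersion != old.ResourceVersion {
		return versionedResource{}, fmt.Errorf(
			"transformer changed resource_version of %s", old.Name)
	}
	return upgraded, nil
}
```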
The resourceVersioningSyncer object also makes extra ResourceShadow
upgrades: transformed resources MAY have different references after
the changes, so we need to refresh them! This makes this syncer
even more special.
However, we have a little issue with ResourceShadow instances: they don’t
have a metadata.syncing field, and they are only partially covered by
resourceVersioningSyncer, which does not populate some fields, like
back-reference sources. As this is special, we need shadowsSyncer, defined
in the file shadows_versioning_syncer.go. It also synchronizes
ResourceShadow instances, but only the fields that cannot be populated
by resourceVersioningSyncer.
During database version syncing, localDataSyncingNode receives signals
(per resource type) when there is a synchronization event between
the source database and the syncing database. See the
ConnectSyncReadyListener method in resVersionsSet: this is how
the syncDb (here, the syncing database!) notifies when there is a match
between the two databases. This is used by localDataSyncingNode to
coordinate Deployment version switches. See the function
runDbVersionSwitcher for the full procedure. This is basically the place
where a Deployment can switch from one version to another. When this
happens, all backend services will flip their instances.
This is all about local data syncing nodes. Let us switch to remote
nodes: a remote node (object remoteDataSyncingNode, file
remote_data_syncing_node.go) syncs between the local database
and a foreign regional one. It is simpler than the local one, at least.
It synchronizes:
- From the remote database to the local database
- From the remote database to the local search database
If there are two API versions, it is assumed that both regions may be
upgrading. Then we have two extra syncs:
- From the remote database in the other version to the local database
- From the remote database in the other version to the local search database
When we upgrade, it is required to deploy new images in the
first region, then the second, third, and so on, until the last region
gets the new images. However, we must not switch the version of any region
until all regions have the new images. While switching and deploying can be
done one by one, those stages need separation; this is required for
these nodes to work correctly. Also, if we switched the Deployment version
in one region before upgrading images in the other regions, there would be
a high chance that users would use the new API and see significant gaps in
resources. Therefore, the versioning upgrade needs to be considered in
multi-region setups too.
Again, we may be operating on four local databases and two remote APIs in
total, but at least this is symmetric. Remote syncing nodes also don’t
deal with mixins, so there is no ResourceShadow cross-db syncing. If you
study newShardRangedRemoteDataSyncingNode, you can see that it uses
searchSyncer and dbSyncer (db_syncer.go).