Goten Protocol Flows
Understanding the Goten protocol flows.
Design decisions include:
- services are isolated, but they can use/import services on lower
levels only, and they can support only a subset of regions available
from these used/imported services.
- deployments within the Service must be isolated in the context of
versioning. Therefore, they don’t need to point to the same primary
API version and each Service version may import different services
in different versions.
- references may point across services only if the Service imports
another service. References across regions are fine; it is assumed
that regions of the same Service trust each other, at least for now.
- all references must carry region, version, and service information
to maintain a fully global environment.
- We have schema references and meta owner references. Schema references
name a region explicitly, with the service and version implied by the
field's context. Meta references have separate fields for region,
service, and version.
- Schema references may be blocking, or may trigger cascade deletion or
cascade unset when the referenced resource is deleted.
- Meta references must trigger cascade deletion if all owners disappear.
- Each Deployment (a Service + Region pair) is responsible for maintaining
the metadata.syncing fields of resources it owns.
- Each Deployment is responsible for catching up with read copies from
other regions available to it.
- Each Deployment is responsible for local database schema and upgrades.
- Each Deployment is responsible for Meta owner references in all service
regions if they point to the Deployment (via Kind and Region fields!).
- Every time cross-region/service references are established, the other
side may reject this relationship.
We have several components in API servers and db controllers for maintaining
order in this graph. Points one to three are enforced by the Meta service and
EnvRegistry components. EnvRegistry uses generated descriptors
from the Goten specification to populate the Meta service. If someone is
“cheating”, then point twelve applies: the other side may reject the relationship.
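To make the distinction between schema references and meta owner references more concrete, here is a minimal Go sketch. All type and field names are illustrative assumptions, not the actual Goten API:

```go
package main

import "fmt"

// SchemaRef is an illustrative shape of a schema reference: the target
// service and version are implied by the referencing field's type, so the
// reference itself only needs to carry the region and the resource name.
type SchemaRef struct {
	Region string // e.g. "us-west2"
	Name   string // resource name within the target service
}

// MetaOwnerRef is an illustrative shape of a meta owner reference: it lives
// in metadata.owner_references and spells out service, region, and version
// explicitly, because nothing else in the schema implies them.
type MetaOwnerRef struct {
	Service string // e.g. "iam.edgelq.com"
	Region  string
	Version string // API version of the owning Deployment, e.g. "v1"
	Kind    string
	Name    string
}

func main() {
	ref := MetaOwnerRef{Service: "iam.edgelq.com", Region: "us-west2", Version: "v1", Kind: "Project", Name: "my-project"}
	fmt.Printf("meta owner ref: %+v\n", ref)
}
```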
1 - API Server Flow
Understanding the API server flow.
To enforce general schema consistency, we must first properly handle
requests coming from users, especially write requests.
The following rules apply when an API server gets a write call:
- when a write request is sent to the server, the multi-region routing
middleware must inspect the request and ensure that all resources
that will be written to (or deleted) are owned by the current
region. It must store the MultiRegionPolicy object in the context
associated with the current call.
- write requests can only execute updates for resources under a single
multi-region policy! It means that writing across, say, two projects
is not allowed. Write operations on global resources are allowed,
though. If there is an attempt to write to resources across different
policyholders in a single transaction, the Store object must reject
the write.
- Store object must populate the metadata.syncing field when saving.
It should use the MultiRegionPolicy from the context.
- When the server calls the Save or Delete function on the store
interface (for any Service resource), the following things happen
(see the sketch after this list):
- If this is a creation or update, and the new resource has schema
references that were not there before, then the Store is responsible
for connecting to those Services and ensuring that the referenced
resources exist, that the relationship is established, and that
establishing such references is allowed in general. For references
to local resources, it performs the same checks locally.
- If this is a deletion, the Store is obliged to check whether there
are any blocking back-references. It needs to connect with the
Deployments where references may exist, including itself. Local
synchronous cascade deletions and unsets must be executed as part
of the request.
- When a Deployment connects with others, it must respect the API
versions they use.
- Meta owner references are not checked, because it is assumed they may
be created later. They are asynchronously checked by the system after
the request is completed.
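Below is a minimal Go sketch of the write path described by the rules above. The Store, MultiRegionPolicy, and helper names are hypothetical stand-ins, not the real Goten interfaces:

```go
package store

import (
	"context"
	"fmt"
)

// Resource and MultiRegionPolicy are simplified stand-ins for illustration.
type Resource struct {
	Name       string
	OwnRegion  string
	SchemaRefs []string // names of referenced resources
	Syncing    []string // regions the resource must be synced to
}

type MultiRegionPolicy struct {
	EnabledRegions []string
}

type policyKey struct{}

// WithPolicy mimics the multi-region routing middleware: it stores the
// MultiRegionPolicy governing the written resources in the call context.
func WithPolicy(ctx context.Context, p *MultiRegionPolicy) context.Context {
	return context.WithValue(ctx, policyKey{}, p)
}

// Save sketches the rules: reject writes owned by another region, populate
// metadata.syncing from the policy, and verify new schema references.
func Save(ctx context.Context, res *Resource, myRegion string) error {
	p, _ := ctx.Value(policyKey{}).(*MultiRegionPolicy)
	if p == nil {
		return fmt.Errorf("no multi-region policy in context")
	}
	if res.OwnRegion != myRegion {
		return fmt.Errorf("resource %s is owned by %s, not writable here", res.Name, res.OwnRegion)
	}
	// Populate metadata.syncing from the policyholder.
	res.Syncing = append([]string(nil), p.EnabledRegions...)
	// For each new schema reference, the real Store would contact the target
	// Deployment and ask it to confirm (and tentatively block) the target.
	for _, ref := range res.SchemaRefs {
		if err := confirmReference(ctx, ref); err != nil {
			return fmt.Errorf("cannot establish reference to %s: %w", ref, err)
		}
	}
	return nil // persist in the local database here
}

func confirmReference(ctx context.Context, ref string) error {
	// Placeholder for the cross-Deployment check described in the text.
	return nil
}
```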
This is the designed flow for API Servers, but we have a couple more flows
regarding schema consistency. First, let's define a corner case for
blocking references across regions/services. Scenario:
- Deployment D1 gets a write (Creation) to resource R1. Establishes
SNAPSHOT transaction.
- R1 references (blocking) R2 in Deployment D2, therefore, on the Save
call, D1 must ensure everything is valid.
- Deployment D1 sends a request to D2 to establish a blocking reference
to R2 on behalf of R1. D2 can see that R2 exists.
- D2 blocks resource R2 in its SNAPSHOT transaction. Then sends a signal
to D1 that all is good.
Two things can happen:
- D1 may fail to save R1 because its local transaction fails. Resource R2
may be left with a dangling blockade.
- There is a small chance that, after the successful blockade on R2, D2
gets a request to delete R2 while R1 still does not exist, because D1
has not finished its transaction yet. If D2 asks D1 about R1, D1 will
say nothing exists; R2 will be deleted, but then R1 may appear.
Therefore, when D2 blocks resource R2, it places a special tentative
blockade with a timeout of about 5 minutes. This is far more than enough,
since transactions are configured to time out after one minute. It means
R2 cannot be deleted during this period.
The protocol then continues:
- If D1 fails its transaction, D2 is responsible for asynchronously
removing the tentative blockade from R2.
- If D1 succeeds with its transaction, D1 is responsible for asynchronously
informing D2 that the tentative blockade on R2 is confirmed.
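A minimal sketch of the tentative blockade on the D2 side, under assumed type names and a hard-coded 5-minute timeout; this is not the actual Goten implementation:

```go
package blockade

import (
	"sync"
	"time"
)

// tentativeBlockade records that a remote resource (R1) intends to hold a
// blocking reference to a local resource (R2), before R1's transaction has
// committed. Until confirmed or expired, R2 cannot be deleted.
type tentativeBlockade struct {
	source    string // name of the referencing resource, e.g. R1
	expiresAt time.Time
}

type Guard struct {
	mu        sync.Mutex
	blockades map[string][]tentativeBlockade // keyed by local resource name
}

func NewGuard() *Guard { return &Guard{blockades: map[string][]tentativeBlockade{}} }

// Block is called when D1 asks to establish a blocking reference to target.
// The blockade expires automatically if never confirmed (D1's txn failed).
func (g *Guard) Block(target, source string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.blockades[target] = append(g.blockades[target], tentativeBlockade{
		source:    source,
		expiresAt: time.Now().Add(5 * time.Minute),
	})
}

// CanDelete reports whether target has no active (unexpired) blockades.
func (g *Guard) CanDelete(target string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	now := time.Now()
	for _, b := range g.blockades[target] {
		if b.expiresAt.After(now) {
			return false
		}
	}
	return true
}
```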
2 - Meta Owner Flow
Understanding the meta owner flow.
Let's define some terminology:
- Meta Owner: the resource that is pointed to by a meta owner reference.
- Meta Ownee: the resource that points to another resource via the
metadata.owner_references field.
- Meta Owner Deployment: the Deployment to which the Meta Owner belongs.
- Meta Ownee Deployment: the Deployment to which the Meta Ownee belongs.
- Meta Owner Reference: an item in the metadata.owner_references array field.
We have three known cases where action is required:
- API Server calls the Save method of Store, and the saved resource
has non-empty meta owner references. The API Server must schedule
asynchronous tasks to be executed after the resource is saved locally
(we trust that the meta owner references are valid). Then, asynchronously
(see the sketch at the end of this section):
- the Deployment owning the meta ownee resource must periodically
check whether the meta owners exist in the target Deployments.
- if, after some timeout, it is detected that a meta owner reference
is not valid, it must be removed. If this empties the whole meta
owner references array, the resource itself must be deleted.
- if a meta owner reference is valid, the Deployment with the meta
ownee resource is responsible for sending a notification to the
Deployment with the meta owner resource. If the reference is valid,
this notification will succeed.
- if the Deployment with the meta ownee detects during validation that
the version of a meta owner reference is too old, it must upgrade it.
Note that in this flow the Deployment with the meta ownee resource
is the actor initiating the action; it must ask the Deployments
with the meta owners whether its meta ownee is valid.
- API Server calls the Save method of Store, and the saved
resource is known to be the meta owner of some resources
in various Deployments. In this case, the Meta Owner Deployment
is responsible for the actions, asynchronously:
- it must iterate over the Deployments where meta ownees may be,
and verify whether they are affected by the latest save. If not,
no action is needed. Why might meta ownees be affected? Let's
list the reasons below:
- sometimes a meta owner reference has a flag indicating that
the meta owner must have a schema reference to the meta ownee
resource. If this is the case, and we see that the meta owner
lost its reference to a meta ownee, the meta ownee must be forced
to clean up its meta owner references. This may trigger its deletion.
- If there was a Meta Owner Deployment version upgrade, this
Deployment is responsible for updating all meta ownee resources:
meta ownees must have meta owner references using the current
version of the target Deployment.
- API Server calls the Delete method of Store, and the deleted resource is
known to be the meta owner of some resources in various Deployments.
The Deployment owning the deleted meta owner resource is responsible for
the following asynchronous actions:
- It must iterate over the Deployments where meta ownees may exist,
and list them.
- For each meta ownee, the Meta Owner Deployment must notify the
Meta Ownee Deployment about the deletion.
- The API Server of the meta ownee Deployment is responsible for removing
the meta owner reference from the array. This may trigger the deletion
of the meta ownee if no meta owner references remain.
Note that all flows are pretty much asynchronous, but they still ensure
the consistency of meta owner references. In some cases it is the meta
owner Deployment reaching out, in others the other way around; it
depends on which resource was updated last.
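Below is a minimal Go sketch of the first case, where the meta ownee Deployment periodically validates its meta owner references. The types, function names, and grace-period handling are assumptions for illustration only:

```go
package metaownee

import "time"

// OwnerRef is a simplified meta owner reference for illustration.
type OwnerRef struct {
	Service string
	Region  string
	Version string
	Kind    string
	Name    string
}

// Ownee is a resource with a metadata.owner_references-like array.
type Ownee struct {
	Name      string
	OwnerRefs []OwnerRef
	CreatedAt time.Time
}

// ownerExists would ask the Deployment owning the referenced resource
// whether it exists; here it is a placeholder.
type ownerExists func(ref OwnerRef) bool

// validateOwnerRefs drops invalid references after a grace period and
// reports whether the ownee itself should be deleted (no owners left).
func validateOwnerRefs(o *Ownee, exists ownerExists, gracePeriod time.Duration) (deleteOwnee bool) {
	valid := o.OwnerRefs[:0]
	for _, ref := range o.OwnerRefs {
		if exists(ref) {
			valid = append(valid, ref)
			continue
		}
		// Owner not found: keep the reference only while within the grace
		// period, since meta owners may legitimately appear later.
		if time.Since(o.CreatedAt) < gracePeriod {
			valid = append(valid, ref)
		}
	}
	o.OwnerRefs = valid
	return len(o.OwnerRefs) == 0
}
```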
3 - Cascade Deletion Flow
Understanding the cascade deletion flow.
When some resource is deleted and the API Server accepts the deletion, it
means there are no blocking references anywhere; this is ensured.
However, there may be resources pointing to the deleted one with
asynchronous cascade deletion (or unset).
In these flows we talk only about schema references; meta owner references
are already fully covered.
When a Deployment deletes some resource, all Deployments affected
by this deletion must take asynchronous action. It means that if
Deployment D0-1 from Service S0 imports Services S1 and S2, and S1 + S2
have Deployments D1-1, D1-2, D2-1, and D2-2, then D0-1 must maintain four
real-time watches asking for any deletions that it needs to handle!
There are known cases of a service importing five others; with 50 regions,
that would mean 250 watch instances, but such a deployment would be very
large, with sufficient resources for the goroutines.
Suppose that D1-1 had some resource RX that was deleted. The following
happens (a sketch of the D0-1 side follows the list):
- D1-1 must notify all interested deployments that RX is deleted
by inspecting back reference sources.
- Suppose that RX had some back-references in Deployment D0-1,
Deployment D1-1 can see that.
- D1-1, after notifying D0-1, periodically checks if there are still
active back-references from D0-1.
- Deployment D0-1, which points to D1-1 as an importer, is notified
about the deleted resource.
- D0-1 grabs all local resources that need cascade deletion or unset.
For unsets, it needs to execute regular updates. For deletions, it
needs to delete (or mark for deletion if there are still some other
back-references pointing, which may be blocking).
- Once D0-1 deals with all local resources pointing to RX, it is done,
it has no work anymore.
- At some point, D0-1 will be asked by D1-1 if RX no longer has back
refs. If this is the case, then D0-1 will confirm all is clear and
D1-1 will finally clean up what remains of RX.
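A minimal Go sketch of how the importing Deployment (D0-1 above) might process a deletion notification for RX; the types and names are illustrative assumptions, not Goten code:

```go
package cascade

import "fmt"

// RefBehavior describes what a schema reference does when its target is deleted.
type RefBehavior int

const (
	Unset RefBehavior = iota
	CascadeDelete
)

// BackRef is a local resource in D0-1 that points at the deleted resource.
type BackRef struct {
	Resource string
	Behavior RefBehavior
	// OtherBlockingRefs reports whether the resource is still blocked by
	// other back-references and can only be marked for deletion.
	OtherBlockingRefs bool
}

// HandleDeleted processes an "RX was deleted" notification: it unsets or
// deletes every local resource that referenced RX, as described above.
func HandleDeleted(deleted string, backRefs []BackRef) {
	for _, br := range backRefs {
		switch br.Behavior {
		case Unset:
			fmt.Printf("update %s: unset reference to %s\n", br.Resource, deleted)
		case CascadeDelete:
			if br.OtherBlockingRefs {
				fmt.Printf("mark %s for deletion (still blocked elsewhere)\n", br.Resource)
			} else {
				fmt.Printf("delete %s\n", br.Resource)
			}
		}
	}
	// Once all local back-references are handled, D0-1 is done; D1-1 will
	// later ask whether any back-references remain before purging RX.
}
```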
Note that:
- This deletion spree may be deep for large object deletions, like
projects. It may involve multiple levels of Deployments and Services.
- If there is an error in the schema, some pending deletions may be stuck
forever. By an error in the schema, we mean situations like:
- Resource A is deleted and is back-referenced from B and C
(async cascade delete).
- Normally B and C should be deleted, but there is a problem if C is,
say, blocked by D, and D has no relationship with A, so it will never
be deleted. In this case, B is deleted, but C is stuck, blocked by D.
Unfortunately, as of now Goten does not detect schema errors like
this; perhaps it would be a good idea, although it is not clear
whether it is possible.
- It will be the service developers' responsibility to fix schema
errors.
- In this flow, D0-1 imports the Service to which D1-1 belongs. Therefore,
D0-1 knows the full service schema of D1-1, but not the other way
around. We need to consider this when D1-1 asks D0-1 whether RX still
has any back-references.
4 - Multi-Region Sync Flow
Understanding the multi-region synchronization flow.
First, each Deployment must keep updating the metadata.syncing field for all
resources it owns.
The API Server already ensures that a resource has its metadata.syncing
field synced on update! However, we have an issue when a
MultiRegionPolicy object changes. This is where the Deployment must
asynchronously update all resources that are subject to this policyholder.
It must therefore send Watch requests for ALL resources that can be
policyholders. For example, the Deployment of iam.edgelq.com will need
to have three watches:
- Watch Projects WHERE multi_region_policy.enabled_regions CONTAINS <MyRegion>, by the iam.edgelq.com service.
- Watch Organizations WHERE multi_region_policy.enabled_regions CONTAINS <MyRegion>, by the iam.edgelq.com service.
- Watch Services WHERE multi_region_policy.enabled_regions CONTAINS <MyRegion>, by the meta.goten.com service.
A simpler service like devices.edgelq.com would need to watch only
Projects, because it has no other resources subject to this.
A Deployment needs to watch the policyholders that are relevant in its region.
The flow is now the following:
- When a Deployment gets a notification about an update of a MultiRegionPolicy,
it needs to accumulate all resources subject to this policy.
- Then it needs to send an Update request for each of them; the API server
ensures that metadata.syncing is updated accordingly.
The above description ensures that metadata.syncing is up-to-date.
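A minimal Go sketch of this policyholder-watch flow, assuming hypothetical watcher and updater interfaces rather than the real Goten APIs:

```go
package syncing

import "context"

// PolicyUpdate is a simplified notification that a MultiRegionPolicy
// (held by a Project, Organization, or meta Service) has changed.
type PolicyUpdate struct {
	PolicyHolder   string   // e.g. "projects/my-project"
	EnabledRegions []string
}

// Updater abstracts sending an Update request back to the API server, which
// recomputes metadata.syncing for the resource.
type Updater interface {
	Update(ctx context.Context, resourceName string) error
}

// ResyncOnPolicyChange accumulates all resources subject to the changed
// policyholder and issues an Update for each of them.
func ResyncOnPolicyChange(ctx context.Context, upd Updater, u PolicyUpdate, listOwned func(policyHolder string) []string) error {
	for _, name := range listOwned(u.PolicyHolder) {
		if err := upd.Update(ctx, name); err != nil {
			return err
		}
	}
	return nil
}
```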
The next part is the actual multi-region syncing. In this case, the Deployments
of each Service MUST have one active watch on all other Deployments from
the same family. For example, if we have iam.edgelq.com in regions
japaneast, eastus2, and us-west2, then the following watches must be
maintained: the Deployment of iam.edgelq.com in us-west2 has two active
watches, one sent to the japaneast region, the other to eastus2:
WATCH <Resources> WHERE metadata.syncing.owningRegion = japaneast AND metadata.syncing.regions CONTAINS us-west2
WATCH <Resources> WHERE metadata.syncing.owningRegion = eastus2 AND metadata.syncing.regions CONTAINS us-west2
Deployments in japaneast and eastus2 will also each have two similar watches.
We have a full mesh of connections.
Then, when some resource in us-west2 is created with
metadata.syncing.regions = [eastus2, japaneast], one copy will be
sent to each of these regions. Those regions must be executing this
copying work pretty much continuously.
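A small Go sketch of how such cross-region watch filters could be constructed for the full mesh; the filter strings mirror the examples above, but the helper itself is only an assumption:

```go
package mesh

import "fmt"

// CrossRegionWatchFilters returns one filter per remote region, matching the
// WATCH examples above: resources owned by that region and marked for
// syncing into the local region.
func CrossRegionWatchFilters(localRegion string, allRegions []string) []string {
	var filters []string
	for _, remote := range allRegions {
		if remote == localRegion {
			continue // we do not watch ourselves
		}
		filters = append(filters, fmt.Sprintf(
			"metadata.syncing.owningRegion = %s AND metadata.syncing.regions CONTAINS %s",
			remote, localRegion))
	}
	return filters
}

// Example: CrossRegionWatchFilters("us-west2", []string{"japaneast", "eastus2", "us-west2"})
// yields the two filters shown for the us-west2 Deployment.
```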
On startup, it is also necessary to mention the following procedure:
- The Deployment should list all currently held resources that are
owned by other regions but syncable locally.
- It should grab a snapshot of these resources from the other regions and
compare whether anything is missing, or whether we have too much
(a missed deletion). If so, it should execute the missing actions to
bring the system into sync (see the sketch after this list).
- During the initial snapshot comparison, it is still valuable to
keep copying real-time updates from other regions. It may take
some time for the snapshot to be completed.
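A minimal sketch of the startup reconciliation under assumed types: it compares local read copies against a snapshot from the owning region and derives the missing saves and deletions:

```go
package reconcile

// Resource is a simplified read copy keyed by name with a version marker.
type Resource struct {
	Name    string
	Version int64 // e.g. a resource version or update timestamp
}

// Diff compares the local read copies with a snapshot taken from the owning
// region and returns what must be saved locally and what must be deleted.
func Diff(local, remote []Resource) (toSave []Resource, toDelete []string) {
	remoteByName := make(map[string]Resource, len(remote))
	for _, r := range remote {
		remoteByName[r.Name] = r
	}
	localByName := make(map[string]Resource, len(local))
	for _, r := range local {
		localByName[r.Name] = r
	}
	for name, r := range remoteByName {
		if l, ok := localByName[name]; !ok || l.Version < r.Version {
			toSave = append(toSave, r) // missing or stale locally
		}
	}
	for name := range localByName {
		if _, ok := remoteByName[name]; !ok {
			toDelete = append(toDelete, name) // missed deletion
		}
	}
	return toSave, toDelete
}
```

Real-time updates received from other regions while the snapshot is being compared can simply be applied on top, as noted above.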
5 - Database Migration Flow
Understanding the database migration flow.
When a Deployment boots up after an image upgrade, it will detect that
the currently active version is lower than the version it can support.
In that case, the API Server will serve the older version normally,
but the new version API will become available in read-only mode.
The Deployment is responsible for asynchronously syncing the
higher-version database with the current-version database in the
background. Clients are expected to use the older version anyway, so they
won't necessarily see the incomplete higher version. Besides, this is fine,
because what matters is the current version pointed to by the Deployment.
It is expected that all Deployments will get new images first before
we start switching to the next versions. Each Deployment will be
responsible for silent copying.
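A minimal Go sketch of the boot-time decision described above; the types and function names are illustrative, not the actual Goten deployment code:

```go
package migration

import "log"

// Deployment captures the two version markers relevant at boot time.
type Deployment struct {
	ActiveVersion    string // version currently pointed to by the Deployment, e.g. "v1"
	SupportedVersion string // highest version the new image can serve, e.g. "v2"
}

// OnBoot decides how to serve each API version and whether to start the
// background copy from the active-version database to the new one.
func OnBoot(d Deployment) {
	if d.SupportedVersion == d.ActiveVersion {
		log.Printf("serving %s read-write; no migration needed", d.ActiveVersion)
		return
	}
	log.Printf("serving %s read-write and %s read-only", d.ActiveVersion, d.SupportedVersion)
	go backgroundCopy(d.ActiveVersion, d.SupportedVersion)
}

// backgroundCopy silently syncs the higher-version database with the
// current-version database until the official switch is triggered.
func backgroundCopy(from, to string) {
	log.Printf("background syncing database %s -> %s", from, to)
	// placeholder for the actual copy loop
}
```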
For the multi-region case, when multiple Deployments of the same service
are on version v1 but run on images that can support version v2, they
will still be synced with each other, on both versions: v1 and v2.
While images are being deployed region by region (Deployment by
Deployment), they may experience Unimplemented error messages, but only
until images are updated in all regions. We may improve this and try to
detect “available” versions first, before making cross-region watches.
In any case, it is required that new images are deployed to all regions
before the upgrade procedure is triggered on any regional Deployment.
The upgrade can then be done one Deployment at a time, using the procedure
described in the migration section of the developer guide.
When one Deployment is officially upgraded to the new version, but the
others still primarily use the old one, all Deployments still watch
each other on both versions, for the sake of multi-region syncing.
However, a Deployment using the newer version may already opt out of
pulling older API resources from other Deployments at this point.
Meta owner references are owned by the Deployment they point to. This means
that they are upgraded asynchronously after that Deployment switches to
the newer version.
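Finally, a small sketch of the asynchronous meta owner reference upgrade just mentioned, again with assumed types:

```go
package metaowner

// OwnerRef is a simplified meta owner reference carrying an explicit version.
type OwnerRef struct {
	Service string
	Region  string
	Version string
	Kind    string
	Name    string
}

// UpgradeRefs bumps every reference pointing at the given Deployment
// (service + region) to that Deployment's new current version. The owning
// Deployment runs this asynchronously after switching versions.
func UpgradeRefs(refs []OwnerRef, service, region, newVersion string) []OwnerRef {
	for i, r := range refs {
		if r.Service == service && r.Region == region && r.Version != newVersion {
			refs[i].Version = newVersion
		}
	}
	return refs
}
```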