SPEKTRA Edge Limits Service Design
Understanding the SPEKTRA Edge resource limit design.
The Limits service is one of the services with a special design.
The service itself should already be familiar from the user guide and,
in part, from the developer guide. Knowledge of plans is assumed. Limits
is also one of the standard services, and its code structure in the
SPEKTRA Edge repository (limits
directory) should be familiar.
What needs explanation is how Limits ensures that “values” don’t get
corrupted, lost, or over-allocated. First, resources are allocated
in each service, but the resource limits.edgelq.com/Limit
belongs to
the Limits service. Therefore, we can’t easily guarantee counter
integrity if a resource is created in one service and counted elsewhere.
Next, we know that limit values are passed from services to organizations,
then to potential child organizations, and eventually to projects. From
the MultiRegion design, we know that each organization and project may
point to a main region where its resources are kept. Therefore, we know
that
organizations/{organization}/acceptedPlans/{acceptedPlan}
is in the
organization’s region, and
projects/{project}/planAssignments/{planAssignment}
is in the project’s
region, which may be different. This document describes how Limits works
in this case.
We will also provide code pointers showing where things can be found.
Along the way, you will find out why parallel creations/deletions
are not actually parallel!
1 - Service Limit Initialization
Understanding how SPEKTRA Edge service limits are initialized.
When a Service boots up, it creates limits.edgelq.com/Plan
instances.
The Limits controller, defined in limits/controller/v1
, has a LimitsAssigner
processor, defined in
limits/controller/v1/limits_assigner/limits_assigner.go
. One is
created per possible assigner, therefore per Service
and per Organization. LimitsAssigner is typically responsible for creating
AcceptedPlan
instances for child entities, but for Services it makes
an exception: it creates an AcceptedPlan for itself! See the file
limits/controller/v1/limits_assigner/default_plan_acceptor.go
, where the function
calculateSnapshot
computes plans for child entities and for
the Service itself. This bootstraps the system: the Service can assign
any values it likes to itself.
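The bootstrapping exception can be illustrated with a minimal sketch. Note this is not the real implementation: the types, field names, and the function `calculateSnapshotSketch` are all hypothetical simplifications of the logic in `default_plan_acceptor.go`.

```go
package main

import "fmt"

// Hypothetical, simplified types; the real ones in the SPEKTRA Edge
// limits service are far richer.
type Plan struct{ Name string }

type AcceptedPlan struct {
	Assigner string // Service or Organization granting the plan
	Assignee string // entity receiving the plan (may equal the assigner)
	Plan     string
}

// calculateSnapshotSketch mimics the idea described above: an assigner
// computes AcceptedPlan instances for its child entities, and a Service
// assigner additionally accepts a plan for itself (bootstrapping).
func calculateSnapshotSketch(assigner string, isService bool, defaultPlan Plan, children []string) []AcceptedPlan {
	var out []AcceptedPlan
	if isService {
		// Bootstrapping exception: the Service accepts a plan for
		// itself, so it can assign any values it likes to itself.
		out = append(out, AcceptedPlan{Assigner: assigner, Assignee: assigner, Plan: defaultPlan.Name})
	}
	for _, child := range children {
		out = append(out, AcceptedPlan{Assigner: assigner, Assignee: child, Plan: defaultPlan.Name})
	}
	return out
}

func main() {
	plans := calculateSnapshotSketch("services/limits.edgelq.com", true,
		Plan{Name: "plans/default"}, []string{"projects/p1"})
	for _, p := range plans {
		fmt.Printf("%s -> %s (%s)\n", p.Assigner, p.Assignee, p.Plan)
	}
}
```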
2 - Project and Organization Limit Initialization
Understanding how project and organization limits are initialized.
If a project has a parent organization, then this parent organization
is the assigner for the project. If the project is root-level, then its
enabled services are the assigners; each service can assign an individual
plan to the project. The same applies to organizations. When a
project/organization is created, the Limits controller puts the newly
created entity in the “assigner” box (or boxes, for root-level entities).
Then it creates one or more AcceptedPlan
instances. The implementation, again, is in
limits/controller/v1/limits_assigner/default_plan_acceptor.go
. It is
worth mentioning, however, that DefaultPlanAcceptor
uses the assigner’s LimitPools
to check whether it will be able to create an AcceptedPlan
resource. If not, it instead annotates the Project/Organization
for which plan creation failed. This is why in limits_assigner.go
you can see a Syncer not only for AcceptedPlan but also for Project and
Organization.
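The assigner-selection rule above can be sketched as a small function. This is a hypothetical simplification; `assignersForProject` and its arguments do not exist in the real codebase.

```go
package main

import "fmt"

// assignersForProject sketches the rule described above, assuming
// resource-name strings for organizations and services.
func assignersForProject(parentOrg string, enabledServices []string) []string {
	if parentOrg != "" {
		// A parented project has exactly one assigner:
		// its parent organization.
		return []string{parentOrg}
	}
	// A root-level project gets one assigner per enabled service;
	// each service may assign an individual plan.
	return append([]string{}, enabledServices...)
}

func main() {
	fmt.Println(assignersForProject("organizations/acme", nil))
	fmt.Println(assignersForProject("",
		[]string{"services/iam.edgelq.com", "services/devices.edgelq.com"}))
}
```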
3 - AcceptedPlan Update Process
Understanding the AcceptedPlan update process.
The implementation can naturally be found in the server:
limits/server/v1/accepted_plan/accepted_plan_service.go
. We pass
the “actual” creation to the core server, of course, but this is just
a small step; the bulk of the logic executes before any CUD operation.
When the server processes an AcceptedPlan resource (Create or Update),
we are guaranteed to be in the Limits service region where
the assigner resides. Because LimitPools are children of a Service or
Organization, we can guarantee that they reside in the same regional
database as the AcceptedPlan. Thanks to this, we can verify, within
the SNAPSHOT transaction, that the caller does not attempt to
create/update any AcceptedPlan that would exceed the limit pools of
the assigner! This is the primary guarantee here: an assigner cannot
exceed the values allocated in its LimitPools. We only need to
check cases where an AcceptedPlan increases reservations on the
assigner’s LimitPools; for decreases (some updates, deletions),
no such check is needed.
However, there is some risk with decreasing accepted plans
(some updates and deletions): doing so could
decrease assignee limit values below current usage. To prevent this,
the function validateAssigneeLimitsAndGetLimitPoolUpdates
in the
server implementation checks assignee limit values. This
works in 99.99% of cases, unless new resources are
allocated while we confirm that we can decrease limits. Therefore,
we don’t have hard guarantees here.
As a result, when we create/update an AcceptedPlan, we only increase
the reservation values of the assigner’s LimitPools. When an operation
would decrease LimitPool values, we simply don’t decrease them yet.
Decreasing values is done by the Limits controller; we have a task for
this in limits/controller/v1/limits_assigner/limit_pool_state_syncer.go
.
It takes into account all child Limit and LimitPool instances (for assignees),
which are synchronized with PlanAssignment instances. It then sends
UpdateLimitPool requests once it confirms that the decreased values from
an AcceptedPlan action (update or deletion) have taken effect.
Reservation is immediate; release is asynchronous and delayed.
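The asymmetry above (increases validated and applied in the transaction, decreases deferred to the controller) can be sketched as follows. The `LimitPool` struct and `applyPlanDelta` function are hypothetical simplifications, not the real server code.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical LimitPool holding only the two fields discussed above:
// a configured size and a reserved counter consumed by AcceptedPlans.
type LimitPool struct {
	ConfigSize int64
	Reserved   int64
}

var errExceeded = errors.New("accepted plans would exceed limit pool size")

// applyPlanDelta sketches the server-side rule: increases to reservations
// are validated and applied immediately (inside the SNAPSHOT transaction);
// decreases are deliberately ignored here — the controller releases them
// asynchronously once it confirms the change took effect.
func applyPlanDelta(pool *LimitPool, delta int64) error {
	if delta <= 0 {
		return nil // release is delayed; handled by limit_pool_state_syncer
	}
	if pool.Reserved+delta > pool.ConfigSize {
		return errExceeded
	}
	pool.Reserved += delta
	return nil
}

func main() {
	pool := &LimitPool{ConfigSize: 100, Reserved: 90}
	fmt.Println(applyPlanDelta(pool, 5), pool.Reserved)   // fits: reserved rises to 95
	fmt.Println(applyPlanDelta(pool, 10), pool.Reserved)  // rejected: would exceed 100
	fmt.Println(applyPlanDelta(pool, -20), pool.Reserved) // no-op: release is deferred
}
```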
Some cheating is potentially possible, however: an org admin could send
UpdateLimitPool trying to minimize the “Reserved” field, and then
attempt to create a new accepted plan quickly, before
the controller fixes the values again. Securing this may be a bit
tricky, but such an update would leave the LimitPool with a Reserved
value well above the configured size, which is detectable via
ActivityLogs and, failing that, ResourceChangeLogs. It is unlikely to
be attempted this way. A potential way to secure this would be to disable
AcceptedPlan updates if the Reserved value of a LimitPool decreased
recently, with some timeout like 30 seconds. Alternatively, we could
put custom code in the API Server for UpdateLimitPool and validate
that only service admins update it (checking the principal from the
context). This is not covered by the IAM Authorization code-gen
middleware, but custom code can do it easily.
4 - Resource Limit Assignment
Understanding the resource limit assignment.
It is assumed that organization admins can see and manage AcceptedPlan
instances, while their tenants can only see them. Furthermore, parent
and child organizations, as well as other organizations and final
projects, are separate IAM scopes. Child entities may also reside in
different primary regions than their parent organization (or service).
For these reasons, we have the resource type PlanAssignment
, which is read-only; see its proto
definition. This allows admins to see the plan assigned to them, but
without any modifications, even if they are owners of their scope.
Because PlanAssignment is located in the region pointed to by the
project/organization, we can guarantee synchronization with
LimitPool/Limit resources!
When an AcceptedPlan is made, the Limits controller is responsible for
asynchronously creating a PlanAssignment, which may be in a different
region than the source AcceptedPlan. The code for this is in
limits/controller/v1/limits_assigner/assigned_plans_copier.go
.
It creates an instance of PlanAssignment and sends a request to the API
Server. The server implementation is, naturally, in the file
limits/server/v1/plan_assignment/plan_assignment_service.go
. Note
that the controller sets output-only fields, but this is fine: when
the server creates an instance, it will have these fields too. This
only ensures that, if there is any mismatch, the controller will
be forced to make another update.
When processing writes to PlanAssignment, the API Server fetches the
AcceptedPlan from the database. We require the child organization or
project to be in a subset of the regions available to its parent;
therefore, we know that at least a synced read-only copy of the
AcceptedPlan will be in the database. This is where we obtain the
desired configuration.
PlanAssignment is synchronized with Limit and LimitPool instances; all
of these belong to the same assignee, so we know our database owns these
resources. Therefore, we can provide some guarantees based on SNAPSHOT
transactions: configured limit values in Limit/LimitPool resources are
guaranteed to match those in PlanAssignment, users get no chance to make
a mistake, and the system will not go out of sync here.
Note that we only change the configured limit; there are also
so-called active limits, maintained by the controller. There is
some chance the configured limit is set below current usage; if this
happens, the active limit stays at a higher value, as large as the usage.
This affects the source limit pool’s reserved value, which stays
elevated! It is assumed, however, that PlanAssignment and configured
limits must stay in sync with AcceptedPlan values, regardless of whether
resources are currently being allocated/deallocated on the final API
Server side.
Note that the limits controller tracks the active size and reserved
value for LimitPool instances; Limits are on the next level.
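The configured-versus-active relationship described above reduces to a simple rule, sketched below with a hypothetical `activeLimit` helper (not a real function in the codebase):

```go
package main

import "fmt"

// activeLimit sketches the rule above: the active limit follows the
// configured limit, but if the configured value drops below current
// usage, the active limit stays as large as the usage (and the source
// LimitPool's reserved value stays elevated accordingly).
func activeLimit(configured, usage int64) int64 {
	if usage > configured {
		return usage
	}
	return configured
}

func main() {
	fmt.Println(activeLimit(100, 40)) // usage below configured: active = configured
	fmt.Println(activeLimit(50, 80))  // configured dropped below usage: active = usage
}
```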
5 - Resource Limit Tracking
Understanding how the resource limit is tracked.
We need to guarantee that the usage tracker stays in sync
with the actual resource counter. The best way to do that is to count
during local transactions. However, the resource Limit belongs to the
Limits service, not the actual service. This is why we have the Limits
Mixin Service in the SPEKTRA Edge repository, mixins/limits
.
It injects one resource type: LocalLimitTracker. Note that it is a
regional resource, but not a child of a Project. This means that no
project admin, nor any parent organization, will ever be able to see
this resource. This resource type is hidden; only service admins can
see it. This also prevents any chance of end-user mismanagement. Because
this resource type is mixed in along with the final service’s resources,
we can achieve SNAPSHOT transactions between actual resources and
trackers. We can even prevent bugs that could result in the usage
tracker having invalid values.
When we create/update a LocalLimitTracker resource, we can extract
the true counter from the local database; see the file
mixins/limits/server/v1/local_limit_tracker/local_limit_tracker_service.go
.
To check how LocalLimitTracker usage is tracked during transactions,
check two files:
mixins/limits/resource_allocator/v1/resource_allocator.go
common/store_plugins/resource_allocator.go
This is how the store plugin tracks creations/deletions: at the end of
the transaction, it tries to push extra updates to LocalLimitTracker
instances for all resource types where the number of instances changed.
This guarantees complete synchronization with the database. Note,
however, that this does not create LocalLimitTrackers yet.
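The per-transaction counting idea can be sketched as below. The `txnCounter` type and its methods are hypothetical; the real store plugin in `common/store_plugins/resource_allocator.go` hooks into actual store events.

```go
package main

import "fmt"

// txnCounter sketches the store-plugin idea: during a transaction we
// count creations/deletions per resource type, and at commit time we
// emit one LocalLimitTracker usage update per type whose count changed.
type txnCounter struct {
	deltas map[string]int64 // resource type -> net created minus deleted
}

func newTxnCounter() *txnCounter { return &txnCounter{deltas: map[string]int64{}} }

func (c *txnCounter) OnCreate(resourceType string) { c.deltas[resourceType]++ }
func (c *txnCounter) OnDelete(resourceType string) { c.deltas[resourceType]-- }

// TrackerUpdates returns the usage deltas to push into LocalLimitTracker
// instances in the same transaction, keeping trackers in sync with the DB.
// Types whose net change is zero need no update at all.
func (c *txnCounter) TrackerUpdates() map[string]int64 {
	out := map[string]int64{}
	for rt, d := range c.deltas {
		if d != 0 {
			out[rt] = d
		}
	}
	return out
}

func main() {
	c := newTxnCounter()
	c.OnCreate("devices.edgelq.com/Device")
	c.OnCreate("devices.edgelq.com/Device")
	c.OnDelete("iam.edgelq.com/ServiceAccount")
	c.OnCreate("iam.edgelq.com/Group")
	c.OnDelete("iam.edgelq.com/Group") // nets to zero, no update emitted
	fmt.Println(c.TrackerUpdates())
}
```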
This is why the Limits Mixin comes not only with an API Server
(so LocalLimitTrackers can be accessed) but also with a controller; see
the mixins/limits/controller/v1
directory. Inside the Limits processor we have the
LocalLimitTrackersManager
instance, which:
- Creates/Updates/Deletes instances of LocalLimitTracker for every
Limit instance in the Limits service.
- Synchronizes Limit instances in the Limits service using
LocalLimitTrackers from its region. This means there is no actual point
in meddling with Limit fields: the controller will fix them anyway, and
they don’t participate in actual usage checking anyway.
- Maintains PhantomTimeSeries, so we have special store usage metrics
showing how resource counters changed historically.
Note that the Limits processor in this controller has built-in
multi-region features: the primary region for a project creates/deletes
LocalLimitTrackers, but the final regions maintain Limit instances and
PhantomTimeSeries.
6 - Project and Organization Deletion Process
Understanding the project and organization deletion process.
When a Project/Organization is deleted, we need to ensure that limit
values return to the assigner. This is why AcceptedPlan instances
have assignee reference fields with the ASYNC_CASCADE_DELETE
option.
When assignees are deleted, plans follow. This deletes PlanAssignments,
but as was said, LimitPools are not given back reserved values yet.
Instead, db-controllers should delete all child resources of the
assignee, such as a Project. This decreases Limit usage values until we
hit 0.
To prevent deletion of Limit/LimitPool instances before they reach zero
values, we utilize the metadata.lifecycle.block_deletion
field, as below:
- limits/server/v1/limit/limit_service.go
Take a look at the update function, UpdateMetadataDeletionBlockFlag.
- limits/server/v1/limit/limit_pool_service.go
Take a look at the update function, UpdateMetadataDeletionBlockFlag.
This way, LimitPool and Limit resources disappear only last. We achieve
some ordering of deletions, so it is not chaotic. The controller for
the assignee will confirm that the reserved value of a LimitPool has
decreased only after whole resource collections are truly deleted.
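The block-deletion rule can be illustrated with a minimal sketch. The `Limit` struct and `updateMetadataDeletionBlockFlag` below are hypothetical simplifications of the real server functions mentioned above.

```go
package main

import "fmt"

// Hypothetical Limit carrying only the fields needed to show the rule:
// current usage and the metadata.lifecycle.block_deletion flag.
type Limit struct {
	Usage         int64
	BlockDeletion bool
}

// updateMetadataDeletionBlockFlag mirrors the idea of the
// UpdateMetadataDeletionBlockFlag functions: keep deletion blocked
// while usage is non-zero, so Limit/LimitPool resources disappear only
// after every resource they count is truly gone.
func updateMetadataDeletionBlockFlag(l *Limit) {
	l.BlockDeletion = l.Usage > 0
}

func main() {
	l := &Limit{Usage: 3}
	updateMetadataDeletionBlockFlag(l)
	fmt.Println(l.BlockDeletion) // children still exist: deletion blocked

	l.Usage = 0
	updateMetadataDeletionBlockFlag(l)
	fmt.Println(l.BlockDeletion) // usage hit zero: safe to cascade-delete
}
```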