Auto-Generated Protobuf Files
Protobuf files describe:
- Resources - models, database indices, name patterns, views, etc.
- Request/Response object definitions (bodies)
- API groups, each with a list of methods
- Service package metadata information
Note that you can learn about an API entirely by reading its protobuf files.
Example proto files for a 3rd party app: https://github.com/cloudwan/inventory-manager-example/tree/master/proto/v1 You can also find proto files in the edgelq repository.
Resource protobuf files
For each resource in the API specification, Goten will create 2 protobuf files:
- <resource_name>.proto: This file will contain the proto definition of a single resource.
- <resource_name>_change.proto: This file will contain the proto definition of the DIFF object of a resource. The protobuf file with the Change object is used for Watch requests (real-time subscriptions).
Be aware that:
- goten-bootstrap will always overwrite <resource_name>_change.proto. You should never write to it.
- File <resource_name>.proto will be generated for the first time only. If you later change anything in the API skeleton that would affect the proto file, you will need to either update the file manually in the way the bootstrap utility would, or rename the file and let a new one be generated. In the latter case, you will need to copy all manually written modifications back to the newly generated file. Typically this means resource fields and additional import files.
When a resource file is generated for the first time, it will have name and metadata fields, plus special annotations applicable for resources only. You will need to replace TODO sections in the resource.
The first notable annotation is google.api.resource, like:
option (google.api.resource) = {
type : "inventory-manager.examples.edgelq.com/Site"
pattern : "projects/{project}/regions/{region}/sites/{site}"
};
You should note that this annotation will always show you a list of all possible name patterns. Whenever you change something later in the API specification (parents or scopeAttributes), you will need to modify this annotation manually.
The second, more important annotation is the one provided by Goten, for example:
option (goten.annotations.resource) = {
id_pattern : "[a-zA-Z0-9_.-]{1,128}" // This is default value, this is set initially from api-skeleton idPattern param!
collection : "sites" // Always plural and lowerCamelJson
plural : "sites" // Equal to collection
parents : "Project" // If there are many parents, we will have many "parents:"
on_parent_deleted_behavior : ASYNC_CASCADE_DELETE // Highly recommended, typical in SPEKTRA Edge
scope_attributes : "goten.annotations/Region" // Set for regional resources
async_deletion : false // If set to true, resource will not disappear immediately after deletion.
};
This one shows basic properties like a list of parents, scope attributes, or what happens when a parent is deleted. The parent deletion behavior must always be set for each resource. From SPEKTRA Edge’s perspective, we recommend cascade deletion, preferably asynchronous. You may use in-transaction deletion if you are certain there will be no more than 10 child resources at once. Children of a project in particular should use asynchronous cascade deletion, because we strive to make project deletion a smooth process (warning: the SOFT delete option is not implemented yet).
The async_deletion parameter deserves an additional note: when a resource is deleted, by default its record is removed from the database. However, if async_deletion is true, the record will stay until all backreferences are cleaned up (no resource points at it). In some cases this may take considerable time, for example during a large project deletion.
We recommend setting async_deletion to true for top resources, like Project.
References to other resources
Setting a reference to other resources is pretty straightforward. It follows this pattern:
message SomeResource {
option (google.api.resource) = { ... };
option (goten.annotations.resource) = { ... };
string reference_to_resource_from_current_service = 3 [
(goten.annotations.type).reference = {
resource: "OtherResource"
target_delete_behavior : BLOCK
}
];
string reference_to_resource_from_different_service = 4 [
(goten.annotations.type).reference = {
resource: "different.edgelq.com/DifferentResource"
target_delete_behavior : BLOCK
}
];
}
Note you always need to specify target deletion behavior. If you just want to hold the resource name, but it is not supposed to be a true reference, then you should use the (goten.annotations.type).name.resource annotation.
References to resources from different services or different regions will implicitly switch to ASYNC versions of UNSET/CASCADE_DELETE!
Views
Reading methods (Get, BatchGet, List, Watch, Search - if enabled) normally have a field_mask field in their request bodies. The field mask selects which fields should be returned in the response or, in the case of Watch, in incremental real-time updates. Apart from the field mask field, there is another one: view. View indicates the default field mask that should be applied. If both view and field_mask are specified in a request, then their masks are simply merged.
There are the following view types available: NAME, BASIC, DETAIL, and FULL. The first one is a two-element field mask, with fields name and display_name (if it is defined in a resource!). The last one should be self-explanatory. The two other ones are undefined by default and, if they are used, they will work as FULL ones. Developers can define any of the 4, even NAME and FULL - those will simply be overwritten. This can be done using the goten.annotations.resource annotation.
message SomeResource {
option (goten.annotations.resource) = {
...
views : [
{
view : BASIC
fields : [
{path : "name"},
{path : "some_field"},
{path : "other_field"}
]
},
{
view : DETAIL
fields : [
{path : "name"},
{path : "some_field"},
{path : "other_field"},
{path : "outer.nested"}
]
}
]
};
}
Note that you need to specify fields using snake_case. You can specify nested fields too.
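The view resolution described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not Goten's implementation: the function name effective_mask, the dictionary layout, and the assumption that "merged" means a union of field paths are all hypothetical.

```python
def effective_mask(view, field_mask, custom_views, all_fields):
    """Return the merged list of field paths for a read request (illustrative only)."""
    # Built-in defaults: NAME is name + display_name (when present), FULL is everything.
    views = {
        "NAME": [f for f in ("name", "display_name") if f in all_fields],
        "FULL": list(all_fields),
    }
    # Views declared in the annotation override the defaults, including NAME and FULL.
    views.update(custom_views)
    # BASIC and DETAIL fall back to FULL when they are not declared.
    merged = list(views.get(view, views["FULL"]))
    # Assumption: merging view and field_mask is a simple union of paths.
    for path in field_mask:
        if path not in merged:
            merged.append(path)
    return merged


# Hypothetical resource with four fields and only the BASIC view declared.
ALL_FIELDS = ["name", "some_field", "other_field", "outer.nested"]
CUSTOM = {"BASIC": ["name", "some_field", "other_field"]}

print(effective_mask("BASIC", ["outer.nested"], CUSTOM, ALL_FIELDS))
print(effective_mask("DETAIL", [], CUSTOM, ALL_FIELDS))
print(effective_mask("NAME", ["some_field"], CUSTOM, ALL_FIELDS))
```

In this sketch DETAIL is not declared, so it resolves to every field, while NAME contains only name because the hypothetical resource has no display_name.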
Database indices
List/Watch requests work on a “best effort” basis in principle. However, sometimes indices are needed for performance, or, like in the case of Firestore, to make certain queries even possible.
Database indices are declared in protobuf definitions in each resource. During startup, db-controller runtime uses libraries provided by Goten to ensure indices in protobuf match those in the database. Note that you should not create indices on your own unless for experimentation.
Let’s define some examples. For simplicity we show just name patterns and indices annotations; fields can be imagined:
message Device {
option (google.api.resource) = {
type : "example.edgelq.com/Device"
pattern : "projects/{project}/devices/{device}"
};
option (goten.annotations.indices) = {
composite : {
sorting_groups : [
{
name : "byDisplayName",
order_by : "display_name",
scopes : [ "projects/{project}/devices/-" ]
},
{
name : "bySerialNumber"
order_by : "info.serial_number"
scopes : [
"projects/-/devices/-",
"projects/{project}/devices/-"
]
}
]
filters : [
{
field_path : "info.model"
required : true
restricted_sorting_groups : [ "bySerialNumber" ]
},
{
field_path : "info.maintainer_group"
reference_patterns : [ "projects/{project}/maintenanceGroups/{maintenanceGroup}" ]
}
]
}
single : [ {field_path : "machine_type"} ]
};
}
There are two index types: single-field and composite. Single-field indices should be pretty straightforward: you specify just the field path (it can be nested, with dots), and the index will be usable for that field. Composite indices are generated based on sorting groups combined with filters.
Composite indices are optimized for sorting - but as of now, only one sorting field is supported. However, if the sorting field is different from the name, then “name” is additionally added, to ensure sorting is stable. In the above example, composite indices can be divided into two groups - those sorting by display_name and those sorting by info.serial_number.
Note that the sorting field path is also usable for filtering. Therefore, if you just need a specific composite index for filtering by multiple fields, you can pick some field that may optionally be used for sorting too. Apart from that, each sorting group has built-in filter support for name fields, for the specified patterns only (scopes).
Attached filters can either be required (if the filter is not specified in a query, the index will not be used for that query), or optional (each non-required filter doubles the number of generated indices).
Based on the above example, generated composite indices will be:
- filter (name.projectId) orderBy (display_name ASC, name.deviceId ASC)
- filter (name.projectId) orderBy (display_name DESC, name.deviceId DESC)
- filter (name.projectId, info.maintainer_group) orderBy (display_name ASC, name.deviceId ASC)
- filter (name.projectId, info.maintainer_group) orderBy (display_name DESC, name.deviceId DESC)
- filter (info.model) orderBy (info.serial_number ASC, name.projectId ASC, name.deviceId ASC)
- filter (info.model) orderBy (info.serial_number DESC, name.projectId DESC, name.deviceId DESC)
- filter (name.projectId, info.model) orderBy (info.serial_number ASC, name.deviceId ASC)
- filter (name.projectId, info.model) orderBy (info.serial_number DESC, name.deviceId DESC)
- filter (info.model, info.maintainer_group) orderBy (info.serial_number ASC, name.projectId ASC, name.deviceId ASC)
- filter (info.model, info.maintainer_group) orderBy (info.serial_number DESC, name.projectId DESC, name.deviceId DESC)
- filter (name.projectId, info.model, info.maintainer_group) orderBy (info.serial_number ASC, name.deviceId ASC)
- filter (name.projectId, info.model, info.maintainer_group) orderBy (info.serial_number DESC, name.deviceId DESC)
When we sort by display_name, to utilize the composite index, we should also filter by the projectId part of the name field. Additional sorting by the name.deviceId part is added implicitly to any order. If we add info.maintainer_group to the filter, we will switch to a different composite index.
If we just filter by display_name (we can use > or < operators too!), and add a filter by the projectId part of the name, then one of those first composite indices will be used too.
When defining indices, be aware of multiplication. Each sorting group starts with a multiplier of two (ascending and descending order). The next multiplier is the number of name patterns (scopes) we add. Finally, for each non-required filter field, we multiply the number of indices by 2. Here we generated 12 composite indices and 1 single-field one. The number of indices matters from the perspective of the database used: in Firestore we can typically have 200 indices per database, and in Mongo 64 per collection.
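To check the multiplication before committing to an annotation, the counting can be reproduced with a short Python sketch. It is a rough model inferred from the Device example above, not Goten code: the function composite_indices and the dictionary keys are illustrative, and it assumes that a filter with restricted_sorting_groups applies only to the listed groups.

```python
from itertools import combinations


def composite_indices(sorting_groups, filters):
    """Enumerate (group, scope, filter fields, direction) tuples for composite indices."""
    indices = []
    for group in sorting_groups:
        # Assumption: a filter restricted to some sorting groups is ignored by the others.
        applicable = [
            f for f in filters
            if not f.get("restricted_sorting_groups")
            or group["name"] in f["restricted_sorting_groups"]
        ]
        required = [f["field_path"] for f in applicable if f.get("required")]
        optional = [f["field_path"] for f in applicable if not f.get("required")]
        for scope in group["scopes"]:
            # Every subset of optional filters doubles the count per optional field.
            for size in range(len(optional) + 1):
                for extra in combinations(optional, size):
                    for direction in ("ASC", "DESC"):
                        indices.append(
                            (group["name"], scope, tuple(required) + extra, direction)
                        )
    return indices


SORTING_GROUPS = [
    {"name": "byDisplayName", "scopes": ["projects/{project}/devices/-"]},
    {"name": "bySerialNumber",
     "scopes": ["projects/-/devices/-", "projects/{project}/devices/-"]},
]
FILTERS = [
    {"field_path": "info.model", "required": True,
     "restricted_sorting_groups": ["bySerialNumber"]},
    {"field_path": "info.maintainer_group"},
]

indices = composite_indices(SORTING_GROUPS, FILTERS)
print(len(indices))
```

Under these assumptions byDisplayName contributes 1 scope x 2 filter variants x 2 directions, and bySerialNumber contributes 2 scopes x 2 filter variants x 2 directions.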
Cache indices
To improve performance & reduce database usage, Goten & SPEKTRA Edge utilize Redis as a database cache.
Service developers should carefully analyze which queries are used most, what the update rate is, etc. With goten cache, we support:
- Get/BatchGet queries: caching is done by resource name. Invalidation happens for updated/deleted resources, for specific instances.
- List/Search queries: we cache by all query params (filter, parent name, order by, page, phrase in case of search, field mask). If a resource is updated/deleted/created, then we invalidate whole cached query groups by filter only. We will explain more with examples.
We don’t support cache for Watch requests.
To enable cache support for a service, it is required to:
- Provide cache annotation for each relevant resource in their proto files.
- In server code, during initialization, construct store objects with cache. This takes a very short amount of code.
Let’s define some indices. For simplicity, we show just name patterns and annotations specific to the cache:
message Comment {
option (google.api.resource) = {
type : "forum.edgelq.com/Comment"
pattern : "messages/{message}/comments/{comment}"
pattern : "topics/{topic}/messages/{message}/comments/{comment}"
};
option (goten.annotations.cache) = {
queries : [
{eq_field_paths : ["name"]},
{eq_field_paths : ["name", "user"]}
]
query_reference_patterns : [{
field_path : "name",
patterns : [
"messages/-/comments/-",
"topics/{topic}/messages/-/comments/-"
]
}]
};
};
By default, Goten generates this proto annotation for every resource when the resource is initiated for the first time, but a very minimal one, with the index for the name field only.
We will support caching for:
- Get/BatchGet requests: it is enabled by default, and the goten.annotations.cache annotation only provides a way to disable it. Users do not need to do anything here.
- List/Search queries whose filter/parent SATISFY one of the following filter conditions:
  - Group 1: name = "messages/-/comments/-"
  - Group 2: name = "topics/{topicId}/messages/-/comments/-"
  - Group 3: name = "messages/-/comments/-" AND user = "users/{userId}"
  - Group 4: name = "topics/{topicId}/messages/-/comments/-" AND user = "users/{userId}"
Since caching by exact name is very simple, we will be discussing only list/search queries.
We have 4 groups of indices. This is because:
- We have 2 query sets: one for name and the other for name with user. The name field has 2 name patterns.
- Multiply 2 by 2, and you have 4.
As a reminder, the presence of the “parent” field in List/Search requests already implies that the final filter will contain the “name” field.
Let’s look at some example queries and how invalidation then works. Queries that will be cacheable:
- LIST { parent = 'topics/t1/messages/m1' filter = '' }: it will belong to group 2.
- LIST { parent = 'topics/t1/messages/-' filter = '' }: it will belong to group 2.
- LIST { parent = 'messages/-' filter = '' }: it will belong to group 1.
- LIST { parent = 'messages/m1' filter = '' }: it will belong to group 1.
- LIST { parent = 'topics/t1/messages/m1' filter = 'user="users/u1"' }: it will belong to groups 2 and 4.
- LIST { parent = 'topics/t1/messages/-' filter = 'user="users/-"' }: it will belong to group 2.
This query will not be cached: LIST { parent = 'topics/-/messages/-' filter = '' }
Note that an exact query may belong to more than one group. Also note that groups 3 and 4, which require a user, must be given a full user reference without wildcards. If we wanted to enable caching for user wildcards too, then we would need to provide the following annotation:
option (goten.annotations.cache) = {
queries : [
{eq_field_paths : [ "name" ]},
{eq_field_paths : [ "name", "user" ]}
]
query_reference_patterns : [ {
field_path : "name",
patterns : [
"messages/-/comments/-",
"topics/{topic}/messages/-/comments/-"
]
}, {
field_path : "user",
patterns : [ "users/-" ]
} ]
};
The param that allows us to decide to which degree we allow wildcards is query_reference_patterns. This param is actually “present” for every name/reference field within the resource body that is present in the queries param. If the developer does not provide it, goten will assume a default. That default is to allow ALL name patterns, with only the last segment of the name field allowed to be a wildcard. In other words, the following annotations are equivalent:
option (goten.annotations.cache) = {
queries : [
{eq_field_paths : [ "name" ]},
{eq_field_paths : [ "name", "user" ]}
]
};
option (goten.annotations.cache) = {
queries : [
{eq_field_paths : [ "name" ]},
{eq_field_paths : [ "name", "user" ]}
]
query_reference_patterns : [ {
field_path : "name",
patterns : [
"messages/{message}/comments/-",
"topics/{topic}/messages/{message}/comments/-"
]
}, {
field_path : "user",
patterns : [ "users/{user}" ]
} ]
};
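The wildcard rules discussed above can be made concrete with a small Python sketch. It is a simplified model, not the Goten matcher: the function matches is hypothetical, and it assumes that a "-" segment in an allowed pattern accepts either a wildcard or a concrete ID, while a "{...}" segment requires a concrete ID.

```python
def matches(value, allowed_pattern):
    """Check a name/reference filter value against one allowed cache pattern."""
    got = value.split("/")
    want = allowed_pattern.split("/")
    if len(got) != len(want):
        return False
    for g, w in zip(got, want):
        if w == "-":
            continue  # wildcard permitted here; a concrete ID is fine too
        if w.startswith("{") and w.endswith("}"):
            if g == "-":
                return False  # this segment must be a concrete ID
            continue
        if g != w:
            return False  # collection segment (e.g. "topics") must match exactly
    return True


NAME_PATTERNS = ["messages/-/comments/-", "topics/{topic}/messages/-/comments/-"]


def cacheable(parent):
    # A List parent implies a name filter over all comments under that parent.
    name_filter = parent + "/comments/-"
    return any(matches(name_filter, p) for p in NAME_PATTERNS)


print(cacheable("topics/t1/messages/m1"))   # True
print(cacheable("topics/-/messages/-"))     # False
print(matches("users/-", "users/{user}"))   # False
print(matches("users/-", "users/-"))        # True
```

The second call mirrors the non-cached query from the examples: the topic segment is a wildcard, but the only six-segment pattern demands a concrete topic.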
Going back to our original 4 groups, let’s explain how invalidation works. Suppose that the following resource is created: Comment { name: "topics/t1/messages/m1/comments/c1", user: "users/u1" }.
Goten will need to delete the following cached query sets:
- CACHED QUERY SET { name: "topics/t1/messages/-/comments/-" } (filter group 2)
- CACHED QUERY SET { name: "topics/t1/messages/m1/comments/-" } (filter group 2)
- CACHED QUERY SET { name: "topics/t1/messages/m1/comments/c1" } (filter group 2)
- CACHED QUERY SET { name: "topics/t1/messages/m1/comments/c1" user: "users/u1" } (filter group 4)
- CACHED QUERY SET { name: "topics/t1/messages/m1/comments/-" user: "users/u1" } (filter group 4)
- CACHED QUERY SET { name: "topics/t1/messages/-/comments/-" user: "users/u1" } (filter group 4)
You can notice that more than one cached query set may belong to the same filter group - they differ only in whether a segment is a wildcard or a specified message. All cached query sets are generated from the created comment. If the topic/message/user was different, then we would also have different query sets.
We can say that we have: 2 query field groups, multiplied by 2 patterns for the name field, multiplied by 1 pattern for the user field, multiplied by 3 wildcard variants of the name pattern. This gives 12 cached query sets for 4 filter groups.
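The list of invalidated query sets for the created comment can be derived mechanically. The following Python sketch shows one way to do so; it is a hypothetical reconstruction from the example, not Goten's algorithm. It assumes that name variants are produced by progressively replacing the wildcard-capable IDs with "-", starting from the end of the name, and that reference fields such as user are used with their concrete value only.

```python
def name_variants(name, allowed_pattern):
    """Return name filters, widest first, that cover a concrete resource name."""
    ids = name.split("/")
    pat = allowed_pattern.split("/")
    if len(ids) != len(pat):
        return []
    # Collection segments sit at even positions and must match the pattern.
    if any(ids[i] != pat[i] for i in range(0, len(pat), 2)):
        return []
    wildcard_positions = [i for i, seg in enumerate(pat) if seg == "-"]
    variants = []
    for count in range(len(wildcard_positions), -1, -1):
        segs = list(ids)
        # Replace the last `count` wildcard-capable IDs with "-".
        for pos in wildcard_positions[len(wildcard_positions) - count:]:
            segs[pos] = "-"
        variants.append("/".join(segs))
    return variants


def invalidated_query_sets(resource, queries, name_patterns):
    """List the cached query sets touched by creating/updating/deleting a resource."""
    sets = []
    for field_paths in queries:
        for pattern in name_patterns:
            for variant in name_variants(resource["name"], pattern):
                entry = {"name": variant}
                for path in field_paths:
                    if path != "name":
                        entry[path] = resource[path]
                sets.append(entry)
    return sets


QUERIES = [["name"], ["name", "user"]]
NAME_PATTERNS = ["messages/-/comments/-", "topics/{topic}/messages/-/comments/-"]
comment = {"name": "topics/t1/messages/m1/comments/c1", "user": "users/u1"}

for query_set in invalidated_query_sets(comment, QUERIES, NAME_PATTERNS):
    print(query_set)
```

For this comment the sketch yields six query sets: three name variants for each of the two query field groups. The "messages/-/comments/-" pattern contributes nothing because the comment name follows the topics pattern.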
A List/Search query is also classified into query sets. For example, the request
SEARCH { phrase = "Error" parent: "topics/t1/messages/m1" filter: "user = users/u2 AND metadata.tags CONTAINS xxx" }
would be put in the following cached query set:
CACHED QUERY SET { name: "topics/t1/messages/m1/comments/-" user: "users/u2" }
Note that, unlike for resource instances, for actual queries we pick the cached query set with the most fields. Thanks to that, if a comment for a specific user and message is updated, then cached queries for the same message and OTHER users will not be invalidated. It’s worth considering this when designing the proto annotation. If a collection gets a lot of updates in general, we get a lot of invalidations. In that case, it’s worth declaring more query field sets, so we are less affected by the high write rate. The more fields are specified, the less likely an update will cause invalidation.
The last remaining thing to mention regarding cache is what kind of filter conditions are supported. At this moment we cache by two conditions: equality (=) and IN. In other words, the request
SEARCH { phrase = "Error" parent: "topics/t1/messages/m1" filter: "user IN [users/u2, users/u3] AND metadata.tags CONTAINS xxx" }
would be put in the following cached query sets:
CACHED QUERY SET { name: "topics/t1/messages/m1/comments/-" user: "users/u2" }
CACHED QUERY SET { name: "topics/t1/messages/m1/comments/-" user: "users/u3" }
Note that IN queries have a bigger chance of invalidation, because an update of a comment from either of the 2 users would cause invalidation. But it’s still better than all users.
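The classification of a request into cached query sets can be sketched as follows. This Python snippet is an assumption-laden illustration: the function query_sets_for_request and its inputs are hypothetical, the filter is taken as already parsed into equality/IN conditions, and the name filter is assumed to have passed the pattern check.

```python
from itertools import product


def query_sets_for_request(name_filter, conditions, queries):
    """Map a List/Search request to cached query sets.

    conditions maps a field path to its accepted values: one value for an
    equality condition, several for IN. Other operators (such as CONTAINS)
    are simply left out, since they do not take part in cache keys.
    """
    usable = [q for q in queries if all(f == "name" or f in conditions for f in q)]
    if not usable:
        return []
    # Pick the query field group with the most fields, so invalidation is narrowest.
    best = max(usable, key=len)
    extra = [f for f in best if f != "name"]
    sets = []
    for values in product(*(conditions[f] for f in extra)):
        entry = {"name": name_filter}
        entry.update(zip(extra, values))
        sets.append(entry)
    return sets


QUERIES = [["name"], ["name", "user"]]
NAME_FILTER = "topics/t1/messages/m1/comments/-"

print(query_sets_for_request(NAME_FILTER, {"user": ["users/u2"]}, QUERIES))
print(query_sets_for_request(NAME_FILTER, {"user": ["users/u2", "users/u3"]}, QUERIES))
print(query_sets_for_request(NAME_FILTER, {}, QUERIES))
```

An IN condition with two users expands into two query sets, which is why such a query is invalidated by writes from either user.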
Search Indices
If the search feature was enabled in the API specification for a given resource, it is necessary to add a search annotation to that resource to make it work.
We need to tell:
- Which fields should be fully searchable
- Which fields should be sortable
- Which fields should be filter-able only
We can define each of those field groups via the search specification in the resource. For example, let’s define a search spec for an imaginary resource called “Message” (should be easy to understand):
message Message {
option (google.api.resource) = {
type : "forum.edgelq.com/Message"
pattern : "messages/{message}"
pattern : "topics/{topic}/messages/{message}"
};
option (goten.annotations.search) = {
fully_searchable : [
"name", // Name is also a string
"user", // Some reference field (still string)
"content", // string
"metadata.labels", // map<string, string>
"metadata.annotations", // map<string, string>
"metadata.tags" // []string
]
filterable_only : [
"views_count", // integer
"metadata.create_time" // timestamp
]
sortable : [
"views_count", // integer
"metadata.create_time" // timestamp
]
};
}
Fully searchable fields will be text-indexed AND filterable. They support not only string fields (name, content, user) but also more complex structures that contain strings internally (metadata tags, annotations, labels). Generally, though, they should focus on strings. Filterable-only fields, on the other hand, can contain non-string elements like numbers, timestamps, booleans, etc. They will not be text-indexed, but can still be used in filters. As a general rule, developers should put string fields (and objects with strings) in the fully searchable category and everything else in “filterable only”. Sortable fields are self-explanatory: they enable sorting by specific fields in both directions. However, an actual query can sort by only one field at a time.
Search backend in use may be different from service to service. However, it is the responsibility of the developer to ensure that their chosen backend will support ALL declared search annotations for all relevant resources.
API Group Protobuf Files
A SPEKTRA Edge-based service in a specific version is represented by a single protobuf package. It contains multiple API groups, each containing a set of gRPC methods. By default, Goten creates one API group per resource, with a name equal to that of the resource. By default, it contains CRUD actions, but the developer can add custom ones too in the API-skeleton file.
Files created by goten-bootstrap for each API group are the following:
- <api_name>_service.proto: This file contains the definition of an API object with its actions from api-skeleton (with CRUD if applicable).
- <api_name>_custom.proto: This file will contain definitions of requests/responses for custom actions. Each object contains a TODO section because, again, this is something that goten cannot fully provide. Those custom files are created only when there are custom actions in the first place.
Files <api_name>_service.proto are generated each time goten-bootstrap is invoked, but <api_name>_custom.proto is generated for the first time only. If you, for example, add a custom action after the file exists, the request/response pair will not be generated. Instead, you will need to either rename (temporarily) the existing file or add the full objects manually. It is not a big issue, however, because code-gen just provides empty messages with optionally a single field inside, and a TODO section to populate the rest of the request/response body.
All API groups within the same service will of course share the same endpoint. They will just have different paths, and generated code will be packaged per API.
Files ending with _service.proto are worth inspecting for beginners, or for debugging/verification, as they contain action annotations that influence how the request is executed. Consider this example (snippet from inventory manager):
rpc ListReaderAgents(ListReaderAgentsRequest) returns (ListReaderAgentsResponse) {
option (google.api.http) = {
get : "/v1/{parent=projects/*/regions/*}/readerAgents"
};
option (goten.annotations.method) = {
resource : "ReaderAgent"
is_collection : true
is_plural : true
verb : "list"
request_paths : {resource_parent : [ "parent" ]}
response_paths : {resource_body : [ "reader_agents" ]}
};
option (goten.annotations.tx) = {
read_only : true
transaction : NONE
};
option (goten.annotations.multi_region_routing) = {
skip_code_gen_based_routing : false
execute_on_owning_region : false
};
}
This declaration defines:
- What is the request, what is the response.
- gRPC Transcoding via the google.api.http annotation: you can see the HTTP method, URL path, and capture reference. In this example, we could send HTTP GET /v1/projects/p1/regions/us-west2/readerAgents to get a list of agents in project p1, region us-west2. It would set the value of the “parent” field in ListReaderAgentsRequest to projects/p1/regions/us-west2.
- Annotation goten.annotations.method provides basic information (usually self-explanatory). Important fields are request_paths and response_paths: Usage, Auditing, Authorization, and MultiRegion routing depend on these fields, and they need to exist in request/response objects.
- Annotation goten.annotations.tx defines what the transaction middleware does, that is, how the database handle is opened. NONE uses the current connection handle. SNAPSHOT will need a separate session.
- Annotation goten.annotations.multi_region_routing tells how the request is routed and whether code-gen is used for it at all. In this case, since this is a reading request (List), we do not require the request to be executed in the region owning the agents; it can be executed in any region where read-only copies are available.
Note that all of this is copied/derived from the API specification.
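To see how the capture reference in the google.api.http path works, the following Python sketch converts the template from the example into a regular expression and extracts the “parent” value. It only models the simple {field=segments/*} form used here and is not how a real gRPC transcoding gateway is implemented; the helper compile_template is hypothetical.

```python
import re


def compile_template(template):
    """Turn a path template with {field=a/*/b/*} captures into a compiled regex."""
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)=([^}]*)\}", template):
        # Literal text before the capture is matched verbatim.
        parts.append(re.escape(template[pos:m.start()]))
        # Inside the capture, "*" stands for exactly one path segment.
        inner = "/".join(
            "[^/]+" if seg == "*" else re.escape(seg)
            for seg in m.group(2).split("/")
        )
        parts.append("(?P<%s>%s)" % (m.group(1), inner))
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


TEMPLATE = "/v1/{parent=projects/*/regions/*}/readerAgents"
pattern = compile_template(TEMPLATE)

match = pattern.match("/v1/projects/p1/regions/us-west2/readerAgents")
print(match.group("parent"))  # projects/p1/regions/us-west2
print(pattern.match("/v1/projects/p1/readerAgents"))  # None
```

The captured value is what ends up in the parent field of ListReaderAgentsRequest; a path that lacks the regions segment does not match the template at all.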
Service Package Definition
Finally, among generated protobuf files there is one last file wrapping up information about the service package (with one version): <service_name>.proto. It looks like:
// Goten Service InventoryManager
option (goten.annotations.service_pkg) = {
// Human friendly short name
name : "ServiceName"
// We will have meta.goten.com/Service resource with name services/service-name.edgelq.com
domain : "service-name.edgelq.com"
// Current version
version : "v1"
// All imported services
imported_services : {
domain : "imported.edgelq.com"
version : "v1"
proto_pkg : "ntt.imported.v1"
}
};
There can be only one file within a proto package like this.
Goten Protobuf Types and other Annotations
When modeling a service in Goten with protobuf files, it is just required to use normal proto in version 3 syntax. There are additional elements worth mentioning:
Set of custom types (you should have seen many of them in standard CRUD):
message ExampleSet {
// This string must conform to naming pattern of specified resource.
string name_type = 1 [(goten.annotations.type).name.resource = "ResourceName"];
// This string must conform to the naming pattern of specified resource. Also,
// references in Goten are validated against actual resources (if specified within
// resource).
string reference_type = 2 [(goten.annotations.type).reference = {
resource : "ResourceName"
target_delete_behavior : ASYNC_CASCADE_DELETE
}];
// This string must conform to parent naming pattern of specified resource.
string parent_name_type = 3 [(goten.annotations.type).parent_name.resource = "ResourceName"];
// This string contains token used for pagination (list/search/watch queries). Its contents
// are validated into specific value required by ResourceName.
string cursor_type = 4 [(goten.annotations.type).pager_cursor.resource = "ResourceName"];
// This should contain value like "field_name ASC". Field name must exist within specified ResourceName.
string order_by_type = 5 [(goten.annotations.type).order_by.resource = "ResourceName"];
// This should contain string with conditions using AND condition: We support equality conditions (like ==, >),
// IN, CONTAINS, CONTAINS-ANY, NOT IN, IS NULL... some specific queries may be unsupported by underlying
// database though. Field paths used must exist within ResourceName.
string filter_type = 6 [(goten.annotations.type).filter.resource = "ResourceName"];
// This is the only non-string custom type. This annotation forces all values within
// this mask to be valid within ResourceName.
google.protobuf.FieldMask field_mask_type = 7 [(goten.annotations.type).field_mask.resource = "ResourceName"];
}
When modeling resources/requests/responses, it is important to keep input validation in mind, to guard against bugs or malicious input. You should use annotations from here: https://github.com/cloudwan/goten/blob/main/annotations/validate.proto
An example is here: https://github.com/cloudwan/goten/blob/main/compiler/validate/example.proto
As of now, we don’t apply default maximum values for strings (we may in the future), so it is worth considering them upfront.