Auto-Generated Protobuf Files

How to understand the auto-generated service Protobuf files.

Protobuf files describe:

Resources - models, database indices, name patterns, views, etc.
Request/Response object definitions (bodies)
API groups, each with a list of methods
Service package metadata information

Note that you can learn the whole API just by reading the protobuf files.

Example proto files for a 3rd-party app are available at https://github.com/cloudwan/inventory-manager-example/tree/master/proto/v1 and you can also find proto files in the edgelq repository.

Resource protobuf files

For each resource in the API specification, Goten will create 2 protobuf files:

<resource_name>.proto

This file will contain the proto definition of a single resource.
<resource_name>_change.proto

This file will contain the proto definition of the DIFF object of a resource.

The protobuf Change object is used by Watch requests (real-time subscriptions).

Be aware that:

goten-bootstrap will always overwrite <resource_name>_change.proto

You should never write to it.
File <resource_name>.proto is generated only the first time.

If you change anything in the API skeleton later that would affect the proto file, you will need to either update the file manually in the way the bootstrap utility would, or rename the file and let a new one be generated. You will need to copy all manually written modifications back to the newly generated file. Typically it means resource fields and additional import files.

When a resource file is generated for the first time, it will have name and metadata fields, plus special annotations applicable for resources only. You will need to replace TODO sections in the resource.

The first notable annotation is google.api.resource, like:

option (google.api.resource) = {
  type : "inventory-manager.examples.edgelq.com/Site"
  pattern : "projects/{project}/regions/{region}/sites/{site}"
};

You should note that this annotation will always show you a list of all possible name patterns. Whenever you change something later in the API specification (parents or scopeAttributes), you will need to modify this annotation manually.
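To make the role of name patterns more concrete, here is a minimal Python sketch that matches a resource name against a list of patterns and extracts the ID segments. It is only an illustration and not Goten's implementation; the helper names pattern_to_regex and parse_name are hypothetical, and the rule that a placeholder matches any single non-empty segment is a simplifying assumption (it ignores id_pattern).

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn 'projects/{project}/sites/{site}' into a regex with named groups."""
    parts = []
    for segment in pattern.split("/"):
        if segment.startswith("{") and segment.endswith("}"):
            # Assumption: an ID placeholder matches one non-empty path segment.
            parts.append(f"(?P<{segment[1:-1]}>[^/]+)")
        else:
            # Collection segments must match literally.
            parts.append(re.escape(segment))
    return re.compile("^" + "/".join(parts) + "$")

def parse_name(name: str, patterns: list[str]):
    """Return (pattern, ids) for the first matching pattern, or None."""
    for pattern in patterns:
        match = pattern_to_regex(pattern).match(name)
        if match:
            return pattern, match.groupdict()
    return None

# The single pattern from the Site annotation above.
SITE_PATTERNS = ["projects/{project}/regions/{region}/sites/{site}"]

print(parse_name("projects/p1/regions/us-west2/sites/s1", SITE_PATTERNS))
print(parse_name("projects/p1/sites/s1", SITE_PATTERNS))  # no region segment: None
```

If a parent or scope attribute is added later, the pattern list in the annotation has to grow accordingly, otherwise names using the new shape would not match any declared pattern.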

The second, more important annotation is the one provided by Goten, for example:

option (goten.annotations.resource) = {
  id_pattern : "[a-zA-Z0-9_.-]{1,128}"              // This is default value, this is set initially from api-skeleton idPattern param!
  collection : "sites"                              // Always plural and lowerCamelJson
  plural : "sites"                                  // Equal to collection
  parents : "Project"                               // If there are many parents, we will have many "parents:"
  on_parent_deleted_behavior : ASYNC_CASCADE_DELETE // Highly recommended, typical in SPEKTRA Edge
  scope_attributes : "goten.annotations/Region"     // Set for regional resources
  async_deletion : false                            // If set to true, resource will not disappear immediately after deletion.
};

This annotation shows basic properties such as the list of parents, scope attributes, and what happens when a parent is deleted. The parent deletion behavior must always be set for each resource. For SPEKTRA Edge, we recommend cascade deletion, preferably asynchronous. In-transaction deletion is acceptable only if you are certain there will never be more than 10 child resources at once. Children of a project in particular should use asynchronous cascade deletion, since we strive to keep project deletion a smooth process (warning: the SOFT delete option is not implemented yet).

The async_deletion parameter deserves an additional note. When a resource is deleted, by default its record is removed from the database. However, if async_deletion is true, the record stays until all back-references are cleaned up (no resource points at it anymore). In some cases this may take considerable time, for example when deleting a large project.

We recommend setting async_deletion to true for top resources, like Project.

References to other resources

Setting a reference to another resource is straightforward; it follows this pattern:

message SomeResource {
  option (google.api.resource) = { ... };

  option (goten.annotations.resource) = { ... };
  
  string reference_to_resource_from_current_service = 3 [
    (goten.annotations.type).reference = {
      resource: "OtherResource"
      target_delete_behavior : BLOCK
    }
  ];

  string reference_to_resource_from_different_service = 4 [
    (goten.annotations.type).reference = {
      resource: "different.edgelq.com/DifferentResource"
      target_delete_behavior : BLOCK
    }
  ];
}

Note that you always need to specify the target deletion behavior. If you just want to hold a resource name that is not supposed to be a true reference, use the (goten.annotations.type).name.resource annotation instead.

References to resources from different services or different regions will implicitly switch to ASYNC versions of UNSET/CASCADE_DELETE!

Views

Read methods (Get, BatchGet, List, Watch, and Search, if enabled) normally have a field_mask field in their request bodies. The field mask selects which fields should be returned in the response or, in the case of Watch, in the incremental real-time updates. Apart from the field mask, there is another field: view. A view indicates the default field mask that should be applied. If both view and field_mask are specified in a request, their masks are merged.

The following view types are available: NAME, BASIC, DETAIL, and FULL. NAME is a two-element field mask with the fields name and display_name (if display_name is defined in the resource). FULL should be self-explanatory. The other two are undefined by default, and if they are used, they behave like FULL. Developers can define any of the four, even NAME and FULL; the defaults are then simply overridden. This is done with the goten.annotations.resource annotation.

message SomeResource {
  option (goten.annotations.resource) = {
    ...
    views : [
      {
        view : BASIC
        fields : [
          {path : "name"},
          {path : "some_field"},
          {path : "other_field"}
        ]
      },
      {
        view : DETAIL
        fields : [
          {path : "name"},
          {path : "some_field"},
          {path : "other_field"},
          {path : "outer.nested"}
        ]
      }
    ]
  };
}

Note that you need to specify fields using snake_case. You can specify nested fields too.
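The way view and field_mask combine can be sketched in a few lines of Python. This is a simplified model of the rules described in this section, not Goten's code: the function names are hypothetical, the field list (including display_name and outer.other) is invented for the example, and a request is assumed to always carry a view.

```python
# Hypothetical list of all field paths in SomeResource (illustration only).
ALL_FIELDS = ["name", "display_name", "some_field", "other_field",
              "outer.nested", "outer.other"]

# Views declared in the annotation above; NAME and FULL keep their defaults.
DECLARED_VIEWS = {
    "BASIC": ["name", "some_field", "other_field"],
    "DETAIL": ["name", "some_field", "other_field", "outer.nested"],
}

def view_mask(view, declared, all_fields):
    """Resolve a view to its field paths."""
    if view in declared:
        # A declared view overrides any default, even for NAME or FULL.
        return list(declared[view])
    if view == "NAME":
        # Default NAME view: name plus display_name, if the resource has it.
        return [f for f in ("name", "display_name") if f in all_fields]
    # FULL, and BASIC/DETAIL when undeclared, select every field.
    return list(all_fields)

def effective_mask(view, field_mask, declared=DECLARED_VIEWS, all_fields=ALL_FIELDS):
    """Merge the view's mask with the explicit field_mask (order-preserving union)."""
    merged = view_mask(view, declared, all_fields)
    for path in field_mask:
        if path not in merged:
            merged.append(path)
    return merged

print(effective_mask("BASIC", ["outer.nested"]))
print(effective_mask("NAME", []))
```

With the declared views, a BASIC request that also asks for outer.nested ends up with the same paths as DETAIL, which is the merging behavior described above.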

Database indices

List/Watch requests work on a “best effort” basis in principle. However, sometimes indices are needed for performance, or, like in the case of Firestore, to make certain queries even possible.

Database indices are declared in protobuf definitions in each resource. During startup, db-controller runtime uses libraries provided by Goten to ensure indices in protobuf match those in the database. Note that you should not create indices on your own unless for experimentation.

Let’s define some examples. For simplicity, we show just the name patterns and index annotations; the fields can be imagined:

message Device {
  option (google.api.resource) = {
    type : "example.edgelq.com/Device"
    pattern : "projects/{project}/devices/{device}"
  };

  option (goten.annotations.indices) = {
    composite : {
      sorting_groups : [
        {
          name : "byDisplayName",
          order_by : "display_name",
          scopes : [ "projects/{project}/devices/-" ]
        },
        {
          name : "bySerialNumber"
          order_by : "info.serial_number"
          scopes : [
            "projects/-/devices/-",
            "projects/{project}/devices/-"
          ]
        }
      ]
      filters : [
        {
          field_path : "info.model"
          required : true
          restricted_sorting_groups : [ "bySerialNumber" ]
        },
        {
          field_path : "info.maintainer_group"
          reference_patterns : [ "projects/{project}/maintenanceGroups/{maintenanceGroup}" ]
        }
      ]
    }
    single : [ {field_path : "machine_type"} ]
  };
}

There are two index types: single-field and composite. Single-field indices are straightforward: you specify just the field path (it can be nested, using dots), and the index is usable for queries on that field. Composite indices are generated from sorting groups combined with filters.

Composite indices are optimized for sorting - but as of now, only one sorting field is supported. However, if the sorting field is different from the name, then “name” is additionally added, to ensure sorting is stable. In the above example, composite indices can be divided into two groups - those with sorting by display_name, or info.serial_number.

Note that the sorting field path also is usable for filtering, therefore, if you just need a specific composite index for multiple fields for filtering, you can just pick some field that may be optionally used for sorting too. Apart from that, each sorting group has built-in filter support for name fields, for specified patterns only (scopes).

Attached filters can be either required (if such a filter is not specified in a query, the index cannot be used) or optional (each non-required filter doubles the number of generated indices).

Based on the above example, generated composite indices will be:

filter (name.projectId) orderBy (display_name ASC, name.deviceId ASC)
filter (name.projectId) orderBy (display_name DESC, name.deviceId DESC)
filter (name.projectId, info.maintainer_group) orderBy (display_name ASC, name.deviceId ASC)
filter (name.projectId, info.maintainer_group) orderBy (display_name DESC, name.deviceId DESC)
filter (info.model) orderBy (info.serial_number ASC, name.projectId ASC, name.deviceId ASC)
filter (info.model) orderBy (info.serial_number DESC, name.projectId DESC, name.deviceId DESC)
filter (name.projectId, info.model) orderBy (info.serial_number ASC, name.deviceId ASC)
filter (name.projectId, info.model) orderBy (info.serial_number DESC, name.deviceId DESC)
filter (info.model, info.maintainer_group) orderBy (info.serial_number ASC, name.projectId ASC, name.deviceId ASC)
filter (info.model, info.maintainer_group) orderBy (info.serial_number DESC, name.projectId DESC, name.deviceId DESC)
filter (name.projectId, info.model, info.maintainer_group) orderBy (info.serial_number ASC, name.deviceId ASC)
filter (name.projectId, info.model, info.maintainer_group) orderBy (info.serial_number DESC, name.deviceId DESC)

When we sort by display_name, to utilize the composite index, we should also filter by the projectId part of the name field. Additional sorting by name.deviceId part is added implicitly to any order. If we add info.maintainer_group to the filter, we will switch to a different composite index.

If we just filter by display_name (we can use the > or < operators too!) and add a filter on the projectId part of the name, then one of the first two composite indices will be used as well.

When defining indices, be aware of multiplication. Each sorting group yields two indices (ascending and descending); this is multiplied by the number of name patterns (scopes) in the group. Finally, each non-required filter multiplies the number of indices by 2. Here we generated 12 composite indices and 1 single-field one. The number of indices matters for the database in use: Firestore typically allows 200 indices per database, and Mongo 64 per collection.
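The count of 12 can be checked with a short enumeration. The Python sketch below mirrors the Device annotation as plain dictionaries and multiplies out scopes, optional filters, and sort directions. It is an illustrative model rather than the generator Goten uses, and it assumes that restricted_sorting_groups limits a filter to the listed sorting groups (which is what the generated list above suggests).

```python
from itertools import product

# The sorting groups and filters from the Device example, as plain data.
SORTING_GROUPS = [
    {"name": "byDisplayName", "order_by": "display_name",
     "scopes": ["projects/{project}/devices/-"]},
    {"name": "bySerialNumber", "order_by": "info.serial_number",
     "scopes": ["projects/-/devices/-", "projects/{project}/devices/-"]},
]
FILTERS = [
    {"field_path": "info.model", "required": True,
     "restricted_sorting_groups": ["bySerialNumber"]},
    {"field_path": "info.maintainer_group"},
]

def composite_indices(sorting_groups, filters):
    """Enumerate (scope, filter fields, order_by, direction) combinations."""
    indices = []
    for group in sorting_groups:
        # Assumption: an unrestricted filter applies to every sorting group.
        applicable = [
            f for f in filters
            if group["name"] in f.get("restricted_sorting_groups", [group["name"]])
        ]
        required = [f["field_path"] for f in applicable if f.get("required")]
        optional = [f["field_path"] for f in applicable if not f.get("required")]
        for scope in group["scopes"]:
            # Each optional filter is either absent or present: a factor of 2.
            for picks in product([False, True], repeat=len(optional)):
                extra = [path for path, on in zip(optional, picks) if on]
                for direction in ("ASC", "DESC"):
                    indices.append(
                        (scope, tuple(required + extra), group["order_by"], direction)
                    )
    return indices

all_indices = composite_indices(SORTING_GROUPS, FILTERS)
print(len(all_indices))  # 12
```

Adding one more optional filter without restricted_sorting_groups would double the result to 24, so it is worth running this kind of arithmetic before extending the annotation.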

Cache indices

To improve performance & reduce database usage, Goten & SPEKTRA Edge utilize Redis as a database cache.

Service developers should carefully analyze which queries are mostly used, what is the update rate, etc. With goten cache, we support:

Get/BatchGet queries

Caching is done by resource name. Invalidation happens for updated/deleted resources for specific instances.
List/Search queries

We cache by all query params (filter, parent name, order by, page, phrase in case of search, field mask). If a resource is updated/deleted/created, then we invalidate whole cached query groups by filter only. We will explain more with examples.

We don’t support cache for Watch requests.

To enable cache support for a service, you need to:

Provide the cache annotation for each relevant resource in its proto file.
In server code, during initialization, construct store objects with cache; this takes very little code.

Let’s define some indices. For simplicity, we show just the name patterns and the annotations specific to the cache:

message Comment {
  option (google.api.resource) = {
    type : "forum.edgelq.com/Comment"
    pattern : "messages/{message}/comments/{comment}"
    pattern : "topics/{topic}/messages/{message}/comments/{comment}"
  };

  option (goten.annotations.cache) = {
    queries : [
      {eq_field_paths : ["name"]},
      {eq_field_paths : ["name", "user"]}
    ]
    query_reference_patterns : [{
      field_path : "name",
      patterns : [
        "messages/-/comments/-",
        "topics/{topic}/messages/-/comments/-"
      ]
    }]
  };
}

By default, Goten generates this proto annotation for every resource when the resource is initiated for the first time, but a very minimal one, with the index for the name field only.

We will support caching for:

Get/BatchGet requests

It is enabled by default, and the goten.annotations.cache annotation only provides a way to disable it. Users do not need to do anything here.
List/Search queries whose filter/parent SATISFY one of the following filter conditions:
- Group 1: name = "messages/-/comments/-"
- Group 2: name = "topics/{topicId}/messages/-/comments/-"
- Group 3: name = "messages/-/comments/-" AND user = "users/{userId}"
- Group 4: name = "topics/{topicId}/messages/-/comments/-" AND user = "users/{userId}"

Since caching by exact name is very simple, we will be discussing only list/search queries.

We have 4 filter groups. This is because:

We have 2 query sets: one for name, and the other for name with user.
The name field has 2 name patterns.

Multiplying 2 by 2 gives 4.

As a reminder, the presence of the “parent” field in List/Search requests already implies that the final filter will contain the “name” field.

Let’s look at some example queries, and then at how invalidation works. The following queries are cacheable:

LIST { parent = 'topics/t1/messages/m1' filter = '' }

It will belong to group 2.
LIST { parent = 'topics/t1/messages/-' filter = '' }

It will belong to group 2.
LIST { parent = 'messages/-' filter = '' }

It will belong to group 1.
LIST { parent = 'messages/m1' filter = '' }

It will belong to group 1.
LIST { parent = 'topics/t1/messages/m1' filter = 'user="users/u1"' }

It will belong to groups 2 and 4.
LIST { parent = 'topics/t1/messages/-' filter = 'user="users/-"' }

It will belong to group 2.

This query will not be cached: LIST { parent = 'topics/-/messages/-' filter = '' }
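The classification of the queries above can be reproduced with a small Python model. It is a sketch under assumptions, not Goten's matching code: the helper names are hypothetical, the default user pattern "users/{user}" is assumed, and the rule used is that a "-" in a pattern accepts either a wildcard or a concrete ID, while a "{...}" placeholder requires a concrete ID.

```python
NAME_PATTERNS = ["messages/-/comments/-", "topics/{topic}/messages/-/comments/-"]
USER_PATTERNS = ["users/{user}"]  # assumed default when no pattern is declared for user
QUERIES = [["name"], ["name", "user"]]

def matches(pattern, value):
    """Check a query value such as 'topics/t1/messages/-/comments/-' against a pattern."""
    p_segs, v_segs = pattern.split("/"), value.split("/")
    if len(p_segs) != len(v_segs):
        return False
    for i, (p, v) in enumerate(zip(p_segs, v_segs)):
        if i % 2 == 0:
            if p != v:  # collection segments must be identical
                return False
        elif p != "-" and v == "-":
            return False  # a {placeholder} does not accept a wildcard
    return True

def filter_groups(parent, user=None):
    """Return the filter group numbers (1-4) a List query on Comment belongs to."""
    name = parent + "/comments/-"  # the parent implies a filter on the name field
    groups = []
    for q_idx, fields in enumerate(QUERIES):
        if "user" in fields:
            if user is None or not any(matches(p, user) for p in USER_PATTERNS):
                continue
        for p_idx, pattern in enumerate(NAME_PATTERNS):
            if matches(pattern, name):
                groups.append(q_idx * len(NAME_PATTERNS) + p_idx + 1)
    return groups

print(filter_groups("topics/t1/messages/m1", "users/u1"))  # [2, 4]
print(filter_groups("topics/t1/messages/-", "users/-"))    # [2]
print(filter_groups("topics/-/messages/-"))                # [] (not cached)
```

An empty result corresponds to a query that is not cached, as in the last example with a wildcard topic.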

Note that a single query may belong to more than one group. Also note that groups 3 and 4, which require a user, must be given a full user reference without wildcards. If we also wanted to enable caching for user wildcards, we would need to provide the following annotation:

 option (goten.annotations.cache) = {
   queries : [
     {eq_field_paths : [ "name" ]},
     {eq_field_paths : [ "name", "user" ]}
   ]
   query_reference_patterns : [ {
     field_path : "name",
     patterns : [
       "messages/-/comments/-",
       "topics/{topic}/messages/-/comments/-"
     ]
   }, {
     field_path : "user",
     patterns : [ "users/-" ]
   } ]
 };

The param that decides to what degree wildcards are allowed is query_reference_patterns. This param is effectively “present” for every name/reference field of the resource that appears in the queries param: if the developer does not provide it, Goten assumes a default. That default allows ALL name patterns, with only the last segment of the name field allowed to be a wildcard. In other words, the following annotations are equivalent:

 option (goten.annotations.cache) = {
   queries : [
     {eq_field_paths : [ "name" ]},
     {eq_field_paths : [ "name", "user" ]}
   ]
 };


 option (goten.annotations.cache) = {
   queries : [
     {eq_field_paths : [ "name" ]},
     {eq_field_paths : [ "name", "user" ]}
   ]
   query_reference_patterns : [ {
     field_path : "name",
     patterns : [
       "messages/{message}/comments/-",
       "topics/{topic}/messages/{message}/comments/-"
     ]
   }, {
     field_path : "user",
     patterns : [ "users/{user}" ]
   } ]
 };

Going back to our original 4 groups, let’s explain how invalidation works. Suppose that the following resource is created: Comment { name: “topics/t1/messages/m1/comments/c1” user: “users/u1” }.

Goten will need to delete the following cached query sets:

CACHED QUERY SET { name: “topics/t1/messages/-/comments/-” }

filter group 2
CACHED QUERY SET { name: “topics/t1/messages/m1/comments/-” }

filter group 2
CACHED QUERY SET { name: “topics/t1/messages/m1/comments/c1” }

filter group 2
CACHED QUERY SET { name: “topics/t1/messages/m1/comments/c1” user: “users/u1” }

filter group 4
CACHED QUERY SET { name: “topics/t1/messages/m1/comments/-” user: “users/u1” }

filter group 4
CACHED QUERY SET { name: “topics/t1/messages/-/comments/-” user: “users/u1” }

filter group 4

Notice that several cached query sets may belong to the same filter group; they differ only in which name segments are wildcards and which are specified. All of these cached query sets are derived from the created comment. If the topic/message/user were different, we would have different query sets as well.

In total, we have 2 query field groups, multiplied by 2 patterns for the name field, multiplied by 1 pattern for the user field, multiplied by 3 wildcard variants of the name pattern. This gives 12 cached query sets for 4 filter groups.
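The six cached query sets listed above for the created comment can be derived mechanically. The Python sketch below is an illustrative model only (helper names are hypothetical): it assumes that wildcard positions of the matching name pattern are filled in from left to right, which is the rule suggested by the list above, and that the caller already knows which name pattern the resource matches.

```python
QUERIES = [["name"], ["name", "user"]]

def name_variants(name, pattern):
    """Return query names from least to most specific for one resource name."""
    p_segs, n_segs = pattern.split("/"), name.split("/")
    wildcard_positions = [i for i, seg in enumerate(p_segs) if seg == "-"]
    variants = []
    for specified in range(len(wildcard_positions) + 1):
        segs = list(n_segs)
        # Keep the first `specified` wildcard positions concrete, blank the rest.
        for pos in wildcard_positions[specified:]:
            segs[pos] = "-"
        variants.append("/".join(segs))
    return variants

def invalidated_sets(resource, pattern, queries=QUERIES):
    """Cached query sets to drop when `resource` is created, updated, or deleted."""
    sets = []
    for fields in queries:
        extra = {f: resource[f] for f in fields if f != "name"}
        for variant in name_variants(resource["name"], pattern):
            sets.append({"name": variant, **extra})
    return sets

comment = {"name": "topics/t1/messages/m1/comments/c1", "user": "users/u1"}
for query_set in invalidated_sets(comment, "topics/{topic}/messages/-/comments/-"):
    print(query_set)
```

Running it prints three sets with only a name (filter group 2) and three sets with name and user (filter group 4), matching the list above.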

A List/Search query is also classified into cached query sets. For example, a request SEARCH { phrase = “Error” parent: “topics/t1/messages/m1” filter: “user = users/u2 AND metadata.tags CONTAINS xxx” } would be put in the following cached query set: CACHED QUERY SET { name: “topics/t1/messages/m1/comments/-” user: “users/u2” }

Note that, unlike for resource instances, an actual query is assigned to the most specific cached query set available (the one with the most fields). Thanks to that, when a comment for a specific user and message is updated, cached queries for the same message and OTHER users are not invalidated. This is worth considering when designing the proto annotation. If a collection gets a lot of updates, it also gets a lot of invalidations. In that case, it is worth declaring more query field sets, so that caching is less affected by the high write rate. The more fields are specified, the less likely it is that an update will cause invalidation.

The last remaining thing to mention regarding cache is what kind of filter conditions are supported. At this moment we cache by two conditions: Equality (=) and IN. In other words, request SEARCH { phrase = “Error” parent: “topics/t1/messages/m1” filter: “user IN [users/u2, users/u3] AND metadata.tags CONTAINS xxx” } would be put in the following cached query sets:

CACHED QUERY SET { name: “topics/t1/messages/m1/comments/-” user: “users/u2” }
CACHED QUERY SET { name: “topics/t1/messages/m1/comments/-” user: “users/u3” }

Note that IN queries have a bigger chance of invalidation, because an update of a comment from either of the two users causes invalidation. But this is still better than being invalidated by updates from all users.
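How a query maps to cached query sets, including the expansion of IN conditions, can be sketched as follows. This Python model is an assumption-laden illustration, not Goten's code: the function name is hypothetical, only = and IN conditions are passed in (other conditions such as CONTAINS are simply left out by the caller), and name pattern checks are omitted for brevity.

```python
from itertools import product

QUERIES = [["name"], ["name", "user"]]

def cached_query_sets(parent, conditions, queries=QUERIES):
    """Map a List/Search query on Comment to its cached query sets.

    `conditions` maps a field path to a list of values: one value for '=',
    several values for IN.
    """
    name = parent + "/comments/-"
    # Keep only the field sets whose non-name fields are all constrained.
    usable = [q for q in queries if all(f == "name" or f in conditions for f in q)]
    if not usable:
        return []
    # Pick the most specific field set (the one with the most fields).
    best = max(usable, key=len)
    extra_fields = [f for f in best if f != "name"]
    sets = []
    # An IN condition produces one cached query set per listed value.
    for values in product(*(conditions[f] for f in extra_fields)):
        sets.append({"name": name, **dict(zip(extra_fields, values))})
    return sets

print(cached_query_sets("topics/t1/messages/m1", {"user": ["users/u2", "users/u3"]}))
print(cached_query_sets("topics/t1/messages/m1", {}))
```

The first call yields the two sets shown above, one per user; the second falls back to the name-only set, which any comment change under that message invalidates.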

Search Indices

If the search feature is enabled in the API specification for a given resource, the resource also needs a search annotation for it to work.

We need to tell:

Which fields should be fully searchable
Which fields should be sortable
Which fields should be filter-able only

We can define each of these field groups via the search specification in the resource. For example, let’s define a search spec for an imaginary resource called “Message” (it should be easy to understand):

message Message {
 option (google.api.resource) = {
   type : "forum.edgelq.com/Message"
   pattern : "messages/{message}"
   pattern : "topics/{topic}/messages/{message}"
 };

 option (goten.annotations.search) = {
  fully_searchable : [
   "name",                 // Name is also a string
   "user",                 // Some reference field (still string)
   "content",              // string
   "metadata.labels",      // map<string, string>
   "metadata.annotations", // map<string, string>
   "metadata.tags"         // []string
  ]
  filterable_only : [
   "views_count",          // integer
   "metadata.create_time"  // timestamp
  ]
  sortable : [
   "views_count",          // integer
   "metadata.create_time"  // timestamp
  ]
 };
}

Fully searchable fields are text-indexed AND filterable. They are not limited to plain string fields (name, content, user); they can also be more complex structures that contain strings internally (metadata tags, annotations, labels). Generally, though, they should focus on strings. Filterable-only fields, on the other hand, can contain non-string values like numbers, timestamps, booleans, etc. They are not text-indexed, but can still be used in filters. As a general rule, developers should put string fields (and objects with strings) in the fully searchable category, and everything else in “filterable only”. Sortable fields are self-explanatory: they enable sorting by specific fields in both directions. However, an actual query can sort by only one field at a time.
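The rules above can be expressed as a small request check. The following Python sketch is hypothetical and only models this section's rules; it is not part of Goten or of any search backend. The spec is an abbreviated copy of the Message annotation (only metadata.create_time is kept as a sortable field), and the order_by format "field_name ASC" follows the custom type description later in this document.

```python
# Abbreviated copy of the Message search spec (illustration only).
SEARCH_SPEC = {
    "fully_searchable": ["name", "user", "content", "metadata.labels",
                         "metadata.annotations", "metadata.tags"],
    "filterable_only": ["views_count", "metadata.create_time"],
    "sortable": ["metadata.create_time"],
}

def validate_search(spec, filter_fields, order_by=None):
    """Return a list of problems with a search request; empty means it is fine."""
    errors = []
    # Both fully searchable and filterable-only fields may appear in filters.
    filterable = set(spec["fully_searchable"]) | set(spec["filterable_only"])
    for field in filter_fields:
        if field not in filterable:
            errors.append(f"field {field!r} cannot be used in a filter")
    if order_by:
        sort_fields = [part.split()[0] for part in order_by.split(",") if part.strip()]
        if len(sort_fields) > 1:
            errors.append("only one sort field is supported per query")
        for field in sort_fields:
            if field not in spec["sortable"]:
                errors.append(f"field {field!r} is not sortable")
    return errors

print(validate_search(SEARCH_SPEC, ["user", "views_count"], "metadata.create_time DESC"))
print(validate_search(SEARCH_SPEC, ["content"], "content ASC"))
```

The first call returns an empty list; the second reports that content is not sortable, even though it is fully searchable.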

Search backend in use may be different from service to service. However, it is the responsibility of the developer to ensure that their chosen backend will support ALL declared search annotations for all relevant resources.

API Group Protobuf Files

A SPEKTRA Edge-based service in a specific version is represented by a single protobuf package. It contains multiple API groups, each containing a set of gRPC methods. By default, Goten creates one API group per resource, named after that resource. By default, it contains CRUD actions, but the developer can add custom ones in the api-skeleton file.

Files created by goten-bootstrap for each API group are the following:

<api_name>_service.proto

This file contains the definition of an API object with its actions from api-skeleton (with CRUD if applicable).
<api_name>_custom.proto

This file will contain definitions of requests/responses for custom actions. Each object contains a TODO section because again, this is something that goten cannot fully provide. Those custom files are created only when there are custom actions in the first place.

Files <api_name>_service.proto are generated each time goten-bootstrap is invoked, but <api_name>_custom.proto is generated the first time only. If, for example, you add a custom action after the file exists, the request/response pair will not be generated. Instead, you will need to either (temporarily) rename the existing file or add the full objects manually. This is not a big issue, however, because code-gen just provides empty messages with, optionally, a single field inside, and a TODO section to populate the rest of the request/response body.

All API groups within the same service will of course share the same endpoint, they will just have different paths and generated code will be packaged per API.

Files ending with _service.proto are worth inspecting for beginners, or for debugging/verification, as they contain the action annotations that influence how a request is executed. Consider this example (a snippet from the inventory manager):

rpc ListReaderAgents(ListReaderAgentsRequest) returns (ListReaderAgentsResponse) {
  option (google.api.http) = {
    get : "/v1/{parent=projects/*/regions/*}/readerAgents"
  };
  option (goten.annotations.method) = {
    resource : "ReaderAgent"
    is_collection : true
    is_plural : true
    verb : "list"
    request_paths : {resource_parent : [ "parent" ]}
    response_paths : {resource_body : [ "reader_agents" ]}
  };
  option (goten.annotations.tx) = {
    read_only : true
    transaction : NONE
  };
  option (goten.annotations.multi_region_routing) = {
    skip_code_gen_based_routing : false
    execute_on_owning_region : false
  };
}

This declaration defines:

What is the request, what is the response
gRPC Transcoding via google.api.http annotation

You can see the HTTP method, the URL path, and the captured reference. In this example, we could send HTTP GET /v1/projects/p1/regions/us-west2/readerAgents to get a list of agents in project p1, region us-west2. It would set the value of the “parent” field in ListReaderAgentsRequest to projects/p1/regions/us-west2.
Annotation goten.annotations.method provides basic information (usually self-explanatory). Important fields are those for request_paths and response_paths

Usage, Auditing, Authorization, and MultiRegion routing depend on these fields, and they need to exist in request/response objects.
Annotation goten.annotations.tx defines what the transaction middleware does

How the database handle is opened. NONE uses the current connection handle. SNAPSHOT will need a separate session.
Annotation goten.annotations.multi_region_routing tells how the request is routed and if code-gen is used for it at all.

In this case, since this is a reading request (List), we do not require a request to be executed on the region owning agents, it can be executed in the region where read-only copies are also available.

Note that all of this is copied/derived from the API specification.
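To illustrate how the path template in google.api.http captures the parent field, here is a minimal Python sketch. It handles only the "{field=pattern}" form with "*" as a single-segment wildcard, as used in the snippet above, and is not a full implementation of gRPC transcoding; the function names are hypothetical.

```python
import re

def compile_template(template):
    """Compile a path template such as '/v1/{parent=projects/*/regions/*}/readerAgents'."""
    regex, pos = "", 0
    for var in re.finditer(r"\{(\w+)=([^}]+)\}", template):
        # Literal text before the variable must match exactly.
        regex += re.escape(template[pos:var.start()])
        # Inside the variable, '*' matches exactly one path segment.
        segments = ("[^/]+" if s == "*" else re.escape(s) for s in var.group(2).split("/"))
        regex += f"(?P<{var.group(1)}>{'/'.join(segments)})"
        pos = var.end()
    regex += re.escape(template[pos:])
    return re.compile("^" + regex + "$")

def extract_fields(template, path):
    """Return the captured request fields for a URL path, or None if it does not match."""
    match = compile_template(template).match(path)
    return match.groupdict() if match else None

TEMPLATE = "/v1/{parent=projects/*/regions/*}/readerAgents"
print(extract_fields(TEMPLATE, "/v1/projects/p1/regions/us-west2/readerAgents"))
# {'parent': 'projects/p1/regions/us-west2'}
print(extract_fields(TEMPLATE, "/v1/projects/p1/readerAgents"))
# None
```

The captured value is exactly what ends up in the parent field of ListReaderAgentsRequest in the example above.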

Service Package Definition

Finally, among the generated protobuf files there is one last file that wraps up information about the service package (with one version): <service_name>.proto. It looks like:

// Goten Service InventoryManager
option (goten.annotations.service_pkg) = {
  // Human friendly short name
  name : "ServiceName"
  
  // We will have meta.goten.com/Service resource with name services/service-name.edgelq.com
  domain : "service-name.edgelq.com"

  // Current version
  version : "v1"
  
  // All imported services
  imported_services : {
    domain : "imported.edgelq.com"
    version : "v1"
    proto_pkg : "ntt.imported.v1"
  }
};

There can be only one such file within a proto package.

Goten Protobuf Types and other Annotations

When modeling a service in Goten with protobuf files, you only need to use regular proto3 syntax. A few additional elements are worth mentioning:

Set of custom types (you should have seen many of them in standard CRUD):

message ExampleSet {
  // This string must conform to naming pattern of specified resource.
  string name_type = 1 [(goten.annotations.type).name.resource = "ResourceName"];
  
  // This string must conform to the naming pattern of specified resource. Also,
  // references in Goten are validated against actual resources (if specified within
  // resource).
  string reference_type = 2 [(goten.annotations.type).reference = {
    resource : "ResourceName"
    target_delete_behavior : ASYNC_CASCADE_DELETE
  }];
  
  // This string must conform to parent naming pattern of specified resource.
  string parent_name_type = 3 [(goten.annotations.type).parent_name.resource = "ResourceName"];
  
  // This string contains token used for pagination (list/search/watch queries). Its contents
  // are validated into specific value required by ResourceName.
  string cursor_type = 4 [(goten.annotations.type).pager_cursor.resource = "ResourceName"];
  
  // This should contain value like "field_name ASC". Field name must exist within specified ResourceName.
  string order_by_type = 5 [(goten.annotations.type).order_by.resource = "ResourceName"];
  
  // This should contain a string with conditions joined by AND. We support comparison conditions (like ==, >),
  // IN, CONTAINS, CONTAINS-ANY, NOT IN, IS NULL... some specific queries may be unsupported by the underlying
  // database though. Field paths used must exist within ResourceName.
  string filter_type = 6 [(goten.annotations.type).filter.resource = "ResourceName"];
  
  // This is the only non-string custom type. This annotation forces all values within
  // this mask to be valid within ResourceName.
  google.protobuf.FieldMask field_mask_type = 7 [(goten.annotations.type).field_mask.resource = "ResourceName"];
}

When modeling resources/requests/responses, it is important to keep input validation in mind, to guard against bugs and malicious input. You should use annotations from here: https://github.com/cloudwan/goten/blob/main/annotations/validate.proto

An example is here: https://github.com/cloudwan/goten/blob/main/compiler/validate/example.proto

As of now, we don’t apply default maximum string lengths (we may in the future), so it is worth considering them upfront.