This document provides instructions on how this bootstrap utility works and by extension, helps you contribute here.
Goten as a Compiler
- 1: goten-bootstrap
- 2: Goten Protobuf Compilers
- 3: Abstract Syntax Tree
- 4: Goten protobuf-go Extension
- 5: Goten TypeScript compiler
1 - goten-bootstrap
The goten-bootstrap utility is a tool that generates proto files from
the specification file, also known as api-skeleton. In the goten
repository, you can find the api-skeleton schema in
annotations/bootstrap.proto, with the JSON schema generated from it in
schemas/api-skeleton.schema.json. This is the place to modify the input
to bootstrap.
The runtime entry point (main.go) can be found in the cmd/goten-bootstrap
directory. It imports the package in the compiler/bootstrap directory,
which contains pretty much the whole code of the goten-bootstrap utility.
This is the place to explore if you want to modify the generated protobuf
files.
In main.go you can see two primary steps:
- Initialize the Service package object and pass it to the generator.
  During initialization, we validate input, populate defaults, and deduce all values.
- It then attaches all implicit API groups to each resource.
First, look at the Generator object initialized with NewGenerator.
The relevant file is compiler/bootstrap/generate.go, which contains
the ServiceGenerator struct with a single public method, Generate.
It takes a parsed, validated, and initialized Service object
(as described in the service.go file), then just generates all
relevant files, with API groups and resources, using regular for loops.
Template protobuf files are all in the tmpl subdirectory.
See the initTmpls function of ServiceGenerator: it collects all
template files as giant strings, parses them, and adds a set of
functions that can be used within {{ }}. Those big strings are
“render-able” objects; see https://pkg.go.dev/text/template for more
details, though they are mostly self-explanatory. In those template
strings, you will often see:
- {{functionName <ARGS>}}
  The first word is a function. It may be built-in, like define, range, or if, or it may be a function we provided. In initTmpls you can see functions like uniqueResources, formatActionReplacement, etc. Those are our functions. They may take arguments.
- {{$variable}}
  This variable must be initialized somewhere using the := operator. Those are Golang objects under the hood! You can access sub-fields with dots (.), or even call functions (but without arguments).
- {{.}}
  This is a special kind of “current” active variable. At any given moment only one variable may be active. You may save its properties into regular variables, like {{ $otherVar := .Field1.Field2 }}.
- With {{ or }} you may see dashes: {{- or -}}.
  Their purpose is to remove whitespace (typically a newline) before or after them, respectively. It makes the output nicer, but may occasionally render code non-compilable.
In generate.go, see svcgen.tmpl.ExecuteTemplate(file, tmplName, data).
The first argument is the file writer object where the protobuf file will
be generated. The second argument is a string, for example,
resourceSchemaFile. The third argument is the active variable that can
be accessed as {{.}}, which we mentioned above. For example, you should
see the following piece of code there:
if err := svcgen.genFile(
	"resourceSchemaFile",
	resource.Service.Proto.Package.CurrentVersion,
	fileName,
	resource,
	svcgen.override,
); err != nil {
	return fmt.Errorf("error generating resource file %s: %s", fileName, err)
}
The function genFile passes resourceSchemaFile as the second argument
to tmpl.ExecuteTemplate, and the resource object is passed as the last
argument to tmpl.ExecuteTemplate. This resource object is of type
Resource, which you can see in the file resource.go.
How Golang templates are executed: the runtime will try to find the following piece of the template:
{{ define "resourceSchemaFile" }} ... {{ end }}
In this instance, you can find it in the file tmpl/resource.tmpl.go,
which starts with:
package tmpl
// language=gohtml
const ResourceTmplString = `
{{- define "resourceSchemaFile" -}}
{{- /*gotype: github.com/cloudwan/goten/annotations/bootstrap.Resource*/ -}}
{{- $resource := . }}
... stuff here....
{{ end }}
By convention, we try to state what kind of object was passed as the
dot (.) under define, at least for main templates. Since range loops
override the dot value, to avoid losing the resource reference (and for
clarity), we often save the current dot into a named variable.
Golang renders from the beginning of define until it reaches the matching
{{ end }}. When it sees {{ template "..." <ARG> }}, it calls another
define and passes ARG as the next “dot”. To pass multiple arguments,
we often provide a dictionary using the dict function:
{{ template "... name ..." dict <KEY1> <VALUE1> ... }}. Dict accepts N
arguments and just makes a single object. You can see that we implemented
this function in initTmpls! The final generated string is written to the
specified file writer. This is how all protobuf files are generated.
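To make the define / template / dict interplay concrete, here is a small sketch built on the standard text/template package. The dict function below is our own illustrative version, not the one from initTmpls, and the template names field and messageFile as well as the Resource type are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"text/template"
)

// Resource is a hypothetical stand-in for the bootstrap Resource type.
type Resource struct{ Name string }

// dict packs alternating key/value arguments into one map, so a nested
// template can receive several values through its single dot.
func dict(kv ...any) (map[string]any, error) {
	if len(kv)%2 != 0 {
		return nil, errors.New("dict: odd number of arguments")
	}
	m := make(map[string]any, len(kv)/2)
	for i := 0; i < len(kv); i += 2 {
		key, ok := kv[i].(string)
		if !ok {
			return nil, fmt.Errorf("dict: key %v is not a string", kv[i])
		}
		m[key] = kv[i+1]
	}
	return m, nil
}

const tmplText = `
{{- define "field" -}}
{{ .type }} {{ .name }} = {{ .number }};
{{- end -}}
{{- define "messageFile" -}}
message {{ .Name }} {
  {{ template "field" dict "type" "string" "name" "name" "number" 1 }}
}
{{ end -}}
`

// render looks up the named define block and executes it with data as dot.
func render(tmplName string, data any) (string, error) {
	t, err := template.New("root").Funcs(template.FuncMap{"dict": dict}).Parse(tmplText)
	if err != nil {
		return "", fmt.Errorf("parsing templates: %w", err)
	}
	var sb strings.Builder
	if err := t.ExecuteTemplate(&sb, tmplName, data); err != nil {
		return "", fmt.Errorf("executing %s: %w", tmplName, err)
	}
	return sb.String(), nil
}

func main() {
	out, err := render("messageFile", Resource{Name: "Device"})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Inside the field template, the dot is the map built by dict, which is why its keys are read as .type, .name, and .number.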
Note that ServiceGenerator skips certain templates depending on the overrideFile argument. This is why resources and custom files for API groups are generated only once, to avoid overriding developer code. Perhaps in the future, we should be able to do some merging. That’s all regarding the generation part.
Also very important is parsing the api-skeleton service package schema
and wrapping it with the Service object as defined in the service.go
file. Note that the YAML in api-skeleton follows the definition of a
Service not from compiler/bootstrap/service.go, but from the
annotations/bootstrap/bootstrap.pb.go file. See the function
ParseServiceSkeletonFiles in compiler/bootstrap/utils.go. It loads
base bootstrap objects from YAML and parses them according to the protobuf
definition, and then we wrap them with the proper Service object. After
we load all Service objects (including the next version and imported ones),
we call the Init function of a Service. This is where we validate
all input properly, and where we fill in default values missing from
api-skeleton. The largest example is the function InitMainApi, which is
called from service.go for each resource owned by the service. It adds
our implicit APIs with full CRUD methods; it should be visible there how
all those “implicit” features play out. We also try to validate as much
input as possible. Any error must be wrapped with another error, so that
we return the full message at the top.
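As a rough illustration of the "populate defaults, validate, wrap errors" pattern described above, consider the sketch below. The Service and Resource types are heavily simplified, hypothetical stand-ins, and the pluralization default is invented; the real Init logic in service.go is far richer.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource and Service are hypothetical, simplified stand-ins for the
// bootstrap types; only the shape of the Init logic matters here.
type Resource struct {
	Name   string
	Plural string
}

type Service struct {
	Name      string
	Resources []*Resource
}

var errEmptyName = errors.New("name must not be empty")

// initDefaults validates a single resource and fills in missing values.
func (r *Resource) initDefaults() error {
	if r.Name == "" {
		return errEmptyName
	}
	if r.Plural == "" {
		r.Plural = r.Name + "s" // default deduced from other input
	}
	return nil
}

// Init validates the service and every resource. Each error is wrapped
// with context, so the caller at the top sees the full message.
func (s *Service) Init() error {
	if s.Name == "" {
		return fmt.Errorf("service: %w", errEmptyName)
	}
	for i, r := range s.Resources {
		if err := r.initDefaults(); err != nil {
			return fmt.Errorf("service %s: resource #%d: %w", s.Name, i, err)
		}
	}
	return nil
}

func main() {
	svc := &Service{Name: "meta", Resources: []*Resource{{Name: "Device"}, {}}}
	fmt.Println(svc.Init())
	fmt.Println(svc.Resources[0].Plural)
}
```

Wrapping with %w keeps the original error reachable through errors.Is, while the message still reads as one line from the outermost context down to the cause.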
2 - Goten Protobuf Compilers
Protobuf was developed by Google, and it has been implemented in many
languages, including Golang. Each supported language provides a protoc
compiler. For Golang, there exists protoc-gen-go, which takes protobuf
files and generates Golang files. This tool, however, is massively
insufficient compared to what we need in Goten: we have custom types,
extra functionality, and a full-blown framework generating almost all
the server code. We developed our own protoc compilers, which replace the
standard protoc-gen-go. We have many protoc compilers; see the cmd
directory:
- protoc-gen-goten-go
  This is the main replacement of the standard protoc-gen-go. It generates all the base go files you can see throughout the resources/ and client/ modules, typically all the generated files ending with .pb.go.
- protoc-gen-goten-client
  It compiles some files in the client directory, except those ending with .pb.go, which contain basic types.
- protoc-gen-goten-server
  It generates the middleware and default core files under the server directory.
- protoc-gen-goten-controller
  It generates controller packages, as described in the developer guide.
- protoc-gen-goten-store
  It generates files under the store directory.
- protoc-gen-goten-access
  It compiles files in the access directory.
- protoc-gen-goten-resource
  It focuses on protobuf objects annotated as resources but does not generate anything for “under” resources. It produces most of the files in the resources directory for each service. This includes pb.access.go, pb.collections.go, pb.descriptor.go, pb.filter.go, pb.filterbuilder.go, pb.name.go, pb.namebuilder.go, pb.pagination.go, pb.query.go, pb.view.go, pb.change.go.
- protoc-gen-goten-object
  It provides additional optional types over protoc-gen-goten-go: FieldPath, FieldMask, and additional methods for merging, cloning, and diffing objects. You can see them in files ending with pb.fieldmask.go, pb.fieldpath.go, pb.fieldpathbuilder.go, pb.object_ext.go. This is done for resources or sub-objects used by resources. For example, in the goten repository, you can see files from this protoc compiler under the types/meta/ directory.
- protoc-gen-goten-cli
  It compiles files in the cli directory.
- protoc-gen-goten-validate
  It generates pb.validate.go files, which you can typically find in the resources directory, but it’s not necessarily limited to it.
- protoc-gen-goten-versioning
  It generates all versioning transformers under the versioning directory.
- protoc-gen-goten-doc
  It generates markdown documentation files based on proto files (often in the docs directory).
- protoc-gen-goten-jsonschema
  It is a separate compiler for parsing bootstrap.proto into the API skeleton JSON schema.
Depending on which files you want to be generated differently, or which you want to study, you need to start with the relevant compiler.
Pretty much any compiler in the cmd directory maps to some module
in the compiler directory (there are exceptions, like the ast
package!). For example:
- cmd/protoc-gen-goten-go maps to compiler/gengo.
- cmd/protoc-gen-goten-client maps to compiler/client.
Each of these compilers takes a set of protobuf files as the input. When you see some bash code like:
protoc \
  -I "${PROTOINCLUDE}" \
  "--goten-go_out=:${GOGENPATH}" \
  "--goten-validate_out=${GOGENPATH}" \
  "--goten-object_out=:${GOGENPATH}" \
  "--goten-resource_out=:${GOGENPATH}" \
  "--goten-access_out=:${GOGENPATH}" \
  "--goten-cli_out=${GOGENPATH}" \
  "--goten-versioning_out=:${GOGENPATH}" \
  "--goten-store_out=datastore=firestore:${GOGENPATH}" \
  "--goten-server_out=lang=:${GOGENPATH}" \
  "--goten-client_out=:${GOGENPATH}" \
  "--goten-doc_out=service=Meta:${SERVICEPATH}/docs/apis" \
  "${SERVICEPATH}"/proto/v1/*.proto
It simply means we are calling many of those protoc utilities. With
the -I flag we pass proto include paths, so protos can be parsed correctly
and linked to others. In the last line of this shell snippet, we pass
all files for which we want code to be generated. In this case, it is
all files we can find in the ${SERVICEPATH}/proto/v1 directory.
3 - Abstract Syntax Tree
Let’s analyze one of the modules, protoc-gen-goten-object, as an example, to understand the internals of the Goten protobuf compilers.
func main() {
	pgs.Init(pgs.DebugEnv("DEBUG_PGV")).
		RegisterModule(object.New()).
		RegisterPostProcessor(utils.CalmGoFmt()).
		Render()
}
For starters, we utilize the https://github.com/lyft/protoc-gen-star
library. It does the initial parsing of all proto objects for us. Then
it invokes a module for file generation. We are passing our object module,
from the compiler/object/object.go file:
func New() *Module {
	return &Module{ModuleBase: &pgs.ModuleBase{}}
}
Once it finishes generating files, all of them will be formatted by GoFmt.
However, as you should note, the main processing unit of the compiler is
always in the compiler/<name>/<name>.go file. The pgs library calls the
following functions on the passed module, like the one in the object
directory: InitContext, then Execute.
Inside InitContext, we are getting some BuildContext, but it only carries
arguments that we need to pass to the primary context object. All protobuf
compilers (access, cli, client, controller, gengo, object, resource…) use
the GoContext object we define in the compiler/gengo/context.go file.
It is questionable whether this file should be in the gengo directory.
It is there because the gengo compiler is the most basic of all the Golang
compilers. GoContext also inherits a Context from the compiler/shared
directory. The idea is that we could potentially support other programming
languages in some limited way. In SPEKTRA Edge, we do have a specialized
compiler for TypeScript.
Regardless, GoContext is necessary during compilation. As is traditional in Golang, a context provides the same interface but behaves a bit differently depending on “who is asking and under what circumstances”.
Next (still in the InitContext function), you can see that the Module in
object.go imports the gengo module from compiler/gengo using
NewWithContext. This function is used only by us, never by the pgs
library. We always load lower-level modules from higher-level ones,
because we will need them. This concludes the InitContext analysis.
The pgsgo library then parses all proto files that were given in the input.
All parsed input files are remembered as “targets” (map[string]pgs.File).
The library also collects information from all other files that were
imported via import statements. It accumulates them in the packages
variable of type map[string]pgs.Package. Then it calls the Execute
method of the initial module. In object.go you can see things like:
func (m *Module) Execute(
	targets map[string]pgs.File,
	packages map[string]pgs.Package,
) []pgs.Artifact {
	m.ctx.InitGraph(targets, packages)
	for _, file := range m.ctx.GetGraph().TargetFiles() {
		...
	}
}
Goten provides its own annotation system on top of protobuf, and we start
seeing its effect here: normally, an InitGraph call should not be needed.
We should be able to just return generated artifacts for the given input.
However, in Goten, we call InitGraph to enrich all targets and packages
that were passed from pgsgo. One of the non-compiler directories in the
compiler directory is ast. The file compiler/ast/graph.go is the entry
file, which uses visitors to enrich all types.
Let’s stop with the object.go file and jump to the ast library for now.
The visitor wrapper, invoked first, wraps pgsgo types like this:
- pgsgo.Entity is wrapped as ast.Entity.
  It is a generic object; it can be a package, file, message, etc.
- pgsgo.Package is wrapped with ast.Package.
  If the proto package contains the Goten Service definition, it becomes ast.ServicePackage. It describes the Goten-based service in a specific version!
- pgsgo.Message, which represents just a normal protobuf message
  It becomes ast.Object in Goten. If this ast.Object specifies the Resource annotation, it becomes ast.Resource, or ast.ResourceChange if it describes a Change!
- pgsgo.Service (API group in api-skeleton, service in proto)
  It becomes ast.API in the Goten ast package.
- pgsgo.Method (Action in api-skeleton)
  It becomes ast.Method in the Goten ast package.
- pgsgo.Enum becomes ast.Enum
- pgsgo.File becomes ast.File
- pgsgo.Field becomes ast.Field
and so on.
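The wrapping idea can be sketched in a few lines of Go. Everything below is hypothetical: rawMessage stands in for a parsed protobuf message, and Object and Resource only loosely mirror the roles of ast.Object and ast.Resource; the real ast package uses visitors over the whole graph and many more types.

```go
package main

import "fmt"

// rawMessage is a hypothetical stand-in for a parsed protobuf message.
type rawMessage struct {
	Name       string
	IsResource bool // stands in for "carries the Goten resource annotation"
}

// Entity is the common interface of all wrapped types in this sketch.
type Entity interface {
	Kind() string
}

// Object wraps a plain message; Resource wraps an annotated one and
// embeds Object so it keeps everything a plain object has.
type Object struct{ Raw rawMessage }

type Resource struct{ Object }

func (o *Object) Kind() string   { return "object " + o.Raw.Name }
func (r *Resource) Kind() string { return "resource " + r.Raw.Name }

// wrap picks the enriched type based on the annotation.
func wrap(m rawMessage) Entity {
	obj := Object{Raw: m}
	if m.IsResource {
		return &Resource{Object: obj}
	}
	return &obj
}

func main() {
	for _, m := range []rawMessage{
		{Name: "Device", IsResource: true},
		{Name: "Meta"},
	} {
		fmt.Println(wrap(m).Kind())
	}
}
```

Templates can then ask an entity what it is, instead of re-reading annotations from the raw descriptor each time.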
The visitor wrapper also introduces our Goten-specific types. For example, look at this:
message SomeObject {
  string some_ref_field = 1 [(goten.annotations.type).reference = {
    resource : "SomeResource"
    target_delete_behavior : BLOCK
  }];
}
The pgsgo library will classify this field type as pgs.FieldType string. However, if you look at any Golang file generated by us, you will see something like:
type SomeObject struct {
	SomeRefField *some_resource.Reference
}
This is another Goten-specific change compared to the standard
protoc-gen-go. For this reason, in our AST library, we have structs like
ast.Reference, ast.Name, ast.Filter, ast.FieldMask, ast.ParentName, etc.
After the visitor wrapper finishes its task, we have a visitor hydrator that establishes relationships between wrapped entities. At the time of writing, there is also a rooting visitor, but it is not needed and was simply never deleted. If you don’t see it, it means it has already been removed.
This ast library is very important because, in our templates for
Golang, we want to use enriched types, according to the Goten language!
You should be able to deduce the rest from the ast library when you need
it. For now, let’s go back to the compiler/object/object.go file, to
the Execute function.
Once we have our own enriched graph, we can start generating files. We
check each of the files we were given as targets. Of these, we filter out
files where no objects are defined, or where the generated objects do not
need the extended functionality. Note that in the client packages of any
service we don’t generate any pb.fieldpath.go files and so on. We generate
only for resources and their sub-objects.
The next crucial element of Golang file generation is a call to
InitTemplate. It gets the current module name, used for friendlier
error messages, and the target entity for which we want to generate files.
For example, let’s say we have the resource SomeResource in the
some_resource.proto file. This is our target file (as ast.File).
We will generate four files based on this single proto file:
We will generate four files based on this single proto file:
- some_resource.pb.fieldpath.go
- some_resource.pb.fieldpathbuilder.go
- some_resource.pb.fieldmask.go
- some_resource.pb.object_ext.go
Note that if this proto file defines some other objects, they
will also be included in the generated files! For this reason, we pass
the whole ast.File to this InitTemplate call.
It is worth looking inside InitTemplate. There are some notable elements:
- We create a discardable additional GoContext for the current template set.
- There is an imports loader object that automatically loads all
  dependencies of the passed target object. By default, it is enough
  to load direct entities. For example, for ast.File, those are files
  directly imported via import statements.
- We iterate over all modules: ours and those we imported. In the case
  of object.go, we load the Object and Gengo modules. We call WithContext
  for each; basically, we enrich our temporary GoContext and initialize
  the Golang template object, which we should already know well from the
  bootstrap utility topic.
Take a WithContext call, like the one in object.go:
func (m *Module) WithContext(ctx *gengo.GoContext) *gengo.GoContext {
	ctx.AddHelpers(&objectHelpers{ctx: ctx})
	return ctx
}
What we do here is add a helper object. The struct objectHelpers is
defined in the compiler/object/funcs.go file. Since the Object module
also loads Gengo, we should see that we have gengo helpers as well:
func (m *Module) WithContext(ctx *GoContext) *GoContext {
	ctx.AddHelpers(&gengoHelpers{ctx: ctx})
	return ctx
}
If you follow this more deeply, you should reach the
compiler/shared/context.go file; see AddHelpers and InitTemplate there.
When we add helpers, we store them in a map using the module namespace
as a key. In InitTemplate we use this map to provide all the functions we
can use in the large Golang string templates! This means that if you see
the following function calls in templates:
- {{ object.FieldPathInterface … }}
  It means we are calling the method “FieldPathInterface” on the “objectHelpers” object.
- {{ gengo.GoFieldType … }}
  It means we are calling the method “GoFieldType” on the “gengoHelpers” object.
We typically have one “namespace” function per compiler: gengo, object, resource, client, server… This is how we make a large set of functions available in templates.
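One way such namespaces can be wired up with the standard text/template package is sketched below: each namespace is registered as a zero-argument function returning its helper object, so a call like {{ ns.Method arg }} becomes a method call on that object. This is only an assumption about the mechanism, not a copy of compiler/shared/context.go, and the helper methods FieldPathName and Exported are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// objectHelpers and gengoHelpers here are illustrative only; the method
// names and their behavior are invented for this sketch.
type objectHelpers struct{}

func (objectHelpers) FieldPathName(msg string) string { return msg + "_FieldPath" }

type gengoHelpers struct{}

func (gengoHelpers) Exported(name string) string {
	if name == "" {
		return ""
	}
	return strings.ToUpper(name[:1]) + name[1:]
}

// buildFuncs turns a namespace -> helper map into template functions.
// Each namespace becomes a zero-argument function returning its helper.
func buildFuncs(helpers map[string]any) template.FuncMap {
	funcs := template.FuncMap{}
	for ns, h := range helpers {
		h := h // capture the per-iteration value
		funcs[ns] = func() any { return h }
	}
	return funcs
}

// render executes a one-line template that calls both namespaces.
func render() (string, error) {
	helpers := map[string]any{
		"object": objectHelpers{},
		"gengo":  gengoHelpers{},
	}
	const text = `{{ gengo.Exported "device" }} uses {{ object.FieldPathName "Device" }}`
	t, err := template.New("demo").Funcs(buildFuncs(helpers)).Parse(text)
	if err != nil {
		return "", fmt.Errorf("parsing template: %w", err)
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", fmt.Errorf("executing template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	out, err := render()
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Grouping helpers per namespace avoids name clashes between compilers: two modules can both offer a method with the same name without overwriting each other in the function map.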
Going back to the compiler/object/object.go file: after InitTemplate
returns a template, we parse all relevant templates from the tmpl
subdirectory that we want to use. Then, using AddGeneratorTemplateFile,
we add all files to generate. We give the full file path (the context
detects exact modules), then we pass the exact template to call; the
lookup tries to find a matching template name:
{{ define "... name ..." }}. The last argument to
AddGeneratorTemplateFile is the current object (which we can reference
with a dot, .). The rest should be well known already from the bootstrap
part.
This concludes protoc generation. All the other protoc compilers, while they may be more complicated in detail, follow the same design. Knowing which files are generated by which compiler, you should be able to reach the part of the code you want to change.
4 - Goten protobuf-go Extension
Goten builds on top of protobuf, but the basic library for proto is, of course, not provided by us. One popular protobuf library for Golang can be found here: https://github.com/protocolbuffers/protobuf-go. It provides lots of utilities around protobuf:
- Parsing proto messages to and from binary format (proto wire).
- Parsing proto messages to and from JSON format (for being human-friendly)
- Copying, merging, comparing
- Access to proto option annotations
Parsing messages to binary format and back is especially important, because this is how we send and receive messages over the network. However, it does not work for Goten out of the box, because of our custom types.
If we have:
message SomeResource {
  string name = 1 [(goten.annotations.type).name.resource = "SomeResource" ];
}
The native protobuf-go library would map the “name” field to the Go
“string” type. But in Goten, we interpret this as a pointer to the Name
struct in the package relevant to the resource. Reference, Filter, OrderBy,
Cursor, FieldMask, and ParentName are all other custom types. The problem
is strings: how to map them to non-string Go types when they carry special
annotations.
For this reason, we developed a fork called goten-protobuf: https://github.com/cloudwan/goten-protobuf.
The most important bit is the ProtoStringer interface defined in this file: https://github.com/cloudwan/goten-protobuf/blob/main/reflect/protoreflect/value.go.
This is the key difference between our fork and the official implementation. It is worth describing roughly how it works.
Look at any gengo-generated file, like
https://github.com/cloudwan/goten/blob/main/meta-service/resources/v1/service/service.pb.go.
If you scroll toward the bottom, to the init() function, you will see
that we are registering all types we generated in this file. We also pass
raw descriptors. This is how we pass information to the protobuf
library. It then populates its registry with all proto descriptors
and matches (via reflection) protobuf declarations with our Golang
struct definitions. If it detects that some field is a string in protobuf
but a struct in the implementation, it will try to match it with
ProtoStringer, which should work as long as the interface matches.
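The idea of a type that is a plain string on the wire but a struct in memory can be sketched as follows. Note the assumptions: the protoStringer interface and its two method names are chosen for illustration and may differ from the actual ProtoStringer definition in goten-protobuf, and the Name struct with its devices/<id> pattern is invented.

```go
package main

import (
	"fmt"
	"strings"
)

// protoStringer is a hypothetical interface for this sketch: a Go type
// that is a string in protobuf but a struct in the implementation.
type protoStringer interface {
	ProtoString() (string, error)
	ParseProtoString(s string) error
}

// Name is an illustrative struct form of the string "devices/<id>".
type Name struct{ DeviceID string }

// ProtoString renders the struct into its string (wire) form.
func (n *Name) ProtoString() (string, error) {
	if n.DeviceID == "" {
		return "", fmt.Errorf("empty device id")
	}
	return "devices/" + n.DeviceID, nil
}

// ParseProtoString fills the struct from its string (wire) form.
func (n *Name) ParseProtoString(s string) error {
	const prefix = "devices/"
	if !strings.HasPrefix(s, prefix) || len(s) == len(prefix) {
		return fmt.Errorf("invalid device name %q", s)
	}
	n.DeviceID = strings.TrimPrefix(s, prefix)
	return nil
}

// roundTrip encodes src as a string and decodes it into dst, the way a
// string-typed proto field would travel between two processes.
func roundTrip(src, dst protoStringer) error {
	s, err := src.ProtoString()
	if err != nil {
		return fmt.Errorf("encoding: %w", err)
	}
	if err := dst.ParseProtoString(s); err != nil {
		return fmt.Errorf("decoding: %w", err)
	}
	return nil
}

func main() {
	var out Name
	if err := roundTrip(&Name{DeviceID: "d1"}, &out); err != nil {
		panic(err)
	}
	fmt.Println(out.DeviceID)
}
```

A library that knows about such an interface can keep treating the field as a string for encoding purposes, while application code works with the structured type.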
We tried to keep the changes in our fork minimal; even so, we occasionally need to sync it with the upstream repository.
As a side note, we can use the following protobuf functions (the interface
proto.Message is implemented by ALL Go structs generated from a protobuf
message type), using the google.golang.org/protobuf/proto import:
- proto.Size(proto.Message)
  To detect the size of the message in binary format (proto wire).
- proto.Marshal(proto.Message)
  Serialize to the binary format.
- proto.Unmarshal(in []byte, out proto.Message)
  De-serialize from binary format.
- proto.Merge(dst, src proto.Message)
  It merges src into dst.
- proto.Clone(proto.Message)
  It makes a deep copy.
- proto.Equal(a, b proto.Message)
  It compares messages.
More interestingly, we can extract annotations in Golang with this library, like:
import (
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/descriptorpb"

	resourceann "github.com/cloudwan/goten/annotations/resource"
	"github.com/cloudwan/goten/runtime/resource"
)
func IsRegionalResource(res resource.Resource) bool {
	// Read the message-level options from the resource descriptor.
	msgOpts := res.ProtoReflect().Descriptor().
		Options().(*descriptorpb.MessageOptions)
	// Extract the Goten resource annotation from those options.
	resSpec := proto.GetExtension(msgOpts, resourceann.E_Resource).
		(*resourceann.ResourceSpec)
	// ... Now we have an instance of ResourceSpec -> See goten/annotations/resource.proto file!
}
The function ProtoReflect() is often used to reach object
descriptors. It sometimes offers a nice alternative to regular reflection
in Go (but has some corner cases where it breaks on our types… and then
we fall back to reflect).
5 - Goten TypeScript compiler
In SPEKTRA Edge, we also maintain a TypeScript compiler that generates modules for the front end based on protobuf. It is fairly limited compared to the generated Golang, though. You can find the compiler code in the SPEKTRA Edge repository: https://github.com/cloudwan/edgelq/tree/main/protoc-gen-npm-apis.
It generates code into https://github.com/cloudwan/edgelq/tree/main/npm.