Abstract Syntax Tree
Let’s analyze one of the modules, protoc-gen-goten-object, as an example to understand the internals of the Goten protobuf compiler.
func main() {
	pgs.Init(pgs.DebugEnv("DEBUG_PGV")).
		RegisterModule(object.New()).
		RegisterPostProcessor(utils.CalmGoFmt()).
		Render()
}
To start with, we use the https://github.com/lyft/protoc-gen-star
library (pgs). It does the initial parsing of all proto objects for us, then
invokes a module for file generation. We pass in our object module, created
in the compiler/object/object.go file:
func New() *Module {
	return &Module{ModuleBase: &pgs.ModuleBase{}}
}
Once the module finishes generating files, they are all formatted by GoFmt.
Note, however, that the main processing unit of a compiler is always in
the compiler/<name>/<name>.go file. The pgs library calls the following
functions on the passed module, like the one in the object directory:
InitContext, then Execute.
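The call order described above can be sketched without the real library. In the block below, every type (buildContext, file, pkg, artifact, the module interface, and the toy objectModule) is a hypothetical stand-in written for this illustration. Only the InitContext-then-Execute order is taken from the text.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical stand-ins for pgs.BuildContext, pgs.File, pgs.Package and
// pgs.Artifact, reduced to what this sketch needs.
type (
	buildContext struct{}
	file         struct{ name string }
	pkg          struct{ name string }
	artifact     struct{ path string }
)

// module lists the calls described above: InitContext first, then Execute.
type module interface {
	Name() string
	InitContext(c buildContext)
	Execute(targets map[string]file, packages map[string]pkg) []artifact
}

// objectModule is a toy module, not the real one from compiler/object.
type objectModule struct{ initialized bool }

func (m *objectModule) Name() string { return "object" }

func (m *objectModule) InitContext(buildContext) { m.initialized = true }

func (m *objectModule) Execute(targets map[string]file, _ map[string]pkg) []artifact {
	if !m.initialized {
		panic("Execute called before InitContext")
	}
	names := make([]string, 0, len(targets))
	for name := range targets {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	out := make([]artifact, 0, len(names))
	for _, name := range names {
		out = append(out, artifact{path: name + ".generated"})
	}
	return out
}

// run plays the role of the library: initialize the module, then execute it.
func run(m module, targets map[string]file, packages map[string]pkg) []artifact {
	m.InitContext(buildContext{})
	return m.Execute(targets, packages)
}

func main() {
	targets := map[string]file{"some_resource.proto": {name: "some_resource.proto"}}
	for _, a := range run(&objectModule{}, targets, nil) {
		fmt.Println(a.path)
	}
}
```

The ".generated" suffix is made up. The point is only that Execute relies on state prepared in InitContext.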
Inside InitContext, we receive a BuildContext, but it only carries
arguments that we need to pass to the primary context object. All protobuf
compilers (access, cli, client, controller, gengo, object, resource…) use
the GoContext object defined in the compiler/gengo/context.go file.
It is debatable whether this file should live in the gengo directory; it
is there because the gengo compiler is the most basic of all the Golang
compilers. GoContext also inherits a Context from the compiler/shared
directory. The idea is that we could potentially support other programming
languages in some limited way. In SPEKTRA Edge, we do have a specialized
compiler for TypeScript.
Regardless, GoContext is necessary during compilation. As is traditional in Golang, the context provides the same interface but behaves a bit differently depending on “who is asking and under what circumstances”.
Next (still in the InitContext function), you can see that the Module in
object.go imports the gengo module from compiler/gengo using
NewWithContext. This function is called only by us, never by the pgs
library. We always load lower-level modules from higher-level ones, because
we will need them. This concludes the InitContext analysis.
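As a rough sketch of "higher-level modules load lower-level ones", the block below has a toy object module create its context and then attach a toy gengo module to it. All types and function names (goContext, newGengoWithContext, initContext) are hypothetical; the real NewWithContext signature is not shown in this document and is not reproduced here.

```go
package main

import "fmt"

// goContext is a hypothetical stand-in for gengo.GoContext; it only records
// which modules were attached to it.
type goContext struct{ modules []string }

// gengoModule stands in for the lower-level module from compiler/gengo.
type gengoModule struct{ ctx *goContext }

// newGengoWithContext imitates the role of NewWithContext: it is called by
// our own code (never by the plugin library) and shares the caller's context.
func newGengoWithContext(ctx *goContext) *gengoModule {
	ctx.modules = append(ctx.modules, "gengo")
	return &gengoModule{ctx: ctx}
}

// objectModule stands in for the higher-level module from compiler/object.
type objectModule struct {
	ctx   *goContext
	gengo *gengoModule
}

// initContext creates the primary context, then loads the lower-level module
// into that same context before registering itself.
func (m *objectModule) initContext() {
	m.ctx = &goContext{}
	m.gengo = newGengoWithContext(m.ctx)
	m.ctx.modules = append(m.ctx.modules, "object")
}

func main() {
	m := &objectModule{}
	m.initContext()
	fmt.Println(m.ctx.modules, m.gengo.ctx == m.ctx)
}
```

Both modules end up sharing one context, which is what later allows helpers from every loaded module to be registered in a single place.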
The pgsgo library then parses all proto files that were given as input.
All parsed input files are remembered as “targets” (map[string]pgs.File).
The library also collects information from all other files that were
pulled in via import statements, and accumulates them in the packages
variable of type map[string]pgs.Package. Then it calls the Execute method
of the initial module. In object.go, you can see things like:
func (m *Module) Execute(
	targets map[string]pgs.File,
	packages map[string]pkgs.Package,
) []pgs.Artifact {
	m.ctx.InitGraph(targets, packages)
	for _, file := range m.ctx.GetGraph().TargetFiles() {
		...
	}
}
Goten provides its own annotation system on top of protobuf, and we start
seeing its effect here. Normally, an InitGraph call should not be needed:
we should be able to just produce the generated artifacts from the given
input. In Goten, however, we call InitGraph to enrich all targets/packages
that were passed from pgsgo. One of the non-compiler directories in the
compiler directory is ast. The file compiler/ast/graph.go is the entry
file, which uses visitors to enrich all types.
Let’s pause on the object.go file and jump to the ast library for now.
The visitor wrapper is invoked first. It wraps pgsgo types like this:
- pgsgo.Entity is wrapped as ast.Entity. It is a generic object; it can be a package, file, message, etc.
- pgsgo.Package is wrapped as ast.Package. If the proto package contains the Goten Service definition, it becomes ast.ServicePackage. It describes the Goten-based service in a specific version!
- pgsgo.Message, which represents just a normal protobuf message, becomes ast.Object in Goten. If this ast.Object specifies the Resource annotation, it becomes ast.Resource, or ast.ResourceChange if it describes a Change!
- pgsgo.Service (API group in api-skeleton, service in proto) becomes ast.API in the Goten ast package.
- pgsgo.Method (Action in api-skeleton) becomes ast.Method in the Goten ast package.
- pgsgo.Enum becomes ast.Enum
- pgsgo.File becomes ast.File
- pgsgo.Field becomes ast.Field
and so on.
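The message part of this mapping can be sketched as a small classification function. The rawMessage struct and its two boolean flags are hypothetical stand-ins for a parsed message and its Goten annotations; the real wrapper works on visitor callbacks and actual annotation values, which are not shown in this document.

```go
package main

import "fmt"

// rawMessage is a hypothetical stand-in for a parsed protobuf message. The
// two flags stand in for Goten annotations, whose real form is not shown here.
type rawMessage struct {
	Name       string
	IsResource bool
	IsChange   bool
}

// wrapMessage returns the name of the enriched type a message would map to,
// following the rules listed above.
func wrapMessage(m rawMessage) string {
	switch {
	case m.IsResource:
		return "ast.Resource"
	case m.IsChange:
		return "ast.ResourceChange"
	default:
		return "ast.Object"
	}
}

func main() {
	msgs := []rawMessage{
		{Name: "SomeResource", IsResource: true},
		{Name: "SomeResourceChange", IsChange: true},
		{Name: "SomeObject"},
	}
	for _, m := range msgs {
		fmt.Println(m.Name, "->", wrapMessage(m))
	}
}
```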
The visitor wrapper also introduces our Goten-specific types. For example, look at this:
message SomeObject {
  string some_ref_field = 1 [(goten.annotations.type).reference = {
    resource : "SomeResource"
    target_delete_behavior : BLOCK
  }];
}
The pgsgo library will classify the type of this field as a pgs.FieldType string. However, if you look at any Golang file generated by us, you will see something like:
type SomeObject struct {
	SomeRefField *some_resource.Reference
}
This is another Goten-specific change compared to plain protoc-gen-go.
For this reason, our AST library has structs like ast.Reference,
ast.Name, ast.Filter, ast.FieldMask, ast.ParentName, etc.
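A minimal sketch of this idea: the annotation, not the raw proto type, decides the Go type. The field struct, the goFieldType function, and the rule that the package name is the snake_case form of the resource name are all assumptions made for this illustration. They merely reproduce the SomeResource example above and are not taken from the Goten sources.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// field is a hypothetical stand-in for a parsed proto field.
type field struct {
	ProtoType   string // scalar type as the parser sees it, e.g. "string"
	ReferenceTo string // resource named in the reference annotation, if any
}

// toSnake converts "SomeResource" to "some_resource".
func toSnake(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) && i > 0 {
			b.WriteByte('_')
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

// goFieldType picks the Go type for a field, preferring the annotation over
// the raw proto type. The package naming rule is an assumption.
func goFieldType(f field) string {
	if f.ReferenceTo != "" {
		return "*" + toSnake(f.ReferenceTo) + ".Reference"
	}
	return f.ProtoType
}

func main() {
	fmt.Println(goFieldType(field{ProtoType: "string", ReferenceTo: "SomeResource"}))
	fmt.Println(goFieldType(field{ProtoType: "string"}))
}
```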
After the visitor wrapper finishes its task, a visitor hydrator establishes relationships between the wrapped entities. As of this writing, there is also a rooting visitor, but it’s not needed and I simply forgot to delete it. If you don’t see it, it has already been deleted.
This ast library is very important because, in our templates for
Golang, we want to use enriched types, according to the Goten language!
You should be able to deduce the rest from the ast library when you need
it. For now, let’s go back to the compiler/object/object.go file, to
the Execute function.
Once we have our own enriched graph, we can start generating files. We check
each of the files we were given as targets. Of these, we skip files that
define no objects, or whose objects do not need the extended functionality.
Note that for the client packages of any service we don’t generate any
pb.fieldpath.go files and so on. We generate only for resources and their
sub-objects.
The next crucial element of Golang file generation is the call to
InitTemplate. It takes the current module name, used for friendlier
error messages, and the target entity for which we want to generate files.
For example, let’s say we have the resource SomeResource in the
some_resource.proto file. This is our target file (as ast.File).
We will generate four files based on this single proto file:
- some_resource.pb.fieldpath.go
- some_resource.pb.fieldpathbuilder.go
- some_resource.pb.fieldmask.go
- some_resource.pb.object_ext.go
Note that if this proto file defines some other objects, they will also
be included in the generated files! For this reason, we pass the whole
ast.File to this InitTemplate call.
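The naming pattern of the four files listed above can be written down as a small function. The function name and the suffix list variable are made up for this sketch; the resulting file names are the ones given in the list.

```go
package main

import (
	"fmt"
	"strings"
)

// suffixes are the four file kinds listed above for the object compiler.
var suffixes = []string{"fieldpath", "fieldpathbuilder", "fieldmask", "object_ext"}

// generatedFiles derives the output file names for one target proto file.
func generatedFiles(protoFile string) []string {
	base := strings.TrimSuffix(protoFile, ".proto")
	out := make([]string, 0, len(suffixes))
	for _, s := range suffixes {
		out = append(out, base+".pb."+s+".go")
	}
	return out
}

func main() {
	for _, name := range generatedFiles("some_resource.proto") {
		fmt.Println(name)
	}
}
```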
It is worth looking inside InitTemplate. There are some notable elements:
- We create an additional, discardable GoContext for the current template set.
- There is an imports loader object that automatically loads all
  dependencies of the passed target object. By default, it is enough
  to load direct entities. For example, for ast.File, those are the files
  directly imported via import statements.
- We iterate over all modules, ours and those we imported. In the case
  of object.go, we load the Object and Gengo modules. We call WithContext
  for each; basically, we enrich our temporary GoContext and initialize the
  Golang template object, which should already be familiar from the
  bootstrap utility topic.
If you look at a WithContext call, like the one in object.go:
func (m *Module) WithContext(ctx *gengo.GoContext) *gengo.GoContext {
	ctx.AddHelpers(&objectHelpers{ctx: ctx})
	return ctx
}
What we do here is add a helper object. The objectHelpers struct is
defined in the compiler/object/funcs.go file. Since the Object module also
loads Gengo, we should see gengo helpers as well:
func (m *Module) WithContext(ctx *GoContext) *GoContext {
	ctx.AddHelpers(&gengoHelpers{ctx: ctx})
	return ctx
}
If you follow this more deeply, you should reach the
compiler/shared/context.go file; see AddHelpers and InitTemplate.
When we add helpers, we store them in a map, using the module namespace
as the key. In InitTemplate, we use this map to provide all the functions
we can use in the large Golang string templates! This means that the
following function calls in templates have these meanings:
- {{ object.FieldPathInterface … }} calls the method “FieldPathInterface” on the “objectHelpers” object.
- {{ gengo.GoFieldType … }} calls the method “GoFieldType” on the “gengoHelpers” object.
We typically have one function “namespace” per compiler: gengo, object, resource, client, server… This is how we make a large set of functions available in templates.
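One way to get this namespace behavior with the standard text/template package is to register each namespace as a zero-argument function that returns the helper object, so that object.FieldPathInterface becomes a method call on the returned value. The sketch below assumes that approach. It is not confirmed to be how compiler/shared/context.go does it, and the helper method bodies are invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// objectHelpers and gengoHelpers are toy versions of the helper structs; the
// method bodies are invented for this illustration.
type objectHelpers struct{}

func (*objectHelpers) FieldPathInterface(name string) string { return name + "_FieldPath" }

type gengoHelpers struct{}

func (*gengoHelpers) GoFieldType(protoType string) string {
	if protoType == "bytes" {
		return "[]byte"
	}
	return protoType
}

// funcMap turns a namespace->helpers map into template functions. Each
// namespace becomes a zero-argument function returning its helper object.
func funcMap(helpers map[string]interface{}) template.FuncMap {
	funcs := template.FuncMap{}
	for ns, h := range helpers {
		h := h // capture the current value for the closure
		funcs[ns] = func() interface{} { return h }
	}
	return funcs
}

// render parses src with both namespaces registered and executes it.
func render(src string) (string, error) {
	helpers := map[string]interface{}{
		"object": &objectHelpers{},
		"gengo":  &gengoHelpers{},
	}
	// Funcs must be registered before Parse so the names are known.
	t, err := template.New("demo").Funcs(funcMap(helpers)).Parse(src)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, nil); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(`{{ object.FieldPathInterface "SomeResource" }} {{ gengo.GoFieldType "bytes" }}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Keeping one function per namespace avoids name clashes between compilers: two modules can both define a method with the same name without overwriting each other in the function map.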
Let’s go back to the compiler/object/object.go file. After InitTemplate
returns a template, we parse all the relevant templates we want to use from
the tmpl subdirectory. Then, using AddGeneratorTemplateFile, we add all
files to generate. We give the full file path (the context detects exact
modules), then we pass the exact template to call; lookup tries to find
a matching template name: {{ define "... name ..." }}.
The last argument to AddGeneratorTemplateFile is the current object
(which we can reference with a dot, .). The rest should already be well
known from the bootstrap part.
This concludes protoc generation. All the other protoc compilers, while they may be more complicated in detail, follow the same design. Knowing which files a compiler generates, you should be able to reach the part of the code you want to change.