Goten as a Compiler

Understanding the compiler aspect of the Goten framework.

This document explains how the bootstrap utility and the rest of the Goten compiler tooling work, which in turn should help you contribute to them.

1 - goten-bootstrap

What is goten-bootstrap executable?

The goten-bootstrap utility generates proto files from a specification file, also known as the api-skeleton. In the goten repository, you can find the following files for the api-skeleton schema:

  • annotations/bootstrap.proto with JSON generated schema from it in schemas/api-skeleton.schema.json. This is the place you can modify input to bootstrap.

The runtime entry (main.go) can be found in the cmd/goten-bootstrap directory. It imports the package in the compiler/bootstrap directory, which contains pretty much all the code of the goten-bootstrap utility. This is the place to explore if you want to modify the generated protobuf files.

In main.go you can see two primary steps:

  1. Initialize the Service package object and pass it to the generator.

    During initialization, we validate input, populate defaults, and deduce all values.

  2. It then attaches all implicit API groups per each resource.

First look at the Generator object initialized with NewGenerator.

The relevant file is compiler/bootstrap/generate.go, which contains the ServiceGenerator struct with a single public method, Generate. It takes a parsed, validated, and initialized Service object (as described in the service.go file), then generates all relevant files, with API groups and resources, using regular for loops. The template protobuf files are all in the tmpl subdirectory.

See the initTmpls function of ServiceGenerator: it collects all template files as giant strings (because those are strings…), parses them, and adds a set of functions that can be used within {{ }}. Those big strings are “render-able” objects; see https://pkg.go.dev/text/template for more details, but normally I find them self-explanatory. In those template strings, you often see:

  • {{functionName <ARGS>}}

    The word is some function. It may be built-in like define, range, if, or it may be a function we provided. From initTmpls you may see functions like uniqueResources, formatActionReplacement etc. Those are our functions. They may take arguments.

  • {{$variable}}

    This variable must be initialized somewhere using := operator. Those are Golang objects under the hood! You can access even sub-fields with dots ., or even call functions (but without arguments).

  • {{.}}

    This is a special kind of “current” active variable. In a given moment only one variable may be active. You may access its properties from regular variables like {{ $otherVar := .Field1.Field2 }}.

  • With {{ or }}

    you may see dashes: {{- or -}}. Their purpose is to remove whitespace (typically a newline) before or after them. It makes the output nicer, but may occasionally render code non-compilable.
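The template elements listed above can be tried out in isolation. The following is a minimal sketch, not code from goten-bootstrap: the message and field types, the upper and add functions, and the template text are all hypothetical and only illustrate custom functions, a named variable, the dot, and whitespace trimming.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// field and message are hypothetical stand-ins for the objects that a
// generator could pass to its templates.
type field struct {
	Name string
	Type string
}

type message struct {
	Name   string
	Fields []field
}

// protoTmpl uses custom functions (upper, add), a named variable ($msg),
// the dot, and whitespace trimming with {{- and -}}.
const protoTmpl = `{{- $msg := . -}}
// {{ upper $msg.Name }}
message {{ $msg.Name }} {
{{- range $i, $f := .Fields }}
  {{ $f.Type }} {{ $f.Name }} = {{ add $i 1 }};
{{- end }}
}
`

// render parses the template with the custom functions and executes it,
// passing msg as the initial dot.
func render(msg message) (string, error) {
	funcs := template.FuncMap{
		"upper": strings.ToUpper,
		"add":   func(a, b int) int { return a + b },
	}
	tmpl, err := template.New("message").Funcs(funcs).Parse(protoTmpl)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, msg); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := render(message{
		Name: "Device",
		Fields: []field{
			{Name: "name", Type: "string"},
			{Name: "size", Type: "int32"},
		},
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Removing the dashes from the range and end actions leaves blank lines between the generated fields, which is the cosmetic effect the trimming markers are meant to avoid.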

In generate.go, see svcgen.tmpl.ExecuteTemplate(file, tmplName, data). The first argument is the file writer object where the protobuf file will be generated. The second argument is a string, for example, resourceSchemaFile. The third argument is an active variable that can be accessed as {{.}}, which we mentioned. For example, you should see the following piece of code there:

if err := svcgen.genFile(
	"resourceSchemaFile",
	resource.Service.Proto.Package.CurrentVersion,
	fileName,
	resource,
	svcgen.override,
); err != nil {
	return fmt.Errorf("error generating resource file %s: %s", fileName, err)
}

The function genFile passes resourceSchemaFile as the second argument to tmpl.ExecuteTemplate, and object resource is passed as the last argument to tmpl.ExecuteTemplate. This resource object is of type Resource which you can see in the file resource.go.

How Golang templates are executed: Runtime will try to find the following piece of the template:

{{ define "resourceSchemaFile" }} ... {{ end }}

In this instance, you can find it in file tmpl/resource.tmpl.go, it starts with:

package tmpl

// language=gohtml
const ResourceTmplString = `
{{- define "resourceSchemaFile" -}}
{{- /*gotype: github.com/cloudwan/goten/annotations/bootstrap.Resource*/ -}}
{{- $resource := . }}

... stuff here....

{{ end }}

By convention, we try to provide what kind of object was passed as dot . under define, at least for main templates. Since range loops override dot value, to avoid losing resource reference (and for clarity), we often save current dot into a named variable.

Golang generates from the beginning of define till it reaches relevant {{ end }}. When it sees {{ template "..." <ARG> }}, it calls another define, and passes arg as the next “dot”. To pass multiple arguments, we often provide a dictionary using the dict function: {{ template "... name ..." dict <KEY1> <VALUE1> ... }}. Dict accepts N arguments and just makes a single object. You can see that we implemented this function in initTmpls! The generated final string is outputted to the specified file writer. This is how all protobuf files are generated.
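To make the define, template, and dict mechanics concrete, here is a small self-contained sketch. It is not the goten implementation: the dict function below is one plausible way to write such a helper, and the template names (apiFile, rpc), the apiGroup type, and the generated text are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
	"text/template"
)

// dict packs alternating keys and values into one map, so that a nested
// template can receive several arguments through its single dot.
func dict(pairs ...any) (map[string]any, error) {
	if len(pairs)%2 != 0 {
		return nil, errors.New("dict expects an even number of arguments")
	}
	out := make(map[string]any, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		key, ok := pairs[i].(string)
		if !ok {
			return nil, fmt.Errorf("dict key %v is not a string", pairs[i])
		}
		out[key] = pairs[i+1]
	}
	return out, nil
}

// apiGroup is a hypothetical input object for the "apiFile" template.
type apiGroup struct {
	Name  string
	Verbs []string
}

// Inside range the dot becomes the current verb, so the resource name is
// saved in $resource first and handed to "rpc" through dict.
const tmplText = `
{{- define "rpc" -}}
rpc {{ .Verb }}{{ .Resource }}({{ .Verb }}{{ .Resource }}Request) returns ({{ .Resource }});
{{- end -}}

{{- define "apiFile" -}}
{{- $resource := .Name -}}
service {{ $resource }}Service {
{{- range .Verbs }}
  {{ template "rpc" dict "Verb" . "Resource" $resource }}
{{- end }}
}
{{ end -}}
`

// render executes the named template with data as the initial dot.
func render(name string, data any) (string, error) {
	funcs := template.FuncMap{"dict": dict}
	tmpl, err := template.New("proto").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.ExecuteTemplate(&out, name, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := render("apiFile", apiGroup{Name: "Device", Verbs: []string{"Get", "Update"}})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Returning an error as the second result lets the template engine abort rendering when dict is called with malformed arguments, instead of silently producing a broken file.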

Note that ServiceGenerator skips certain templates depending on the overrideFile argument. This is why resources and custom files for API groups are generated only once, to avoid overriding developer code. Perhaps in the future, we should be able to do some merging. That’s all regarding the generation part.

Parsing the api-skeleton service package schema and wrapping it with the Service object defined in the service.go file is also very important. Note that the YAML in api-skeleton follows the definition of a Service found not in compiler/bootstrap/service.go, but in the annotations/bootstrap/bootstrap.pb.go file. See the function ParseServiceSkeletonFiles in compiler/bootstrap/utils.go. It loads the base bootstrap objects from YAML and parses them according to the protobuf definition, and then we wrap them with the proper Service object.

After we load all Service objects (including the next version and imported ones), we call the Init function of a Service. This is where we validate all input properly and fill in default values missing from the api-skeleton. The largest example is the function InitMainApi, which is called from service.go for each resource owned by the service. It adds our implicit APIs with full CRUD methods, and it shows how all those “implicit” features play out. We also try to validate as much input as possible. Any error must be wrapped with another error, so that we return the full message at the top.
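The validate, fill defaults, and wrap errors flow can be sketched as follows. All types and rules here (serviceSpec, resourceSpec, the plural default) are hypothetical and much simpler than the real Service object; the sketch only shows how wrapping each error on the way up yields a full message at the top.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceSpec and serviceSpec are hypothetical, simplified stand-ins for
// the objects loaded from an api-skeleton file.
type resourceSpec struct {
	Name   string
	Plural string
}

type serviceSpec struct {
	Name      string
	Resources []*resourceSpec
}

// init validates a resource and fills in defaults that were left out.
func (r *resourceSpec) init() error {
	if r.Name == "" {
		return errors.New("name is required")
	}
	if r.Plural == "" {
		r.Plural = r.Name + "s" // assumed default, for illustration only
	}
	return nil
}

// init validates the service and initializes every resource. Each error is
// wrapped with context so the caller sees the full path to the problem.
func (s *serviceSpec) init() error {
	if s.Name == "" {
		return errors.New("service name is required")
	}
	for i, r := range s.Resources {
		if err := r.init(); err != nil {
			return fmt.Errorf("service %q: resource #%d: %w", s.Name, i, err)
		}
	}
	return nil
}

func main() {
	good := &serviceSpec{Name: "inventory", Resources: []*resourceSpec{{Name: "Device"}}}
	fmt.Println(good.init(), good.Resources[0].Plural)

	bad := &serviceSpec{Name: "inventory", Resources: []*resourceSpec{{Name: "Device"}, {}}}
	fmt.Println(bad.init())
}
```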

2 - Goten Protobuf Compilers

Understanding the Goten protobuf compilers.

Protobuf was developed by Google, and it has been implemented in many languages, including Golang. Each supported language has its own protoc compiler plugin. For Golang, there is protoc-gen-go, which takes protobuf files and generates Golang files. This tool, however, is massively insufficient for what we need in Goten: we have custom types, extra functionality, and a full-blown framework generating almost all the server code. We therefore developed our own protoc compilers, which replace the standard protoc-gen-go. We have many of them, see the cmd directory:

  • protoc-gen-goten-go

    This is the main replacement of the standard protoc-gen-go. It generates all the base Go files, those ending with .pb.go, that you can typically see throughout the resources/ and client/ modules.

  • protoc-gen-goten-client

    It compiles some files in the client directory, except those ending with .pb.go, which contain basic types.

  • protoc-gen-goten-server

    It generates those middleware and default core files under the server directory.

  • protoc-gen-goten-controller

    It generates controller packages, as described in the developer guide.

  • protoc-gen-goten-store

    It generates files under the store directory.

  • protoc-gen-goten-access

    It compiles files in the access directory.

  • protoc-gen-goten-resource

    It focuses on protobuf objects annotated as resources but does not generate anything for “under” resources. It produces most of the files in the resources directory for each service. This includes pb.access.go, pb.collections.go, pb.descriptor.go, pb.filter.go, pb.filterbuilder.go, pb.name.go, pb.namebuilder.go, pb.pagination.go, pb.query.go, pb.view.go, pb.change.go.

  • protoc-gen-goten-object

    It provides additional optional types over protoc-gen-goten-go. Those types are FieldPath and FieldMask, plus additional methods for merging, cloning, and diffing objects. You can see them in files ending with pb.fieldmask.go, pb.fieldpath.go, pb.fieldpathbuilder.go, pb.object_ext.go. This is done for resources or sub-objects used by resources. For example, in the goten repository, you can see files from this protoc compiler under the types/meta/ directory.

  • protoc-gen-goten-cli

    It compiles files in the cli directory.

  • protoc-gen-goten-validate

    It generates pb.validate.go files you can typically find in the resources directory, but it’s not necessarily limited there.

  • protoc-gen-goten-versioning

    It generates all versioning transformers under the versioning directory.

  • protoc-gen-goten-doc

    It generates markdown documentation files based on proto files (often docs directory).

  • protoc-gen-goten-jsonschema

    It is a separate compiler for parsing bootstrap.proto into API skeleton JSON schema.

Depending on which files you want to be generated differently, or which you want to study, you need to start with the relevant compiler.

Pretty much any compiler in the cmd directory maps to some module in the compiler directory (there are exceptions like the ast package!). For example:

  • cmd/protoc-gen-goten-go maps to compiler/gengo.
  • cmd/protoc-gen-goten-client maps to compiler/client.

Each of these compilers takes a set of protobuf files as the input. When you see some bash code like:

protoc \
    -I "${PROTOINCLUDE}" \
    "--goten-go_out=:${GOGENPATH}" \
    "--goten-validate_out=${GOGENPATH}" \
    "--goten-object_out=:${GOGENPATH}" \
    "--goten-resource_out=:${GOGENPATH}" \
    "--goten-access_out=:${GOGENPATH}" \
    "--goten-cli_out=${GOGENPATH}" \
    "--goten-versioning_out=:${GOGENPATH}" \
    "--goten-store_out=datastore=firestore:${GOGENPATH}" \
    "--goten-server_out=lang=:${GOGENPATH}" \
    "--goten-client_out=:${GOGENPATH}" \
    "--goten-doc_out=service=Meta:${SERVICEPATH}/docs/apis" \
    "${SERVICEPATH}"/proto/v1/*.proto

It simply means we are calling many of those protoc utilities. With the -I flag we pass proto include paths, so protos can be parsed correctly and linked to others. In the last line of this shell command, we pass all files for which we want code to be generated. In this case, it is all the .proto files we can find in the ${SERVICEPATH}/proto/v1 directory.

3 - Abstract Syntax Tree

Understanding the internals of the Goten compiler.

Let’s analyze one of the modules, protoc-gen-goten-object, as an example, to understand the internals of the Goten protobuf compiler.

func main() {
	pgs.Init(pgs.DebugEnv("DEBUG_PGV")).
		RegisterModule(object.New()).
		RegisterPostProcessor(utils.CalmGoFmt()).
		Render()
}

For starters, we utilize the https://github.com/lyft/protoc-gen-star library. It does the initial parsing of all proto objects for us. Then it invokes a module for file generation. We are passing our object module, from the compiler/object/object.go file:

func New() *Module {
	return &Module{ModuleBase: &pgs.ModuleBase{}}
}

Once it finishes generating files, all will be formatted by GoFmt. However, as you should note, the main processing unit of the compiler is always in the compiler/<name>/<name>.go file. This pgs library is calling the following functions from the passed module, like the one in the object directory: InitContext, then Execute.

Inside InitContext, we get a BuildContext, but it only carries arguments that we need to pass to the primary context object. All protobuf compilers (access, cli, client, controller, gengo, object, resource…) use the GoContext object that we define in the compiler/gengo/context.go file. It is questionable whether this file should be in the gengo directory. It is there because the gengo compiler is the most basic of all Golang compilers. GoContext also inherits from a Context in the compiler/shared directory. The idea is that we could potentially support other programming languages in some limited way. In SPEKTRA Edge, we do have a specialized compiler for TypeScript.

Regardless, GoContext is necessary during compilation. As is traditional for context objects in Golang, it provides the same interface but behaves a bit differently depending on “who is asking and under what circumstances”.

Next (still in the InitContext function), you can see that Module in object.go imports the gengo module from compiler/gengo using NewWithContext. This function is used by us, never pgs library. We always load lower-level modules from higher-level ones, because we will need them. Now we conclude the InitContext analysis.

The pgsgo library then parses all proto files that were given in the input. All parsed input files are remembered as “targets” (map[string]pgs.File). The library also collects information from all other files that were pulled in via import statements. It accumulates them in the packages variable of type map[string]pgs.Package. Then it calls the Execute method of the initial module. You can see in object.go things like:

func (m *Module) Execute(
    targets map[string]pgs.File,
    packages map[string]pgs.Package,
) []pgs.Artifact {
    m.ctx.InitGraph(targets, packages)
    
    for _, file := range m.ctx.GetGraph().TargetFiles() {
        ...
    }
}

Goten provides its annotation system on top of protobuf, and we start seeing an effect here: Normally, an InitGraph call should not be needed. We should be able to pass just generated artifacts from a given input. However, in Goten, we call InitGraph to enrich all targets/packages that were passed from pgsgo. One of the non-compiler directories in the compiler directory is ast. File compiler/ast/graph.go is the entry file, which uses visitors to enrich all types.

Let’s stop with the object.go file and jump to ast library for now.

The visitor wrapper, which is invoked first, wraps pgsgo types like:

  • pgsgo.Entity is wrapped as ast.Entity.

    It is a generic object, it can be a package, file, message, etc.

  • pgsgo.Package is wrapped with ast.Package.

    If the proto package contains the Goten Service definition, it becomes ast.ServicePackage. It describes the Goten-based service in a specific version!

  • pgsgo.Message, which represents just a normal protobuf message

    It becomes ast.Object in Goten. If this ast.Object specifies the Resource annotation, it becomes ast.Resource, or ast.ResourceChange if it describes a Change!

  • pgsgo.Service (API group in api-skeleton, service in proto)

    It becomes ast.API in Goten ast package.

  • pgsgo.Method (Action in api-skeleton)

    It becomes ast.Method in Goten ast package.

  • pgsgo.Enum becomes ast.Enum

  • pgsgo.File becomes ast.File

  • pgsgo.Field becomes ast.Field

and so on.

The visitor wrapper also introduces our Goten-specific types. For example, look at this:

message SomeObject {
  string some_ref_field = 1 [(goten.annotations.type).reference = {
    resource : "SomeResource"
    target_delete_behavior : BLOCK
  }];
}

Library pgsgo will classify this field type as pgs.FieldType string. However, if you see any generated Golang file by us, you will see something like:

type SomeObject struct {
	SomeRefField *some_resource.Reference
}

This is another Goten-specific change compared to the standard protoc-gen-go. For this reason, in our AST library, we have structs like ast.Reference, ast.Name, ast.Filter, ast.FieldMask, ast.ParentName, etc.
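The idea of wrapping a plain field and enriching it from its annotation can be illustrated with a toy sketch. Nothing below is taken from the ast package: rawField, wrappedField, wrapField, and goType are hypothetical names, and the sketch only handles string fields with an optional reference annotation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// rawField is a hypothetical stand-in for what a generic proto parser
// reports: the declared proto type plus raw annotation data.
type rawField struct {
	Name        string
	ProtoType   string // declared proto type, e.g. "string"
	ReferenceTo string // resource named in a reference annotation, or ""
}

type fieldKind int

const (
	kindPlain fieldKind = iota
	kindReference
)

// wrappedField enriches rawField with a framework-specific kind.
type wrappedField struct {
	rawField
	Kind fieldKind
}

// wrapField plays the role of a wrapping visitor for a single field.
func wrapField(f rawField) wrappedField {
	w := wrappedField{rawField: f, Kind: kindPlain}
	if f.ProtoType == "string" && f.ReferenceTo != "" {
		w.Kind = kindReference
	}
	return w
}

// goType returns the Go type a template would emit for the field.
func (f wrappedField) goType() string {
	if f.Kind == kindReference {
		return "*" + snakeCase(f.ReferenceTo) + ".Reference"
	}
	return f.ProtoType
}

// snakeCase converts "SomeResource" into "some_resource".
func snakeCase(s string) string {
	var b strings.Builder
	for i, r := range s {
		if unicode.IsUpper(r) {
			if i > 0 {
				b.WriteByte('_')
			}
			b.WriteRune(unicode.ToLower(r))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	ref := wrapField(rawField{Name: "some_ref_field", ProtoType: "string", ReferenceTo: "SomeResource"})
	plain := wrapField(rawField{Name: "display_name", ProtoType: "string"})
	fmt.Println(ref.goType())
	fmt.Println(plain.goType())
}
```

Templates that work with the wrapped type can ask for goType directly, without knowing how the annotation was parsed.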

After the visitor wrapper finishes its task, we have a visitor hydrator that establishes relationships between wrapped entities. As of the moment of this writing, there is a rooting visitor, but it’s not needed and I simply forgot to delete it. If you don’t see it, it means it’s already deleted.

This ast library is very important because, in our templates for Golang, we want to use enriched types, according to the Goten language! You should be able to deduce the rest from the ast library when you need it. For now, let’s go back to the compiler/object/object.go file, to the Execute function.

Once we have our own enriched graph, we can start generating files. We check each of the files we were given as targets. Of these, we skip files that define no objects, or whose objects do not need the extended functionality. Note that in the client packages of any service we don’t generate any pb.fieldpath.go files and so on. We generate only for resources and their sub-objects.

The next crucial element of Golang files generation is a call to InitTemplate. It should get the current module name for some friendly error message, and entity target for which we want to generate files. For example, let’s say we have resource SomeResource in the some_resource.proto file. This is our target file (as ast.File). We will generate four files based on this single proto file:

  1. some_resource.pb.fieldpath.go
  2. some_resource.pb.fieldpathbuilder.go
  3. some_resource.pb.fieldmask.go
  4. some_resource.pb.object_ext.go

Note that if this proto file contains some other objects defined, they will also be provided in generated files! For this reason, we pass the whole ast.File to this InitTemplate call.

It is worth looking inside InitTemplate. There are some notable elements:

  • We create some discardable additional GoContext for the current template set.
  • There is an imports loader object, that automatically loads all dependencies to the passed target object. By default, it is enough to load direct entities. For example, for ast.File, those are files directly imported via import statements.
  • We iterate over all modules, ours and those we imported. In the case of object.go, we load the Object and Gengo modules. We call WithContext for each one: basically, we enrich our temporary GoContext and initialize the Golang template object that we should already know well from the bootstrap utility topic.

If you see some WithContext call, like in object.go:

func (m *Module) WithContext(ctx *gengo.GoContext) *gengo.GoContext {
	ctx.AddHelpers(&objectHelpers{ctx: ctx})
	return ctx
}

What we do, is add a helper object. Struct objectHelpers is defined in the compiler/object/funcs.go file. Since Object module loads also Gengo, we should see that we have gengo helpers as well:

func (m *Module) WithContext(ctx *GoContext) *GoContext {
	ctx.AddHelpers(&gengoHelpers{ctx: ctx})
	return ctx
}

If you follow this more deeply, you should reach the compiler/shared/context.go file; see AddHelpers and InitTemplate. When we add helpers, we store them in a map using the module namespace as a key. In InitTemplate we use this map to provide all the functions we can use in the large Golang string templates! It means that if you see the following function calls in templates:

  • {{ object.FieldPathInterface … }}

    It means we are calling the method “FieldPathInterface” on the “objectHelpers” object.

  • {{ gengo.GoFieldType … }}

    It means we are calling the method “GoFieldType” on the “gengoHelpers” object.

We have typically one “namespace” function per each compiler: gengo, object, resource, client, server… This is a method to have a large set of functions available in templates.
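The namespace trick relies on a text/template feature: a function that takes no arguments can return an object, and the template can then call methods on that object with arguments. Below is a minimal sketch under that assumption. The helper types, their method bodies, and buildFuncs are hypothetical and only mirror the names mentioned above; the real helpers live in the respective funcs.go files.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// objectHelpers and gengoHelpers are hypothetical stand-ins for the helper
// structs registered by each compiler module.
type objectHelpers struct{}

func (objectHelpers) FieldPathInterface(msg string) string {
	return msg + "_FieldPath"
}

type gengoHelpers struct{}

func (gengoHelpers) GoFieldType(protoType string) string {
	if protoType == "bytes" {
		return "[]byte"
	}
	return protoType
}

// buildFuncs turns a map of namespace -> helper object into template
// functions. Each function just returns its helper, so templates can write
// {{ namespace.Method args }}.
func buildFuncs(helpers map[string]any) template.FuncMap {
	funcs := template.FuncMap{}
	for ns, h := range helpers {
		h := h // capture the helper for this namespace
		funcs[ns] = func() any { return h }
	}
	return funcs
}

const tmplText = `// {{ .Message }} paths implement {{ object.FieldPathInterface .Message }}
var value {{ gengo.GoFieldType .ProtoType }}
`

// render executes the template with both helper namespaces available.
func render(message, protoType string) (string, error) {
	funcs := buildFuncs(map[string]any{
		"object": objectHelpers{},
		"gengo":  gengoHelpers{},
	})
	tmpl, err := template.New("helpers").Funcs(funcs).Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	data := map[string]string{"Message": message, "ProtoType": protoType}
	if err := tmpl.Execute(&out, data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	out, err := render("Device", "bytes")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Grouping functions behind one name per module keeps the template function map small and avoids name clashes between modules.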

Going back to the compiler/object/object.go file: after InitTemplate returns a template, we parse all the relevant templates from the tmpl subdirectory that we want to use. Then, using AddGeneratorTemplateFile, we add all files to generate. We give the full file path (the context detects the exact modules), then we pass the exact template to call; the lookup tries to find a matching template name: {{ define "... name ..." }}. The last argument to AddGeneratorTemplateFile is the current object (which we can reference with a dot .). The rest should be well known already from the bootstrap part.

This concludes protoc generation. All the other protoc compilers, while they may be more complicated in detail, follow the same design. Knowing which files a compiler generates, you should be able to reach the part of the code you want to change.

4 - Goten protobuf-go Extension

Understanding the protobuf-go Goten extension.

Goten builds on top of protobuf, but the basic library for proto is not provided by us of course. One popular protobuf library for Golang can be found here: https://github.com/protocolbuffers/protobuf-go. It provides lots of utilities around protobuf:

  • Parsing proto messages to and from binary format (proto wire).
  • Parsing proto messages to and from JSON format (for being human-friendly)
  • Copying, merging, comparing
  • Access to proto option annotations

Parsing messages to binary format and back is especially important, this is how we send/receive messages over the network. However, it does not exactly work for Goten, because of our custom types.

If we have:

message SomeResource {
  string name = 1 [(goten.annotations.type).name.resource = "SomeResource" ];
}

The native protobuf-go library would map the “name” field to the Go “string” type. But in Goten, we interpret this as a pointer to the Name struct in the package relevant to the resource. Reference, Filter, OrderBy, Cursor, FieldMask, and ParentName are the other custom types. The problem is strings: how to map them to non-string types when they carry special annotations.

For this reason, we developed a fork called goten-protobuf: https://github.com/cloudwan/goten-protobuf.

The most important bit is the ProtoStringer interface defined in this file: https://github.com/cloudwan/goten-protobuf/blob/main/reflect/protoreflect/value.go.

This is the key difference between our fork and the official implementation. It is worth mentioning roughly how it works.

Look at any gengo-generated file, like https://github.com/cloudwan/goten/blob/main/meta-service/resources/v1/service/service.pb.go. If you scroll to the bottom, to the init() function, you will see that we are registering all the types we generated in this file. We also pass raw descriptors. This is how we are passing information to the protobuf library. It then populates its registry with all proto descriptors and matches (via reflection) protobuf declarations with our Golang struct definitions. If it detects that some field is a string in protobuf, but it’s a struct in implementation, it will try to match with ProtoStringer, which should work, as long as the interface matches.
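The general idea of letting a struct stand in for a proto string can be sketched as below. This is only an illustration: the stringCodec interface, its method names, the Name type, and the name pattern are assumptions made for the example. Consult value.go in the fork for the actual ProtoStringer definition.

```go
package main

import (
	"fmt"
	"strings"
)

// stringCodec is a hypothetical interface in the spirit of ProtoStringer:
// a Go type that can convert itself to and from the string that travels
// on the wire.
type stringCodec interface {
	ProtoString() (string, error)
	ParseProtoString(data string) error
}

// Name is a hypothetical structured resource name.
type Name struct {
	ProjectId string
	DeviceId  string
}

func (n *Name) ProtoString() (string, error) {
	if n == nil {
		return "", nil
	}
	return "projects/" + n.ProjectId + "/devices/" + n.DeviceId, nil
}

func (n *Name) ParseProtoString(data string) error {
	parts := strings.Split(data, "/")
	if len(parts) != 4 || parts[0] != "projects" || parts[2] != "devices" {
		return fmt.Errorf("invalid device name %q", data)
	}
	n.ProjectId, n.DeviceId = parts[1], parts[3]
	return nil
}

// encodeStringField mimics what a marshaler could do for a field declared
// as string in proto: plain strings pass through, and structs are accepted
// when they implement the codec interface.
func encodeStringField(v any) (string, error) {
	switch val := v.(type) {
	case string:
		return val, nil
	case stringCodec:
		return val.ProtoString()
	default:
		return "", fmt.Errorf("type %T cannot back a proto string field", v)
	}
}

func main() {
	wire, err := encodeStringField(&Name{ProjectId: "demo", DeviceId: "d1"})
	fmt.Println(wire, err)

	var parsed Name
	fmt.Println(parsed.ParseProtoString(wire), parsed)
}
```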

We tried to make minimal changes in our fork, but unfortunately, we sometimes need to sync from the main one.

By the way, we can use the following protobuf functions from the google.golang.org/protobuf/proto import (the proto.Message interface is implemented by ALL Go structs generated from protobuf message types):

  • proto.Size(proto.Message)

    To detect the size of the message in binary format (proto wire)

  • proto.Marshal(proto.Message)

    Serialize to the binary format.

  • proto.Unmarshal(in []byte, out proto.Message)

    De-serialize from binary format.

  • proto.Merge(dst, src proto.Message)

    It merges src into dst.

  • proto.Clone(proto.Message)

    It makes a deep copy.

  • proto.Equal(a, b proto.Message)

    It compares messages.

More interestingly, we can extract annotations in Golang with this library, like:

import (
	"google.golang.org/protobuf/proto"
	"google.golang.org/protobuf/types/descriptorpb"

	resourceann "github.com/cloudwan/goten/annotations/resource"
	"github.com/cloudwan/goten/runtime/resource"
)

func IsRegionalResource(res resource.Resource) bool {
	// Message-level options carry the Goten resource annotation.
	msgOpts := res.ProtoReflect().Descriptor().
		Options().(*descriptorpb.MessageOptions)
	resSpec := proto.GetExtension(msgOpts, resourceann.E_Resource).
		(*resourceann.ResourceSpec)
	// ... Now we have an instance of ResourceSpec -> see the goten/annotations/resource.proto file!
}

The function ProtoReflect() is often used to reach object descriptors. It sometimes gives a nice alternative to regular reflection in Go (but it has some corner cases where it breaks on our types… and then we fall back to reflect).

5 - Goten TypeScript compiler

Understanding the TypeScript Goten compiler module.

In SPEKTRA Edge, we also maintain a TypeScript compiler that generates modules for the front end based on protobuf. It is fairly limited compared to the generated Golang, though. You can find the compiler code in the SPEKTRA Edge repository: https://github.com/cloudwan/edgelq/tree/main/protoc-gen-npm-apis.

It generates code to https://github.com/cloudwan/edgelq/tree/main/npm.