runtime/observability
In the Goten repo, there is a observability module located at runtime/observability. This module is for:
- Store tracing spans (Jaeger and Google Tracing supported)
- Audit (Service
audit.edgelq.com
). - Monitoring usage (Metrics are stored in
monitoring.edgelq.com
).
In the Goten repo, this module is rather small, in observer.go
we have
Observer for spans. Goten also stores in context for the current gRPC call
object called CallTracker
(call_tracker.go
). This generic tracker is
used by the Audit and Monitoring usage reporter.
Goten also provides a global registry, where listeners can tap in to monitor all calls.
The mentioned module is however just a small base, more proper code is in SPEKTRA Edge repository, directory common/serverenv/observability:
-
InitCloudTracing
It initializes span tracing. It registers a global instance, but is picked when we register the proper module, in file common/serverenv/grpc/server.go. See function
NewGrpcServer
, optiongrpc.StatsHandler
. This is where tracing is added to the server. TODO: We will need to migrate Audit and Monitoring related, too. Also, I believe we should move theCallTracker
initialization there altogether! -
InitServerUsageReporter
It initializes usage tracking. First, it stores a global usage reporter, that periodically sends usage time series data. It also stores standard observers, usage for store and API. They are registered in Goten observability module, to catch all calls.
-
InitAuditing
It creates a logs exporter, as defined in the
audit/logs_exporter
module, then registers within the goten observability module.
Usage tracking
In the file common/serverenv/observability/observability.go, inside
function InitServerUsageReporter
, we initialize two modules:
-
Usage reporter
It is a periodic job, that checks all recorders from time to time, and exports usage as time series. It’s defined in common/serverenv/usage/reporter.go.
-
usageCallObserver object
With the
RegisterStdUsageReporters
call, it is registered within Goten observability (gotenobservability.RegisterCallObserver)(defined in common/serverenv/usage/std_recorders.
Reporter is supposed to be generic, there is a possibility to add more
recorders. In the std_recorders
directory, we just add a standard observer
for API calls, we track usage on the API Server AND local store usage.
If you look at Reporter implementation, note that we are using always
the same Project ID. This is Service Project ID, global for the whole
Service, shared across all Deployments for this service. Each Service
maintains usage metrics in its project. By convention, if we want to
distinguish usage across user projects, we have a label for it,
user_project_id
. This is a common convention. See files
std_recorders/api_recorder.go
and std_recorders/storage_recorder.go
,
find RetrieveResults
calls. We are providing a user_project_id
label
for all time series.
Let’s describe standard usage trackers. For this, the central point is
usageCallObserver
, defined in the call_observer.go
file. If you look
at it, it catches all unnecessary requests/responses plus streams
(new/closed streams, new client or server messages). Its responsibilities
are:
- Insert store usage tracker in the context (via CallTracker).
- Extract usage project IDs from requests or responses (where possible).
- Notify API and storage recorders when necessary, storage recorder needs periodic flushing for streaming especially.
To track actual store usage, there is a dedicated store plugin, SPEKTRA Edge
repository, file common/store_plugins/usage_observer.go
. It gets
a store usage tracker and increments values when necessary!
In summary, this implementation serves to provide metrics for fixtures
defined in monitoring/fixtures/v4/per_service_metric_descriptor.yaml
.
Audit
The audit is initialized in general in the
common/serverenv/observability/observability.go
file, inside the function
InitAuditing. It calls NewLogsExporter from the
audit/logs_exporter
package.
Then, inside RegisterExporter, defined in file
common/serverenv/auditing/exporter.go
, we are hooking up two objects
into Goten observability modules:
-
auditMsgVersioningObserver
It’s responsible for catching all request/response versioning transformations.
-
auditCallObserver
It’s responsible for catching all unary and streaming calls.
Of course, tracking API and versioning is not enough, we also need to
export ResourceChangeLogs somehow. For this, we have also an additional
store plugin in the file common/store_plugins/audit_observer.go
file!
It tracks changes happening in the store and pings Exporter when necessary.
When the transaction is about to be committed, we call
MarkTransactionAsReady. It may look a bit innocent, but it is not,
see implementation. We are calling OnPreCommit, which is creating
ResourceChangeLog resources! If we do not succeed, then we return
an error, it will break the entire transaction in result. This is to
ensure that ResourceChangeLogs are always present, even if we fail
to commit ActivityLogs later on, so something is still there in audit.
The reason why we have a separate common/serverenv/auditing
directory
from the audit service, was some kind of idea that we should have
an interface in the “common” part, but implementation should be elsewhere.
This was an unnecessary abstraction, especially since we don’t expect
other exporters here (and we want to maintain functionality and be able
to break it). But for now, it is still there and probably will stay due
to low harm.
Implementation of the audit log exporter should be fairly simple, see
the audit/logs_exporter/exporter.go
file in the SPEKTRA Edge repository.
Basically:
-
IsStreamReqAuditable and IsUnaryReqAuditable are used to determine whether we want to track this call. If not, no further calls will be made.
-
OnPreCommit and OnCommitResult are called to send ResourceChangeLog. Those are synchronous calls, they don’t exist until the Audit finishes processing. Note that it will extend a bit duration of store transactions!
-
OnUnaryReqStarted and OnUnaryReqFinished are called for unary requests and responses.
-
OnRequestVersioning and OnResponseVersioning are called for unary requests when their bodies are transformed between API versions. The function of it is to extract potential labels from updated requests or responses. Activity logs recorded will still be done for the old version.
-
OnStreamStarted and OnStreamFinished should be self-explanatory.
-
OnStreamExportable notifies when ActivityLog can be generated. It is used to send ActivityLogs before the call finishes.
-
OnStreamClientMessage and OnStreamServerMessage add client/server messages to ActivityLogs.
-
OnStreamClientMsgVersioning and OnStreamServerMsgVersioning notify the exporter when client or server messages are transformed to different API versions.
Notable elements:
- Audit log exporter can sample unary requests when deciding whether to audit or not.
- While ResourceChangeLog is sent synchronously and extends call duration, ActivityLogs does not. The audit exporter maintains a set of workers for streaming and unary calls, they have a bit of a different implementation. They work asynchronously.
- Stream and unary log workers will try to accumulate a small batch of activity logs before sending, them to save on IO work. They have timeouts based on log size and time.
- Stream and unary log workers will retry failed logs, but if they accumulate too much, they will start dropping.
- Unary log workers send ActivityLogs for finished calls only.
- Streaming log workers can send ActivityLogs for ongoing calls. If this happens, many Activity log fields like labels are no longer updateable. But request/responses and exit codes will be appended as Activity Log Events.