SPEKTRA Edge Wide-Column Store Usage

Understanding the wide-column store usage in SPEKTRA Edge.

The wide-column store provides an alternative to the document storage from Goten. It is focused on wide-column data stores; currently, we support Google Bigtable and ScyllaDB as backend engines. From the user’s perspective, the following properties and features of this particular database interface are worth mentioning:

  • this is a non-relational DB with no joins or transactions. It has a simple interface: Query and Save (which can overwrite). There is no “Get” equivalent, and streaming reads are not supported.
  • it is semi-schemaless: it mostly stores data in a binary, non-queryable format. However, it allows defining column families and a set of filterable field keys.
  • data is stored in timestamp order. When saving, a timestamp must be provided for each column value. A query always requires a time range.
  • data is automatically garbage-collected after the TTL defined per column family.
  • optimized for large data sets and throughput.
  • possible query operators: EQUAL, IN, NOT EQUAL, NOT IN
  • a single filter can contain many conditions connected with the AND operator.
  • a query itself may have many filters - effectively supporting the OR operator.

This store is fully implemented in the SPEKTRA Edge repository. The store interface and implementation can be found in common/widecolumn/store/store.go. The package for this file also contains primary user-visible objects.
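
To give a rough idea of the surface before diving into details, here is a simplified, hypothetical sketch of the interface. The type and method shapes below are illustrative stand-ins only; the real definitions live in common/widecolumn/store and common/widecolumn/uid:

package sketch

import (
  "context"
  "time"
)

// Hypothetical, simplified shapes for illustration only.
type Key struct{} // integer key-value pairs; see the uid package

type ColumnValue struct {
  Family string    // column family name (each family has its own TTL)
  Key    []byte    // optional binary key
  Time   time.Time // mandatory timestamp per value
  Data   []byte    // opaque binary payload
}

type KeyedValues struct {
  Key    *Key
  Values []ColumnValue
}

type Query struct {
  StartTime, EndTime time.Time // a time range is always required
  // ... filters and indices omitted
}

// There is no "Get" and no streaming read: just Query over a time range
// and Write, which appends (and can overwrite identical coordinates).
type Store interface {
  Write(ctx context.Context, values []KeyedValues) error
  Query(
    ctx context.Context,
    q *Query,
    onBatch func(ctx context.Context, batch []KeyedValues, done bool) (bool, error),
  ) error
}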

We have a couple of use cases (resources!) within the SPEKTRA Edge repository where we use this storage:

  • in the Monitoring service, we use it for TimeSeries
  • in the Audit service, we use it for ActivityLogs and ResourceChangeLogs
  • in the Logging service, we use it for Logs

For each of these cases, we wrap the wide-column store in an appropriate interface.

In the service specification for these resources, we are disabling metadata and standard CRUD.

1 - Wide-Column Data Models

Understanding the wide-column user and storage data models.

User Data Model

The input/output of the widecolumn store is KeyedValues from the store package, file common/widecolumn/store/keyed_values.go. It can be thought of as a resource. It is composed of:

  • Single Key object (uid package).

    It is used to uniquely identify resources.

  • Array of Column values (dataentry package, object ColumnValue).

    Each column value has:

    • Column Family name (string)
    • Timestamp
    • Optional binary key
    • Data (binary data)

KeyedValues can hold multiple column values, with timestamps ranging over hours, days, weeks, months, and potentially years. Therefore, the Write method only saves a range of values for a single key (normally it appends!). A query grabs only the partial data for a given time range.

The column family and the optional binary key identify column values that share the same timestamp (so they don’t overwrite each other).

Note that this makes “a resource called KeyedValues” a time-series-like resource. In this context, the Key represents a header, and the column values (with timestamps) form a series of values across an always-increasing time range.

Imagine there is a temperature sensor with some serial number that sends a temperature reading every minute. In that case, we can define the following Key: serial_number=1234, product=temp_sensor, and the Column Family: temperatures. Each reading can then be represented as a double value associated with a specific timestamp; the double can be encoded as binary data. We don’t need the optional binary key in this case.
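
Using the V3 types from the full usage example later in this document, the sensor’s first two readings could be sketched roughly as below. This is illustrative only: marshalDouble is a hypothetical helper encoding a float64 into bytes, and describedSKey is built as shown in the later example.

values := wcstore.KeyedValues[*wcde.ColumnValueV3]{
  // Key fields: serial_number=1234, product=temp_sensor.
  SKey: describedSKey.FormatRawKey(),
  Values: []*wcde.ColumnValueV3{
    {
      Family: "temperatures", // column family, with its own TTL
      Time:   time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC),
      Data:   marshalDouble(23.5), // hypothetical float64 -> bytes helper
    },
    {
      Family: "temperatures",
      Time:   time.Date(2020, 1, 1, 0, 1, 0, 0, time.UTC),
      Data:   marshalDouble(23.7),
    },
  },
}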

Keys

A key is a unique identifier of a column value series. Key definitions and manipulation are provided by the uid package. A Key is a set of key-value pairs (KV type). A KV without the Value part is just a “KId” object, a key identifier. It may be better thought of as a key field identifier, because “the real” Key is an array of these objects.

Each KV pair has an equivalent UID object. UIDs allow mapping Keys and key-value pairs between string and binary representations. UIDs are represented as paths in a tree of up to depth three, where each node may be allocated using a parent counter and atomic operations. The hierarchy:

  1. Root UID len 1; (always 1), str: _
  2. Key UID len 2; (1, kid), example str: _#serial_number
  3. Value UID len 3; (1, kid, vid), example str: _#serial_number#1234

It is worth mentioning that a UID of depth 1 is the same as the empty root value, a UID of length 2 is just a KId, and a full-length UID is equivalent to a full KV.

It is important to say that UID is focused on how data is internally stored in the underlying backend (Bigtable or ScyllaDB).

Going back to the temperature sensor example, we can notice our Key object has two KV pairs:

  • serial_number=1234
  • product=temp_sensor

But if you look at the structure of KV, you may notice that it is a pair of integer values. This is because storing integers is more efficient than storing strings, especially when a given key value repeats many times across time and across different keys. Therefore, each type in the uid package has a “String” equivalent:

  • KV has SKV
  • KId has SKId
  • UID has StrUID

Note that the main store interface, apart from Query/Write, also provides functions for:

  • Allocating string key values.
  • Resolving strings to integers back and forth.

The KeyedValues structure provides both Key and SKey fields. They are meant to be equivalent. However, it is worth noting:

  • When executing the Query method, all KeyedValues will have only the Key value set. It is assumed that SKey may not always be needed; besides, it is more efficient to resolve all SKeys in bulk once the query is completed.
  • When executing the Write method, the store checks whether the Key is defined. If not, it falls back to SKey and allocates/resolves at runtime. However, it is recommended to use Key whenever possible for performance reasons.

Storage Data Model

Data is stored in a somewhat different format than it is presented in KeyedValues. A KeyedValues object is transformed into a set of DataEntry objects from the dataentry package. A DataEntry is a combination of:

  • Row object (see dataentry/row.go)
  • Array of ColumnValue objects (see dataentry/column_value.go)

It may look like Row is equivalent to Key from KeyedValues, but it is not. There is a transformation going on:

  • The Key from KeyedValues is split into two keys, the promoted key and the tail key. This split is defined by the TableIndex object from the uid package. As you may have figured out, this helps query data quickly and efficiently: when a filter defines a set of keys, the store will try to pick the index whose promoted key set most closely matches the filter.

    We are indexing!

  • Column value timestamps are transformed into RowSequence objects (see the dataentry/row.go file). Column values that have the same sequence are grouped together; for each unique sequence, a new Row is created, containing the promoted key, tail key, and sequence number. It then gets assigned the column values that share that sequence number.

  • Note that DataEntry objects are created from IndexedKeyedValues when writing: each TableIndex creates its own DataEntry objects, and those indices are full replicas!

Example: Imagine we have the following KeyedValue (single):

  • Key: serial_number=1234, product=temp_sensor
  • Values:
    • temperatures: 123.4, 2020-01-01T00:00:00Z
    • temperatures: 124.4, 2020-01-01T00:01:00Z

Then it will be transformed into the following DataEntry objects, provided that serial_number is used as an index:

  • DataEntry 1:
    • Row:
      • Promoted Key: serial_number=1234
      • Tail key: product=temp_sensor
      • Sequence: fromTimestamp(2020-01-01T00:00:00Z)
    • Values:
      • temperatures: 123.4, 2020-01-01T00:00:00Z
  • DataEntry 2:
    • Row:
      • Promoted Key: serial_number=1234
      • Tail key: product=temp_sensor
      • Sequence: fromTimestamp(2020-01-01T00:01:00Z)
    • Values:
      • temperatures: 124.4, 2020-01-01T00:01:00Z

When data is saved to the underlying DB, fields repeated between “Values” and the Row object, like the timestamp, may be dropped, as they are already present in the Row. Understanding promoted/tail keys is important for writing good indices! In this example, we assumed a single promoted index, but if we had more, we would have more replicas.
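
For illustration, here is a sketch of how two indices over the sensor key could be declared, using the uid package as in the usage example later in this document (fieldsDesc is the key fields descriptor built there). Each TableIndex produces a full replica of the written data:

// Sketch: each TableIndex below yields its own DataEntry replicas.
serialNumberUID := wcuid.KIdUID(
  fieldsDesc.GetFieldDescriptorBySKey("serial_number").GetKey(), 0,
)
productUID := wcuid.KIdUID(
  fieldsDesc.GetFieldDescriptorBySKey("product").GetKey(), 0,
)
indices := []*wcuid.TableIndex{
  // Replica 1: promoted key = serial_number, tail key = product.
  wcuid.NewTableIndex([]wcuid.UID{serialNumberUID}),
  // Replica 2: promoted key = product + serial_number, empty tail.
  wcuid.NewTableIndex([]wcuid.UID{productUID, serialNumberUID}),
}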

2 - SPEKTRA Edge Wide-Column Versions

Understanding the SPEKTRA Edge wide-column versions.

We currently have two supported storage versions: v2 and v3. Version v2 is old and carries a lot of influence from the Monitoring service, which originally had its own internal widecolumn implementation for private use. The Audit and Logging services were added later, but v2 still specifies some fields relevant to monitoring only. It was also written with Bigtable in mind, where you can query not only by row range but also by column value range. This is, however, not efficient with ScyllaDB and likely other CQL-based DBs. Moreover, it is not needed: queries from Monitoring only need data for a specific column key, and Audit/Logging always query by the whole range only. There is no plan to utilize this feature, so it is better to have it removed. On top of that, sequence numbers in v3 have nanosecond precision, while in v2 only seconds. This may allow us to drop timestamps from column values completely, and it invalidates the reason we have many column values in Audit/Logging anyway.

Summary:

  • v3 has Sequence in nanosecond precision, v2 in seconds.
  • A Row in v2 contains: promoted key, tail key, sequence number, an empty aggregation key, an empty start timestamp, and an alignment period. A Row in v3 has the promoted key, tail key, and sequence number only.
  • A Column Value in v2 has extra elements removed in v3: the alignment period and start time.
  • A Query in v2 allows specifying a column key range; v3 does not. We are using this feature right now in Monitoring (the Aligner is used to specify the column to pick the correct value type). However, we never specify a multi-value range: the column range start is always equal to the column range end. Therefore, for v3, to keep the same functionality, we need to move the Aligner into the Key in KeyedValues.

TODO:

  • Monitoring, Audit, and Logging write/query to v2. We need to implement v3 for these services.
  • All three services will need to double-write for some time. Queries should be executed on v3 when possible, with v2 as a fallback. Writes should go to both v2 and v3. This allows a smooth transition from v2 to v3. We should use the config file to indicate whether v2 should still be written, and from which data point v3 is valid. Specific stores (like TimeSeriesStore) should internally hold two references, to the widecolumn StoreV2 and StoreV3.
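
A hypothetical sketch of the planned transition wrapper is shown below. All names here (dualStore, WriteInput, QueryInput, the AsV2/AsV3 converters) are illustrative assumptions, not actual SPEKTRA Edge code:

// Illustrative only: one store holding both versions, double-writing,
// and preferring v3 for reads when the queried range is covered by v3.
type dualStore struct {
  v2          StoreV2   // legacy widecolumn store
  v3          StoreV3   // new widecolumn store
  v3ValidFrom time.Time // from config: earliest data point present in v3
}

func (s *dualStore) Write(ctx context.Context, in WriteInput) error {
  // Double-write during the transition window.
  if err := s.v2.Write(ctx, in.AsV2()); err != nil {
    return err
  }
  return s.v3.Write(ctx, in.AsV3())
}

func (s *dualStore) Query(ctx context.Context, q QueryInput) (Result, error) {
  // Query v3 when the whole range is covered by it, v2 otherwise.
  if !q.StartTime().Before(s.v3ValidFrom) {
    return s.v3.Query(ctx, q.AsV3())
  }
  return s.v2.Query(ctx, q.AsV2())
}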

3 - SPEKTRA Edge Wide-Column Usage Example

Understanding the SPEKTRA Edge wide-column usage through example.

package example

import (
  "context"
  "fmt"
  "time"

  wcde "github.com/cloudwan/edgelq/common/widecolumn/dataentry"
  wcstore "github.com/cloudwan/edgelq/common/widecolumn/store"
  wcuid "github.com/cloudwan/edgelq/common/widecolumn/uid"
)

func RunDemo(ctx context.Context, store wcstore.WCStoreV3) error {
  // Create a reusable fields descriptor. Define two key fields:
  //   serial_number: no flags (0), no default value, required,
  //     negative filters not allowed;
  //   product: no flags (0), no default value, required,
  //     negative filters not allowed.
  fieldsDesc := wcuid.NewKeyDescriptor().
    AddSKId("serial_number", 0, "", true, false).
    AddSKId("product", 0, "", true, false)

  // Allocate string-to-integer mappings for later, then keep fieldsDesc
  // "forever". This part can be completed at startup.
  strUIDs := fieldsDesc.GetUnresolvedStrUIDs()
  uids, err := store.ResolveOrAllocateStrUIDs(ctx, strUIDs)
  if err != nil {
    return err
  }
  fieldsDesc.ResolveSKVsInBulk(strUIDs, uids)

  // Create a specific key - err is returned if, for example, a key is
  // repeated or a required key is not provided. Undefined keys are stored
  // as unclassified - but they can be classified later by providing more
  // field descriptors.
  describedSKey, err := fieldsDesc.NewDescribedSKeyBuilder().
    Add("serial_number", "1234").
    Add("product", "temp_sensor").Build()
  if err != nil {
    return err
  }

  // Prepare the indices by which data must be split. We want to index
  // by serial_number only.
  indices := []*wcuid.TableIndex{wcuid.NewTableIndex([]wcuid.UID{
    wcuid.KIdUID(
      fieldsDesc.GetFieldDescriptorBySKey("serial_number").GetKey(),
      0,
    ),
  })}

  // Prepare the data to write. We could have resolved the skey into
  // a key, but we don't need to.
  indexedValues := make([]wcstore.IndexedKeyedValues[*wcde.ColumnValueV3], 0)
  indexedValues = append(indexedValues, wcstore.IndexedKeyedValues[*wcde.ColumnValueV3]{
    Indices: indices,
    KeyedValues: wcstore.KeyedValues[*wcde.ColumnValueV3]{
      SKey: describedSKey.FormatRawKey(),
      Values: []*wcde.ColumnValueV3{
        {
          Family: "temperatures",
          Time:   time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC),
          Data:   marshalDouble(123.4),
        },
        {
          Family: "temperatures",
          Time:   time.Date(2020, 1, 1, 0, 1, 0, 0, time.UTC),
          Data:   marshalDouble(124.4),
        },
      },
    },
  })

  // Save the data
  err = store.Write(ctx, indexedValues)
  if err != nil {
    return err
  }

  // Prepare the query - we will get only one of the two points saved.
  // We filter by a specific serial number, and all keys should be
  // automatically validated (true flag).
  filter := wcstore.NewFilterFromSKVs([]wcuid.SKV{
    wcuid.NewSKIV("serial_number", "1234"),
  }, true)
  query := &wcstore.QueryV3{
    StartTime: time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC),
    EndTime:   time.Date(2020, 1, 1, 0, 0, 30, 0, time.UTC),
    OrGroups:  []*wcstore.QueryV3OrGroup{
      {
        Indices:      indices,
        Filter:       filter,
        ColumnFamily: "temperatures",
      },
    },
  }

  tempsByKey := make(map[wcuid.Key][]float64)
  uniqueKeys := make([]*wcuid.Key, 0)
  err = store.Query(
    ctx,
    query,
    func(
      ctx context.Context,
      seq wcde.RowSequenceV3,
      entries []wcstore.KeyedValues[*wcde.ColumnValueV3],
      isDone bool,
    ) (bool, error) {
      for _, entry := range entries {
        temps, ok := tempsByKey[*entry.Key]
        if !ok {
          temps = make([]float64, 0)
          uniqueKeys = append(uniqueKeys, entry.Key)
        }
        for _, v := range entry.Values {
          temps = append(temps, unmarshalDouble(v.Data))
        }
        tempsByKey[*entry.Key] = temps
      }
    
      // Return true to continue, false to stop
      return true, nil
    },
  )
  if err != nil {
    return err
  }
  
  skeys, err := store.ResolveKeys(ctx, uniqueKeys)
  if err != nil {
    return err
  }
  for i, key := range uniqueKeys {
    temps := tempsByKey[*key]
    descSKey, _ := fieldsDesc.ValidateSKey(skeys[i], false)
    serialNumber := descSKey.GetSVid("serial_number")
    fmt.Printf(
      "Serial number %s has %d temperatures: %v\n",
      serialNumber,
      len(temps),
      temps,
    )
  }
  return nil
}

It is important to learn how to use indices efficiently: to successfully query data, a filter must define all fields that are defined in at least one index. It is like looking for the proper data partition. Tail keys are filtered out in memory (if there are filter conditions for them), but the data is still being pulled.
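
For example, with the single serial_number index from the demo above (a sketch reusing those types):

// Matches the index: serial_number is defined. The extra product
// condition is applied in memory against the tail key.
okFilter := wcstore.NewFilterFromSKVs([]wcuid.SKV{
  wcuid.NewSKIV("serial_number", "1234"),
  wcuid.NewSKIV("product", "temp_sensor"),
}, true)

// Matches no index: product alone cannot select a data partition,
// so this query cannot be served.
badFilter := wcstore.NewFilterFromSKVs([]wcuid.SKV{
  wcuid.NewSKIV("product", "temp_sensor"),
}, true)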

Note also an important quirk of widecolumn storage: we need to specify indices both when writing and when querying. Widecolumn can detect which index is most optimal for a query, but it does not remember indices. There are no functions for saving/getting indices. It is the responsibility of the user to provide the indices when saving, and the list of available indices when querying!

This is generally “fine”, however - all our implementations use descriptors, which we need to fetch from the local database when validating user input anyway! Since we already get them from the database, extra DB lookups just for reading indices would make no sense. Descriptors are used when writing, querying, and also validating.

4 - SPEKTRA Edge Wide-Column Annotations

Understanding the SPEKTRA Edge Wide-Column Annotations.

Those four resource types in SPEKTRA Edge (TimeSerie, ActivityLog, ResourceChangeLog, Log) are still declared as Goten resources, but they have several opt-outs that make them unique. See in the SPEKTRA Edge repo:

  • Audit API-Skeleton: audit/proto/api-skeleton-v1.yaml, resources: ActivityLog, ResourceChangeLog
  • Logging API-Skeleton: logging/proto/api-skeleton-v1.yaml, resources: Log
  • Monitoring API-Skeleton: monitoring/proto/api-skeleton-v4.yaml, resources: TimeSerie

See the opt-out settings: they don’t have a metadata field, standard CRUD is disabled, and pagination and resource change are also not present. For TimeSerie, we also disable the name field (mostly for historical reasons, to match the Google Stackdriver API in version v3). We also disabled “basicActions” for all of them (by specifying *), but this option is not strictly necessary, since disabling standard CRUD already ensures that no basic actions (like Create, Get, BatchGet…) are generated.

As a result, Goten will not generate anything for these resources in the access or store packages, nor will the server have any basics related to them. Some standard types in the resources package will also be missing.

They will still get Descriptor instances, and they will still satisfy the gotenresource.Resource interface (as defined in the runtime/resource/resource.go file in the Goten repo), but, for example, EnsureMetadata will return a nil value. The SupportsMetadata function in the gotenresource.Descriptor interface will return false.

This is how we take control of these resources from Goten, so that we can use a different storage type.

5 - SPEKTRA Edge Monitoring Time Series

Understanding the SPEKTRA Edge monitoring service time series.

You can see the proper TimeSeries storage interface in the file monitoring/ts_store/v4/store.go, TimeSeriesStore. We will study QueryTimeSeries and SaveTimeSeries here, as they are the most important for our case.

TimeSeries are the most special among all widecolumn types: the time series a user gets from a query can be completely different from those submitted. When group-by is in use (which is almost always the case!), time series are merged, dynamically forming new ones. This should already be known from the User guide, though.

As per the user documentation, it should be clear that we use MonitoredResourceDescriptor and MetricDescriptor instances to create the final identifiers of TimeSeries (keys), and a list of promoted indices for faster queries. Before we go into saving/querying, it is worth checking the initTsData function in the monitoring/ts_store/v4/store.go file. This is where we initialize:

  • tsKeyDesc, which contains the common part of the descriptor for all time series, regardless of labels in metric or resource descriptors
  • Common index prefix, consisting of project and metric.type fields from TimeSerie.

Additional notes:

  • TimeSeries storage has no concept of regional replication; each region contains only its own data.
  • This document describes only TimeSeries storage, but not the full monitoring design, which is described separately.

Overall, the TimeSerie object is mapped into KeyedValues from WideColumn, but note that TimeSerie is a somewhat higher-level object. Each mapping requires some level of design. This is done in the following way:

  • The TimeSerie fields project, region, metric, and resource are mapped to various string key-values, together forming a uid.SKey. This SKey is then mapped to some uid.Key, which is a short integer representation. Note that uid.Key is the identifier of KeyedValues.
  • The TimeSerie field key is the binary-encoded uid.Key, which “compresses” the project, region, metric, and resource fields.
  • The TimeSerie field points, which is a repeated array, is converted into WideColumn dataentry.ColumnValue objects, one by one.

A single Point in TimeSerie is mapped to dataentry.ColumnValue like this:

  • TypedValue, which holds the actual value, is mapped to binary data (bytes). We need to marshal it.
  • The timestamp of the point naturally maps to the time in dataentry.ColumnValue. Note that AlignmentPeriod, apart from the 0 value, is divisible by 60 seconds. This means that the timestamp will be some multiple of the AP value.
  • The Family in dataentry.ColumnValue will contain the AlignmentPeriod (in string format). This means that values for each AP are stored in a separate column family (table)!
  • dataentry.ColumnValue also has an optional key; in Monitoring, we store the perSeriesAligner value from the Point's Aggregation there. For example, this allows us to save ALIGN_RATE (some double) and ALIGN_SUMMARY (some distribution) with the same key, the same timestamp, and in the same column family, next to each other.

In Monitoring, we do not save raw points (alignment period = 0 seconds) in the database. We store only aligned values.
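
Putting this mapping together, a hedged sketch follows. The helper alignerColumnKey is hypothetical, and the name of the optional-key field on ColumnValueV3 is an assumption (only Family, Time, and Data appear in the usage example earlier):

// Sketch: mapping one aligned Point to a V3 column value; the inputs
// are assumed to be already extracted from the TimeSerie/Point.
func pointToColumnValue(
  ap time.Duration, // alignment period; a multiple of 60s, never 0 here
  aligner string,   // perSeriesAligner, e.g. "ALIGN_SUMMARY"
  at time.Time,     // point timestamp; a multiple of the AP
  value []byte,     // TypedValue marshaled into bytes
) *wcde.ColumnValueV3 {
  return &wcde.ColumnValueV3{
    Family: ap.String(),               // one column family per AP value
    Key:    alignerColumnKey(aligner), // assumed field: aligner as column key
    Time:   at,
    Data:   value,
  }
}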

Saving

Writing of time series is fully implemented in the monitoring/ts_store/v4/store_writing.go file.

It is possible, when we save time series, that the TimeSerie object already contains a key field value, in which case we don’t need to resolve it ourselves. However, we will still need to verify that it is correct! A client may choose to submit TimeSeries with binary keys to make the final request a bit smaller (they can drop the project, region, metric, and resource fields).

Moreover, from TimeSerie we can and must get the metric and resource descriptors, from which we can finally compute indices. This way, we can wrap KeyedValues into IndexedKeyedValues, as required by widecolumn storage. Note that those two descriptor resources describe what fields are possible in a TimeSerie object in general; they regulate metric.labels and resource.labels. If we mapped MonitoredResourceDescriptor/MetricDescriptor to widecolumn store types, they would map to uid.KeyFieldsDescriptor!

In the implementation, each TimeSerie is wrapped into CreateTssOperationResult, a type defined in the monitoring/ts_store/v4/store.go file; see it there. It contains the params of KeyedValues, along with the associated descriptor resources. Then, the metric and resource descriptor resources are wrapped together with the uid.KeyFieldsDescriptor types to which they de facto map.

When we save time series, we map TimeSerie into CreateTssOperationResult in initiateCreateOperations. Inside this function, we validate the basic properties of the TimeSerie object: the project, metric, and resource type fields. We use the field descriptor tsKeyDesc, which was initialized in the store constructor. At this initial stage, we don’t know the exact metric and resource descriptor types, so we validate basic properties only! If the binary key is provided, we initialize a descKey instance, otherwise a descSKey. The former is better for performance, but not always possible. Note that at this stage we have described keys and validated base properties, but the descriptors still have work to do.

In the next stage, we grab descriptor references; see getWrappedDescriptors. It does not make any resolutions yet - the same descriptors may be used across many TimeSerie objects, so we don’t want to do more resolutions than necessary. With the Goten resource descriptors wrapped in uid.KeyFieldsDescriptor, we resolve in the resolveAndPopulateDescriptors function, where we finally get field descriptors in the required uid format. This allows us to execute the final, proper validation, and to compute indices for widecolumn.

Proper validation is done in the functions defaultAndValidateHeaders and defaultAndValidatePoints. In the second function, we also generate the final column values used by the widecolumn storage interface! However, note a “trap” in defaultAndValidatePoints:

if ap == 0 && !md.GetStorageConfig().GetStoreRawPoints() {
    continue
}

Raw, unaligned points that clients send, with an AP equal to 0 seconds, are skipped and not saved in the database; the second condition pretty much always evaluates to true! This is explained further in the monitoring design doc.

With the data validated and the output columns populated, we can now ensure that the output raw key in CreateTssOperationResult is present. If the binary key was not submitted when saving the TimeSerie (field key), then we need to use the resolver to allocate a string-to-integer pair. The mapping is, of course, saved in widecolumn storage. See the ensureOutputKeysAreAllocated function.

Next, with column values and raw keys, we wrap the KeyedValues into indexed ones, IndexedKeyedValues. This is what we finally pass to widecolumn storage. Inside, keyed values are duplicated per index and saved in the underlying storage.

Querying

When we query time series, we need to:

  • Convert the time series query params into a WideColumn query object (mapping).
  • Create a batch processor that maps KeyedValues from WideColumn storage into TimeSerie objects.

This implementation is fully in monitoring/ts_store/v4/store_querying.go.

The widecolumn store, unlike the regular Goten document store, supports OR groups, though it is more like executing multiple queries at once. Each query group represents a filter with a set of AND conditions, and each group can be executed on different indices. We need to deal with this specific WideColumn interface trait, whereby we must specify indices when saving and when querying.

When executing a query, we gather all input parameters and convert them into a tssQuery object, with tssQueryGroup as one OR group. This is not yet the format required by widecolumn, but an intermediary. See the createTssQuery function.

We support two types of queries for TimeSeries:

  • With a filter specifying one or more binary keys (WHERE key = "..." OR WHERE key IN [...]). Each key forms one “OR” group, with just a long list of AND conditions.
  • With a filter specifying a set of metric/resource conditions (WHERE metric.type = "..." AND resource.type = "..." ...). We also support IN conditions for those types. Resource types may optionally be omitted (but defaults are assumed then). For each combination of metric + resource type, we create one OR group.

One group query must specify exactly one metric and one resource type: because each combined pair defines its own set of promoted indices, we MUST NOT combine them! This is reflected in createTssQuery.

For each OR group, we grab the descriptors, using which we can finally verify whether the filter conditions are defined correctly and whether group-by specifies existing fields, and we can compute the indices we know we can query.

Then, we map the query to a widecolumn one. Notable elements:

  • We pass a reducing mask for fields; it makes the output uid.Key have some fields “reduced”!
  • We mostly ignore the perSeriesAligner passed by the user and switch to ALIGN_SUMMARY. This is because, when saving, we use ALIGN_SUMMARY almost exclusively. Other types can almost always be derived from the summary, so there is no point in maintaining all of them.

Finally, we execute the query with our batch processor. Thanks to the reducing mask, the Key field of each KeyedValues already has a reduced list of key values; the reduced keys are in the rest key set. This is how we implement groupBy in monitoring. Each entry in the resultEntries map field of queryResultProcessor will represent a final TimeSerie object. However, this is a very CPU- and RAM-intensive task, because widecolumn storage returns values exactly as we saved them - all individual instances! If, for a single timestamp, we have thousands of entries sharing the same key, then we merge thousands of points to produce one point, and we repeat this for each timestamp. At least widecolumn guarantees that, when querying, results are returned only with increasing seq values. If we query with AP = 300s, we will get points for, say, noon, then 12:05, and so on. When we see the sequence jump, we know we can add the final point to the growing list.

Still, this is a serious scaling issue if we try to merge large collections into a single reduced key.
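
The sequence-jump logic can be sketched as below. This is illustrative only: mergedPoint, mergeInto, and emitPoint are hypothetical, RowSequenceV3 is assumed comparable, and the real logic lives in queryResultProcessor:

var lastSeq wcde.RowSequenceV3
var acc *mergedPoint // hypothetical per-timestamp accumulator
onBatch := func(
  ctx context.Context,
  seq wcde.RowSequenceV3,
  entries []wcstore.KeyedValues[*wcde.ColumnValueV3],
  isDone bool,
) (bool, error) {
  if acc != nil && seq != lastSeq {
    emitPoint(acc) // sequence jumped: the previous point is complete
    acc = nil
  }
  lastSeq = seq
  for _, entry := range entries {
    // Merging is the CPU/RAM-heavy part when thousands of entries
    // share the same reduced key for a single timestamp.
    acc = mergeInto(acc, entry)
  }
  if isDone && acc != nil {
    emitPoint(acc) // flush the final point
  }
  return true, nil
}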

Since the perSeriesAligner we passed to widecolumn differs from what the user requested, when we convert a ColumnValue into a TimeSerie point (function buildPointFromWCColumnValue), we need to extract the proper value from the summary object.

Once the TimeSerie objects are obtained, we resolve all integer keys into strings in bulk, and we can form the output time series.

6 - SPEKTRA Edge ActivityLogs and ResourceChangeLogs

Understanding the SPEKTRA Edge activity and resource change logs.

The audit store for logs is implemented in the audit/logs_store/v1/store.go file. This interface also uses the WideColumn store under the hood. We map each ActivityLog and ResourceChangeLog to KeyedValues.

Activity Logs

For ActivityLog, mapping is the following:

  • Many fields are used to create uid.Key/uid.SKey: scope, request_id, authentication, request_metadata, request_routing, authorization, service, method, resource.name, resource.difference.fields, category, labels.
  • The name field contains the scope and the binary key uid.Key. However, when logs are saved, this is often not present in the request (the value is allocated on the server side).
  • Each event in the events array is mapped to a single dataentry.ColumnValue. Then, we have two additional special column values: the fields resource.difference.before and resource.difference.after.

An ActivityLog typically has 3 events: client message, server message, and exit code. It may have many more for streaming calls, and can get pretty long.

This is how an ActivityLog Event maps to a ColumnValue in WideColumn:

  • The whole event is marshaled into binary data and passed to ColumnValue.
  • The Family field is always equal to one static value, ALStoreColumnType. It should be noted that all ActivityLogs use one single column family!
  • We extract the time value from the ActivityLog Event and use it as the Time in ColumnValue. Note that the V2 WideColumn format only uses second precision for times!
  • An additional column key contains the event type (client msg, server msg…) and the nanosecond part of the timestamp. This is important if an ActivityLog contains streaming messages and we have more than one message of a single type within one second! This is how the column key protects against overwrites.
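
A hedged sketch of this mapping follows; the type and helper names here (ActivityLogEvent, eventTime, eventType, encodeEventColumnKey, mustMarshal) and the V2 column value field names are assumptions, not the actual audit code:

// One ActivityLog Event becomes one V2 column value.
func eventToColumnValue(ev *ActivityLogEvent) *ColumnValueV2 {
  t := eventTime(ev)
  return &ColumnValueV2{
    Family: ALStoreColumnType,       // single family for all activity logs
    Time:   t.Truncate(time.Second), // V2 keeps second precision only
    // Event type plus the nanosecond remainder: two events of the
    // same type within one second still get distinct column keys.
    Key:  encodeEventColumnKey(eventType(ev), t.Nanosecond()),
    Data: mustMarshal(ev), // the whole event marshaled into bytes
  }
}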

Pre & post object diffs from ActivityLogs are mapped to ColumnValue as follows:

  • The whole object is marshaled into binary data and passed to ColumnValue.
  • The column family is just equal to the const value of ALStoreColumnType.
  • The timestamp is equal to that of the first event of the ActivityLog.
  • The column key contains the type (before OR after field values) with the nanoseconds from the timestamp. The timestamp part is not necessary here; it just provides a format similar to that of events.

Resource Change Logs

ResourceChangeLogs have the following mapping to KeyedValues:

  • Many fields are used to create uid.Key/uid.SKey: scope, request_id, authentication, service, resource.type, resource.name, resource.pre.labels, resource.post.labels.
  • The name field contains the scope and the binary key uid.Key. However, when logs are saved, this is often not present in the request (the value is allocated on the server side).
  • ResourceChangeLogs are a bit unique - we marshal each of them whole into binary data, and these form the ColumnValue types.

Each ResourceChangeLog typically has two ColumnValues because we save it twice: the first time before the transaction concludes (so we have a chance to protest before allowing the commit), and then after the transaction concludes.

In summary, ColumnValue is formed this way:

  • Binary data contains the whole log marshaled
  • The column family is set to the constant value of StoreColumnType.
  • The Time is extracted from the request time (first client message received).
  • A column key is also used: its value is StoreColumnPreCommitType when the transaction.state field of the ResourceChangeLog is equal to PRE_COMMITTED; otherwise, it is StoreColumnFinalizedType.
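
In sketch form (the constants are those mentioned above; the accessors and enum name are assumptions):

// Choose the column key by transaction state. Both copies of the log
// share the same row (same uid.Key and request time), so only this
// column key distinguishes the pre-commit entry from the final one.
columnKey := StoreColumnFinalizedType
if changeLog.GetTransaction().GetState() == TransactionState_PRE_COMMITTED {
  columnKey = StoreColumnPreCommitType
}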

If you check the NewStore constructor in the audit/logs_store/v1/store.go file, you will notice that, unlike in the monitoring store, we have quite big uid.KeyFieldsDescriptor instances for resource change and activity logs, and a ready set of indices, not just a common prefix.

If you have analyzed the querying and writing of the monitoring time series storage, then checking the same for Audit logs will be generally simpler. They follow the same principles, with some differences:

  • In monitoring, we had two descriptors per TimeSerie; in Audit, we have one descriptor for activity logs and another for resource change logs.
  • Specific to resource change logs: we call SaveResourceChangeLogsCommitState, which internally uses SaveResourceChangeLogs again - the same function used for saving logs in the PRE COMMIT state.
  • For both activity and resource change logs, we don’t require descriptors: labels are usually empty sets anyway, we already have large sets of promoted indices and labels, and this is useful for bootstrap processing, when descriptors may not be there yet.
  • When we query resource change logs, we don’t need to resolve any integer keys, because whole logs were saved; see the onBatch function. We only need to handle the separate entries for the commit state.
  • When we query logs, we get all logs up to second precision. This means that even if we have a very large number of logs within a single second, we cannot split them: continuation tokens (next page tokens) must use second precision, as required by the V2 storage format.
  • Because logs are sorted by timestamp with only second precision, we need to re-sort them again anyway.

Things could be improved with the v3 SPEKTRA Edge wide-column version.

7 - SPEKTRA Edge Logging Store

Understanding the SPEKTRA Edge logging store.

The logging store for logs is implemented in the logging/logs_store/v1/store.go file. This interface also uses the WideColumn store under the hood. We map every Log instance to KeyedValues. Unlike in Audit, these logs can truly be very long: for one uid.Key mapping, we can have logs spanning days, months, and years.

From the Log instance, we extract fields scope, region, service, version, log_descriptor, and labels, which are converted to uid.Key.

Regarding ColumnValue, the whole Log body is marshaled into a binary array, and the rest of the columns are derived in the following way:

  • The column family is just one for all logs.
  • The Time is taken from the log timestamp, but we use second precision only.
  • To prevent log overwrites, we store the nanosecond part of the log timestamp in the column key.

In this case, we would also benefit from the v3 version where timestamps have nanosecond precision.

Saving and querying logs works like the other storages described here, and the source code should be more or less readable.