SPEKTRA Edge Monitoring Pipeline Design

Understanding the SPEKTRA Edge monitoring pipeline design.

With the data layout of the wide-column data store explained in the previous wide-column store page, let’s talk about the monitoring pipeline aspect of the SPEKTRA Edge monitoring system.

Unlike the Logging and Audit services, the Monitoring TimeSerie resource has more distinguishing traits than just its use of the WideColumn store.

When a client submits a TimeSerie object, the point values must match the kind and type declared in the metric descriptor. For example, if we have something like:

- name: ...
  type: ...
  displayName: ...
  metricKind: GAUGE
  valueType: INT64
  # Other fields...

Then, a given TimeSerie will have points from a single writer (let’s assume it sends one point every 30 seconds):

points:
- interval:
    endTime: 12:04:24
  value:
    int64Value:
      123
- interval:
    endTime: 12:04:54
  value:
    int64Value:
      98
- interval:
    endTime: 12:05:24
  value:
    int64Value:
      121
- interval:
    endTime: 12:05:54
  value:
    int64Value:
      103
- interval:
    endTime: 12:06:24
  value:
    int64Value:
      105
- interval:
    endTime: 12:06:54
  value:
    int64Value:
      106

However, unlike logs, querying will not return the same data points; in fact, it is likely not possible at all, unless we enable raw (unaligned) storage. A QueryTimeSeries request typically requires the aggregation field to be provided, with alignmentPeriod ranging from one minute to one day, and perSeriesAligner set to a supported value like ALIGN_SUMMARY, ALIGN_MEAN, etc. For example, if we call the monitoring service with cuttle like:

cuttle monitoring query time-serie \
  --project '...' \
  --filter '...' \
  --interval '...' \
  --aggregation '{"alignmentPeriod":"60s","perSeriesAligner":"ALIGN_SUMMARY"}' \
  -o json | jq .

Then, for these points, we should expect output like:

points:
- interval:
    endTime: 12:05:00
  value:
    distributionValue:
      count: 2
      mean: 110.5
      sumOfSquaredDeviation: 312.5
      range:
        min: 98
        max: 123
      bucketOptions:
        dynamicBuckets:
          compression: 100.0
          means: [98, 123]
      bucketCounts: [1, 1]
- interval:
    endTime: 12:06:00
  value:
    distributionValue:
      count: 2
      mean: 112
      sumOfSquaredDeviation: 162
      range:
        min: 103
        max: 121
      bucketOptions:
        dynamicBuckets:
          compression: 100.0
          means: [103, 121]
      bucketCounts: [1, 1]
- interval:
    endTime: 12:07:00
  value:
    distributionValue:
      count: 2
      mean: 105.5
      sumOfSquaredDeviation: 0.5
      range:
        min: 105
        max: 106
      bucketOptions:
        dynamicBuckets:
          compression: 100.0
          means: [105, 106]
      bucketCounts: [1, 1]

Note that:

  • All points across one-minute intervals are merged into distributions.
  • Point at 12:07:00 contains all data points from 12:06:00.001 till 12:07:00.000.
  • Distribution type is much more descriptive than other types. For example, if we queried with ALIGN_MEAN, then we would get doubleValue instead of distributionValue, with mean values only.
  • We can specify other alignment periods: three minutes, five minutes, fifteen minutes, …, one day. The larger the period, the more data points are merged per interval.
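To make the merge rule concrete, here is a minimal, self-contained Go sketch of how raw points fold into the distribution fields shown above. The point and distribution types, and the align/alignedEnd helpers, are illustrative stand-ins, not the actual Monitoring API types:

```go
package main

import "fmt"

// point is an illustrative raw data point: end time in ms, int64 value.
type point struct {
	endTimeMs int64
	value     int64
}

// distribution mirrors the fields shown in the ALIGN_SUMMARY output above.
type distribution struct {
	count                 int64
	mean                  float64
	sumOfSquaredDeviation float64
	min, max              float64
}

// alignedEnd returns the end of the alignment interval a point falls into:
// a point stamped 12:06:00.001..12:07:00.000 belongs to the interval ending
// at 12:07:00.000, i.e. intervals are (start, end] with end a multiple of ap.
func alignedEnd(tMs, apMs int64) int64 {
	return ((tMs + apMs - 1) / apMs) * apMs
}

// align groups raw points by their aligned interval end and folds each group
// into one distribution, updating mean and the sum of squared deviations
// incrementally (Welford's algorithm).
func align(pts []point, apMs int64) map[int64]*distribution {
	out := map[int64]*distribution{}
	for _, p := range pts {
		end := alignedEnd(p.endTimeMs, apMs)
		v := float64(p.value)
		d := out[end]
		if d == nil {
			d = &distribution{min: v, max: v}
			out[end] = d
		}
		if v < d.min {
			d.min = v
		}
		if v > d.max {
			d.max = v
		}
		d.count++
		delta := v - d.mean
		d.mean += delta / float64(d.count)
		d.sumOfSquaredDeviation += delta * (v - d.mean)
	}
	return out
}

func main() {
	const minute = int64(60_000)
	base := 12 * 60 * minute // 12:00:00 as an offset within the day
	// The six raw points from the example above.
	pts := []point{
		{base + 4*minute + 24_000, 123},
		{base + 4*minute + 54_000, 98},
		{base + 5*minute + 24_000, 121},
		{base + 5*minute + 54_000, 103},
		{base + 6*minute + 24_000, 105},
		{base + 6*minute + 54_000, 106},
	}
	// Interval ending 12:05:00 holds the 12:04:24 and 12:04:54 points.
	d := align(pts, minute)[base+5*minute]
	fmt.Printf("count=%d mean=%.1f ssd=%.1f min=%g max=%g\n",
		d.count, d.mean, d.sumOfSquaredDeviation, d.min, d.max)
}
```

Running this reproduces the first distribution of the query output: count 2, mean 110.5, sumOfSquaredDeviation 312.5, range 98–123.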

If you check file monitoring/ts_store/v4/store_writing.go, you should note that:

  • Each AlignmentPeriod has its own Column Family:

    ap := dp.GetAggregation().GetAlignmentPeriod().AsDuration()
    cf := tss.columnFamiliesByAp[ap]
    
  • We typically don’t store raw data points (AP = 0):

    if ap == 0 && !md.GetStorageConfig().GetStoreRawPoints() {
        continue
    }
    
  • When we query (monitoring/ts_store/v4/store_querying.go), we read from the specific column family:

    colFamily := tss.columnFamiliesByAp[
      q.aggregation.GetAlignmentPeriod().AsDuration(),
    ]
    if colFamily == "" {
        return nil, status.Errorf(
          codes.Unimplemented,
          "unsupported alignment period %s",
          q.aggregation.GetAlignmentPeriod(),
        )
    }
    
  • When we query, the per-series aligner from the query is translated into a different aligner used in storage (see function createWCQuery).
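The exact translation lives in createWCQuery; the following Go sketch only illustrates the idea, and the aligner names and mapping table are assumptions, not the service’s actual code. Since storage keeps pre-aggregated distributions per alignment period, a scalar aligner in the query can be answered by reading the stored summary and deriving the scalar at read time:

```go
package main

import "fmt"

// aligner is an illustrative stand-in for the Monitoring API aligner enum.
type aligner string

const (
	AlignSummary aligner = "ALIGN_SUMMARY"
	AlignMean    aligner = "ALIGN_MEAN"
	AlignMin     aligner = "ALIGN_MIN"
	AlignMax     aligner = "ALIGN_MAX"
	AlignCount   aligner = "ALIGN_COUNT"
)

// distribution holds the stored per-interval summary fields used below.
type distribution struct {
	count    int64
	mean     float64
	min, max float64
}

// storageAligner sketches the translation: whatever scalar aligner the
// query asks for, storage is read with the aligner under which the data
// was written (a summary distribution).
func storageAligner(q aligner) aligner {
	switch q {
	case AlignMean, AlignMin, AlignMax, AlignCount, AlignSummary:
		return AlignSummary
	default:
		return q
	}
}

// reduce derives the queried scalar from a stored distribution.
func reduce(q aligner, d distribution) float64 {
	switch q {
	case AlignMean:
		return d.mean
	case AlignMin:
		return d.min
	case AlignMax:
		return d.max
	case AlignCount:
		return float64(d.count)
	default:
		return d.mean
	}
}

func main() {
	d := distribution{count: 2, mean: 110.5, min: 98, max: 123}
	fmt.Println(storageAligner(AlignMean), reduce(AlignMean, d))
}
```

Under this reading, an ALIGN_MEAN query never scans raw points: it fetches the ALIGN_SUMMARY column family for the requested alignment period and extracts the mean from each distribution.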

To summarize, the data that clients submit and the data we query back are not the same; this document describes what happens in between, inside the Monitoring service.


Monitoring Pipeline Data Transformation

Understanding the SPEKTRA Edge monitoring pipeline data transformation.

Monitoring Pipeline Streaming Jobs

Understanding the SPEKTRA Edge monitoring pipeline streaming jobs.