SPEKTRA Edge Wide-Column Versions

Understanding the SPEKTRA Edge wide-column versions.

There are currently two supported storage versions: v2 and v3. Version v2 is the older one and is heavily influenced by the Monitoring service, which originally had a widecolumn implementation internally for its private use. The Audit and Logging services were added later, but v2 still specifies some fields that are relevant to Monitoring only.

Version v2 was also written with Bigtable in mind, where you can query not only by row range but also by column values range. This is not efficient with ScyllaDB, and likely not with other CQL-based databases either. It is also not needed: a Monitoring query only needs data from a specific column key, and Audit and Logging always query the whole range. There is no plan to use this feature, so it is better to remove it.

On top of that, sequence numbers in v3 have nanosecond precision, while in v2 they have only second precision. This may allow us to drop timestamps from column values completely, and it invalidates the reason for having many column values in Audit and Logging in the first place.
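To illustrate why nanosecond sequence numbers could make a separate timestamp in the column value redundant, here is a minimal Go sketch. The helper names (seqV2, seqV3, timestampFromSeqV3) and the sample times are assumptions made for this example; they are not part of the widecolumn package.

```go
package main

import (
	"fmt"
	"time"
)

// Two writes 100ns apart within the same second (illustrative values).
var (
	first  = time.Date(2024, 1, 1, 12, 0, 0, 100, time.UTC)
	second = time.Date(2024, 1, 1, 12, 0, 0, 200, time.UTC)
)

// seqV2 is a hypothetical helper: a sequence number with second precision.
func seqV2(t time.Time) int64 { return t.Unix() }

// seqV3 is a hypothetical helper: a sequence number with nanosecond precision.
func seqV3(t time.Time) int64 { return t.UnixNano() }

// timestampFromSeqV3 recovers the full timestamp from a nanosecond sequence,
// which is what would allow dropping the timestamp from the column value.
func timestampFromSeqV3(seq int64) time.Time { return time.Unix(0, seq).UTC() }

func main() {
	fmt.Println(seqV2(first) == seqV2(second))                  // true: both collapse to the same second
	fmt.Println(seqV3(first) == seqV3(second))                  // false: nanoseconds keep them apart
	fmt.Println(timestampFromSeqV3(seqV3(first)).Equal(first)) // true: nothing is lost
}
```

With second precision the two writes share one sequence number, so the exact time has to be stored elsewhere. With nanosecond precision the sequence number already carries it.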

Summary:

v3 has the sequence number in nanosecond precision; v2 has it in seconds.
A row in v2 contains: promoted key, tail key, sequence number, empty aggregation key, empty start timestamp, and alignment period. A row in v3 contains the promoted key, tail key, and sequence number only.
A column value in v2 has extra elements that are removed in v3: alignment period and start time.
A query in v2 can specify a column key range; a query in v3 cannot. Monitoring uses this feature today (the Aligner is used to specify the column, to pick the correct value type). However, it never specifies a multi-value range: the column range start always equals the column range end. Therefore, to keep the same functionality in v3, the Aligner needs to become part of the Key in KeyedValues.
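The differences listed above can be sketched in Go as follows. All type names, field names, field types, the "/" separator, and the "ALIGN_MEAN" value are assumptions chosen for illustration; the real widecolumn types may look different. The sketch only shows the idea of folding a single-value column range into the key.

```go
package main

import (
	"errors"
	"fmt"
)

// RowV2 mirrors the v2 row layout described above (hypothetical types).
type RowV2 struct {
	PromotedKey     string
	TailKey         string
	SequenceSeconds int64
	AggregationKey  string // always empty in v2
	StartTimestamp  int64  // always empty in v2
	AlignmentPeriod int64
}

// RowV3 keeps only the three elements that v3 retains.
type RowV3 struct {
	PromotedKey   string
	TailKey       string
	SequenceNanos int64
}

// QueryV2 can select a column key range; QueryV3 cannot.
type QueryV2 struct {
	Key         string
	ColumnStart string
	ColumnEnd   string
}

type QueryV3 struct {
	Key string
}

// toV3 moves the aligner from the column range into the key. It only works
// because the range start and end are always equal in practice.
func toV3(q QueryV2) (QueryV3, error) {
	if q.ColumnStart != q.ColumnEnd {
		return QueryV3{}, errors.New("multi-value column ranges have no v3 equivalent")
	}
	return QueryV3{Key: q.Key + "/" + q.ColumnStart}, nil
}

func main() {
	q, err := toV3(QueryV2{Key: "cpu", ColumnStart: "ALIGN_MEAN", ColumnEnd: "ALIGN_MEAN"})
	fmt.Println(q.Key, err) // cpu/ALIGN_MEAN <nil>
}
```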

TODO:

Monitoring, Audit, and Logging currently write to and query v2. We need to add v3 implementations for all three.
All three services will need to double-write for some time, to allow a smooth transition from v2 to v3. Writes should be executed on both v2 and v3. Queries should be executed on v3 when possible, with v2 used as a fallback. The config file should indicate when v2 should be written and from which data point v3 is valid. Specific stores (like TimeSeriesStore) should internally hold two references, to StoreV2 and StoreV3 of widecolumn.
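A minimal Go sketch of the double-write and fallback idea follows. The Store interface, the Point type, the in-memory memStore, and the two config fields (WriteV2, V3ValidFrom) are all hypothetical and exist only for this example. The sketch also simplifies by addressing both versions with the same nanosecond time value.

```go
package main

import "fmt"

// Point is a hypothetical data point addressed by time in nanoseconds.
type Point struct {
	TimeNanos int64
	Value     float64
}

// Store is a hypothetical stand-in for StoreV2 and StoreV3 of widecolumn.
type Store interface {
	Write(p Point) error
	Query(from, to int64) ([]Point, error)
}

// memStore is an in-memory Store used only to make the sketch runnable.
type memStore struct{ points []Point }

func (m *memStore) Write(p Point) error {
	m.points = append(m.points, p)
	return nil
}

// Query returns the points with from <= TimeNanos < to.
func (m *memStore) Query(from, to int64) ([]Point, error) {
	var out []Point
	for _, p := range m.points {
		if p.TimeNanos >= from && p.TimeNanos < to {
			out = append(out, p)
		}
	}
	return out, nil
}

// timeSeriesStore holds references to both versions during the transition.
type timeSeriesStore struct {
	v2, v3      Store
	WriteV2     bool  // assumed config flag: keep writing to v2
	V3ValidFrom int64 // assumed config value: first data point for which v3 is valid
}

// Write sends the point to v3 and, while enabled, to v2 as well.
func (s *timeSeriesStore) Write(p Point) error {
	if s.WriteV2 {
		if err := s.v2.Write(p); err != nil {
			return err
		}
	}
	return s.v3.Write(p)
}

// Query uses v3 when the whole range is covered by it, otherwise falls back to v2.
func (s *timeSeriesStore) Query(from, to int64) ([]Point, error) {
	if from >= s.V3ValidFrom {
		return s.v3.Query(from, to)
	}
	return s.v2.Query(from, to)
}

func main() {
	v2 := &memStore{points: []Point{{TimeNanos: 10, Value: 1}}} // data from before v3 existed
	v3 := &memStore{}
	s := &timeSeriesStore{v2: v2, v3: v3, WriteV2: true, V3ValidFrom: 100}
	if err := s.Write(Point{TimeNanos: 150, Value: 2}); err != nil {
		panic(err)
	}
	recent, _ := s.Query(100, 200)
	all, _ := s.Query(0, 200)
	fmt.Println(len(recent), len(all)) // 1 2
}
```

Once v2 no longer needs to be written, setting WriteV2 to false in this sketch leaves v2 as a read-only fallback for ranges that start before V3ValidFrom.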