Time series databases (TSDB) are optimized for storing and processing large amounts of telemetry data along time and measurement dimensions.
The weekly mood
Even if I am not yet much involved, and my team is not always able to contribute, I know there is definitely a lot happening in our overall architecture.
While I sometimes have the impression that our management "only" reacts to market and customer demand, individual teams are definitely striving for innovation under the hood.
Design Thinking
I heard that one team adopted Design Thinking via the Double Diamond design process model from the British Design Council. In this divergence-convergence model, the first Diamond is usually triggered by a problem and sets the focus on ideas via a Discovery and a Definition phase, whereas the second Diamond sets the focus on solutions via a Development and a Delivery phase. The crossing point of the two Diamonds is called the Design Brief, basically a collection of documents that is typically used for sharing a problem statement with other teams, getting approval for further investment from management, or handing over implementation to another team.
The Design Brief is not just a decision, it is a point of no return. I also figured out that this is precisely the latest point in time where we, as an architecture team, are actually involved. At this point you not only have to be smart enough to understand the topic and what has actually been going on for weeks or months, but also assertive enough to ask the right questions or raise the right concerns. It is the duty of an architect to always be prepared for this, to look beyond the edge of the plate, and eventually to be seen as a living policy.
Telemetry storage
As part of our Observability track, our engineering team has been looking for a storage system allowing us to store and analyse a large volume of historical telemetry data collected externally. The solution should preferably be available as a managed service.
As we know, different kinds of storage systems exist, and for good reasons:
- Relational Database (RD)
  - organizes physical data in a self-balancing B-tree
  - represents the state of objects and relations in a physical table
  - offers the user a logical modelling and query layer plus use-case-specific performance optimizations (OLTP vs OLAP)
  - ex. PostgreSQL, Oracle DB, MS SQL Server, Snowflake, ClickHouse
- Distributed File System (DFS)
  - organizes schemaless data in physical blocks or logical objects
  - Schema-on-Read approach based on columnar file formats (ex. Parquet)
  - needs an additional access / query layer (ex. Hive, Druid, Delta, Athena)
  - ex. HDFS, AWS S3, Azure Storage, GCP Storage
- Non-relational Database (NoSQL)
  - distributed systems, each with its own domain-specific (logs, documents, key-value, graphs) query language
  - organize physical data in a Log-Structured Merge-tree (LSM)
  - good at partitioning but lacking maturity for indexing and aggregating data
  - ex. Elasticsearch, BigQuery, MongoDB, Cassandra, RavenDB
- Time Series Database (TSDB)
  - uses time-sorted records to represent how a behaviour changes over time
  - optimized for storing and reading timestamped machine data (ex. metrics) at minimal cost
  - ex. Influx, Timescale, Warp10, Graphite, Amazon Timestream (Preview)
We intentionally leave (expensive and complex) in-memory databases out of the discussion, as well as other technologies that are already part of our architecture but designed to solve different problems, e.g.:
- Prometheus used in conjunction with Thanos for scraping internal metrics
- Kafka used for proxying and streaming external metrics
In terms of managed services, note that the following providers offer fully managed, high-end observability platforms: Splunk, Datadog, New Relic, Azure Time Series Insights.
I also really like the minimalistic approach of Grafana Cloud, but it currently lacks a reliable Kafka subscriber as a relevant data source for our particular use case.
As a result of the Discovery phase, a shortlist of options has been created and assessed. The team is now starting a Proof of Concept (PoC) with the best candidate: Timescale.
Timescale is the company behind TimescaleDB, an open-source TSDB created in 2017, written in C and based on PostgreSQL (i.e. TimescaleDB is actually a PostgreSQL extension). In addition to the community software version, Timescale Cloud is a managed database service offering hosted in the cloud infrastructure of your choice among different AWS, Azure and GCP regions.
TimescaleDB optimizes throughput on both the primary (ex. timestamp) and secondary (ex. identifier) index keys via vertical scalability (scale-up) and swap prevention. It natively supports compression and applies a physical chunk storage technique called "time/space chunking" based on adaptive intervals. The user always interacts with the logical model via what they call hypertables.
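For illustration, here is a minimal sketch of the hypertable workflow with a regular PostgreSQL client (the conditions table, its columns and the chunk interval are made up for this example):
-- made-up example table for time-series measurements
CREATE TABLE conditions (
  time        TIMESTAMPTZ      NOT NULL,
  device_id   TEXT             NOT NULL,
  temperature DOUBLE PRECISION
);
-- turn the table into a hypertable, chunked by day on the time dimension
SELECT create_hypertable('conditions', 'time', chunk_time_interval => INTERVAL '1 day');
-- enable native compression, segmented by the secondary key
ALTER TABLE conditions SET (timescaledb.compress, timescaledb.compress_segmentby = 'device_id');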
Scale-up
The current version (1.x) officially supports single-node deployments only. It offers passable (but not high) ingestion performance: according to Timescale's own benchmark against PostgreSQL, a constant rate of 100 thousand inserts per second over datasets of more than 1 billion rows.
TimescaleDB can be set up in Kubernetes in order to achieve High Availability (HA). Each instance of the TimescaleDB HA image holds a database replica and runs a Patroni agent used for master election. An external Load Balancer (LB) is required to route all incoming traffic to the current master, as well as an external file storage for backups.
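For reference, a minimal sketch of such an HA deployment using the timescaledb-single chart from the same Helm repository (the release name and the replicaCount parameter are assumptions based on the chart documentation, not something validated here):
$ helm upgrade --install my-ha-release \
    timescaledb-kubernetes/charts/timescaledb-single \
    --set replicaCount=3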
Still, it feels a bit expensive to have so much redundancy through both replicas and backups on the storage layer, but no fault tolerance or resiliency on the computation layer.
Scale-out
The development version supports multi-node deployments and horizontal scalability. Let us test the experimental setup of the TimescaleDB v2 beta in distributed mode, which will be part of the managed offering coming in the next couple of months, and which was recently announced to remain open source.
First create a dedicated namespace.
$ kubectl create namespace timescaledb
namespace/timescaledb created
$ kubens timescaledb
Active namespace is "timescaledb".
Clone the project source and install the chart release.
$ git clone git@github.com:timescale/timescaledb-kubernetes.git
Cloning into 'timescaledb-kubernetes'...
$ helm upgrade --install my-release \
    timescaledb-kubernetes/charts/timescaledb-multinode
There are a number of Helm parameters documented in the administration guide. The default release creates 3 data nodes and 1 access node as part of Kubernetes StatefulSets (volumes are claimed again at restart).
$ kubectl get pods,svc,statefulsets -l release=my-release
NAME                                  READY   STATUS    RESTARTS   AGE
pod/my-release-timescaledb-access-0   1/1     Running   0          18m
pod/my-release-timescaledb-data-0     1/1     Running   0          18m
pod/my-release-timescaledb-data-1     1/1     Running   0          17m
pod/my-release-timescaledb-data-2     1/1     Running   0          17m

NAME                                  TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/my-release-timescaledb        LoadBalancer   10.152.183.149   <pending>     5432:30600/TCP   18m
service/my-release-timescaledb-data   ClusterIP      None             <none>        5432/TCP         18m

NAME                                             READY   AGE
statefulset.apps/my-release-timescaledb-access   1/1     18m
statefulset.apps/my-release-timescaledb-data     3/3     18m
Nice looking log ;-)
$ kubectl logs my-release-timescaledb-data-0
WELCOME TO TimescaleDB (ASCII-art banner)
Running version 2.0.0-beta5
We can now connect to TimescaleDB using a regular PostgreSQL client.
$ psql -h 10.152.183.149 -U postgres
Password for user postgres: tea
psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1), server 12.3 (Debian 12.3-1.pgdg100+1))
Type "help" for help.

postgres=# \dx
                              List of installed extensions
    Name     |   Version   |   Schema   |             Description
-------------+-------------+------------+--------------------------------------
 plpgsql     | 1.0         | pg_catalog | PL/pgSQL procedural language
 timescaledb | 2.0.0-beta5 | public     | Scalable inserts and complex queries
(2 rows)

postgres=# CREATE DATABASE timescale;
CREATE DATABASE
postgres=# \l
                List of databases
   Name    |  Owner   | Encoding | Collate |  Ctype
-----------+----------+----------+---------+---------
 example   | postgres | UTF8     | C.UTF-8 | C.UTF-8
 postgres  | postgres | UTF8     | C.UTF-8 | C.UTF-8
 timescale | postgres | UTF8     | C.UTF-8 | C.UTF-8
(3 rows)

postgres=# \q
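To sanity-check the distributed setup, here is a minimal sketch of a distributed hypertable (the metrics table and values are made up; it assumes the data nodes have already been attached to the access node by the chart's bootstrap scripts):
postgres=# \c timescale
timescale=# CREATE TABLE metrics (
              time   TIMESTAMPTZ      NOT NULL,
              series TEXT             NOT NULL,
              value  DOUBLE PRECISION);
timescale=# SELECT create_distributed_hypertable('metrics', 'time', 'series');
timescale=# INSERT INTO metrics VALUES (now(), 'cpu_usage', 0.42);
timescale=# SELECT * FROM metrics ORDER BY time DESC LIMIT 1;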
Prometheus connector
In a previous post about Kubernetes monitoring with Prometheus, we looked at what it takes to collect, aggregate and visualize metrics stored by default on-disk. Let us now have a look at the Prometheus integration for Timescale, in order to persist and analyse long-term internal metrics.
We'll start from our last achievement, with Grafana already connected to Prometheus.
$ microk8s.enable prometheus
Clone the project source and install the chart release. The connector expects the database hostname and user credentials.
$ git clone git@github.com:timescale/timescale-prometheus.git
# small change to fit my current k8s version
$ sed -i 's/apiVersion: v2/apiVersion: v1/' \
    timescale-prometheus/helm-chart/Chart.yaml
# disable ssl mode between connector and timescaledb
$ sed -i 's/value: require/value: disable/' \
    timescale-prometheus/helm-chart/templates/deployment-connector.yaml
# a secret is mandatory for the connector to connect to timescaledb
$ kubectl create secret generic my-release-secret --from-literal=postgres=tea
$ helm upgrade --install my-release-obs timescale-prometheus/helm-chart \
    --set connection.password.secretTemplate="my-release-secret" \
    --set connection.host.nameTemplate="my-release-timescaledb.timescaledb.svc.cluster.local"
The connector now integrates Timescale and (re-)exposes the Prometheus HTTP API via the URL http://my-release-obs-timescale-prometheus-connector.timescaledb.svc.cluster.local:9201/api/v1. The endpoint can be consumed as a Prometheus data source by our Grafana dashboards (without /api/v1). Alternatively, you may prefer to connect directly to the PostgreSQL data source. However, the first option lets you take advantage of the simplicity of the Prometheus model while automatically converting PromQL queries to SQL queries that run on a distributed execution engine in the background.
The last step is to instruct Prometheus to use our connector. The prometheus.yaml configuration is contained in the Kubernetes secret prometheus-k8s in gzip-compressed form.
$ kubectl -n monitoring get secret prometheus-k8s \
-o 'go-template={{index .data "prometheus.yaml.gz"}}' \
| base64 -d | gzip -dk > prometheus.yaml
$ head -10 prometheus.yaml
global:
  evaluation_interval: 30s
  scrape_interval: 30s
  external_labels:
    prometheus: monitoring/k8s
    prometheus_replica: $(POD_NAME)
rule_files:
- /etc/prometheus/rules/prometheus-k8s-rulefiles-0/*.yaml
scrape_configs:
- job_name: monitoring/alertmanager/0
We just need to extend the configuration file with a remote_write section to store metrics and a remote_read section to query them, and then patch the secret with the new configuration.
$ cat << EOF >> prometheus.yaml
remote_write:
- url: "http://my-release-obs-timescale-prometheus-connector.timescaledb.svc.cluster.local:9201/write"
remote_read:
- url: "http://my-release-obs-timescale-prometheus-connector.timescaledb.svc.cluster.local:9201/read"
EOF
# prevent the prometheus operator from overwriting the custom configuration
$ kubectl -n monitoring patch prometheus k8s --type="json" \
-p '[{"op": "remove", "path": "/spec/serviceMonitorSelector"}]'
$ kubectl -n monitoring patch secret prometheus-k8s \
-p "{\"data\":{\"prometheus.yaml.gz\":\"$(gzip -c prometheus.yaml | base64 -w 0)\"}}"
If we switch any USE dashboard from the prometheus to the prometheus-timescale data source, we can see metrics starting to flow in. If we have a look at the database, we can see that a couple of tables (and a lot of views) were created.
$ psql -h 10.152.183.176 -U postgres -d timescale
Password for user postgres: tea
psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1), server 12.3 (Debian 12.3-1.pgdg100+1))

timescale=# \dt
                     List of relations
    Schema     |          Name          | Type  |  Owner
---------------+------------------------+-------+----------
 _prom_catalog | default                | table | postgres
 _prom_catalog | label                  | table | postgres
 _prom_catalog | label_key              | table | postgres
 _prom_catalog | label_key_position     | table | postgres
 _prom_catalog | metric                 | table | postgres
 _prom_catalog | series                 | table | postgres
 public        | prom_installation_info | table | postgres
 public        | prom_schema_migrations | table | postgres
(8 rows)
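As a cross-check on the PromQL side, we can also hit the connector's HTTP API directly (a quick sketch, assuming the connector implements the standard /api/v1/query endpoint as described above; the query runs from a throwaway pod so that the cluster-internal service name resolves):
$ kubectl run curl --rm -it --restart=Never --image=curlimages/curl -- \
    curl -s 'http://my-release-obs-timescale-prometheus-connector.timescaledb.svc.cluster.local:9201/api/v1/query?query=up'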
Although the connector works fine as is, for better performance Timescale recommends to additionally set up the Timescale-Prometheus Extra Extension...
$ helm upgrade --install my-release \
timescaledb-kubernetes/charts/timescaledb-multinode/ \
--set image.repository="timescaledev/timescale_prometheus_extra" \
--set image.tag="latest-pg12"
...which is unfortunately not supported on v2beta so far:
$ head -2 timescale-prometheus/extension/Dockerfile
ARG PG_VERSION_TAG=pg12
ARG TIMESCALEDB_VERSION=1.7.1
Kafka connector
Our external metrics do not come from Prometheus but actually from a Kafka topic.
Confluent maintains a JDBC sink connector for Kafka Connect (basically a connector runtime service), which has already been validated by Timescale here.
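As a rough sketch of what such a sink could look like (the connector name, topic and credentials are illustrative, and a Kafka Connect worker is assumed to be reachable on localhost:8083):
$ curl -s -X POST http://localhost:8083/connectors \
    -H 'Content-Type: application/json' \
    -d '{
      "name": "timescale-sink",
      "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "external-metrics",
        "connection.url": "jdbc:postgresql://my-release-timescaledb.timescaledb.svc.cluster.local:5432/timescale",
        "connection.user": "postgres",
        "connection.password": "tea",
        "auto.create": "true",
        "insert.mode": "insert"
      }
    }'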
There is also a community adapter that allows ingesting metrics directly from Kafka into TimescaleDB, but the project does not seem to be very active.
Take-away
Finally, we can clean up the environment.
$ helm del --purge my-release-obs
$ helm del --purge my-release
$ microk8s.disable prometheus
To conclude, Timescale might be a good choice for storing and analysing a large amount of metrics, provided that the managed service is able to scale out and integrate into our infrastructure at a reasonable cost. However, I noticed that it lacks support for real-time streaming, as well as Azure/EMEA as one potential strategic combination of IaaS/Region. In the end this might be OK for us, but we definitely need to be aware of it.