Quickstart guide

Today you're going to record page clicks for a website. You won't store the page's URL; instead you'll use an integer to represent each page. In TSGrid all identifiers (pages, smart meters, IoT devices, etc.) are referred to as sensors, and each sensor id must be an integer. We will assume you already have a relational database which stores metadata about each page, along with a reference to the sensor id used in TSGrid.

Postman collection

We have prepared a Postman v2 collection which contains the HTTP queries used in this guide. You can download and import it, then follow the steps below:

https://tsgrid-postman.s3.eu-west-2.amazonaws.com/Quickstart.postman_collection.json

Installation

TSGrid is a JVM-based application (written in Scala). For production use we recommend a stable Linux distribution.

System requirements

  • macOS or Linux
  • JDK 8 or 11

Danger

There are known bugs in versions of OpenJDK 11 prior to 11.0.6. If you use OpenJDK, we therefore recommend running the latest 11.0.6+ release.

Download

Download the binary from our GitHub repository. Look for the releases page and download a TSGrid-xxx.dist.tar.gz or TSGrid-xxx.dist.zip archive.

Binary install

  1. Install a suitable JDK and set the JAVA_HOME environment variable
  2. Extract the TSGrid-xxx.dist.tar.gz or TSGrid-xxx.dist.zip archive
  3. Set the TSGRID_ROOT_DB environment variable, e.g. to /var/tsgrid_db
  4. Run the bin/tsgrid-server script

You should see output similar to this in your console:

[info] o.h.b.c.nio1.NIO1SocketServerGroup INFO    - Service bound to address /0:0:0:0:0:0:0:0:8080
[info] o.h.server.blaze.BlazeServerBuilder INFO    -
[info]   _   _   _        _ _
[info]  | |_| |_| |_ _ __| | | ___
[info]  | ' \  _|  _| '_ \_  _(_-<
[info]  |_||_\__|\__| .__/ |_|/__/
[info]              |_|
[info] o.h.server.blaze.BlazeServerBuilder INFO    - http4s v0.21.2 on blaze v0.14.11 started at http://[::]:8080/

Note

There are many other deployment options for TSGrid, including daemon mode, Docker, Kubernetes, etc.

Creating a database

Make a POST request to http://localhost:8080/db passing a JSON document which describes the new database:

Request:

POST /db HTTP/1.1
Host: localhost:8080
Content-Type: application/json

{
  "name": "page-visits",
  "reading_type": "instant",
  "data_type": "int",
  "ttl_timestamp": "ingestion_time",
  "resolutions": {
    "raw": {
      "ttl": 5
    },
    "hourly": {
      "ttl": 90
    },
    "daily": {
      "ttl": -1
    }
  },
  "measurements": {
    "clicks": {
      "aggregation": "sum"
    }
  },
  "rollup_tz_offset": "utc"
}

Response:

Content-Type: application/json
Content-Length: 20

{
  "status": "success"
}

Let's look at what these properties mean:

  1. name - The database name. The name will be used as an identifier in subsequent REST calls, so it should be short, meaningful, and composed only of alphanumeric characters, - and _

  2. reading_type - You are passing a reading type of instant which means each data point has a single timestamp.

  3. data_type - You use an int to store a count of page clicks.

  4. ttl_timestamp - Each reading (including aggregated readings) can have a time to live associated with it. You can use either event_time (the timestamp when the measurement was taken) or ingestion_time (the time it was inserted into TSGrid).

  5. resolutions - The raw data (the data you ingest) will be purged after 5 days. TSGrid will automatically roll-up the raw data into hourly and daily "buckets". The hourly data will be purged after 90 days, and the daily data will never be purged.

  6. aggregation - TSGrid will sum the values during the roll-up process, so the daily aggregation will represent the total clicks during the day.

  7. rollup_tz_offset - You passed a value of utc, so the system will convert all timestamps to UTC before aggregating/rolling up the data. If you query for a "day" of data you will get the total number of page clicks between midnight UTC (inclusive) and the following midnight UTC (exclusive).
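The create-database call above can also be scripted. The following is a minimal sketch using only Python's standard library; it assumes the server from the installation step is listening on localhost:8080, and the `create_database` helper name is ours, not part of TSGrid:

```python
import json
import urllib.request

# Database specification, mirroring the request body shown above.
DB_SPEC = {
    "name": "page-visits",
    "reading_type": "instant",          # each data point has a single timestamp
    "data_type": "int",                 # page-click counts are integers
    "ttl_timestamp": "ingestion_time",  # TTLs are measured from insertion time
    "resolutions": {
        "raw": {"ttl": 5},              # purge raw data after 5 days
        "hourly": {"ttl": 90},          # purge hourly roll-ups after 90 days
        "daily": {"ttl": -1},           # keep daily roll-ups forever
    },
    "measurements": {"clicks": {"aggregation": "sum"}},
    "rollup_tz_offset": "utc",
}

def create_database(base_url="http://localhost:8080"):
    """POST the spec to /db and return the parsed JSON response."""
    req = urllib.request.Request(
        base_url + "/db",
        data=json.dumps(DB_SPEC).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```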

Inserting some data

You will insert two readings for each page. You insert data into TSGrid by making a PUT request to /db/{db_name}/data.

Request:

PUT /db/page-visits/data HTTP/1.1
Host: localhost:8080
Content-Type: application/json

[
  {
    "sensor_id": 1,
    "time": "2020-01-01T00:00:00Z",
    "values": {
      "clicks": 1
    }
  },
  {
    "sensor_id": 1,
    "time": "2020-01-01T01:00:00Z",
    "values": {
      "clicks": 1
    }
  },
  {
    "sensor_id": 2,
    "time": "2020-01-01T00:00:00Z",
    "values": {
      "clicks": 1
    }
  },
  {
    "sensor_id": 2,
    "time": "2020-01-01T01:00:00Z",
    "values": {
      "clicks": 1
    }
  }
]

Response:

Content-Type: application/json
Content-Length: 20

{
  "status": "success",
  "readings_inserted": 4
}
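Building the readings payload by hand gets tedious, so you may prefer to generate it. A small sketch (again stdlib-only, against the same assumed localhost:8080 server; `make_reading` and `insert_readings` are illustrative helper names, not TSGrid API):

```python
import json
import urllib.request

def make_reading(sensor_id, time, clicks):
    """Build one instant reading in the shape TSGrid expects."""
    return {"sensor_id": sensor_id, "time": time, "values": {"clicks": clicks}}

# Two readings per page, matching the request body shown above.
readings = [
    make_reading(sensor_id, time, 1)
    for sensor_id in (1, 2)
    for time in ("2020-01-01T00:00:00Z", "2020-01-01T01:00:00Z")
]

def insert_readings(readings, base_url="http://localhost:8080"):
    """PUT the readings to /db/page-visits/data and return the response."""
    req = urllib.request.Request(
        base_url + "/db/page-visits/data",
        data=json.dumps(readings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```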

Pre-aggregating data

TSGrid automatically rolls up/pre-aggregates data into different resolutions in the background. The schedule is configurable, but by default it is set to aggregate data every hour. You don't want to wait an hour, so you will force the aggregation to happen now by making a POST request to /admin/{db_name}/aggregate?sync=true.

Note

TSGrid can also aggregate on the sensor axis - e.g. taking all the 09:00 readings and rolling them into a single 09:00 reading. This aggregation happens on the fly during the query call.

Request:

POST /admin/page-visits/aggregate?sync=true HTTP/1.1
Host: localhost:8080
Content-Type: application/json

Response:

Content-Type: application/json
Content-Length: 20

{
  "status": "success",
  "readings_aggregated": 4
}

Tip

The sync parameter tells TSGrid to block until the aggregation is complete. You should not do this on a production database, as the request will almost certainly time out!

Querying

You query for data by making a GET request to /db/{db_name}/query, passing a JSON document in the body of the request. TSGrid will return a stream of readings for each sensor, using HTTP chunked encoding:

Raw data

Raw data can be queried by passing a resolution of raw:

Request:

GET /db/page-visits/query HTTP/1.1
Host: localhost:8080
Content-Type: application/json

{
  "sensor_ids": [1,2],
  "from": "2020-01-01T00:00:00Z",
  "until": "2020-01-03T00:00:00Z",
  "resolution": "raw",
  "group_results": false
}

Let's look at the query parameters to understand what they mean:

  1. sensor_ids - Similar to a SQL IN clause. We want to fetch data for both pages
  2. from - Page clicks that happened on or after midnight 2020-01-01
  3. until - Page clicks that happened before midnight 2020-01-03
  4. resolution - We want the raw data we ingested
  5. group_results - TSGrid could aggregate the metrics for both pages together to produce site-wide totals. In this case we want per-page metrics, so we set this property to false.
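The parameter semantics above can be illustrated with a pure-Python filter over the readings we inserted. This is not TSGrid's implementation, just a sketch of the contract: sensor_ids behaves like a SQL IN clause, from is inclusive, and until is exclusive:

```python
def query_raw(readings, sensor_ids, time_from, time_until):
    """Illustrative raw-resolution query: IN-clause on sensor_ids,
    half-open [from, until) time range."""
    return [
        r for r in readings
        if r["sensor_id"] in sensor_ids
        # Same-format ISO-8601 UTC strings compare correctly as strings.
        and time_from <= r["time"] < time_until
    ]

readings = [
    {"sensor_id": 1, "time": "2020-01-01T00:00:00Z", "values": {"clicks": 1}},
    {"sensor_id": 1, "time": "2020-01-01T01:00:00Z", "values": {"clicks": 1}},
    {"sensor_id": 2, "time": "2020-01-01T00:00:00Z", "values": {"clicks": 1}},
    {"sensor_id": 2, "time": "2020-01-01T01:00:00Z", "values": {"clicks": 1}},
]

result = query_raw(readings, sensor_ids=[1, 2],
                   time_from="2020-01-01T00:00:00Z",
                   time_until="2020-01-03T00:00:00Z")
```

Note how a query with until="2020-01-01T01:00:00Z" would exclude the 01:00 readings, because the upper bound is exclusive.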

Response:

Content-Type: application/json
Transfer-Encoding: chunked

[
  {
    "sensor_id": 1,
    "time": "2020-01-01T00:00:00Z",
    "values": {
      "clicks": 1
    }
  },
  {
    "sensor_id": 1,
    "time": "2020-01-01T01:00:00Z",
    "values": {
      "clicks": 1
    }
  },
  {
    "sensor_id": 2,
    "time": "2020-01-01T00:00:00Z",
    "values": {
      "clicks": 1
    }
  },
  {
    "sensor_id": 2,
    "time": "2020-01-01T01:00:00Z",
    "values": {
      "clicks": 1
    }
  }
]

Rolled-up data

Pre-aggregated data can be queried by passing a resolution other than raw. Aggregated readings will always be returned with from and until timestamps. When you initially created the database, you told TSGrid to use raw, hourly and daily resolutions. The raw data is unchanged, but TSGrid will now have rolled up the data into hourly and daily resolutions. You will now ask for the data at daily resolution.

Note

This should not be confused with aggregation on the sensor axis which is specified by the group_results property in the query. In this case you passed a group_results property of false. You are therefore expecting one daily reading for each web page.

Request:

GET /db/page-visits/query HTTP/1.1
Host: localhost:8080
Content-Type: application/json

{
  "sensor_ids": [1,2],
  "from": "2020-01-01T00:00:00Z",
  "until": "2020-01-03T00:00:00Z",
  "resolution": "daily",
  "group_results": false
}

Response:

Content-Type: application/json
Transfer-Encoding: chunked

[
  {
    "sensor_id": 1,
    "from": "2020-01-01T00:00:00Z",
    "until": "2020-01-02T00:00:00Z",
    "values": {
      "clicks": 2
    }
  },
  {
    "sensor_id": 2,
    "from": "2020-01-01T00:00:00Z",
    "until": "2020-01-02T00:00:00Z",
    "values": {
      "clicks": 2
    }
  }
]
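To make the daily numbers concrete, here is a sketch of what a "sum" roll-up does to the raw readings: bucket each reading by sensor and UTC day, sum the clicks in each bucket, and report the bucket's half-open [from, until) window. This is an illustration of the semantics, not TSGrid's actual code:

```python
from collections import defaultdict
from datetime import date, timedelta

def rollup_daily(raw):
    """Roll raw instant readings up into daily buckets, summing clicks."""
    buckets = defaultdict(int)
    for r in raw:
        day = r["time"][:10]  # "YYYY-MM-DD"; timestamps are already UTC
        buckets[(r["sensor_id"], day)] += r["values"]["clicks"]
    return [
        {
            "sensor_id": sensor_id,
            "from": day + "T00:00:00Z",
            # until = the following midnight (exclusive)
            "until": (date.fromisoformat(day) + timedelta(days=1)).isoformat()
                     + "T00:00:00Z",
            "values": {"clicks": clicks},
        }
        for (sensor_id, day), clicks in sorted(buckets.items())
    ]

raw = [
    {"sensor_id": 1, "time": "2020-01-01T00:00:00Z", "values": {"clicks": 1}},
    {"sensor_id": 1, "time": "2020-01-01T01:00:00Z", "values": {"clicks": 1}},
    {"sensor_id": 2, "time": "2020-01-01T00:00:00Z", "values": {"clicks": 1}},
    {"sensor_id": 2, "time": "2020-01-01T01:00:00Z", "values": {"clicks": 1}},
]
daily = rollup_daily(raw)
```

Running this over the four raw readings reproduces the response above: one daily reading per sensor, each with 2 clicks.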

Grouping sensors

The queries you have performed so far have always been per sensor (web page). You know how many people clicked on page 1 and page 2, and you've queried the daily totals for each page. TSGrid can also aggregate on the sensor axis; we call this a grouped result. When you set the group_results flag to true, TSGrid will take the per-sensor results (at whatever resolution you specify) and aggregate them together. You will now query for daily site-wide page clicks between 2020-01-01 and 2020-01-02:

Request:

GET /db/page-visits/query HTTP/1.1
Host: localhost:8080
Content-Type: application/json

{
  "sensor_ids": [1,2],
  "from": "2020-01-01T00:00:00Z",
  "until": "2020-01-03T00:00:00Z",
  "resolution": "daily",
  "group_results": true
}

Response:

Content-Type: application/json
Transfer-Encoding: chunked

[
  {
    "from": "2020-01-01T00:00:00Z",
    "until": "2020-01-02T00:00:00Z",
    "values": {
      "clicks": 4
    }
  }
]

There are a couple of things to note here:

  1. The page/sensor level results have been aggregated together, and you now see 4 total page clicks
  2. There is no sensor_id field in the response as the results represent an aggregation of all pages/sensor_ids passed in the query
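Both points can be seen in a short sketch of the sensor-axis grouping (again an illustration of the semantics, not TSGrid's code): the per-sensor daily results are merged by time window, clicks are summed, and sensor_id drops out of the result:

```python
from collections import defaultdict

def group_results(per_sensor):
    """Merge per-sensor aggregated readings on the sensor axis."""
    grouped = defaultdict(int)
    for r in per_sensor:
        grouped[(r["from"], r["until"])] += r["values"]["clicks"]
    # No sensor_id in the output: each row spans all queried sensors.
    return [
        {"from": f, "until": u, "values": {"clicks": clicks}}
        for (f, u), clicks in sorted(grouped.items())
    ]

# The two daily per-sensor readings from the previous section.
per_sensor = [
    {"sensor_id": 1, "from": "2020-01-01T00:00:00Z",
     "until": "2020-01-02T00:00:00Z", "values": {"clicks": 2}},
    {"sensor_id": 2, "from": "2020-01-01T00:00:00Z",
     "until": "2020-01-02T00:00:00Z", "values": {"clicks": 2}},
]
site_wide = group_results(per_sensor)
```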

Next steps

Check out the full documentation to learn more about reading types, aggregation modes, timezone support, etc.