Skip to content

TSXor: A Simple Time Series Compression Algorithm

Notifications You must be signed in to change notification settings

andybbruno/TSXor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TSXor: A Simple Time Series Compression Algorithm

Developed @ ISTI CNR - HPC Lab (Pisa)

Full paper here

Code

Introduction

TSXor, a simple yet effective encoder/decoder for time series that achieves high compression ratios and fast decompression speed. TSXor exploits the redundancy/similarity between close-in-time values through a window that acts as a cache.

How it works

Window

TSXor compares each vn value with its preceding W ≤ 127 values, logically corresponding to the values seen in the time range [tn−W , tn−1]. The goal is to compress vn relative to this “window” containing the previous W values. We distinguish between 3 cases, namely Reference, XOR, and Exception.

Reference

If vn is equal to a value in the window, just output its position p in the window. Since the window contains at most 127 values, 1 byte suffices to write the position with the most significant bit always equal to 0.

If the window does not contain vn, then we search for the value u in the window such that x = vnu has the largest number of leading (LZ) and trailing (TZ) zeros bytes. Let p be the position of u in the window. We first write p + 128 using 1 byte. In this case the most significant bit will always be 1 because of sum, which allows us to distinguish this case from the Reference case.

XOR

If LZ +TZ ≥ 2, we output a byte where 4 bits are dedicated to TZ and the other 4 bits to the length (in bytes) of the segment of x between the leading and trailing zero bytes. We then write such middle bytes.

Exception

We output an exception code, i.e., the value 255 using 1 byte, followed by the plain double-precision representation of vn using 8 bytes.

Builiding the code

The code has been tested both on Linux and MacOS.

No dependencies are needed.

Just clone this repo and execute:

make all

Input data format

The algorithm can process any .csv file containing numbers only. You need first to convert the .csv into a .bin file using the csv_to_bin utility as follows:

cd util

./csv_to_bin.o path/to/MY_DATASET.csv

Please note: the first column will be interpreted as the timestamp, the rest will be interpreted as values.

Run

Compression

To run a compression test of a .bin file, execute the following commands:

cd test

./compression.o path/to/MY_DATASET.bin

This will produce a file called compressed_data.tsx

Decompression

To decompress the file compressed_data.tsx, run the command:

./decompression.o

Benchmarks

The following tables show the comparison between TSXor with respect to Gorilla by Facebook and FPC by Burtscher and Ratanaworabhan. The experiments were run on an Ubuntu 18.04 machine with Intel i7-7700 CPU @ 3.60GHz.

Compression Speeds (MB/s)

FPC Gorilla TSXor
AMPds2 339,28 703,72 66,59
Bar Crawl 423,71 466,49 28,74
Max-Planck 313,40 870,58 51,74
Kinect 166,28 696,10 17,14
Oxford-Man 170,27 630,33 15,43
PAMAP 181,59 521,41 45,05
UCI Gas Sensor Array 286,94 654,32 21,93

Decompression Speeds (MB/s)

FPC Gorilla TSXor
AMPds2 411,29 666,52 1173,65
Bar Crawl 436,12 447,42 709,68
Max-Planck 355,30 858,68 1057,00
Kinect 287,18 635,74 665,47
Oxford-Man 221,80 573,67 604,54
PAMAP 223,86 487,41 949,28
UCI Gas Sensor Array 454,91 578,41 642,40

Compression Ratios

FPC Gorilla TSXor
AMPds2 1,10x 2,03x 6,39x
Bar Crawl 1,20x 1,44x 2,36x
Max-Planck 1,06x 2,97x 4,84x
Kinect 1,09x 1,41x 1,37x
Oxford-Man 1,06x 1,28x 1,30x
PAMAP 1,01x 1,38x 4,85x
UCI Gas Sensor Array 1,19x 1,23x 3,50x

Authors

Notes

This is a beta version. Use it at your own risk.