Table design

Before you start building data tables with Chainslake, we encourage you to read this document. It explains the table design principles that make Chainslake effective at managing and processing data.

Where is the data stored?

The data in Chainslake tables is stored in HDFS (you can read more about it here). HDFS was chosen for its horizontally scalable capacity, fault tolerance, and low cost. In the Cloud version (to be deployed in the future), Chainslake will use AWS S3, which is fully compatible with the current system.

Using Delta table format

Delta is an open source table format originally developed by Databricks (you can read more here). Thanks to the optimization techniques applied to Delta tables, queries on Chainslake tables remain fast even on very large tables.

Track data processing progress on each table

With this feature, Chainslake always knows exactly how far each table's data has been processed, which prevents data from being duplicated or missed. It also lets Chainslake synchronize data between tables before they are joined together, ensuring the accuracy of the results.
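The idea of synchronizing tables before a join can be illustrated with a minimal Python sketch. It is not Chainslake's actual implementation: the `TableProgress` record, the `next_range` function, and the `step` parameter are hypothetical names introduced only for illustration. The sketch assumes each table records the last position it has processed, and that an output table may only advance as far as its slowest input.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TableProgress:
    """Hypothetical progress record for one table (illustration only)."""
    table: str
    processed_to: int  # last processed position, inclusive (block number or timestamp)


def next_range(output: TableProgress,
               inputs: List[TableProgress],
               step: int) -> Optional[Tuple[int, int]]:
    """Return the next inclusive (start, end) range the output table can
    process, or None if the input tables have not advanced far enough."""
    # The output can only go as far as the slowest input table.
    safe_end = min(p.processed_to for p in inputs)
    # Resume right after the last processed position: nothing is
    # processed twice and nothing is skipped.
    start = output.processed_to + 1
    if start > safe_end:
        return None
    # Limit the batch to at most `step` positions.
    end = min(start + step - 1, safe_end)
    return (start, end)
```

For example, if two input tables have been processed up to positions 1500 and 1200 and the output table up to 1000, the output can only be extended to 1200, even when a larger batch is requested.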

Frequent type of table

The frequent type indicates how often the table is updated. The user configures it when building the table, and Chainslake carries it out. There are four frequent types: block, minute, hour, and day. The block type uses the block number to track the table's progress, while minute, hour, and day use a numeric timestamp.
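As a rough illustration of how the four frequent types could map to a progress value, consider the Python sketch below. It rests on assumptions that this document does not confirm: that timestamps are Unix timestamps in seconds, and that a time-based table tracks the start of the minute, hour, or day window. The function name `progress_position` and its parameters are hypothetical.

```python
# Window length in seconds for the time-based frequent types.
# Assumption: timestamps are Unix timestamps expressed in seconds.
SECONDS_PER_UNIT = {"minute": 60, "hour": 3600, "day": 86400}


def progress_position(frequent_type: str,
                      block_number: int,
                      block_timestamp: int) -> int:
    """Return the value used to track progress for one record
    (hypothetical helper, for illustration only)."""
    if frequent_type == "block":
        # Block tables track progress by block number.
        return block_number
    if frequent_type in SECONDS_PER_UNIT:
        # Time-based tables track progress by a numeric timestamp;
        # here it is rounded down to the start of its window.
        unit = SECONDS_PER_UNIT[frequent_type]
        return block_timestamp - block_timestamp % unit
    raise ValueError(f"unknown frequent type: {frequent_type}")
```

Under these assumptions, a record with timestamp 7300 in an hour table falls into the window starting at 7200, while the same record in a block table is tracked by its block number alone.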