Tutorial 1: Build a data app to get data from the Bitcoin chain
This article will guide you step by step through building a Data App that gets data from Bitcoin, covering environment setup, programming, local testing, and submitting the app.
Contents
- Environment Setup
- Data builder Setup
- Build Bitcoin data app
- Create job and add it to pipeline
- Create PR to review your App
Environment Setup
To start building a Data App, you need to install some software on your computer. Since your application will run on the Chainslake platform, we recommend the following software versions to avoid conflicts:
- Install JDK 11 (the version check below shows Temurin 11.0.24), then point JAVA_HOME to it:
# Add in your ~/.bashrc
export JAVA_HOME=<YOUR_PATH_JDK>
export PATH=$PATH:$JAVA_HOME/bin
Check the Java version:
$ java --version
openjdk 11.0.24 2024-07-16
OpenJDK Runtime Environment Temurin-11.0.24+8 (build 11.0.24+8)
OpenJDK 64-Bit Server VM Temurin-11.0.24+8 (build 11.0.24+8, mixed mode)
- Install Coursier, then install Scala and SBT with these commands:
$ cs install scala:2.12.18 && cs install scalac:2.12.18
$ cs install sbt:1.9.0
- Download and install Spark 3.5.1
$ wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
$ tar -xzvf spark-3.5.1-bin-hadoop3.tgz
# Add in your ~/.bashrc
export SPARK_HOME=~/installs/spark-3.5.1-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin
Test Spark:
$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/
Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 11.0.24)
Type in expressions to have them evaluated.
Type :help for more information.
- You can use any IDE you want; however, we recommend using Visual Studio Code together with IntelliJ IDEA.
Data-builder Setup
To start programming, you need to use the Data-builder tool. This repository is managed by Chainslake; you will use it to develop and submit your app to the Chainslake platform. Just fork this repository to your account, then clone the code from your fork to your computer, and you can start developing your app.
$ git clone git@github.com:<YOUR_GITHUB_USERNAME>/data-builder.git
$ cd data-builder
$ code . # Open data-builder in Visual Studio Code
Open the data-builder/spark directory in IntelliJ; IntelliJ will automatically recognize the project configuration and download the necessary libraries.
Now, let's start programming the Data App to get Bitcoin data!
Build Bitcoin data app
Create Bitcoin package
Create the bitcoin package inside the chainslake package, then create two objects, Main and JobFactory, as below:
package chainslake.bitcoin

import org.apache.spark.sql.SparkSession
import java.io.FileInputStream
import java.util.Properties

object Main {
  def main(args: Array[String]): Unit = {
    val properties = new Properties
    val spark = SparkSession.builder
      .enableHiveSupport()
      .getOrCreate()
    // Load the job's configuration file, whose path is passed in via spark conf.
    val configFile = spark.conf.get("spark.app_properties.chainslake_home_dir") + "/jobs/" + spark.conf.get("spark.app_properties.config_file")
    properties.load(new FileInputStream(configFile))
    // Copy every spark.app_properties.* setting into properties,
    // stripping the "spark.app_properties." prefix (21 characters).
    spark.conf.getAll.foreach(pair => {
      if (pair._1.startsWith("spark.app_properties")) {
        properties.setProperty(pair._1.substring(21), pair._2)
      }
    })
    // Look up the job by app_name and run it.
    JobFactory.createJob(properties.getProperty("app_name")).run(spark, properties)
    spark.stop()
    print("Application stopped!!!")
  }
}
package chainslake.bitcoin

import chainslake.bitcoin.origin.TransactionBlocks
import chainslake.job.JobInf

object JobFactory {
  def createJob(name: String): JobInf = {
    name match {
      // Cases mapping each app_name to its job object are added below.
      case _ => throw new IllegalArgumentException(s"Unknown app: $name")
    }
  }
}
Create the origin package inside the bitcoin package, then create the TransactionBlocks object in the origin package:
package chainslake.bitcoin.origin

import chainslake.job.TaskRun // TaskRun base class; assumed to live in chainslake.job alongside JobInf
import org.apache.spark.sql.SparkSession
import java.util.Properties

object TransactionBlocks extends TaskRun {
  // Block-fetching logic will be filled in below.
  protected def onProcess(spark: SparkSession, outputTable: String, fromBlock: Long, toBlock: Long, properties: Properties): Unit = {
  }
}
Add transaction_blocks to JobFactory:
import chainslake.bitcoin.origin.TransactionBlocks
...
    name match {
      case "bitcoin_origin.transaction_blocks" => TransactionBlocks
      case _ => throw new IllegalArgumentException(s"Unknown app: $name")
    }
...
Check data from the Bitcoin RPC
To test the data from the Bitcoin RPC, I will create a test file RPCTest in the test directory:
package chainslake.bitcoin

import org.scalatest.funsuite.AnyFunSuite
import scalaj.http.Http

class RPCTest extends AnyFunSuite {
  // You can use IntelliJ to run these tests.
  val rpcUrl = "https://rpc.ankr.com/btc"

  test("Get block height") {
    val response = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblockcount","params":[],"id":"curltest","jsonrpc":"1.0"}""").asString
    println(response.body)
    // {"id":"curltest","jsonrpc":"2.0","result":876818}
  }

  test("Get block hash") {
    val blockNumber = 876818
    val response = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblockhash","params":[$blockNumber],"id":"curltest","jsonrpc":"1.0"}""").asString
    println(response.body)
    // {"id":"curltest","jsonrpc":"2.0","result":"0000000000000000000114036b4c6aff5d21d097ec3620539e7e40cae729b794"}
  }

  test("Get block by block hash") {
    val blockHash = "0000000000000000000114036b4c6aff5d21d097ec3620539e7e40cae729b794"
    val response = Http(rpcUrl).header("Content-Type", "application/json")
      // Verbosity 2 returns the block with fully decoded transactions.
      .postData(s"""{"method":"getblock","params":["$blockHash", 2],"id":"curltest","jsonrpc":"1.0"}""").asString
    println(response.body)
  }
}
Build data models
package chainslake.bitcoin

import java.sql.{Date, Timestamp}

case class ResponseRawNumber(
  var jsonrpc: String,
  var id: String,
  var result: Long
)

case class ResponseRawString(
  var jsonrpc: String,
  var id: String,
  var result: String
)

case class OriginBlock(
  var block_date: Date,
  var block_number: Long,
  var block_time: Timestamp,
  var block: String
)

case class ResponseRawBlock(
  var jsonrpc: String,
  var id: String,
  var result: RawBlock
)

case class RawBlock(
  var time: Long
)
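To sanity-check that these models match the raw RPC responses, you can add a small test alongside RPCTest. This is an illustrative addition of mine, not part of the original app; it assumes json4s for JSON parsing because json4s ships with Spark:

package chainslake.bitcoin

import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import org.scalatest.funsuite.AnyFunSuite
import scalaj.http.Http

// Illustrative test: verifies that ResponseRawString matches the shape of a
// real getblockhash response. The json4s dependency is an assumption.
class ModelTest extends AnyFunSuite {
  implicit val formats: Formats = DefaultFormats
  val rpcUrl = "https://rpc.ankr.com/btc"

  test("Parse getblockhash response into ResponseRawString") {
    val body = Http(rpcUrl).header("Content-Type", "application/json")
      .postData("""{"method":"getblockhash","params":[876818],"id":"curltest","jsonrpc":"1.0"}""")
      .asString.body
    val parsed = parse(body).extract[ResponseRawString]
    // A Bitcoin block hash is 64 hex characters.
    assert(parsed.result.length == 64)
  }
}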
Code logic for app
Now we will go back to coding our application. First, we need to override the run method to add some important configuration for the table, such as the frequent_type and the name of the output table. You should read the Table design documentation to understand the frequent_types that Chainslake supports.
...
override def run(spark: SparkSession, properties: Properties): Unit = {
  val chainName = properties.getProperty("chain_name")
  properties.setProperty("frequent_type", "block")
  properties.setProperty("list_input_tables", "node")
  val database = chainName + "_origin"
  try {
    spark.sql(s"create database if not exists $database")
  } catch {
    // Ignore failures here, e.g. if the database already exists.
    case e: Exception => e.getMessage
  }
  processTable(spark, chainName + "_origin.transaction_blocks", properties)
}
...
Next, add code to the onProcess method. You can see the full source code here.
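Before looking at the full source, here is a minimal sketch of what onProcess could look like, written under stated assumptions: that rpc_list holds a comma-separated list of RPC endpoints, and that json4s (which ships with Spark) handles JSON parsing. The linked source is the authoritative implementation:

...
protected def onProcess(spark: SparkSession, outputTable: String, fromBlock: Long, toBlock: Long, properties: Properties): Unit = {
  // Assumed extra imports at the top of the file:
  //   import chainslake.bitcoin._
  //   import org.json4s._
  //   import org.json4s.jackson.JsonMethods.parse
  //   import scalaj.http.Http
  //   import java.sql.{Date, Timestamp}
  import spark.implicits._
  implicit val formats: Formats = DefaultFormats
  // Assumption: rpc_list is a comma-separated list of endpoints; take the first.
  val rpcUrl = properties.getProperty("rpc_list").split(",").head
  val blocks = (fromBlock to toBlock).map { number =>
    // getblockhash: block number -> block hash
    val hashBody = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblockhash","params":[$number],"id":"curltest","jsonrpc":"1.0"}""")
      .asString.body
    val blockHash = parse(hashBody).extract[ResponseRawString].result
    // getblock with verbosity 2: the full block, transactions decoded
    val blockBody = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblock","params":["$blockHash", 2],"id":"curltest","jsonrpc":"1.0"}""")
      .asString.body
    val blockTime = parse(blockBody).extract[ResponseRawBlock].result.time
    // Store the raw block JSON with its date, number, and timestamp.
    OriginBlock(new Date(blockTime * 1000), number, new Timestamp(blockTime * 1000), blockBody)
  }
  // Append the batch of blocks to the Delta output table.
  blocks.toDF().write.format("delta").mode("append").saveAsTable(outputTable)
}
...

This sequential sketch fetches blocks one at a time; a production implementation should spread the block range across Spark partitions (note the number_partitions setting in the test script below).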
Build and test the app
To build and run the app locally, go to the test folder and create a test script as follows:
- run_bitcoin_test.sh
./build.sh
spark-submit --class chainslake.bitcoin.Main \
--deploy-mode client \
--name BitcoinOriginTransactionBlocks \
--master local[4] \
--driver-memory 4g \
--conf "spark.app_properties.app_name=bitcoin_origin.transaction_blocks" \
--conf "spark.app_properties.start_number=876788" \
--conf "spark.app_properties.number_partitions=4" \
--conf "spark.app_properties.end_number=876818" \
--conf "spark.app_properties.rpc_list=https://bitcoin.drpc.org/" \
--conf "spark.app_properties.config_file=bitcoin/application.properties" \
--conf "spark.app_properties.chainslake_home_dir=../../" \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf spark.databricks.delta.retentionDurationCheck.enabled=false \
--conf spark.scheduler.mode=FAIR \
--jars ../lib/chainslake-job.jar \
--packages com.esaulpaugh:headlong:9.2.0,org.web3j:abi:4.5.10,org.web3j:core:4.5.10,io.delta:delta-spark_2.12:3.2.0,org.scalaj:scalaj-http_2.12:2.4.2,com.github.ajrnz:scemplate_2.12:0.5.1 \
chainslake-app.jar
You also need to create the configuration file for this Job, application.properties, in the data-builder/jobs/bitcoin directory:
chain_name=bitcoin
max_number_partition=4
max_time_run=1
run_mode=backward
start_number=0
end_number=-1
number_block_per_partition=6
max_retry=10
is_alert=true
repair_mode=false
repair_name=none
start_repair_number=0
end_repair_number=-1
number_index_columns=3
is_vacuum=false
origin_table=bitcoin_origin.transaction_blocks
Note: number_block_per_partition=6 because Bitcoin produces one block roughly every 10 minutes, i.e. about 6 blocks per hour.
Try running the test app:
$ ./run_bitcoin_test.sh
If the job runs successfully, you can check the data using the spark-sql tool in the test directory:
$ ./spark-sql.sh
spark-sql> select count(*) from bitcoin_origin.transaction_blocks;
count(1)
31
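The count is 31: one row per block from 876788 to 876818 inclusive. You can also spot-check a few rows; the block_date, block_number, and block_time columns come from the OriginBlock model defined earlier (output will vary with your block range):
spark-sql> select block_date, block_number, block_time from bitcoin_origin.transaction_blocks order by block_number limit 3;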
So the app development is done. Now, for the app to execute, you need to create a job and add it to the Chainslake pipeline.
Create job and add it to pipeline
You need to read the Jobs pipeline documentation to understand the concepts of Jobs and Pipelines. First, create the job script origin/transaction_blocks.sh in the jobs/bitcoin directory; it wraps the application in a chainslake-run.sh invocation:
$CHAINSLAKE_HOME_DIR/spark/script/chainslake-run.sh --class chainslake.bitcoin.Main \
--name BitcoinOriginTransactionBlocks \
--master local[4] \
--driver-memory 4g \
--conf "spark.app_properties.app_name=bitcoin_origin.transaction_blocks" \
--conf "spark.app_properties.rpc_list=$BITCOIN_RPCS" \
--conf "spark.app_properties.config_file=bitcoin/application.properties"
Then register the job as a task in the pipeline's Airflow DAG:
...
RUN_DIR = os.environ.get("CHAINSLAKE_HOME_DIR") + "/jobs/bitcoin"
bitcoin_origin_transaction_blocks = BashOperator(
    task_id="bitcoin_origin.transaction_blocks",
    bash_command=f"cd {RUN_DIR} && ./origin/transaction_blocks.sh "
)
...
Create PR to review your App
After you finish your app, commit the code and push it to your data-builder repository, then create a Pull Request to the main branch of data-builder to trigger the review process for your app.
Good luck!