Tutorial 1: Build a data app to get data from the Bitcoin chain
This article will guide you step by step through building a Data App that gets data from Bitcoin, covering environment setup, programming, local testing, and submitting the app.
Contents
- Environment Setup
- Data builder Setup
- Build Bitcoin data app
- Create job and add it to pipeline
- Create PR to review your App
Environment Setup
To start building a Data App, you need to install some software on your computer. Since your application will run on the Chainslake platform, we recommend the following software versions to avoid conflicts:
- Install JDK 11 (the version check below shows Temurin 11.0.24), then point JAVA_HOME to it:
# Add in your ~/.bashrc
export JAVA_HOME=<YOUR_PATH_JDK>
export PATH=$PATH:$JAVA_HOME/bin
Check the Java version:
$ java --version
openjdk 11.0.24 2024-07-16
OpenJDK Runtime Environment Temurin-11.0.24+8 (build 11.0.24+8)
OpenJDK 64-Bit Server VM Temurin-11.0.24+8 (build 11.0.24+8, mixed mode)
- Install Coursier, then install Scala and SBT with these commands:
$ cs install scala:2.12.18 && cs install scalac:2.12.18
$ cs install sbt:1.9.0
- Download and install Spark 3.5.1
$ wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
$ tar -xzvf spark-3.5.1-bin-hadoop3.tgz
# Add in your ~/.bashrc
export SPARK_HOME=~/installs/spark-3.5.1-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin
Test Spark:
$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.1
      /_/
Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 11.0.24)
Type in expressions to have them evaluated.
Type :help for more information.
- You can use any IDE you want; however, we recommend using Visual Studio Code together with IntelliJ IDEA.
Data-builder Setup
To start programming, you need to use the Data-builder tool. This repository is managed by Chainslake; you will use it to develop and submit your app to the Chainslake platform. Just fork this repository to your account, then clone the code from your fork to your computer, and you can start developing your app.
$ git clone git@github.com:<YOUR_GITHUB_USERNAME>/data-builder.git
$ cd data-builder
$ code . # Open data-builder in Visual Studio Code
Open the data-builder/spark directory in IntelliJ; IntelliJ will automatically recognize the project configuration and download the necessary libraries.
Now, let's start programming the Data App to get Bitcoin data!
Build Bitcoin data app
Create Bitcoin package
Create the bitcoin package inside the chainslake package, then create two objects, Main and JobFactory, as below:
package chainslake.bitcoin

import org.apache.spark.sql.SparkSession
import java.io.FileInputStream
import java.util.Properties

object Main {
  def main(args: Array[String]): Unit = {
    val properties = new Properties
    val spark = SparkSession.builder
      .enableHiveSupport()
      .getOrCreate()
    // Load the job's configuration file, whose path is passed in via spark conf.
    val configFile = spark.conf.get("spark.app_properties.chainslake_home_dir") + "/jobs/" + spark.conf.get("spark.app_properties.config_file")
    properties.load(new FileInputStream(configFile))
    // Copy every spark.app_properties.* setting into properties,
    // stripping the "spark.app_properties." prefix (21 characters).
    spark.conf.getAll.foreach(pair => {
      if (pair._1.startsWith("spark.app_properties")) {
        properties.setProperty(pair._1.substring(21), pair._2)
      }
    })
    // Look up the job by app_name and run it.
    JobFactory.createJob(properties.getProperty("app_name")).run(spark, properties)
    spark.stop()
    print("Application stopped!!!")
  }
}
package chainslake.bitcoin

import chainslake.bitcoin.origin.TransactionBlocks
import chainslake.job.JobInf

object JobFactory {
  def createJob(name: String): JobInf = {
    name match {
      // Cases mapping each app_name to its job object are added below.
      case _ => throw new IllegalArgumentException(s"Unknown app: $name")
    }
  }
}
Create the origin package inside the bitcoin package, then create the TransactionBlocks object in the origin package:
package chainslake.bitcoin.origin

import chainslake.job.TaskRun // TaskRun base class; assumed to live in chainslake.job alongside JobInf
import org.apache.spark.sql.SparkSession
import java.util.Properties

object TransactionBlocks extends TaskRun {
  // Block-fetching logic will be filled in below.
  protected def onProcess(spark: SparkSession, outputTable: String, fromBlock: Long, toBlock: Long, properties: Properties): Unit = {
  }
}
Add transaction_blocks to JobFactory:
import chainslake.bitcoin.origin.TransactionBlocks
...
    name match {
      case "bitcoin_origin.transaction_blocks" => TransactionBlocks
      case _ => throw new IllegalArgumentException(s"Unknown app: $name")
    }
...
Check data from the Bitcoin RPC
To test the data from the Bitcoin RPC, I will create a test file RPCTest in the test directory:
package chainslake.bitcoin

import org.scalatest.funsuite.AnyFunSuite
import scalaj.http.Http

class RPCTest extends AnyFunSuite {
  // You can use IntelliJ to run these tests.
  val rpcUrl = "https://rpc.ankr.com/btc"

  test("Get block height") {
    val response = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblockcount","params":[],"id":"curltest","jsonrpc":"1.0"}""").asString
    println(response.body)
    // {"id":"curltest","jsonrpc":"2.0","result":876818}
  }

  test("Get block hash") {
    val blockNumber = 876818
    val response = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblockhash","params":[$blockNumber],"id":"curltest","jsonrpc":"1.0"}""").asString
    println(response.body)
    // {"id":"curltest","jsonrpc":"2.0","result":"0000000000000000000114036b4c6aff5d21d097ec3620539e7e40cae729b794"}
  }

  test("Get block by block hash") {
    val blockHash = "0000000000000000000114036b4c6aff5d21d097ec3620539e7e40cae729b794"
    val response = Http(rpcUrl).header("Content-Type", "application/json")
      // Verbosity 2 returns the block with fully decoded transactions.
      .postData(s"""{"method":"getblock","params":["$blockHash", 2],"id":"curltest","jsonrpc":"1.0"}""").asString
    println(response.body)
  }
}
Build data models
package chainslake.bitcoin

import java.sql.{Date, Timestamp}

case class ResponseRawNumber(
  var jsonrpc: String,
  var id: String,
  var result: Long
)

case class ResponseRawString(
  var jsonrpc: String,
  var id: String,
  var result: String
)

case class OriginBlock(
  var block_date: Date,
  var block_number: Long,
  var block_time: Timestamp,
  var block: String
)

case class ResponseRawBlock(
  var jsonrpc: String,
  var id: String,
  var result: RawBlock
)

case class RawBlock(
  var time: Long
)
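To sanity-check that these models match the raw RPC responses, you can add a small test alongside RPCTest. This is an illustrative addition of mine, not part of the original app; it assumes json4s for JSON parsing because json4s ships with Spark:

package chainslake.bitcoin

import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import org.scalatest.funsuite.AnyFunSuite
import scalaj.http.Http

// Illustrative test: verifies that ResponseRawString matches the shape of a
// real getblockhash response. The json4s dependency is an assumption.
class ModelTest extends AnyFunSuite {
  implicit val formats: Formats = DefaultFormats
  val rpcUrl = "https://rpc.ankr.com/btc"

  test("Parse getblockhash response into ResponseRawString") {
    val body = Http(rpcUrl).header("Content-Type", "application/json")
      .postData("""{"method":"getblockhash","params":[876818],"id":"curltest","jsonrpc":"1.0"}""")
      .asString.body
    val parsed = parse(body).extract[ResponseRawString]
    // A Bitcoin block hash is 64 hex characters.
    assert(parsed.result.length == 64)
  }
}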
Code logic for app
Now we will go back to coding our application. First, we need to override the run method to add some important configuration for the table, such as the frequent_type and the name of the output table. You should read the Table design documentation to understand the frequent_types that Chainslake supports.
...
override def run(spark: SparkSession, properties: Properties): Unit = {
  val chainName = properties.getProperty("chain_name")
  properties.setProperty("frequent_type", "block")
  properties.setProperty("list_input_tables", "node")
  val database = chainName + "_origin"
  try {
    spark.sql(s"create database if not exists $database")
  } catch {
    // Ignore failures here, e.g. if the database already exists.
    case e: Exception => e.getMessage
  }
  processTable(spark, chainName + "_origin.transaction_blocks", properties)
}
...
Next, add code to the onProcess method. You can see the full source code here.
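Before looking at the full source, here is a minimal sketch of what onProcess could look like, written under stated assumptions: that rpc_list holds a comma-separated list of RPC endpoints, and that json4s (which ships with Spark) handles JSON parsing. The linked source is the authoritative implementation:

...
protected def onProcess(spark: SparkSession, outputTable: String, fromBlock: Long, toBlock: Long, properties: Properties): Unit = {
  // Assumed extra imports at the top of the file:
  //   import chainslake.bitcoin._
  //   import org.json4s._
  //   import org.json4s.jackson.JsonMethods.parse
  //   import scalaj.http.Http
  //   import java.sql.{Date, Timestamp}
  import spark.implicits._
  implicit val formats: Formats = DefaultFormats
  // Assumption: rpc_list is a comma-separated list of endpoints; take the first.
  val rpcUrl = properties.getProperty("rpc_list").split(",").head
  val blocks = (fromBlock to toBlock).map { number =>
    // getblockhash: block number -> block hash
    val hashBody = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblockhash","params":[$number],"id":"curltest","jsonrpc":"1.0"}""")
      .asString.body
    val blockHash = parse(hashBody).extract[ResponseRawString].result
    // getblock with verbosity 2: the full block, transactions decoded
    val blockBody = Http(rpcUrl).header("Content-Type", "application/json")
      .postData(s"""{"method":"getblock","params":["$blockHash", 2],"id":"curltest","jsonrpc":"1.0"}""")
      .asString.body
    val blockTime = parse(blockBody).extract[ResponseRawBlock].result.time
    // Store the raw block JSON with its date, number, and timestamp.
    OriginBlock(new Date(blockTime * 1000), number, new Timestamp(blockTime * 1000), blockBody)
  }
  // Append the batch of blocks to the Delta output table.
  blocks.toDF().write.format("delta").mode("append").saveAsTable(outputTable)
}
...

This sequential sketch fetches blocks one at a time; a production implementation should spread the block range across Spark partitions (note the number_partitions setting in the test script below).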
Build and test the app
To build and run the app locally, go to the test folder and create a test script as follows:
- run_bitcoin_test.sh
./build.sh
spark-submit --class chainslake.bitcoin.Main \
--deploy-mode client \
--name BitcoinOriginTransactionBlocks \
--master local[4] \
--driver-memory 4g \
--conf "spark.app_properties.app_name=bitcoin_origin.transaction_blocks" \
--conf "spark.app_properties.start_number=876788" \
--conf "spark.app_properties.number_partitions=4" \
--conf "spark.app_properties.end_number=876818" \
--conf "spark.app_properties.rpc_list=https://bitcoin.drpc.org/" \
--conf "spark.app_properties.config_file=bitcoin/application.properties" \
--conf "spark.app_properties.chainslake_home_dir=../../" \
--conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
--conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
--conf spark.databricks.delta.retentionDurationCheck.enabled=false \
--conf spark.scheduler.mode=FAIR \
--jars ../lib/chainslake-job.jar \
--packages com.esaulpaugh:headlong:9.2.0,org.web3j:abi:4.5.10,org.web3j:core:4.5.10,io.delta:delta-spark_2.12:3.2.0,org.scalaj:scalaj-http_2.12:2.4.2,com.github.ajrnz:scemplate_2.12:0.5.1 \
chainslake-app.jar
You also need to create the configuration file for this Job, application.properties, in the data-builder/jobs/bitcoin directory:
chain_name=bitcoin
max_number_partition=4
max_time_run=1
run_mode=backward
start_number=0
end_number=-1
number_block_per_partition=6
max_retry=10
is_alert=true
repair_mode=false
repair_name=none
start_repair_number=0
end_repair_number=-1
number_index_columns=3
is_vacuum=false
origin_table=bitcoin_origin.transaction_blocks
Note: number_block_per_partition=6 because Bitcoin produces one block roughly every 10 minutes, i.e. about 6 blocks per hour.
Try running the test app:
$ ./run_bitcoin_test.sh
If the job runs successfully, you can check the data using the spark-sql tool in the test directory:
$ ./spark-sql.sh
spark-sql> select count(*) from bitcoin_origin.transaction_blocks;
count(1)
31
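The count is 31: one row per block from 876788 to 876818 inclusive. You can also spot-check a few rows; the block_date, block_number, and block_time columns come from the OriginBlock model defined earlier (output will vary with your block range):
spark-sql> select block_date, block_number, block_time from bitcoin_origin.transaction_blocks order by block_number limit 3;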
So the app development is done. Now, for the app to execute, you need to create a job and add it to the Chainslake pipeline.
Create job and add it to pipeline
You need to read the Jobs pipeline documentation to understand the concepts of Jobs and Pipelines. First, create the job script origin/transaction_blocks.sh in the jobs/bitcoin directory; it wraps the application in a chainslake-run.sh invocation:
$CHAINSLAKE_HOME_DIR/spark/script/chainslake-run.sh --class chainslake.bitcoin.Main \
--name BitcoinOriginTransactionBlocks \
--master local[4] \
--driver-memory 4g \
--conf "spark.app_properties.app_name=bitcoin_origin.transaction_blocks" \
--conf "spark.app_properties.rpc_list=$BITCOIN_RPCS" \
--conf "spark.app_properties.config_file=bitcoin/application.properties"
Then register the job as a task in the pipeline's Airflow DAG:
...
RUN_DIR = os.environ.get("CHAINSLAKE_HOME_DIR") + "/jobs/bitcoin"
bitcoin_origin_transaction_blocks = BashOperator(
    task_id="bitcoin_origin.transaction_blocks",
    bash_command=f"cd {RUN_DIR} && ./origin/transaction_blocks.sh "
)
...
Create PR to review your App
After you finish your app, commit the code and push it to your data-builder repository, then create a Pull Request to the main branch of data-builder to trigger the review process for your app.
Good luck!