New Dataset on ZKsync Era Launches for Research
Access a year's worth of ZKsync data for blockchain research.
Maria Inês Silva, Johnnatan Messias, Benjamin Livshits
Table of Contents
- Challenges with Blockchain Data
- Why ZKsync Era Data?
- Availability of ZKsync Data
- Supporting Research and Analysis
- Organizing the Paper
- Dataset Details
- Blockchain Structure
- Example Analyses
- Gas Usage and Transaction Fees
- Events and Contract Deployments
- Swap Events
- User Behavior Analysis
- Future Research Directions
- MEV and Arbitrage
- User Activity Analysis
- Data Science and Analytics
- Conclusion
- Original Source
- Reference Links
Blockchain technology provides a way to record transactions in a secure and transparent manner. While the data on blockchains is generally open for anyone to access, actually using this data can be hard and expensive, especially for researchers. This is particularly true for Layer 2 (L2) systems, like ZKsync.
In an effort to help, we have put together a dataset that contains one year's worth of activity from ZKsync, a specific L2 scaling solution for Ethereum. This dataset is now available for anyone to use. In this article, we explain how we created this dataset, show some examples of what can be done with it, and describe future research possibilities. The code related to this dataset can be found online, making it easier for others to replicate our work.
Challenges with Blockchain Data
One of the main advantages of blockchain technology is that it allows for decentralization, which means no single person or organization has control over the network. However, the data from these blockchains is often hard to access, especially for people who are not technically skilled. This difficulty can slow down the progress of blockchain research and its adoption.
Currently, anyone who wants blockchain data has a few options. They can run an archive node for Ethereum or a full node for Bitcoin, but this demands substantial hardware and technical expertise and can be costly, so it is impractical for many people. They can also query RPC providers, but this still requires technical knowledge and can lead to high costs.
Another option is to rely on external platforms, like Etherscan or Dune. While these platforms can be useful, they can also be expensive and may not meet the needs of everyone, especially researchers who require easy access to data.
We believe that everyone interested in blockchain data should have an easy way to access this information without the need to worry about technology or high costs. Our dataset provides valuable insights for various research purposes, such as understanding transaction patterns, designing airdrops, analyzing market trends, and more.
Why ZKsync Era Data?
ZKsync Era is an L2 scaling solution that was launched in March 2023. It helps the Ethereum blockchain process transactions more efficiently by using zero-knowledge proofs (ZKPs). This technology allows multiple transactions to be processed at once, which helps keep transaction costs low and encourages more users to participate.
As of July 2024, ZKsync Era ranks among the top L2 solutions, with billions of dollars locked within its ecosystem. Rollups like ZKsync have become crucial for scaling Ethereum, as they introduce new capabilities and attract many users. However, some questions still need answers, and research in this area is still limited.
Given the rising importance of L2 solutions, our dataset aims to help researchers study ZKsync more closely. By providing access to this data, we hope to encourage further research on ZKsync and L2 solutions in general.
Availability of ZKsync Data
To support research, we made our ZKsync dataset publicly available in a GitHub repository. This dataset contains one year of information covering blocks, transactions, receipts, and logs. Detailed information about the dataset can be found in the repository.
Supporting Research and Analysis
We also outline potential applications of the dataset for researchers and users alike. For example, it can support studies of transaction fees and gas usage, events triggered by transactions, token swaps, and other relevant activity.
We recognize that gathering data from RPCs and external services can often be slow and expensive. Therefore, we made our ZKsync dataset easy to download and use, providing sample code to assist researchers in processing and analyzing the data. This means reproducibility is possible, allowing others to verify our results.
Organizing the Paper
This article is structured to provide a clear understanding of the dataset and its significance. The sections include:
- Details about the dataset and how it was created.
- Examples of analyses that can be performed.
- Challenges in data gathering, along with our solutions.
- Future research directions that can be explored with this dataset.
Dataset Details
Our ZKsync dataset covers the time from February 14, 2023, to March 24, 2024. It includes over 300 million transactions and about 1.6 million contracts deployed during this period. This dataset offers a comprehensive overview of all the activities that occurred on ZKsync since its launch.
The data was collected from our ZKsync Era archive node. We then pre-processed the raw data and converted it to the Parquet format, which is easy to read with popular Python libraries such as Pandas and Polars. Given the volume of data, we used Polars for its better performance and memory handling.
Blockchain Structure
To understand the ZKsync data, it's essential to grasp how blocks and transactions interact within the blockchain.
Blocks
Blocks are chunks of data in a blockchain, each identified by a unique hash. They contain transaction information and metadata. Each block references the hash of its predecessor, which links the blocks into a chain and makes any change to past data detectable.
Transactions
Transactions represent actions on the blockchain, like transferring assets or executing contracts. Each transaction is initiated by a user and is verified by a network of nodes. Once validated, transactions are bundled into blocks. ZKsync aggregates transactions and processes them off the main blockchain, which helps reduce costs and stress on the network.
Transactions include details like the recipient address, amount, gas price, and gas limit. Gas measures the computational work a transaction requires, and the fee a user pays is the gas used multiplied by the gas price. Sorting transactions by block number recovers the order in which the chain processed them.
Receipts
Transaction receipts provide a summary of each transaction's outcome once processed. They contain details like the transaction hash, block number, gas used, and the cost incurred. This information is vital for users and developers to understand how interactions with smart contracts occur and what fees are involved.
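As a minimal sketch of how a fee can be derived from a receipt, the snippet below multiplies the gas used by the price paid per unit of gas. The field names `gasUsed` and `effectiveGasPrice` follow the standard Ethereum JSON-RPC receipt format and are an assumption here; the column names in the published dataset may differ. The receipt values are made up for illustration.

```python
from decimal import Decimal

WEI_PER_ETH = 10**18


def transaction_fee_wei(receipt: dict) -> int:
    """Fee in wei: gas used multiplied by the price paid per unit of gas."""
    return int(receipt["gasUsed"]) * int(receipt["effectiveGasPrice"])


def wei_to_eth(wei: int) -> Decimal:
    """Convert wei to ETH without floating-point rounding."""
    return Decimal(wei) / Decimal(WEI_PER_ETH)


# Made-up receipt (assumed field names), for illustration only.
example_receipt = {"gasUsed": 200_000, "effectiveGasPrice": 250_000_000}

fee_wei = transaction_fee_wei(example_receipt)
fee_eth = wei_to_eth(fee_wei)
print(fee_wei, fee_eth)
```

Using `Decimal` rather than `float` keeps small fee amounts exact when they are later summed across many transactions.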
Logs
Transaction logs are records of events triggered during transactions, especially involving smart contracts. They help track various activities, such as token transfers and approvals. These logs are essential for aiding blockchain analysis and understanding user behavior on the network.
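To make the structure of a log concrete, here is a hedged sketch that decodes a standard ERC-20 `Transfer` event. It assumes each log is a dictionary with `topics` (a list of hex strings) and `data` (a hex string), as in the Ethereum JSON-RPC format; the dataset's actual column layout may differ. The sample log is fabricated for illustration.

```python
# keccak256("Transfer(address,address,uint256)"), the standard ERC-20 Transfer topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """An indexed address occupies the last 20 bytes (40 hex chars) of a 32-byte topic."""
    return "0x" + topic[-40:]


def decode_transfer(log: dict):
    """Return (sender, recipient, amount) for an ERC-20 Transfer log, else None."""
    topics = log["topics"]
    # ERC-20 Transfer has exactly three topics: signature, sender, recipient.
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    amount = int(log["data"], 16)  # the non-indexed uint256 value
    return topic_to_address(topics[1]), topic_to_address(topics[2]), amount


# Fabricated log (assumed layout), for illustration only.
sample_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(1500, "064x"),
}

decoded = decode_transfer(sample_log)
print(decoded)
```

The three-topic check is a deliberate simplification: it skips events that share the same signature but index a third argument, such as ERC-721 transfers.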
Example Analyses
Using our dataset, there are numerous analyses researchers can perform. Each type of data in the dataset can lead to insights about the activity on ZKsync.
Gas Usage and Transaction Fees
One critical area of analysis involves looking at transactions, gas usage, and fees. By examining the daily transactions executed on ZKsync, we can see trends and spikes in activity. For instance, during the analyzed period, the network averaged around 900,000 transactions daily, with notable surges during specific events.
Gas usage varies, and by studying this, we can understand how efficiently the network operates. Analysis can reveal spikes in gas usage often corresponding with events like airdrops or high trading activity.
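A daily activity series of the kind described above can be sketched as follows. The example assumes each block record carries a Unix `timestamp` and a `tx_count` field; both names are hypothetical stand-ins for whatever the dataset provides, and the block values are invented.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_transaction_counts(blocks):
    """Sum per-block transaction counts by UTC calendar day."""
    counts = Counter()
    for block in blocks:
        day = datetime.fromtimestamp(block["timestamp"], tz=timezone.utc).date()
        counts[day.isoformat()] += block["tx_count"]
    return dict(sorted(counts.items()))


def ts(year, month, day, hour):
    """Helper for the example: Unix timestamp of a UTC date and hour."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


# Invented blocks (hypothetical field names), for illustration only.
example_blocks = [
    {"timestamp": ts(2023, 11, 14, 22), "tx_count": 10},
    {"timestamp": ts(2023, 11, 14, 23), "tx_count": 5},
    {"timestamp": ts(2023, 11, 15, 0), "tx_count": 7},
]

daily_counts = daily_transaction_counts(example_blocks)
print(daily_counts)
```

Fixing the timezone to UTC avoids day boundaries shifting with the machine's local settings. The same grouping applies to daily gas usage by summing a gas field instead of a transaction count.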
Events and Contract Deployments
Events emitted by smart contracts provide additional data about network activity. By analyzing the top event types, we could see that Transfer events make up the majority of emitted events. Understanding these events helps shed light on how often tokens are moved around within the network.
Contract deployments also indicate the network's growth and activity levels. Observing daily contract deployment numbers can reveal trends in developer interest and the overall health of the ZKsync ecosystem.
Swap Events
Swap events are particularly important for decentralized exchanges (DEXs). By looking at the number of swap events over time, one can observe trading activity levels and general market behavior. Analyzing which contracts are involved in swaps can help identify popular trading pairs and market dynamics.
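One way to find the contracts most involved in swaps is to filter logs by the swap event's signature topic and rank the emitting addresses. The sketch below assumes logs with `address` and `topics` fields. `SWAP_TOPIC` is a hypothetical placeholder, not a real event hash; it would need to be replaced with the topic of the swap event used by the exchange under study.

```python
from collections import Counter

# Hypothetical placeholder: substitute the real topic0 hash of the Swap event
# emitted by the DEX being analyzed.
SWAP_TOPIC = "0x" + "ab" * 32


def top_swap_contracts(logs, swap_topic, limit=3):
    """Rank emitting contracts by how many logs match the given swap topic."""
    swaps = Counter(
        log["address"].lower()
        for log in logs
        if log["topics"] and log["topics"][0].lower() == swap_topic.lower()
    )
    return swaps.most_common(limit)


POOL_A = "0x" + "a1" * 20  # made-up pool addresses
POOL_B = "0x" + "b2" * 20
OTHER_TOPIC = "0x" + "cd" * 32

# Fabricated logs, for illustration only.
example_logs = [
    {"address": POOL_A, "topics": [SWAP_TOPIC]},
    {"address": POOL_B, "topics": [SWAP_TOPIC]},
    {"address": POOL_A, "topics": [SWAP_TOPIC]},
    {"address": POOL_A, "topics": [OTHER_TOPIC]},  # not a swap, ignored
]

ranking = top_swap_contracts(example_logs, SWAP_TOPIC)
print(ranking)
```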
User Behavior Analysis
Analyzing user behavior is crucial to understanding the dynamics of the ZKsync ecosystem. By examining the number of transactions per user, researchers can identify usage patterns and spot behavior such as airdrop farming. Insights regarding how social media influences blockchain activity can also be gathered from this data.
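A simple starting point for this kind of analysis is to count transactions per sending address and look at how many addresses were used only once. The sketch assumes each transaction record has a `from` field, as in the Ethereum JSON-RPC format; the addresses are shortened, made-up values.

```python
from collections import Counter


def transactions_per_sender(transactions):
    """Count transactions per sending address (case-insensitive)."""
    return Counter(tx["from"].lower() for tx in transactions)


def share_of_single_use_senders(per_sender):
    """Fraction of addresses that sent exactly one transaction."""
    if not per_sender:
        return 0.0
    single = sum(1 for count in per_sender.values() if count == 1)
    return single / len(per_sender)


# Made-up transactions (assumed field name), for illustration only.
example_txs = [
    {"from": "0xAAA"}, {"from": "0xaaa"}, {"from": "0xaaa"},
    {"from": "0xbbb"},
    {"from": "0xccc"},
    {"from": "0xddd"}, {"from": "0xddd"},
]

per_sender = transactions_per_sender(example_txs)
single_use_share = share_of_single_use_senders(per_sender)
print(per_sender.most_common(2), single_use_share)
```

Counts like these are only a first signal. An address is not the same as a person, so grouping addresses into users requires further heuristics.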
Future Research Directions
Our dataset opens up various paths for future research. Here are some possible areas where researchers could benefit from this information.
MEV and Arbitrage
Maximal Extractable Value (MEV) and arbitrage are vital concepts in blockchain economics. Although both have been studied extensively on Layer 1 chains, research on L2 systems like ZKsync is still relatively new. Our dataset can help researchers look into backrunning strategies and arbitrage opportunities between centralized exchanges and DEX platforms.
User Activity Analysis
Understanding user behavior on ZKsync can lead to valuable insights. Researchers can explore user interactions with smart contracts and detect patterns such as Sybil attacks, where individuals create multiple accounts to benefit from airdrops or other incentives. This analysis can help organizations understand user engagement and develop strategies to combat deceptive practices.
Data Science and Analytics
Our dataset presents an excellent resource for individuals interested in data analysis and science. Data scientists can use this information on public platforms to improve their skills and showcase their abilities. This dataset could serve as a valuable educational tool for those looking to break into the blockchain industry.
Conclusion
In summary, while blockchain technology has benefits regarding transparency and decentralization, accessing and utilizing blockchain data can still pose challenges. Our initiative to release the ZKsync dataset addresses these challenges, making it easier for researchers and enthusiasts to dive into the blockchain space.
By providing a year of ZKsync data, we hope to enrich the research landscape related to L2 systems and spur interest in ZKsync’s potential. Our dataset not only contributes to the existing body of research but also aims to be accessible to a wide range of users, regardless of their technical expertise.
Through this effort, we believe that valuable insights will emerge, enhancing the understanding of blockchain dynamics and fostering further innovations in this exciting field.
Original Source
Title: A Public Dataset For the ZKsync Rollup
Abstract: Despite blockchain data being publicly available, practical challenges and high costs often hinder its effective use by researchers, thus limiting data-driven research and exploration in the blockchain space. This is especially true when it comes to Layer 2 (L2) ecosystems, and ZKsync, in particular. To address these issues, we have curated a dataset from 1 year of activity extracted from a ZKsync Era archive node and made it freely available to external parties. In this paper, we provide details on this dataset and how it was created, showcase a few example analyses that can be performed with it, and discuss some future research directions. We also publish and share the code used in our analysis on GitHub to promote reproducibility and to support further research.
Authors: Maria Inês Silva, Johnnatan Messias, Benjamin Livshits
Last Update: 2024-07-26 00:00:00
Language: English
Source URL: https://arxiv.org/abs/2407.18699
Source PDF: https://arxiv.org/pdf/2407.18699
Licence: https://creativecommons.org/licenses/by/4.0/
Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.
Thank you to arXiv for use of its open access interoperability.
Reference Links
- https://github.com/matter-labs/zksync-data-dump/blob/main/notebooks/01-zksync-data.ipynb
- https://github.com/matter-labs/zksync-data-dump/blob/main/notebooks/02-data-exploration-fees.ipynb
- https://github.com/matter-labs/zksync-data-dump/blob/main/notebooks/03-data-exploration-contracts.ipynb
- https://github.com/matter-labs/zksync-data-dump/blob/main/notebooks/04-data-exploration-swaps.ipynb
- https://github.com/matter-labs/zksync-data-dump
- https://github.com/matter-labs/zksync-data-dump/tree/main/notebooks
- https://era.zksync.network/address/0x0000000000000000000000000000000000008001