This is a slightly modified version of the blog post that our intern, Grace, wrote recently. The original blog post can be found here.
For this blog post, we benchmark dsync, an open-source data migration tool designed for NoSQL databases, against popular, industry-standard tools, to see how it stacks up in terms of speed, reliability, and ease of use.
We’ll walk through the methodology, our experiences with each tool, and the lessons we learned about their strengths, weaknesses, and nuances in data migration.
Data Migration Tools
We evaluated three frequently used data migration tools alongside dsync. Here’s an overview of each tool:
Airbyte
Airbyte is a data integration platform that boasts a vast library of over 300 connectors, supporting a wide range of sources and destinations. Its open-source architecture makes it highly customizable, while the Airbyte Cloud offering provides a managed solution for users. Designed to address common data synchronization challenges, Airbyte emphasizes flexibility and extensibility, making it suitable for complex use cases.
Fivetran
Fivetran is a fully managed data integration service known for its “set-it-and-forget-it” philosophy. It automates data pipeline maintenance and schema updates, providing a highly reliable and low-maintenance solution. Fivetran excels in syncing data from traditional databases, SaaS applications, and event streams into centralized data warehouses. With its user-friendly interface and robust error handling, it is often favored by organizations prioritizing stability and minimal manual intervention.
Estuary.dev
Estuary.dev focuses on real-time data movement and transformation, providing a lightweight solution for integrating sources and destinations. The platform is particularly suited for scenarios requiring low-latency data flows. Its minimalistic design makes setup simple, but the trade-off is reduced visibility and control during migration tasks. Estuary.dev is best for small to medium workloads that don’t require extensive monitoring or fine-grained tuning.
Executive Summary
In our tests, dsync completed the initial sync about 5.4x faster than Estuary.dev (12 minutes 40 seconds versus 1 hour 9 minutes), while we could not get Airbyte or Fivetran to complete the migration at all.
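Here is a table with the final results:
Tool | Initial sync | Catch-up
Airbyte | Did not complete (destination configuration errors) | Not measured
Fivetran | Not possible (no MongoDB destination) | Not measured
Estuary.dev | 1 hour 9 minutes | Not measured (no pause/resume)
dsync | 12 min 40 s | 7 min 54 s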
Benchmarking Task
The benchmarking task consisted of comparing the performance of data migration tools by using a Cosmos DB instance as the source and a MongoDB instance as the destination. We used YCSB (Yahoo! Cloud Serving Benchmark) to generate realistic data loads for testing.
The specific benchmarking steps were as follows:
Download YCSB
Generate data using YCSB: We created a dataset of 10 million documents in a single collection using Workload A. This took a couple of hours to complete.
Initial sync test: We performed the initial data sync with the migration tools.
Generate additional data using YCSB: We generated an additional 1 million documents to simulate a real-world scenario where new data arrives during an ongoing migration.
Catch-up test: We measured how well each tool handled the “catch-up” process, which is effectively the maximum rate of CDC (change data capture) or incremental replication.
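One way to time the initial sync and catch-up steps above is to poll the destination's document count until it reaches the expected total. The following is a minimal sketch of that idea, not the harness used for this benchmark: `wait_for_count` and its parameters are hypothetical names, and `count_fn` stands in for whatever returns the destination count (for example, a wrapper around pymongo's `estimated_document_count()`).

```python
import time
from typing import Callable, Optional


def wait_for_count(
    count_fn: Callable[[], int],
    expected: int,
    poll_interval: float = 5.0,
    timeout: float = 6 * 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll count_fn until it reports at least `expected` documents.

    Returns the elapsed time in seconds, or None if `timeout` is reached first.
    """
    start = clock()
    while True:
        if count_fn() >= expected:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in for a real destination: every poll "sees" 400,000 more documents.
    seen = {"docs": 0}

    def fake_count() -> int:
        seen["docs"] += 400_000
        return seen["docs"]

    elapsed = wait_for_count(fake_count, expected=1_000_000, poll_interval=0.01)
    print(f"caught up after {elapsed:.2f}s")
```

Passing the counter in as a callable keeps the timing logic independent of any particular database driver, so the same loop could be pointed at either the 10 million document initial sync or the 1 million document catch-up.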
This is the environment that we used for the tests:
Source Database: Azure Cosmos DB with MongoDB API
- Provisioning: Configured with 4,000 RU/s and autoscale
- Indexing: Default indexing settings for MongoDB API
Destination Database: Self-managed MongoDB instance on GCP
- CPU: 4 vCPUs
- RAM: 16 GB
- Single-node replica set configuration
Virtual Machine: GCP
- CPU: 4 vCPUs
- RAM: 16 GB
Tool 1: Airbyte
Setup:
We started with the Airbyte Cloud version, which we were able to access through their website by just creating an account. Adding Cosmos DB as the source via the MongoDB connector worked after some trial and error, but we faced a lot of issues when configuring the destination.
Challenges:
We encountered a format specifier error with the destination configuration. When we traced it back to the source code in the GitHub repository, it appeared to be caused by a missing argument.
This seemed to be a known bug in the Airbyte GitHub repository, and some people had worked around it by changing the destination connector version to 0.1.9 in the settings.
Unfortunately, the Airbyte Cloud version does not allow direct changes to connector versions. To apply the fix, we had to deploy Airbyte locally, a process that took approximately 45 minutes. Once deployed, we adjusted the settings, toggled TLS encryption, and changed the connector version. Despite these efforts, the format specifier error persisted.
Further complicating matters, we encountered repeated network HTTP errors where Airbyte disconnected with a “temporarily down” message. These errors were frustratingly difficult to debug due to inadequate logging and unclear error messages.
Verdict:
Airbyte is often praised for its wide range of connectors, and for good reason: it supports many data sources and destinations, making it a go-to tool for integration tasks. However, the tool’s setup complexity, poor error logging, and instability make it a challenging choice for data migration.
Tool 2: Fivetran
Setup:
Fivetran does not have an open-source version, so we got started simply by creating an account. We were able to successfully configure the source using the “Azure Cosmos DB for MongoDB” connector.
Challenges:
Despite supporting MongoDB as a source, we were disappointed to find that Fivetran didn’t offer MongoDB as a destination. This made it unsuitable for our specific test setup, which required a MongoDB destination.
Verdict:
While Fivetran excels at handling integrations with many sources and destinations, its lack of a MongoDB destination left it unfit for our specific use case.
Tool 3: Estuary.dev
Setup:
Estuary was undoubtedly the easiest to set up among all the SaaS tools that we tested. The source and destination setup worked on the first try without any significant issues.
Results:
Initial Sync: 1 hour 9 minutes
Challenges:
Efficiency: While Estuary.dev successfully completed the migration, its initial sync was noticeably slower than dsync’s (1 hour 9 minutes versus 12 minutes 40 seconds).
Lack of Visibility: One of the major pain points with Estuary was the lack of feedback during the migration. There were no throughput metrics, error logs, or progress indicators. This made it difficult to know if the migration had started or if it had completed successfully.
No Pause/Resume: There was no option to pause the migration, so we couldn’t measure Estuary’s catch-up time after generating 1 million more documents.
Verdict:
Estuary was the most straightforward tool in terms of setup, but its lack of visibility and control over the migration process made it difficult to evaluate performance in detail. It’s an option for smaller, simpler migrations but lacks the sophistication needed for complex or real-time workloads.
Tool 4: Dsync
Setup:
Finally, we tested dsync, the open-source migration tool that we develop at Adiom. Because it’s distributed as a simple standalone binary that can be downloaded from the GitHub repo, there were no setup difficulties. We ran the tests on a virtual machine and tried the default load level as well as the "Beast" level.
./dsync -s SOURCE -d DEST -load-level Beast
Results:
Initial Sync Time: 12 min 40 s
Catch-up Time: 7 min 54 s
Challenges:
Catch-Up Efficiency: We noticed that the catch-up time for dsync did not decrease even when higher load settings were applied. This is something that the team is investigating and it will be addressed in a future release.
Verdict:
Dsync showed promising performance: its initial sync was more than 5x faster than Estuary’s. Dsync’s throughput and progress metrics were also readily observable, which made it easy to track the migration and estimate the time to completion.
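The speedup and effective throughput can be checked directly from the reported timings. The sketch below is only arithmetic on the figures in this post; it assumes that "1 hour 9 minutes" means exactly 69 minutes, that "12:40" and "7:54" are minutes and seconds, and that every generated document was migrated. The helper names are illustrative.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a duration to whole seconds."""
    return hours * 3600 + minutes * 60 + seconds


def throughput(docs: int, duration_s: int) -> float:
    """Average documents per second over the whole run."""
    return docs / duration_s


# Dataset sizes from the benchmarking steps.
INITIAL_DOCS = 10_000_000
CATCHUP_DOCS = 1_000_000

# Reported timings (assumed to be exact to the second).
estuary_initial = to_seconds(hours=1, minutes=9)
dsync_initial = to_seconds(minutes=12, seconds=40)
dsync_catchup = to_seconds(minutes=7, seconds=54)

# Ratio of initial sync times: how many times faster dsync finished.
speedup = estuary_initial / dsync_initial

if __name__ == "__main__":
    print(f"dsync vs Estuary initial sync speedup: {speedup:.2f}x")
    print(f"Estuary initial sync: {throughput(INITIAL_DOCS, estuary_initial):,.0f} docs/s")
    print(f"dsync initial sync:   {throughput(INITIAL_DOCS, dsync_initial):,.0f} docs/s")
    print(f"dsync catch-up:       {throughput(CATCHUP_DOCS, dsync_catchup):,.0f} docs/s")
```

Under these assumptions the ratio comes out at roughly 5.4x, and dsync’s catch-up throughput is several times lower than its initial sync throughput, which is consistent with the catch-up observation in the Challenges section above.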
Final Thoughts: Benchmarked Data Migration Tools Comparison and Takeaways
After testing these tools, we have a few key takeaways:
Airbyte offers a massive library of connectors, but its error-prone setup and lack of clear diagnostics made it challenging to use.
Fivetran does not support MongoDB as a destination, which ruled it out for our use case.
Estuary.dev is a convenient option for simple migrations and had the easiest setup, but it lacks efficiency and key features like logging and pause/resume, which makes it difficult to assess and manage large-scale migrations.
Dsync is a highly customizable and performant tool, but currently only supports Cosmos DB and MongoDB. We have a preview of DynamoDB support, and are working on other connectors as well. If you would like to give it a try, or talk about the future roadmap, you can get in contact with the team here.
Ultimately, the right tool depends on the specific requirements of the project, whether it’s ease of use, flexibility, performance under load, or visibility into the migration process.
About the Author
Grace is a junior at UC Berkeley, studying Computer Science and Economics. She interned as a software developer at Adiom, where one of her projects was benchmarking dsync, an open-source data migration tool designed for NoSQL databases, against popular, industry-standard tools.