Cost-effective batch-based migration strategies for NewSQL-based big data systems
Date
2024
Authors
Vadlamudi, Naveen Kumar
University of Lethbridge. Faculty of Arts and Science
Publisher
Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
Abstract
Modern, high-performance applications demand scalable and efficient databases, which has
driven the evolution of NewSQL systems. The challenge addressed here is migrating data from
ShardingSphere with PostgreSQL to AWS (Amazon Web Services) cloud object storage. Implementing
batch migration algorithms in Apache Spark that target the Delta Lake format adds complexity,
because the data must be integrated and stored seamlessly within AWS environments.
This thesis explores tailored batch-based migration algorithms for transferring data from
ShardingSphere with PostgreSQL to AWS cloud object storage, with an emphasis on reducing
transfer time. The study evaluates several batch loading techniques in Apache Spark,
covering sequential and concurrent strategies for both shard-by-shard and aggregated-shards
algorithms. These techniques aim to maximize efficiency in
storing data in Delta Lake format within AWS cloud storage, facilitating effective data
management, visualization, and utilization for modern applications, business intelligence,
AI, and ML, while leveraging the Lakehouse architecture for integrated data processing and analytics.
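The difference between the strategy families named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the shards are hypothetical in-memory lists, `read_shard` and `write_batch` are made-up stand-ins for a Spark read from a PostgreSQL shard and a Delta Lake append, and the reading of "aggregated-shards" as "combine all shards, then write once" is an assumption based only on the abstract's wording.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards; in the thesis setting each shard would be
# a PostgreSQL instance behind ShardingSphere.
SHARDS = {
    "shard_0": [{"id": 1}, {"id": 2}],
    "shard_1": [{"id": 3}],
    "shard_2": [{"id": 4}, {"id": 5}, {"id": 6}],
}


def read_shard(name):
    """Stand-in for a batch read of one shard (assumed, not a real API)."""
    return list(SHARDS[name])


def write_batch(target, rows):
    """Stand-in for appending one batch to the target table."""
    target.extend(rows)
    return len(rows)


def shard_by_shard_sequential(names, target):
    # One read and one write per shard, one shard after another.
    return sum(write_batch(target, read_shard(n)) for n in names)


def shard_by_shard_concurrent(names, target, workers=3):
    # One read and one write per shard, with shards processed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda n: write_batch(target, read_shard(n)), names))


def aggregated_sequential(names, target):
    # Read shards one by one, combine them, then issue a single write.
    combined = []
    for n in names:
        combined.extend(read_shard(n))
    return write_batch(target, combined)


def aggregated_concurrent(names, target, workers=3):
    # Read shards in parallel, combine them, then issue a single write.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(read_shard, names))
    return write_batch(target, [row for part in parts for row in part])
```

All four functions move the same rows; they differ only in how many writes are issued and whether shard work overlaps in time, which is the kind of trade-off the evaluated techniques compare.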
Keywords
batch-based migration algorithms, cloud computing, NewSQL systems, data migration, data pipelines, documentation