Cost-effective batch-based migration strategies for NewSQL-based big data systems

Date
2024
Authors
Vadlamudi, Naveen Kumar
University of Lethbridge. Faculty of Arts and Science
Publisher
Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
Abstract
Modern, high-performance applications demand scalable and efficient databases, which has driven the evolution of NewSQL systems. The challenge addressed here is migrating data from ShardingSphere with PostgreSQL to AWS (Amazon Web Services) cloud object storage. Implementing batch migration algorithms in Apache Spark that target the Delta Lake format introduces complexities in ensuring seamless data integration and storage within AWS environments. This thesis explores tailored batch-based migration algorithms for transferring data from ShardingSphere with PostgreSQL to AWS cloud object storage, with an emphasis on reducing transfer time. The study evaluates several batch loading techniques in Apache Spark, including sequential and concurrent strategies for shard-by-shard and aggregated-shards algorithms. These techniques aim to maximize efficiency in storing data in Delta Lake format within AWS cloud storage, facilitating effective data management, visualization, and utilization for modern applications, business intelligence, AI, and ML, and leveraging the Lakehouse architecture for integrated data processing and analytics.
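The strategies named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the thread-pool approach to concurrency, and the reading of "aggregated shards" as fixed-size groups of shard tables are all assumptions made for illustration. The Spark step uses the standard JDBC reader and the Delta writer, and assumes a Spark session already configured with the PostgreSQL JDBC driver and Delta Lake.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def copy_shard_with_spark(spark, jdbc_url: str, table: str, target_path: str) -> None:
    """Hypothetical per-shard step: read one shard table over JDBC, append to Delta.

    Assumes credentials are carried in jdbc_url and that target_path is an
    object-storage location (for example an s3a:// path) writable by Spark.
    """
    df = (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .load()
    )
    df.write.format("delta").mode("append").save(target_path)


def migrate_sequential(shards: List[str], copy_one: Callable[[str], None]) -> List[str]:
    """Shard-by-shard, one at a time: each copy finishes before the next starts."""
    done = []
    for shard in shards:
        copy_one(shard)
        done.append(shard)
    return done


def migrate_concurrent(
    shards: List[str], copy_one: Callable[[str], None], max_workers: int = 4
) -> List[str]:
    """Shard-by-shard, several at once: copies are submitted to a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(copy_one, shards))
    return list(shards)


def aggregate_shards(shards: List[str], group_size: int) -> List[List[str]]:
    """Assumed meaning of 'aggregated shards': fixed-size groups of shard tables."""
    return [shards[i:i + group_size] for i in range(0, len(shards), group_size)]
```

In this sketch, `copy_one` would typically be a small wrapper that calls `copy_shard_with_spark` with a fixed session, JDBC URL, and target path. Keeping the scheduling functions independent of Spark lets the sequential and concurrent variants be compared with the same per-shard step.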
Keywords
batch-based migration algorithms, cloud computing, NewSQL systems, data migration, data pipelines, documentation