Cost-effective batch-based migration strategies for NewSQL-based big data systems



Publisher

Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science

Abstract

Modern, high-performance applications demand scalable and efficient databases, driving the evolution of NewSQL systems. The challenge addressed here lies in migrating data from Apache ShardingSphere with PostgreSQL to AWS (Amazon Web Services) cloud object storage. Implementing batch migration algorithms in Apache Spark, specifically targeting the Delta Lake format, introduces complexities in ensuring seamless data integration and storage within AWS environments. This thesis explores tailored batch-based migration algorithms for transferring data from ShardingSphere with PostgreSQL to AWS cloud object storage, emphasizing performance optimization through faster data transfer. The study evaluates various batch loading techniques in Apache Spark, including sequential and concurrent strategies for shard-by-shard and aggregated-shards algorithms. These techniques aim to maximize efficiency in storing data in Delta Lake format within AWS cloud storage, facilitating effective data management, visualization, and utilization for modern applications, business intelligence, AI, and ML, while leveraging the Lakehouse architecture for integrated data processing and analytics.
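The shard-by-shard versus aggregated-shards distinction in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `read_shard` and `write_delta` are hypothetical stand-ins for what would, in the actual system, be Spark JDBC reads from ShardingSphere/PostgreSQL shards and Delta Lake writes to AWS object storage.

```python
from concurrent.futures import ThreadPoolExecutor

def read_shard(shard_id):
    # Stand-in for a Spark JDBC read of one PostgreSQL shard
    # behind ShardingSphere; here it just fabricates rows.
    return [(shard_id, row) for row in range(3)]

def write_delta(rows, target):
    # Stand-in for df.write.format("delta").save("s3://...");
    # here the "table" is a plain Python list.
    target.extend(rows)

def migrate_shard_by_shard(shard_ids, target, concurrent=False):
    """Move each shard as its own batch, optionally reading shards
    with concurrent workers (the sequential vs concurrent strategies)."""
    if concurrent:
        with ThreadPoolExecutor(max_workers=4) as pool:
            for rows in pool.map(read_shard, shard_ids):
                write_delta(rows, target)
    else:
        for sid in shard_ids:
            write_delta(read_shard(sid), target)

def migrate_aggregated(shard_ids, target):
    """Union all shards into a single batch before one Delta write
    (the aggregated-shards strategy)."""
    batch = [row for sid in shard_ids for row in read_shard(sid)]
    write_delta(batch, target)
```

Both strategies land the same rows; they differ in how many write batches hit object storage and how much read work can overlap, which is where the cost and speed trade-offs the thesis evaluates arise.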
