Cost-effective batch-based migration strategies for NewSQL-based big data systems
dc.contributor.author | Vadlamudi, Naveen Kumar | |
dc.contributor.author | University of Lethbridge. Faculty of Arts and Science | |
dc.contributor.supervisor | Osborn, Wendy | |
dc.date.accessioned | 2024-10-10T20:42:31Z | |
dc.date.available | 2024-10-10T20:42:31Z | |
dc.date.issued | 2024 | |
dc.degree.level | Masters | |
dc.description.abstract | Modern, high-performance applications demand scalable and efficient databases, leading to the evolution of NewSQL systems. The challenge lies in migrating data from ShardingSphere with PostgreSQL to AWS (Amazon Web Services) cloud object storage. Implementing batch migration algorithms in Apache Spark, specifically targeting the Delta Lake format, introduces complexities in ensuring seamless data integration and storage within AWS environments. This thesis explores tailored batch-based migration algorithms for transferring data from ShardingSphere with PostgreSQL to AWS cloud object storage, emphasizing performance optimization through faster data transfer. The study evaluates various batch loading techniques in Apache Spark, including sequential and concurrent strategies for shard-by-shard and aggregated-shards algorithms. These techniques aim to maximize efficiency in storing data in Delta Lake format within AWS cloud storage, facilitating effective data management, visualization, and utilization for modern applications, business intelligence, AI and ML, while leveraging the Lakehouse architecture for integrated data processing and analytics. | |
dc.embargo | No | |
dc.identifier.uri | https://hdl.handle.net/10133/6939 | |
dc.language.iso | en | |
dc.publisher | Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science | |
dc.publisher.department | Department of Mathematics and Computer Science | |
dc.publisher.faculty | Arts and Science | |
dc.relation.ispartofseries | Thesis (University of Lethbridge. Faculty of Arts and Science) | |
dc.subject | batch-based migration algorithms | |
dc.subject | cloud computing | |
dc.subject | NewSQL systems | |
dc.subject | data migration | |
dc.subject | data pipelines | |
dc.subject | documentation | |
dc.subject.lcsh | Dissertations, Academic | |
dc.subject.lcsh | Cloud computing | |
dc.subject.lcsh | SQL (Computer program language) | |
dc.subject.lcsh | Big data | |
dc.subject.lcsh | Algorithms | |
dc.subject.lcsh | Electronic data processing--Batch processing--Documentation | |
dc.subject.lcsh | Data mining | |
dc.title | Cost-effective batch-based migration strategies for NewSQL-based big data systems | |
dc.type | Thesis |