Cost-effective batch-based migration strategies for NewSQL-based big data systems

dc.contributor.author: Vadlamudi, Naveen Kumar
dc.contributor.author: University of Lethbridge. Faculty of Arts and Science
dc.contributor.supervisor: Osborn, Wendy
dc.date.accessioned: 2024-10-10T20:42:31Z
dc.date.available: 2024-10-10T20:42:31Z
dc.date.issued: 2024
dc.degree.level: Masters
dc.description.abstract: Modern, high-performance applications demand scalable and efficient databases, which has driven the evolution of NewSQL systems. The challenge addressed here is migrating data from ShardingSphere with PostgreSQL to AWS (Amazon Web Services) cloud object storage. Implementing batch migration algorithms in Apache Spark that target the Delta Lake format introduces complexities in ensuring seamless data integration and storage within AWS environments. This thesis explores tailored batch-based migration algorithms for transferring data from ShardingSphere with PostgreSQL to AWS cloud object storage, with an emphasis on performance optimization, that is, reducing data transfer time. The study evaluates several batch loading techniques in Apache Spark, including sequential and concurrent strategies for both shard-by-shard and aggregated-shards-based algorithms. These techniques aim to maximize the efficiency of storing data in Delta Lake format within AWS cloud storage, facilitating effective data management, visualization, and use in modern applications, business intelligence, AI, and ML. The approach leverages the Lakehouse architecture for integrated data processing and analytics.
dc.embargo: No
dc.identifier.uri: https://hdl.handle.net/10133/6939
dc.language.iso: en
dc.publisher: Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science
dc.publisher.department: Department of Mathematics and Computer Science
dc.publisher.faculty: Arts and Science
dc.relation.ispartofseries: Thesis (University of Lethbridge. Faculty of Arts and Science)
dc.subject: batch-based migration algorithms
dc.subject: cloud computing
dc.subject: NewSQL systems
dc.subject: data migration
dc.subject: data pipelines
dc.subject: documentation
dc.subject.lcsh: Dissertations, Academic
dc.subject.lcsh: Cloud computing
dc.subject.lcsh: SQL (Computer program language)
dc.subject.lcsh: Big data
dc.subject.lcsh: Algorithms
dc.subject.lcsh: Electronic data processing--Batch processing--Documentation
dc.subject.lcsh: Data mining
dc.title: Cost-effective batch-based migration strategies for NewSQL-based big data systems
dc.type: Thesis
Files
Original bundle
Name: VADLAMUDI_NAVEEN_KUMAR_MSC_2024.pdf
Size: 19.79 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.33 KB
Format: Item-specific license agreed upon to submission