Publication:
Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data

dc.contributor.author Bravo-Rocca, Gusseppe es_PE
dc.contributor.author Torres-Robatty, Piero es_PE
dc.contributor.author Fiestas-Iquira, Jose es_PE
dc.date.accessioned 2024-05-30T23:13:38Z
dc.date.available 2024-05-30T23:13:38Z
dc.date.issued 2019
dc.description.abstract This work proposes a semi-automated analysis and modeling package for Machine Learning problems. The library's goal is to reduce the number of steps involved in a traditional data science roadmap. To do so, Sparkmach uses Machine Learning techniques to build base models for both classification and regression problems. Building these models involves exploratory data analysis, data preprocessing, feature engineering and modeling. The project is based on Pymach, a similar library that handles those steps for small and medium-sized datasets (about ten million rows and a few columns). Sparkmach's central task is to scale Pymach to big datasets by using Apache Spark, a distributed engine for large-scale data processing that tackles several data science problems in a cluster environment. Despite its distributed nature, Sparkmach can also be used in local environments, while still getting the most benefit from the distributed processing tools.
dc.description.sponsorship Consejo Nacional de Ciencia, Tecnología e Innovación Tecnológica - Concytec
dc.identifier.doi https://doi.org/10.1007/978-3-030-11680-4_13
dc.identifier.uri https://hdl.handle.net/20.500.12390/1325
dc.language.iso eng
dc.publisher Springer International Publishing
dc.relation.ispartof Communications in Computer and Information Science
dc.rights info:eu-repo/semantics/openAccess
dc.subject Statistics
dc.subject Semi-automated machine learning es_PE
dc.subject Data Science es_PE
dc.subject Data mining es_PE
dc.subject Data engineering es_PE
dc.subject Big data es_PE
dc.subject.ocde https://purl.org/pe-repo/ocde/ford#5.08.02
dc.title Sparkmach: A Distributed Data Processing System Based on Automated Machine Learning for Big Data
dc.type info:eu-repo/semantics/bookPart
dspace.entity.type Publication