9.1.3 - Harvesting of Datasets
Description
Simpl shall provide built-in tools to harvest data, e.g., to define a web crawler that imports data and combines the crawled data with existing datasets.
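As an illustration of the "combine the crawled data with existing datasets" step, here is a minimal sketch in plain Python. It is not part of Simpl or of any tool named in this thread. The function name `merge_harvested`, the `id` key, and the sample records are hypothetical, and the harvested data is assumed to have already been fetched as CSV text.

```python
import csv
import io


def merge_harvested(existing, harvested_csv, key="id"):
    """Merge rows parsed from harvested CSV text into existing records.

    Records are matched on `key`; harvested fields overwrite or extend
    the fields of an existing record with the same key.
    """
    # Index the existing dataset by key, copying rows so inputs stay untouched.
    merged = {row[key]: dict(row) for row in existing}
    for row in csv.DictReader(io.StringIO(harvested_csv)):
        # Create the record if it is new, then apply the harvested fields.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical sample data: an existing dataset and freshly harvested CSV text.
existing = [{"id": "1", "name": "alpha"}, {"id": "2", "name": "beta"}]
harvested = "id,name,license\n2,beta,CC-BY-4.0\n3,gamma,CC0-1.0\n"

combined = merge_harvested(existing, harvested)
```

In this sketch harvested values win on conflict. A real implementation would likely need a configurable conflict policy and schema validation.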
L2 - Detailed Requirement | Issue ID: SIMPL-1625 | Status: Proposed
Moderator note: Comments are from the previous discussion platform.
Submitted by lisana berberi on Fri, 28/06/2024 - 09:55
There is already a free, open-source tool that meets this requirement: Mage AI, through its data loader utilities [1].
[1] https://docs.mage.ai/design/data-loading#example-loading-data-from-a-fi…
In reply to There is already an open… by lisana berberi
Submitted by Richard Mrasek on Mon, 15/07/2024 - 14:31
Hello Lisana Berberi,
Thank you for the link to the documentation. The requirement will require integrating a data pipelining framework that can load datasets from different sources and combine them. The data loader utilities (and mage.ai in general) are one of the possible solutions that we are considering. An alternative is Apache Airflow, which provides an even larger library of connection hooks to different data storage solutions: https://airflow.apache.org/docs/apache-airflow-providers/core-extension…
Best Regards
Richard
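The idea of a pipeline that loads datasets from different sources and combines them can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the Mage AI or Apache Airflow API. The names `load_csv`, `load_json`, `LOADERS`, and `run_pipeline` are hypothetical, and the sources are given as in-memory text rather than real connections.

```python
import csv
import io
import json


def load_csv(text):
    """Parse CSV text into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def load_json(text):
    """Parse a JSON array of objects into a list of row dictionaries."""
    return json.loads(text)


# Hypothetical registry that maps a source format to its loader function.
LOADERS = {"csv": load_csv, "json": load_json}


def run_pipeline(sources):
    """Load every (format, raw_text) source and combine the rows.

    Each row is tagged with the format it came from so that its
    provenance survives the combination step.
    """
    combined = []
    for fmt, raw in sources:
        for row in LOADERS[fmt](raw):
            combined.append({**row, "source_format": fmt})
    return combined


rows = run_pipeline([
    ("csv", "id,name\n1,alpha\n"),
    ("json", '[{"id": "2", "name": "beta"}]'),
])
```

A framework such as those mentioned above would replace the registry with its own loader or hook abstractions and add scheduling, retries, and credentials handling.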