3/5/2023

Datum line

To integrate large volumes of data from the building blocks of companies' information systems, ETL, enterprise application integration (EAI), and enterprise information integration (EII) tools are commonly used (Acharjya and Ahmed, 2016; Daniel, 2017). Integration tools that include a big data adapter already exist on the market; this is the case with Talend Enterprise Data Integration–Big Data Edition. Hadoop uses scripting via MapReduce; Sqoop and Flume also participate in the integration of unstructured data. In the big data context, data integration has been extended to unstructured data (sensor data, web logs, social networks, documents).

The big data collection phase can be divided into two main categories, depending on the type of load: batch (including micro-batch) or streaming. In this phase, the big data system collects massive data of any structure from heterogeneous sources using a variety of tools. The data are then stored in the HDFS file format or in a NoSQL database (Prabhu et al., 2019). In what follows, we make a comparative study of the tools that perform this collection operation with respect to the norms and standards of big data. We then look at the different formats for storing structured and unstructured data.

The components in the loading and collection process are:

1. Identification of the various known data formats; by default, big data targets unstructured data.
2. Filtration and selection of the incoming information relevant to the business.
3. Constant validation and analysis of the data.
4. Noise reduction, that is, cleaning the data.
5. Transformation, which can lead to the division, convergence, normalization, or synthesis of the data.
6. Compression, which reduces the size of the data without losing its relevance.
7. Integration, which loads all the data into the big data storage.
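The batch-versus-streaming distinction above can be sketched in a few lines of Python. This is an illustrative sketch only, not the API of any particular ingestion tool; the record source and the in-memory sink are hypothetical stand-ins for a real extract and a real big data store.

```python
def batch_collect(source_records, sink):
    """Batch load: read the whole extract at once, then write it in one pass."""
    buffer = list(source_records)   # full extract held in memory (or staged on disk)
    sink.extend(buffer)             # single bulk write into storage
    return len(buffer)

def microbatch_collect(source_records, sink, batch_size=2):
    """Micro-batch/streaming load: write small groups as records arrive."""
    buffer = []
    for record in source_records:
        buffer.append(record)
        if len(buffer) >= batch_size:
            sink.extend(buffer)     # flush a small batch with low latency
            buffer.clear()
    if buffer:
        sink.extend(buffer)         # flush the remainder

storage = []
microbatch_collect(({"id": i} for i in range(5)), storage)
print(len(storage))  # → 5 (records delivered in small batches)
```

The trade-off the sketch makes visible: batch maximizes throughput at the cost of latency, while micro-batching delivers data to storage shortly after it arrives.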
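The seven components of the loading and collection process can be strung together as a toy pipeline. Everything concrete here is an assumption chosen for illustration: the record shape (`id`/`value` dicts), the noise rule (negative readings are dropped), the normalization factor, and the use of `zlib` for the compression step.

```python
import json
import zlib

def collect(raw_records, storage):
    for rec in raw_records:
        # 1. Identification: detect the (assumed) format of the incoming record.
        if not isinstance(rec, dict):
            continue  # unknown format, skipped in this sketch
        # 2. Filtration/selection: keep only business-relevant fields.
        rec = {k: v for k, v in rec.items() if k in ("id", "value")}
        # 3. Validation: constant checking of the data.
        if "id" not in rec or "value" not in rec:
            continue
        # 4. Noise reduction: cleaning, e.g. dropping out-of-range readings.
        if rec["value"] < 0:
            continue
        # 5. Transformation: normalization of the value field.
        rec["value"] = rec["value"] / 100.0
        # 6. Compression: shrink the record without losing its content.
        blob = zlib.compress(json.dumps(rec).encode("utf-8"))
        # 7. Integration: load the result into the (in-memory) big data store.
        storage.append(blob)
    return len(storage)

store = []
collect([{"id": 1, "value": 50}, {"id": 2, "value": -3}, "garbage"], store)
print(json.loads(zlib.decompress(store[0])))  # → {'id': 1, 'value': 0.5}
```

Only the valid record survives the pipeline, and decompressing it recovers the transformed content intact, which is the sense in which compression here loses size but not relevance.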