BIG DATA

The rapid growth in the amount of information that can be processed and queried today has been made possible by new ways of storing data and by recently introduced software technologies.

The Hadoop platform has established itself as a market standard for storing structured, unstructured and heterogeneous data. It is packaged by the main software “distributions”, such as Cloudera and Hortonworks, which are complemented by increasingly advanced tools for integrating, cleansing and homogenizing data from different sources. Humanativa Group has chosen the Talend Data Fabric platform, which integrates the main data governance tools for the Big Data world, as its reference tool for complex projects that assemble information from traditional, social and IoT sources into a single logical data model.

The previous figure illustrates the logical structure of our offering. It integrates platforms, tools, sources and “target” environments, which are made available to Data Scientist teams for developing predictive analytics systems.


Below we summarize the main technologies we work with. For each of them we have a significant number of certified technicians.

Cloudera

Founded in 2008, Cloudera was the first major company to offer a complete Hadoop distribution, CDH. It has acquired customers such as eBay, Expedia, Nokia and Samsung.

CDH provides Hadoop's core features, such as scalable storage and distributed computing. It also includes a set of additional components, for example a user interface.

CDH also gives companies a competitive edge through features such as security and integration with a wide range of hardware and software solutions.

Hortonworks

Hortonworks was founded in 2011 by engineers from the original Yahoo! Hadoop team. Its Hortonworks Data Platform distribution is completely open source and includes components such as Hadoop, Pig, Hive and Ambari.

Cluster management and monitoring are handled by Apache Ambari.

Hortonworks offers features such as high availability with both Hadoop 1.0 and Hadoop 2.0. It also improves the performance of queries written in HiveQL.

Talend Data Fabric

Talend is the first data integration platform built on Spark. Because of this, Talend can implement ETL jobs that run entirely on Spark. This significantly reduces development time and improves the performance of the generated code.


The suite shown in the previous figure illustrates how the Data Fabric platform integrates the components needed to build a complete “big data governance” system.


We currently use Cloudera, Hortonworks and Talend in several of our strategic projects, in sectors such as airport services, advanced telecommunications and public finance.