
Basics of Big Data - Building a Hadoop data warehouse

This is the third part of a series of posts on Big Data. Read Part 1 (What is Big Data) and Part 2 (Hadoop). Traditionally, data warehouses have been built with relational databases as the backbone. With the new challenges (the 3Vs) of Big Data, relational databases have been falling short of the requirements of handling:
- New data types (unstructured data)
- Extended analytic processing
- Throughput (TB/hour loading) with immediate query access
The industry has turned to Hadoop as a disruptive solution for these very challenges.
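As a rough illustration of the "immediate query access" point, a Hadoop data warehouse can expose files already sitting in HDFS through a Hive external table, so data becomes queryable without a separate load step. The sketch below uses the PyHive client; the host, port, HDFS path, and table name are illustrative assumptions, not details from the post.

```python
# Sketch: define a Hive external table over raw files in HDFS and query it.
# Assumes a HiveServer2 endpoint and the PyHive client (pip install pyhive).
# Host, port, path, and table name are illustrative placeholders.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000, username="etl")
cur = conn.cursor()

# Schema-on-read: the table simply points at files already landed in HDFS,
# so there is no separate ETL load step before the data is queryable.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        ts STRING,
        user_id STRING,
        url STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'
    LOCATION '/data/raw/web_logs'
""")

# Immediate query access over whatever has been dropped into that directory.
cur.execute("SELECT url, COUNT(*) AS hits FROM web_logs GROUP BY url LIMIT 10")
for url, hits in cur.fetchall():
    print(url, hits)
```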

Basics of Big Data – Part 2 - Hadoop

As discussed in Part 1 of this series, Hadoop is the foremost among the tools currently used for deriving value out of Big Data. The process of gaining insights from data through Business Intelligence and analytics essentially remains the same. However, with the huge variety, volume and velocity (the 3Vs of Big Data), it has become necessary to re-think the data management infrastructure. Hadoop, originally designed to be used with the MapReduce algorithm to solve parallel processing constraints in distributed architectures (e.
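To make the MapReduce reference concrete, the sketch below is a minimal, single-machine simulation of the model: map emits (key, value) pairs, the framework groups values by key, and reduce aggregates each group. Word count is the canonical example; the function and variable names are illustrative only, and on a real cluster the same map and reduce logic would run in parallel across many nodes.

```python
# Minimal local sketch of the MapReduce model Hadoop was built around.
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: one input record in, zero or more (word, 1) pairs out.
    for word in line.strip().lower().split():
        yield word, 1

def reduce_phase(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Reduce: all values for one key in, a single aggregated result out.
    return word, sum(counts)

def run(lines: Iterable[str]) -> dict:
    # Shuffle/sort stage: group intermediate values by key before reducing.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_phase(line):
            grouped[word].append(one)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

if __name__ == "__main__":
    sample = ["big data needs big infrastructure", "hadoop handles big data"]
    print(run(sample))  # e.g. {'big': 3, 'data': 2, ...}
```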