- Introduction to Big Data and Hadoop.
- Relation between Big Data and Hadoop.
- Why do we need Hadoop?
- Scenarios for adopting Hadoop technology in real-time projects.
- How Hadoop addresses Big Data challenges
- Importance of Hadoop Ecosystem Components
- What is HDFS (Hadoop Distributed File System).
- HDFS Architecture – 5 Daemons of Hadoop
- Replication in Hadoop – Fail Over Mechanism
- Hadoop Cluster Setup and JDK Installation.
- Why is MapReduce essential in Hadoop?
- MapReduce and its drawbacks with respect to Task Tracker failure in a Hadoop cluster.
- Map Reduce Life Cycle & Communication Mechanism of Job Tracker & Task Tracker
- How to write a basic Map Reduce Program
- Compression Techniques in Map Reduce
- Unix Shell Scripting Basics and commands.
- How the Unix shell is used in Hadoop
- PIG Installation (Hands on Installation on Laptops)
- Introduction to Apache Pig
- Map Reduce Vs Apache Pig
- Where to Use Map Reduce and PIG in REAL Time Hadoop Projects
- How to write a simple pig script
- Parameter substitution in PIG Scripts
- How to develop a complex Pig script
- Bags, Tuples and Fields in PIG
- HIVE Installation (Hands on Installation on Laptops)
- Local Mode & Clustered Mode
- Hive Introduction and need of Apache HIVE in Hadoop
- When to choose PIG & HIVE in REAL Time Project
- Importance of the Hive Metastore.
- Communication mechanism with Metastore.
- Hive Integration with Hadoop & Hive Query Language (HiveQL)
- SQL vs HiveQL, Data Slicing Mechanisms and Partitions in Hive
- Partitioning Vs Bucketing
- Collection Data Types in HIVE
- User Defined Functions (UDFs) in HIVE
- UDFs, UDAFs, UDTFs and need of UDFs in HIVE
- Hive Serializer/Deserializer – SerDe
- Semi-Structured Data Processing Using Hive (XML/JSON)
- HIVE – HBASE Integration
- Sqoop installation with MySQL Client
- Introduction to Sqoop.
- MySQL client and Server Installation
- How to connect to Relational Database using Sqoop
- Different Sqoop Commands
- Hive imports, incremental imports, importing all tables, and importing using a password file
- HBase Introduction and HDFS vs HBase
- HBase Data Modeling Elements
- HBase Architecture & Clients (REST, Thrift, Java Based, Avro)
- MongoDB basics & Introduction to MongoDB
- Features of MongoDB
- REAL Time Use Cases on Hadoop & MongoDB Use Cases
- What is YARN?
- Difference between Map Reduce & YARN
- YARN Architecture (Resource Manager, Application Master, Node Manager)
- When should we go ahead with YARN?
- YARN Process flow and Web UI
- Different Configuration Files for YARN.
- What is Impala, and how can we use it for query processing?
- When should we go ahead with Impala?
- HIVE Vs Impala
- Real time Use Cases with Impala.
- Interactive Scala – Scala Shell
- Functional Programming in Scala
- What is Functional Programming?
- Difference between Object Oriented and Functional Programming
- Flume Master, Flume Collector and Flume Agent
- Real Time Use Case using Apache Flume.
- Oozie Introduction, Oozie Architecture & Job Submission
- Spark Vs Map Reduce Processing
- File Operations in Spark Shell.
- Introduction to Spark Components.
- What is an RDD and why is it important in Spark?
- Core Features of RDDs & Lazy Evaluation
- Different Operations on RDDs
- Actions and Transformation in RDD
- Running Spark in a Clustered Mode.
- Introduction to Spark SQL
- The SQLContext, Hive vs Spark SQL
- Introduction to DataFrames (DFs)
Course Content
Introduction to Big Data and Hadoop