Course Content
Understanding Big Data and Hadoop
- Big Data
- Limitations and Solutions of existing Data Analytics Architecture
- Hadoop, Hadoop Features, Hadoop Ecosystem, Hadoop 2.x core components
- Hadoop Storage: HDFS
- Hadoop Processing: MapReduce Framework
- Different Hadoop Distributions
Deep Dive into HDFS (for Storing the Data)
- Introduction to HDFS
- HDFS Design
- HDFS role in Hadoop
- Features of HDFS
- Daemons of Hadoop and their functionality
- Name Node
- Secondary Name Node
- Job Tracker
- Data Node
- Task Tracker
- Anatomy of File Write
- Anatomy of File Read
- Network Topology
- Nodes
- Racks
- Data Center
- Parallel Copying
- Basic Configuration for HDFS
- Data Organization
- Blocks
- Replication
- Rack Awareness
- Heartbeat Signal
- How to Store the Data into HDFS
- How to Read the Data from HDFS
- Accessing HDFS (Introduction of Basic UNIX commands)
- CLI commands
MapReduce using Java (Processing the Data)
- Introduction to MapReduce
- MapReduce Architecture
- Data flow in MapReduce
- Splits
- Mapper
- Partitioning
- Sort and shuffle
- Combiner
- Reducer
- Understanding the Difference Between a Block and an Input Split
- Role of Record Reader
- Basic Configuration of MapReduce
- MapReduce life cycle
- Driver Code
- Mapper
- How MapReduce Works
- Writing and Executing the Basic MapReduce Program using Java
- Submission & Initialization of MapReduce Job
- File Input/Output Formats in MapReduce Jobs
- Text Input Format
- Key Value Input Format
- Sequence File Input Format
- NLine Input Format
- Joins
- Map-side Joins
- Reduce-side Joins
- Word Count Example
- Partition MapReduce Program
- Side Data Distribution
- Distributed Cache (with Program)
- Counters (with Program)
- Types of Counters
- Task Counters
- Job Counters
- User Defined Counters
- Propagation of Counters
- Job Scheduling
SQOOP
- Introduction to SQOOP
- Use of SQOOP
- Connect to MySQL database
- SQOOP commands
- Import
- Export
- Joins in SQOOP
- Export to MySQL
- Export to HBase
HIVE
- Introduction to HIVE
- HIVE Meta Store
- HIVE Architecture
- Tables in HIVE
- Managed Tables
- External Tables
- Hive Data Types
- Primitive Types
- Complex Types
- Partition
- Joins in HIVE
- HIVE UDFs and UDAFs with Programs
HBASE
- Introduction to HBASE
- Basic Configurations of HBASE
- Fundamentals of HBase
- What is NoSQL?
- HBase Data Model
- Table and Row
- Column Family and Column Qualifier
- Cell and its Versioning
- Categories of NoSQL Databases
- Key-Value Database
- Document Database
- Column Family Database
- HBASE Architecture
- HMaster
- Region Servers
- Regions
- MemStore
- Store
- SQL vs. NoSQL
- HDFS vs. HBase
- HBase Designing Tables
- HBase Operations
Zookeeper
- Introduction to Zookeeper
- Data Model
- Operations
OOZIE
- Introduction to OOZIE
- Use of OOZIE
Flume
- Introduction to Flume
- Uses of Flume
- Flume Architecture
- Flume Master
- Flume Collectors
- Flume Agents
Hadoop Architecture and HDFS
- Hadoop 2.x Cluster Architecture - Federation and High Availability
- A Typical Production Hadoop Cluster
- Hadoop Cluster Modes
- Common Hadoop Shell Commands
- Hadoop 2.x Configuration Files
- Single-node cluster and multi-node cluster setup
- Hadoop Administration
Hadoop MapReduce Framework
- MapReduce Use Cases
- Traditional Way vs. MapReduce Way
- Why MapReduce
- Hadoop 2.x MapReduce Architecture
- Hadoop 2.x MapReduce Components
- YARN MR Application Execution Flow
- YARN Workflow
- Anatomy of MapReduce Program
- Demo on MapReduce
- Input Splits
- Relation between Input Splits and HDFS Block
- MapReduce: Combiner & Partitioner
- Demo on de-identifying Health Care Data set
- Demo on Weather Data set
Advanced MapReduce
- Counters
- Distributed Cache
- MRUnit, Reduce Join
- Custom Input Format
- Sequence File Input Format
- XML File Parsing using MapReduce
Pig
- About Pig
- MapReduce vs. Pig
- Pig Use Cases, Programming Structure in Pig
- Pig Running Modes, Pig components
- Pig Execution, Pig Latin Program
- Data Models in Pig, Pig Data Types
- Shell and Utility Commands