Course Overview

Learn job-relevant skills, including the Big Data and Hadoop frameworks and AWS services, with Howe's Big Data Engineering training. The course teaches you how to use database management tools and MongoDB through interactive sessions and industry projects. Build your knowledge with Howe's training program and become a successful Hadoop developer.

Program Features

  • Corporate Trainers for Corporate Training
  • Affordable Course Fees
  • Flexible Schedule
  • Free Demo on Request
  • Certification on Completion
  • Create Your Own Content

Course Content

  • Big Data
  • Limitations and Solutions of existing Data Analytics Architecture
  • Hadoop, Hadoop Features, Hadoop Ecosystem, Hadoop 2.x core components
  • Hadoop Storage: HDFS
  • Hadoop Processing: MapReduce Framework
  • Different Hadoop Distributions
  • Introduction to HDFS
  • HDFS Design
  • HDFS role in Hadoop
  • Features of HDFS
  • Hadoop Daemons and Their Functionality
  • Name Node
  • Secondary Name Node
  • Job Tracker
  • Data Node
  • Task Tracker
  • Anatomy of File Write
  • Anatomy of File Read
  • Network Topology
  • Nodes
  • Racks
  • Data Center
  • Parallel Copying
  • Basic Configuration for HDFS
  • Data Organization
  • Blocks
  • Replication
  • Rack Awareness
  • Heartbeat Signal
  • How to Store the Data into HDFS
  • How to Read the Data from HDFS
  • Accessing HDFS (Introduction to Basic UNIX Commands)
  • CLI commands
  • Introduction to MapReduce
  • MapReduce Architecture
  • Data flow in MapReduce
  • Splits
  • Mapper
  • Partitioning
  • Sort and Shuffle
  • Combiner
  • Reducer
  • Understand Difference Between Block and Input Split
  • Role of Record Reader
  • Basic Configuration of MapReduce
  • MapReduce life cycle
  • Driver Code
  • Mapper
  • How MapReduce Works
  • Writing and Executing the Basic MapReduce Program using Java
  • Submission & Initialization of MapReduce Job
  • File Input/Output Formats in MapReduce Jobs
  • Text Input Format
  • Key Value Input Format
  • Sequence File Input Format
  • NLine Input Format
  • Joins
  • Map-side Joins
  • Reduce-side Joins
  • Word Count Example
  • Partition MapReduce Program
  • Side Data Distribution
  • Distributed Cache (with Program)
  • Counters (with Program)
  • Types of Counters
  • Task Counters
  • Job Counters
  • User Defined Counters
  • Propagation of Counters
  • Job Scheduling
  • Introduction to SQOOP
  • Use of SQOOP
  • Connect to MySQL database
  • SQOOP commands
  • Import
  • Export
  • Joins in SQOOP
  • Export to MySQL
  • Export to HBase
  • Introduction to HIVE
  • HIVE Meta Store
  • HIVE Architecture
  • Tables in HIVE
  • Managed Tables
  • External Tables
  • Hive Data Types
  • Primitive Types
  • Complex Types
  • Partition
  • Joins in HIVE
  • HIVE UDFs and UDAFs with Programs
  • Introduction to HBASE
  • Basic Configurations of HBASE
  • Fundamentals of HBase
  • What is NoSQL?
  • HBase Data Model
  • Table and Row
  • Column Family and Column Qualifier
  • Cell and its Versioning
  • Categories of NoSQL Databases
  • Key-Value Database
  • Document Database
  • Column Family Database
  • HBASE Architecture
  • HMaster
  • Region Servers
  • Regions
  • MemStore
  • Store
  • SQL vs. NoSQL
  • HDFS vs. HBase
  • HBase Designing Tables
  • HBase Operations
  • Introduction to ZooKeeper
  • Data Model
  • Operations
  • Introduction to OOZIE
  • Use of OOZIE
  • Introduction to Flume
  • Uses of Flume
  • Flume Architecture
  • Flume Master
  • Flume Collectors
  • Flume Agents
  • Hadoop 2.x Cluster Architecture - Federation and High Availability
  • A Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single-node and Multi-node Cluster Setup (Hadoop Administration)
  • MapReduce Use Cases
  • Traditional Way vs. MapReduce Way
  • Why MapReduce
  • Hadoop 2.x MapReduce Architecture
  • Hadoop 2.x MapReduce Components
  • YARN MR Application Execution Flow
  • YARN Workflow
  • Anatomy of MapReduce Program
  • Demo on MapReduce
  • Input Splits
  • Relation between Input Splits and HDFS Block
  • MapReduce: Combiner & Partitioner
  • Demo on De-identifying a Health Care Data Set
  • Demo on a Weather Data Set
  • Counters
  • Distributed Cache
  • MRUnit, Reduce Join
  • Custom Input Format
  • Sequence Input Format
  • XML File Parsing Using MapReduce
  • About Pig
  • MapReduce vs. Pig
  • Pig Use Cases, Programming Structure in Pig
  • Pig Running Modes, Pig components
  • Pig Execution, Pig Latin Program
  • Data Models in Pig, Pig Data Types
  • Shell and Utility Commands