Most popular cloud and big data courses on Udemy


The Ultimate Hands-On Hadoop – Tame your Big Data!

Hadoop tutorial with MapReduce, HDFS, Spark, Flink, Hive, HBase, MongoDB, Cassandra, Kafka + more! Over 25 technologies.
Rating: 4.5 out of 5
(21,078 ratings)
112,811 students
Last updated 5/2020
30-Day Money-Back Guarantee
This course includes:
  • 14.5 hours on-demand video
  • 5 articles
  • 2 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of completion

What you’ll learn

  • Design distributed systems that manage “big data” using Hadoop and related technologies
  • Use HDFS and MapReduce for storing and analyzing data at scale
  • Use Pig and Spark to create scripts to process data on a Hadoop cluster in more complex ways
  • Analyze relational data using Hive and MySQL
  • Analyze non-relational data using HBase, Cassandra, and MongoDB
  • Query data interactively with Drill, Phoenix, and Presto
  • Choose an appropriate data storage technology for your application
  • Understand how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie
  • Publish data to your Hadoop cluster using Kafka, Sqoop, and Flume
  • Consume streaming data using Spark Streaming, Flink, and Storm
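Several of the points above hinge on the MapReduce model: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. As a rough sketch (plain Python standing in for a Hadoop job, with made-up sample lines), the classic word count looks like this:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the values for each key."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data is big", "data is data"]  # invented sample input
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```

On a real cluster the map and reduce phases run in parallel across many machines, but the data flow is exactly this.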


Requirements

  • You will need access to a PC running 64-bit Windows, macOS, or Linux with an Internet connection and at least 8 GB of *free* (not total) RAM if you want to participate in the hands-on activities and exercises. If your PC does not meet these requirements, you can still follow along in the course without doing the hands-on activities.
  • Some activities will require prior programming experience, preferably in Python or Scala.
  • A basic familiarity with the Linux command line will be very helpful.

The world of Hadoop and “Big Data” can be intimidating – hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you’ll not only understand what those systems are and how they fit together – but you’ll go hands-on and learn how to use them to solve real business problems!

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We’ll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

  • Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI

  • Manage big data on a cluster with HDFS and MapReduce

  • Write programs to analyze data on Hadoop with Pig and Spark

  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto

  • Design real-world systems using the Hadoop ecosystem

  • Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue

  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm
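Spark Streaming, one of the streaming engines listed above, processes a stream as a series of micro-batches and keeps running state across them. A minimal sketch of that pattern, assuming a plain Python list standing in for the incoming stream:

```python
from collections import Counter

def micro_batches(events, batch_size):
    """Chop the stream into fixed-size micro-batches, as Spark Streaming does by time interval."""
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]

running = Counter()  # running state, like an updateStateByKey aggregate
events = ["click", "view", "click", "view", "view", "click"]  # invented events
for batch in micro_batches(events, batch_size=2):
    running.update(batch)  # each micro-batch updates the running counts
print(running["view"])  # 3
```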

Understanding Hadoop is a highly valuable skill for anyone working at companies with large amounts of data.

Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it’s not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It’s filled with hands-on activities and exercises, so you get some real experience in using Hadoop – it’s not just theory.

You’ll find a range of activities in this course for people at every level. If you’re a project manager who just wants to learn the buzzwords, there are web UI’s for many of the activities in the course that require no programming knowledge. If you’re comfortable with command lines, we’ll show you how to work with them too. And if you’re a programmer, I’ll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

You’ll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end! 

Please note that the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.

Knowing how to wrangle “big data” is an incredibly valuable skill for today’s top tech employers. Don’t be left behind – enroll now!

  • “The Ultimate Hands-On Hadoop… was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations. ” – Aldo Serrano

  • “I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time, especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities.” – Tyler Buck

Who this course is for:
  • Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend “big data” at scale.
  • Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
  • Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
  • System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.

Taming Big Data with Apache Spark and Python – Hands On!

Dive right in with 15+ hands-on examples of analyzing large data sets with Apache Spark, on your desktop or on Hadoop!
Rating: 4.5 out of 5
(8,869 ratings)
49,589 students
Last updated 3/2020
30-Day Money-Back Guarantee
This course includes:
  • 5.5 hours on-demand video
  • 1 article
  • 7 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of completion

What you’ll learn

  • Use DataFrames and Structured Streaming in Spark 3
  • Frame big data analysis problems as Spark problems
  • Use Amazon’s Elastic MapReduce service to run your job on a cluster with Hadoop YARN
  • Install and run Apache Spark on a desktop computer or on a cluster
  • Use Spark’s Resilient Distributed Datasets to process and analyze large data sets across many CPUs
  • Implement iterative algorithms such as breadth-first-search using Spark
  • Use the MLLib machine learning library to answer common data mining questions
  • Understand how Spark SQL lets you work with structured data
  • Understand how Spark Streaming lets you process continuous streams of data in real time
  • Tune and troubleshoot large jobs running on a cluster
  • Share information between nodes on a Spark cluster using broadcast variables and accumulators
  • Understand how the GraphX library helps with network analysis problems
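Broadcast variables, mentioned above, ship a small read-only dataset (such as a movie-ID-to-name table) to every node so each record can be joined locally without a shuffle. A toy sketch of the idea in plain Python, with invented sample data:

```python
# Small lookup table that would be broadcast to every executor in Spark.
movie_names = {1: "Toy Story", 2: "Jumanji"}

ratings = [(1, 4.0), (2, 3.5), (1, 5.0)]  # (movie_id, rating) records

# Map-side join: each record looks up the broadcast dict locally,
# avoiding a shuffle-heavy join against a second distributed dataset.
named = [(movie_names[movie_id], rating) for movie_id, rating in ratings]
print(named[0])  # ('Toy Story', 4.0)
```

In Spark the dict would be wrapped with `sc.broadcast(...)` and accessed via `.value` inside the mapping function; the join logic is otherwise the same.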
Course content
7 sections • 51 lectures • 5h 33m total length
  • How to Use This Course (01:41)
  • Udemy 101: Getting the Most From This Course
  • [Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies (14:42)
  • [Activity] Installing the MovieLens Movie Rating Dataset (03:35)
  • Introduction to Spark (10:11)
  • The Resilient Distributed Dataset (RDD) (12:35)
  • Ratings Histogram Walkthrough (13:27)
  • Filtering RDDs, and the Minimum Temperature by Location Example (08:11)
  • [Activity] Running the Minimum Temperature Example, and Modifying it for Maximums (05:06)
  • [Activity] Running the Maximum Temperature by Location Example (03:19)
  • [Activity] Counting Word Occurrences using flatMap() (07:24)
  • [Activity] Improving the Word Count Script with Regular Expressions (04:42)
  • [Exercise] Find the Total Amount Spent by Customer (04:01)
  • [Exercise] Check Your Results, and Now Sort Them by Total Amount Spent (05:09)
  • Check Your Sorted Implementation and Results Against Mine (02:44)
  • [Activity] Find the Most Popular Movie (05:53)
  • [Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers (08:25)
  • [Activity] Run the Script – Discover Who the Most Popular Superhero Is! (06:01)
  • Superhero Degrees of Separation: Introducing Breadth-First Search (07:56)
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark (06:44)
  • [Activity] Superhero Degrees of Separation: Review the Code and Run It (09:35)
  • Item-Based Collaborative Filtering in Spark, cache(), and persist() (10:10)
  • [Exercise] Improve the Quality of Similar Movies (03:05)
  • Introducing Elastic MapReduce (05:09)
  • [Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY (09:58)
  • Partitioning (04:21)
  • Create Similar Movies from One Million Ratings – Part 1 (05:10)
  • Create Similar Movies from One Million Ratings – Part 3 (03:30)
  • Troubleshooting Spark on a Cluster (03:43)
  • More Troubleshooting, and Managing Dependencies (06:02)
  • Introducing SparkSQL (06:08)
  • Executing SQL commands and SQL-style functions on a DataFrame (08:16)
  • Introducing MLLib (08:09)
  • [Activity] Using MLLib to Produce Movie Recommendations (02:55)
  • Analyzing the ALS Recommendations Results (04:53)
  • Spark Streaming (08:04)
  • GraphX (02:11)
  • Learning More about Spark and Data Science (03:43)
  • Bonus Lecture: More courses to explore!


Requirements

  • Access to a personal computer. This course uses Windows, but the sample code will work fine on Linux as well.
  • Some prior programming or scripting experience. Python experience will help a lot, but you can pick it up as we go.

New! Updated for Spark 3 and with a hands-on structured streaming example.

“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You’ll learn those same techniques, using your own Windows system right at home. It’s easier than you might think.

Learn and master the art of framing data analysis problems as Spark problems through over 15 hands-on examples, and then scale them up to run on cloud computing services in this course. You’ll be learning from an ex-engineer and senior manager from Amazon and IMDb.

  • Learn the concepts of Spark’s Resilient Distributed Datastores

  • Develop and run Spark jobs quickly using Python

  • Translate complex analysis problems into iterative or multi-stage Spark scripts

  • Scale up to larger data sets using Amazon’s Elastic MapReduce service

  • Understand how Hadoop YARN distributes Spark across computing clusters

  • Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX

By the end of this course, you’ll be running code that analyzes gigabytes worth of information – in the cloud – in a matter of minutes. 

This course uses the familiar Python programming language; if you’d rather use Scala to get the best performance out of Spark, see my “Apache Spark with Scala – Hands On with Big Data” course instead.

We’ll have some fun along the way. You’ll get warmed up with some simple examples of using Spark to analyze movie ratings data and text in a book. Once you’ve got the basics under your belt, we’ll move to some more complex and interesting tasks. We’ll use a million movie ratings to find movies that are similar to each other, and you might even discover some new movies you might like in the process! We’ll analyze a social graph of superheroes, and learn who the most “popular” superhero is – and develop a system to find “degrees of separation” between superheroes. Are all Marvel superheroes within a few degrees of being connected to The Incredible Hulk? You’ll find the answer.
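The degrees-of-separation problem described above is a breadth-first search; in the course it is implemented with Spark accumulators across a cluster, but the core logic can be sketched in plain Python on a tiny, invented co-appearance graph:

```python
from collections import deque

# Toy superhero co-appearance graph (adjacency list, invented data).
graph = {
    "Hulk": ["Thor", "IronMan"],
    "Thor": ["Hulk", "Loki"],
    "IronMan": ["Hulk"],
    "Loki": ["Thor"],
}

def degrees_of_separation(graph, start, target):
    """Breadth-first search; returns the hop count from start to target."""
    visited = {start}
    frontier = deque([(start, 0)])  # (hero, depth) pairs to explore
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return depth
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None  # not connected

print(degrees_of_separation(graph, "IronMan", "Loki"))  # 3
```

The Spark version expands one frontier level per map pass and uses an accumulator to signal when the target has been reached; the traversal order is the same.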

This course is very hands-on; you’ll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon’s Elastic MapReduce service. 5 hours of video content is included, with over 15 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Wrangling big data with Apache Spark is an important skill in today’s technical world. Enroll now!

  • “I studied ‘Taming Big Data with Apache Spark and Python’ with Frank Kane, and it helped me build a great platform for Big Data as a Service for my company. I recommend the course!” – Cleuton Sampaio De Melo Jr.

Who this course is for:
  • People with some software development background who want to learn the hottest technology in big data analysis will want to check this out. This course focuses on Spark from a software development standpoint; we introduce some machine learning and data mining concepts along the way, but that’s not the focus. If you want to learn how to use Spark to carve up huge datasets and extract meaning from them, then this course is for you.
  • If you’ve never written a computer program or a script before, this course isn’t for you – yet. I suggest starting with a Python course first, if programming is new to you.
  • If your software development job involves, or will involve, processing large amounts of data, you need to know about Spark.
  • If you’re training for a new career in data science or big data, Spark is an important part of it.

Spark and Python for Big Data with PySpark

Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!
Rating: 4.5 out of 5
(11,899 ratings)
58,071 students
Last updated 5/2020
30-Day Money-Back Guarantee
This course includes:
  • 10.5 hours on-demand video
  • 4 articles
  • 4 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of completion

What you’ll learn

  • Use Python and Spark together to analyze Big Data
  • Learn how to use the new Spark 2.0 DataFrame Syntax
  • Work on Consulting Projects that mimic real world situations!
  • Classify Customer Churn with Logistic Regression
  • Use Spark with Random Forests for Classification
  • Learn how to use Spark’s Gradient Boosted Trees
  • Use Spark’s MLlib to create Powerful Machine Learning Models
  • Learn about the DataBricks Platform!
  • Get set up on Amazon Web Services EC2 for Big Data Analysis
  • Learn how to use AWS Elastic MapReduce Service!
  • Learn how to leverage the power of Linux with a Spark Environment!
  • Create a Spam filter using Spark and Natural Language Processing!
  • Use Spark Streaming to Analyze Tweets in Real Time!
Course content
18 sections • 67 lectures • 10h 35m total length
  • Course Overview
  • Frequently Asked Questions
  • What is Spark? Why Python?
  • Set-up Overview (05:58)
  • Note on Installation Sections
  • Recommended Setup
  • Databricks Setup
  • Local Installation VirtualBox Part 1 (11:25)
  • Local Installation VirtualBox Part 2
  • Setting up PySpark
  • AWS EC2 Set-up Guide (02:46)
  • Creating the EC2 Instance
  • SSH with Mac or Linux
  • Installations on EC2
  • AWS EMR Setup
  • Introduction to Python Crash Course
  • Jupyter Notebook Overview
  • Python Crash Course Part One
  • Python Crash Course Part Two
  • Python Crash Course Part Three
  • Python Crash Course Exercises
  • Python Crash Course Exercise Solutions
  • Introduction to Spark DataFrames
  • Spark DataFrame Basics (10:51)
  • Spark DataFrame Basics Part Two
  • Spark DataFrame Basic Operations (10:15)
  • Groupby and Aggregate Operations
  • Dates and Timestamps
  • DataFrame Project Exercise
  • DataFrame Project Exercise Solutions
  • Introduction to Machine Learning and ISLR
  • Machine Learning with Spark and Python with MLlib
  • Linear Regression Documentation Example
  • Regression Evaluation
  • Linear Regression Example Code Along
  • Linear Regression Consulting Project Solutions
  • Logistic Regression Theory and Reading
  • Logistic Regression Code Along
  • Logistic Regression Consulting Project Solutions
  • Tree Methods Theory and Reading
  • Tree Methods Documentation Examples
  • Decision Trees and Random Forest Code Along Examples
  • Random Forest Classification Consulting Project Solutions
  • K-means Clustering Theory and Reading
  • KMeans Clustering Documentation Example
  • Clustering Example Code Along
  • Clustering Consulting Project Solutions
  • Introduction to Recommender Systems
  • Recommender System – Code Along Project
  • Introduction to Natural Language Processing
  • NLP Tools Part One
  • NLP Tools Part Two
  • Natural Language Processing Code Along Project
  • Introduction to Streaming with Spark!
  • Spark Streaming Documentation Example
  • Spark Streaming Twitter Project – Part One
  • Spark Streaming Twitter Project – Part Two
  • Spark Streaming Twitter Project – Part Three
  • Bonus Lecture:


Requirements

  • General programming skills in any language (preferably Python)
  • 20 GB of free space on your local computer (or, alternatively, a strong internet connection for AWS)

Learn the latest Big Data Technology – Spark! And learn to use it with one of the most popular programming languages, Python!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!

Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! Because the Spark 2.0 DataFrame framework is so new, you now have the ability to quickly become one of the most knowledgeable people in the job market!

This course will teach the basics with a crash course in Python, continuing on to learning how to use Spark DataFrames with the latest Spark 2.0 syntax! Once we’ve done that we’ll go through how to use the MLlib Machine Library with the DataFrame syntax and Spark. All along the way you’ll have exercises and Mock Consulting Projects that put you right into a real world situation where you need to use your new skills to solve a real problem!
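The groupBy-and-aggregate pattern at the heart of the DataFrame syntax can be sketched in plain Python, as a rough stand-in for something like `df.groupBy("dept").agg(avg("salary"))` (the rows and column names here are invented):

```python
from collections import defaultdict

rows = [("sales", 100), ("sales", 200), ("eng", 300)]  # (dept, salary)

# Equivalent in spirit to df.groupBy("dept").agg(avg("salary")):
# collect salaries per department, then average each group.
groups = defaultdict(list)
for dept, salary in rows:
    groups[dept].append(salary)
averages = {dept: sum(s) / len(s) for dept, s in groups.items()}
print(averages["sales"])  # 150.0
```

Spark performs the same grouping, but partitioned and shuffled across the cluster rather than in one in-memory dict.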

We also cover the latest Spark Technologies, like Spark SQL, Spark Streaming, and advanced models like Gradient Boosted Trees! After you complete this course you will feel comfortable putting Spark and PySpark on your resume! This course also has a full 30 day money back guarantee and comes with a LinkedIn Certificate of Completion!

If you’re ready to jump into the world of Python, Spark, and Big Data, this is the course for you!

Who this course is for:
  • Someone who knows Python and would like to learn how to use it for Big Data
  • Someone who is very familiar with another programming language and needs to learn Spark

Overall, the course is comprehensive but needs an update for Structured Streaming, which is now fully released and replaces the RDD-based Spark Streaming. Also, there is no treatment of saving and reusing trained models, and no discussion of hyperparameter tuning. I liked the ‘consulting projects’, which provided real-world exercises.


CCA 175 – Spark and Hadoop Developer – Python (pyspark)

Cloudera Certified Associate Spark and Hadoop Developer using Python as Programming Language
Rating: 4.4 out of 5
(1,079 ratings)
4,817 students
Last updated 5/2020
30-Day Money-Back Guarantee
This course includes:
  • 23 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of completion

What you’ll learn

  • Entire curriculum of CCA Spark and Hadoop Developer
  • Apache Sqoop
  • HDFS Commands
  • Python Fundamentals
  • Core Spark – Transformations and Actions
  • Spark SQL and Data Frames
  • Streaming analytics using Kafka, Flume and Spark Streaming
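The split between Core Spark transformations and actions listed above is worth internalizing before the exam: transformations only build a plan, and nothing runs until an action forces it. Python generators, which are also lazy, make a serviceable sketch of that behavior:

```python
data = range(1, 11)  # invented input, like sc.parallelize(range(1, 11))

# Transformations: build a plan, nothing is computed yet (generators are lazy).
mapped = (x * x for x in data)           # like rdd.map(lambda x: x * x)
filtered = (x for x in mapped if x % 2)  # like rdd.filter(lambda x: x % 2)

# Action: forcing the pipeline, like rdd.collect().
result = list(filtered)
print(result)  # [1, 9, 25, 49, 81]
```

As in Spark, no work happens when `mapped` and `filtered` are defined; only `list(...)` (the action) pulls data through the whole chain.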


Requirements

  • Basic programming skills using any programming language
  • Cloudera Quickstart VM, a valid account for IT Versity Big Data labs, or any Hadoop cluster where Hadoop, Hive, and Spark are well integrated
  • A 64-bit operating system with the minimum memory for your environment: 4 GB RAM with access to proper clusters, or 16 GB RAM with virtual machines such as the Cloudera QuickStart VM

CCA 175 Spark and Hadoop Developer is one of the well recognized Big Data certifications. This scenario-based certification exam demands basic programming using Python or Scala along with Spark and other Big Data technologies.

This comprehensive course covers all aspects of the certification using Python as a programming language.

  • Python Fundamentals

  • Spark SQL and Data Frames

  • File formats

Please note that the syllabus was recently changed, and the exam is now primarily focused on Spark Data Frames and/or Spark SQL.
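The kind of aggregate query the revised exam emphasizes looks the same in Spark SQL as in any SQL dialect. The sketch below uses SQLite only so it runs anywhere; in Spark the identical statement would run via `spark.sql()` after registering a DataFrame as a temp view (the table and column names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)])

# The same aggregate query would run unchanged via spark.sql(...)
# after df.createOrReplaceTempView("orders").
total = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer").fetchall()
print(total)  # [('alice', 17.5), ('bob', 5.0)]
```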

Exercises are provided so you can prepare before attempting the certification; the intention of the course is to boost your confidence going into the exam.

All the demos are given on our state-of-the-art Big Data cluster. You can get one week of complimentary lab access by filling out the form provided as part of the welcome message.

Who this course is for:
  • Any IT aspirant/professional willing to learn Big Data and take the CCA 175 certification exam

For anyone looking for PySpark experience, I would recommend this course. It is helpful, with real-time scenarios (use cases), and it may help in interviews as well. You can also quickly get started with the lab and platform for practice, which really helps, without wasting your time setting everything up.