
Spark Hadoop Cloudera Certifications You Must Know

Objective

This is a comprehensive guide to the various Spark Hadoop Cloudera certifications. In this Cloudera certification tutorial we will cover all the key aspects: the different certifications offered by Cloudera, the pattern of each Cloudera certification exam, the number of questions, passing score, time limits, required skills, and the weightage of each topic. We will discuss all the certifications offered by Cloudera: “CCA Spark and Hadoop Developer Exam (CCA175)”, “Cloudera Certified Administrator for Apache Hadoop (CCAH)”, “CCP Data Scientist”, and “CCP Data Engineer”.


1. CCA Spark and Hadoop Developer Exam (CCA175)

For the CCA Spark and Hadoop Developer certification, you need to write code in Scala and Python and run it on a cluster to prove your skills. The exam can be taken from any computer, at any time, anywhere in the world.
CCA175 is a hands-on, practical exam using Cloudera technologies. Each candidate is given their own CDH5 (currently 5.3.2) cluster, pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many of the other tools candidates need.

a. CCA Spark and Hadoop Developer Certification Exam (CCA175) Details:

b. CCA175 Exam Question Format

Each CCA question requires you to solve a particular scenario. In some cases, a tool such as Impala or Hive may be used; in other cases, coding is required. For a Spark problem, a template (in Scala or Python) is often provided that contains a skeleton of the solution, and the candidate must fill in the missing lines with functional code, as in the sketch below.
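To give a feel for the format, here is a hypothetical, simplified template of the kind described above (not an actual exam question): the skeleton is supplied and the candidate writes the marked transformation. The paths, object name, and the word-count task itself are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountTemplate {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCountTemplate")
    val sc = new SparkContext(conf)

    // Skeleton provided by the exam: read the input from HDFS.
    val lines = sc.textFile("hdfs:///user/exam/input.txt")

    // Candidate fills in the missing lines: tokenize and count the words.
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // Skeleton provided by the exam: write the result back to HDFS.
    counts.saveAsTextFile("hdfs:///user/exam/output")
    sc.stop()
  }
}
```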

c. Prerequisites

There are no prerequisites for taking any Cloudera certification exam.

d. Exam sections and related topics

I. Required Skills

Data Ingest: These are the skills required to transfer data between external systems and your cluster, such as importing records from a relational database into HDFS; a sketch follows below.
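On the exam this kind of ingest is often done with a tool like Sqoop, but as a minimal Spark sketch in Scala, assuming a hypothetical MySQL retail database with an orders table and the MySQL JDBC driver on the classpath, it might look like this:

```scala
import org.apache.spark.sql.SparkSession

object IngestOrders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("IngestOrders").getOrCreate()

    // Pull a table from a relational database into the cluster.
    // Connection details, credentials, and table name are hypothetical.
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/retail")
      .option("dbtable", "orders")
      .option("user", "exam")
      .option("password", "examPass")
      .load()

    // Land the data in HDFS for downstream processing.
    orders.write.parquet("hdfs:///user/exam/retail/orders")
    spark.stop()
  }
}
```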

II. Transform, Stage, Store:

Transform, Stage, Store means converting a set of data values in a given format stored in HDFS into new data values and/or a new data format, and writing the results back into HDFS. This includes writing Spark applications in Scala or Python; a sketch follows below.
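A minimal sketch of such a transformation, assuming a hypothetical comma-delimited products file in HDFS with an id, name, price layout:

```scala
import org.apache.spark.sql.SparkSession

object TransformStageStore {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TransformStageStore").getOrCreate()
    val sc = spark.sparkContext

    // Read comma-delimited records from HDFS (path and layout are assumptions).
    val raw = sc.textFile("hdfs:///user/exam/products.csv")

    // New data values: uppercase the name and apply a 10% discount;
    // new data format: tab-delimited output.
    val transformed = raw.map { line =>
      val Array(id, name, price) = line.split(",")
      s"$id\t${name.toUpperCase}\t${price.toDouble * 0.9}"
    }

    transformed.saveAsTextFile("hdfs:///user/exam/products_discounted")
    spark.stop()
  }
}
```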

III. Data Analysis

Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala; see the sketch below.
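As a hedged illustration, the Scala sketch below issues Hive DDL through Spark; enableHiveSupport() registers the table in the Hive metastore so Hive (and, after a metadata refresh, Impala) can query it. The table name, schema, and location are assumptions:

```scala
import org.apache.spark.sql.SparkSession

object CreateHiveTable {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark talk to the Hive metastore, so the
    // table becomes visible to Hive and Impala as well.
    val spark = SparkSession.builder()
      .appName("CreateHiveTable")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical external table over the output of the earlier transform.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS products (
        |  id INT,
        |  name STRING,
        |  price DOUBLE
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION 'hdfs:///user/exam/products_discounted'""".stripMargin)

    spark.stop()
  }
}
```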

2. Cloudera Certified Administrator for Apache Hadoop (CCAH)

Cloudera Certified Administrator for Apache Hadoop (CCAH) certification shows your technical knowledge, skills, and ability to configure, deploy, monitor, manage, maintain, and secure an Apache Hadoop cluster.

a. Cloudera Certified Administrator for Apache Hadoop (CCA-500) details

b. Exam sections and related topics

I. HDFS (17%)

II. YARN (17%)

III. Hadoop Cluster Planning (16%)

IV. Hadoop Cluster Installation and Administration (25%)

V. Resource Management (10%)

VI. Monitoring and Logging (15%)

3. CCP Data Scientist

A Cloudera Certified Professional (CCP) Data Scientist is able to perform descriptive and inferential statistics, apply advanced analytical techniques, and build machine learning models using standard tools. Candidates must prove their abilities on a live cluster with large datasets in a variety of formats. The credential requires passing three CCP Data Scientist exams (DS700, DS701, and DS702), in any order; all three must be passed within 365 days of each other.

a. Common Skills (all exams)

b. Descriptive and Inferential Statistics on Big Data (DS700)

c. Advanced Analytical Techniques on Big Data (DS701)

d. Machine Learning at Scale (DS702)

e. What technologies/languages do you need to know?

You’ll be provided with a cluster pre-loaded with Hadoop technologies, plus standard tools such as Python and R. Among these standard technologies, it’s your choice what to use to solve each problem.

4. CCP Data Engineer

A Cloudera Certified Data Engineer has the core competencies required to ingest, transform, store, and analyze data in Cloudera’s CDH environment.

a. What do you need to know?

I. Data Ingestion

These are the skills needed to transfer data between external systems and your cluster, such as loading files into HDFS; a sketch follows below.
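As one hedged example, raw files can be loaded into HDFS programmatically with the Hadoop FileSystem API; both paths below are hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LoadIntoHdfs {
  def main(args: Array[String]): Unit = {
    // Connect to the cluster's default file system; core-site.xml on the
    // classpath supplies the NameNode address.
    val fs = FileSystem.get(new Configuration())

    // Copy a local export file into an HDFS ingest directory.
    fs.copyFromLocalFile(
      new Path("/tmp/exports/clickstream.log"),
      new Path("/user/exam/ingest/clickstream.log"))

    fs.close()
  }
}
```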

II. Transform, Stage, Store

Transform, Stage, Store means converting a set of data values in a given format stored in HDFS into new data values and/or a new data format, and writing the results into HDFS or Hive/HCatalog; a sketch follows below.
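A minimal Scala sketch, assuming a hypothetical tab-delimited clickstream file in HDFS and a pre-existing Hive database named web:

```scala
import org.apache.spark.sql.SparkSession

object StageToHive {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StageToHive")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Read raw tab-delimited records from HDFS (hypothetical path/layout)
    // and convert them to typed columns.
    val clicks = spark.sparkContext
      .textFile("hdfs:///user/exam/ingest/clickstream.log")
      .map(_.split("\t"))
      .map(f => (f(0), f(1).toLong))
      .toDF("url", "hits")

    // New format: a managed Hive/HCatalog table backed by Parquet
    // (assumes the web database already exists).
    clicks.write.format("parquet").saveAsTable("web.clicks")
    spark.stop()
  }
}
```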

III. Data Analysis

This includes operations such as filter, sort, join, aggregate, and/or transform, applied to one or more data sets in a given format stored in HDFS to produce a specified result. The queries may involve complex data types (e.g., array, map, struct), external libraries, partitioned data, and compressed data, and may require the use of metadata from Hive/HCatalog. A sketch follows below.
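As a hedged illustration, the sketch below analyzes two hypothetical Hive tables, orders (with an ARRAY&lt;STRING&gt; items column) and customers, combining an explode of the complex type with a join, an aggregation, and a sort:

```scala
import org.apache.spark.sql.SparkSession

object AnalyzeOrders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AnalyzeOrders")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical tables: orders(order_id, customer_id, items ARRAY<STRING>)
    // and customers(customer_id, region). Explode the array column, join,
    // aggregate, and sort to produce the result.
    val result = spark.sql(
      """SELECT c.region, item, COUNT(*) AS purchases
        |FROM orders o
        |LATERAL VIEW explode(o.items) t AS item
        |JOIN customers c ON o.customer_id = c.customer_id
        |GROUP BY c.region, item
        |ORDER BY purchases DESC""".stripMargin)

    result.write.csv("hdfs:///user/exam/analysis/top_items")
    spark.stop()
  }
}
```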

IV. Workflow

This covers the ability to create and execute various jobs and actions that move data toward greater value and use in a system; a simplified sketch follows below.
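On CDH this orchestration is typically handled by a scheduler such as Oozie (mentioned above). As a simplified, hedged illustration of the idea, the Scala driver below chains an ingest stage and a transform-and-publish stage, with hypothetical paths and table names; each stage runs only after the previous one succeeds:

```scala
import org.apache.spark.sql.SparkSession

object DailyPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DailyPipeline")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Stage 1: ingest - land the day's raw log in a staging directory.
    spark.read.text("hdfs:///user/exam/incoming/today.log")
      .write.mode("overwrite").text("hdfs:///user/exam/staging/today")

    // Stage 2: transform and publish - aggregate the staged data into a
    // Hive table downstream users can query (assumes database web exists).
    spark.read.textFile("hdfs:///user/exam/staging/today")
      .map(_.split("\t")(0))            // keep the URL field
      .groupByKey(identity).count()
      .toDF("url", "hits")
      .write.mode("overwrite").saveAsTable("web.daily_hits")

    spark.stop()
  }
}
```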

b. What should you expect?

You are given five to eight customer problems, each with a unique, large data set, a CDH cluster, and four hours. For each problem, you must implement a technical solution that meets all the requirements using any tool or combination of tools on the cluster (see list below); you get to pick the tool(s) that are right for the job.
