
Free Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Practice Exam with Questions & Answers | Set: 3

Question 21

Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?

Options:
A.

itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)

B.

itemsDf.join(transactionsDf, itemId == transactionId)

C.

itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")

D.

itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")

E.

itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))
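For reference, a minimal runnable sketch of the join pattern this question targets. In PySpark, DataFrame.join takes (other, on, how), so the Column-expression condition comes second and the join type third, as in option C's form. The toy data and the itemName/amount columns are invented for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-ins for the DataFrames in the question
itemsDf = spark.createDataFrame([(1, "shirt"), (2, "socks")], ["itemId", "itemName"])
transactionsDf = spark.createDataFrame([(1, 9.99), (3, 4.50)], ["transactionId", "amount"])

# join(other, on, how): the condition is a Column expression,
# and the join type is passed as the third argument
joined = itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")
joined.show()  # only itemId 1 has a matching transactionId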

Question 22

The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.

A sample of DataFrame itemsDf is below.

[Sample of DataFrame itemsDf omitted; per the question, it includes columns itemId and attributes, where attributes is an array column.]

Code block:

itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

Options:
A.

Since itemId is the index, it does not need to be an argument to the select() method.

B.

The alias() method needs to be called after the select() method.

C.

The explode() method expects a Column object rather than a string.

D.

explode() is not a method of DataFrame. explode() should be used inside the select() method instead.

E.

The split() method should be used inside the select() method instead of the explode() method.
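To see why the original code block fails, note that in PySpark explode() lives in pyspark.sql.functions and is used inside select(); it is not a DataFrame method. A minimal sketch, with toy data and attribute values invented for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

# Toy itemsDf with an array column, mirroring the question's description
itemsDf = spark.createDataFrame([(1, ["blue", "winter"]), (2, ["red"])], ["itemId", "attributes"])

# explode() emits one row per array element; alias() names the new column
itemsAttributesDf = itemsDf.select("itemId", explode("attributes").alias("attribute"))
itemsAttributesDf.show()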

Question 23

Which of the following statements about RDDs is incorrect?

Options:
A.

An RDD consists of a single partition.

B.

The high-level DataFrame API is built on top of the low-level RDD API.

C.

RDDs are immutable.

D.

RDD stands for Resilient Distributed Dataset.

E.

RDDs are great for precisely instructing Spark on how to do a query.
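A quick way to check the partitioning claim: an RDD is split across one or more partitions, and in practice usually more than one. A minimal sketch (the partition count of 4 is chosen arbitrarily):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# parallelize() distributes the data across partitions; here we request 4 explicitly
rdd = spark.sparkContext.parallelize(range(100), 4)
print(rdd.getNumPartitions())  # 4, not 1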

Question 24

Which of the following is the deepest level in Spark's execution hierarchy?

Options:
A.

Job

B.

Task

C.

Executor

D.

Slot

E.

Stage

Question 25

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

Options:
A.

1. save

2. mode

3. "ignore"

4. "compression"

5. path

B.

1. store

2. with

3. "replacement"

4. "compression"

5. path

C.

1. write

2. mode

3. "overwrite"

4. "compression"

5. save

(Correct)

D.

1. save

2. mode

3. "replace"

4. "compression"

5. path

E.

1. write

2. mode

3. "overwrite"

4. compression

5. parquet
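For reference, a minimal runnable sketch of the filled-in code block from option C. The toy DataFrame and the /tmp path are invented for illustration, and brotli compression only works if the Brotli codec is available on the cluster:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-ins for transactionsDf and storeDir
transactionsDf = spark.createDataFrame([(1, 25), (2, 3)], ["transactionId", "storeId"])
storeDir = "/tmp/transactions.parquet"

# write returns a DataFrameWriter; mode("overwrite") replaces existing output,
# the codec is set via option("compression", ...), and save() triggers the write
transactionsDf.write.format("parquet").mode("overwrite").option("compression", "brotli").save(storeDir)

The same write can also be expressed with the shorthand transactionsDf.write.parquet(storeDir, mode="overwrite", compression="brotli").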

Question 26

Which of the following statements about Spark's execution hierarchy is correct?

Options:
A.

In Spark's execution hierarchy, a job may reach over multiple stage boundaries.

B.

In Spark's execution hierarchy, manifests are one layer above jobs.

C.

In Spark's execution hierarchy, a stage comprises multiple jobs.

D.

In Spark's execution hierarchy, executors are the smallest unit.

E.

In Spark's execution hierarchy, tasks are one layer above slots.

Question 27

Which of the following statements about storage levels is incorrect?

Options:
A.

The cache operator on DataFrames is evaluated like a transformation.

B.

In client mode, DataFrames cached with the MEMORY_ONLY_2 level will not be stored in the edge node's memory.

C.

Caching can be undone using the DataFrame.unpersist() operator.

D.

MEMORY_AND_DISK replicates cached DataFrames both on memory and disk.

E.

DISK_ONLY will not use the worker node's memory.
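The cache-related options can be checked directly against the persist API. A minimal sketch showing that caching is evaluated lazily (nothing is stored until an action runs) and is undone with unpersist():

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# persist() is evaluated lazily, like a transformation
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()      # an action materializes the cache
df.unpersist()  # caching is undone with unpersist()

Note that MEMORY_AND_DISK spills partitions that do not fit in memory to disk; it does not replicate cached data to both.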

Question 28

Which of the following describes the role of the cluster manager?

Options:
A.

The cluster manager schedules tasks on the cluster in client mode.

B.

The cluster manager schedules tasks on the cluster in local mode.

C.

The cluster manager allocates resources to Spark applications and maintains the executor processes in client mode.

D.

The cluster manager allocates resources to Spark applications and maintains the executor processes in remote mode.

E.

The cluster manager allocates resources to the DataFrame manager.

Question 29

Which of the following code blocks returns a single-column DataFrame of all entries in the Python list throughputRates, which contains only float-type values?

Options:
A.

spark.createDataFrame((throughputRates), FloatType)

B.

spark.createDataFrame(throughputRates, FloatType)

C.

spark.DataFrame(throughputRates, FloatType)

D.

spark.createDataFrame(throughputRates)

E.

spark.createDataFrame(throughputRates, FloatType())
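For reference, a minimal runnable sketch of option E's form. When the schema argument is a single data type, it must be a DataType instance such as FloatType(), not the FloatType class itself; the sample values are invented for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.types import FloatType

spark = SparkSession.builder.getOrCreate()

throughputRates = [0.5, 1.25, 3.0]  # invented sample values

# Each list element becomes one row in a single-column DataFrame (column "value")
df = spark.createDataFrame(throughputRates, FloatType())
df.show()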

Question 30

The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(__2__).__3__(__4__)

Options:
A.

1. filter

2. "storeId"==25

3. collect

4. 5

B.

1. filter

2. col("storeId")==25

3. toLocalIterator

4. 5

C.

1. select

2. storeId==25

3. head

4. 5

D.

1. filter

2. col("storeId")==25

3. take

4. 5

E.

1. filter

2. col("storeId")==25

3. collect

4. 5
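For reference, a minimal runnable sketch of the filled-in block from option D. take(5) returns at most 5 rows as a Python list of Row objects, whereas collect() would return every matching row rather than "up to 5"; the toy data is invented for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for transactionsDf
transactionsDf = spark.createDataFrame([(1, 25), (2, 25), (3, 3)], ["transactionId", "storeId"])

# filter() takes a Column condition; take(5) returns up to 5 rows as a list
rows = transactionsDf.filter(col("storeId") == 25).take(5)
print(rows)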