
Free Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Practice Exam with Questions & Answers | Set: 2

Question 11

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:
A. The column names should be listed directly as arguments to the operator and not as a list.
B. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.
C. The select operator should be replaced by a drop operator.
D. The column names should be listed directly as arguments to the operator and not as a list, and, following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.
E. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
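For orientation (illustrative only, not an answer key), here is a minimal PySpark sketch of a select that returns exactly the four requested columns. It assumes a running SparkSession and rebuilds the sample data inline:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rebuild the sample transactionsDf; an explicit DDL schema is used here
# because column f contains only nulls and could not be inferred.
transactionsDf = spark.createDataFrame(
    [(1, 3, 4, 25, 1, None), (2, 6, 7, 2, 2, None), (3, 3, None, 25, 3, None)],
    "transactionId INT, predError INT, value INT, storeId INT, productId INT, f INT",
)

# select() accepts column names (or Column objects) directly; bare identifiers
# like productId in the question's code block are undefined Python names.
transactionsDf.select("transactionId", "predError", "value", "storeId").show()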

Question 12

Which of the following describes Spark actions?

Options:
A. Writing data to disk is the primary purpose of actions.
B. Actions are Spark's way of exchanging data between executors.
C. The driver receives data upon request by actions.
D. Stage boundaries are commonly established by actions.
E. Actions are Spark's way of modifying RDDs.
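As background, a short PySpark sketch contrasting lazy transformations with actions (illustrative only; assumes a local SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(5)             # ids 0..4; nothing is computed yet
filtered = df.filter("id > 1")  # transformation: only builds the query plan

# Actions trigger execution and return results to the driver:
print(filtered.count())    # 3
print(filtered.collect())  # [Row(id=2), Row(id=3), Row(id=4)]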

Question 13

The code block displayed below contains an error. The code block should read the CSV file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as column header and casting the columns to the most appropriate types. Find the error.

First 3 rows of transactions.csv:

transactionId;storeId;productId;name
1;23;12;green grass
2;35;31;yellow sun
3;23;12;green grass

Code block:

transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)

Options:
A. The DataFrameReader is not accessed correctly.
B. The transaction is evaluated lazily, so no file will be read.
C. Spark is unable to understand the file type.
D. The code block is unable to capture all columns.
E. The resulting DataFrame will not have the appropriate schema.
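For context, a sketch of reading such a file with type inference enabled (illustrative only; it assumes the file actually exists at data/transactions.csv). Without inferSchema, every CSV column is read as a string:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# inferSchema=True asks Spark to sample the file and cast each column
# to a fitting type instead of leaving everything as strings.
transactionsDf = spark.read.load(
    "data/transactions.csv", format="csv", sep=";", header=True, inferSchema=True
)
transactionsDf.printSchema()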

Question 14

Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?

Options:
A. transactionsDf.withColumn("predErrorSqrt", sqrt(predError))
B. transactionsDf.select(sqrt(predError))
C. transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())
D. transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
E. transactionsDf.select(sqrt("predError"))
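For reference, a minimal runnable sketch of the withColumn/sqrt pattern this question probes (illustrative only; sample data made up inline):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sqrt

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 9), (2, 16)], ["transactionId", "predError"])

# sqrt lives in pyspark.sql.functions and takes a Column or a column name;
# Column objects themselves have no .sqrt() method.
transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError"))).show()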

Question 15

The code block displayed below contains an error. The code block should trigger Spark to cache DataFrame transactionsDf in executor memory where available, writing to disk where insufficient executor memory is available, in a fault-tolerant way. Find the error.

Code block:

transactionsDf.persist(StorageLevel.MEMORY_AND_DISK)

Options:
A. Caching is not supported in Spark, data are always recomputed.
B. Data caching capabilities can be accessed through the spark object, but not through the DataFrame API.
C. The storage level is inappropriate for fault-tolerant storage.
D. The code block uses the wrong operator for caching.
E. The DataFrameWriter needs to be invoked.
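For reference, a sketch of persist() with a replicated storage level (illustrative only; StorageLevel is importable from the top-level pyspark package):

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.range(10).toDF("transactionId")

# MEMORY_AND_DISK_2 caches in memory, spills to disk when memory is short,
# and keeps a second replica of each partition on another node.
transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2)
transactionsDf.count()  # persist is lazy; an action materializes the cache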

Question 16

Which of the following code blocks creates a new 6-column DataFrame by appending the rows of the 6-column DataFrame yesterdayTransactionsDf to the rows of the 6-column DataFrame todayTransactionsDf, ignoring that both DataFrames have different column names?

Options:
A. union(todayTransactionsDf, yesterdayTransactionsDf)
B. todayTransactionsDf.unionByName(yesterdayTransactionsDf, allowMissingColumns=True)
C. todayTransactionsDf.unionByName(yesterdayTransactionsDf)
D. todayTransactionsDf.concat(yesterdayTransactionsDf)
E. todayTransactionsDf.union(yesterdayTransactionsDf)
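As a quick illustration of the difference being tested (not an answer key), two-column stand-ins for the six-column DataFrames in the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

todayTransactionsDf = spark.createDataFrame([(1, "a")], ["transactionId", "value"])
yesterdayTransactionsDf = spark.createDataFrame([(2, "b")], ["txId", "val"])

# union() appends rows purely by column position and ignores column names;
# unionByName() would instead try to match columns by name.
todayTransactionsDf.union(yesterdayTransactionsDf).show()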

Question 17

Which of the following code blocks returns a single-column DataFrame showing the number of words in column supplier of DataFrame itemsDf?

Sample of DataFrame itemsDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

Options:
A. itemsDf.split("supplier", " ").count()
B. itemsDf.split("supplier", " ").size()
C. itemsDf.select(word_count("supplier"))
D. spark.select(size(split(col(supplier), " ")))
E. itemsDf.select(size(split("supplier", " ")))
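For orientation, a runnable sketch of the split/size combination this question exercises (illustrative only; sample rows abridged from the table above):

from pyspark.sql import SparkSession
from pyspark.sql.functions import size, split

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame(
    [(1, "Sports Company Inc."), (2, "YetiX")], ["itemId", "supplier"]
)

# split() turns the string into an array of words; size() counts the elements.
itemsDf.select(size(split("supplier", " "))).show()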

Question 18

Which of the following code blocks returns a DataFrame that is an inner join of DataFrame itemsDf and DataFrame transactionsDf, on columns itemId and productId, respectively, and in which every itemId appears just once?

Options:
A. itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId").distinct("itemId")
B. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates(["itemId"])
C. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates("itemId")
D. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId, how="inner").distinct(["itemId"])
E. itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId", how="inner").dropDuplicates(["itemId"])
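For reference, a minimal sketch of the join-then-deduplicate pattern under test (illustrative only; sample data made up inline):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame([(1, "YetiX")], ["itemId", "supplier"])
transactionsDf = spark.createDataFrame([(10, 1), (11, 1)], ["transactionId", "productId"])

# The join condition is a Column expression, not a string of Python code,
# and dropDuplicates() expects a list of column names.
joined = itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.productId)
joined.dropDuplicates(["itemId"]).show()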

Question 19

The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(itemsDf, __2__).__3__(__4__)

Options:
A.
1. join
2. transactionsDf.productId==itemsDf.itemId, how="inner"
3. select
4. "transactionId", "supplier"
B.
1. select
2. "transactionId", "supplier"
3. join
4. [transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId]
C.
1. join
2. [transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId]
3. select
4. "transactionId", "supplier"
D.
1. filter
2. "transactionId", "supplier"
3. join
4. "transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId"
E.
1. join
2. transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId
3. filter
4. "transactionId", "supplier"

Question 20

Which of the following code blocks produces the following output, given DataFrame transactionsDf?

Output:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)

DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:
A. transactionsDf.schema.print()
B. transactionsDf.rdd.printSchema()
C. transactionsDf.rdd.formatSchema()
D. transactionsDf.printSchema()
E. print(transactionsDf.schema)
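For completeness, a tiny sketch of schema printing (illustrative only; abridged two-column sample data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 3)], "transactionId INT, predError INT")

# printSchema() is a DataFrame method that prints a tree like the one above;
# an RDD has no schema, and DataFrame.schema is a StructType object.
transactionsDf.printSchema()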