
Free Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Practice Exam with Questions & Answers | Set: 4

Question 31

Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?

Options:
A.

Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions

B.

Decrease values for the properties spark.default.parallelism and spark.sql.partitions

C.

Increase values for the properties spark.sql.parallelism and spark.sql.partitions

D.

Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions

E.

Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
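For context, a minimal PySpark sketch of raising the two parallelism-related properties named above when building a session (the application name and values are illustrative, not recommendations):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parallelism-tuning")                    # hypothetical application name
         .config("spark.default.parallelism", "200")       # default parallelism for RDD operations
         .config("spark.sql.shuffle.partitions", "200")    # partitions produced by shuffles in Spark SQL
         .getOrCreate())

# spark.sql.shuffle.partitions can also be adjusted on a running session:
spark.conf.set("spark.sql.shuffle.partitions", "400")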

Question 32

Which of the following statements about garbage collection in Spark is incorrect?

Options:
A.

Garbage collection information can be accessed in the Spark UI's stage detail view.

B.

Optimizing garbage collection performance in Spark may limit caching ability.

C.

Manually persisting RDDs in Spark prevents them from being garbage collected.

D.

In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.

E.

Serialized caching is a strategy to increase the performance of garbage collection.
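One of the statements above refers to serialized caching; a minimal PySpark sketch of persisting a DataFrame (transactionsDf is a hypothetical name):

from pyspark import StorageLevel

# Serialized caching stores each partition as one large byte buffer instead of
# many individual objects, which reduces garbage-collection pressure.
# (In Scala/Java this is MEMORY_ONLY_SER; PySpark storage levels are serialized by default.)
transactionsDf.persist(StorageLevel.MEMORY_ONLY)
transactionsDf.count()   # an action is needed to materialize the cache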

Question 33

Which of the following is the idea behind dynamic partition pruning in Spark?

Options:
A.

Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.

B.

Dynamic partition pruning concatenates columns of similar data types to optimize join performance.

C.

Dynamic partition pruning performs wide transformations on disk instead of in memory.

D.

Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.

E.

Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.
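For context, a minimal PySpark sketch of the kind of star-schema query in which dynamic partition pruning is applied (paths, table and column names are hypothetical):

# Enabled by default since Spark 3.0:
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

sales = spark.read.parquet("/data/sales")   # large fact table, assumed partitioned by date_id
dates = spark.read.parquet("/data/dates")   # small dimension table

# The filter on the dimension side is applied to the fact-table scan at runtime,
# so only the sales partitions whose date_id survives the filter are read.
result = sales.join(dates, "date_id").where(dates.year == 2021)
result.explain()   # the scan node shows a dynamic pruning expression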

Question 34

Which of the following statements about executors is correct, assuming that each executor JVM can be considered a pool of task execution slots?

Options:
A.

Slot is another name for executor.

B.

There must be fewer executors than tasks.

C.

An executor runs on a single core.

D.

There must be more slots than tasks.

E.

Tasks run in parallel via slots.
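For context, a minimal sketch of the slot arithmetic behind this question, with illustrative numbers:

num_executors = 4        # executors in the cluster (hypothetical)
executor_cores = 8       # spark.executor.cores
task_cpus = 1            # spark.task.cpus (default)

slots_per_executor = executor_cores // task_cpus
total_slots = num_executors * slots_per_executor
print(total_slots)       # 32 tasks can run in parallel at any given moment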

Question 35

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:
A.

spark.read.json(filePath)

B.

spark.read.path(filePath, source="json")

C.

spark.read().path(filePath)

D.

spark.read().json(filePath)

E.

spark.read.path(filePath)
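For context, a minimal sketch of DataFrameReader usage with a JSON source (filePath is assumed to hold the path to a JSON file):

df = spark.read.format("json").load(filePath)   # generic reader API
df = spark.read.json(filePath)                  # JSON convenience method
df.printSchema()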

Question 36

Which of the following statements about broadcast variables is correct?

Options:
A.

Broadcast variables are serialized with every single task.

B.

Broadcast variables are commonly used for tables that do not fit into memory.

C.

Broadcast variables are immutable.

D.

Broadcast variables are occasionally dynamically updated on a per-task basis.

E.

Broadcast variables are local to the worker node and not shared across the cluster.
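For context, a minimal PySpark sketch of creating and reading a broadcast variable (the lookup table and data are hypothetical):

lookup = {"US": "United States", "DE": "Germany"}
bc_lookup = spark.sparkContext.broadcast(lookup)   # shipped to each executor once, read-only

codes = spark.sparkContext.parallelize(["US", "DE", "US"])
names = codes.map(lambda c: bc_lookup.value[c]).collect()
print(names)   # ['United States', 'Germany', 'United States']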

Question 37

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.

Schema:

root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)

Code block:

schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

Options:
A.

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

B.

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

C.

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

D.

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

E.

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
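For reference, a minimal sketch of a StructField-based schema with the shape shown above, together with a parquet read that filters on file modification time (assuming filePath is defined):

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

itemSchema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("attributes", ArrayType(StringType(), True), True),
    StructField("supplier", StringType(), True)
])

df = (spark.read
          .option("modifiedBefore", "2029-03-20T05:44:46")
          .schema(itemSchema)
          .parquet(filePath))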

Question 38

The code block shown below should return the number of columns in the CSV file stored at location filePath. Only lines that do not start with a # character should be read from the CSV file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

Options:
A.

1. size

2. spark

3. read()

4. escape='#'

5. columns

B.

1. DataFrame

2. spark

3. read()

4. escape='#'

5. shape[0]

C.

1. len

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

D.

1. size

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

E.

1. len

2. spark

3. read

4. comment='#'

5. columns
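For context, a minimal sketch of counting the columns of a CSV file while skipping comment lines (filePath is assumed to point at the CSV file):

df = spark.read.option("comment", "#").csv(filePath)   # lines starting with '#' are skipped
print(len(df.columns))                                 # number of columns in the DataFrame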

Question 39

Which of the following code blocks shuffles DataFrame transactionsDf, which has 8 partitions, so that it has 10 partitions?

Options:
A.

transactionsDf.repartition(transactionsDf.getNumPartitions()+2)

B.

transactionsDf.repartition(transactionsDf.rdd.getNumPartitions()+2)

C.

transactionsDf.coalesce(10)

D.

transactionsDf.coalesce(transactionsDf.getNumPartitions()+2)

E.

transactionsDf.repartition(transactionsDf._partitions+2)
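For context, a minimal sketch contrasting repartition and coalesce (transactionsDf is assumed to be an existing DataFrame with 8 partitions):

print(transactionsDf.rdd.getNumPartitions())   # 8
more = transactionsDf.repartition(10)          # full shuffle; can increase the partition count
print(more.rdd.getNumPartitions())             # 10

fewer = transactionsDf.coalesce(10)            # coalesce only ever reduces the partition count,
print(fewer.rdd.getNumPartitions())            # so this still prints 8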

Question 40

Which of the following statements about Spark's configuration properties is incorrect?

Options:
A.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.task.cpus property.

B.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.executor.cores property.

C.

The default value for spark.sql.autoBroadcastJoinThreshold is 10MB.

D.

The default number of partitions to use when shuffling data for joins or aggregations is 300.

E.

The default number of partitions returned from certain transformations can be controlled by the spark.default.parallelism property.
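For context, a minimal sketch of inspecting the properties referenced above on a running session (printed values depend on how the session was configured):

print(spark.conf.get("spark.sql.shuffle.partitions"))           # 200 unless overridden
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))   # roughly 10 MB unless overridden
print(spark.sparkContext.getConf().get("spark.executor.cores", "not set"))
print(spark.sparkContext.getConf().get("spark.task.cpus", "1")) # defaults to 1 CPU per task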