
Free Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Practice Exam with Questions & Answers

Question 1

The code block shown below should return a DataFrame with only columns from DataFrame transactionsDf for which there is a corresponding transactionId in DataFrame itemsDf. DataFrame itemsDf is very small and much smaller than DataFrame transactionsDf. The query should be executed in an optimized way. Choose the answer that correctly fills the blanks in the code block to accomplish this.

__1__.__2__(__3__, __4__, __5__)

Options:
A.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. transactionsDf.transactionId==itemsDf.transactionId

5. "outer"

B.

1. transactionsDf

2. join

3. itemsDf

4. transactionsDf.transactionId==itemsDf.transactionId

5. "anti"

C.

1. transactionsDf

2. join

3. broadcast(itemsDf)

4. "transactionId"

5. "left_semi"

D.

1. itemsDf

2. broadcast

3. transactionsDf

4. "transactionId"

5. "left_semi"

E.

1. itemsDf

2. join

3. broadcast(transactionsDf)

4. "transactionId"

5. "left_semi"

Question 2

Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?

Options:
A.

from pyspark import StorageLevel

transactionsDf.cache(StorageLevel.MEMORY_ONLY)

B.

transactionsDf.cache()

C.

transactionsDf.storage_level('MEMORY_ONLY')

D.

transactionsDf.persist()

E.

transactionsDf.clear_persist()

F.

from pyspark import StorageLevel

transactionsDf.persist(StorageLevel.MEMORY_ONLY)
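
For reference, a minimal sketch of the storage-level behavior this question tests: persist() accepts a StorageLevel argument, while cache() takes no arguments and uses MEMORY_AND_DISK.

from pyspark import StorageLevel

# MEMORY_ONLY keeps partitions in memory only; any partition that does not
# fit is recomputed from its lineage when needed, never spilled to disk.
transactionsDf.persist(StorageLevel.MEMORY_ONLY)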

Question 3

The code block displayed below contains an error. The code block is intended to return all columns of DataFrame transactionsDf except for columns predError, productId, and value. Find the error.

transactionsDf.select(~col("predError"), ~col("productId"), ~col("value"))

Options:
A.

The select operator should be replaced by the drop operator, and the arguments to the drop operator should be the column names predError, productId and value wrapped in the col operator, so they should be expressed like drop(col(predError), col(productId), col(value)).

B.

The select operator should be replaced with the deselect operator.

C.

The column names in the select operator should not be strings wrapped in the col operator, so they should be expressed like select(~col(predError), ~col(productId), ~col(value)).

D.

The select operator should be replaced by the drop operator.

E.

The select operator should be replaced by the drop operator and the arguments to the drop operator should be column names predError, productId and value as strings.

(Correct)
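
Since option E is marked correct, a minimal sketch of what that call looks like in practice:

# drop takes plain column-name strings and returns a DataFrame containing
# all remaining columns of transactionsDf.
transactionsDf.drop("predError", "productId", "value")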

Question 4

The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the answer that correctly fills the blanks in the code block to accomplish this.

from pyspark import StorageLevel

transactionsDf.__1__(StorageLevel.__2__).__3__

Options:
A.

1. cache

2. MEMORY_ONLY_2

3. count()

B.

1. persist

2. DISK_ONLY_2

3. count()

C.

1. persist

2. MEMORY_ONLY_2

3. select()

D.

1. cache

2. DISK_ONLY_2

3. count()

E.

1. persist

2. MEMORY_ONLY_2

3. count()
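
For reference, a minimal sketch of the mechanics the blanks cover: a _2 storage level replicates each partition to two executors, and only an action such as count() actually triggers the caching.

from pyspark import StorageLevel

# MEMORY_ONLY_2: memory only (no disk spill), each partition stored on two
# executors; count() is an action, so it forces evaluation and thus caching.
transactionsDf.persist(StorageLevel.MEMORY_ONLY_2).count()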

Question 5

Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?

Options:
A.

transactionsDf.sort(asc(value)).show(10)

B.

transactionsDf.sort(col("value")).show(10)

C.

transactionsDf.sort(col("value").desc()).head()

D.

transactionsDf.sort(col("value").asc()).print(10)

E.

transactionsDf.orderBy("value").asc().show(10)
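
For reference, a minimal sketch of the display pattern being tested: sort is ascending by default, and show prints a formatted table.

from pyspark.sql.functions import col

# Ascending sort on column value, then a formatted printout of the first 10 rows.
transactionsDf.sort(col("value").asc()).show(10)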

Question 6

Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?

Options:
A.

itemsDf.sampleBy("row", fractions={0: 0.1}, seed=82371)

B.

itemsDf.sample(fraction=0.1, seed=87238)

C.

itemsDf.sample(fraction=1000, seed=98263)

D.

itemsDf.sample(withReplacement=True, fraction=0.1, seed=23536)

E.

itemsDf.sample(fraction=0.1)
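
For reference, a minimal sketch of how sample maps onto the requirements: fraction is a proportion between 0 and 1, withReplacement defaults to False (so no duplicates), and a fixed seed makes the draw repeatable.

# Roughly 10% of 10,000 rows is about 1,000 rows; sampling without
# replacement yields no duplicates, and the seed pins the result across runs.
itemsDf.sample(fraction=0.1, seed=87238)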

Question 7

Which of the following statements about reducing out-of-memory errors is incorrect?

Options:
A.

Concatenating multiple string columns into a single column may guard against out-of-memory errors.

B.

Reducing partition size can help against out-of-memory errors.

C.

Limiting the amount of data being automatically broadcast in joins can help against out-of-memory errors.

D.

Setting a limit on the maximum size of serialized data returned to the driver may help prevent out-of-memory errors.

E.

Decreasing the number of cores available to each executor can help against out-of-memory errors.
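
For context, two of the levers mentioned in the options are plain configuration settings; a minimal sketch (the values shown are illustrative assumptions, not recommendations):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # Cap the table size Spark will automatically broadcast in joins (bytes).
         .config("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)
         # Cap the total size of serialized results returned to the driver.
         .config("spark.driver.maxResultSize", "1g")
         .getOrCreate())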

Question 8

Which of the following describes characteristics of the Spark UI?

Options:
A.

Via the Spark UI, workloads can be manually distributed across executors.

B.

Via the Spark UI, stage execution speed can be modified.

C.

The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

D.

There is a place in the Spark UI that shows the property spark.executor.memory.

E.

Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Question 9

Which of the following code blocks prints out the number of rows in which the expression 'Inc.' appears in the string-type column supplier of DataFrame itemsDf?

Options:
A.

counter = 0

for index, row in itemsDf.iterrows():
    if 'Inc.' in row['supplier']:
        counter = counter + 1

print(counter)

B.

counter = 0

def count(x):
    if 'Inc.' in x['supplier']:
        counter = counter + 1

itemsDf.foreach(count)
print(counter)

C.

print(itemsDf.foreach(lambda x: 'Inc.' in x))

D.

print(itemsDf.foreach(lambda x: 'Inc.' in x).sum())

E.

accum = sc.accumulator(0)

def check_if_inc_in_supplier(row):
    if 'Inc.' in row['supplier']:
        accum.add(1)

itemsDf.foreach(check_if_inc_in_supplier)
print(accum.value)
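
For context, foreach runs on the executors, so a driver-side Python variable like counter is never updated, which is why the counter-based options fail; an accumulator is the mechanism that ships executor-side increments back to the driver. The same number can also be obtained without mutable state, as a sketch (not one of the options):

from pyspark.sql.functions import col

# Filter rows whose supplier column contains 'Inc.' and count them.
print(itemsDf.filter(col("supplier").contains("Inc.")).count())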

Question 10

Which of the following code blocks returns all unique values across columns value and productId of DataFrame transactionsDf in a one-column DataFrame?

Options:
A.

transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')

B.

transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})

C.

transactionsDf.select('value', 'productId').distinct()

D.

transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()

E.

transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})
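
For reference, a minimal sketch of the union-then-deduplicate pattern the question describes: union stacks the two single-column DataFrames by position, and distinct removes duplicates across both source columns.

# Stack both columns into one, then keep each value once.
(transactionsDf.select("value")
 .union(transactionsDf.select("productId"))
 .distinct())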