
Free Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.5 Practice Exam with Questions & Answers | Set: 4

Question 31

A data engineer is working with a large JSON dataset containing order information. The dataset is stored in a distributed file system and needs to be loaded into a Spark DataFrame for analysis. The data engineer wants to ensure that the schema is correctly defined and that the data is read efficiently.

Which approach should the data engineer use to efficiently load the JSON data into a Spark DataFrame with a predefined schema?

Options:
A.

Use spark.read.json() to load the data, then use DataFrame.printSchema() to view the inferred schema, and finally use DataFrame.cast() to modify column types.

B.

Use spark.read.json() with the inferSchema option set to true

C.

Use spark.read.format("json").load() and then use DataFrame.withColumn() to cast each column to the desired data type.

D.

Define a StructType schema and use spark.read.schema(predefinedSchema).json() to load the data.
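
For reference, a minimal sketch of loading JSON with an explicitly defined schema (the path and column names below are illustrative, not taken from the question):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("orders").getOrCreate()

# Defining the schema up front avoids the extra pass over the data that schema inference requires.
order_schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("order_ts", TimestampType(), nullable=True),
])

orders_df = spark.read.schema(order_schema).json("/data/orders/")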

Question 32

Given the schema:


event_ts TIMESTAMP,

sensor_id STRING,

metric_value LONG,

ingest_ts TIMESTAMP,

source_file_path STRING

The goal is to deduplicate the data based on the columns event_ts, sensor_id, and metric_value. Which approach should be used?

Options:
A.

dropDuplicates on all columns (wrong criteria)

B.

dropDuplicates with no arguments (removes based on all columns)

C.

groupBy without aggregation (invalid use)

D.

dropDuplicates on the exact matching fields
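
For reference, a minimal sketch of deduplicating on a subset of columns, assuming a DataFrame events_df with the schema above:

# Keep one row per (event_ts, sensor_id, metric_value) combination;
# ingest_ts and source_file_path do not influence which rows count as duplicates.
deduped_df = events_df.dropDuplicates(["event_ts", "sensor_id", "metric_value"])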

Question 33

The following code fragment results in an error:

[Code fragment image not shown]

Which code fragment should be used instead?

A)

[Code fragment image not shown]

B)

[Code fragment image not shown]

C)

[Code fragment image not shown]

D)

[Code fragment image not shown]

Question 34

A developer wants to test Spark Connect with an existing Spark application.

What are the two alternative ways the developer can start a local Spark Connect server without changing their existing application code? (Choose 2 answers)

Options:
A.

Execute their pyspark shell with the option --remote "https://localhost"

B.

Execute their pyspark shell with the option --remote "sc://localhost"

C.

Set the environment variable SPARK_REMOTE="sc://localhost" before starting the pyspark shell

D.

Add .remote("sc://localhost") to their SparkSession.builder calls in their Spark code

E.

Ensure the Spark property spark.connect.grpc.binding.port is set to 15002 in the application code
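
For context, a Spark Connect client addresses the server with an sc:// URI, and 15002 is the default gRPC port. Below is a minimal sketch of the in-code connection style that option D refers to, assuming a local Spark Connect server is already running:

from pyspark.sql import SparkSession

# Connects this PySpark session to a local Spark Connect server over gRPC.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
print(spark.range(5).count())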

Question 35

A data engineer needs to write a DataFrame df to a Parquet file, partitioned by the column country, and overwrite any existing data at the destination path.

Which code should the data engineer use to accomplish this task in Apache Spark?

Options:
A.

df.write.mode("overwrite").partitionBy("country").parquet("/data/output")

B.

df.write.mode("append").partitionBy("country").parquet("/data/output")

C.

df.write.mode("overwrite").parquet("/data/output")

D.

df.write.partitionBy("country").parquet("/data/output")
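
For reference, a minimal sketch of writing and re-reading a partitioned Parquet dataset (the path is illustrative, and spark and df are assumed to already exist):

# Write the DataFrame as Parquet, one subdirectory per distinct country value,
# replacing anything already at the destination path.
df.write.mode("overwrite").partitionBy("country").parquet("/data/output")

# Reading the output back; the country column is reconstructed from the partition directories.
restored_df = spark.read.parquet("/data/output")
restored_df.show()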

Question 36

Which feature of Spark Connect should be considered when designing an application that interacts with a Spark cluster remotely?

Options:
A.

It provides a way to run Spark applications remotely in any programming language

B.

It can be used to interact with any remote cluster using the REST API

C.

It allows for remote execution of Spark jobs

D.

It is primarily used for data ingestion into Spark from external sources

Question 37

A data analyst is working with the DataFrame sensor_df, which contains two columns:

Which code fragment returns a DataFrame that splits the record column into separate columns and has one array item per row?

A)

[Code fragment image not shown]

B)

[Code fragment image not shown]

C)

[Code fragment image not shown]

D)

[Code fragment image not shown]

Options:
A.

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

exploded_df = exploded_df.select("record_datetime", "sensor_id", "status", "health")

B.

exploded_df = exploded_df.select(

"record_datetime",

"record_exploded.sensor_id",

"record_exploded.status",

"record_exploded.health"

)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

C.

exploded_df = exploded_df.select(

"record_datetime",

"record_exploded.sensor_id",

"record_exploded.status",

"record_exploded.health"

)

exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

D.

exploded_df = exploded_df.select("record_datetime", "record_exploded")
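
For reference, explode() on an array-of-structs column yields one row per array element, and the struct fields can then be flattened with dot notation. A minimal sketch, assuming record is an array of structs with fields sensor_id, status, and health:

from pyspark.sql.functions import explode

# One output row per element of the record array.
exploded_df = sensor_df.withColumn("record_exploded", explode("record"))

# Pull the struct fields out into top-level columns.
exploded_df = exploded_df.select(
    "record_datetime",
    "record_exploded.sensor_id",
    "record_exploded.status",
    "record_exploded.health",
)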

Question 38


What is the behavior of the function date_sub(start, days) if a negative value is passed into the days parameter?

Options:
A.

The number of days specified will be added to the start date.

B.

An error message of an invalid parameter will be returned.

C.

The same start date will be returned.

D.

The number of days specified will be removed from the start date.
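
For context, date_sub(start, days) subtracts the given number of days, so a negative days value shifts the date forward. A quick sketch (spark is assumed to be an active SparkSession):

from pyspark.sql import functions as F

df = spark.createDataFrame([("2024-01-10",)], ["d"]).select(F.col("d").cast("date").alias("d"))

# Subtracting -3 days moves the date forward to 2024-01-13.
df.select(F.date_sub("d", -3).alias("shifted")).show()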

Question 39

What is the relationship between jobs, stages, and tasks during execution in Apache Spark?

Options:
A.

A job contains multiple stages, and each stage contains multiple tasks.

B.

A job contains multiple tasks, and each task contains multiple stages.

C.

A stage contains multiple jobs, and each job contains multiple tasks.

D.

A stage contains multiple tasks, and each task contains multiple jobs.
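
For context, a rough sketch of how a single action maps onto jobs, stages, and tasks (spark is assumed to be an active SparkSession):

# The collect() action launches one job. The shuffle introduced by groupBy splits
# that job into two stages, and each stage runs many parallel tasks
# (roughly one task per partition).
df = spark.range(0, 1_000_000, numPartitions=8)
df.groupBy((df.id % 10).alias("bucket")).count().collect()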

Question 40

An engineer wants to join two DataFrames df1 and df2 on the respective employee_id and emp_id columns:

df1: employee_id INT, name STRING

df2: emp_id INT, department STRING

The engineer uses:

result = df1.join(df2, df1.employee_id == df2.emp_id, how='inner')

What is the behaviour of the code snippet?

Options:
A.

The code fails to execute because the column names employee_id and emp_id do not match automatically

B.

The code fails to execute because it must use on='employee_id' to specify the join column explicitly

C.

The code fails to execute because PySpark does not support joining DataFrames with a different structure

D.

The code works as expected because the join condition explicitly matches employee_id from df1 with emp_id from df2
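
For reference, a minimal sketch of joining on differently named key columns (the sample rows are illustrative, and spark is assumed to be an active SparkSession):

df1 = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["employee_id", "name"])
df2 = spark.createDataFrame([(1, "Engineering"), (3, "Finance")], ["emp_id", "department"])

# The explicit equality condition handles the differing column names.
result = df1.join(df2, df1.employee_id == df2.emp_id, how="inner")

# Both key columns appear in the result; drop one if only a single key column is wanted.
result.drop("emp_id").show()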