You have:
DataFrame A: 128 GB of transactions
DataFrame B: 1 GB user lookup table
Which strategy is correct for broadcasting?
What is the risk associated with this operation when converting a large Pandas API on Spark DataFrame back to a Pandas DataFrame?
A developer is running Spark SQL queries and notices underutilization of resources. Executors are idle, and the number of tasks per stage is low.
What should the developer do to improve cluster utilization?
Given a DataFrame df that has 10 partitions, after running the code:
result = df.coalesce(20)
How many partitions will the result DataFrame have?
Which UDF implementation calculates the length of strings in a Spark DataFrame?
39 of 55.
A Spark developer is developing a Spark application to monitor task performance across a cluster.
One requirement is to track the maximum processing time for tasks on each worker node and consolidate this information on the driver for further analysis.
Which technique should the developer use?
18 of 55.
An engineer has two DataFrames — df1 (small) and df2 (large). To optimize the join, the engineer uses a broadcast join:
from pyspark.sql.functions import broadcast
df_result = df2.join(broadcast(df1), on="id", how="inner")
What is the purpose of using broadcast() in this scenario?
A data scientist is working on a project that requires processing large amounts of structured data, performing SQL queries, and applying machine learning algorithms. The data scientist is considering using Apache Spark for this task.
Which combination of Apache Spark modules should the data scientist use in this scenario?
Options:
The following code fragment results in an error:
@F.udf(T.IntegerType())
def simple_udf(t: str) -> str:
return answer * 3.14159
Which code fragment should be used instead?
29 of 55.
A Spark application is experiencing performance issues in client mode due to the driver being resource-constrained.
How should this issue be resolved?
PDF + Testing Engine
|
---|
$57.75 |
Testing Engine
|
---|
$43.75 |
PDF (Q&A)
|
---|
$36.75 |
Databricks Free Exams |
---|
![]() |