
Free Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Practice Exam with Questions & Answers | Set: 4

Question 31

Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?

Options:
A.

Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions

B.

Decrease values for the properties spark.default.parallelism and spark.sql.partitions

C.

Increase values for the properties spark.sql.parallelism and spark.sql.partitions

D.

Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions

E.

Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
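For context, a minimal PySpark sketch of raising the two parallelism-related properties named above when building a session (the application name and values are illustrative, not recommendations):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parallelism-tuning")                    # hypothetical application name
         .config("spark.default.parallelism", "200")       # default parallelism for RDD operations
         .config("spark.sql.shuffle.partitions", "200")    # partitions produced by shuffles in Spark SQL
         .getOrCreate())

# spark.sql.shuffle.partitions can also be adjusted on a running session:
spark.conf.set("spark.sql.shuffle.partitions", "400")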

Question 32

Which of the following statements about garbage collection in Spark is incorrect?

Options:
A.

Garbage collection information can be accessed in the Spark UI's stage detail view.

B.

Optimizing garbage collection performance in Spark may limit caching ability.

C.

Manually persisting RDDs in Spark prevents them from being garbage collected.

D.

In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.

E.

Serialized caching is a strategy to increase the performance of garbage collection.
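One of the statements above refers to serialized caching; a minimal PySpark sketch of persisting a DataFrame (transactionsDf is a hypothetical name):

from pyspark import StorageLevel

# Serialized caching stores each partition as one large byte buffer instead of
# many individual objects, which reduces garbage-collection pressure.
# (In Scala/Java this is MEMORY_ONLY_SER; PySpark storage levels are serialized by default.)
transactionsDf.persist(StorageLevel.MEMORY_ONLY)
transactionsDf.count()   # an action is needed to materialize the cache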

Question 33

Which of the following is the idea behind dynamic partition pruning in Spark?

Options:
A.

Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.

B.

Dynamic partition pruning concatenates columns of similar data types to optimize join performance.

C.

Dynamic partition pruning performs wide transformations on disk instead of in memory.

D.

Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.

E.

Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.
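For context, a minimal PySpark sketch of the kind of star-schema query in which dynamic partition pruning is applied (paths, table and column names are hypothetical):

# Enabled by default since Spark 3.0:
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

sales = spark.read.parquet("/data/sales")   # large fact table, assumed partitioned by date_id
dates = spark.read.parquet("/data/dates")   # small dimension table

# The filter on the dimension side is applied to the fact-table scan at runtime,
# so only the sales partitions whose date_id survives the filter are read.
result = sales.join(dates, "date_id").where(dates.year == 2021)
result.explain()   # the scan node shows a dynamic pruning expression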

Question 34

Which of the following statements about executors is correct, assuming that each executor JVM can be considered a pool of task execution slots?

Options:
A.

Slot is another name for executor.

B.

There must be fewer executors than tasks.

C.

An executor runs on a single core.

D.

There must be more slots than tasks.

E.

Tasks run in parallel via slots.
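For context, a minimal sketch of the slot arithmetic behind this question, with illustrative numbers:

num_executors = 4        # executors in the cluster (hypothetical)
executor_cores = 8       # spark.executor.cores
task_cpus = 1            # spark.task.cpus (default)

slots_per_executor = executor_cores // task_cpus
total_slots = num_executors * slots_per_executor
print(total_slots)       # 32 tasks can run in parallel at any given moment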

Question 35

Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?

Options:
A.

spark.read.json(filePath)

B.

spark.read.path(filePath, source="json")

C.

spark.read().path(filePath)

D.

spark.read().json(filePath)

E.

spark.read.path(filePath)
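For context, a minimal sketch of DataFrameReader usage with a JSON source (filePath is assumed to hold the path to a JSON file):

df = spark.read.format("json").load(filePath)   # generic reader API
df = spark.read.json(filePath)                  # JSON convenience method
df.printSchema()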

Question 36

Which of the following statements about broadcast variables is correct?

Options:
A.

Broadcast variables are serialized with every single task.

B.

Broadcast variables are commonly used for tables that do not fit into memory.

C.

Broadcast variables are immutable.

D.

Broadcast variables are occasionally dynamically updated on a per-task basis.

E.

Broadcast variables are local to the worker node and not shared across the cluster.
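For context, a minimal PySpark sketch of creating and reading a broadcast variable (the lookup table and data are hypothetical):

lookup = {"US": "United States", "DE": "Germany"}
bc_lookup = spark.sparkContext.broadcast(lookup)   # shipped to each executor once, read-only

codes = spark.sparkContext.parallelize(["US", "DE", "US"])
names = codes.map(lambda c: bc_lookup.value[c]).collect()
print(names)   # ['United States', 'Germany', 'United States']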

Question 37

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.

Schema:

root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)

Code block:

schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

Options:
A.

The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

B.

Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

C.

The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

D.

Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

E.

Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
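For reference, a minimal sketch of a StructField-based schema with the shape shown above, together with a parquet read that filters on file modification time (assuming filePath is defined):

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

itemSchema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("attributes", ArrayType(StringType(), True), True),
    StructField("supplier", StringType(), True)
])

df = (spark.read
          .option("modifiedBefore", "2029-03-20T05:44:46")
          .schema(itemSchema)
          .parquet(filePath))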

Question 38

The code block shown below should return the number of columns in the CSV file stored at location filePath. Only lines that do not start with a # character should be read from the CSV file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

__1__(__2__.__3__.csv(filePath, __4__).__5__)

Options:
A.

1. size

2. spark

3. read()

4. escape='#'

5. columns

B.

1. DataFrame

2. spark

3. read()

4. escape='#'

5. shape[0]

C.

1. len

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

D.

1. size

2. pyspark

3. DataFrameReader

4. comment='#'

5. columns

E.

1. len

2. spark

3. read

4. comment='#'

5. columns
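For context, a minimal sketch of counting the columns of a CSV file while skipping comment lines (filePath is assumed to point at the CSV file):

df = spark.read.option("comment", "#").csv(filePath)   # lines starting with '#' are skipped
print(len(df.columns))                                 # number of columns in the DataFrame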

Question 39

Which of the following code blocks shuffles DataFrame transactionsDf, which has 8 partitions, so that it has 10 partitions?

Options:
A.

transactionsDf.repartition(transactionsDf.getNumPartitions()+2)

B.

transactionsDf.repartition(transactionsDf.rdd.getNumPartitions()+2)

C.

transactionsDf.coalesce(10)

D.

transactionsDf.coalesce(transactionsDf.getNumPartitions()+2)

E.

transactionsDf.repartition(transactionsDf._partitions+2)
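For context, a minimal sketch contrasting repartition and coalesce (transactionsDf is assumed to be an existing DataFrame with 8 partitions):

print(transactionsDf.rdd.getNumPartitions())   # 8
more = transactionsDf.repartition(10)          # full shuffle; can increase the partition count
print(more.rdd.getNumPartitions())             # 10

fewer = transactionsDf.coalesce(10)            # coalesce only ever reduces the partition count,
print(fewer.rdd.getNumPartitions())            # so this still prints 8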

Question 40

Which of the following statements about Spark's configuration properties is incorrect?

Options:
A.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.task.cpus property.

B.

The maximum number of tasks that an executor can process at the same time is controlled by the spark.executor.cores property.

C.

The default value for spark.sql.autoBroadcastJoinThreshold is 10MB.

D.

The default number of partitions to use when shuffling data for joins or aggregations is 300.

E.

The default number of partitions returned from certain transformations can be controlled by the spark.default.parallelism property.
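For context, a minimal sketch of inspecting the properties referenced above on a running session (printed values depend on how the session was configured):

print(spark.conf.get("spark.sql.shuffle.partitions"))           # 200 unless overridden
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))   # roughly 10 MB unless overridden
print(spark.sparkContext.getConf().get("spark.executor.cores", "not set"))
print(spark.sparkContext.getConf().get("spark.task.cpus", "1")) # defaults to 1 CPU per task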