Big Halloween Sale 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: sale65best

Free Databricks Databricks-Certified-Professional-Data-Engineer Practice Exam with Questions & Answers | Set: 3

Questions 21

Which statement regarding spark configuration on the Databricks platform is true?

Options:
A.

Spark configuration properties set for an interactive cluster with the Clusters UI will impact all notebooks attached to that cluster.

B.

When the same spar configuration property is set for an interactive to the same interactive cluster.

C.

Spark configuration set within an notebook will affect all SparkSession attached to the same interactive cluster

D.

The Databricks REST API can be used to modify the Spark configuration properties for an interactive cluster without interrupting jobs.

Databricks Databricks-Certified-Professional-Data-Engineer Premium Access
Questions 22

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Incremental state information should be maintained for 10 minutes for late-arriving data.

Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT"

Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

Options:
A.

withWatermark("event_time", "10 minutes")

B.

awaitArrival("event_time", "10 minutes")

C.

await("event_time + ‘10 minutes'")

D.

slidingWindow("event_time", "10 minutes")

E.

delayWrite("event_time", "10 minutes")

Questions 23

A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown.

Which approach will allow this developer to review the current logic for this notebook?

Options:
A.

Use Repos to make a pull request use the Databricks REST API to update the current branch to dev-2.3.9

B.

Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.

C.

Use Repos to checkout the dev-2.3.9 branch and auto-resolve conflicts with the current branch

D.

Merge all changes back to the main branch in the remote Git repository and clone the repo again

E.

Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository

Questions 24

A data engineer wants to create a cluster using the Databricks CLI for a big ETL pipeline. The cluster should have five workers, one driver of type i3.xlarge, and should use the '14.3.x-scala2.12' runtime.

Which command should the data engineer use?

Options:
A.

databricks clusters create 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name DataEngineer_cluster

B.

databricks clusters add 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name Data Engineer_cluster

C.

databricks compute add 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name Data Engineer_cluster

D.

databricks compute create 14.3.x-scala2.12 --num-workers 5 --node-type-id i3.xlarge --cluster-name Data Engineer_cluster

Questions 25

The data governance team is reviewing code used for deleting records for compliance with GDPR. They note the following logic is used to delete records from the Delta Lake table named users.

Assuming that user_id is a unique identifying key and that delete_requests contains all users that have requested deletion, which statement describes whether successfully executing the above logic guarantees that the records to be deleted are no longer accessible and why?

Options:
A.

Yes; Delta Lake ACID guarantees provide assurance that the delete command succeeded fully and permanently purged these records.

B.

No; the Delta cache may return records from previous versions of the table until the cluster is restarted.

C.

Yes; the Delta cache immediately updates to reflect the latest data files recorded to disk.

D.

No; the Delta Lake delete command only provides ACID guarantees when combined with the merge into command.

E.

No; files containing deleted records may still be accessible with time travel until a vacuum command is used to remove invalidated data files.

Questions 26

Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?

Options:
A.

spark.sql.files.maxPartitionBytes

B.

spark.sql.autoBroadcastJoinThreshold

C.

spark.sql.files.openCostInBytes

D.

spark.sql.adaptive.coalescePartitions.minPartitionNum

E.

spark.sql.adaptive.advisoryPartitionSizeInBytes

Questions 27

Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?

Options:
A.

In the Executor's log file, by gripping for "predicate push-down"

B.

In the Stage's Detail screen, in the Completed Stages table, by noting the size of data read from the Input column

C.

In the Storage Detail screen, by noting which RDDs are not stored on disk

D.

In the Delta Lake transaction log. by noting the column statistics

E.

In the Query Detail screen, by interpreting the Physical Plan

Questions 28

The data governance team is reviewing user for deleting records for compliance with GDPR. The following logic has been implemented to propagate deleted requests from the user_lookup table to the user aggregate table.

Assuming that user_id is a unique identifying key and that all users have requested deletion have been removed from the user_lookup table, which statement describes whether successfully executing the above logic guarantees that the records to be deleted from the user_aggregates table are no longer accessible and why?

Options:
A.

No: files containing deleted records may still be accessible with time travel until a BACUM command is used to remove invalidated data files.

B.

Yes: Delta Lake ACID guarantees provide assurance that the DELETE command successed fully and permanently purged these records.

C.

No: the change data feed only tracks inserts and updates not deleted records.

D.

No: the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command

Questions 29

In order to facilitate near real-time workloads, a data engineer is creating a helper function to leverage the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directly, incrementally process JSON files as they arrive in a source directory, and automatically evolve the schema of the table when new fields are detected.

The function is displayed below with a blank:

Which response correctly fills in the blank to meet the specified requirements?

Options:
A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Questions 30

A data pipeline uses Structured Streaming to ingest data from kafka to Delta Lake. Data is being stored in a bronze table, and includes the Kafka_generated timesamp, key, and value. Three months after the pipeline is deployed the data engineering team has noticed some latency issued during certain times of the day.

A senior data engineer updates the Delta Table's schema and ingestion logic to include the current timestamp (as recoded by Apache Spark) as well the Kafka topic and partition. The team plans to use the additional metadata fields to diagnose the transient processing delays:

Which limitation will the team face while diagnosing this problem?

Options:
A.

New fields not be computed for historic records.

B.

Updating the table schema will invalidate the Delta transaction log metadata.

C.

Updating the table schema requires a default value provided for each file added.

D.

Spark cannot capture the topic partition fields from the kafka source.