Free Databricks Databricks-Certified-Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 3

Questions 21

A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository.

Which of the following Git operations does the data engineer need to run to accomplish this task?

Options:
A. Merge
B. Push
C. Pull
D. Commit
E. Clone

Questions 22

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.

They run the following command:

[Image: command for Question 22, not reproduced in this text]

Which of the following lines of code fills in the above blank to successfully complete the task?

Options:
A. org.apache.spark.sql.jdbc
B. autoloader
C. DELTA
D. sqlite
E. org.apache.spark.sql.sqlite

Questions 23

An engineering manager wants to monitor the performance of a recent project using a Databricks SQL query. For the first week following the project’s release, the manager wants the query results to be updated every minute. However, the manager is concerned that the compute resources used for the query will be left running and cost the organization a lot of money beyond the first week of the project’s release.

Which of the following approaches can the engineering team use to ensure the query does not cost the organization any money beyond the first week of the project’s release?

Options:
A. They can set a limit to the number of DBUs that are consumed by the SQL Endpoint.
B. They can set the query’s refresh schedule to end after a certain number of refreshes.
C. They cannot ensure the query does not cost the organization money beyond the first week of the project’s release.
D. They can set a limit to the number of individuals that are able to manage the query’s refresh schedule.
E. They can set the query’s refresh schedule to end on a certain date in the query scheduler.

Questions 24

Which of the following data workloads will utilize a Gold table as its source?

Options:
A. A job that enriches data by parsing its timestamps into a human-readable format
B. A job that aggregates uncleaned data to create standard summary statistics
C. A job that cleans data by removing malformatted records
D. A job that queries aggregated data designed to feed into a dashboard
E. A job that ingests raw data from a streaming source into the Lakehouse
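
The workloads named in these options map onto the layers usually called Bronze (raw), Silver (cleaned), and Gold (aggregated) in a Lakehouse. The following is a minimal plain-Python sketch of that layering, not Spark or Databricks code; the sample records and the function names `to_silver` and `to_gold` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested (hypothetical sample data).
bronze = [
    {"region": "east", "amount": "10.5"},
    {"region": "west", "amount": "oops"},  # malformed amount
    {"region": "east", "amount": "4.5"},
]


def to_silver(rows):
    """Clean raw rows: parse amounts and drop records that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"region": row["region"], "amount": float(row["amount"])})
        except ValueError:
            continue  # skip the malformed record
    return cleaned


def to_gold(rows):
    """Aggregate cleaned rows into per-region totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # a dashboard-style consumer would read this aggregated result
```

In this sketch, only the final consumer reads from the aggregated layer; the cleaning and aggregating steps read from the layers before it.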

Questions 25

Which tool is used by Auto Loader to process data incrementally?

Options:
A. Spark Structured Streaming
B. Unity Catalog
C. Checkpointing
D. Databricks SQL

Questions 26

A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.

They have the following incomplete code block:

____(f"SELECT customer_id, spend FROM {table_name}")

Which of the following can be used to fill in the blank to successfully complete the task?

Options:
A. spark.delta.sql
B. spark.delta.table
C. spark.table
D. dbutils.sql
E. spark.sql
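
The f-string in the code block above is ordinary Python string interpolation: the value of table_name is pasted into the query text before anything is executed. The sketch below shows only that step and runs without Spark; the helper name `build_query`, the identifier check, and the sample table name are assumptions added for illustration and are not part of the question.

```python
import re

# Pattern for a simple identifier, optionally qualified with dots
# (for example "schema.table"). This guard is an illustrative addition.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def build_query(table_name: str) -> str:
    """Return the SELECT statement with table_name interpolated by an f-string."""
    # f-strings insert the value verbatim, so reject anything that is not
    # a plain identifier before building the query text.
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    return f"SELECT customer_id, spend FROM {table_name}"


query = build_query("sales.customers")  # hypothetical table name
print(query)
```

The resulting string is what gets passed to whichever function fills the blank in the question.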

Questions 27

Which of the following benefits is provided by the array functions from Spark SQL?

Options:
A. An ability to work with data in a variety of types at once
B. An ability to work with data within certain partitions and windows
C. An ability to work with time-related data in specified intervals
D. An ability to work with complex, nested data ingested from JSON files
E. An ability to work with an array of tables for procedural automation
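
As background on what an array function does, Spark SQL's `explode` turns each element of an array column into its own row. The following is a plain-Python analogue of that behavior on a nested JSON record, not Spark code; the sample record and the function name `explode_items` are hypothetical.

```python
import json

# Hypothetical nested record, as it might arrive from a JSON file.
raw = '{"order_id": 1, "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}'


def explode_items(record: dict) -> list:
    """Return one flat row per element of the record's "items" array."""
    return [
        {"order_id": record["order_id"], "sku": item["sku"], "qty": item["qty"]}
        for item in record["items"]
    ]


rows = explode_items(json.loads(raw))
for row in rows:
    print(row)
```

Each element of the nested array becomes a separate flat row that carries the parent's order_id with it.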

Questions 28

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?

Options:
A. Manually programming in an alert system in each cell of the Notebook
B. Setting up an Alert in the Job page
C. Setting up an Alert in the Notebook
D. There is no way to notify the Job owner in the case of Job failure
E. MLflow Model Registry Webhooks

Questions 29

A data engineer who is new to Python needs to create a Python function that adds two integers together and returns the sum.

Which of the following code blocks can the data engineer use to complete this task?

[Images: code blocks for options A) through E), not reproduced in this text]

Options:
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
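
For reference, a function of the kind the question describes can be sketched as follows. This is not a reproduction of any of the option images, and the name `add_integers` and its parameter names are hypothetical. The points worth checking in any candidate are the `def` keyword, the two parameters, and a `return` statement (rather than `print`) that hands the sum back to the caller.

```python
def add_integers(x: int, y: int) -> int:
    """Add two integers and return the sum to the caller."""
    return x + y


total = add_integers(2, 3)
print(total)  # prints 5
```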

Questions 30

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Development mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:
A. All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.
B. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.
C. All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.
D. All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.
E. All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.