
Free Databricks Databricks-Certified-Data-Engineer-Associate Practice Exam with Questions & Answers

Questions 1

A data engineer is onboarding a new Bronze ingestion pipeline in Databricks with Unity Catalog. The team wants Databricks to handle storage layout, apply platform optimizations over time, and simplify lifecycle management so that when a table is dropped, its underlying data is also cleaned up according to Databricks-managed retention policies.

Which table type should the data engineer create for these ingestion tables?

Options:
A. Managed tables so that Unity Catalog manages both metadata and underlying data lifecycle

B. External tables with a LOCATION pointing to an external volume for full control of file layout

C. Foreign tables federated from an external catalog to delegate optimization to the source system

D. Temporary views over files to avoid table-level governance and lifecycle coupling

Questions 2

A data engineer is getting a partner organization up to speed with a Databricks account. Both teams share some business use cases. The data engineer has to share some of their Unity Catalog managed Delta tables, and the notebook jobs that create those tables, with the partner organization.

How can the data engineer seamlessly share the required information?

Options:
A. Zip all the code, share it via email, and allow data ingestion from your data lake.

B. Data and notebooks can be shared simply using Unity Catalog.

C. Share access to the codebase via GitHub and allow them to ingest datasets from your data lake.

D. Share required datasets and notebooks via Delta Sharing. Manage permissions via Unity Catalog.

Questions 3

A data engineer wants to schedule their Databricks SQL dashboard to refresh once per day, but they only want the associated SQL endpoint to be running when it is necessary.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

Options:
A. They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.

B. They can set up the dashboard’s SQL endpoint to be serverless.

C. They can turn on the Auto Stop feature for the SQL endpoint.

D. They can reduce the cluster size of the SQL endpoint.

E. They can ensure the dashboard’s SQL endpoint is not one of the included queries’ SQL endpoints.

Questions 4

A data engineer needs to ingest from both streaming and batch sources for a firm that relies on highly accurate data. Occasionally, some of the data picked up by the sensors that provide a streaming input are outside the expected parameters. If this occurs, the data must be dropped, but the stream should not fail.

Which feature of Delta Live Tables meets this requirement?

Options:
A. Monitoring

B. Change Data Capture

C. Expectations

D. Error Handling
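As a rough illustration of the behavior the question describes (out-of-range records are dropped while processing continues), the sketch below models it in plain Python. It is not the Delta Live Tables API: `apply_drop_rule`, the `temperature` field, and the 0 to 100 range are hypothetical names and values chosen for the example. In an actual pipeline a rule like this would be declared on the dataset itself, for instance with the Python decorator `@dlt.expect_or_drop`.

```python
def apply_drop_rule(records, rule):
    """Keep records that satisfy `rule`; count violations instead of raising."""
    kept = []
    dropped = 0
    for record in records:
        if rule(record):
            kept.append(record)
        else:
            # A violating record is discarded, but processing continues.
            dropped += 1
    return kept, dropped


# Hypothetical sensor readings; the second one is outside the assumed range.
readings = [
    {"sensor": "s1", "temperature": 21.5},
    {"sensor": "s2", "temperature": 250.0},
    {"sensor": "s3", "temperature": 19.0},
]

kept, dropped = apply_drop_rule(readings, lambda r: 0 <= r["temperature"] <= 100)
print(len(kept), dropped)
```

The point of the design is that a bad record changes a counter rather than raising an exception, so the remaining records are still processed.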

Questions 5

What is the maximum output supported by a job cluster to ensure a notebook does not fail?

Options:
A. 10 MB

B. 25 MB

C. 30 MB

D. 15 MB

Questions 6

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Image: code block for Question 6]

If the data engineer only wants the query to execute a micro-batch to process data every 5 seconds, which of the following lines of code should the data engineer use to fill in the blank?

Options:
A. trigger("5 seconds")

B. trigger()

C. trigger(once="5 seconds")

D. trigger(processingTime="5 seconds")

E. trigger(continuous="5 seconds")

Questions 7

Identify how the count_if function and the count where x is null can be used

Consider a table random_values with the data below.

col1
0
1
2
NULL
2
3

What would be the output of the query below?

select count_if(col1 > 1) as count_a, count(*) as count_b, count(col1) as count_c from random_values

Options:
A. 3 6 5

B. 4 6 5

C. 3 6 6

D. 4 6 6
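The NULL handling that this question turns on can be checked by hand. The sketch below models the three aggregates in plain Python rather than running them on Databricks, assuming the usual SQL semantics: a comparison with NULL is not true, count(*) counts every row, and count(column) skips NULLs. `None` stands in for NULL, and the list holds the six col1 values given in the question.

```python
# The six col1 values from the question; None stands in for SQL NULL.
col1 = [0, 1, 2, None, 2, 3]

# count_if(col1 > 1): rows where the condition is true (NULL > 1 is not true).
count_a = sum(1 for v in col1 if v is not None and v > 1)

# count(*): every row, including the one holding NULL.
count_b = len(col1)

# count(col1): only rows where col1 is not NULL.
count_c = sum(1 for v in col1 if v is not None)

print(count_a, count_b, count_c)  # prints: 3 6 5
```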

Questions 8

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which of the following locations can the data engineer review their permissions on the table?

Options:
A. Databricks Filesystem

B. Jobs

C. Dashboards

D. Repos

E. Data Explorer

Questions 9

A data engineer is setting up a new Databricks pipeline that ingests clickstream events from Kafka and daily product catalogs from cloud object storage. To ensure auditability and easy reprocessing, the engineer wants to land all source data first. Later stages will handle cleaning, deduplication, and business modeling before the data is used in dashboards.

Which approach aligns with Medallion Architecture principles?

Options:
A. Land both sources in Gold with denormalized star schemas to optimize BI while retaining full source fidelity

B. Land streaming events from Kafka in Silver and the product catalog directly in Gold to minimize layers for batch data

C. Land both sources in the Bronze layer append-only with minimal validation, then build Silver/Gold downstream for quality and analytics

D. Land both sources directly into the Silver layer with schema enforcement and deduplication to reduce downstream complexity
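To make the layering in this scenario concrete, here is a minimal plain-Python sketch of landing raw data first and refining it afterwards. It is only a toy model, not Databricks code: the `land`, `build_silver`, and `build_gold` functions and the `event_id` and `product` fields are hypothetical, and in-memory lists stand in for tables.

```python
from collections import Counter

# Raw landing area: records are appended exactly as received.
bronze = []


def land(raw_records):
    """Append raw records without validation so they can be reprocessed later."""
    bronze.extend(raw_records)


def build_silver(bronze_rows):
    """Clean and deduplicate: skip rows with no event_id or a repeated event_id."""
    seen = set()
    silver_rows = []
    for row in bronze_rows:
        event_id = row.get("event_id")
        if event_id is None or event_id in seen:
            continue
        seen.add(event_id)
        silver_rows.append(row)
    return silver_rows


def build_gold(silver_rows):
    """Business-level aggregate: number of events per product."""
    return Counter(row["product"] for row in silver_rows)


land([
    {"event_id": 1, "product": "p1"},
    {"event_id": 1, "product": "p1"},     # duplicate delivery
    {"event_id": None, "product": "p2"},  # malformed record
    {"event_id": 2, "product": "p2"},
])

silver = build_silver(bronze)
gold = build_gold(silver)
print(len(bronze), len(silver), dict(gold))
```

Because the raw list is never modified, the cleaning and aggregation steps can be changed and rerun against the same source records.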

Questions 10

A data engineer has been given a new record of data:

id STRING = 'a1'

rank INTEGER = 6

rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?

Options:
A. INSERT INTO my_table VALUES ('a1', 6, 9.4)

B. my_table UNION VALUES ('a1', 6, 9.4)

C. INSERT VALUES ('a1', 6, 9.4) INTO my_table

D. UPDATE my_table VALUES ('a1', 6, 9.4)

E. UPDATE VALUES ('a1', 6, 9.4) my_table
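For readers who want to try the standard SQL append form without a Databricks workspace, the sketch below runs an `INSERT INTO ... VALUES` statement against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in here, not a Delta table, and the pre-existing row `('a0', 1, 7.2)` is a made-up value added so the append is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the existing table; column types are SQLite approximations.
conn.execute('CREATE TABLE my_table (id TEXT, "rank" INTEGER, rating REAL)')
conn.execute("INSERT INTO my_table VALUES ('a0', 1, 7.2)")  # hypothetical existing row

# Append the new record from the question.
conn.execute("INSERT INTO my_table VALUES ('a1', 6, 9.4)")

rows = conn.execute("SELECT * FROM my_table ORDER BY id").fetchall()
print(rows)
conn.close()
```

The existing row is left in place and the new record is added alongside it, which is what distinguishes an append from an update.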