Free Databricks Databricks-Certified-Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 4

Questions 31

A data engineer has written a function in a Databricks Notebook to calculate the population of bacteria in a given medium.

[Image: the function's code, not reproduced in this text]

Analysts use this function in the notebook and sometimes provide input arguments of the wrong data type, which can cause errors during execution.

Which Databricks feature will help the data engineer quickly identify if an incorrect data type has been provided as input?

Options:
A.

The data engineer should add print statements to find out what the variable is.

B.

The Databricks debugger enables breakpoints that will raise an error if the wrong data type is submitted.

C.

The Spark User interface has a debug tab that contains the variables that are used in this session.

D.

The Databricks debugger enables the use of a variable explorer to see at a glance the value of the variables.
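
To make the scenario concrete, here is a minimal sketch of how a wrong input type surfaces in a function like this one. The original function is an image and is not reproduced here, so `bacteria_population`, its parameters, and the growth formula are all hypothetical stand-ins. The explicit type check shown is a general Python safeguard, not the Databricks feature the question asks about.

```python
def bacteria_population(initial_count, growth_rate, hours):
    """Hypothetical stand-in: compound growth, N = N0 * (1 + r) ** t."""
    # Reject wrong input types up front so the failure names the bad argument.
    for name, value in (
        ("initial_count", initial_count),
        ("growth_rate", growth_rate),
        ("hours", hours),
    ):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(
                f"{name} must be a number, got {type(value).__name__}"
            )
    return initial_count * (1 + growth_rate) ** hours


# A string where a number is expected now fails with a clear message.
try:
    bacteria_population("100", 0.5, 2)
except TypeError as err:
    message = str(err)
```

Without such a check, the string argument would only fail later, inside the arithmetic, with a less specific error.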

Questions 32

Which of the following is stored in the Databricks customer's cloud account?

Options:
A.

Databricks web application

B.

Cluster management metadata

C.

Repos

D.

Data

E.

Notebooks

Questions 33

A data engineer is working with two tables. Each of these tables is displayed below in its entirety.

[Image: the two tables, not reproduced in this text]

The data engineer runs the following query to join these tables together:

[Image: the join query, not reproduced in this text]

Which of the following will be returned by the above query?

[Image: result tables for Options A to E, not reproduced in this text]

Options:
A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Questions 34

A data engineer is decommissioning a sandbox schema in Unity Catalog. Some tables are ephemeral staging outputs that can be safely removed entirely, but a few tables point at shared cloud storage used by downstream jobs outside Databricks. The engineer must avoid deleting any shared files when cleaning up catalog objects.

How does Unity Catalog behave when dropping Managed vs External tables?

Options:
A.

Drop all tables; Databricks will only remove metadata for both managed and external tables

B.

Drop managed tables that are ephemeral and drop external tables; files for both remain for 7 days

C.

Drop managed staging tables to remove data and metadata, and drop external tables to remove only metadata

D.

Drop external tables first to delete their files, then drop managed tables to keep their files for recovery
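
Before dropping anything in a schema like this, it can help to see which tables the catalog reports as managed and which as external. The sketch below groups table names by type using PySpark's `spark.catalog.listTables`. The helper name `tables_by_type` is hypothetical, and the code assumes an existing `spark` session and that the schema name you pass is resolvable in the current catalog.

```python
def tables_by_type(spark, schema):
    """Group the table names in `schema` by the type the catalog reports.

    `spark` is assumed to be an active SparkSession. The keys are whatever
    `tableType` strings the catalog returns, e.g. "MANAGED" or "EXTERNAL".
    """
    grouped = {}
    for table in spark.catalog.listTables(schema):
        grouped.setdefault(table.tableType, []).append(table.name)
    return grouped
```

In a notebook this might be called as `tables_by_type(spark, "sandbox")`, where `sandbox` is a placeholder schema name.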

Questions 35

A data engineer is migrating pipeline tasks to reduce operational toil. The workspace uses Unity Catalog and is in a region that supports serverless. The engineer wants Databricks to auto-select instance types, manage scaling, apply Photon, and handle runtime upgrades automatically for job runs.

How should the data engineer meet this requirement while adhering to Databricks constraints?

Options:
A.

Use a Pro SQL warehouse and schedule Python notebook tasks to execute as pipeline steps.

B.

Use an all-purpose cluster with cluster policies to enforce standard sizes and enable autoscaling.

C.

Create a job with a single-task job cluster and manually set the instance families and minimum/maximum workers.

D.

Run the job using serverless compute for workflows, ensuring Unity Catalog is enabled and regional support is available.

Questions 36

A data engineer and a data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

Options:
A.

The pipeline can have different notebook sources in SQL & Python.

B.

The pipeline will need to be written entirely in SQL.

C.

The pipeline will need to be written entirely in Python.

D.

The pipeline will need to use a batch source in place of a streaming source.

Questions 37

A Delta Live Tables pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Development mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:
A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.

C.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

E.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

Questions 38

Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta Live Tables (DLT) tables using SQL?

Options:
A.

CREATE STREAMING LIVE TABLE should be used when the subsequent step in the DLT pipeline is static.

B.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed incrementally.

C.

CREATE STREAMING LIVE TABLE is redundant for DLT and it does not need to be used.

D.

CREATE STREAMING LIVE TABLE should be used when data needs to be processed through complicated aggregations.

E.

CREATE STREAMING LIVE TABLE should be used when the previous step in the DLT pipeline is static.

Questions 39

A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.

The code block used by the data engineer is below:

[Image: code block with a blank to fill in, not reproduced in this text]

Which line of code should the data engineer use to fill in the blank if the data engineer only wants the query to execute a micro-batch to process data every 5 seconds?

Options:
A.

trigger("5 seconds")

B.

trigger(continuous="5 seconds")

C.

trigger(once="5 seconds")

D.

trigger(processingTime="5 seconds")
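
For reference, here is a sketch of how a fixed-interval micro-batch trigger is written with PySpark's `DataStreamWriter`. The engineer's actual code block is an image and is not reproduced here, so the helper name `start_five_second_stream` and its three parameters are hypothetical. The code assumes `stream_df` is a streaming DataFrame, for example one returned by `spark.readStream.table(...)`.

```python
def start_five_second_stream(stream_df, checkpoint_path, target_table):
    """Start a streaming write that runs a micro-batch every 5 seconds.

    All parameter names are illustrative. `stream_df` is assumed to be a
    streaming DataFrame; `checkpoint_path` and `target_table` are strings.
    """
    return (
        stream_df.writeStream
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        # processingTime sets a fixed interval between micro-batches.
        .trigger(processingTime="5 seconds")
        # toTable starts the query and returns a StreamingQuery handle.
        .toTable(target_table)
    )
```

`trigger` accepts its setting as a keyword argument, which is why the interval is passed by name rather than positionally.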

Questions 40

A Databricks workflow fails at the last stage due to an error in a notebook. This workflow runs daily. The data engineer fixes the mistake and wants to rerun the pipeline. This workflow is very costly and time-intensive to run.

Which action should the data engineer take in order to minimise downtime and cost?

Options:
A.

Switch to another cluster

B.

Repair run

C.

Re-run the entire workflow

D.

Restart the cluster