
Free Databricks Databricks-Certified-Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 3

Question 21

A data engineer is developing an ETL process based on Spark SQL. The execution fails. The data engineer checks the Spark UI and sees the following errors:

[Image: Spark UI error output for Question 21]

Which two corrective actions should the data engineer perform to resolve this issue?

Choose 2 answers.

Options:
A.

Upsize the worker nodes and activate autoshuffle partitions

B.

Upsize the driver node and deactivate autoshuffle partitions

C.

Cache the dataset in order to boost the query performance

D.

Fix the shuffle partitions to 50 to ensure the allocation

E.

Narrow the filters in order to collect less data in the query
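
For context on options D and E, a minimal PySpark sketch (illustrative table and values only; assumes an existing SparkSession named spark):

# Pin the shuffle partition count so wide operations shuffle into 50 partitions
spark.conf.set("spark.sql.shuffle.partitions", 50)

# Narrowing filters early reduces the data that reaches the shuffle
orders = spark.table("orders").filter("order_date >= '2024-01-01'")
result = orders.groupBy("region").count()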

Question 22

Which file format is used for storing a Delta Lake table?

Options:
A.

Parquet

B.

Delta

C.

CSV

D.

JSON
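
As background for this question: a Delta Lake table keeps its data in Parquet files alongside a _delta_log directory of JSON commit files. A minimal Databricks notebook sketch (the path is hypothetical; assumes dbutils is available):

# List the contents of a Delta table directory (hypothetical path)
for f in dbutils.fs.ls("/mnt/data/my_delta_table"):
    print(f.name)  # e.g. _delta_log/, part-00000-...snappy.parquet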

Question 23

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

Options:
A.

None of these changes will need to be made

B.

The pipeline will need to stop using the medallion-based multi-hop architecture

C.

The pipeline will need to be written entirely in SQL

D.

The pipeline will need to use a batch source in place of a streaming source

E.

The pipeline will need to be written entirely in Python

Question 24

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:
A.

Records that violate the expectation are dropped from the target dataset and loaded into a quarantine table.

B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

E.

Records that violate the expectation cause the job to fail.
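
For reference, the same drop-on-violation expectation expressed with the Delta Live Tables Python API (a sketch; the table and source names are hypothetical):

import dlt

@dlt.table
@dlt.expect_or_drop("valid_timestamp", "timestamp > '2020-01-01'")
def events_clean():
    # Rows failing the expectation are dropped from the target and
    # counted as invalid in the pipeline's event log
    return dlt.read_stream("events_raw")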

Question 25

Which of the following describes a scenario in which a data team will want to utilize cluster pools?

Options:
A.

An automated report needs to be refreshed as quickly as possible.

B.

An automated report needs to be made reproducible.

C.

An automated report needs to be tested to identify errors.

D.

An automated report needs to be version-controlled across multiple collaborators.

E.

An automated report needs to be runnable by all stakeholders.

Question 26

A data engineer has joined an existing project and sees the following query in the project repository:

CREATE STREAMING LIVE TABLE loyal_customers AS
SELECT customer_id
FROM STREAM(LIVE.customers)
WHERE loyalty_level = 'high';

Which of the following describes why the STREAM function is included in the query?

Options:
A.

The STREAM function is not needed and will cause an error.

B.

The table being created is a live table.

C.

The customers table is a streaming live table.

D.

The customers table is a reference to a Structured Streaming query on a PySpark DataFrame.

E.

The data in the customers table has been updated since its last run.
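
For comparison, a rough Python equivalent of the query above (a sketch using the DLT Python API): STREAM(LIVE.customers) marks customers as a streaming live table to be read incrementally, which dlt.read_stream mirrors in Python.

import dlt

@dlt.table
def loyal_customers():
    # Reading customers as a stream, matching STREAM(LIVE.customers) in SQL
    return (
        dlt.read_stream("customers")
        .where("loyalty_level = 'high'")
        .select("customer_id")
    )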

Question 27

What Databricks feature can be used to check the data sources and tables used in a workspace?

Options:
A.

Do not use the lineage feature as it only tracks activity from the last 3 months and will not provide full details on dependencies.

B.

Use the lineage feature to visualize a graph that highlights where the table is used only in notebooks.

C.

Use the lineage feature to visualize a graph that highlights where the table is used only in reports.

D.

Use the lineage feature to visualize a graph that shows all dependencies, including where the table is used in notebooks, other tables, and reports.

Question 28

A single Job runs two notebooks as two separate tasks. A data engineer has noticed that one of the notebooks is running slowly in the Job’s current run. The data engineer asks a tech lead for help in identifying why this might be the case.

Which of the following approaches can the tech lead use to identify why the notebook is running slowly as part of the Job?

Options:
A.

They can navigate to the Runs tab in the Jobs UI to immediately review the processing notebook.

B.

They can navigate to the Tasks tab in the Jobs UI and click on the active run to review the processing notebook.

C.

They can navigate to the Runs tab in the Jobs UI and click on the active run to review the processing notebook.

D.

There is no way to determine why a Job task is running slowly.

E.

They can navigate to the Tasks tab in the Jobs UI to immediately review the processing notebook.

Question 29

A data engineer has been provided a PySpark DataFrame named df with columns product and revenue. The data engineer needs to compute complex aggregations to determine each product's total revenue, average revenue, and transaction count.

Which code snippet should the data engineer use?

A) [Image: code snippet for Option A]

B) [Image: code snippet for Option B]

C) [Image: code snippet for Option C]

D) [Image: code snippet for Option D]

Options:
A.

Option A

B.

Option B

C.

Option C

D.

Option D
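
The answer options above are shown as images, so here is a hedged sketch of the kind of grouped aggregation the question describes (standard PySpark; not necessarily the exact option text):

from pyspark.sql import functions as F

summary = df.groupBy("product").agg(
    F.sum("revenue").alias("total_revenue"),
    F.avg("revenue").alias("avg_revenue"),
    F.count("*").alias("transaction_count"),
)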

Question 30

A data engineer is maintaining ETL pipeline code in a GitHub repository linked to their Databricks account. The data engineer wants to deploy the ETL pipeline to production as a Databricks Workflow.

Which approach should the data engineer use?

Options:
A.

Databricks Asset Bundles (DAB) + GitHub Integration

B.

Maintain workflow_config.json and deploy it using the Databricks CLI

C.

Manually create and manage the workflow in the UI

D.

Maintain workflow_config.json and deploy it using Terraform