
Free Databricks Databricks-Certified-Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 4

Question 31

Which of the following benefits is provided by the array functions from Spark SQL?

Options:
A.

An ability to work with data in a variety of types at once

B.

An ability to work with data within certain partitions and windows

C.

An ability to work with time-related data in specified intervals

D.

An ability to work with complex, nested data ingested from JSON files

E.

An ability to work with an array of tables for procedural automation
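
For context, Spark SQL's array functions (often combined with higher-order functions) are primarily aimed at complex, nested collections such as arrays parsed out of ingested JSON. A minimal sketch, where the raw_events table and its order_id, items, and sku fields are purely illustrative:

-- items is assumed to be ARRAY<STRUCT<sku: STRING, qty: INT>> parsed from JSON
SELECT order_id,
       explode(items) AS item                                              -- one output row per array element
FROM raw_events;

SELECT order_id,
       array_contains(transform(items, i -> i.sku), 'SKU-123') AS has_sku  -- membership test on a derived array
FROM raw_events;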

Question 32

A data engineer is designing an ETL pipeline to process both streaming and batch data from multiple sources. The pipeline must ensure data quality, handle schema evolution, and remain easy to maintain. The team is considering using Delta Live Tables (DLT) in Databricks to achieve these goals and wants to understand the key features and benefits of DLT that make it suitable for this use case.

Why is Delta Live Tables (DLT) an appropriate choice?

Options:
A.

Automatic data quality checks, built-in support for schema evolution, and declarative pipeline development

B.

Manual schema enforcement, high operational overhead, and limited scalability

C.

Requires custom code for data quality checks, no support for streaming data, and complex pipeline maintenance

D.

Supports only batch processing, no data versioning, and high infrastructure costs
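
For a rough feel of the declarative style, built-in expectations, and schema handling described above, a DLT SQL definition might look like the sketch below; the storage path, table names, and order_id column are assumptions:

-- Bronze: incremental ingest with Auto Loader (schema inferred and evolved automatically)
CREATE OR REFRESH STREAMING LIVE TABLE bronze_orders
AS SELECT * FROM cloud_files('/mnt/raw/orders', 'json');

-- Silver: declarative data quality check that drops invalid rows
CREATE OR REFRESH STREAMING LIVE TABLE silver_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.bronze_orders);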

Question 33

Which of the following commands will return the number of null values in the member_id column?

Options:
A.

SELECT count(member_id) FROM my_table;

B.

SELECT count(member_id) - count_null(member_id) FROM my_table;

C.

SELECT count_if(member_id IS NULL) FROM my_table;

D.

SELECT null(member_id) FROM my_table;

E.

SELECT count_null(member_id) FROM my_table;
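
For reference, two equivalent ways to count NULL values in Databricks SQL, using the my_table and member_id names from the question:

SELECT count_if(member_id IS NULL) FROM my_table;   -- counts rows where the predicate is true
SELECT count(*) - count(member_id) FROM my_table;   -- count(member_id) skips NULLs, count(*) does not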

Question 34

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Development mode using Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

Options:
A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.

C.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

E.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.
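
For context, the two dataset types referenced in the question are declared roughly as follows in DLT SQL; the source table and column names are illustrative:

-- Streaming table: processes new records incrementally from a streaming source
CREATE OR REFRESH STREAMING LIVE TABLE events_raw
AS SELECT * FROM STREAM(sales.events);

-- Live table (materialized view): recomputed from its sources on each update
CREATE OR REFRESH LIVE TABLE daily_summary
AS SELECT event_date, count(*) AS n
FROM LIVE.events_raw
GROUP BY event_date;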

Question 35

A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being dropped due to quality concerns at some point in the DLT pipeline. They would like to determine at which table in their pipeline the data is being dropped.

Which of the following approaches can the data engineer take to identify the table that is dropping the records?

Options:
A.

They can set up separate expectations for each table when developing their DLT pipeline.

B.

They cannot determine which table is dropping the records.

C.

They can set up DLT to notify them via email when records are dropped.

D.

They can navigate to the DLT pipeline page, click on each table, and view the data quality statistics.

E.

They can navigate to the DLT pipeline page, click on the “Error” button, and review the present errors.

Question 36

Which of the following must be specified when creating a new Delta Live Tables pipeline?

Options:
A.

A key-value pair configuration

B.

The preferred DBU/hour cost

C.

A path to cloud storage location for the written data

D.

A location of a target database for the written data

E.

At least one notebook library to be executed

Question 37

A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in the array column employees of the table stores. The custom logic should create a new column exp_employees that is an array of all employees with more than 5 years of experience for each row. To apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.

Which of the following code blocks successfully completes this task?

[Image: answer-choice code blocks for Question 37]

Options:
A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E
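
The answer choices are in the image above; as a hedged sketch of the general pattern, FILTER takes the array column and a lambda and returns only the matching elements. The store_id column and the years_of_experience struct field are assumptions here:

SELECT store_id,
       employees,
       FILTER(employees, e -> e.years_of_experience > 5) AS exp_employees
FROM stores;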

Question 38

A data engineer needs to provide access to a group named manufacturing-team. The team needs privileges to create tables in the production schema.

Which set of SQL commands grants the manufacturing-team group the ability to create tables in the schema named production, under the parent catalog named manufacturing, with the least privileges?

A) [Image: SQL statements for Option A]

B) [Image: SQL statements for Option B]

C) [Image: SQL statements for Option C]

D) [Image: SQL statements for Option D]

Options:
A.

Option A

B.

Option B

C.

Option C

D.

Option D
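
The exact statements are shown in the option images above. For orientation only: creating a table in a Unity Catalog schema generally requires USE CATALOG on the catalog plus USE SCHEMA and CREATE TABLE on the schema, roughly as sketched here:

-- Minimal set of privileges for creating tables in manufacturing.production
GRANT USE CATALOG ON CATALOG manufacturing TO `manufacturing-team`;
GRANT USE SCHEMA ON SCHEMA manufacturing.production TO `manufacturing-team`;
GRANT CREATE TABLE ON SCHEMA manufacturing.production TO `manufacturing-team`;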

Question 39

A data organization leader is upset about the data analysis team’s reports being different from the data engineering team’s reports. The leader believes the siloed nature of their organization’s data engineering and data analysis architectures is to blame.

Which of the following describes how a data lakehouse could alleviate this issue?

Options:
A.

Both teams would autoscale their work as data size evolves

B.

Both teams would use the same source of truth for their work

C.

Both teams would reorganize to report to the same department

D.

Both teams would be able to collaborate on projects in real-time

E.

Both teams would respond more quickly to ad-hoc requests

Question 40

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?

Options:
A.

The VACUUM command was run on the table

B.

The TIME TRAVEL command was run on the table

C.

The DELETE HISTORY command was run on the table

D.

The OPTIMIZE command was run on the table

E.

The HISTORY command was run on the table
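
For background, Delta time travel depends on the old data files still being present, and VACUUM permanently removes files that fall outside the retention window. A small sketch with an illustrative table name, timestamp, and version number:

VACUUM my_table RETAIN 168 HOURS;                      -- removes unreferenced files older than 7 days
SELECT * FROM my_table TIMESTAMP AS OF '2024-01-01';   -- time travel fails if the needed files were vacuumed
RESTORE TABLE my_table TO VERSION AS OF 42;            -- likewise requires the old files to still exist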