
Free Databricks Databricks-Certified-Professional-Data-Engineer Practice Exam with Questions & Answers | Set: 3

Questions 21

A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are configured into groups, which are used for setting up data access using ACLs.

The user_ltv table has the following schema:

[Figure: schema of the user_ltv table (image not reproduced)]

An analyst who is not a member of the auditing group executes the following query:

[Figure: query text (image not reproduced)]

Which result will be returned by this query?

Options:
A.

All columns will be displayed normally for those records that have an age greater than 18; records not meeting this condition will be omitted.

B.

All columns will be displayed normally for those records that have an age greater than 17; records not meeting this condition will be omitted.

C.

All age values less than 18 will be returned as null values; all other columns will be returned with the values in user_ltv.

D.

All records from all columns will be displayed with the values in user_ltv.

Questions 22

A data engineer needs to install the PyYAML Python package within an air-gapped Databricks environment. The workspace has no direct internet access to PyPI. The engineer has downloaded the .whl file locally and wants it available automatically on all new clusters.

Which approach should the data engineer use?

Options:
A.

Upload the PyYAML .whl file to the user home directory and create a cluster-scoped init script to install it.

B.

Upload the PyYAML .whl file to a Unity Catalog Volume, ensure it’s allow-listed, and create a cluster-scoped init script that installs it from that path.

C.

Set up a private PyPI repository and install via pip index URL.

D.

Add the .whl file to Databricks Git Repos and assume automatic installation.

Questions 23

The business reporting team requires that data for their dashboards be updated every hour. The pipeline that extracts, transforms, and loads the data for their dashboards has a total processing time of 10 minutes. Assuming normal operating conditions, which configuration will meet their service-level agreement requirements with the lowest cost?

Options:
A.

Schedule a job to execute the pipeline once an hour on a dedicated interactive cluster.

B.

Schedule a job to execute the pipeline once an hour on a new job cluster.

C.

Schedule a Structured Streaming job with a trigger interval of 60 minutes.

D.

Configure a job that executes every time new data lands in a given directory.
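
The cost side of this question comes down to how long compute is billed relative to how long the pipeline actually runs. A rough sketch of that arithmetic follows. The hourly rate is a hypothetical placeholder, cluster startup time is ignored, and both cluster types are assumed to bill at the same rate, which may not hold in practice.

```python
# Hypothetical rate per cluster-hour; real pricing depends on cloud, instance type, and plan.
RATE_PER_CLUSTER_HOUR = 1.0
RUN_MINUTES = 10    # from the question: one pipeline run takes 10 minutes
RUNS_PER_DAY = 24   # from the question: dashboards are refreshed every hour


def daily_cost_always_on(rate=RATE_PER_CLUSTER_HOUR):
    """Cost of a cluster that stays up all day, whether or not the pipeline is running."""
    return 24 * rate


def daily_cost_job_cluster(rate=RATE_PER_CLUSTER_HOUR,
                           run_minutes=RUN_MINUTES,
                           runs_per_day=RUNS_PER_DAY):
    """Cost of a cluster that exists only for the duration of each run (startup ignored)."""
    billed_hours = runs_per_day * run_minutes / 60
    return billed_hours * rate


if __name__ == "__main__":
    print("always-on cluster-hours per day:", daily_cost_always_on())
    print("per-run cluster-hours per day:  ", daily_cost_job_cluster())
```

Under these assumptions, compute that lives only for each 10-minute run is billed for roughly one sixth of the hours of a cluster that stays up around the clock.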

Questions 24

A member of the data engineering team has submitted a short notebook that they wish to schedule as part of a larger data pipeline. Assume that the commands provided below produce the logically correct results when run as presented.

[Figure: notebook commands (image not reproduced)]

Which command should be removed from the notebook before scheduling it as a job?

Options:
A.

Cmd 2

B.

Cmd 3

C.

Cmd 4

D.

Cmd 5

E.

Cmd 6

Questions 25

A data engineering team needs to implement a tagging system for their tables as part of an automated ETL process, and needs to apply tags programmatically to tables in Unity Catalog.

Which SQL command adds tags to a table programmatically?

Options:
A.

ALTER TABLE table_name SET TAGS ('key1' = 'value1', 'key2' = 'value2');

B.

APPLY TAGS ON table_name VALUES ('key1' = 'value1', 'key2' = 'value2');

C.

COMMENT ON TABLE table_name TAGS ('key1' = 'value1', 'key2' = 'value2');

D.

SET TAGS FOR table_name AS ('key1' = 'value1', 'key2' = 'value2');
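
For an automated ETL process, the tagging statement would typically be generated from a mapping of keys to values. The sketch below builds an `ALTER TABLE ... SET TAGS` statement as a string. The helper name `build_set_tags_sql` and the table name are hypothetical, and the helper only constructs text; running it against Unity Catalog (for example through `spark.sql` in a Databricks session) is left out.

```python
from typing import Dict


def build_set_tags_sql(table_name: str, tags: Dict[str, str]) -> str:
    """Build an ALTER TABLE ... SET TAGS statement (hypothetical helper, string only)."""
    if not tags:
        raise ValueError("at least one tag is required")
    pairs = []
    for key, value in tags.items():
        # This sketch rejects quotes rather than guessing at escaping rules.
        if "'" in key or "'" in value:
            raise ValueError("single quotes are not supported in this sketch")
        pairs.append(f"'{key}' = '{value}'")
    return f"ALTER TABLE {table_name} SET TAGS ({', '.join(pairs)})"


# Hypothetical table name and tags, for illustration only.
statement = build_set_tags_sql("main.sales.orders", {"owner": "etl", "tier": "gold"})

if __name__ == "__main__":
    print(statement)
```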

Questions 26

To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.

The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.

Which of the solutions addresses the situation while minimally interrupting other teams in the organization without increasing the number of tables that need to be managed?

Options:
A.

Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.

B.

Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.

C.

Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.

D.

Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.

E.

Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.

Questions 27

What is true for Delta Lake?

Options:
A.

Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.

B.

Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in data skipping based on query filters.

C.

Z-ORDER can only be applied to numeric values stored in Delta Lake tables.

D.

Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.
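
Option B refers to data skipping based on per-file column statistics. The general idea can be sketched as follows: if each data file records the minimum and maximum value of a column, a query filter can rule out whole files without reading them. This is a simplified illustration with hypothetical file names and values, not Delta Lake's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileStats:
    path: str
    # column name -> (min value, max value) observed in that file
    min_max: Dict[str, Tuple[int, int]]


def files_to_scan(files: List[FileStats], column: str, lower_bound: int) -> List[str]:
    """Return the files that may contain rows where column > lower_bound."""
    selected = []
    for f in files:
        stats = f.min_max.get(column)
        # Without statistics for the column, the file cannot be ruled out.
        if stats is None or stats[1] > lower_bound:
            selected.append(f.path)
    return selected


# Hypothetical files and statistics.
files = [
    FileStats("part-000", {"age": (1, 17)}),
    FileStats("part-001", {"age": (18, 40)}),
    FileStats("part-002", {"age": (30, 90)}),
    FileStats("part-003", {}),  # no statistics collected for "age"
]

to_scan = files_to_scan(files, "age", 25)

if __name__ == "__main__":
    print(to_scan)
```

Only `part-000` is skipped for the filter `age > 25`, because its maximum value is below the bound; the file without statistics still has to be read.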

Questions 28

Although the Databricks Utilities Secrets module provides tools to store sensitive credentials and avoid accidentally displaying them in plain text, users should still be careful about which credentials are stored here and which users have access to these secrets.

Which statement describes a limitation of Databricks Secrets?

Options:
A.

Because the SHA256 hash is used to obfuscate stored secrets, reversing this hash will display the value in plain text.

B.

Account administrators can see all secrets in plain text by logging on to the Databricks Accounts console.

C.

Secrets are stored in an administrators-only table within the Hive Metastore; database administrators have permission to query this table by default.

D.

Iterating through a stored secret and printing each character will display secret contents in plain text.

E.

The Databricks REST API can be used to list secrets in plain text if the personal access token has proper credentials.
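
Option D describes printing a secret one character at a time. A toy model helps show why output redaction based on matching the secret's literal value has limits. Everything below is an assumption made for illustration: the `redact` function, the marker string, and the secret value are hypothetical and do not reproduce how Databricks processes notebook output.

```python
REDACTION_MARKER = "[REDACTED]"


def redact(output: str, secret: str) -> str:
    """Toy model: hide only exact, contiguous occurrences of the secret."""
    return output.replace(secret, REDACTION_MARKER)


secret = "s3cr3t-token"  # hypothetical value

# Printing the secret as a whole: the literal match is found and hidden.
whole = redact(f"value: {secret}", secret)

# Printing it character by character: no contiguous match, so nothing is hidden.
spaced = redact(" ".join(secret), secret)

if __name__ == "__main__":
    print(whole)
    print(spaced)
```

In this model, any user who can read the secret in code can also transform it before printing, which is why access to secrets should be granted carefully.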

Questions 29

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.

The following logic is used to process these records.

MERGE INTO customers
USING (
  SELECT updates.customer_id AS merge_key, updates.*
  FROM updates
  UNION ALL
  SELECT NULL AS merge_key, updates.*
  FROM updates JOIN customers
  ON updates.customer_id = customers.customer_id
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customer_id = merge_key
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, end_date = staged_updates.effective_date
WHEN NOT MATCHED THEN
  INSERT (customer_id, address, current, effective_date, end_date)
  VALUES (staged_updates.customer_id, staged_updates.address, true, staged_updates.effective_date, null)

Which statement describes this implementation?

Options:
A.

The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

B.

The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

C.

The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

D.

The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.
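
To see what the MERGE logic does to the table, it can help to trace it on a small example. The sketch below imitates the statement's row-level effect in plain Python on lists of dictionaries. It assumes at most one update per customer in a batch, and the sample rows are hypothetical; it is a simulation for reasoning about the logic, not Delta Lake code.

```python
from typing import Dict, List


def apply_updates(customers: List[Dict], updates: List[Dict]) -> List[Dict]:
    """Imitate the MERGE: close changed current rows and insert new versions."""
    result = [dict(row) for row in customers]  # leave the input untouched
    for upd in updates:
        rows = [r for r in result if r["customer_id"] == upd["customer_id"]]
        current = next((r for r in rows if r["current"]), None)
        # Known customer with no current row, or an unchanged address: no action.
        if rows and (current is None or current["address"] == upd["address"]):
            continue
        if current is not None:
            # WHEN MATCHED branch: mark the old row as no longer current.
            current["current"] = False
            current["end_date"] = upd["effective_date"]
        # WHEN NOT MATCHED branch: insert the new row as the current version.
        result.append({
            "customer_id": upd["customer_id"],
            "address": upd["address"],
            "current": True,
            "effective_date": upd["effective_date"],
            "end_date": None,
        })
    return result


# Hypothetical sample data.
customers = [{"customer_id": 1, "address": "A St", "current": True,
              "effective_date": "2024-01-01", "end_date": None}]
updates = [
    {"customer_id": 1, "address": "B Ave", "effective_date": "2024-06-01"},
    {"customer_id": 2, "address": "C Rd", "effective_date": "2024-06-01"},
]

merged = apply_updates(customers, updates)

if __name__ == "__main__":
    for row in merged:
        print(row)
```

In the sample run, customer 1 ends up with two rows (the old address marked as no longer current with an end date, plus a new current row) and customer 2 is inserted as a new current row.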

Questions 30

The data engineering team is configuring environments for development, testing, and production before beginning migration of a new data pipeline. The team requires extensive testing of both the code and the data resulting from code execution, and the team wants to develop and test against data as similar to production data as possible.

A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.

Which statement captures best practices for this situation?

Options:
A.

Because access to production data will always be verified using passthrough credentials, it is safe to mount data to any Databricks development environment.

B.

All development, testing, and production code and data should exist in a single unified workspace; creating separate environments for testing and development further reduces risks.

C.

In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.

D.

Because Delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data; as such, it is generally safe to mount production data anywhere.