Free Databricks Databricks-Certified-Professional-Data-Engineer Practice Exam with Questions & Answers | Set: 2

Questions 11

A data engineer is designing a system to process batch patient encounter data stored in an S3 bucket, creating a Delta table (patient_encounters) with columns encounter_id, patient_id, encounter_date, diagnosis_code, and treatment_cost. The table is queried frequently by patient_id and encounter_date, requiring fast performance. Fine-grained access controls must be enforced. The engineer wants to minimize maintenance and boost performance.

How should the data engineer create the patient_encounters table?

Options:

Create an external table in Unity Catalog, specifying an S3 location for the data files. Enable predictive optimization through table properties, and configure Unity Catalog permissions for access controls.

Create a managed table in Unity Catalog. Configure Unity Catalog permissions for access controls, and rely on predictive optimization to enhance query performance and simplify maintenance.

Create a managed table in Unity Catalog. Configure Unity Catalog permissions for access controls, and schedule jobs to run OPTIMIZE and VACUUM commands daily to achieve the best performance.

Create a managed table in Hive Metastore. Configure Hive Metastore permissions for access controls, and rely on predictive optimization to enhance query performance and simplify maintenance.

Questions 12

A nightly job ingests data into a Delta Lake table using the following code:

[Figure: code listing for Question 12 (image not available)]

The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline.

Which code snippet completes this function definition?

def new_records():

Options:

return spark.readStream.table("bronze")

return spark.readStream.load("bronze")

return spark.read.option("readChangeFeed", "true").table("bronze")

Questions 13

An hourly batch job is configured to ingest data files from a cloud object storage container, where each batch represents all records produced by the source system in a given hour. The batch job that processes these records into the Lakehouse is delayed long enough to ensure no late-arriving data is missed. The user_id field represents a unique key for the data, which has the following schema:

user_id BIGINT, username STRING, user_utc STRING, user_region STRING, last_login BIGINT, auto_pay BOOLEAN, last_updated BIGINT

New records are all ingested into a table named account_history, which maintains a full record of all data in the same schema as the source. The next table in the system is named account_current and is implemented as a Type 1 table representing the most recent value for each unique user_id.

Assuming there are millions of user accounts and tens of thousands of records processed hourly, which implementation can be used to efficiently update the described account_current table as part of each hourly batch job?

Options:

Use Auto Loader to subscribe to new files in the account_history directory; configure a Structured Streaming trigger-once job to batch update newly detected files into the account_current table.

Overwrite the account_current table with each batch using the results of a query against the account_history table, grouping by user_id and filtering for the max value of last_updated.

Filter records in account_history using the last_updated field and the most recent hour processed, as well as the max last_login by user_id; write a merge statement to update or insert the most recent value for each user_id.

Use Delta Lake version history to get the difference between the latest version of account_history and one version prior, then write these records to account_current.

Filter records in account_history using the last_updated field and the most recent hour processed, making sure to deduplicate on username; write a merge statement to update or insert the most recent value for each username.

Questions 14

A data team's Structured Streaming job is configured to calculate running aggregates for item sales to update a downstream marketing dashboard. The marketing team has introduced a new field to track the number of times a promotion code is used for each item. A junior data engineer suggests updating the existing query as follows. Note that proposed changes are in bold.

[Figure: code listing for Question 14 (image not available)]

Which step must also be completed to put the proposed query into production?

Options:

Increase the shuffle partitions to account for additional aggregates

Specify a new checkpointLocation

Run REFRESH TABLE delta.`/item_agg`

Remove .option('mergeSchema', 'true') from the streaming write

Questions 15

A DLT pipeline includes the following streaming tables:

raw_iot ingests raw device measurement data from a heart rate tracking device.

bpm_stats incrementally computes user statistics based on BPM measurements from raw_iot.

How can the data engineer configure this pipeline to be able to retain manually deleted or updated records in the raw_iot table while recomputing the downstream table when a pipeline update is run?

Options:

Set the skipChangeCommits flag to true on bpm_stats

Set the skipChangeCommits flag to true on raw_iot

Set the pipelines.reset.allowed property to false on bpm_stats

Set the pipelines.reset.allowed property to false on raw_iot

Questions 16

A data engineer manages a production Lakeflow Declarative Pipeline that processes customer transaction data. The pipeline includes several data quality expectations such as transaction_amount > 0 and customer_id IS NOT NULL. These expectations are defined using the EXPECT clause in SQL.

The engineer aims to monitor the pipeline’s data quality by analyzing the number of records that passed or failed each expectation during the latest pipeline update. The Lakeflow Declarative Pipelines event logs are stored in a Delta table named event_log_table.

For the most recent pipeline update, determine a programmatically appropriate approach to extract information like the name of each expectation, associated dataset, count of records that passed the expectation, and count of records that failed the expectation.

Which method retrieves the desired data quality metrics from the Lakeflow Declarative Pipelines event log?

Options:

Access the event_log_table, filter for events where event_type = 'flow_progress', and parse the details.flow_progress.data_quality.expectations field to extract the required metrics.

Use the Lakeflow Declarative Pipelines UI to navigate to the specific pipeline, select the dataset, and view the Data Quality tab to manually retrieve the expectation metrics.

Query the event_log_table for events with event_type = 'data_quality' and directly select the passed_records and failed_records fields.

Access the event_log_table, filter for events where event_type = 'expectation_result', and extract the expectation metrics from the details field.

Questions 17

A data engineer is designing a pipeline in Databricks that processes records from a Kafka stream where late-arriving data is common.

Which approach should the data engineer use?

Options:

Implement a custom solution using Databricks Jobs to periodically reprocess all historical data.

Use batch processing and overwrite the entire output table each time to ensure late data is incorporated correctly.

Use an Auto CDC pipeline with batch tables to simplify late data handling.

Use a watermark to specify the allowed lateness to accommodate records that arrive after their expected window, ensuring correct aggregation and state management.

Questions 18

A platform team is creating a standardized template for Databricks Asset Bundles to support CI/CD. The template must specify defaults for artifacts, workspace root paths, and a run identity, while allowing a “dev” target to be the default and override specific paths.

How should the team use databricks.yml to satisfy these requirements?

Options:

Use deployment, builds, context, identity, and environments; set dev as default environment and override paths under builds.

Use roots, modules, profiles, actor, and targets; where profiles contain workspace and artifacts defaults and actor sets run identity.

Use project, packages, environment, identity, and stages; set dev as default stage and override workspace under environment.

Use bundle, artifacts, workspace, run_as, and targets at the top level; set one target with default: true and override workspace paths or artifacts under that target.

Answer: The fourth option (bundle, artifacts, workspace, run_as, and targets at the top level).

Explanation:

In Databricks Asset Bundles, the databricks.yml file holds the top-level configuration mappings, including bundle, artifacts, workspace, run_as, and targets. The targets section defines specific deployment contexts (for example, dev, test, prod). Setting default: true on a target marks it as the default target. Overrides for workspace paths and artifact configurations can be defined inside each target while the defaults stay at the top level.

Reference Source: Databricks Asset Bundle Configuration Guide – “Structure of databricks.yml and target overrides.”
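
As a rough illustration of the layout described in the explanation, the sketch below models a databricks.yml file as a plain Python dictionary and resolves the default target by layering its overrides on top of the top-level defaults. The bundle name, paths, and service principal ID are hypothetical, and the deep_merge helper only approximates how target overrides behave; it is not the Databricks CLI's actual implementation.

```python
from copy import deepcopy

# Hypothetical bundle configuration mirroring the top-level mappings named in
# the explanation: bundle, artifacts, workspace, run_as, and targets.
BUNDLE_CONFIG = {
    "bundle": {"name": "example_bundle"},
    "artifacts": {"default": {"type": "whl", "path": "."}},
    "workspace": {"root_path": "/Workspace/Shared/.bundle/example_bundle"},
    "run_as": {"service_principal_name": "00000000-0000-0000-0000-000000000000"},
    "targets": {
        "dev": {
            "default": True,  # marks dev as the default target
            "workspace": {"root_path": "/Workspace/Users/someone/.bundle/dev"},
        },
        "prod": {},  # no overrides: inherits every top-level default
    },
}


def deep_merge(base, override):
    """Return a copy of base with override layered on top, recursing into dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def default_target_name(config):
    """Return the name of the single target flagged with default: true."""
    names = [n for n, t in config["targets"].items() if t.get("default")]
    if len(names) != 1:
        raise ValueError("expected exactly one default target")
    return names[0]


def resolve_target(config, name=None):
    """Combine the top-level defaults with the overrides of one target."""
    name = name or default_target_name(config)
    defaults = {k: v for k, v in config.items() if k != "targets"}
    overrides = {k: v for k, v in config["targets"][name].items() if k != "default"}
    return name, deep_merge(defaults, overrides)


name, resolved = resolve_target(BUNDLE_CONFIG)
print(name, resolved["workspace"]["root_path"])
```

With these assumed values, the dev target replaces only workspace.root_path, while artifacts and run_as fall through from the top level; resolving prod returns the top-level defaults unchanged.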

====================

QUESTION NO: 31

A data engineer inherits a Delta table with historical partitions by country that are badly skewed. Queries often filter by high-cardinality customer_id and vary across dimensions over time. The engineer wants a strategy that avoids a disruptive full rewrite, reduces sensitivity to skewed partitions, and sustains strong query performance as access patterns evolve.

Which two actions should the data engineer take? (Choose 2)

A. Keep existing partitions and rely on bin-packing OPTIMIZE only; ZORDER and clustering are unnecessary for multi-dimensional filters.

B. Periodically run OPTIMIZE table_name.

C. Disable data skipping statistics to avoid maintenance overhead; rely on adaptive query execution instead.

D. Depend solely on optimized writes; Databricks will automatically replace partitioning with clustering over time.

E. Switch from static partitioning to liquid clustering and select initial clustering keys that reflect common filters such as customer_id.

Answer: B, E

Liquid Clustering replaces traditional partitioning and ZORDER optimization by automatically organizing data according to clustering keys. It supports evolving clustering strategies without requiring a full table rewrite. To maintain cluster balance and improve performance, the OPTIMIZE command should be run periodically. OPTIMIZE groups data files by clustering keys and helps reduce small file overhead.

Reference Source: Databricks Delta Lake Guide – “Use Liquid Clustering for Tables” and “OPTIMIZE Command for File Compaction and Data Layout.”

====================

QUESTION NO: 39

A data engineer needs to provide access to a group named manufacturing-team. The team needs privileges to create tables in the quality schema.

Which set of SQL commands grants the group manufacturing-team the ability to create tables in the schema named quality, whose parent catalog is named manufacturing, with the least privileges?

A. GRANT CREATE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT CREATE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT CREATE CATALOG ON CATALOG manufacturing TO manufacturing-team;

B. GRANT CREATE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT CREATE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE CATALOG ON CATALOG manufacturing TO manufacturing-team;

C. GRANT CREATE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE CATALOG ON CATALOG manufacturing TO manufacturing-team;

D. GRANT USE TABLE ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE SCHEMA ON SCHEMA manufacturing.quality TO manufacturing-team; GRANT USE CATALOG ON CATALOG manufacturing TO manufacturing-team;

Answer: C

To create a table within a schema, a principal must have CREATE TABLE on the schema, USE SCHEMA on that schema, and USE CATALOG on the parent catalog. This combination ensures the group has just enough privileges to create objects in that schema without excessive permissions like CREATE SCHEMA or CREATE CATALOG.

Reference Source: Databricks Unity Catalog Privilege Model – “Privileges Required to Create a Table.”
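
The three privileges named in the explanation can be sketched as a small Python helper that builds the corresponding GRANT statements. The function name is hypothetical, and the backticks around the group name are an addition here: the options above show the name unquoted, but an identifier containing a hyphen generally needs backtick quoting in Databricks SQL.

```python
def least_privilege_create_table_grants(catalog, schema, principal):
    """Build the GRANT statements for creating tables in one schema."""
    # Backticks quote a principal name that contains a hyphen.
    quoted = f"`{principal}`"
    return [
        # Access to the parent catalog.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted};",
        # Access to the schema itself.
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted};",
        # The privilege that actually allows creating tables.
        f"GRANT CREATE TABLE ON SCHEMA {catalog}.{schema} TO {quoted};",
    ]


statements = least_privilege_create_table_grants(
    "manufacturing", "quality", "manufacturing-team"
)
for statement in statements:
    print(statement)
```

No CREATE SCHEMA or CREATE CATALOG grant appears in the list, which is what separates option C from options A and B.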

Questions 19

A data engineer is optimizing a managed Delta table that suffers from data skew and frequently changing query filter columns. The engineer wants to avoid costly data rewrites when query patterns evolve. The table size is under 1 TB.

How should the data engineer meet this requirement?

Options:

Apply Z-ordering, since it allows flexible reorganization of data layout without rewriting existing files and adapts easily to new filter columns.

Use Hive-style partitioning, as it provides efficient data skipping and is easy to change partition columns at any time.

Enable liquid clustering, as it efficiently handles data skew, allows clustering keys to be changed without rewriting existing data, and adapts to evolving query patterns.

Combine partitioning and Z-ordering to maximize flexibility and minimize maintenance as query patterns change.

Questions 20

A data engineer needs to implement column masking for a sensitive column in a Unity Catalog-managed table. The masking logic must dynamically check if users belong to specific groups defined in a separate table (group_access) that maps groups to allowed departments.

Which approach should the engineer use to efficiently enforce this requirement?

Options:

Create a UDF that hardcodes allowed groups and apply it as a column mask.

Create a view without selecting the sensitive column.

Apply a column mask that references the group_access mapping table in its UDF.

Use a row filter to restrict access based on the user’s group.

Exam Code: Databricks-Certified-Professional-Data-Engineer

Certification Provider: Databricks

Exam Name: Databricks Certified Data Engineer Professional Exam

Last Update: Jul 5, 2026

Questions: 202
