Pre-Summer Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70track

Free Databricks Databricks-Certified-Professional-Data-Engineer Practice Exam with Questions & Answers | Set: 4

Questions 31

A data engineer has created a new cluster using shared access mode with default configurations. The data engineer needs to allow the development team access to view the driver logs if needed.

What are the minimal cluster permissions that allow the development team to accomplish this?

Options:
A.

CAN ATTACH TO

B.

CAN MANAGE

C.

CAN VIEW

D.

CAN RESTART

Databricks Databricks-Certified-Professional-Data-Engineer Premium Access
Questions 32

The data governance team is reviewing user for deleting records for compliance with GDPR. The following logic has been implemented to propagate deleted requests from the user_lookup table to the user aggregate table.

Databricks-Certified-Professional-Data-Engineer Question 32

Assuming that user_id is a unique identifying key and that all users have requested deletion have been removed from the user_lookup table, which statement describes whether successfully executing the above logic guarantees that the records to be deleted from the user_aggregates table are no longer accessible and why?

Options:
A.

No: files containing deleted records may still be accessible with time travel until a BACUM command is used to remove invalidated data files.

B.

Yes: Delta Lake ACID guarantees provide assurance that the DELETE command successed fully and permanently purged these records.

C.

No: the change data feed only tracks inserts and updates not deleted records.

D.

No: the Delta Lake DELETE command only provides ACID guarantees when combined with the MERGE INTO command

Questions 33

A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is processed in less than 3 seconds. During peak hours of the day, execution time for each microbatch becomes very inconsistent, sometimes exceeding 30 seconds. The streaming write is currently configured with a trigger interval of 10 seconds.

Holding all other variables constant and assuming records need to be processed in less than 10 seconds, which adjustment will meet the requirement?

Options:
A.

Decrease the trigger interval to 5 seconds; triggering batches more frequently allows idle executors to begin processing the next batch while longer running tasks from previous batches finish.

B.

Increase the trigger interval to 30 seconds; setting the trigger interval near the maximum execution time observed for each batch is always best practice to ensure no records are dropped.

C.

The trigger interval cannot be modified without modifying the checkpoint directory; to maintain the current stream state, increase the number of shuffle partitions to maximize parallelism.

D.

Use the trigger once option and configure a Databricks job to execute the query every 10 seconds; this ensures all backlogged records are processed with each batch.

E.

Decrease the trigger interval to 5 seconds; triggering batches more frequently may prevent records from backing up and large batches from causing spill.

Questions 34

A data engineer is creating a daily reporting job. There are two reporting notebooks—one for weekdays and one for weekends. An “if/else condition” task is configured as {{job.start_time.is_weekday}} == true to route the job to either the weekday or weekend notebook tasks. The same job would be used across multiple time zones.

Which action should a senior data engineer take upon reviewing the job to merge or reject the pull request?

Options:
A.

Reject, as the {{job.start_time.is_weekday}} is for the UTC timezone .

B.

Reject, as the {{job.start_time.is_weekday}} is not a valid value reference.

C.

Merge, as the job configuration looks good.

D.

Reject, as they should use {{job.trigger_time.is_weekday}} instead.

Questions 35

A data engineer, while designing a Pandas UDF to process financial time-series data with complex calculations that require maintaining state across rows within each stock symbol group, must ensure the function is efficient and scalable. Which approach will solve the problem with minimum overhead while preserving data integrity?

Options:
A.

Use a scalar_iter Pandas UDF with iterator-based processing, implementing state management through persistent storage (Delta tables) that gets updated after each batch to maintain continuity across iterator chunks.

B.

Use a scalar Pandas UDF that processes the entire dataset at once, implementing custom partitioning logic within the UDF to group by stock symbol and maintain state using global variables shared across all executor processes.

C.

Use applyInPandas on a Spark DataFrame so that each stock symbol group is received as a pandas DataFrame, allowing processing within each group while maintaining state variables local to each group’s processing function.

D.

Use a grouped-aggregate Pandas UDF that processes each stock symbol group independently, maintaining state through intermediate aggregation results that get passed between successive UDF calls via broadcast variables.

Questions 36

A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor.

When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

Options:
A.

The five Minute Load Average remains consistent/flat

B.

Bytes Received never exceeds 80 million bytes per second

C.

Total Disk Space remains constant

D.

Network I/O never spikes

E.

Overall cluster CPU utilization is around 25%

Questions 37

A data team is implementing an append-only Delta Lake pipeline that processes both batch and streaming data . They want to ensure that schema changes in the source data are automatically incorporated without breaking the pipeline.

Which configuration should the team use when writing data to the Delta table?

Options:
A.

ignoreChanges = false

B.

mergeSchema = true

C.

overwriteSchema = true

D.

validateSchema = false

Questions 38

A data engineer is performing a join operating to combine values from a static userlookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?

Options:
A.

userLookup.join(streamingDF, [ " userid " ], how= " inner " )

B.

streamingDF.join(userLookup, [ " user_id " ], how= " outer " )

C.

streamingDF.join(userLookup, [ " user_id”], how= " left " )

D.

streamingDF.join(userLookup, [ " userid " ], how= " inner " )

E.

userLookup.join(streamingDF, [ " user_id " ], how= " right " )

Questions 39

Which is a key benefit of an end-to-end test?

Options:
A.

It closely simulates real world usage of your application.

B.

It pinpoint errors in the building blocks of your application.

C.

It provides testing coverage for all code paths and branches.

D.

It makes it easier to automate your test suite

Questions 40

The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database.

After testing the code with all Python variables being defined with strings, they upload the password to the secrets module and configure the correct permissions for the currently active user. They then modify their code to the following (leaving all other variables unchanged).

Databricks-Certified-Professional-Data-Engineer Question 40

Which statement describes what will happen when the above code is executed?

Options:
A.

The connection to the external table will fail; the string " redacted " will be printed.

B.

An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the encoded password will be saved to DBFS.

C.

An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the password will be printed in plain text.

D.

The connection to the external table will succeed; the string value of password will be printed in plain text.

E.

The connection to the external table will succeed; the string " redacted " will be printed.