
Free Databricks Databricks-Certified-Data-Engineer-Associate Practice Exam with Questions & Answers

Question 1

Which of the following tools is used by Auto Loader to process data incrementally?

Options:
A. Checkpointing
B. Spark Structured Streaming
C. Data Explorer
D. Unity Catalog
E. Databricks SQL
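
For context on the concepts behind these options: Auto Loader is built on top of Spark Structured Streaming and uses a checkpoint location to track its progress. The sketch below is a minimal, illustrative Auto Loader pipeline; the paths, file format, and table name are placeholder assumptions, not part of the question.

```python
# Minimal Auto Loader sketch: the "cloudFiles" source is a Spark
# Structured Streaming source, so data is processed incrementally.
# All paths and names below are illustrative placeholders.
df = (
    spark.readStream
    .format("cloudFiles")                                        # Auto Loader source
    .option("cloudFiles.format", "json")                         # format of incoming files
    .option("cloudFiles.schemaLocation", "/tmp/schemas/events")  # inferred-schema store
    .load("/landing/events")                                     # input directory
)

(
    df.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")  # progress tracking
    .toTable("events_bronze")                                 # target Delta table
)
```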

Question 2

A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the pipeline's previous run, and set up the pipeline to ingest only those new files on each run.

Which of the following tools can the data engineer use to solve this problem?

Options:
A. Unity Catalog
B. Delta Lake
C. Databricks SQL
D. Data Explorer
E. Auto Loader
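
As a hedged illustration of the pattern this question describes: Auto Loader records already-processed files in its checkpoint, so each run ingests only files that arrived since the previous run, and it never moves or deletes files in the source directory. The directory, schema location, and table name below are placeholders.

```python
# Incremental-ingestion sketch: each scheduled run picks up only the
# files that are new since the last run, as tracked by the checkpoint;
# the shared source directory is left untouched. Names are placeholders.
(
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/shared_dir")
    .load("/mnt/shared_dir")                        # shared directory, kept as is
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/shared_dir")
    .trigger(availableNow=True)                     # process the new backlog, then stop
    .toTable("ingested_files")
)
```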

Question 3

Which of the following describes the type of workloads that are always compatible with Auto Loader?

Options:
A. Dashboard workloads
B. Streaming workloads
C. Machine learning workloads
D. Serverless workloads
E. Batch workloads

Question 4

In which of the following file formats is data from Delta Lake tables primarily stored?

Options:
A. Delta
B. CSV
C. Parquet
D. JSON
E. A proprietary, optimized format specific to Databricks
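
For background on this question: a Delta Lake table is a directory of Parquet data files plus a _delta_log subdirectory of JSON transaction-log entries. The sketch below, assuming a Databricks notebook (where dbutils is available) and a placeholder table path, simply lists that layout.

```python
# List the storage layout of a Delta table (Databricks notebook sketch;
# the table path is a placeholder).
for f in dbutils.fs.ls("/mnt/delta/sales"):
    print(f.path)

# Expected layout:
#   .../sales/_delta_log/                     <- JSON transaction log (metadata)
#   .../sales/part-00000-...snappy.parquet    <- data files stored as Parquet
```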

Question 5

A data engineer needs to process SQL queries on a large dataset with fluctuating workloads. The workload requires automatic scaling based on the volume of queries, without the need to manage or provision infrastructure. The solution should be cost-efficient and charge only for the compute resources used during query execution.

Which compute option should the data engineer use?

Options:
A. Databricks SQL Analytics
B. Databricks Jobs
C. Databricks Runtime for ML
D. Serverless SQL Warehouse

Question 6

Which of the following is hosted completely in the control plane of the classic Databricks architecture?

Options:
A. Worker node
B. JDBC data source
C. Databricks web application
D. Databricks Filesystem
E. Driver node

Question 7

A data engineering team is using Kafka to capture event data and ingest it into Databricks. The team wants to be able to see these historical events. A medallion architecture is already in place, and the team wants to be mindful of costs.

Where should this historical event data be stored?

Options:
A. Gold
B. Silver
C. Bronze
D. Raw layer
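
For orientation on the medallion layers referenced here: a common pattern, sketched below under assumed broker, topic, and table names, is to land raw Kafka events unmodified in a Bronze table so the complete event history is preserved in cheap storage.

```python
# Sketch: land raw Kafka events in a Bronze table so full history is kept
# in its original form. Broker, topic, paths, and names are placeholders.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

(
    raw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "timestamp")
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events_bronze")
    .toTable("events_bronze")
)
```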

Question 8

A data analyst has created a Delta table named sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?

Options:
A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")
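
As a sketch of the interoperability this question tests: a table registered in the metastore is visible to both SQL and PySpark, so a Python test suite can load it as a DataFrame. The data-quality check below uses a hypothetical price column purely for illustration.

```python
# A table created via SQL is accessible from PySpark through the shared metastore.
sales_df = spark.table("sales")              # DataFrame over the Delta table

# spark.sql also works, but it takes a full SQL statement, not a table name:
sales_df_alt = spark.sql("SELECT * FROM sales")

# Illustrative data-quality test ("price" is a hypothetical column):
assert sales_df.filter(sales_df["price"] < 0).count() == 0, "negative prices found"
```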

Question 9

Which of the following describes the relationship between Gold tables and Silver tables?

Options:
A. Gold tables are more likely to contain aggregations than Silver tables.
B. Gold tables are more likely to contain valuable data than Silver tables.
C. Gold tables are more likely to contain a less refined view of data than Silver tables.
D. Gold tables are more likely to contain more data than Silver tables.
E. Gold tables are more likely to contain truthful data than Silver tables.
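
To make the Silver-to-Gold relationship concrete, here is an illustrative aggregation step: Gold tables typically hold business-level aggregates computed from cleaner, row-level Silver data. The table and column names below are hypothetical.

```python
# Illustrative Silver -> Gold step: Gold tables commonly contain aggregates.
# Table and column names are hypothetical.
(
    spark.table("sales_silver")
    .groupBy("region")
    .agg({"amount": "sum"})
    .withColumnRenamed("sum(amount)", "total_sales")
    .write.mode("overwrite")
    .saveAsTable("sales_gold")
)
```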

Question 10

An organization needs to share a dataset stored in its Databricks Unity Catalog with an external partner who uses a different data platform that is not Databricks. The goal is to maintain data security and ensure the partner can access the data efficiently.

Which method should the data engineer use to securely share the dataset with the external partner?

Options:
A. Using Delta Sharing with the open sharing protocol
B. Exporting data as CSV files and emailing them
C. Using a third-party API to access the Delta table
D. Databricks-to-Databricks Sharing
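
For context on the open sharing protocol mentioned in option A: a recipient on a non-Databricks platform can read a shared table with the open-source delta-sharing client, using a credential (profile) file issued by the provider. The profile path and share coordinates below are placeholders.

```python
# Recipient-side sketch using the open-source delta-sharing client
# (pip install delta-sharing). Profile path and share/schema/table
# names are placeholders supplied by the data provider.
import delta_sharing

profile = "/path/to/config.share"
table_url = profile + "#my_share.my_schema.sales"

df = delta_sharing.load_as_pandas(table_url)   # load the shared table into pandas
print(df.head())
```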