Pre-Summer Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70track

Free Databricks Databricks-Certified-Data-Engineer-Associate Practice Exam with Questions & Answers | Set: 3

Questions 21

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Which of the following describes why Auto Loader inferred all of the columns to be of the string type?

Options:
A.

There was a type mismatch between the specific schema and the inferred schema

B.

JSON data is a text-based format

C.

Auto Loader only works with string data

D.

All of the fields had at least one null value

E.

Auto Loader cannot infer the schema of ingested data

Databricks Databricks-Certified-Data-Engineer-Associate Premium Access
Questions 22

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which location can the data engineer review their permissions on the table?

Options:
A.

Jobs

B.

Dashboards

C.

Catalog Explorer

D.

Repos

Questions 23

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > ' 2020-01-01 ' ) ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch of data containing data that violates these constraints is processed?

Options:
A.

Records that violate the expectation cause the job to fail.

B.

Records that violate the expectation are added to the target dataset and flagged as invalid in a field added to the target dataset.

C.

Records that violate the expectation are dropped from the target dataset and recorded as invalid in the event log.

D.

Records that violate the expectation are added to the target dataset and recorded as invalid in the event log.

Questions 24

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

Options:
A.

Spark SQL Table

B.

View

C.

Database

D.

Temporary view

E.

Delta Table

Questions 25

A data engineer needs to create a table in Databricks using data from their organization ' s existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360

USING

OPTIONS (

url " jdbc:sqlite:/customers.db " , dbtable " customer360 "

)

Which line of code fills in the above blank to successfully complete the task?

Options:
A.

autoloader

B.

org.apache.spark.sql.jdbc

C.

sqlite

D.

org.apache.spark.sql.sqlite

Questions 26

A data engineer is working on a personal laptop and needs to perform complex transformations on data stored in a Delta Lake on cloud storage. The engineer decides to use Databricks Connect to interact with Databricks clusters and work in their local IDE.

How does Databricks Connect enable the engineer to develop, test, and debug code seamlessly on their local machine while interacting with Databricks clusters?

Options:
A.

By allowing direct execution of Spark jobs from the local machine without needing a network connection

B.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code using a specific IDE that is required by Databricks

C.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code using their preferred ide

D.

By providing a local environment that mimics the Databricks runtime, enabling the engineer to develop, test, and debug code only through Databricks ' own web interface

Questions 27

An organization has data stored across multiple external systems, including MySQL, Amazon Redshift, and Google BigQuery. The data engineer wants to perform analytics without ingesting data directly into Databricks, while ensuring unified governance and minimizing data duplication.

Which feature of Databricks enables querying these external data sources while maintaining centralized governance?

Options:
A.

Lakehouse Federation

B.

Databricks Connect

C.

MLflow

D.

Delta Lake

Questions 28

A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python without making any changes to those cells.

Which of the following describes how the data engineer can use SQL within a cell of their Python notebook?

Options:
A.

It is not possible to use SQL in a Python notebook

B.

They can attach the cell to a SQL endpoint rather than a Databricks cluster

C.

They can simply write SQL syntax in the cell

D.

They can add %sql to the first line of the cell

E.

They can change the default language of the notebook to SQL

Questions 29

A data engineer at a company that uses Databricks with Unity Catalog needs to share a collection of tables with an external partner who also uses a Databricks workspace enabled for Unity Catalog. The data engineer decides to use Delta Sharing to accomplish this.

What is the first piece of information the data engineer should request from the external partner to set up Delta Sharing?

Options:
A.

Their Databricks account password

B.

The name of their Databricks cluster

C.

The IP address of their Databricks workspace

D.

The sharing identifier of their Unity Catalog metastore

Questions 30

Which SQL keyword can be used to convert a table from a long format to a wide format?

Options:
A.

TRANSFORM

B.

PIVOT

C.

SUM

D.

CONVERT