
Free Databricks Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Practice Exam with Questions & Answers | Set: 2

Question 11

The code block displayed below contains multiple errors. The code block should return a DataFrame that contains only columns transactionId, predError, value and storeId of DataFrame transactionsDf. Find the errors.

Code block:

transactionsDf.select([col(productId), col(f)])

Sample of transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:
A. The column names should be listed directly as arguments to the operator and not as a list.
B. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.
C. The select operator should be replaced by a drop operator.
D. The column names should be listed directly as arguments to the operator and not as a list, and, following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.
E. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
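For orientation (illustrative only, not an answer key), here is a minimal PySpark sketch of a select that returns exactly the four requested columns. It assumes a running SparkSession and rebuilds the sample data inline:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Rebuild the sample transactionsDf; an explicit DDL schema is used here
# because column f contains only nulls and could not be inferred.
transactionsDf = spark.createDataFrame(
    [(1, 3, 4, 25, 1, None), (2, 6, 7, 2, 2, None), (3, 3, None, 25, 3, None)],
    "transactionId INT, predError INT, value INT, storeId INT, productId INT, f INT",
)

# select() accepts column names (or Column objects) directly; bare identifiers
# like productId in the question's code block are undefined Python names.
transactionsDf.select("transactionId", "predError", "value", "storeId").show()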

Question 12

Which of the following describes Spark actions?

Options:
A. Writing data to disk is the primary purpose of actions.
B. Actions are Spark's way of exchanging data between executors.
C. The driver receives data upon request by actions.
D. Stage boundaries are commonly established by actions.
E. Actions are Spark's way of modifying RDDs.
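As background, a short PySpark sketch contrasting lazy transformations with actions (illustrative only; assumes a local SparkSession):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(5)             # ids 0..4; nothing is computed yet
filtered = df.filter("id > 1")  # transformation: only builds the query plan

# Actions trigger execution and return results to the driver:
print(filtered.count())    # 3
print(filtered.collect())  # [Row(id=2), Row(id=3), Row(id=4)]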

Question 13

The code block displayed below contains an error. The code block should read the CSV file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as column header and casting the columns to the most appropriate types. Find the error.

First 3 rows of transactions.csv:

transactionId;storeId;productId;name
1;23;12;green grass
2;35;31;yellow sun
3;23;12;green grass

Code block:

transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)

Options:
A. The DataFrameReader is not accessed correctly.
B. The transaction is evaluated lazily, so no file will be read.
C. Spark is unable to understand the file type.
D. The code block is unable to capture all columns.
E. The resulting DataFrame will not have the appropriate schema.
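For context, a sketch of reading such a file with type inference enabled (illustrative only; it assumes the file actually exists at data/transactions.csv). Without inferSchema, every CSV column is read as a string:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# inferSchema=True asks Spark to sample the file and cast each column
# to a fitting type instead of leaving everything as strings.
transactionsDf = spark.read.load(
    "data/transactions.csv", format="csv", sep=";", header=True, inferSchema=True
)
transactionsDf.printSchema()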

Question 14

Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?

Options:
A. transactionsDf.withColumn("predErrorSqrt", sqrt(predError))
B. transactionsDf.select(sqrt(predError))
C. transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())
D. transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
E. transactionsDf.select(sqrt("predError"))
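For reference, a minimal runnable sketch of the withColumn/sqrt pattern this question probes (illustrative only; sample data made up inline):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sqrt

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 9), (2, 16)], ["transactionId", "predError"])

# sqrt lives in pyspark.sql.functions and takes a Column or a column name;
# Column objects themselves have no .sqrt() method.
transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError"))).show()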

Question 15

The code block displayed below contains an error. The code block should trigger Spark to cache DataFrame transactionsDf in executor memory where available, writing to disk where insufficient executor memory is available, in a fault-tolerant way. Find the error.

Code block:

transactionsDf.persist(StorageLevel.MEMORY_AND_DISK)

Options:
A. Caching is not supported in Spark, data are always recomputed.
B. Data caching capabilities can be accessed through the spark object, but not through the DataFrame API.
C. The storage level is inappropriate for fault-tolerant storage.
D. The code block uses the wrong operator for caching.
E. The DataFrameWriter needs to be invoked.
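For reference, a sketch of persist() with a replicated storage level (illustrative only; StorageLevel is importable from the top-level pyspark package):

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.range(10).toDF("transactionId")

# MEMORY_AND_DISK_2 caches in memory, spills to disk when memory is short,
# and keeps a second replica of each partition on another node.
transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2)
transactionsDf.count()  # persist is lazy; an action materializes the cache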

Question 16

Which of the following code blocks creates a new 6-column DataFrame by appending the rows of the 6-column DataFrame yesterdayTransactionsDf to the rows of the 6-column DataFrame todayTransactionsDf, ignoring that both DataFrames have different column names?

Options:
A. union(todayTransactionsDf, yesterdayTransactionsDf)
B. todayTransactionsDf.unionByName(yesterdayTransactionsDf, allowMissingColumns=True)
C. todayTransactionsDf.unionByName(yesterdayTransactionsDf)
D. todayTransactionsDf.concat(yesterdayTransactionsDf)
E. todayTransactionsDf.union(yesterdayTransactionsDf)
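As a quick illustration of the difference being tested (not an answer key), two-column stand-ins for the six-column DataFrames in the question:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

todayTransactionsDf = spark.createDataFrame([(1, "a")], ["transactionId", "value"])
yesterdayTransactionsDf = spark.createDataFrame([(2, "b")], ["txId", "val"])

# union() appends rows purely by column position and ignores column names;
# unionByName() would instead try to match columns by name.
todayTransactionsDf.union(yesterdayTransactionsDf).show()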

Question 17

Which of the following code blocks returns a single-column DataFrame showing the number of words in column supplier of DataFrame itemsDf?

Sample of DataFrame itemsDf:

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

Options:
A. itemsDf.split("supplier", " ").count()
B. itemsDf.split("supplier", " ").size()
C. itemsDf.select(word_count("supplier"))
D. spark.select(size(split(col(supplier), " ")))
E. itemsDf.select(size(split("supplier", " ")))
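For orientation, a runnable sketch of the split/size combination this question exercises (illustrative only; sample rows abridged from the table above):

from pyspark.sql import SparkSession
from pyspark.sql.functions import size, split

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame(
    [(1, "Sports Company Inc."), (2, "YetiX")], ["itemId", "supplier"]
)

# split() turns the string into an array of words; size() counts the elements.
itemsDf.select(size(split("supplier", " "))).show()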

Question 18

Which of the following code blocks returns a DataFrame that is an inner join of DataFrame itemsDf and DataFrame transactionsDf, on columns itemId and productId, respectively, and in which every itemId appears just once?

Options:
A. itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId").distinct("itemId")
B. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates(["itemId"])
C. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates("itemId")
D. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId, how="inner").distinct(["itemId"])
E. itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId", how="inner").dropDuplicates(["itemId"])
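For reference, a minimal sketch of the join-then-deduplicate pattern under test (illustrative only; sample data made up inline):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame([(1, "YetiX")], ["itemId", "supplier"])
transactionsDf = spark.createDataFrame([(10, 1), (11, 1)], ["transactionId", "productId"])

# The join condition is a Column expression, not a string of Python code,
# and dropDuplicates() expects a list of column names.
joined = itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.productId)
joined.dropDuplicates(["itemId"]).show()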

Question 19

The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

Code block:

transactionsDf.__1__(itemsDf, __2__).__3__(__4__)

Options:
A.
1. join
2. transactionsDf.productId==itemsDf.itemId, how="inner"
3. select
4. "transactionId", "supplier"
B.
1. select
2. "transactionId", "supplier"
3. join
4. [transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId]
C.
1. join
2. [transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId]
3. select
4. "transactionId", "supplier"
D.
1. filter
2. "transactionId", "supplier"
3. join
4. "transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId"
E.
1. join
2. transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId
3. filter
4. "transactionId", "supplier"

Question 20

Which of the following code blocks produces the following output, given DataFrame transactionsDf?

Output:

root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)

DataFrame transactionsDf:

+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+

Options:
A. transactionsDf.schema.print()
B. transactionsDf.rdd.printSchema()
C. transactionsDf.rdd.formatSchema()
D. transactionsDf.printSchema()
E. print(transactionsDf.schema)
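For completeness, a tiny sketch of schema printing (illustrative only; abridged two-column sample data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 3)], "transactionId INT, predError INT")

# printSchema() is a DataFrame method that prints a tree like the one above;
# an RDD has no schema, and DataFrame.schema is a StructType object.
transactionsDf.printSchema()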