Free CompTIA DY0-001 Practice Exam with Questions & Answers

Question 1

A data scientist receives an update on a business case about a machine that has thousands of error codes. The data scientist creates the following summary statistics profile while reviewing the logs for each machine:

| Number of machines observed | 3,000,000 |
| Number of unique error codes observed | 19,000 |
| Median number of unique codes per machine | 7 |
| Median number of error transactions | 45 |

Which of the following is the most likely concern with respect to data design for model ingestion?

Options:
A. Sparse matrix

B. Granularity misalignment

C. Insufficient features

D. Multivariate outliers
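
To make the summary statistics above concrete, here is a rough sketch of what a one-column-per-error-code layout would imply. It assumes, hypothetically, that each machine becomes one row and each unique error code becomes one binary indicator column, and it uses the median of 7 codes as a stand-in for a typical machine. The function name is illustrative only.

```python
def indicator_matrix_density(n_columns: int, nonzero_per_row: int) -> float:
    """Fraction of non-zero cells in a row of a binary indicator matrix."""
    return nonzero_per_row / n_columns


# Figures taken from the summary statistics profile in the question.
n_machines = 3_000_000
n_codes = 19_000
median_codes_per_machine = 7

# Assumption: one row per machine, one indicator column per error code.
total_cells = n_machines * n_codes
density = indicator_matrix_density(n_codes, median_codes_per_machine)

print(f"Cells in a dense layout: {total_cells:,}")
print(f"Share of non-zero cells in a typical row: {density:.4%}")
```

Under these assumptions, well under one cell in a thousand would hold a non-zero value for a typical machine.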

Question 2

A data analyst wants to generate the most data using tables from a database. Which of the following is the best way to accomplish this objective?

Options:
A. INNER JOIN

B. LEFT OUTER JOIN

C. RIGHT OUTER JOIN

D. FULL OUTER JOIN
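
The four join types listed above differ in which unmatched rows they keep. The sketch below counts result rows for each type on a single key column, in plain Python rather than SQL. The `customers` and `orders` key lists and the `join_row_counts` helper are hypothetical, and keys are assumed to be non-null.

```python
from collections import Counter


def join_row_counts(left_keys, right_keys):
    """Count result rows for four join types on one key column.

    Each argument is a list of key values, one entry per table row.
    """
    left, right = Counter(left_keys), Counter(right_keys)
    # Every matching pair of rows produces one output row.
    matched = sum(left[k] * right[k] for k in left if k in right)
    # Unmatched rows survive only in the corresponding outer joins.
    left_only = sum(n for k, n in left.items() if k not in right)
    right_only = sum(n for k, n in right.items() if k not in left)
    return {
        "INNER JOIN": matched,
        "LEFT OUTER JOIN": matched + left_only,
        "RIGHT OUTER JOIN": matched + right_only,
        "FULL OUTER JOIN": matched + left_only + right_only,
    }


# Hypothetical key columns for two tables.
customers = [1, 2, 3, 4]
orders = [3, 4, 4, 5, 6]

counts = join_row_counts(customers, orders)
for join_type, n_rows in counts.items():
    print(f"{join_type}: {n_rows} rows")
```

By construction, the full outer count is never smaller than any of the other three, since it keeps the matched rows plus the unmatched rows from both sides.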

Question 3

A computer vision model is trained to identify cats on a training set that is composed of both cat and dog images. The model predicts a picture of a cat is a dog. Which of the following describes this error?

Options:
A. Error due to reality

B. False positive error

C. Sampling error

D. Type II error
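
The error terminology in this question depends on which class is treated as positive. The sketch below labels a single prediction using the usual confusion-matrix vocabulary, assuming "cat" is the positive class because the model is trained to identify cats. The `outcome` helper is hypothetical.

```python
def outcome(actual: str, predicted: str, positive: str = "cat") -> str:
    """Name the confusion-matrix cell for one prediction."""
    actual_pos = actual == positive
    predicted_pos = predicted == positive
    if actual_pos and predicted_pos:
        return "true positive"
    if actual_pos and not predicted_pos:
        # A real positive was missed.
        return "false negative (Type II error)"
    if predicted_pos:
        # A negative was flagged as positive.
        return "false positive (Type I error)"
    return "true negative"


# The scenario from the question: a cat picture predicted as a dog.
scenario = outcome(actual="cat", predicted="dog")
print(scenario)
```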

Question 4

A data scientist is building a proof of concept for a commercialized machine-learning model. Which of the following is the best starting point?

Options:
A. Literature review

B. Model performance evaluation

C. Hyperparameter tuning

D. Model selection

Question 5

Which of the following best describes the minimization of the residual term in a LASSO linear regression?

Options:
A. |e|

B. e

C. 0

D.
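
For reference, the standard LASSO objective can be written in terms of the residual e. The symbols y_i, x_i, β, λ, n, and p are notational assumptions that do not appear in the question, and some formulations scale the first term by a constant such as 1/(2n).

```latex
\hat{\beta} = \arg\min_{\beta} \left( \sum_{i=1}^{n} e_i^{2} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right),
\qquad e_i = y_i - x_i^{\top}\beta
```

In this form the absolute value applies to the coefficients in the penalty term, while the residuals enter as a sum of squares.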

Question 6

A data scientist needs to:

    Build a predictive model that gives the likelihood that a car will get a flat tire.

    Provide a data set of cars that had flat tires and cars that did not.

All the cars in the data set had sensors taking weekly measurements of tire pressure, similar to the sensors that will be installed in the cars consumers drive.

Which of the following is the most immediate data concern?

Options:
A. Granularity misalignment

B. Multivariate outliers

C. Insufficient domain expertise

D. Lagged observations

Question 7

Which of the following belong in a presentation to the senior management team and/or C-suite executives? (Choose two.)

Options:
A. Full literature reviews

B. Code snippets

C. Final recommendations

D. High-level results

E. Detailed explanations of statistical tests

F. Security keys and login information

Question 8

Which of the following environmental changes is most likely to resolve a memory constraint error when running a complex model using distributed computing?

Options:
A. Converting an on-premises deployment to a containerized deployment

B. Migrating to a cloud deployment

C. Moving model processing to an edge deployment

D. Adding nodes to a cluster deployment

Question 9

A data scientist needs to analyze a company's chemical businesses and is using the master database of the conglomerate company. Nothing in the data differentiates the data observations for the different businesses. Which of the following is the most efficient way to identify the chemical businesses' observations?

Options:
A. Ingest the data from all of the hard drives and perform exploratory data analysis to identify which business is responsible for chemical operations.

B. Perform analysis on all of the data and create a summary report on the results relevant to chemical operations.

C. Consult with the business team to identify which sites are responsible for chemical operations and ingest only the relevant data for analysis.

D. Ingest data from the hard drive containing the most data and present sample results on the chemical operations.

Question 10

A data analyst wants to save a newly analyzed data set to a local storage option. The data set must meet the following requirements:

    Be minimal in size

    Have the ability to be ingested quickly

    Have the associated schema, including data types, stored with it

Which of the following file types is the best to use?

Options:
A. JSON

B. Parquet

C. XML

D. CSV
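
The third requirement, storing the schema and data types with the file, is easy to check for plain-text formats. The sketch below round-trips one record through CSV using only the standard library and compares the Python types before and after. The record and its column names are hypothetical. Columnar binary formats such as Parquet, by contrast, embed the schema in the file's metadata, but reading them needs a third-party library, so that side is not shown here.

```python
import csv
import io

# Hypothetical record with three different data types.
rows = [{"machine_id": 101, "temperature": 71.5, "active": True}]

# Write the record to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)

# Read it back and compare the types of the values.
buffer.seek(0)
restored = list(csv.DictReader(buffer))

original_types = {k: type(v).__name__ for k, v in rows[0].items()}
restored_types = {k: type(v).__name__ for k, v in restored[0].items()}

print("before:", original_types)
print("after: ", restored_types)
```

Every value comes back as a string, so the reader has to re-declare or re-infer the types on each load.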