Free Databricks Databricks-Certified-Professional-Data-Scientist Practice Exam with Questions & Answers | Set: 4

Questions 31

Which technique would you use to solve the problem statement below? "What is the probability that an individual customer will not repay the loan amount?"

Options:
A.

Classification

B.

Clustering

C.

Linear Regression

D.

Logistic Regression

E.

Hypothesis testing

Questions 32

Select the choice where regression algorithms are not the best fit.

Options:
A.

When the dimension of the object is given

B.

Weight of the person is given

C.

Temperature in the atmosphere

D.

Employee status

Questions 33

In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model. When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The normalizing constant is usually ignored in MLEs because:

Options:
A.

The normalizing constant is always very close to 1

B.

The normalizing constant only has a small impact on the maximum likelihood

C.

The normalizing constant is often zero and can cause division by zero

D.

The normalizing constant doesn't impact the maximizing value
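As a rough illustration of how a normalizing constant behaves during maximization, the sketch below compares a Gaussian log-likelihood with and without its constant term. The data values, the fixed `sigma`, and the grid search over `mu` are all assumptions made up for this example; they are not part of the exam question.

```python
import math

# Hypothetical sample and fixed standard deviation (assumptions for illustration).
data = [2.1, 1.9, 2.4, 2.0, 1.6]
sigma = 1.0

def log_likelihood(mu, with_constant):
    """Gaussian log-likelihood of `data` for a candidate mean `mu`."""
    ll = sum(-((x - mu) ** 2) / (2 * sigma ** 2) for x in data)
    if with_constant:
        # Normalizing term: it depends on sigma but not on mu.
        ll += len(data) * -0.5 * math.log(2 * math.pi * sigma ** 2)
    return ll

# Simple grid search over candidate means from 0.00 to 4.00.
grid = [i / 100 for i in range(0, 401)]
best_with = max(grid, key=lambda mu: log_likelihood(mu, True))
best_without = max(grid, key=lambda mu: log_likelihood(mu, False))

print(best_with, best_without)
```

In this sketch the constant shifts every log-likelihood value by the same amount, so both searches pick the same `mu`.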

Questions 34

You are building a classifier off of a very high-dimensional data set similar to the one shown in the image, with 5000 variables (lots of columns, not that many rows). It can handle both dense and sparse input. Which technique is most suitable, and why?

Databricks-Certified-Professional-Data-Scientist Question 34

Options:
A.

Logistic regression with L1 regularization, to prevent overfitting

B.

Naive Bayes, because Bayesian methods act as regularizers

C.

k-nearest neighbors, because it uses local neighborhoods to classify examples

D.

Random forest, because it is an ensemble method

Questions 35

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

Options:
A.

Expected value

B.

Variance

C.

Linear regression

D.

Quantiles

Questions 36

Your customer provided you with 2,000 unlabeled records and asked you to separate them into three groups. What is the correct analytical method to use?

Options:
A.

Semi Linear Regression

B.

Logistic regression

C.

Naive Bayesian classification

D.

Linear regression

E.

K-means clustering

Questions 37

A bioscientist is analyzing cancer cells. To identify whether a cell is cancerous, hundreds of tests, each with small variations, have been run to give a yes-or-no answer. Given the test results for a sample of healthy and cancerous cells, which of the following techniques would you use to determine whether a cell is healthy?

Options:
A.

Linear regression

B.

Collaborative filtering

C.

Naive Bayes

D.

Identification Test

Questions 38

Which of the following questions fall under the data science category?

Options:
A.

What happened in the last six months?

B.

How many products have been sold in the last month?

C.

Where is the problem with sales?

D.

Which is the optimal scenario for selling this product?

E.

What happens if this scenario continues?

Questions 39

Suppose you have built a model for a rating system that rates from 1 to 5 stars, and you calculate that the RMSE is 1.0. Which of the following is correct?

Options:
A.

It means that your predictions are on average one star off of what people really think

B.

It means that your predictions are on average two stars off of what people really think

C.

It means that your predictions are on average three stars off of what people really think

D.

It means that your predictions are on average four stars off of what people really think
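For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. The sketch below computes it for a small set of made-up star ratings; the rating lists are assumptions for illustration and do not come from the question.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings on a 1-to-5 star scale (assumed values).
actual = [5, 3, 4, 2]
every_guess_off_by_one = [4, 4, 3, 3]   # each prediction misses by one star
one_guess_off_by_two = [5, 3, 4, 4]     # three exact, one misses by two stars

rmse_uniform = rmse(every_guess_off_by_one, actual)
rmse_single = rmse(one_guess_off_by_two, actual)

print(rmse_uniform, rmse_single)  # both are 1.0
```

Both hypothetical models score an RMSE of 1.0, because squaring weights the single two-star miss more heavily. "On average one star off" is therefore best read as a rough interpretation of the metric rather than a literal mean error.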

Questions 40

Consider the following confusion matrix for a data set with 600 out of 11,100 instances positive:

In this case, Precision = 50%, Recall = 83%, Specificity = 95%, and Accuracy = 95%.

Select the correct statement

Databricks-Certified-Professional-Data-Scientist Question 40

Options:
A.

Precision is low, which means the classifier is predicting positives best

B.

Precision is low, which means the classifier is predicting positives poorly

C.

The problem domain has a major impact on the measures that should be used to evaluate a classifier within it

D.

A and C

E.

B and C
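The confusion matrix image is not reproduced here, so the four metrics quoted in the question can be checked only against assumed counts. The sketch below uses one set of counts that is consistent with the stated totals (600 positives out of 11,100 instances) and with the stated percentages; the individual cell values are a reconstruction, not the original figure.

```python
# Assumed confusion-matrix cells (a reconstruction, not the original image):
tp, fn = 500, 100        # 600 actual positives
fp, tn = 500, 10_000     # 10,500 actual negatives

precision = tp / (tp + fp)                    # share of predicted positives that are correct
recall = tp / (tp + fn)                       # share of actual positives that are found
specificity = tn / (tn + fp)                  # share of actual negatives that are found
accuracy = (tp + tn) / (tp + fn + fp + tn)    # share of all instances classified correctly

print(f"precision={precision:.0%} recall={recall:.0%} "
      f"specificity={specificity:.0%} accuracy={accuracy:.0%}")
# prints: precision=50% recall=83% specificity=95% accuracy=95%
```

With these assumed counts, accuracy and specificity stay high mainly because negatives dominate the data set, while precision shows that half of the predicted positives are wrong.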