Free Databricks Databricks-Certified-Professional-Data-Scientist Practice Exam with Questions & Answers

Questions 1

Which method is used to solve for the coefficients b0, b1, ..., bn in a linear regression model?

Options:
A.

Apriori Algorithm

B.

Ridge and Lasso

C.

Ordinary least squares

D.

Integer programming

Questions 2

Select the correct statement regarding naive Bayes classification.

Options:
A.

It requires only a small amount of training data to estimate the parameters

B.

The variables can be assumed to be independent

C.

Only the variances of the variables for each class need to be determined

D.

For each class, the entire covariance matrix needs to be determined

Questions 3

In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features (such as the words in a language), i.e., turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values, modulo the number of features, as indices directly, rather than looking the indices up in an associative array. What is the primary reason for using the hashing trick when building classifiers?

Options:
A.

It creates smaller models

B.

It requires less memory to store the coefficients for the model

C.

It reduces non-significant features, e.g. punctuation

D.

Noisy features are removed
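
The mechanism described in the question (hash each feature, take the hash value modulo the vector length, and use the result directly as an index) can be sketched as follows. This is only an illustration under stated assumptions: the function name `hash_features`, the choice of MD5 as the hash function, and the vector length of 16 are hypothetical and do not come from the exam material.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Map tokens to a fixed-length count vector (hypothetical helper).

    Each token's index is hash(token) mod n_features, so no
    token-to-index dictionary has to be built or stored.
    """
    vec = [0] * n_features
    for tok in tokens:
        # MD5 is an assumed choice here; any stable hash function would do.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:8], "big") % n_features
        vec[idx] += 1  # distinct tokens may collide and share an index
    return vec


vec = hash_features("the quick brown fox jumps over the lazy dog".split())
print(len(vec), sum(vec))
```

The output vector always has `n_features` entries regardless of how many distinct tokens appear in the input, and collisions between different tokens are possible.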

Questions 4

Which activity is performed in the Operationalize phase of the Data Analytics Lifecycle?

Options:
A.

Define the process to maintain the model

B.

Try different analytical techniques

C.

Try different variables

D.

Transform existing variables

Questions 5

You are analyzing data in order to build a classifier model. You discover non-linear data and discontinuities that will affect the model. Which analytical method would you recommend?

Options:
A.

Logistic Regression

B.

Decision Trees

C.

Linear Regression

D.

ARIMA

Questions 6

Which of the following is a continuous probability distribution?

Options:
A.

Binomial probability distribution

B.

Negative binomial distribution

C.

Poisson probability distribution

D.

Normal probability distribution

Questions 7

You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and would like to find all users who are most similar to each individual. Which algorithm is the most appropriate for this study?

Options:
A.

Association rules

B.

Decision trees

C.

Linear regression

D.

K-means clustering

Questions 8

Which of the following best describes principal component analysis?

Options:
A.

Dimensionality reduction

B.

Collaborative filtering

C.

Classification

D.

Regression

E.

Clustering

Questions 9

You are creating a regression model with a customer's income, education, and current debt as inputs. What could be a possible output of this model?

Options:
A.

Customer classified as a good fit

B.

Customer classified in the acceptable or average category

C.

The probability, expressed as a percent, that the customer will default on a loan

D.

A and C are correct

E.

B and C are correct

Questions 10

Select the correct statement that applies to principal component analysis (PCA).

Options:
A.

It is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables.

B.

It is a mathematical procedure that transforms a number of (possibly) correlated variables into a (higher) number of uncorrelated variables.

C.

It increases the dimensionality of the data set.

D.

A and C are correct

E.

A and B are correct