Databricks-Machine-Learning-Associate Free Practice Questions "Databricks Certified Machine Learning Associate"
A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?
Correct answer: B
Explanation: (visible to JPNTest members only)
An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?
Correct answer: A
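The usual justification is that one-hot encoding is model-specific: tree-based algorithms do not need it (and can be hurt by it), so a shared feature repository should store raw categorical values and leave encoding to each training pipeline. A sketch of that workflow, using scikit-learn for illustration (the column names and data are made up):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# The feature table keeps the raw categorical column...
feature_table = pd.DataFrame(
    {"color": ["red", "green", "red"], "price": [1.0, 2.0, 3.0]}
)

# ...and one-hot encoding happens inside an individual model's training
# pipeline, only for algorithms that actually require it (e.g. linear models)
enc = OneHotEncoder(handle_unknown="ignore")
encoded = enc.fit_transform(feature_table[["color"]]).toarray()
```

This way consumers that prefer native categorical handling read the raw column, while others encode downstream.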
A data scientist is developing a single-node machine learning model. They have a large number of model configurations to test as a part of their experiment. As a result, the model tuning process takes too long to complete. Which of the following approaches can be used to speed up the model tuning process?
Correct answer: C
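Because each configuration of a single-node model can be trained independently, tuning can be parallelized across the cluster; on Databricks this is typically done with Hyperopt's SparkTrials. The underlying idea can be sketched with the standard library alone (the objective function below is a made-up stand-in for "train model, return validation loss"):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(config):
    depth, n_trees = config
    # stand-in for training a model and returning its validation loss
    return (depth - 5) ** 2 + (n_trees - 100) ** 2 / 1000

# candidate configurations to test
grid = list(product([2, 5, 10], [50, 100]))

# evaluate independent configurations concurrently instead of one by one
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(evaluate, grid))

best = grid[losses.index(min(losses))]  # configuration with the lowest loss
```

SparkTrials applies the same pattern but distributes each trial to a Spark worker, so wall-clock tuning time drops roughly in proportion to the parallelism setting.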
A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:
* Hyperparameter 1: [2, 5, 10]
* Hyperparameter 2: [50, 100]
Which of the following represents the number of machine learning models that can be trained in parallel during this process?
Correct answer: C
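Grid search with k-fold cross-validation trains one independent model per (hyperparameter combination, fold) pair, so the full count follows from quick arithmetic:

```python
hp1 = [2, 5, 10]   # Hyperparameter 1 values
hp2 = [50, 100]    # Hyperparameter 2 values
folds = 3          # 3-fold cross-validation

combinations = len(hp1) * len(hp2)   # 3 * 2 = 6 hyperparameter settings
total_models = combinations * folds  # 6 * 3 = 18 independent training runs
```

Since none of these fits depends on another's result, all of them can in principle be trained in parallel.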
A machine learning engineer is using the following code block to scale the inference of a single-node model on a Spark DataFrame with one million records:

Assuming the default Spark configuration is in place, which of the following is a benefit of using an Iterator?

Correct answer: C
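The benefit of the Iterator variant of a pandas UDF (Iterator[pd.Series] -> Iterator[pd.Series]) is that expensive setup, such as loading the model, runs once per executor process rather than once per batch. A standard-library sketch of the pattern (load_model and the batch data are made up for illustration; in a real pandas UDF the batches would be pandas Series):

```python
from typing import Iterator, List

LOADS = []  # records how many times the "model" is loaded

def load_model():
    LOADS.append(1)
    return lambda x: x * 2  # stand-in for a trained single-node model

def predict_iterator(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    model = load_model()        # one load for the whole iterator...
    for batch in batches:       # ...reused across every incoming batch
        yield [model(x) for x in batch]

batches = [[1, 2], [3, 4], [5, 6]]
results = list(predict_iterator(iter(batches)))
```

With a non-iterator pandas UDF, the load would instead repeat for each batch, which dominates runtime when the model is large and the DataFrame spans many batches.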