Databricks-Machine-Learning-Associate Free Practice Questions: "Databricks Certified Machine Learning Associate"
A machine learning engineer is trying to scale a machine learning pipeline by distributing its single-node model tuning process. After broadcasting the entire training data onto each core, each core in the cluster can train one model at a time. Because the tuning process is still running slowly, the engineer wants to increase the level of parallelism from 4 cores to 8 cores to speed up the tuning process. Unfortunately, the total memory in the cluster cannot be increased.
In which of the following scenarios will increasing the level of parallelism from 4 to 8 speed up the tuning process?
Correct answer: C
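The answer options for this question are not reproduced on this page. As a rough way to reason about the constraint in the scenario, note that with total memory fixed, doubling the number of cores halves each core's memory share, so every core can still hold a broadcast copy of the training data only if the data fits in that smaller share. The sketch below uses hypothetical sizes (`TOTAL_MEMORY_GB` and the data sizes are assumptions, not values from the question) and ignores the memory needed by the model itself.

```python
def memory_per_core_gb(total_memory_gb: float, cores: int) -> float:
    """Memory share of one core when total cluster memory is fixed."""
    return total_memory_gb / cores


def broadcast_fits(data_size_gb: float, total_memory_gb: float, cores: int) -> bool:
    """True if a full copy of the training data fits in each core's share."""
    return data_size_gb <= memory_per_core_gb(total_memory_gb, cores)


# Hypothetical numbers, chosen only for illustration.
TOTAL_MEMORY_GB = 64.0

for data_size_gb in (6.0, 12.0):
    for cores in (4, 8):
        fits = broadcast_fits(data_size_gb, TOTAL_MEMORY_GB, cores)
        print(f"data={data_size_gb} GB, cores={cores}: fits on each core = {fits}")
```

Under these assumed numbers, the smaller dataset still fits on each core at 8-way parallelism, while the larger one fits only at 4-way parallelism.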
A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.
They have written the following incomplete code block:

Which piece of code can be used to fill in the above blank to complete the task?
Correct answer: D
A data scientist is developing a single-node machine learning model. They have a large number of model configurations to test as a part of their experiment. As a result, the model tuning process takes too long to complete. Which of the following approaches can be used to speed up the model tuning process?
Correct answer: C
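The answer options are not shown here, so the following is only general background. Because each model configuration in the scenario is trained independently, the evaluations can run concurrently instead of one after another. Below is a minimal sketch of that idea using Python's standard library. `validation_loss` and the search grid are hypothetical stand-ins, and threads are used only to keep the example self-contained. Real CPU-bound training would need separate processes or cluster workers (on Databricks, for example, Hyperopt with `SparkTrials`) to gain actual speed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_loss(config: dict) -> float:
    """Hypothetical stand-in for training one single-node model and scoring it."""
    # Toy objective with its minimum at learning_rate=0.1, max_depth=5.
    return (config["learning_rate"] - 0.1) ** 2 + (config["max_depth"] - 5) ** 2


# Hypothetical search grid.
grid = [
    {"learning_rate": lr, "max_depth": depth}
    for lr, depth in product((0.01, 0.1, 1.0), (3, 5, 7))
]

# Each configuration is independent, so the evaluations can run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(validation_loss, grid))

# Pick the configuration with the lowest loss.
best_loss, best_config = min(zip(losses, grid), key=lambda pair: pair[0])
print(best_config, best_loss)
```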
The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.
Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?
Correct answer: D
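The answer options are not reproduced here. As background, the general alternative to a closed-form matrix solution is iterative optimization: each data partition computes its own share of the gradient, the shares are summed, and the coefficients are updated. The sketch below illustrates this with plain gradient descent on a one-feature toy problem. It is a simplified stand-in, not Spark ML's actual optimizer (Spark's `LinearRegression` exposes an `l-bfgs` solver), and the partitions, learning rate, and iteration count are assumptions.

```python
def partial_gradient(rows, w, b):
    """Sum of squared-error gradients over one partition of (x, y) rows."""
    grad_w, grad_b = 0.0, 0.0
    for x, y in rows:
        error = (w * x + b) - y
        grad_w += error * x
        grad_b += error
    return grad_w, grad_b


def fit_linear_regression(partitions, learning_rate=0.05, iterations=2000):
    """Fit y = w*x + b by gradient descent, aggregating per-partition gradients."""
    n = sum(len(rows) for rows in partitions)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        # In a cluster, each partition's gradient would be computed on a worker.
        grads = [partial_gradient(rows, w, b) for rows in partitions]
        grad_w = sum(g[0] for g in grads) / n
        grad_b = sum(g[1] for g in grads) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, split into two hypothetical partitions.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
w, b = fit_linear_regression(partitions)
print(round(w, 3), round(b, 3))
```

Only the small gradient sums travel between partitions and the driver, which is why this style of training scales to data that does not fit on one machine.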
A data scientist has replaced missing values in their feature set with each respective feature variable's median value. A colleague suggests that the data scientist is throwing away valuable information by doing this.
Which of the following approaches can they take to include as much information as possible in the feature set?
Correct answer: E
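The answer options are not shown on this page. One common way to keep the information that median imputation discards is to add a binary indicator feature recording which values were originally missing. The following is a minimal sketch of that technique. The helper `impute_with_indicator` and the `age` column are hypothetical names introduced for illustration.

```python
from statistics import median


def impute_with_indicator(values):
    """Return (imputed_values, was_missing_flags) for a list with None gaps."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


# Hypothetical feature column with two missing entries.
age = [34, None, 29, 41, None, 50]
age_imputed, age_was_missing = impute_with_indicator(age)
print(age_imputed)       # values with gaps filled by the observed median
print(age_was_missing)   # 1 where the original value was missing
```

A downstream model can then use both columns, so the fact that a value was missing remains available as a signal.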
A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?
Correct answer: B
An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?
Correct answer: A
A team is developing guidelines on when to use various evaluation metrics for classification problems. The team needs to provide input on when to use the F1 score over accuracy.

Which of the following suggestions should the team include in their guidelines?

Correct answer: B
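The answer options are not reproduced here. As background on how the two metrics differ, the sketch below computes accuracy and F1 on a hypothetical imbalanced dataset where a classifier always predicts the majority class. The labels and both helper functions are assumptions written for this illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: treat F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
```

Accuracy looks high here because the negative class dominates, while F1 is zero because the classifier never identifies a positive case.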