Databricks-Machine-Learning-Associate Free Practice Questions "Databricks Certified Machine Learning Associate"
A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.
Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?
Correct answer: B
Explanation: (visible to JPNTest members only)
An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?
Correct answer: A
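The usual justification is that one-hot encoding is model-specific: tree-based algorithms do not need it (and can be hurt by it), so a shared feature repository should store raw categorical values and leave encoding to each training pipeline. A sketch of that workflow, using scikit-learn for illustration (the column names and data are made up):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# The feature table keeps the raw categorical column...
feature_table = pd.DataFrame(
    {"color": ["red", "green", "red"], "price": [1.0, 2.0, 3.0]}
)

# ...and one-hot encoding happens inside an individual model's training
# pipeline, only for algorithms that actually require it (e.g. linear models)
enc = OneHotEncoder(handle_unknown="ignore")
encoded = enc.fit_transform(feature_table[["color"]]).toarray()
```

This way consumers that prefer native categorical handling read the raw column, while others encode downstream.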
A data scientist is developing a single-node machine learning model. They have a large number of model configurations to test as a part of their experiment. As a result, the model tuning process takes too long to complete. Which of the following approaches can be used to speed up the model tuning process?
Correct answer: C
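Because each configuration of a single-node model can be trained independently, tuning can be parallelized across the cluster; on Databricks this is typically done with Hyperopt's SparkTrials. The underlying idea can be sketched with the standard library alone (the objective function below is a made-up stand-in for "train model, return validation loss"):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(config):
    depth, n_trees = config
    # stand-in for training a model and returning its validation loss
    return (depth - 5) ** 2 + (n_trees - 100) ** 2 / 1000

# candidate configurations to test
grid = list(product([2, 5, 10], [50, 100]))

# evaluate independent configurations concurrently instead of one by one
with ThreadPoolExecutor(max_workers=4) as pool:
    losses = list(pool.map(evaluate, grid))

best = grid[losses.index(min(losses))]  # configuration with the lowest loss
```

SparkTrials applies the same pattern but distributes each trial to a Spark worker, so wall-clock tuning time drops roughly in proportion to the parallelism setting.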
A data scientist uses 3-fold cross-validation and the following hyperparameter grid when optimizing model hyperparameters via grid search for a classification problem:
* Hyperparameter 1: [2, 5, 10]
* Hyperparameter 2: [50, 100]
Which of the following represents the number of machine learning models that can be trained in parallel during this process?
Correct answer: C
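Grid search with k-fold cross-validation trains one independent model per (hyperparameter combination, fold) pair, so the full count follows from quick arithmetic:

```python
hp1 = [2, 5, 10]   # Hyperparameter 1 values
hp2 = [50, 100]    # Hyperparameter 2 values
folds = 3          # 3-fold cross-validation

combinations = len(hp1) * len(hp2)   # 3 * 2 = 6 hyperparameter settings
total_models = combinations * folds  # 6 * 3 = 18 independent training runs
```

Since none of these fits depends on another's result, all of them can in principle be trained in parallel.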
A machine learning engineer is using the following code block to scale the inference of a single-node model on a Spark DataFrame with one million records:

Assuming the default Spark configuration is in place, which of the following is a benefit of using an Iterator?

Correct answer: C
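The benefit of the Iterator variant of a pandas UDF (Iterator[pd.Series] -> Iterator[pd.Series]) is that expensive setup, such as loading the model, runs once per executor process rather than once per batch. A standard-library sketch of the pattern (load_model and the batch data are made up for illustration; in a real pandas UDF the batches would be pandas Series):

```python
from typing import Iterator, List

LOADS = []  # records how many times the "model" is loaded

def load_model():
    LOADS.append(1)
    return lambda x: x * 2  # stand-in for a trained single-node model

def predict_iterator(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    model = load_model()        # one load for the whole iterator...
    for batch in batches:       # ...reused across every incoming batch
        yield [model(x) for x in batch]

batches = [[1, 2], [3, 4], [5, 6]]
results = list(predict_iterator(iter(batches)))
```

With a non-iterator pandas UDF, the load would instead repeat for each batch, which dominates runtime when the model is large and the DataFrame spans many batches.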