DSA-C03 Free Practice Questions: "Snowflake SnowPro Advanced: Data Scientist Certification"
You have a binary classification model deployed in Snowflake to predict customer churn. The model outputs a probability score between 0 and 1. You've calculated the following confusion matrix on a holdout set:

|                 | Predicted Positive | Predicted Negative |
|-----------------|--------------------|--------------------|
| Actual Positive | 80                 | 20                 |
| Actual Negative | 10                 | 90                 |

What are the Precision, Recall, and Accuracy for this model, and what do these metrics tell you about the model's performance? (The question also provides a SELECT statement defining the true and false conditions: True Positive, True Negative, False Positive, False Negative.)
Correct answer: C
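For reference, here is how the three metrics fall out of the matrix above (TP = 80, FN = 20, FP = 10, TN = 90). This is a plain-Python sketch of the arithmetic, not the Snowflake SELECT the question refers to.

```python
# Counts taken from the confusion matrix in the question.
tp, fn = 80, 20   # actual positives: predicted positive / negative
fp, tn = 10, 90   # actual negatives: predicted positive / negative

precision = tp / (tp + fp)                    # 80 / 90  ≈ 0.889
recall    = tp / (tp + fn)                    # 80 / 100 = 0.800
accuracy  = (tp + tn) / (tp + fn + fp + tn)   # 170 / 200 = 0.850

print(f"Precision={precision:.3f} Recall={recall:.3f} Accuracy={accuracy:.3f}")
```

Read together, the numbers say the model raises relatively few false churn alarms (precision ≈ 0.89) but still misses one in five actual churners (recall = 0.80), with 85% of all holdout customers classified correctly.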
You are building a machine learning model using Snowpark Python to predict house prices. The dataset contains a feature column named 'location' which contains free-form text descriptions of house locations. You want to leverage a pre-trained Large Language Model (LLM) hosted externally to extract structured location features like city, state, and zip code from the free-form text within Snowpark. You want to minimize the data transferred out of Snowflake. Which approach is most efficient and secure?
Correct answer: D
You've created a Python stored procedure in Snowflake to train a model. The procedure successfully trains the model, saves it using 'joblib.dump', and then attempts to upload the model file to an internal stage. However, the upload fails intermittently with a FileNotFoundError. The stage is correctly configured, and the stored procedure has the necessary privileges. Which of the following actions are MOST likely to resolve this issue? (Select TWO)
Correct answer: A, B
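Intermittent FileNotFoundError during the upload step usually traces back to the file being written somewhere other than the stored procedure's writable scratch directory, or to the dump path and the upload path not matching. A minimal sketch of a handler that avoids both, assuming an internal stage named @MODEL_STAGE and a Snowpark session:

```python
import os
import joblib
from sklearn.linear_model import LogisticRegression
from snowflake.snowpark import Session

def train_and_upload(session: Session) -> str:
    # Placeholder model; in the real procedure this is the trained estimator.
    model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

    # /tmp is the writable scratch area inside the procedure's sandbox;
    # dump and upload from the very same absolute path.
    local_path = os.path.join("/tmp", "churn_model.joblib")
    joblib.dump(model, local_path)

    # Upload via Snowpark's file API (stage name is a placeholder).
    session.file.put(local_path, "@MODEL_STAGE", auto_compress=False, overwrite=True)
    return f"uploaded {local_path}"
```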
You are tasked with forecasting the daily sales of a specific product for the next 30 days using Snowflake. You have historical sales data for the past 3 years, stored in a Snowflake table named 'SALES_DATA', with columns 'SALE_DATE' (DATE type) and 'SALES_AMOUNT' (NUMBER type). You want to use the Prophet library within a Snowflake User-Defined Function (UDF) for forecasting. The Prophet model requires the input data to have columns named 'ds' (for dates) and 'y' (for values). Which of the following code snippets demonstrates the CORRECT way to prepare and pass your data to the Prophet UDF in Snowflake, assuming you've already created the Python UDF 'prophet_forecast'?
Correct answer: D
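Whatever the surrounding UDF looks like, the preparation Prophet needs is the same: the date column has to become 'ds' and the value column 'y'. A hedged sketch of that step (done client-side here for clarity, using the question's table and column names):

```python
import pandas as pd
from prophet import Prophet
from snowflake.snowpark import Session

def forecast_next_30_days(session: Session) -> pd.DataFrame:
    # Pull the history and rename to the columns Prophet expects.
    history = (
        session.table("SALES_DATA")
        .select("SALE_DATE", "SALES_AMOUNT")
        .to_pandas()
        .rename(columns={"SALE_DATE": "ds", "SALES_AMOUNT": "y"})
    )

    model = Prophet()
    model.fit(history)

    future = model.make_future_dataframe(periods=30, freq="D")
    return model.predict(future)[["ds", "yhat"]].tail(30)
```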
A data scientist is analyzing website click-through rates (CTR) for two different ad campaigns. Campaign A ran for two weeks and had 10,000 impressions with 500 clicks. Campaign B also ran for two weeks with 12,000 impressions and 660 clicks. The data scientist wants to determine if there's a statistically significant difference in CTR between the two campaigns. Assume the population standard deviation is unknown and unequal for the two campaigns. Which statistical test is most appropriate to use, and what Snowflake SQL code would be used to approximate the p-value for this test (assume the click and impression counts for each campaign, e.g. 'clicks_b', are already defined Snowflake variables)?
Correct answer: A
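Because a CTR is a count of successes over a known number of trials, the comparison is usually framed as a two-proportion z-test. A plain-Python sketch of the p-value calculation with the question's numbers (the SQL version the question asks about approximates the same arithmetic):

```python
from math import sqrt
from scipy.stats import norm

clicks_a, impressions_a = 500, 10_000     # campaign A: CTR = 0.050
clicks_b, impressions_b = 660, 12_000     # campaign B: CTR = 0.055

p_a = clicks_a / impressions_a
p_b = clicks_b / impressions_b

# Pooled proportion under the null hypothesis of equal CTRs.
p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))

z = (p_a - p_b) / se
p_value = 2 * norm.sf(abs(z))             # two-sided p-value
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

With these counts the two-sided p-value comes out at roughly 0.10, so the half-point CTR gap would not be significant at the conventional 5% level.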
You are a data scientist working with a large dataset of customer transactions stored in Snowflake. You need to identify potential fraud using statistical summaries. Which of the following approaches would be MOST effective in identifying unusual spending patterns, considering the need for scalability and performance within Snowflake?
Correct answer: B, D
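One scalable pattern is to compute per-customer summary statistics inside Snowflake and only flag transactions that sit far from the customer's own typical behaviour. A hedged Snowpark sketch; the table and column names are assumptions for illustration:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col, stddev

def flag_unusual_spend(session: Session):
    tx = session.table("CUSTOMER_TRANSACTIONS")   # assumed table / columns

    # Per-customer mean and standard deviation, computed in the warehouse.
    stats = tx.group_by("CUSTOMER_ID").agg(
        avg("AMOUNT").alias("MEAN_AMOUNT"),
        stddev("AMOUNT").alias("STD_AMOUNT"),
    )

    # Keep only transactions more than 3 standard deviations above the mean.
    return (
        tx.join(stats, on="CUSTOMER_ID")
          .filter(col("AMOUNT") > col("MEAN_AMOUNT") + 3 * col("STD_AMOUNT"))
    )
```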
You are developing a model to predict customer churn using Snowflake ML. After training a Gradient Boosting model, you want to understand the relationship between 'number_of_products' and the churn probability. You generate a partial dependence plot (PDP) for 'number_of_products'. The PDP shows a steep increase in churn probability as 'number_of_products' increases from 1 to 3, followed by a plateau. Which of the following statements are the MOST accurate interpretations of this PDP? Assume the dataset is balanced and has undergone proper preprocessing.
Correct answer: B, C
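For context, this is roughly how such a PDP is produced with scikit-learn once the training data has been sampled or exported from Snowflake. Everything below is illustrative; only the feature name comes from the question:

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

# Toy stand-in for the real training frame pulled from Snowflake.
df = pd.DataFrame({
    "number_of_products": [1, 1, 2, 2, 3, 3, 4, 5, 5, 6],
    "tenure_months":      [3, 40, 12, 24, 6, 18, 30, 8, 48, 36],
    "churned":            [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
})
X, y = df[["number_of_products", "tenure_months"]], df["churned"]

model = GradientBoostingClassifier().fit(X, y)

# Average predicted churn as 'number_of_products' varies, with the other
# feature held at its observed values; this is the curve the question describes.
PartialDependenceDisplay.from_estimator(model, X, features=["number_of_products"])
plt.show()
```

The usual caveat when reading any PDP is that it shows an average marginal effect of the model, not a causal effect on real customers.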
You are using Snowflake Cortex to build a customer support chatbot that leverages LLMs to answer customer questions. You have a knowledge base stored in a Snowflake table. The following options describe different methods for using this knowledge base in conjunction with the LLM to generate responses. Which of the following approaches will likely result in the MOST accurate, relevant, and cost-effective responses from the LLM?
Correct answer: C
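For orientation, the retrieval-augmented pattern most options revolve around is: embed the question, pull only the most similar knowledge-base rows by vector similarity, and pass that small context to the LLM. A hedged Snowpark sketch; the table, column, and model names are assumptions, and the Cortex function signatures should be checked against the current documentation:

```python
from snowflake.snowpark import Session

def answer_question(session: Session, question: str) -> str:
    # Retrieve the 3 most similar articles. Assumes KNOWLEDGE_BASE has a
    # precomputed VECTOR column DOC_VEC built with the same embedding model.
    rows = session.sql(
        """
        SELECT CONTENT
        FROM KNOWLEDGE_BASE
        ORDER BY VECTOR_COSINE_SIMILARITY(
                   DOC_VEC,
                   SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', ?)
                 ) DESC
        LIMIT 3
        """,
        params=[question],
    ).collect()

    context = "\n\n".join(row["CONTENT"] for row in rows)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # Generate the answer with a Cortex-hosted model (model name is an assumption).
    return session.sql(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3-8b', ?) AS ANSWER",
        params=[prompt],
    ).collect()[0]["ANSWER"]
```

Sending only the top-matching passages, rather than the whole knowledge base, is what keeps the prompt small and the per-call token cost down.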
You are a data scientist working for a retail company that stores its transaction data in Snowflake. You need to perform feature engineering on customer purchase history data to build a customer churn prediction model. Which of the following approaches best combines Snowflake's capabilities with a machine learning framework (like scikit-learn) for efficient feature engineering? Assume your data is stored in a table named 'CUSTOMER_TRANSACTIONS' with columns like 'CUSTOMER_ID', 'TRANSACTION_DATE', 'AMOUNT', and 'PRODUCT_CATEGORY'.
Correct answer: B
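A common division of labour is to let Snowflake do the heavy per-customer aggregation and pull only the resulting one-row-per-customer feature table into pandas for scikit-learn. A hedged Snowpark sketch using the question's table and column names:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, count, max as max_, sum as sum_

def build_churn_features(session: Session):
    tx = session.table("CUSTOMER_TRANSACTIONS")

    # Aggregate inside Snowflake so only one row per customer leaves the warehouse.
    features = tx.group_by("CUSTOMER_ID").agg(
        count("AMOUNT").alias("NUM_TRANSACTIONS"),
        sum_("AMOUNT").alias("TOTAL_SPEND"),
        avg("AMOUNT").alias("AVG_SPEND"),
        max_("TRANSACTION_DATE").alias("LAST_PURCHASE_DATE"),
    )

    pdf = features.to_pandas()
    # From here, join churn labels and train locally, e.g. with scikit-learn:
    #   RandomForestClassifier().fit(pdf[feature_cols], labels)
    return pdf
```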
You are a data scientist working for a retail company. You've been tasked with identifying fraudulent transactions. You have a Snowflake table named 'TRANSACTIONS' with columns 'TRANSACTION_ID', 'AMOUNT', 'TRANSACTION_DATE', 'CUSTOMER_ID', and 'LOCATION'. You suspect outliers in transaction amounts might indicate fraud. Which of the following SQL queries is the MOST efficient and appropriate to identify potential outliers using the Interquartile Range (IQR) method, and incorporate necessary data type considerations for robust percentile calculations? Consider also the computational cost associated with each approach on a large dataset.


Correct answer: C
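For reference, the IQR approach the options revolve around computes Q1 and Q3 once with PERCENTILE_CONT and then filters the table in a single pass. A sketch issued through Snowpark for convenience, with the question's table and (underscored) column names:

```python
from snowflake.snowpark import Session

IQR_OUTLIERS_SQL = """
WITH stats AS (
    SELECT
        PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY AMOUNT) AS q1,
        PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY AMOUNT) AS q3
    FROM TRANSACTIONS
)
SELECT t.TRANSACTION_ID, t.AMOUNT
FROM TRANSACTIONS t
CROSS JOIN stats s
WHERE t.AMOUNT < s.q1 - 1.5 * (s.q3 - s.q1)
   OR t.AMOUNT > s.q3 + 1.5 * (s.q3 - s.q1)
"""

def find_amount_outliers(session: Session):
    # Quartiles are computed once over the whole table and reused in the filter,
    # which keeps the number of scans low on a large dataset.
    return session.sql(IQR_OUTLIERS_SQL).collect()
```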
You are building a predictive model for customer churn using linear regression in Snowflake. You have identified several features, including 'CUSTOMER_AGE', 'MONTHLY_SPEND', and 'NUM_CALLS'. After performing an initial linear regression, you suspect that the relationship between 'CUSTOMER_AGE' and churn is not linear and that older customers might churn at a different rate than younger customers. You want to introduce a polynomial feature of 'CUSTOMER_AGE' (specifically, 'CUSTOMER_AGE_SQUARED') to your regression model within Snowflake SQL before further analysis with Python and Snowpark. How can you BEST create this new feature in a robust and maintainable way directly within Snowflake?


Correct answer: D
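One maintainable way to materialise such a derived feature is to define it once in a view, so the same 'CUSTOMER_AGE_SQUARED' definition is visible to SQL, Snowpark, and any downstream Python analysis. A hedged sketch; the source table and extra column names are assumptions:

```python
from snowflake.snowpark import Session

def create_churn_feature_view(session: Session):
    # Central definition of the squared-age feature (source table name assumed).
    session.sql("""
        CREATE OR REPLACE VIEW CUSTOMER_CHURN_FEATURES AS
        SELECT
            CUSTOMER_ID,
            CUSTOMER_AGE,
            MONTHLY_SPEND,
            NUM_CALLS,
            CUSTOMER_AGE * CUSTOMER_AGE AS CUSTOMER_AGE_SQUARED
        FROM CUSTOMER_CHURN_DATA
    """).collect()

    # Snowpark (and any SQL client) now sees the engineered column directly.
    return session.table("CUSTOMER_CHURN_FEATURES")
```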
You are building a time-series forecasting model in Snowflake to predict the hourly energy consumption of a building. You have historical data with timestamps and corresponding energy consumption values. You've noticed significant daily seasonality and a weaker weekly seasonality. Which of the following techniques or approaches would be most appropriate for capturing both seasonality patterns within a supervised learning framework using Snowflake?
Correct answer: B, D
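In a supervised setup, both seasonalities are usually encoded as calendar features derived from the timestamp, often as sine/cosine pairs so the cyclical wrap-around (hour 23 next to hour 0, Sunday next to Monday) is preserved. A pandas sketch with an assumed 'TIMESTAMP' column; the same expressions can be written in Snowflake SQL with HOUR() and DAYOFWEEK():

```python
import numpy as np
import pandas as pd

def add_seasonality_features(df: pd.DataFrame) -> pd.DataFrame:
    # Derive daily and weekly seasonality features from a TIMESTAMP column.
    ts = pd.to_datetime(df["TIMESTAMP"])

    hour = ts.dt.hour          # drives the daily pattern
    dow = ts.dt.dayofweek      # drives the weekly pattern

    # Cyclical encodings keep the end of each cycle adjacent to its start.
    df["HOUR_SIN"] = np.sin(2 * np.pi * hour / 24)
    df["HOUR_COS"] = np.cos(2 * np.pi * hour / 24)
    df["DOW_SIN"] = np.sin(2 * np.pi * dow / 7)
    df["DOW_COS"] = np.cos(2 * np.pi * dow / 7)
    return df
```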
You are using Snowpark Feature Store to manage features for your machine learning models. You've created several Feature Groups and now want to consume these features for training a model. To optimize retrieval, you want to use point-in-time correctness. Which of the following actions/configurations are essential to ensure point-in-time correctness when retrieving features using Snowpark Feature Store?
Correct answer: A, C
You've deployed a fraud detection model in Snowflake. The model is implemented as a Python UDF that uses a pre-trained scikit-learn model stored as a stage file. Your goal is to enable near real-time fraud detection on incoming transactions. Due to regulatory requirements, you need to maintain a detailed audit trail of all predictions, including the input features, model version, prediction scores, and any errors encountered during the prediction process. Which of the following approaches are valid and efficient for storing these audit logs and predictions in Snowflake?
Correct answer: A, E
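Whichever storage options are chosen, the core pattern is a dedicated audit table populated in the same step that produces the predictions, so input features, model version, score, and timestamp are captured together. A hedged sketch; the audit table, source table, UDF name, and version string are all assumptions:

```python
from snowflake.snowpark import Session

def score_and_audit(session: Session):
    # One-time setup: audit table keyed by transaction, inputs kept as VARIANT.
    session.sql("""
        CREATE TABLE IF NOT EXISTS FRAUD_PREDICTION_AUDIT (
            TRANSACTION_ID NUMBER,
            INPUT_FEATURES VARIANT,
            MODEL_VERSION  STRING,
            FRAUD_SCORE    FLOAT,
            SCORED_AT      TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
        )
    """).collect()

    # Score incoming transactions with the deployed UDF and log in one pass.
    session.sql("""
        INSERT INTO FRAUD_PREDICTION_AUDIT
            (TRANSACTION_ID, INPUT_FEATURES, MODEL_VERSION, FRAUD_SCORE)
        SELECT
            TRANSACTION_ID,
            OBJECT_CONSTRUCT('AMOUNT', AMOUNT, 'LOCATION', LOCATION),
            'v1.2.0',
            PREDICT_FRAUD(AMOUNT, LOCATION)
        FROM NEW_TRANSACTIONS
    """).collect()
```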
You are tasked with identifying Personally Identifiable Information (PII) within a Snowflake table named 'customer_data'. This table contains various columns, some of which may contain sensitive information like email addresses and phone numbers. You want to use Snowflake's data governance features to tag these columns appropriately. Which of the following approaches is the MOST effective and secure way to automatically identify and tag potential PII columns with the 'PII_CLASSIFIED' tag in your Snowflake environment, ensuring minimal manual intervention and optimal accuracy?
Correct answer: C
You're working on a fraud detection system for an e-commerce platform. You have a table 'TRANSACTIONS' with a 'TRANSACTION_AMOUNT' column. You want to bin the transaction amounts into several risk categories ('Low', 'Medium', 'High', 'Very High') using explicit boundaries. You want the bins to be inclusive of the lower boundary and exclusive of the upper boundary (e.g., [0, 100), [100, 500), etc.). Which of the following SQL statements using the 'WIDTH_BUCKET' function correctly bins the transaction amounts into these categories, assuming these boundaries: 0, 100, 500, 1000, and infinity, and assigns appropriate labels?


Correct answer: A
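The binning semantics the question specifies (lower bound inclusive, upper bound exclusive) are easiest to see with pandas' pd.cut using right=False; however the Snowflake side is written, with WIDTH_BUCKET or a CASE expression, it should reproduce this behaviour. A small sketch with made-up amounts:

```python
import numpy as np
import pandas as pd

amounts = pd.Series([0, 99.99, 100, 499.99, 500, 999.99, 1000, 12_500])

# [0, 100) -> Low, [100, 500) -> Medium, [500, 1000) -> High, [1000, inf) -> Very High
risk = pd.cut(
    amounts,
    bins=[0, 100, 500, 1000, np.inf],
    labels=["Low", "Medium", "High", "Very High"],
    right=False,   # lower boundary inclusive, upper boundary exclusive
)
print(pd.DataFrame({"TRANSACTION_AMOUNT": amounts, "RISK_CATEGORY": risk}))
```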