[October 30, 2023] Top-Class AWS-Certified-Data-Analytics-Specialty Practice Exam Questions [Q76-Q100]

Free AWS-Certified-Data-Analytics-Specialty sample questions and practice test engine built from real exam questions

Question # 76
A power utility company is deploying thousands of smart meters to obtain real-time updates about power consumption. The company is using Amazon Kinesis Data Streams to collect the data streams from smart meters. The consumer application uses the Kinesis Client Library (KCL) to retrieve the stream data. The company has only one consumer application.
The company observes an average of 1 second of latency from the moment that a record is written to the stream until the record is read by a consumer application. The company must reduce this latency to 500 milliseconds.
Which solution meets these requirements?

  • A. Reduce the propagation delay by overriding the KCL default settings.
  • B. Increase the number of shards for the Kinesis data stream.
  • C. Use enhanced fan-out in Kinesis Data Streams.
  • D. Develop consumers by using Amazon Kinesis Data Firehose.

Correct answer: A

Explanation:
The KCL defaults follow the best practice of polling every 1 second, which typically results in average propagation delays of less than 1 second. Overriding these defaults so that the consumer polls more frequently reduces the delay further.
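As a rough illustration of why the polling interval drives this latency, the sketch below models a consumer that polls on a fixed interval while records arrive evenly spread across that interval. It is an idealized model (no processing or network time), not the KCL itself; in the KCL the interval is governed by a setting such as `idleTimeBetweenReadsInMillis`, but the exact name and location depend on the KCL version you use.

```python
def average_polling_delay_ms(poll_interval_ms: float, samples: int = 10_000) -> float:
    """Average wait from record arrival until the next poll.

    Idealized model: the consumer polls at times 0, P, 2P, ... and records
    arrive evenly spread over one polling interval [0, P). Processing and
    network time are ignored.
    """
    total_wait = 0.0
    for i in range(samples):
        arrival = poll_interval_ms * i / samples  # arrival time inside the interval
        total_wait += poll_interval_ms - arrival  # wait until the poll at time P
    return total_wait / samples


if __name__ == "__main__":
    for interval in (1000, 500, 250):
        avg = average_polling_delay_ms(interval)
        print(f"poll every {interval} ms -> about {avg:.0f} ms average wait")
```

In this model the average wait is about half the polling interval, so shortening the interval shrinks the polling component of the end-to-end latency proportionally.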


Question # 77
A financial company uses Apache Hive on Amazon EMR for ad-hoc queries. Users are complaining of sluggish performance.
A data analyst notes the following:
* Approximately 90% of queries are submitted 1 hour after the market opens.
* Hadoop Distributed File System (HDFS) utilization never exceeds 10%.
Which solution would help address the performance issues?

  • A. Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch YARNMemoryAvailablePercentage metric.
  • B. Create instance fleet configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance fleet based on the CloudWatch CapacityRemainingGB metric.
  • C. Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch CapacityRemainingGB metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch CapacityRemainingGB metric.
  • D. Create instance group configurations for core and task nodes. Create an automatic scaling policy to scale out the instance groups based on the Amazon CloudWatch YARNMemoryAvailablePercentage metric. Create an automatic scaling policy to scale in the instance groups based on the CloudWatch YARNMemoryAvailablePercentage metric.

Correct answer: D


Question # 78
A company is planning to do a proof of concept for a machine learning (ML) project using Amazon SageMaker with a subset of existing on-premises data hosted in the company's 3 TB data warehouse. For part of the project, AWS Direct Connect is established and tested. To prepare the data for ML, data analysts are performing data curation. The data analysts want to perform multiple steps, including mapping, dropping null fields, resolving choice, and splitting fields. The company needs the fastest solution to curate the data for this project.
Which solution meets these requirements?

  • A. Ingest data into Amazon S3 using AWS DataSync and use Apache Spark scripts to curate the data in an Amazon EMR cluster. Store the curated data in Amazon S3 for ML processing.
  • B. Ingest data into Amazon S3 using AWS DMS. Use AWS Glue to perform data curation and store the data in Amazon S3 for ML processing.
  • C. Create custom ETL jobs on-premises to curate the data. Use AWS DMS to ingest data into Amazon S3 for ML processing.
  • D. Take a full backup of the data store and ship the backup files using AWS Snowball. Upload Snowball data into Amazon S3 and schedule data curation jobs using AWS Batch to prepare the data for ML.

Correct answer: B


Question # 79
A company is building a data lake and needs to ingest data from a relational database that has time-series data.
The company wants to use managed services to accomplish this. The process needs to be scheduled daily and bring incremental data only from the source into Amazon S3.
What is the MOST cost-effective approach to meet these requirements?

  • A. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the full data. Use AWS DataSync to ensure the delta only is written into Amazon S3.
  • B. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the entire dataset. Use appropriate Apache Spark libraries to compare the dataset, and find the delta.
  • C. Use AWS Glue to connect to the data source using JDBC Drivers. Ingest incremental records only using job bookmarks.
  • D. Use AWS Glue to connect to the data source using JDBC Drivers. Store the last updated key in an Amazon DynamoDB table and ingest the data using the updated key as a filter.

Correct answer: C

Explanation:
https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
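Job bookmarks let an AWS Glue job remember what it already processed so that each scheduled run reads only new records. The sketch below is a plain-Python simulation of that idea for a table with a monotonically increasing key; the `Bookmark` class and `incremental_extract` function are hypothetical stand-ins, not the AWS Glue API, which persists bookmark state for you when bookmarks are enabled on the job.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Bookmark:
    """Hypothetical stand-in for the state a Glue job bookmark persists between runs."""
    last_key: Optional[int] = None


def incremental_extract(rows: List[Dict], bookmark: Bookmark, key: str = "id") -> List[Dict]:
    """Return only rows whose key is greater than the last key seen, then advance the bookmark.

    Assumes the key increases monotonically as new rows are written (for example,
    an auto-increment ID in time-series data).
    """
    new_rows = [r for r in rows if bookmark.last_key is None or r[key] > bookmark.last_key]
    if new_rows:
        bookmark.last_key = max(r[key] for r in new_rows)
    return new_rows


if __name__ == "__main__":
    table = [{"id": 1, "temp": 20.1}, {"id": 2, "temp": 20.4}, {"id": 3, "temp": 20.2}]
    bm = Bookmark()
    print(len(incremental_extract(table, bm)))  # first run: all 3 rows are new
    table += [{"id": 4, "temp": 21.0}, {"id": 5, "temp": 21.3}]
    print(len(incremental_extract(table, bm)))  # next run: only ids 4 and 5
```

For JDBC sources, the Glue documentation describes the same pattern using the table's primary key or explicitly configured bookmark keys, which is why no extra DynamoDB table or full-table comparison is needed.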


Question # 80
An online retail company with millions of users around the globe wants to improve its ecommerce analytics capabilities. Currently, clickstream data is uploaded directly to Amazon S3 as compressed files. Several times each day, an application running on Amazon EC2 processes the data and makes search options and reports available for visualization by editors and marketers. The company wants to make website clicks and aggregated data available to editors and marketers in minutes to enable them to connect with users more effectively.
Which options will help meet these requirements in the MOST efficient way? (Choose two.)

  • A. Use Amazon Elasticsearch Service deployed on Amazon EC2 to aggregate, filter, and process the data. Refresh content performance dashboards in near-real time.
  • B. Use Kibana to aggregate, filter, and visualize the data stored in Amazon Elasticsearch Service. Refresh content performance dashboards in near-real time.
  • C. Use Amazon Kinesis Data Firehose to upload compressed and batched clickstream records to Amazon Elasticsearch Service.
  • D. Upload clickstream records from Amazon S3 to Amazon Kinesis Data Streams and use a Kinesis Data Streams consumer to send records to Amazon Elasticsearch Service.
  • E. Upload clickstream records to Amazon S3 as compressed files. Then use AWS Lambda to send data to Amazon Elasticsearch Service from Amazon S3.

Correct answer: B, C


Question # 81
A company wants to improve user satisfaction for its smart home system by adding more features to its recommendation engine. Each sensor asynchronously pushes its nested JSON data into Amazon Kinesis Data Streams using the Kinesis Producer Library (KPL) in Java. Statistics from a set of failed sensors showed that, when a sensor is malfunctioning, its recorded data is not always sent to the cloud.
The company needs a solution that offers near-real-time analytics on the data from the most updated sensors. Which solution enables the company to meet these requirements?

  • A. Update the sensors code to use the PutRecord/PutRecords call from the Kinesis Data Streams API with the AWS SDK for Java. Use Kinesis Data Analytics to enrich the data based on a company-developed anomaly detection SQL script. Direct the output of KDA application to a Kinesis Data Firehose delivery stream, enable the data transformation feature to flatten the JSON file, and set the Kinesis Data Firehose destination to an Amazon Elasticsearch Service cluster.
  • B. Set the RecordMaxBufferedTime property of the KPL to "-1" to disable the buffering on the sensor side. Use Kinesis Data Analytics to enrich the data based on a company-developed anomaly detection SQL script. Push the enriched data to a fleet of Kinesis data streams and enable the data transformation feature to flatten the JSON file. Instantiate a dense storage Amazon Redshift cluster and use it as the destination for the Kinesis Data Firehose delivery stream.
  • C. Set the RecordMaxBufferedTime property of the KPL to "0" to disable the buffering on the sensor side. Connect for each stream a dedicated Kinesis Data Firehose delivery stream and enable the data transformation feature to flatten the JSON file before sending it to an Amazon S3 bucket. Load the S3 data into an Amazon Redshift cluster.
  • D. Update the sensors code to use the PutRecord/PutRecords call from the Kinesis Data Streams API with the AWS SDK for Java. Use AWS Glue to fetch and process data from the stream using the Kinesis Client Library (KCL). Instantiate an Amazon Elasticsearch Service cluster and use AWS Lambda to directly push data into it.

Correct answer: A

Explanation:
https://docs.aws.amazon.com/streams/latest/dev/developing-producers-with-kpl.html
The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable). Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance. Applications that cannot tolerate this additional delay may need to use the AWS SDK directly.


Question # 82
A company wants to use an automatic machine learning (ML) Random Cut Forest (RCF) algorithm to visualize complex real-world scenarios, such as detecting seasonality and trends, excluding outliers, and imputing missing values.
The team working on this project is non-technical and is looking for an out-of-the-box solution that will require the LEAST amount of management overhead.
Which solution will meet these requirements?

  • A. Use calculated fields to create a new forecast and then use Amazon QuickSight to visualize the data.
  • B. Use Amazon QuickSight to visualize the data and then use ML-powered forecasting to forecast the key business metrics.
  • C. Use an AWS Glue ML transform to create a forecast and then use Amazon QuickSight to visualize the data.
  • D. Use a prebuilt ML AMI from the AWS Marketplace to create forecasts and then use Amazon QuickSight to visualize the data.

Correct answer: B


Question # 83
A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations:
* Station A, which has 10 sensors
* Station B, which has five sensors
These weather stations were placed by onsite subject-matter experts.
Each sensor has a unique ID. The data collected from each sensor will be collected using Amazon Kinesis Data Streams.
Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created. Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B.
Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput.
How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?

  • A. Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.
  • B. Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.
  • C. Reduce the number of sensors in Station A from 10 to 5 sensors.
  • D. Modify the partition key to use the sensor ID instead of the station name.

Correct answer: D

Explanation:
https://docs.aws.amazon.com/streams/latest/dev/kinesis-using-sdk-java-resharding.html
"Splitting increases the number of shards in your stream and therefore increases the data capacity of the stream. Because you are charged on a per-shard basis, splitting increases the cost of your stream"
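Kinesis Data Streams maps each partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and finding the shard whose hash key range contains it. With only two station names as keys, every record from Station A lands on the same shard no matter how much spare capacity the other shard has. The sketch below assumes two shards that split the hash key space evenly; the key strings are made up for illustration.

```python
import hashlib
from collections import Counter

HASH_KEY_SPACE = 2 ** 128  # partition keys hash to a 128-bit integer


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard that owns this key's hash.

    Assumes the shards divide the hash key space into equal, contiguous ranges.
    """
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_KEY_SPACE


if __name__ == "__main__":
    # Station name as the key: all 10 Station A sensors share one shard.
    by_station = Counter(shard_for("StationA", 2) for _ in range(10))
    # Sensor ID as the key: records can spread across both shards.
    by_sensor = Counter(shard_for(f"StationA-sensor-{i:02d}", 2) for i in range(10))
    print("station key:", dict(by_station))
    print("sensor key: ", dict(by_sensor))
```

With only 10 sensor IDs the split is rarely perfectly even, but no single key has to carry all of Station A's traffic, and the fix needs no extra shards or streams.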


Question # 84
A data analyst runs a large number of data manipulation language (DML) queries by using Amazon Athena with the JDBC driver. Recently, a query failed after it ran for 30 minutes. The query returned the following message: java.sql.SQLException: Query timeout. The data analyst does not immediately need the query results. However, the data analyst needs a long-term solution for this problem.
Which solution will meet these requirements?

  • A. Save the tables as compressed .csv files.
  • B. In the settings for Athena, adjust the DML query timeout limit.
  • C. In the Service Quotas console, request an increase for the DML query timeout.
  • D. Split the query into smaller queries to search smaller subsets of data.

Correct answer: C


Question # 85
A company wants to provide its data analysts with uninterrupted access to the data in its Amazon Redshift cluster. All data is streamed to an Amazon S3 bucket with Amazon Kinesis Data Firehose. An AWS Glue job that is scheduled to run every 5 minutes issues a COPY command to move the data into Amazon Redshift.
The amount of data delivered is uneven throughout the day, and cluster utilization is high during certain periods. The COPY command usually completes within a couple of seconds. However, when load spikes occur, locks can exist and data can be missed. Currently, the AWS Glue job is configured to run without retries, with a timeout of 5 minutes and a concurrency of 1.
How should a data analytics specialist configure the AWS Glue job to optimize fault tolerance and improve data availability in the Amazon Redshift cluster?

  • A. Keep the number of retries at 0. Increase the timeout value. Keep the job concurrency at 1.
  • B. Keep the number of retries at 0. Decrease the timeout value. Keep the job concurrency at 1.
  • C. Keep the number of retries at 0. Decrease the timeout value. Increase the job concurrency.
  • D. Increase the number of retries. Decrease the timeout value. Increase the job concurrency.

Correct answer: C


Question # 86
A data analyst is designing a solution to interactively query datasets with SQL using a JDBC connection.
Users will join data stored in Amazon S3 in Apache ORC format with data stored in Amazon Elasticsearch Service (Amazon ES) and Amazon Aurora MySQL.
Which solution will provide the MOST up-to-date results?

  • A. Use Amazon DMS to stream data from Amazon ES and Aurora MySQL to Amazon Redshift. Query the data with Amazon Redshift.
  • B. Query all the datasets in place with Apache Spark SQL running on an AWS Glue developer endpoint.
  • C. Use AWS Glue jobs to ETL data from Amazon ES and Aurora MySQL to Amazon S3. Query the data with Amazon Athena.
  • D. Query all the datasets in place with Apache Presto running on Amazon EMR.

Correct answer: D


Question # 87
A marketing company is storing its campaign response data in Amazon S3. A consistent set of sources has generated the data for each campaign. The data is saved into Amazon S3 as .csv files. A business analyst will use Amazon Athena to analyze each campaign's data. The company needs the cost of ongoing data analysis with Athena to be minimized.
Which combination of actions should a data analytics specialist take to meet these requirements? (Choose two.)

  • A. Convert the .csv files to Apache Parquet.
  • B. Compress the .csv files.
  • C. Partition the data by campaign.
  • D. Convert the .csv files to Apache Avro.
  • E. Partition the data by source.

Correct answer: A, C

Explanation:
https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
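Athena charges by the amount of data scanned, so both actions reduce cost: Parquet is a compressed columnar format, and partitioning by campaign lets a query that filters on one campaign skip the other campaigns' objects. The sketch below models only the partition-pruning part with Hive-style `campaign=<value>/` prefixes; the bucket prefix, campaign names, and object sizes are hypothetical.

```python
from typing import Dict, Optional


def partition_prefix(base: str, campaign: str) -> str:
    """Build a Hive-style partition prefix such as s3://bucket/responses/campaign=spring/."""
    return f"{base.rstrip('/')}/campaign={campaign}/"


def bytes_scanned(objects: Dict[str, int], base: str, campaign: Optional[str] = None) -> int:
    """Sum object sizes a query would read, pruning to one partition when a campaign is given."""
    if campaign is None:
        return sum(objects.values())  # no partition filter: every object is read
    prefix = partition_prefix(base, campaign)
    return sum(size for key, size in objects.items() if key.startswith(prefix))


if __name__ == "__main__":
    base = "s3://example-bucket/responses"
    objects = {
        partition_prefix(base, "spring") + "part-0.parquet": 40_000_000,
        partition_prefix(base, "summer") + "part-0.parquet": 55_000_000,
        partition_prefix(base, "autumn") + "part-0.parquet": 35_000_000,
    }
    print(bytes_scanned(objects, base))            # 130000000: full scan
    print(bytes_scanned(objects, base, "summer"))  # 55000000: summer partition only
```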


Question # 88
A hospital is building a research data lake to ingest data from electronic health records (EHR) systems from multiple hospitals and clinics. The EHR systems are independent of each other and do not have a common patient identifier. The data engineering team is not experienced in machine learning (ML) and has been asked to generate a unique patient identifier for the ingested records.
Which solution will accomplish this task?

  • A. An AWS Glue ETL job with the ResolveChoice transform
  • B. An AWS Glue ETL job with the FindMatches transform
  • C. Amazon Kendra
  • D. Amazon SageMaker Ground Truth

Correct answer: B

Explanation:
Matching Records with AWS Lake Formation FindMatches


Question # 89
A company uses Amazon Redshift as its data warehouse. A new table has columns that contain sensitive data. The data in the table will eventually be referenced by several existing queries that run many times a day.
A data analyst needs to load 100 billion rows of data into the new table. Before doing so, the data analyst must ensure that only members of the auditing group can read the columns containing sensitive data.
How can the data analyst meet these requirements with the lowest maintenance overhead?

  • A. Load all the data into the new table and grant all users read-only permissions to non-sensitive columns. Attach an IAM policy to the auditing group with explicit ALLOW access to the sensitive data columns.
  • B. Load all the data into the new table and grant the auditing group permission to read from the table. Use the GRANT SQL command to allow read-only access to a subset of columns to the appropriate users.
  • C. Load all the data into the new table and grant the auditing group permission to read from the table. Load all the data except for the columns containing sensitive data into a second table. Grant the appropriate users read-only permissions to the second table.
  • D. Load all the data into the new table and grant the auditing group permission to read from the table. Create a view of the new table that contains all the columns, except for those considered sensitive, and grant the appropriate users read-only permissions to the table.

Correct answer: B

Explanation:
https://aws.amazon.com/blogs/big-data/achieve-finer-grained-data-security-with-column-level-access-control-in-amazon-redshift/


Question # 90
A company hosts an Apache Flink application on premises. The application processes data from several Apache Kafka clusters. The data originates from a variety of sources, such as web applications, mobile apps, and operational databases. The company has migrated some of these sources to AWS and now wants to migrate the Flink application. The company must ensure that data that resides in databases within the VPC does not traverse the internet. The application must be able to process all the data that comes from the company's AWS solution, on-premises resources, and the public internet.
Which solution will meet these requirements with the LEAST operational overhead?

  • A. Implement Flink on Amazon EC2 within the company's VPC. Use Amazon Kinesis Data Streams to collect data that comes from applications and databases within the VPC and the public internet. Configure Flink to have sources from Kinesis Data Streams and any on-premises Kafka clusters by using AWS Client VPN or AWS Direct Connect.
  • B. Create an Amazon Kinesis Data Analytics application by uploading the compiled Flink jar file. Use Amazon Kinesis Data Streams to collect data that comes from applications and databases within the VPC and the public internet. Configure the Kinesis Data Analytics application to have sources from Kinesis Data Streams and any on-premises Kafka clusters by using AWS Client VPN or AWS Direct Connect.
  • C. Implement Flink on Amazon EC2 within the company's VPC. Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters in the VPC to collect data that comes from applications and databases within the VPC. Use Amazon Kinesis Data Streams to collect data that comes from the public internet. Configure Flink to have sources from Kinesis Data Streams, Amazon MSK, and any on-premises Kafka clusters by using AWS Client VPN or AWS Direct Connect.
  • D. Create an Amazon Kinesis Data Analytics application by uploading the compiled Flink jar file. Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters in the company's VPC to collect data that comes from applications and databases within the VPC. Use Amazon Kinesis Data Streams to collect data that comes from the public internet. Configure the Kinesis Data Analytics application to have sources from Kinesis Data Streams, Amazon MSK, and any on-premises Kafka clusters by using AWS Client VPN or AWS Direct Connect.

Correct answer: D



Question # 92
A large financial company is running its ETL process. Part of this process is to move data from Amazon S3 into an Amazon Redshift cluster. The company wants to use the most cost-efficient method to load the dataset into Amazon Redshift.
Which combination of steps would meet these requirements? (Choose two.)

  • A. Use S3DistCp to load files into Amazon Redshift.
  • B. Use the UNLOAD command to upload data into Amazon Redshift.
  • C. Use Amazon Redshift Spectrum to query files from Amazon S3.
  • D. Use the COPY command with the manifest file to load data into Amazon Redshift.
  • E. Use temporary staging tables during the loading process.

Correct answer: D, E
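A manifest file lists exactly which S3 objects a COPY command should load, which avoids picking up partial or unrelated files under a prefix. The sketch below builds a manifest in the `entries`/`url`/`mandatory` format that Amazon Redshift documents, plus the matching COPY statement text. The bucket, table name, and IAM role ARN are placeholders, and the step that merges the temporary staging table into the target table is only noted in a comment.

```python
import json
from typing import List


def build_manifest(urls: List[str], mandatory: bool = True) -> str:
    """Return a Redshift COPY manifest listing each S3 object to load."""
    return json.dumps({"entries": [{"url": u, "mandatory": mandatory} for u in urls]}, indent=2)


def build_copy_sql(table: str, manifest_url: str, iam_role_arn: str) -> str:
    """Return a COPY statement that loads gzip-compressed CSV files listed in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "GZIP\n"
        "MANIFEST;"
    )


if __name__ == "__main__":
    files = [f"s3://example-bucket/etl/2023-10-30/part-{i}.csv.gz" for i in range(4)]
    print(build_manifest(files))
    # Load into a temporary staging table first; merging into the target table is not shown.
    print(build_copy_sql("sales_staging", "s3://example-bucket/etl/2023-10-30.manifest",
                         "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole"))
```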


Question # 93
A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster.
Which program modification will accelerate the COPY process?

  • A. Upload the individual files to Amazon S3 and run the COPY command as soon as the files become available.
  • B. Apply sharding by breaking up the files so the distkey columns with the same values go to the same file. Gzip and upload the sharded files to Amazon S3. Run the COPY command on the files.
  • C. Split the number of files so they are equal to a multiple of the number of compute nodes in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
  • D. Split the number of files so they are equal to a multiple of the number of slices in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.

Correct answer: D
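COPY loads files in parallel across the cluster's slices, so a single large gzip file is read by one slice while the others sit idle. Splitting the data into roughly equal files whose count is a multiple of the slice count keeps every slice busy. The sketch below deals lines round-robin into `slices * multiple` gzip-compressed parts; the slice count is an input you would look up for your own cluster (for example, in the STV_SLICES system table), and the sample rows are made up.

```python
import gzip
from typing import List


def split_for_copy(lines: List[str], slices: int, multiple: int = 1) -> List[bytes]:
    """Split lines into slices * multiple gzip-compressed parts of near-equal size.

    Lines are dealt round-robin so each part gets about the same number of rows.
    """
    if slices < 1 or multiple < 1:
        raise ValueError("slices and multiple must be positive")
    part_count = slices * multiple
    parts: List[List[str]] = [[] for _ in range(part_count)]
    for i, line in enumerate(lines):
        parts[i % part_count].append(line)
    return [gzip.compress("".join(part).encode("utf-8")) for part in parts]


if __name__ == "__main__":
    rows = [f"{i},item-{i},{i * 10}\n" for i in range(1000)]
    files = split_for_copy(rows, slices=4, multiple=2)
    for n, blob in enumerate(files):
        print(f"part-{n:02d}.csv.gz: {len(gzip.decompress(blob).splitlines())} rows")
```

Each part would then be uploaded under a common S3 prefix and loaded with a single COPY command, which spreads the work across all slices.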


Question # 94
A company wants to improve the data load time of a sales data dashboard. Data has been collected as .csv files and stored within an Amazon S3 bucket that is partitioned by date. The data is then loaded to an Amazon Redshift data warehouse for frequent analysis. The data volume is up to 500 GB per day.
Which solution will improve the data loading performance?

  • A. Split large .csv files, then use a COPY command to load data into Amazon Redshift.
  • B. Compress .csv files and use an INSERT statement to ingest data into Amazon Redshift.
  • C. Use Amazon Kinesis Data Firehose to ingest data into Amazon Redshift.
  • D. Load the .csv files in an unsorted key order and vacuum the table in Amazon Redshift.

Correct answer: A

Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html


Question # 95
A company is building a data lake and needs to ingest data from a relational database that has time-series data.
The company wants to use managed services to accomplish this. The process needs to be scheduled daily and bring incremental data only from the source into Amazon S3.
What is the MOST cost-effective approach to meet these requirements?

  • A. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the full data. Use AWS DataSync to ensure the delta only is written into Amazon S3.
  • B. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the entire dataset. Use appropriate Apache Spark libraries to compare the dataset, and find the delta.
  • C. Use AWS Glue to connect to the data source using JDBC Drivers. Store the last updated key in an Amazon DynamoDB table and ingest the data using the updated key as a filter.
  • D. Use AWS Glue to connect to the data source using JDBC Drivers. Ingest incremental records only using job bookmarks.

Correct answer: D


Question # 96
A media content company has a streaming playback application. The company wants to collect and analyze the data to provide near-real-time feedback on playback issues. The company needs to consume this data and return results within 30 seconds according to the service-level agreement (SLA). The company needs the consumer to identify playback issues, such as quality during a specified timeframe. The data will be emitted as JSON and may change schemas over time.
Which solution will allow the company to collect data for processing while meeting these requirements?

  • A. Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure Amazon S3 to trigger an event for AWS Lambda to process. The Lambda function will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon DynamoDB.
  • B. Send the data to Amazon Kinesis Data Streams and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon S3.
  • C. Send the data to Amazon Managed Streaming for Kafka and configure an Amazon Kinesis Analytics for Java application as the consumer. The application will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon DynamoDB.
  • D. Send the data to Amazon Kinesis Data Firehose with delivery to Amazon S3. Configure an S3 event to trigger an AWS Lambda function to process the data. The Lambda function will consume the data and process it to identify potential playback issues. Persist the raw data to Amazon S3.

Correct answer: B

Explanation:
https://aws.amazon.com/blogs/aws/new-amazon-kinesis-data-analytics-for-java/


Question # 97
An airline has .csv-formatted data stored in Amazon S3 with an AWS Glue Data Catalog. Data analysts want to join this data with call center data stored in Amazon Redshift as part of a daily batch process. The Amazon Redshift cluster is already under a heavy load. The solution must be managed, serverless, well-functioning, and minimize the load on the existing Amazon Redshift cluster. The solution should also require minimal effort and development activity.
Which solution meets these requirements?

  • A. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift.
  • B. Export the call center data from Amazon Redshift to Amazon EMR using Apache Sqoop. Perform the join with Apache Hive.
  • C. Unload the call center data from Amazon Redshift to Amazon S3 using an AWS Lambda function. Perform the join with AWS Glue ETL scripts.
  • D. Export the call center data from Amazon Redshift using a Python shell in AWS Glue. Perform the join with AWS Glue ETL scripts.

Correct answer: A

Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-tables.html


Question # 98
A mobile gaming company wants to capture data from its gaming app and make the data available for analysis immediately. The data record size will be approximately 20 KB. The company is concerned about achieving optimal throughput from each device. Additionally, the company wants to develop a data stream processing application with dedicated throughput for each consumer.
Which solution would achieve this goal?

  • A. Have the app call the PutRecordBatch API to send data to Amazon Kinesis Data Firehose. Submit a support case to enable dedicated throughput on the account.
  • B. Have the app call the PutRecords API to send data to Amazon Kinesis Data Streams. Host the stream-processing application on Amazon EC2 with Auto Scaling.
  • C. Have the app use Amazon Kinesis Producer Library (KPL) to send data to Kinesis Data Firehose. Use the enhanced fan-out feature while consuming the data.
  • D. Have the app call the PutRecords API to send data to Amazon Kinesis Data Streams. Use the enhanced fan-out feature while consuming the data.

Correct answer: D
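PutRecords accepts up to 500 records per request, with a 5 MiB limit on the whole request including partition keys, so a device sending 20 KB records hits the size limit before the record-count limit. The sketch below groups records into request-sized batches in the `Data`/`PartitionKey` shape that boto3's `put_records` expects. Actually sending each batch, and retrying entries reported as failed in the response, is omitted, as is the enhanced fan-out consumer.

```python
from typing import Dict, Iterable, List, Tuple

MAX_RECORDS_PER_REQUEST = 500
MAX_REQUEST_BYTES = 5 * 1024 * 1024  # 5 MiB, counting data and partition keys
MAX_RECORD_BYTES = 1024 * 1024       # 1 MiB per record (checked conservatively with its key)


def batch_for_put_records(records: Iterable[Tuple[bytes, str]]) -> List[List[Dict]]:
    """Group (data, partition_key) pairs into batches that respect PutRecords limits."""
    batches: List[List[Dict]] = []
    current: List[Dict] = []
    current_bytes = 0
    for data, key in records:
        size = len(data) + len(key.encode("utf-8"))
        if size > MAX_RECORD_BYTES:
            raise ValueError(f"record with key {key!r} exceeds 1 MiB")
        # Start a new batch if adding this record would break either limit.
        if current and (len(current) == MAX_RECORDS_PER_REQUEST
                        or current_bytes + size > MAX_REQUEST_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"Data": data, "PartitionKey": key})
        current_bytes += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    payload = b"x" * (20 * 1024)  # roughly a 20 KB game event
    events = [(payload, f"device-{i % 8}") for i in range(600)]
    print([len(b) for b in batch_for_put_records(events)])  # [255, 255, 90]
```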


Question # 99
A company has developed an Apache Hive script to batch process data stored in Amazon S3. The script needs to run once every day and store the output in Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster.
Which solution is the MOST cost-effective for scheduling and executing the script?

  • A. Create an AWS Glue job with the Hive script to perform the batch operation. Configure the job to run once a day using a time-based schedule.
  • B. Use AWS Lambda layers and load the Hive runtime to AWS Lambda and copy the Hive script. Schedule the Lambda function to run daily by creating a workflow using AWS Step Functions.
  • C. Use the AWS Management Console to spin up an Amazon EMR cluster with Python, Hue, Hive, and Apache Oozie. Set the termination protection flag to true and use Spot Instances for the core nodes of the cluster. Configure an Oozie workflow in the cluster to invoke the Hive script daily.
  • D. Create an AWS Lambda function to spin up an Amazon EMR cluster with a Hive execution step. Set KeepJobFlowAliveWhenNoSteps to false and disable the termination protection flag. Use Amazon CloudWatch Events to schedule the Lambda function to run daily.

Correct answer: D
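Option D describes a transient cluster: a scheduled Lambda function launches Amazon EMR with a single Hive step, and because `KeepJobFlowAliveWhenNoSteps` is false and termination protection is off, the cluster shuts down when the step finishes, so you pay only for the run. The sketch below builds the `run_job_flow` request and calls it from a Lambda handler with boto3. The release label, instance types and counts, S3 paths, and default IAM role names are assumptions to adjust for your account, and the `hive-script` command-runner arguments should be checked against the EMR documentation for your release.

```python
from typing import Dict


def build_run_job_flow_request(script_s3_uri: str, log_uri: str) -> Dict:
    """Build an EMR run_job_flow request for a cluster that runs one Hive step and terminates."""
    return {
        "Name": "daily-hive-batch",      # hypothetical cluster name
        "ReleaseLabel": "emr-6.10.0",    # assumed release; pick one that supports your script
        "Applications": [{"Name": "Hive"}],
        "LogUri": log_uri,
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,  # terminate once the step finishes
            "TerminationProtected": False,         # allow automatic termination
        },
        "Steps": [{
            "Name": "run-hive-script",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["hive-script", "--run-hive-script", "--args", "-f", script_s3_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def lambda_handler(event, context):
    """Entry point invoked once a day by a scheduled CloudWatch Events (EventBridge) rule."""
    import boto3  # imported here so the request builder can be tested without AWS access

    emr = boto3.client("emr")
    request = build_run_job_flow_request(
        "s3://example-bucket/scripts/daily.q",  # hypothetical script location
        "s3://example-bucket/emr-logs/",        # hypothetical log location
    )
    response = emr.run_job_flow(**request)
    return {"ClusterId": response["JobFlowId"]}
```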


Question # 100
......


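Earning the DAS-C01 certification can open up a range of career opportunities for data analytics professionals. As demand for data-driven decision-making grows, companies are looking for people who have the skills and knowledge to work with data in the cloud. By earning the DAS-C01 certification, you can demonstrate your expertise in this field and improve your career prospects. In addition, the certification is valid for three years, so you can continue to show potential employers your skills and knowledge for years to come.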
