
AWS-Certified-Data-Analytics-Specialty practice questions: high-pass-rate Amazon answers to help you pass the exam [2024]
The best way to pass the AWS Certified Data Analytics AWS-Certified-Data-Analytics-Specialty exam
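The AWS Certified Data Analytics - Specialty (DAS-C01) certification exam is intended for professionals who want to validate their data analytics skills and knowledge on the AWS platform. The exam covers a broad range of topics, including data storage, processing, analysis, visualization, and security. It is well suited to professionals who work with large volumes of data and want to improve their ability to manage and analyze data in the cloud.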
Question # 39
An ecommerce company stores customer purchase data in Amazon RDS. The company wants a solution to store and analyze historical data. The most recent 6 months of data will be queried frequently for analytics workloads. This data is several terabytes large. Once a month, historical data for the last 5 years must be accessible and will be joined with the more recent data. The company wants to optimize performance and cost.
Which storage solution will meet these requirements?
- A. Create a read replica of the RDS database to store the most recent 6 months of data. Copy the historical data into Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3 and Amazon RDS. Run historical queries using Amazon Athena.
- B. Incrementally copy data from Amazon RDS to Amazon S3. Load and store the most recent 6 months of data in Amazon Redshift. Configure an Amazon Redshift Spectrum table to connect to all historical data.
- C. Use an ETL tool to incrementally load the most recent 6 months of data into an Amazon Redshift cluster. Run more frequent queries against this cluster. Create a read replica of the RDS database to run queries on the historical data.
- D. Incrementally copy data from Amazon RDS to Amazon S3. Create an AWS Glue Data Catalog of the data in Amazon S3. Use Amazon Athena to query the data.
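Correct answer: B
Explanation:
Amazon Redshift holds the frequently queried 6 months of data for performance, while Amazon Redshift Spectrum queries the 5-year history in place in Amazon S3 and can join it with the local tables. The monthly workload therefore does not require loading the full history into the cluster.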
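The layout in option B can be sketched as the SQL that Amazon Redshift would run. This is a minimal illustration, not a reference implementation: the schema, table, and column names and the role ARN are all hypothetical.

```python
"""Sketch of the SQL behind a Redshift + Redshift Spectrum layout.

Every identifier here (schema, tables, columns, role ARN) is hypothetical.
"""


def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build a CREATE EXTERNAL SCHEMA statement backed by the AWS Glue Data Catalog."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def recent_vs_history_sql(local_table: str, external_table: str) -> str:
    """Join recent rows stored in Redshift with historical rows read from S3."""
    return (
        "SELECT r.customer_id, r.order_id, h.order_id AS earlier_order_id\n"
        f"FROM {local_table} AS r\n"
        f"JOIN {external_table} AS h ON h.customer_id = r.customer_id;"
    )


schema_sql = external_schema_sql(
    "spectrum", "sales_history", "arn:aws:iam::123456789012:role/ExampleSpectrumRole"
)
join_sql = recent_vs_history_sql("public.purchases_recent", "spectrum.purchases_history")
print(schema_sql)
print(join_sql)
```

Only the recent table occupies cluster storage; the external table is read from Amazon S3 at query time.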
Question # 40
A manufacturing company uses Amazon Connect to manage its contact center and Salesforce to manage its customer relationship management (CRM) data. The data engineering team must build a pipeline to ingest data from the contact center and CRM system into a data lake that is built on Amazon S3.
What is the MOST efficient way to collect data in the data lake with the LEAST operational overhead?
- A. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon Kinesis Data Streams to ingest Salesforce data.
- B. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
- C. Use Amazon AppFlow to ingest Amazon Connect data and Amazon Kinesis Data Firehose to ingest Salesforce data.
- D. Use Amazon Kinesis Data Streams to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
Correct answer: B
Question # 41
An ecommerce company is migrating its business intelligence environment from on premises to the AWS Cloud. The company will use Amazon Redshift in a public subnet and Amazon QuickSight. The tables already are loaded into Amazon Redshift and can be accessed by a SQL tool.
The company starts QuickSight for the first time. During the creation of the data source, a data analytics specialist enters all the information and tries to validate the connection. An error with the following message occurs: "Creating a connection to your data source timed out." How should the data analytics specialist resolve this error?
- A. Add the QuickSight IP address range into the Amazon Redshift security group.
- B. Grant the SELECT permission on Amazon Redshift tables.
- C. Use a QuickSight admin user for creating the dataset.
- D. Create an IAM role for QuickSight to access Amazon Redshift.
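Correct answer: A
Explanation:
A timeout while validating the data source usually means that QuickSight cannot reach the cluster over the network. Adding the QuickSight IP address range for the Region as an inbound rule on the Amazon Redshift security group opens that path. Missing SELECT permissions or IAM roles would typically produce an authorization error rather than a timeout.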
Connection to the database times out
Your client connection to the database appears to hang or time out when running long queries, such as a COPY command. In this case, you might observe that the Amazon Redshift console displays that the query has completed, but the client tool itself still appears to be running the query. The results of the query might be missing or incomplete depending on when the connection stopped.
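As a rough sketch of the security group fix, the snippet below builds one inbound rule for the Redshift port. The CIDR is a documentation placeholder, not a real QuickSight range, and the helper name is hypothetical; the actual range depends on the QuickSight Region.

```python
"""Sketch: inbound rule that lets QuickSight reach a Redshift cluster.

The CIDR used below is a placeholder; look up the range published
for your QuickSight Region before using anything like this.
"""
import ipaddress

REDSHIFT_DEFAULT_PORT = 5439


def quicksight_ingress_rule(quicksight_cidr: str, port: int = REDSHIFT_DEFAULT_PORT) -> dict:
    """Return one IpPermissions entry in the shape the EC2 security group API accepts."""
    ipaddress.ip_network(quicksight_cidr)  # raises ValueError on a malformed CIDR
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": quicksight_cidr, "Description": "Amazon QuickSight access"}],
    }


rule = quicksight_ingress_rule("203.0.113.0/27")
print(rule)
```

With boto3, a dictionary of this shape could be passed as `IpPermissions=[rule]` to `authorize_security_group_ingress` on the EC2 client, together with the security group ID of the cluster.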
Question # 42
A company leverages Amazon Athena for ad-hoc queries against data stored in Amazon S3. The company wants to implement additional controls to separate query execution and query history among users, teams, or applications running in the same AWS account to comply with internal security policies.
Which solution meets these requirements?
- A. Create an IAM role for each given use case, assign appropriate permissions to the role for the given use case, and add the role to associate the role with Athena.
- B. Create an S3 bucket for each given use case, create an S3 bucket policy that grants permissions to appropriate individual IAM users, and apply the S3 bucket policy to the S3 bucket.
- C. Create an AWS Glue Data Catalog resource policy for each given use case that grants permissions to appropriate individual IAM users, and apply the resource policy to the specific tables used by Athena.
- D. Create an Athena workgroup for each given use case, apply tags to the workgroup, and create an IAM policy using the tags to apply appropriate permissions to the workgroup.
Correct answer: D
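The tag-based approach in option D can be illustrated with an IAM policy that only allows Athena actions on workgroups carrying a given tag. This is a sketch under assumptions: the tag key and value, and the helper name, are hypothetical, and a real policy would likely need further statements (for example for Amazon S3 and the AWS Glue Data Catalog).

```python
"""Sketch: IAM policy limiting Athena access to workgroups with a matching tag."""
import json


def workgroup_policy(tag_key: str, tag_value: str, region: str = "*", account_id: str = "*") -> dict:
    """Allow query actions only on workgroups tagged tag_key=tag_value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:StopQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:GetWorkGroup",
                ],
                "Resource": f"arn:aws:athena:{region}:{account_id}:workgroup/*",
                # The condition is what separates teams: only tagged workgroups match.
                "Condition": {"StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}},
            }
        ],
    }


policy = workgroup_policy("team", "marketing")
print(json.dumps(policy, indent=2))
```

Because query history is kept per workgroup, users who can only reach their own team's workgroup also only see that workgroup's queries.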
Question # 43
A financial services company needs to aggregate daily stock trade data from the exchanges into a data store.
The company requires that data be streamed directly into the data store, but also occasionally allows data to be modified using SQL. The solution should integrate complex, analytic queries running with minimal latency.
The solution must provide a business intelligence dashboard that enables viewing of the top contributors to anomalies in stock prices.
Which solution meets the company's requirements?
- A. Use Amazon Kinesis Data Streams to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.
- B. Use Amazon Kinesis Data Streams to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
- C. Use Amazon Kinesis Data Firehose to stream data to Amazon S3. Use Amazon Athena as a data source for Amazon QuickSight to create a business intelligence dashboard.
- D. Use Amazon Kinesis Data Firehose to stream data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon QuickSight to create a business intelligence dashboard.
Correct answer: D
Question # 44
A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3. A data analyst needs to be able to query all data from a desired year, month, or day. The data analyst should also be able to query a subset of the columns. The company requires minimal operational overhead and the most cost-effective solution.
Which approach meets these requirements for optimizing and querying the log data?
- A. Launch a transient Amazon EMR cluster nightly to transform new log files into Apache ORC format and partition by year, month, and day. Use Amazon Redshift Spectrum to query the data.
- B. Use an AWS Glue job nightly to transform new log files into Apache Parquet format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
- C. Launch a long-running Amazon EMR cluster that continuously transforms new log files from Amazon S3 into its Hadoop Distributed File System (HDFS) storage and partitions by year, month, and day. Use Apache Presto to query the optimized format.
- D. Use an AWS Glue job nightly to transform new log files into .csv format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
Correct answer: B
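To make the year/month/day partitioning concrete, here is a small sketch of the Hive-style key layout that a crawler can recognize as partitions. The bucket name and helper are hypothetical.

```python
"""Sketch: Hive-style partition prefixes for daily log output."""
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Return the S3 prefix under which one day's converted files would be written."""
    return (
        f"{base.rstrip('/')}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
    )


prefix = partition_prefix("s3://example-logs/elb-parquet", date(2024, 1, 5))
print(prefix)
```

A query that filters on the partition columns only reads the matching prefixes, and the columnar Parquet format lets Athena read only the selected columns.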
Question # 45
A global pharmaceutical company receives test results for new drugs from various testing facilities worldwide.
The results are sent in millions of 1 KB-sized JSON objects to an Amazon S3 bucket owned by the company.
The data engineering team needs to process those files, convert them into Apache Parquet format, and load them into Amazon Redshift for data analysts to perform dashboard reporting. The engineering team uses AWS Glue to process the objects, AWS Step Functions for process orchestration, and Amazon CloudWatch for job scheduling.
More testing facilities were recently added, and the time to process files is increasing.
What will MOST efficiently decrease the data processing time?
- A. Use the Amazon Redshift COPY command to move the files from Amazon S3 into Amazon Redshift tables directly. Process the files in Amazon Redshift.
- B. Use Amazon EMR instead of AWS Glue to group the small input files. Process the files in Amazon EMR and load them into Amazon Redshift tables.
- C. Use the AWS Glue dynamic frame file grouping option while ingesting the raw input files. Process the files and load them into Amazon Redshift tables.
- D. Use AWS Lambda to group the small files into larger files. Write the files back to Amazon S3. Process the files using AWS Glue and load them into Amazon Redshift tables.
Correct answer: C
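The file grouping option in answer C is set through the S3 connection options of the Glue read. The sketch below only builds those options; the path and the 128 MB group size are assumptions chosen for illustration, and the Glue call itself is shown as a comment because it needs a Glue job environment.

```python
"""Sketch: S3 read options that ask AWS Glue to group many small files."""


def grouped_s3_options(paths, group_size_bytes: int = 128 * 1024 * 1024) -> dict:
    """Build connection options with file grouping enabled within each partition."""
    return {
        "paths": list(paths),
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": str(group_size_bytes),  # Glue expects the size as a string of bytes
    }


def files_per_group(file_size_bytes: int, group_size_bytes: int) -> int:
    """Estimate how many equally sized input files fit into one group."""
    return max(1, group_size_bytes // file_size_bytes)


options = grouped_s3_options(["s3://example-bucket/test-results/"])
print(options)
print(files_per_group(1024, 128 * 1024 * 1024))

# Inside a Glue job the options would be passed along these lines:
# frame = glue_context.create_dynamic_frame.from_options(
#     connection_type="s3", connection_options=options, format="json")
```

With 1 KB objects, each group of this size would cover on the order of a hundred thousand files, so far fewer read tasks are created than with one task per object.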
Question # 46
A company wants to improve the data load time of a sales data dashboard. Data has been collected as .csv files and stored within an Amazon S3 bucket that is partitioned by date. The data is then loaded to an Amazon Redshift data warehouse for frequent analysis. The data volume is up to 500 GB per day.
Which solution will improve the data loading performance?
- A. Compress .csv files and use an INSERT statement to ingest data into Amazon Redshift.
- B. Split large .csv files, then use a COPY command to load data into Amazon Redshift.
- C. Use Amazon Kinesis Data Firehose to ingest data into Amazon Redshift.
- D. Load the .csv files in an unsorted key order and vacuum the table in Amazon Redshift.
Correct answer: B
Explanation:
https://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html
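The loading best practice behind answer B is to split the input so that the number of files is a multiple of the number of slices in the cluster, which lets COPY load the files in parallel. A minimal sketch follows; the 512 MB target file size, the slice count, and the key naming scheme are assumptions for illustration.

```python
"""Sketch: choosing how many files to split a daily load into before COPY."""


def recommended_file_count(total_bytes: int, slices: int,
                           target_file_bytes: int = 512 * 1024 ** 2) -> int:
    """Round the file count up to a multiple of the cluster's slice count."""
    by_size = -(-total_bytes // target_file_bytes)   # ceiling division
    rounds = -(-by_size // slices)                   # full rounds of parallel loading
    return max(slices, rounds * slices)


def part_keys(prefix: str, count: int) -> list:
    """Generate object keys sharing one prefix so a single COPY can load them all."""
    return [f"{prefix}part-{i:04d}.csv.gz" for i in range(count)]


count = recommended_file_count(500 * 1024 ** 3, slices=16)
keys = part_keys("sales/2024-01-15/", count)
print(count, keys[0], keys[-1])
```

A single COPY command pointed at the shared prefix then distributes the files across the slices.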
Question # 47
A large retailer has successfully migrated to an Amazon S3 data lake architecture. The company's marketing team is using Amazon Redshift and Amazon QuickSight to analyze data, and derive and visualize insights. To ensure the marketing team has the most up-to-date actionable information, a data analyst implements nightly refreshes of Amazon Redshift using terabytes of updates from the previous day.
After the first nightly refresh, users report that half of the most popular dashboards that had been running correctly before the refresh are now running much slower. Amazon CloudWatch does not show any alerts.
What is the MOST likely cause for the performance degradation?
- A. The nightly data refreshes left the dashboard tables in need of a vacuum operation that could not be automatically performed by Amazon Redshift due to ongoing user workloads.
- B. The cluster is undersized for the queries being run by the dashboards.
- C. The dashboards are suffering from inefficient SQL queries.
- D. The nightly data refreshes are causing a lingering transaction that cannot be automatically closed by Amazon Redshift due to ongoing user workloads.
Correct answer: A
Explanation:
https://github.com/awsdocs/amazon-redshift-developer-guide/issues/21
Question # 48
A marketing company is using Amazon EMR clusters for its workloads. The company manually installs third-party libraries on the clusters by logging in to the master nodes. A data analyst needs to create an automated solution to replace the manual process.
Which options can fulfill these requirements? (Choose two.)
- A. Place the required installation scripts in Amazon S3 and execute them through Apache Spark in Amazon EMR.
- B. Place the required installation scripts in Amazon S3 and execute them using custom bootstrap actions.
- C. Install the required third-party libraries in the existing EMR master node. Create an AMI out of that master node and use that custom AMI to re-create the EMR cluster.
- D. Use an Amazon DynamoDB table to store the list of required applications. Trigger an AWS Lambda function with DynamoDB Streams to install the software.
- E. Launch an Amazon EC2 instance with Amazon Linux and install the required third-party libraries on the instance. Create an AMI and use that AMI to create the EMR cluster.
Correct answer: B, E
Explanation:
https://aws.amazon.com/about-aws/whats-new/2017/07/amazon-emr-now-supports-launching-clusters-with-custom-amazon-linux-amis/
https://docs.aws.amazon.com/de_de/emr/latest/ManagementGuide/emr-plan-bootstrap.html
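Both selected options map onto cluster launch settings. The sketch below builds them as plain dictionaries in the shape used by the EMR `RunJobFlow` API; the script path, arguments, and AMI ID are hypothetical placeholders, and the helper names are not part of any library.

```python
"""Sketch: EMR launch settings for a bootstrap action and a custom AMI."""


def bootstrap_action(name: str, script_s3_uri: str, args=()) -> dict:
    """Describe one custom bootstrap action whose script is stored in Amazon S3."""
    if not script_s3_uri.startswith("s3://"):
        raise ValueError("this sketch expects an s3:// URI for the script")
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": script_s3_uri, "Args": list(args)},
    }


def cluster_overrides(actions, custom_ami_id=None) -> dict:
    """Collect the launch parameters that replace the manual installation."""
    overrides = {"BootstrapActions": list(actions)}
    if custom_ami_id:
        overrides["CustomAmiId"] = custom_ami_id
    return overrides


action = bootstrap_action(
    "install-libs", "s3://example-bucket/bootstrap/install_libs.sh", ["--quiet"]
)
overrides = cluster_overrides([action], custom_ami_id="ami-0123456789abcdef0")
print(overrides)
```

Bootstrap actions run on every node when the cluster starts, while a custom AMI has the libraries preinstalled; either approach removes the need to log in to the master node.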
Question # 49
A data analyst notices the following error message while loading data to an Amazon Redshift cluster: "The bucket you are attempting to access must be addressed using the specified endpoint."
What should the data analyst do to resolve this issue?
- A. Change the Amazon S3 object's ACL to grant the S3 bucket owner full control of the object.
- B. Specify the correct AWS Region for the Amazon S3 bucket by using the REGION option with the COPY command.
- C. Configure the timeout settings according to the operating system used to connect to the Redshift cluster.
- D. Launch the Redshift cluster in a VPC.
Correct answer: B
Explanation:
The correct answer is B: specify the correct AWS Region for the Amazon S3 bucket by using the REGION option with the COPY command.
The error message indicates that the Amazon S3 bucket and the Redshift cluster are not in the same Region. To load data from a different Region, the COPY command needs to specify the source Region using the REGION option. For example, if the Redshift cluster is in US East (N. Virginia) and the S3 bucket is in Asia Pacific (Mumbai), the COPY command should include REGION 'ap-south-1'. This option tells Redshift to use the appropriate endpoint to access the S3 bucket. For more information, see the COPY command documentation for Amazon Redshift.
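The Mumbai example above could look like the following COPY statement. This is only a sketch: the table name, bucket, and role ARN are hypothetical.

```python
"""Sketch: COPY statement that names the Region of the source S3 bucket."""


def copy_from_region_sql(table: str, s3_uri: str, iam_role_arn: str, bucket_region: str) -> str:
    """Build a COPY command with the REGION option for a cross-Region load."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{bucket_region}';"
    )


copy_sql = copy_from_region_sql(
    "sales",
    "s3://example-mumbai-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    "ap-south-1",
)
print(copy_sql)
```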
Question # 50
A company is hosting an enterprise reporting solution with Amazon Redshift. The application provides reporting capabilities to three main groups: an executive group to access financial reports, a data analyst group to run long-running ad-hoc queries, and a data engineering group to run stored procedures and ETL processes.
The executive team requires queries to run with optimal performance. The data engineering team expects queries to take minutes.
Which Amazon Redshift feature meets the requirements for this task?
- A. Workload management (WLM)
- B. Short query acceleration (SQA)
- C. Materialized views
- D. Concurrency scaling
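Correct answer: A
Explanation:
Workload management (WLM) lets you define separate query queues, for example one per user group, and control the priority or the memory and concurrency of each queue. The executive reports can then run in a high-priority queue while long-running analyst queries and ETL jobs run in their own queues without competing for the same resources.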
Question # 51
A company stores Apache Parquet-formatted files in Amazon S3. The company uses an AWS Glue Data Catalog to store the table metadata and Amazon Athena to query and analyze the data. The tables have a large number of partitions. The queries are only run on small subsets of data in the table. A data analyst adds new time partitions into the table as new data arrives. The data analyst has been asked to reduce the query runtime.
Which solution will provide the MOST reduction in the query runtime?
- A. Convert the Parquet files to the .csv file format. Then attempt to query the data again.
- B. Use partition projection to speed up the processing of the partitioned table.
- C. Convert the Parquet files to the Apache ORC file format. Then attempt to query the data again.
- D. Add more partitions to be used over the table. Then filter over two partitions and put all columns in the WHERE clause.
Correct answer: B
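Partition projection is configured through table properties, so Athena computes partition locations from a template instead of fetching them from the catalog. The sketch below assembles such properties for a date-typed partition column; the column name `dt`, the start date, and the bucket path are assumptions.

```python
"""Sketch: Athena table properties for date-based partition projection."""


def date_projection_properties(column: str, start: str, location_prefix: str,
                               date_format: str = "yyyy-MM-dd") -> dict:
    """Return TBLPROPERTIES entries that project one daily date partition column."""
    return {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        f"projection.{column}.range": f"{start},NOW",
        f"projection.{column}.format": date_format,
        f"projection.{column}.interval": "1",
        f"projection.{column}.interval.unit": "DAYS",
        "storage.location.template": f"{location_prefix.rstrip('/')}/${{{column}}}/",
    }


def alter_table_sql(table: str, properties: dict) -> str:
    """Render the properties as an ALTER TABLE ... SET TBLPROPERTIES statement."""
    pairs = ",\n  ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {pairs}\n);"


props = date_projection_properties("dt", "2020-01-01", "s3://example-bucket/events/")
print(alter_table_sql("events", props))
```

With a range that ends in NOW, newly arriving date partitions become queryable without adding them to the catalog.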
Question # 52
A manufacturing company uses Amazon S3 to store its data. The company wants to use AWS Lake Formation to provide granular-level security on those data assets. The data is in Apache Parquet format. The company has set a deadline for a consultant to build a data lake.
How should the consultant create the MOST cost-effective solution that meets these requirements?
- A. Install Apache Ranger on an Amazon EC2 instance and integrate with Amazon EMR. Using Ranger policies, create role-based access control for the existing data assets in Amazon S3.
- B. Run Lake Formation blueprints to move the data to Lake Formation. Once Lake Formation has the data, apply permissions on Lake Formation.
- C. Create multiple IAM roles for different users and groups. Assign IAM roles to different data assets in Amazon S3 to create table-based and column-based access controls.
- D. To create the data catalog, run an AWS Glue crawler on the existing Parquet data. Register the Amazon S3 path and then apply permissions through Lake Formation to provide granular-level security.
Correct answer: D
Explanation:
https://aws.amazon.com/blogs/big-data/building-securing-and-managing-data-lakes-with-aws-lake-formation/
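The register-then-grant flow in option D can be outlined as the request payloads for the Lake Formation API. The ARNs, database, table, and column names below are hypothetical, and the helper functions are illustrative rather than part of any SDK.

```python
"""Sketch: request payloads for registering an S3 path and granting column access."""


def register_location_request(s3_arn: str) -> dict:
    """Parameters for registering an existing S3 location with Lake Formation."""
    return {"ResourceArn": s3_arn, "UseServiceLinkedRole": True}


def column_grant_request(principal_arn: str, database: str, table: str, columns) -> dict:
    """Parameters for granting SELECT on specific columns of a cataloged table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


register = register_location_request("arn:aws:s3:::example-data-lake/parquet")
grant = column_grant_request(
    "arn:aws:iam::123456789012:role/ExampleAnalystRole",
    "manufacturing", "sensor_readings", ["device_id", "reading_time"],
)
print(register)
print(grant)
```

With boto3 these dictionaries could be unpacked into `register_resource(**register)` and `grant_permissions(**grant)` on the Lake Formation client. The data stays where it is in Amazon S3, which is why this option avoids the cost of copying it.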
Question # 53
Three teams of data analysts use Apache Hive on an Amazon EMR cluster with the EMR File System (EMRFS) to query data stored within each team's Amazon S3 bucket. The EMR cluster has Kerberos enabled and is configured to authenticate users from the corporate Active Directory. The data is highly sensitive, so access must be limited to the members of each team.
Which steps will satisfy the security requirements?
- A. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
- B. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the additional IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
- C. For the EMR cluster Amazon EC2 instances, create a service role that grants full access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the service role for the EMR cluster EC2 instances to the trust policies for the base IAM roles. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
- D. For the EMR cluster Amazon EC2 instances, create a service role that grants no access to Amazon S3. Create three additional IAM roles, each granting access to each team's specific bucket. Add the additional IAM roles to the cluster's EMR role for the EC2 trust policy. Create a security configuration mapping for the additional IAM roles to Active Directory user groups for each team.
Correct answer: B
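The two pieces of option B, the role mapping in the security configuration and the trust policy on each team role, can be sketched as JSON documents. The group names and role ARNs are hypothetical.

```python
"""Sketch: EMRFS role mappings and the matching trust policy for team roles."""
import json


def emrfs_role_mappings(group_to_role: dict) -> dict:
    """Map Active Directory groups to the IAM roles EMRFS assumes for S3 access."""
    return {
        "AuthorizationConfiguration": {
            "EmrFsConfiguration": {
                "RoleMappings": [
                    {"Role": role_arn, "IdentifierType": "Group", "Identifiers": [group]}
                    for group, role_arn in group_to_role.items()
                ]
            }
        }
    }


def team_role_trust_policy(ec2_service_role_arn: str) -> dict:
    """Allow the cluster's EC2 service role to assume a team-specific role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": ec2_service_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


mappings = emrfs_role_mappings({
    "team-a": "arn:aws:iam::123456789012:role/ExampleTeamARole",
    "team-b": "arn:aws:iam::123456789012:role/ExampleTeamBRole",
})
trust = team_role_trust_policy("arn:aws:iam::123456789012:role/ExampleEmrEc2Role")
print(json.dumps(mappings, indent=2))
print(json.dumps(trust, indent=2))
```

Because the EC2 service role itself has no Amazon S3 permissions, a user whose group has no mapping cannot fall back to broader access.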
Question # 54
An Amazon Redshift database contains sensitive user data. Logging is necessary to meet compliance requirements. The logs must contain database authentication attempts, connections, and disconnections. The logs must also contain each query run against the database and record which database user ran each query.
Which steps will create the required logs?
- A. Enable audit logging for Amazon Redshift using the AWS Management Console or the AWS CLI.
- B. Allow access to the Amazon Redshift database using AWS IAM only. Log access using AWS CloudTrail.
- C. Enable Amazon Redshift Enhanced VPC Routing. Enable VPC Flow Logs to monitor traffic.
- D. Enable and download audit reports from AWS Artifact.
Correct answer: A
Question # 55
A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company's data analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data.
The amount of data that is ingested into Amazon S3 has increased to 5 PB over time. The query latency also has increased. The company needs to segment the data to reduce the amount of data that is scanned.
Which solutions will improve query performance? (Select TWO.)
- A. Create the data subset in Apache Parquet format each day by using the Athena CREATE TABLE AS SELECT (CTAS) statement. Query the Parquet data.
- B. Create an S3 gateway endpoint. Configure VPC routing to access Amazon S3 through the gateway endpoint.
- C. Configure Athena to use S3 Select to load only the files of the data subset.
- D. Use MySQL Workbench on an Amazon EC2 instance. Connect to Athena by using a JDBC connector. Run the query from MySQL Workbench instead of Athena directly.
- E. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.
Correct answer: A, E
Explanation:
A) Create the data subset in Apache Parquet format each day by using the Athena CREATE TABLE AS SELECT (CTAS) statement. Query the Parquet data.
This solution will improve query performance because:
Apache Parquet is a columnar storage format that is optimized for analytics and supports compression. Parquet files can reduce the amount of data scanned and transferred by Athena, thus improving performance and reducing cost.
The Athena CREATE TABLE AS SELECT (CTAS) statement allows you to create a new table from the results of a SELECT query. You can use this statement to convert the CSV files to Parquet format and store them in a different location in S3. You can also specify partitioning keys for the new table, which can further improve query performance by filtering out irrelevant data.
Querying the Parquet data will be faster and cheaper than querying the CSV data, as Parquet files are more efficient for analytical queries.
E) Run a daily AWS Glue ETL job to convert the data files to Apache Parquet format and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data each day.
This solution will improve query performance because:
AWS Glue is a fully managed extract, transform, and load (ETL) service that can be used to prepare and load data for analytics. You can use AWS Glue to create a job that copies the CSV files from the source S3 bucket to a new S3 bucket, and converts them to Apache Parquet format. Partitioning the converted files lets Athena skip partitions that a query does not need, and the crawler keeps the Data Catalog aware of each day's new partitions.
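As an illustration of the CTAS approach, the sketch below renders a statement that writes a partitioned Parquet copy of a subset. The table names, columns, and S3 location are hypothetical; in a CTAS statement the partition column has to come last in the SELECT list, which the helper enforces by appending it.

```python
"""Sketch: Athena CTAS statement that writes a partitioned Parquet subset."""


def ctas_parquet_sql(new_table: str, source_table: str, columns,
                     partition_column: str, external_location: str) -> str:
    """Build a CTAS statement; the partition column is placed last in the SELECT list."""
    select_list = ", ".join([*columns, partition_column])
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (\n"
        f"    format = 'PARQUET',\n"
        f"    external_location = '{external_location}',\n"
        f"    partitioned_by = ARRAY['{partition_column}']\n"
        f") AS\n"
        f"SELECT {select_list}\n"
        f"FROM {source_table};"
    )


ctas_sql = ctas_parquet_sql(
    "sales_recent_parquet", "sales_csv", ["order_id", "amount"], "dt",
    "s3://example-bucket/sales-recent-parquet/",
)
print(ctas_sql)
```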
Question # 56
......
Earning the AWS Certified Data Analytics - Specialty (DAS-C01) certification shows that a candidate has the knowledge and skills needed to design, build, secure, and maintain analytics solutions on the AWS platform. The certification is valuable for professionals who want to advance a career in data analytics and work with AWS customers to deliver data-driven solutions.
Amazon AWS-Certified-Data-Analytics-Specialty exam practice tests from JPNTest: https://www.jpntest.com/shiken/AWS-Certified-Data-Analytics-Specialty-mondaishu
AWS-Certified-Data-Analytics-Specialty practice test questions, answers, and explanations: https://drive.google.com/open?id=1XgCvywKxiMQWesEhQXkqsr24KHQw7w1P