Professional-Data-Engineer Free Practice Questions: "Google Certified Professional Data Engineer"

Question 1

Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

（A）Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.

（B）Use an ETL tool to load the data from MySQL into Google BigQuery.

（C）Add a node to the MySQL cluster and build an OLAP cube there.

（D）Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Correct answer: A

Question 2

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and have asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

（A）Increase the size of the Google Persistent Disk on your server.

（B）Increase your network bandwidth from your datacenter to GCP.

（C）Increase the CPU size on your server.

（D）Increase your network bandwidth from Compute Engine to Cloud Storage.

Correct answer: B

Question 3

Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?

（A）You will not use the data to back a user-facing or latency-sensitive application.

（B）You will mostly run batch workloads with scans and writes, rather than frequently executing random reads of a small number of rows.

（C）You need to integrate with Google BigQuery.

（D）You expect to store at least 10 TB of data.

Correct answer: C

Question 4

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will be sent only once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

（A）Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.

（B）Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

（C）Use GROUP BY on the unique ID column and timestamp column and SUM on the values.

（D）Include ORDER BY DESC on timestamp column and LIMIT to 1.

Correct answer: B
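
The deduplication pattern named in option (B) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the table name `my_dataset.events` and the column names `unique_id` and `event_timestamp` are assumptions made for the example, and the Python function only mimics the effect of the query on in-memory rows so that the logic can be checked without a BigQuery project.

```python
from typing import Any, Dict, List

# SQL shape described by option (B). Table and column names are hypothetical.
DEDUP_SQL = """
SELECT * EXCEPT (row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY unique_id
      ORDER BY event_timestamp DESC
    ) AS row_num
  FROM `my_dataset.events`
)
WHERE row_num = 1
"""


def dedupe_latest(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one row per unique_id: the one with the latest event_timestamp."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        kept = latest.get(row["unique_id"])
        if kept is None or row["event_timestamp"] > kept["event_timestamp"]:
            latest[row["unique_id"]] = row
    return list(latest.values())


rows = [
    {"unique_id": "a", "event_timestamp": 1, "value": 10},
    {"unique_id": "a", "event_timestamp": 1, "value": 10},  # duplicate insert
    {"unique_id": "b", "event_timestamp": 2, "value": 20},
]
deduped = dedupe_latest(rows)
print(deduped)
```

Partitioning by the unique ID groups every copy of a row together, and keeping only row number 1 in each partition leaves exactly one copy per ID.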

Question 5

Which of the following job types are supported by Cloud Dataproc (select 3 answers)?

（A）Spark

（B）Pig

（C）Hive

（D）YARN

Correct answer: A, B, C

Question 6

Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first. What should you do?

（A）Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.

（B）Create a file on a shared file system and have the application servers write all bid events to that file. Process the file with Apache Hadoop to identify which user bid first.

（C）Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.

（D）Set up a MySQL database for each application server to write bid events into. Periodically query each of those distributed MySQL databases and update a master MySQL database with bid event information.

Correct answer: D
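
Whichever collection mechanism is used, the stated goal is to decide which user bid first among identical bids. A rough sketch of that comparison is shown below, assuming the decision is made on the timestamp carried in each bid event rather than on arrival order. The `Bid` type, the millisecond field, and the sample values are hypothetical and are not taken from any Google Cloud API.

```python
from typing import Dict, List, NamedTuple, Tuple


class Bid(NamedTuple):
    item: str
    amount: int
    user: str
    timestamp_ms: int  # event time recorded when the bid was placed


def first_bidders(bids: List[Bid]) -> Dict[Tuple[str, int], Bid]:
    """For each (item, amount) pair, keep the bid with the earliest event time."""
    earliest: Dict[Tuple[str, int], Bid] = {}
    for bid in bids:
        key = (bid.item, bid.amount)
        if key not in earliest or bid.timestamp_ms < earliest[key].timestamp_ms:
            earliest[key] = bid
    return earliest


bids = [
    Bid("lamp", 50, "bob", 1_700_000_000_120),    # arrives first, placed later
    Bid("lamp", 50, "alice", 1_700_000_000_095),  # arrives second, placed earlier
]
winners = first_bidders(bids)
print(winners[("lamp", 50)].user)  # alice
```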

Question 7

What is the HBase Shell for Cloud Bigtable?

（A）The HBase shell is a command-line tool that performs only user account management functions to grant access to Cloud Bigtable instances.

（B）The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables.

（C）The HBase shell is a GUI based interface that performs administrative tasks, such as creating and deleting tables.

（D）The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and deleting new virtualized instances.

Correct answer: B

Question 8

You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

（A）Both batch and streaming

（B）Only batch

（C）BigQuery cannot be used as a sink

（D）Only streaming

Correct answer: A

Question 9

You have designed an Apache Beam processing pipeline that reads from a Pub/Sub topic and writes to a Cloud Storage bucket. The topic has a message retention duration of one day. You need to select a bucket location and processing strategy to prevent data loss in case of a regional outage with an RPO of 15 minutes.
What should you do?

（A）1. Use a multi-regional Cloud Storage bucket.
2. Monitor Dataflow metrics with Cloud Monitoring to determine when an outage occurs.
3. Seek the subscription back in time by 60 minutes to recover the acknowledged messages.
4. Start the Dataflow job in a secondary region.

（B）1. Use a regional Cloud Storage bucket.
2. Monitor Dataflow metrics with Cloud Monitoring to determine when an outage occurs.
3. Seek the subscription back in time by one day to recover the acknowledged messages.
4. Start the Dataflow job in a secondary region and write in a bucket in the same region.

（C）1. Use a dual-region Cloud Storage bucket.
2. Monitor Dataflow metrics with Cloud Monitoring to determine when an outage occurs.
3. Seek the subscription back in time by 15 minutes to recover the acknowledged messages.
4. Start the Dataflow job in a secondary region.

（D）1. Use a dual-region Cloud Storage bucket with turbo replication enabled.
2. Monitor Dataflow metrics with Cloud Monitoring to determine when an outage occurs.
3. Seek the subscription back in time by 60 minutes to recover the acknowledged messages.
4. Start the Dataflow job in a secondary region.

Correct answer: C

Question 10

Suppose you have a dataset of images that are each labeled as to whether or not they contain a human face. To create a neural network that recognizes human faces in images using this labeled dataset, what approach would likely be the most effective?

（A）Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.

（B）Use K-means Clustering to detect faces in the pixels.

（C）Use feature engineering to add features for eyes, noses, and mouths to the input data.

（D）Build a neural network with an input layer of pixels, a hidden layer, and an output layer with two categories.

Correct answer: A

Question 11

Which of these operations can you perform from the BigQuery Web UI?

（A）Load data with nested and repeated fields.

（B）Upload a 20 MB file.

（C）Upload multiple files using a wildcard.

（D）Upload a file in SQL format.

Correct answer: A

Question 12

You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once, and must be ordered within windows of 1 hour. How should you design the solution?

（A）Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis.

（B）Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.

（C）Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.

（D）Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.

Correct answer: A
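
The requirement "ordered within windows of 1 hour" can be pictured with a plain-Python model of fixed windowing. This is only a sketch of the idea and does not use the Dataflow or Apache Beam API: events are assumed to be `(epoch_seconds, payload)` tuples, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SECONDS = 3600  # one-hour fixed windows

Event = Tuple[int, str]  # (event time in epoch seconds, payload)


def order_within_hourly_windows(events: List[Event]) -> Dict[int, List[Event]]:
    """Assign each event to its one-hour window, then sort inside each window."""
    windows: Dict[int, List[Event]] = defaultdict(list)
    for ts, payload in events:
        window_start = ts - (ts % WINDOW_SECONDS)
        windows[window_start].append((ts, payload))
    return {start: sorted(items) for start, items in sorted(windows.items())}


# Events arrive out of order, as they may with at-least-once delivery.
events = [(3700, "c"), (10, "a"), (3650, "b"), (20, "a")]
print(order_within_hourly_windows(events))
```

Ordering is only enforced inside each window, so events never have to be compared across the whole stream.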

Question 13

Which of these are examples of a value in a sparse vector? (Select 2 answers.)

（A）[0, 0, 0, 1, 0, 0, 1]

（B）[0, 1]

（C）[0, 5, 0, 0, 0, 0]

（D）[1, 0, 0, 0, 0, 0, 0]

Correct answer: B, D
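
One reading of this answer key, assumed here because the explanation is not shown, is that a "value in a sparse vector" means a one-hot encoding of a categorical value: exactly one position is 1 and the rest are 0. The sketch below builds such vectors; the `one_hot` helper and both vocabularies are hypothetical.

```python
from typing import List


def one_hot(value: str, vocabulary: List[str]) -> List[int]:
    """Encode a categorical value as a vector with a single 1 at its index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(value)] = 1
    return vector


days = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

print(one_hot("yes", ["no", "yes"]))  # [0, 1]
print(one_hot("monday", days))        # [1, 0, 0, 0, 0, 0, 0]
```

Under this reading, `[0, 0, 0, 1, 0, 0, 1]` has two positions set and `[0, 5, 0, 0, 0, 0]` contains a value other than 0 or 1, so neither is a one-hot vector.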

Question 14

You need to choose a database for a new project that has the following requirements:
* Fully managed
* Able to automatically scale up
* Transactionally consistent
* Able to scale up to 6 TB
* Able to be queried using SQL
Which database do you choose?

（A）Cloud Bigtable

（B）Cloud SQL

（C）Cloud Datastore

（D）Cloud Spanner

Correct answer: D

Question 15

You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process. There is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table. These jobs might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?

（A）1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators.
2. Use a single shared DAG for all tables that need to go through the pipeline.
3. Schedule the DAG to run hourly.

（B）1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators.
2. Use a single shared DAG for all tables that need to go through the pipeline.
3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.

（C）1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators.
2. Create a separate DAG for each table that needs to go through the pipeline.
3. Schedule the DAGs to run hourly.

（D）1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators.
2. Create a separate DAG for each table that needs to go through the pipeline.
3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.

Correct answer: D

Question 16

A TensorFlow machine learning model on Compute Engine virtual machines (n2-standard-32) takes two days to complete training. The model has custom TensorFlow operations that must run partially on a CPU. You want to reduce the training time in a cost-effective manner. What should you do?

（A）Train the model using a VM with a GPU hardware accelerator

（B）Change the VM type to n2-highmem-32

（C）Change the VM type to e2-standard-32

（D）Train the model using a VM with a TPU hardware accelerator

Correct answer: A

Question 17

You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Stackdriver Log Viewer. What are the two most likely causes of this problem? Choose 2 answers.

（A）The subscriber code cannot keep up with the messages.

（B）The subscriber code does not acknowledge the messages that it pulls.

（C）Publisher throughput quota is too small.

（D）Total outstanding messages exceed the 10-MB maximum.

（E）Error handling in the subscriber code is not handling run-time errors properly.

Correct answer: A, E
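
Option (E) can be illustrated with a toy model of at-least-once delivery, in which a message that is never acknowledged is delivered again. This sketch is written in Python for brevity rather than Node.js and is not the Pub/Sub client API: the `deliver` function, the handlers, and the `max_deliveries` cap (added only so the simulation terminates) are all assumptions.

```python
from collections import deque
from typing import Callable, List, Optional


def deliver(messages: List[str],
            handler: Callable[[str], Optional[bool]],
            max_deliveries: int = 5) -> int:
    """Toy at-least-once queue: redeliver a message until the handler acks it.

    Returns the total number of deliveries made.
    """
    pending = deque((message, 1) for message in messages)
    deliveries = 0
    while pending:
        message, attempt = pending.popleft()
        deliveries += 1
        acked = handler(message)
        if not acked and attempt < max_deliveries:
            pending.append((message, attempt + 1))
    return deliveries


def good_handler(message: str) -> bool:
    return True  # processed and acknowledged


def silent_failure_handler(message: str) -> Optional[bool]:
    try:
        int(message)  # raises ValueError for malformed input
        return True
    except ValueError:
        return None  # error swallowed: nothing logged, nothing acknowledged


messages = ["1", "2", "oops"]
print(deliver(messages, good_handler))            # 3
print(deliver(messages, silent_failure_handler))  # 7
```

One malformed message that is silently dropped without an acknowledgement more than doubles the delivery count here, while leaving no error in the logs.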
