Professional-Data-Engineer Free Practice Questions: "Google Certified Professional Data Engineer"

Question 1

Your company is implementing a data warehouse using BigQuery, and you have been tasked with designing the data model. You move your on-premises sales data warehouse with a star data schema to BigQuery but notice performance issues when querying the data of the past 30 days. Based on Google's recommended practices, what should you do to speed up the query without increasing storage costs?

（A）Partition the data by transaction date

（B）Materialize the dimensional data in views

（C）Shard the data by customer ID

（D）Denormalize the data

Correct answer: A
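To see how option (A) affects a query over the past 30 days, here is a toy Python sketch of partition pruning. It is not BigQuery code: the helper names `partition_by_date` and `query_last_n_days` and the sample rows are hypothetical. The only point is that a date filter lets an engine skip whole partitions instead of scanning every row, without storing any extra copy of the data.

```python
from collections import defaultdict
from datetime import date, timedelta


def partition_by_date(rows):
    """Group rows into per-day partitions keyed by transaction date."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["transaction_date"]].append(row)
    return partitions


def query_last_n_days(partitions, today, days=30):
    """Return the matching rows and the number of rows actually scanned."""
    cutoff = today - timedelta(days=days)
    scanned = 0
    result = []
    for day, day_rows in partitions.items():
        if day <= cutoff or day > today:
            continue  # whole partition is skipped without reading its rows
        scanned += len(day_rows)
        result.extend(day_rows)
    return result, scanned


today = date(2024, 1, 31)
# Hypothetical data: one sale per day for the 365 days ending today.
rows = [
    {"transaction_date": today - timedelta(days=i), "amount": 10 * (i + 1)}
    for i in range(365)
]
partitions = partition_by_date(rows)
recent, scanned = query_last_n_days(partitions, today)
print(f"rows in table: {len(rows)}, rows scanned: {scanned}")
# prints "rows in table: 365, rows scanned: 30"
```

In this model the 30-day query touches 30 of 365 rows, which is the same effect a partition filter on the transaction date has on bytes scanned.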

Question 2

Your company needs to upload their historic data to Cloud Storage. The security rules don't allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing on-premises applications every day. What should they do?

（A）Execute gsutil rsync from the on-premises servers.

（B）Write a job template in Cloud Dataproc to perform the data transfer.

（C）Install an FTP server on a Compute Engine VM to receive the files and move them to Cloud Storage.

（D）Use Cloud Dataflow and write the data to Cloud Storage.

Correct answer: A

Question 3

You are building a new data pipeline to share data between two different types of applications: job generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?

（A）Create an API using App Engine to receive and send messages to the applications

（B）Create a table on Cloud Spanner, and insert and delete rows with the job information

（C）Create a table on Cloud SQL, and insert and delete rows with the job information

（D）Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them

Correct answer: D
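The decoupling described in option (D) can be illustrated with a small in-memory model. This is a toy sketch, not the Cloud Pub/Sub client library: the `Topic` class, its methods, and the subscription names are hypothetical. It shows that each subscription receives its own copy of every job published after it was created, so a new consuming application can be added without changing publishers or existing subscribers.

```python
from collections import deque


class Topic:
    """Toy topic: every subscription keeps its own queue of messages."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name):
        # A new subscription only sees messages published after it exists.
        self._subscriptions[name] = deque()

    def publish(self, message):
        # Fan out: each subscription gets its own copy of the message.
        for queue in self._subscriptions.values():
            queue.append(message)

    def pull(self, name):
        """Return the next message for a subscription, or None if it is empty."""
        queue = self._subscriptions[name]
        return queue.popleft() if queue else None


jobs = Topic()
jobs.subscribe("job-runner")
jobs.publish({"job_id": 1})

# A second application is added later; the first subscription is unaffected.
jobs.subscribe("audit-log")
jobs.publish({"job_id": 2})

runner_msgs = [jobs.pull("job-runner"), jobs.pull("job-runner")]
audit_msgs = [jobs.pull("audit-log"), jobs.pull("audit-log")]
print(runner_msgs)  # [{'job_id': 1}, {'job_id': 2}]
print(audit_msgs)   # [{'job_id': 2}, None]
```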

Question 4

You have important legal hold documents in a Cloud Storage bucket. You need to ensure that these documents are not deleted or modified. What should you do?

（A）Enable the Object Versioning feature. Create a copy in a bucket in a different region.

（B）Set a retention policy. Set the default storage class to Archive for long-term digital preservation.

（C）Set a retention policy. Lock the retention policy.

（D）Enable the Object Versioning feature. Add a lifecycle rule.

Correct answer: C

Question 5

Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use Hadoop jobs they have already created and minimize the management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster. What should you do?

（A）Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.

（B）Create a Hadoop cluster on Google Compute Engine that uses persistent disks.

（C）Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.

（D）Create a Google Cloud Dataflow job to process the data.

（E）Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.

Correct answer: C

Question 6

Which is not a valid reason for poor Cloud Bigtable performance?

（A）The Cloud Bigtable cluster has too many nodes.

（B）The table's schema is not designed correctly.

（C）The workload isn't appropriate for Cloud Bigtable.

（D）There are issues with the network connection.

Correct answer: A

Question 7

You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design?

（A）Re-write the application to load accumulated data every 2 minutes.

（B）Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts.

（C）Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long.

（D）Convert the streaming insert code to batch load for individual messages.

Correct answer: C

Question 8

How can you get a neural network to learn about relationships between categories in a categorical feature?

（A）Create an embedding column

（B）Create a one-hot column

（C）Create a multi-hot column

（D）Create a hash bucket

Correct answer: A
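A small numeric sketch of the difference between a one-hot column and an embedding column (option A). The category names and the 2-dimensional embedding values below are made up for illustration; in a real network the embedding vectors are learned during training. One-hot vectors for different categories are always orthogonal, so they carry no notion of similarity, while embedding vectors can place related categories close together.

```python
import math

categories = ["cat", "dog", "car"]


def one_hot(index, size):
    """Return a vector with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


one_hot_vectors = {c: one_hot(i, len(categories)) for i, c in enumerate(categories)}

# Hypothetical embedding values, standing in for what training might produce.
embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}

# Any two distinct one-hot vectors have similarity 0.0.
print(cosine(one_hot_vectors["cat"], one_hot_vectors["dog"]))  # 0.0

# With embeddings, "cat" ends up closer to "dog" than to "car".
sim_cat_dog = cosine(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine(embeddings["cat"], embeddings["car"])
print(sim_cat_dog > sim_cat_car)  # True
```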

Question 9

You are designing a fault-tolerant architecture to store data in a regional BigQuery dataset. You need to ensure that your application is able to recover from a corruption event in your tables that occurred within the past seven days. You want to adopt managed services with the lowest RPO and most cost-effective solution. What should you do?

（A）Create a BigQuery table snapshot on a daily basis.

（B）Export the data from BigQuery into a new table that excludes the corrupted data.

（C）Migrate your data to multi-region BigQuery buckets.

（D）Access historical data by using time travel in BigQuery.

Correct answer: D
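Option (D) relies on BigQuery's `FOR SYSTEM_TIME AS OF` clause. The sketch below only builds the query text and does not call BigQuery; the function name `time_travel_query` and the table `my_project.my_dataset.orders` are hypothetical. It assumes the default seven-day time travel window mentioned in the question.

```python
MAX_TIME_TRAVEL_HOURS = 7 * 24  # assumed default time travel window: seven days


def time_travel_query(table, hours_ago):
    """Build a query that reads `table` as it was `hours_ago` hours in the past."""
    if not 0 <= hours_ago <= MAX_TIME_TRAVEL_HOURS:
        raise ValueError("hours_ago must fall inside the seven-day time travel window")
    return (
        f"SELECT * FROM `{table}` "
        f"FOR SYSTEM_TIME AS OF "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours_ago} HOUR)"
    )


# Read the (hypothetical) table as it looked 36 hours ago, before the corruption.
query = time_travel_query("my_project.my_dataset.orders", 36)
print(query)
```

The result of such a query could then be written to a new table to restore the pre-corruption state.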

Question 10

What is the HBase Shell for Cloud Bigtable?

（A）The HBase shell is a command-line tool that performs only user account management functions to grant access to Cloud Bigtable instances.

（B）The HBase shell is a command-line tool that performs administrative tasks, such as creating and deleting tables.

（C）The HBase shell is a GUI based interface that performs administrative tasks, such as creating and deleting tables.

（D）The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and deleting new virtualized instances.

Correct answer: B

Question 11

You are running a Dataflow streaming pipeline, with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipeline is Pub/Sub messages with notifications from Cloud Storage. One of the pipeline transforms reads CSV files and emits an element for every CSV line. The job performance is low, the pipeline is using only 10 workers, and you notice that the autoscaler is not spinning up additional workers. What should you do to improve performance?

（A）Update the job to increase the maximum number of workers.

（B）Change the pipeline code, and introduce a Reshuffle step to prevent fusion.

（C）Use Dataflow Prime, and enable Right Fitting to increase the worker resources.

（D）Enable Vertical Autoscaling to let the pipeline use larger workers.

Correct answer: B

Question 12

You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?

（A）Cloud Bigtable

（B）Cloud SQL

（C）Cloud Datastore

（D）Cloud Spanner

Correct answer: B

Question 13

Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encrypted, and compliant with major data protection standards. In addition to using Cloud Data Loss Prevention (Cloud DLP), you want to follow Google-recommended practices and use service accounts to control access to PII. What should you do?

（A）Assign the required Identity and Access Management (IAM) roles to every employee, and create a single service account to access protected resources

（B）Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users

（C）Use one service account to access a Cloud SQL database and use separate service accounts for each human user

（D）Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group

Correct answer: D

Question 14

Which of the following is not possible using primitive roles?

（A）Give a user viewer access to BigQuery and owner access to Google Compute Engine instances.

（B）Give UserA owner access and UserB editor access for all datasets in a project.

（C）Give GroupA owner access and GroupB editor access for all datasets in a project.

（D）Give a user access to view all datasets in a project, but not run queries on them.

Correct answer: D

Question 15

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?

（A）Grant the consultant the Viewer role on the project.

（B）Create an anonymized sample of the data for the consultant to work with in a different project.

（C）Create a service account and allow the consultant to log on with it.

（D）Grant the consultant the Cloud Dataflow Developer role on the project.

Correct answer: B

Question 16

You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company's mobile app. You have reviewed old chat logs and tagged each conversation for intent based on each customer's stated intention for contacting customer service. About 70% of customer requests are simple requests that are solved within 10 intents. The remaining 30% of inquiries require much longer, more complicated requests. Which intents should you automate first?

（A）Automate intents in places where common words such as "payment" appear only once so the software isn't confused

（B）Automate a blend of the shortest and longest intents to be representative of all intents

（C）Automate the 10 intents that cover 70% of the requests so that live agents can handle more complicated requests

（D）Automate the more complicated requests first because those require more of the agents' time

Correct answer: C

Question 17

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and have asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

（A）Increase the size of the Google Persistent Disk on your server.

（B）Increase your network bandwidth from your datacenter to GCP.

（C）Increase the CPU size on your server.

（D）Increase your network bandwidth from Compute Engine to Cloud Storage.

Correct answer: B

Question 18

You have one BigQuery dataset which includes customers' street addresses. You want to retrieve all occurrences of street addresses from the dataset. What should you do?

（A）Create a discovery scan configuration on your organization with Cloud Data Loss Prevention and create an inspection template that includes the STREET_ADDRESS infoType.

（B）Create a deep inspection job on each table in your dataset with Cloud Data Loss Prevention and create an inspection template that includes the STREET_ADDRESS infoType.

（C）Write a SQL query in BigQuery by using REGEXP_CONTAINS on all tables in your dataset to find rows where the word "street" appears.

（D）Create a de-identification job in Cloud Data Loss Prevention and use the masking transformation.

Correct answer: B
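A short illustration of the limits of a keyword search such as option (C). The sample strings are invented, and Python's `re` module stands in for BigQuery's REGEXP_CONTAINS here. The point is only that matching the literal word "street" misses addresses written with other suffixes and also flags text that is not an address.

```python
import re

# Hypothetical free-text values, as they might appear in a table column.
samples = [
    "221B Baker Street, London",
    "1600 Amphitheatre Pkwy, Mountain View",
    "350 Fifth Ave, New York",
    "Our street team will contact you",
]

# Case-insensitive match on the standalone word "street".
pattern = re.compile(r"\bstreet\b", re.IGNORECASE)

matches = [s for s in samples if pattern.search(s)]
missed = [s for s in samples if not pattern.search(s)]

print(matches)  # one real address plus one false positive
print(missed)   # two real addresses that the keyword search does not find
```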

Question 19

You are migrating your data warehouse to BigQuery. You have migrated all of your data into tables in a dataset. Multiple users from your organization will be using the data. They should only see certain tables based on their team membership. How should you set user permissions?

（A）Create SQL views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the SQL views

（B）Create authorized views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the authorized views

（C）Assign the users/groups data viewer access at the table level for each table

（D）Create authorized views for each team in datasets created for each team. Assign the authorized views data viewer access to the dataset in which the data resides. Assign the users/groups data viewer access to the datasets in which the authorized views reside

Correct answer: C
