Databricks-Certified-Professional-Data-Engineer Free Practice Questions: "Databricks Certified Professional Data Engineer"

Question 1

A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:

A batch job is attempting to insert new records to the table, including a record where latitude = 45.50 and longitude = 212.67.
Which statement describes the outcome of this batch insert?

（A）The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.

（B）The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.

（C）The write will fail when the violating record is reached; any records previously processed will be recorded to the target table.

（D）The write will fail completely because of the constraint violation and no records will be inserted into the target table.

（E）The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.

Correct answer: D
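
The all-or-nothing outcome described in option D can be reproduced outside Databricks. The sketch below uses Python's built-in sqlite3 module as a stand-in for a Delta table, so it illustrates the transactional behavior only, not Delta Lake itself. The constraint bounds (latitude within ±90, longitude within ±180) are an assumption, because the constraint definition is not shown above; the two other rows in the batch are made up for illustration.

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert every row in a single transaction.

    Returns True if the whole batch was committed, False if it was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany(
                "INSERT INTO activity_details (latitude, longitude) VALUES (?, ?)",
                rows,
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE activity_details (
        latitude REAL,
        longitude REAL,
        -- Assumed bounds; the original constraint logic is not shown.
        CONSTRAINT valid_coordinates CHECK (
            (latitude >= -90 AND latitude <= 90)
            AND (longitude >= -180 AND longitude <= 180)
        )
    )
    """
)

# The second row is the one from the question; the others are illustrative.
batch = [(10.0, 20.0), (45.50, 212.67), (-33.9, 151.2)]
ok = insert_batch(conn, batch)
row_count = conn.execute("SELECT COUNT(*) FROM activity_details").fetchone()[0]
print(ok, row_count)
```

Even though the first row is valid and is processed before the violating one, the rollback leaves the table empty, which mirrors the atomic write behavior the question is testing.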

Question 2

A data engineer has created a transactions Delta table on Databricks that should be used by the analytics team.
The analytics team wants to use the table with another tool that requires Apache Iceberg format.
What should the data engineer do?

（A）Create an Iceberg copy of the transactions Delta table which can be used by the analytics team.

（B）Require the analytics team to use a tool that supports Delta tables.

（C）Convert the transactions Delta table to Iceberg and enable UniForm so that the table can be read as a Delta table.

（D）Enable UniForm on the transactions table with the format set to 'iceberg' so that the table can be read as an Iceberg table.

Correct answer: D
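
Option D refers to Delta Lake UniForm, which is switched on through table properties so that Iceberg metadata is generated alongside the Delta log. The helper below is a minimal sketch that only builds the SQL text; the function name is hypothetical, and the three property names are the ones I recall from the Databricks UniForm documentation, so they should be checked against the runtime version in use.

```python
def uniform_iceberg_ddl(table_name: str) -> str:
    """Build an ALTER TABLE statement that enables UniForm Iceberg reads.

    Hypothetical helper: it returns SQL text and does not touch any table.
    """
    properties = {
        "delta.columnMapping.mode": "name",
        "delta.enableIcebergCompatV2": "true",
        "delta.universalFormat.enabledFormats": "iceberg",
    }
    rendered = ", ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({rendered})"


statement = uniform_iceberg_ddl("transactions")
print(statement)
# On a Databricks cluster the statement would be executed with: spark.sql(statement)
```

The table stays a single Delta table; no copy or conversion is made, which is why this approach is simpler than maintaining a separate Iceberg copy.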

Question 3

Which statement describes the correct use of pyspark.sql.functions.broadcast?

（A）It marks a column as having low enough cardinality to properly map distinct values to available partitions, allowing a broadcast join.

（B）It marks a column as small enough to store in memory on all executors, allowing a broadcast join.

（C）It marks a DataFrame as small enough to store in memory on all executors, allowing a broadcast join.

（D）It caches a copy of the indicated table on attached storage volumes for all active clusters within a Databricks workspace.

（E）It caches a copy of the indicated table on all nodes in the cluster for use in all future queries during the cluster lifetime.

Correct answer: C
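
In PySpark the hint is applied to a whole DataFrame, for example `large_df.join(broadcast(small_df), "country")`. To show why the small side must fit in memory, the sketch below is a plain-Python model of a broadcast hash join, not Spark code; the function name and the sample rows are made up for illustration.

```python
def broadcast_hash_join(large_partitions, small_rows, key):
    """Model of a broadcast hash join (inner join) on a single key."""
    # Build the lookup table once. In Spark, a copy of this structure is
    # shipped to every executor, which is why the small side must fit in memory.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    joined = []
    # Each partition of the large side is joined locally, with no shuffle.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


orders = [  # two partitions of the large side
    [{"order_id": 1, "country": "DE"}, {"order_id": 2, "country": "FR"}],
    [{"order_id": 3, "country": "DE"}, {"order_id": 4, "country": "XX"}],
]
countries = [  # small side, broadcast in full
    {"country": "DE", "name": "Germany"},
    {"country": "FR", "name": "France"},
]

result = broadcast_hash_join(orders, countries, "country")
print(result)
```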

Question 4

A transactions table has been liquid clustered on the columns product_id, user_id, and event_date.
Which operation lacks support for cluster on write?

（A）INSERT INTO operations

（B）spark.writestream.format('delta').mode('append')

（C）CTAS and RTAS statements

（D）spark.write.format('delta').mode('append')

Correct answer: B

Question 5

The marketing team is looking to share data in an aggregate table with the sales organization, but the field names used by the teams do not match, and a number of marketing-specific fields have not been approved for the sales org.
Which of the following solutions addresses the situation while emphasizing simplicity?

（A）Use a CTAS statement to create a derivative table from the marketing table, and configure a production job to propagate changes.

（B）Add a parallel table write to the current production pipeline, updating a new sales table that varies as required from marketing table.

（C）Create a new table with the required schema and use Delta Lake's DEEP CLONE functionality to sync up changes committed to one table to the corresponding table.

（D）Create a view on the marketing table that selects only those fields approved for the sales team, and alias the names of any fields that should be standardized to the sales naming conventions.

Correct answer: D
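
The view-based approach in option D can be sketched with Python's built-in sqlite3 module standing in for the lakehouse. All table, view, and column names below are hypothetical; the point is only that a view both hides unapproved fields and renames the rest, without copying data or scheduling a sync job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical marketing aggregate table with marketing-style field names.
conn.execute(
    "CREATE TABLE marketing_agg ("
    "cust_id INTEGER, campaign_spend REAL, rev REAL, internal_score REAL)"
)
conn.execute("INSERT INTO marketing_agg VALUES (1, 250.0, 1200.0, 0.87)")

# The view exposes only the approved fields, aliased to the sales conventions.
conn.execute(
    """
    CREATE VIEW sales_agg AS
    SELECT cust_id AS customer_id,
           rev     AS revenue
    FROM marketing_agg
    """
)

cursor = conn.execute("SELECT * FROM sales_agg")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

Because the view is evaluated at query time, changes to the marketing table are visible to the sales team immediately.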

Question 6

A data engineer is configuring a pipeline that will potentially see late-arriving, duplicate records.
In addition to de-duplicating records within the batch, which of the following approaches allows the data engineer to deduplicate data against previously processed records as it is inserted into a Delta table?

（A）Rely on Delta Lake schema enforcement to prevent duplicate records.

（B）Perform a full outer join on a unique key and overwrite existing data.

（C）Perform an insert-only merge with a matching condition on a unique key.

（D）Set the configuration delta.deduplicate = true.

（E）VACUUM the Delta table after each batch completes.

Correct answer: C
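
An insert-only merge has a single `WHEN NOT MATCHED THEN INSERT` clause, so rows whose key already exists in the target are silently skipped. The sketch below pairs an example of that statement with a plain-Python model of its semantics. The table names `events` and `new_events`, the key `event_id`, and the helper function are all hypothetical, and the model is not Delta Lake code.

```python
# Example statement shape (hypothetical table and column names).
INSERT_ONLY_MERGE = """
MERGE INTO events AS t
USING new_events AS s
ON t.event_id = s.event_id
WHEN NOT MATCHED THEN INSERT *
"""


def insert_only_merge(target, batch, key):
    """Model of the two-step de-duplication; returns the number of rows inserted.

    `target` is a dict keyed by the unique key and is updated in place.
    """
    # Step 1: de-duplicate within the batch, keeping the first row per key.
    unique_batch = {}
    for row in batch:
        unique_batch.setdefault(row[key], row)

    # Step 2: insert only rows whose key is absent from the target.
    inserted = 0
    for row_key, row in unique_batch.items():
        if row_key not in target:
            target[row_key] = row
            inserted += 1
    return inserted


target = {1: {"event_id": 1, "payload": "a"}}
batch = [
    {"event_id": 1, "payload": "late duplicate"},  # already in the target
    {"event_id": 2, "payload": "b"},
    {"event_id": 2, "payload": "b"},               # duplicate within the batch
]
inserted = insert_only_merge(target, batch, "event_id")
print(inserted, sorted(target))
```

Step 1 matters because the merge condition compares source rows against the target only; two identical new rows in the same batch would both count as unmatched.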

Question 7

What is true for Delta Lake?

（A）Z-ORDER can only be applied to numeric values stored in Delta Lake tables.

（B）Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in data skipping based on query filters.

（C）Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.

（D）Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.

Correct answer: B
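
Data skipping works because per-file minimum and maximum values are recorded for the indexed columns, so files whose range cannot contain the filter value are never read. The following is a simplified plain-Python model of that idea, not Delta Lake's implementation; the file names, the statistics, and the function are invented for illustration. The number of indexed columns is governed by the table property `delta.dataSkippingNumIndexedCols`, as far as I know.

```python
def files_to_scan(file_stats, column, value):
    """Return the files whose [min, max] range for `column` may contain `value`."""
    selected = []
    for name, stats in file_stats.items():
        if column not in stats:
            # No statistics were collected for this column, so the file
            # cannot be skipped safely.
            selected.append(name)
            continue
        low, high = stats[column]
        if low <= value <= high:
            selected.append(name)
    return selected


# Hypothetical per-file statistics: column -> (min, max).
file_stats = {
    "part-000.parquet": {"order_id": (1, 1000)},
    "part-001.parquet": {"order_id": (1001, 2000)},
    "part-002.parquet": {"order_id": (2001, 3000)},
}

scanned = files_to_scan(file_stats, "order_id", 1500)
unindexed = files_to_scan(file_stats, "note", "x")
print(scanned, unindexed)
```

A filter on a column without statistics falls back to scanning every file, which is why the position of frequently filtered columns within the indexed set matters.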

Question 8

A distributed team of data analysts share computing resources on an interactive cluster with autoscaling configured. In order to better manage costs and query throughput, the workspace administrator is hoping to evaluate whether cluster upscaling is caused by many concurrent users or resource-intensive queries.
In which location can one review the timeline for cluster resizing events?

（A）Ganglia

（B）Workspace audit logs

（C）Executor's log file

（D）Cluster Event Log

（E）Driver's log file

Correct answer: D

Question 9

The following table consists of items found in user carts within an e-commerce website.

The following MERGE statement is used to update this table using an updates view, with schema evolution enabled on this table.

How would the following update be handled?

（A）The new nested field is added to the target schema, and dynamically read as NULL for existing unmatched records.

（B）The new nested field is added to the target schema, and files underlying existing records are updated to include NULL values for the new field.

（C）The update throws an error because changes to existing columns in the target schema are not supported.

（D）The update is moved to a separate "restored" column because it is missing a column expected in the target schema.

Correct answer: A

Question 10

A data engineer wants to run unit tests using common Python testing frameworks on Python functions defined across several Databricks notebooks currently used in production.
How can the data engineer run unit tests against functions that work with data in production?

（A）Define unit tests and functions within the same notebook

（B）Run unit tests against non-production data that closely mirrors production

（C）Define and unit test functions using Files in Repos

（D）Define and import unit test functions from a separate Databricks notebook

Correct answer: B
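
As a minimal sketch of what such a unit test might look like, the example below exercises a function against a small hand-built sample rather than production data, using the standard-library unittest framework. The function `add_order_total` and its fields are hypothetical and stand in for whatever logic the notebooks actually contain.

```python
import unittest


def add_order_total(rows):
    """Hypothetical function under test: add total = quantity * unit_price."""
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


class AddOrderTotalTest(unittest.TestCase):
    def test_total_is_quantity_times_price(self):
        # Small non-production sample that mirrors the shape of real records.
        sample = [
            {"quantity": 3, "unit_price": 2.5},
            {"quantity": 0, "unit_price": 9.99},
        ]
        result = add_order_total(sample)
        self.assertEqual([row["total"] for row in result], [7.5, 0.0])


# Run the test case programmatically so the sketch works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddOrderTotalTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the function free of table reads and writes is what makes it testable this way; the same test would also run unchanged under pytest.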
