Databricks-Certified-Professional-Data-Engineer Free Practice Questions: "Databricks Certified Professional Data Engineer"

Question 1

A Delta Lake table representing metadata about content posts from users has the following schema:
user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE
This table is partitioned by the date column. A query is run with the following filter:
longitude < 20 & longitude > -20
Which statement describes how data will be filtered?

（A）The Delta Engine will use row-level statistics in the transaction log to identify the files that meet the filter criteria.

（B）Statistics in the Delta Log will be used to identify partitions that might include files in the filtered range.

（C）No file skipping will occur because the optimizer does not know the relationship between the partition column and the longitude.

（D）The Delta Engine will scan the parquet file footers to identify each row that meets the filter criteria.

（E）Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.

Correct answer: E
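
Delta Lake records per-file column statistics such as minimum and maximum values in the transaction log, and a filter on a non-partition column can use them to skip files. The following is a minimal pure-Python sketch of that idea, not the engine's implementation; the `FileStats` class, the file names, and the statistics are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Per-file min/max for one column, a stand-in for Delta Log statistics."""
    path: str
    min_longitude: float
    max_longitude: float


def files_to_scan(files, lower, upper):
    """Keep files whose [min, max] range could hold a value in (lower, upper)."""
    return [
        f.path
        for f in files
        if f.max_longitude > lower and f.min_longitude < upper
    ]


# Hypothetical file names and statistics, for illustration only.
files = [
    FileStats("part-000.parquet", -170.0, -95.5),
    FileStats("part-001.parquet", -30.0, 5.0),
    FileStats("part-002.parquet", 45.0, 120.0),
]

# Filter: longitude > -20 AND longitude < 20
candidates = files_to_scan(files, lower=-20, upper=20)
print(candidates)
```

Only files whose range overlaps the filter remain candidates. The statistics say a file might contain matching records, so the surviving files still have to be read and filtered row by row.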

Question 2

Review the following error traceback:

Which statement describes the error being raised?

（A）There is a syntax error because the heartrate column is not correctly identified as a column.

（B）The code executed was PySpark but was executed in a Scala notebook.

（C）There is a type error because a DataFrame object cannot be multiplied.

（D）There is no column in the table named heartrateheartrateheartrate.

（E）There is a type error because a column object cannot be multiplied.

Correct answer: D
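
The traceback itself is not reproduced above. Answer D is consistent with Python's string-repetition behaviour, on the assumption that the failing code multiplied the string "heartrate" by 3 instead of multiplying a column object. A short sketch of that behaviour:

```python
column_name = "heartrate"

# Multiplying a str by an int repeats the string; it does not build a
# column expression, so the result is a (nonexistent) column name.
repeated = 3 * column_name
print(repeated)
```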

Question 3

A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens.
Which statement describes the contents of the workspace audit logs concerning these events?

（A）Because User B last configured the jobs, their identity will be associated with both the job creation events and the job run events.

（B）Because these events are managed separately, User A will have their identity associated with the job creation events and User B will have their identity associated with the job run events.

（C）Because the REST API was used for job creation and triggering runs, user identity will not be captured in the audit logs.

（D）Because the REST API was used for job creation and triggering runs, a Service Principal will be automatically used to identify these events.

（E）Because User A created the jobs, their identity will be associated with both the job creation events and the job run events.

Correct answer: B

Question 4

A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings.
The source data contains 100 unique fields in a highly nested JSON structure.
The silver_device_recordings table will be used downstream to power several production monitoring dashboards and a production model. At present, 45 of the 100 fields are being used in at least one of these applications.
The data engineer is trying to determine the best approach for dealing with schema declaration given the highly-nested structure of the data and the numerous fields.
Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?

（A）Because Delta Lake uses Parquet for data storage, data types can be easily evolved by just modifying file footer information in place.

（B）Because Databricks will infer schema using types that allow all observed data to be processed, setting types manually provides greater assurance of data quality enforcement.

（C）The Tungsten encoding used by Databricks is optimized for storing string data; newly-added native support for querying JSON strings means that string types are always most efficient.

（D）Human labor in writing code is the largest cost associated with data engineering workloads; as such, automating table declaration logic should be a priority in all migration workloads.

（E）Schema inference and evolution on Databricks ensure that inferred types will always accurately match the data types used by downstream systems.

Correct answer: B
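
The reasoning behind answer B can be illustrated with a toy example. This is a simplified sketch, not Spark's or Databricks' inference algorithm: the `infer_type` helper, the field names, and the records are hypothetical. It shows how inference that must accept every observed value drifts toward a permissive type, while a declared schema exposes the mismatch.

```python
import json

# Hypothetical records; the second one carries a malformed temperature.
raw = ['{"device_id": 1, "temp": 21.5}', '{"device_id": 2, "temp": "n/a"}']
records = [json.loads(line) for line in raw]


def infer_type(values):
    """Toy inference: pick a type that fits every observed value."""
    kinds = {type(v) for v in values}
    if kinds <= {int}:
        return "long"
    if kinds <= {int, float}:
        return "double"
    return "string"  # permissive fallback that accepts anything


inferred = {k: infer_type([r[k] for r in records]) for k in records[0]}

# A manually declared schema states what downstream consumers expect.
declared = {"device_id": "long", "temp": "double"}
mismatches = [k for k in declared if declared[k] != inferred[k]]
print(inferred, mismatches)
```

Here the inferred type for `temp` silently becomes a string because one bad value appeared, whereas comparing against the declared schema flags the field.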

Question 5

An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:
df = spark.read.format("parquet").load(f"/mnt/source/{date}")
Which code block should be used to create the date Python variable used in the above code block?

（A）import sys
date = sys.argv[1]

（B）dbutils.widgets.text("date", "null")
date = dbutils.widgets.get("date")

（C）date = spark.conf.get("date")

（D）date = dbutils.notebooks.getParam("date")

（E）input_dict = input()
date = input_dict["date"]

Correct answer: B
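
The parameter reaches the load path through f-string interpolation, which uses braces around the variable name. A minimal sketch follows; the literal date value is a hypothetical stand-in so the snippet runs outside Databricks, where `dbutils` is not defined.

```python
# In a Databricks notebook the value would come from the widget, e.g.
#   dbutils.widgets.text("date", "null")
#   date = dbutils.widgets.get("date")
# A literal stands in here so the snippet runs outside Databricks.
date = "2024-01-15"  # hypothetical parameter value

path = f"/mnt/source/{date}"  # braces interpolate the variable into the path
print(path)
```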

Question 6

A junior data engineer on your team has implemented the following code block.

The view new_events contains a batch of records with the same schema as the events Delta table.
The event_id field serves as a unique key for this table.
When this query is executed, what will happen with new records that have the same event_id as an existing record?

（A）They are updated.

（B）They are inserted.

（C）They are deleted.

（D）They are merged.

（E）They are ignored.

Correct answer: E
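
The code block referred to in the question is not reproduced above. Answer E is what one would expect from an insert-only merge, that is, a MERGE with a WHEN NOT MATCHED THEN INSERT clause and no WHEN MATCHED clause; that reading is an assumption. A pure-Python sketch of those semantics, with hypothetical data:

```python
# Existing table rows and an incoming batch, keyed by event_id (hypothetical data).
events = {1: "login", 2: "click"}
new_events = {2: "click-duplicate", 3: "logout"}


def insert_only_merge(target, source):
    """Insert source rows whose key is absent from target; leave matches untouched."""
    merged = dict(target)
    for key, value in source.items():
        if key not in merged:  # WHEN NOT MATCHED THEN INSERT
            merged[key] = value
        # No WHEN MATCHED clause: a row with a matching key is ignored.
    return merged


result = insert_only_merge(events, new_events)
print(result)
```

The incoming row with event_id 2 leaves the existing row unchanged, and only event_id 3 is added.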

Question 7

Incorporating unit tests into a PySpark application requires upfront attention to the design of your jobs, or a potentially significant refactoring of existing code.
Which statement describes a main benefit that offsets this additional effort?

（A）Improves the quality of your data

（B）Validates a complete use case of your application

（C）Troubleshooting is easier since all steps are isolated and tested individually

（D）Yields faster deployment and execution times

（E）Ensures that all steps interact correctly to achieve the desired end result

Correct answer: C
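
A unit test exercises one step in isolation, which is what makes a failure easy to locate. The sketch below uses plain Python and the standard `unittest` module in place of PySpark so that it runs anywhere; the `add_bmi` transformation and its field names are hypothetical.

```python
import unittest


def add_bmi(row):
    """Hypothetical transformation step: return the row with a bmi field added."""
    bmi = row["weight_kg"] / row["height_m"] ** 2
    return {**row, "bmi": round(bmi, 1)}


class AddBmiTest(unittest.TestCase):
    def test_adds_bmi(self):
        result = add_bmi({"weight_kg": 70.0, "height_m": 2.0})
        self.assertEqual(result["bmi"], 17.5)


# Run the test case programmatically so the snippet works in a script or notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddBmiTest)
outcome = unittest.TextTestRunner().run(suite)
```

If this test fails, the fault lies in `add_bmi` and nowhere else, whereas an end-to-end test would only show that something in the pipeline went wrong.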

Question 8

A nightly job ingests data into a Delta Lake table using the following code:

The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been processed to the next table in the pipeline.
Which code snippet completes this function definition?
def new_records():

（A）

（B）

（C）return spark.read.option("readChangeFeed", "true").table("bronze")

（D）return spark.readStream.load("bronze")

（E）return spark.readStream.table("bronze")

Correct answer: A

Question 9

The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.
The following logic is used to process these records.

Which statement describes this implementation?

（A）The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.

（B）The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

（C）The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.

（D）The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.

（E）The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.

Correct answer: D
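
The logic referred to in the question is not reproduced above. The Type 2 behaviour that answer D describes can be sketched in plain Python as follows; the column names, the rows, and the `apply_scd2` helper are hypothetical and stand in for a MERGE against the Delta table.

```python
from datetime import date


def apply_scd2(customers, updates, effective):
    """Close out changed current rows and append the new versions."""
    result = [dict(row) for row in customers]
    for update in updates:
        current = next(
            (r for r in result
             if r["customer_id"] == update["customer_id"] and r["current"]),
            None,
        )
        if current is not None and current["address"] == update["address"]:
            continue  # nothing changed, keep the current row as is
        if current is not None:
            current["current"] = False  # keep the old value, mark it as history
            current["end_date"] = effective
        result.append({**update, "current": True,
                       "effective_date": effective, "end_date": None})
    return result


# Hypothetical column names and rows, for illustration only.
customers = [{"customer_id": 1, "address": "Old St", "current": True,
              "effective_date": date(2023, 1, 1), "end_date": None}]
updates = [{"customer_id": 1, "address": "New Ave"},
           {"customer_id": 2, "address": "Elm Rd"}]

table = apply_scd2(customers, updates, date(2024, 1, 1))
for row in table:
    print(row)
```

The old address for customer 1 stays in the table but is no longer flagged as current, and both the changed record and the new customer are inserted as current rows.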

Question 10

A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown.
Which approach will allow this developer to review the current logic for this notebook?

（A）Use Repos to check out the dev-2.3.9 branch and auto-resolve conflicts with the current branch.

（B）Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.

（C）Merge all changes back to the main branch in the remote Git repository and clone the repo again.

（D）Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository.

（E）Use Repos to make a pull request, then use the Databricks REST API to update the current branch to dev-2.3.9.

Correct answer: B
