Databricks-Certified-Data-Engineer-Associate free practice questions: "Databricks Certified Data Engineer Associate"
A data engineer wants to create a new table containing the names of customers that live in France.
They have written the following command:

A senior data engineer mentions that it is organization policy to include a table property indicating that the new table includes personally identifiable information (PII).
Which of the following lines of code fills in the above blank to successfully complete the task?
Correct answer: C
A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW
What is the expected behavior when a batch of data containing records that violate this constraint is processed?
Correct answer: B
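With `ON VIOLATION DROP ROW`, records that fail the expectation are dropped before the data is written to the target. The violations are reported as data-quality metrics, and the batch as a whole still succeeds. The plain-Python sketch below models only that semantics. It is not the Delta Live Tables API, and the batch contents and the `apply_drop_row_expectation` helper are hypothetical.

```python
from datetime import date

# Hypothetical input batch; `timestamp` values are simplified to dates.
batch = [
    {"id": 1, "timestamp": date(2021, 5, 1)},
    {"id": 2, "timestamp": date(2019, 12, 31)},  # violates the expectation
    {"id": 3, "timestamp": date(2020, 1, 2)},
]


def valid_timestamp(row):
    """Predicate mirroring EXPECT (timestamp > '2020-01-01')."""
    return row["timestamp"] > date(2020, 1, 1)


def apply_drop_row_expectation(rows, predicate):
    """Toy model of ON VIOLATION DROP ROW (hypothetical helper, not a DLT call).

    Passing rows are kept, failing rows are discarded, and the batch is never
    aborted; the counts stand in for the data-quality metrics DLT reports.
    """
    kept = [r for r in rows if predicate(r)]
    metrics = {"passed_records": len(kept), "failed_records": len(rows) - len(kept)}
    return kept, metrics


kept, metrics = apply_drop_row_expectation(batch, valid_timestamp)
print([r["id"] for r in kept])  # [1, 3]
print(metrics)                  # {'passed_records': 2, 'failed_records': 1}
```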
A company is collaborating with a partner that does not use Databricks but needs access to a large historical dataset stored in Delta format. The data engineer needs to ensure that the partner can access the data securely, without the need for them to set up an account, and with read-only access. How should the data be shared?
Correct answer: B
Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.
A data engineer has created an ETL pipeline using Delta Live Tables to manage their company's travel reimbursement details. They want to ensure that if an employee has not provided location details, the pipeline is terminated.
How can the scenario be implemented?
Correct answer: A
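`ON VIOLATION FAIL UPDATE` stops the pipeline update as soon as any record violates the expectation, which matches the terminate-on-missing-location behavior in this scenario. In DLT SQL such a constraint could read `CONSTRAINT valid_location EXPECT (location IS NOT NULL) ON VIOLATION FAIL UPDATE`. The Python below is a toy model of that semantics, not the DLT API. The `valid_location` name, the record fields, and the `ExpectationViolation` class are hypothetical.

```python
class ExpectationViolation(Exception):
    """Hypothetical error modelling an update that fails on a violating record."""


def valid_location(row):
    """Predicate mirroring EXPECT (location IS NOT NULL)."""
    return row.get("location") is not None


def apply_fail_update_expectation(rows, predicate, name):
    """Toy model of ON VIOLATION FAIL UPDATE: one bad record aborts the update."""
    for row in rows:
        if not predicate(row):
            raise ExpectationViolation(f"expectation '{name}' failed for record {row}")
    return list(rows)


# Hypothetical reimbursement claims; the second one has no location.
claims = [
    {"employee": "a.lee", "amount": 120.0, "location": "Paris"},
    {"employee": "b.kim", "amount": 80.5, "location": None},
]

try:
    apply_fail_update_expectation(claims, valid_location, "valid_location")
    status = "COMPLETED"
except ExpectationViolation as err:
    status = "FAILED"
    print(err)
print(status)  # FAILED
```

Unlike the DROP ROW model, this one produces no partial output: the caller either gets every row back or an exception.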
A data engineer wants to create an external table in Databricks that references data stored in an Azure Data Lake Storage (ADLS) location. The goal is to enable Databricks to access and query this external data without moving it into the Databricks-managed storage. Which step should the data engineer take to successfully create the external table?
Correct answer: A
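An external table is created by pointing its `LOCATION` at the existing files. Databricks then registers only metadata and leaves the data in ADLS. The helper below only assembles such a statement as a string for illustration. The table name, container, storage account, and path are all hypothetical. If the workspace uses Unity Catalog, the `abfss://` path would also need to be covered by an external location backed by a storage credential.

```python
def external_table_ddl(table, columns, container, account, path, fmt="DELTA"):
    """Build a CREATE TABLE statement whose LOCATION points at existing ADLS Gen2 files.

    Because LOCATION is given, the table is external (unmanaged): the files stay
    where they are and are not copied into Databricks-managed storage.
    """
    uri = f"abfss://{container}@{account}.dfs.core.windows.net/{path.strip('/')}"
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}) USING {fmt} LOCATION '{uri}'"


ddl = external_table_ddl(
    "main.sales.orders_ext",                      # hypothetical catalog.schema.table
    [("order_id", "BIGINT"), ("amount", "DOUBLE")],
    container="raw",                              # hypothetical container
    account="contosodatalake",                    # hypothetical storage account
    path="/orders/",                              # hypothetical folder
)
print(ddl)
# In a Databricks notebook the statement would then be run with spark.sql(ddl).
```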
An organization plans to share a large dataset stored in a Databricks workspace on AWS with a partner organization whose Databricks workspace is hosted on Azure. The data engineer wants to minimize data transfer costs while ensuring secure and efficient data sharing. Which strategy will reduce data egress costs associated with cross-cloud data sharing?
Correct answer: A
A data engineer is standardizing repository layouts for multiple teams adopting Databricks Asset Bundles. The engineer wants to ensure every project has a single authoritative configuration file at the repository root that defines the bundle name, targets, workspace settings, permissions, and resource mappings (for jobs and pipelines).
What strategy should the data engineer use to meet the goal?
Correct answer: A
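In Databricks Asset Bundles, the single root configuration file is `databricks.yml`. Its top-level mappings include `bundle`, `workspace`, `permissions`, `resources`, and `targets`. To stay in one language, the sketch below writes that structure as a Python dict and checks that the sections required by this team's policy are present. Every name, host, group, and job/pipeline key in it is hypothetical. The checker is only an illustration; for real validation the Databricks CLI provides `databricks bundle validate`.

```python
# Python mirror of a minimal databricks.yml; all values are hypothetical.
bundle_config = {
    "bundle": {"name": "sales_etl"},
    "workspace": {"host": "https://example.cloud.databricks.com"},
    "permissions": [{"level": "CAN_VIEW", "group_name": "data-analysts"}],
    "resources": {
        "jobs": {"nightly_job": {"name": "nightly-sales-job"}},
        "pipelines": {"sales_pipeline": {"name": "sales-dlt-pipeline"}},
    },
    "targets": {
        "dev": {"mode": "development", "default": True},
        "prod": {"mode": "production"},
    },
}

# Sections the (hypothetical) team policy requires; the bundle schema itself
# only mandates the `bundle` mapping.
REQUIRED_SECTIONS = ("bundle", "workspace", "permissions", "resources", "targets")


def missing_sections(config):
    """Return the policy-required top-level mappings absent from a bundle config."""
    missing = [key for key in REQUIRED_SECTIONS if key not in config]
    if "name" not in config.get("bundle", {}):
        missing.append("bundle.name")
    return missing


print(missing_sections(bundle_config))  # []
```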
A data engineer has been given a PySpark DataFrame named df with columns product and revenue. The data engineer needs to compute several aggregations to determine each product's total revenue, average revenue, and transaction count. Which code snippet should the data engineer use?
Correct answer: D
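In PySpark, this kind of per-group aggregation is usually written as `df.groupBy("product").agg(F.sum("revenue").alias("total_revenue"), F.avg("revenue").alias("avg_revenue"), F.count("*").alias("transaction_count"))`, with `F` being `pyspark.sql.functions`. Because PySpark may not be installed everywhere, the runnable sketch below reproduces the same three aggregates in plain Python over a hypothetical list of `(product, revenue)` rows.

```python
from collections import defaultdict

# Hypothetical (product, revenue) rows standing in for the DataFrame df.
rows = [
    ("laptop", 1200.0),
    ("phone", 800.0),
    ("laptop", 1000.0),
    ("phone", 600.0),
    ("tablet", 450.0),
]


def aggregate_revenue(records):
    """Per product: total revenue, average revenue, and transaction count."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for product, revenue in records:
        totals[product] += revenue
        counts[product] += 1
    return {
        p: {
            "total_revenue": totals[p],
            "avg_revenue": totals[p] / counts[p],
            "transaction_count": counts[p],
        }
        for p in sorted(totals)
    }


for product, stats in aggregate_revenue(rows).items():
    print(product, stats)
```

A single grouped pass computes all three metrics together, which is why `agg` with several aggregate expressions is preferred over running three separate `groupBy` queries.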
A data engineer needs to process SQL queries on a large dataset with fluctuating workloads. The workload requires automatic scaling based on the volume of queries, without the need to manage or provision infrastructure. The solution should be cost-efficient and charge only for the compute resources used during query execution. Which compute option should the data engineer use?
Correct answer: A
A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.
They run the following command:
DROP TABLE IF EXISTS my_table
While the object no longer appears when they run SHOW TABLES, the data files still exist.
Which of the following describes why the data files still exist and the metadata files were deleted?
Correct answer: D
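The standard explanation for this behavior is that `my_table` was an external (unmanaged) table whose data lives at a `LOCATION` outside managed storage. For such a table, `DROP TABLE` removes only the metastore entry, whereas dropping a managed table also deletes its underlying data. The toy model below illustrates that difference with a hypothetical in-memory catalog and file store. It does not reproduce how Databricks implements either operation.

```python
class ToyCatalog:
    """Hypothetical in-memory metastore plus file store, for illustration only."""

    def __init__(self):
        self.tables = {}  # table name -> {"location": str, "managed": bool}
        self.files = {}   # storage path -> list of data file names

    def create_table(self, name, location, managed, data_files):
        self.tables[name] = {"location": location, "managed": managed}
        self.files[location] = list(data_files)

    def drop_table_if_exists(self, name):
        """Mimics DROP TABLE IF EXISTS: metadata always goes, data only if managed."""
        meta = self.tables.pop(name, None)
        if meta is not None and meta["managed"]:
            self.files.pop(meta["location"], None)

    def show_tables(self):
        return sorted(self.tables)


catalog = ToyCatalog()
# Hypothetical paths for an external and a managed table.
catalog.create_table("my_table", "/mnt/external/my_table", managed=False,
                     data_files=["part-0000.parquet"])
catalog.create_table("managed_tbl", "/warehouse/managed_tbl", managed=True,
                     data_files=["part-0000.parquet"])

catalog.drop_table_if_exists("my_table")
catalog.drop_table_if_exists("managed_tbl")
print(catalog.show_tables())                      # []
print("/mnt/external/my_table" in catalog.files)  # True: external data survives
print("/warehouse/managed_tbl" in catalog.files)  # False: managed data deleted
```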