Associate-Developer-Apache-Spark-3.5 Free Practice Questions: "Databricks Certified Associate Developer for Apache Spark 3.5 - Python"

Given:

spark.sparkContext.setLogLevel("<LOG_LEVEL>")

Which set contains the suitable configuration settings for Spark driver LOG_LEVELs?

Explanation: (shown only to JPNTest members)
What is the difference between df.cache() and df.persist() for a Spark DataFrame?

Explanation: (shown only to JPNTest members)
Which UDF implementation calculates the length of strings in a Spark DataFrame?

Explanation: (shown only to JPNTest members)
What is the relationship between jobs, stages, and tasks during execution in Apache Spark?
Options:

Explanation: (shown only to JPNTest members)
A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.
Which technique should be used?

Explanation: (shown only to JPNTest members)
A data engineer wants to create a Streaming DataFrame that reads from a Kafka topic called feed.

Which code fragment should be inserted in line 5 to meet the requirement?
Code context:
spark \
.readStream \
.format("kafka") \
.option("kafka.bootstrap.servers","host1:port1,host2:port2") \
.[LINE5] \
.load()
Options:

Explanation: (shown only to JPNTest members)
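For reference, Spark's Kafka source selects topics through options such as "subscribe" (a comma-separated topic list), "subscribePattern", or "assign", per the Structured Streaming + Kafka integration guide. A sketch of the completed reader, not run here since it needs the spark-sql-kafka package on the classpath and live brokers at the (placeholder) addresses:

```python
# Sketch only: assumes the spark-sql-kafka integration package is available
# and that Kafka brokers are reachable at host1:port1,host2:port2
stream_df = (
    spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
    .option("subscribe", "feed")   # "subscribe" names the topic(s) to consume
    .load()
)
```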
A developer is running Spark SQL queries and notices underutilization of resources. Executors are idle, and the number of tasks per stage is low.
What should the developer do to improve cluster utilization?

Explanation: (shown only to JPNTest members)
Given a DataFrame df that has 10 partitions, after running the code:
result = df.coalesce(20)
How many partitions will the result DataFrame have?

Explanation: (shown only to JPNTest members)
