Exam AIF-C01-JPN Topic 3 Question 232 Thread

Amazon AIF-C01-JPN real exam questions
Question #: 232
Topic #: 3

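A company is comparing two foundation models (FMs) for a customer service AI assistant. The company wants to evaluate the FMs on helpfulness, accuracy, and tone. To do so, it needs an evaluation technique that is automated, repeatable, and does not require human reviewers.
Which evaluation technique meets these requirements?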

A. String matching

B. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

C. LLM-as-a-judge

D. Retrieval Augmented Generation (RAG)

Suggested answer: C

AWS documentation describes LLM-as-a-judge as an automated evaluation technique where a large language model is used to assess the outputs of another model based on qualitative criteria such as helpfulness, correctness, tone, and alignment with expectations. This approach enables scalable and repeatable evaluations without requiring human reviewers.
In this scenario, the company needs to compare two foundation models across subjective dimensions that are difficult to measure using traditional metrics. LLM-as-a-judge allows the evaluator model to score or rank responses using predefined evaluation prompts and criteria, ensuring consistent and automated assessment.
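The scoring loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not an AWS API: `call_judge`, `fake_judge`, `model_a`, and `model_b` are hypothetical stand-ins, the prompt wording and the 1 to 5 scale are assumptions, and in practice `call_judge` would wrap a real call to an evaluator LLM.

```python
import json
from statistics import mean
from typing import Callable

CRITERIA = ("helpfulness", "accuracy", "tone")

# Assumed evaluation prompt; the wording and scale are illustrative only.
JUDGE_PROMPT = """You are grading a customer service reply.
Customer question: {question}
Assistant reply: {reply}
Score the reply from 1 (poor) to 5 (excellent) on each of: {criteria}.
Answer with JSON only, for example {{"helpfulness": 4, "accuracy": 5, "tone": 3}}."""


def judge_reply(question: str, reply: str, call_judge: Callable[[str], str]) -> dict:
    """Ask the judge model to score one reply and parse its JSON answer."""
    prompt = JUDGE_PROMPT.format(
        question=question, reply=reply, criteria=", ".join(CRITERIA)
    )
    scores = json.loads(call_judge(prompt))
    return {name: int(scores[name]) for name in CRITERIA}


def compare_models(dataset, candidates, call_judge):
    """Return the mean score per criterion for each candidate model."""
    report = {}
    for model_name, generate in candidates.items():
        rows = [judge_reply(q, generate(q), call_judge) for q in dataset]
        report[model_name] = {c: mean(r[c] for r in rows) for c in CRITERIA}
    return report


# --- Hypothetical stand-ins so the sketch runs offline ---
def fake_judge(prompt: str) -> str:
    # A real judge would be an LLM call; this stub only rewards apologetic replies.
    polite = "sorry" in prompt.lower()
    return json.dumps({"helpfulness": 4, "accuracy": 4, "tone": 5 if polite else 2})


def model_a(question: str) -> str:
    return "Sorry for the trouble. Your refund was issued today."


def model_b(question: str) -> str:
    return "Refund issued."


report = compare_models(
    ["Where is my refund?"],
    {"model_a": model_a, "model_b": model_b},
    fake_judge,
)
print(report)
```

Because the same prompt and criteria are applied to every reply, rerunning the comparison on the same dataset needs no human reviewer; how consistent the scores are still depends on the judge model actually used.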
The other options do not meet the requirements. String matching checks whether the output text matches an expected string, and ROUGE measures n-gram overlap with a reference text; both capture lexical similarity and are unsuitable for evaluating tone or helpfulness in customer service interactions. Retrieval Augmented Generation is an architectural pattern, not an evaluation technique.
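To illustrate why lexical metrics miss tone, here is a simplified ROUGE-1 recall sketch. It assumes lowercase whitespace tokenization with no stemming, so it is not the official ROUGE implementation, and the example sentences are made up for illustration.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate (clipped counts)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0


reference = "your refund was issued today"
polite = "we are sorry for the delay your refund was issued today"
curt = "your refund was issued today so stop asking"

# Both replies contain every reference word, so recall is identical
# even though their tone differs sharply.
print(rouge1_recall(reference, polite))
print(rouge1_recall(reference, curt))

# Exact string matching rejects both replies, which is equally uninformative.
print(polite == reference, curt == reference)
```

In this sketch the polite and the curt reply receive the same recall score, and exact matching fails both, so neither metric separates them on tone.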
AWS highlights LLM-as-a-judge as a practical approach for automated qualitative evaluation of generative AI outputs, making it the correct choice.

板仓** 2026-03-31 10:31:25
