NCA-GENM Free Practice Questions: "NVIDIA Generative AI Multimodal"
You are developing a multimodal model that combines text and tabular data for predicting customer churn. The text data consists of customer reviews, and the tabular data includes demographics and transaction history. You've preprocessed both datasets. Which of the following approaches would be the MOST effective for integrating these modalities?
Correct answer: B, E
You've trained a large multimodal model that takes text and images as input and generates creative stories. While the model produces high-quality stories in general, it occasionally generates outputs that are factually incorrect or nonsensical. Which of the following techniques would be MOST effective in improving the model's factual accuracy and coherence?
Correct answer: B
When building a multimodal model using transformers, you observe that the model struggles to attend to the correct image regions when generating text descriptions. Which of the following techniques could you employ to improve the attention mechanism in the model?
Correct answer: D
You are developing a text-to-image generation system using a diffusion model. During inference, you notice that the generated images often contain artifacts or inconsistencies. What is the most appropriate strategy to reduce these artifacts and improve the overall image quality?
Correct answer: E
You're using NVIDIA Triton to serve a multimodal model: a CLIP text encoder and a StyleGAN image generator. You need to ensure high throughput and minimal latency. Which Triton backend configuration is most suitable for this scenario, assuming both models are optimized for NVIDIA GPUs?
Correct answer: D
You're designing a multimodal AI system for autonomous driving that integrates data from cameras (images), LiDAR (point clouds), radar (time-series), and GPS (geospatial). The system needs to make real-time decisions in complex urban environments. Which hardware and software components are crucial for achieving low latency and high accuracy in data processing and fusion?
Correct answer: C
You are training a conditional generative model to generate images based on text descriptions. You notice that the generated images often lack fine-grained details and tend to be blurry, even though the overall structure matches the text description. Which of the following techniques would be MOST effective in improving the image quality and adding finer details?
Correct answer: A
You are working with a dataset of handwritten digits and training a Variational Autoencoder (VAE) to generate new digits. After training, you observe that the generated digits are blurry and lack sharp details. Which of the following modifications could potentially improve the quality of the generated digits in your VAE?
Correct answer: C, D
You are tasked with building a multimodal generative AI model to create marketing content from product images and descriptions. The image encoder uses a pre-trained ResNet50 model, and the text encoder uses a pre-trained BERT model. After initial training, the generated content frequently misinterprets the image. Which of the following strategies is MOST effective in improving the model's ability to correctly interpret the image within the multimodal context?
Correct answer: B
You are working on a sequence-to-sequence model for neural machine translation. You've implemented an attention mechanism, but the model is still struggling with long sentences, often losing context in the later parts of the translation. Which type of attention mechanism is most likely to alleviate this issue effectively?
Correct answer: A