NCP-AII Free Practice Questions: "NVIDIA AI Infrastructure"

You are monitoring a server with 8 GPUs used for deep learning training. You observe that one of the GPUs reports a significantly lower utilization rate compared to the others, even though the workload is designed to distribute evenly. 'nvidia-smi' reports a persistent "XID 13" error for that GPU. What is the most likely cause?
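As a rough illustration of how such an imbalance could be spotted, the sketch below parses the CSV output of `nvidia-smi --query-gpu=index,utilization.gpu,temperature.gpu --format=csv,noheader,nounits` and flags GPUs whose utilization sits far below the median. The function names and the 30-point threshold are assumptions chosen for this example, not part of any NVIDIA tool.

```python
import csv
import io
import statistics
import subprocess

# Query string for nvidia-smi; prints one "index, util, temp" row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,utilization.gpu,temperature.gpu",
         "--format=csv,noheader,nounits"]

def parse_gpu_csv(text):
    """Parse 'index, util, temp' rows into a list of dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 3:  # skip blank or malformed lines
            continue
        rows.append({"index": int(rec[0]), "util": float(rec[1]), "temp": float(rec[2])})
    return rows

def low_utilization(rows, gap=30.0):
    """Return indices of GPUs more than `gap` points below the median utilization.

    The default gap is an arbitrary value picked for illustration.
    """
    median = statistics.median(r["util"] for r in rows)
    return [r["index"] for r in rows if median - r["util"] > gap]

def query_gpus():
    """Run nvidia-smi on the local host and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

On a GPU host, `low_utilization(query_gpus())` would list the lagging devices; the Xid code itself is reported in the kernel log and still has to be looked up separately.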

You are tasked with implementing a monitoring solution for power consumption and thermal performance in an NVIDIA-powered AI cluster. You want to collect data from the Baseboard Management Controllers (BMCs) of the servers using Redfish. Which of the following Python code snippets demonstrates the correct approach for authenticating with the BMC and retrieving power and temperature readings?
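The answer options are not reproduced on this page, so the following is only a minimal sketch of one plausible approach, not the exam's reference snippet. It assumes the BMC accepts HTTP Basic authentication and exposes the classic Redfish `Power` and `Thermal` chassis resources; the host name, credentials, and chassis ID `1` in the usage comment are hypothetical and vary by vendor.

```python
import base64
import json
import ssl
import urllib.request

def redfish_get(host, path, user, password, verify_tls=True):
    """GET a Redfish resource with HTTP Basic auth and return the decoded JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    ctx = ssl.create_default_context()
    if not verify_tls:  # only for lab BMCs with self-signed certificates
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=10) as resp:
        return json.load(resp)

def power_watts(power_doc):
    """Extract PowerConsumedWatts from each PowerControl entry."""
    return [pc.get("PowerConsumedWatts") for pc in power_doc.get("PowerControl", [])]

def temperatures_c(thermal_doc):
    """Map each temperature sensor name to its ReadingCelsius value."""
    return {t.get("Name"): t.get("ReadingCelsius")
            for t in thermal_doc.get("Temperatures", [])}

# Hypothetical usage (host, credentials, and chassis ID are placeholders):
# power = redfish_get("bmc.example.com", "/redfish/v1/Chassis/1/Power", "user", "secret")
# thermal = redfish_get("bmc.example.com", "/redfish/v1/Chassis/1/Thermal", "user", "secret")
# print(power_watts(power), temperatures_c(thermal))
```

Newer BMC firmware may expose `PowerSubsystem` and `ThermalSubsystem` instead, in which case the paths and field names would differ.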

You have an Intel Xeon Gold server with 2 NVIDIA Tesla V100 GPUs. After deploying your AI application, you observe that one GPU is consistently running at a significantly higher temperature than the other. What could be a plausible reason for this behavior?

Correct answers: B, E

An AI infrastructure relies on a liquid cooling system to dissipate heat from multiple NVIDIA GPUs. After a recent software update, users report intermittent performance degradation and system crashes. You suspect a cooling issue. Which TWO of the following checks are the MOST critical in diagnosing the root cause?

Correct answers: D, E

An AI server exhibits frequent kernel panics under heavy GPU load. 'dmesg' reveals the following error: 'NVRM: Xid (PCI:0000:3B:00): 79, pid=..., name=..., GPU has fallen off the bus.' Which of the following is the least likely cause of this issue?
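When triaging logs like this one, it can help to pull the PCI address and Xid code out of each kernel message programmatically. The sketch below is one way to do that with a regular expression; the pattern is an assumption based on the message format quoted in the question, and `parse_xid` is a name invented for this example.

```python
import re

# Matches messages shaped like: NVRM: Xid (PCI:0000:3B:00): 79, ...
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<bdf>[0-9A-Fa-f:.]+)\): (?P<code>\d+)")

def parse_xid(line):
    """Return (pci_address, xid_code) for an NVRM Xid log line, or None if no match."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return m.group("bdf"), int(m.group("code"))

line = "NVRM: Xid (PCI:0000:3B:00): 79, pid=..., name=..., GPU has fallen off the bus."
print(parse_xid(line))  # ('0000:3B:00', 79)
```

Feeding every line of the kernel log through `parse_xid` gives a quick count of which device and which Xid codes recur.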

You are installing the NGC CLI using 'pip' behind a corporate proxy. The installation fails due to connection errors. How do you configure 'pip' to use the proxy during the NGC CLI installation?
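Two common ways to point pip at a proxy are the `--proxy` command-line option and the `HTTP_PROXY` / `HTTPS_PROXY` environment variables. The sketch below builds both from Python without running anything; the proxy URL and the package name are placeholders, since the page does not say which package is being installed.

```python
import os
import sys

def pip_install_cmd(package, proxy_url):
    """Build a pip install command that passes the proxy via pip's --proxy option."""
    return [sys.executable, "-m", "pip", "install", "--proxy", proxy_url, package]

def proxy_env(proxy_url, base=None):
    """Return a copy of the environment with HTTP(S)_PROXY set for child processes."""
    env = dict(os.environ if base is None else base)
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    return env

# Hypothetical usage (placeholder proxy and package name):
# import subprocess
# subprocess.run(pip_install_cmd("example-package", "http://proxy.example.com:3128"), check=True)
```

A proxy can also be set persistently in pip's configuration file, which avoids repeating it on every invocation.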

Correct answers: B, C, D

When installing multiple NVIDIA GPUs, which of the following factors are MOST important to consider regarding PCIe slot configuration? (Choose two)

Correct answers: A, E

You are troubleshooting a performance issue with a GPU-accelerated application running inside a Docker container. The 'nvidia-smi' output inside the container shows the GPU is being utilized, but the performance is significantly lower than expected. Which of the following could be the cause of this performance bottleneck?

Correct answers: A, B, C, E

You are using a custom container runtime other than Docker (e.g., containerd) and need to integrate it with the NVIDIA Container Toolkit. What command would you use to configure the NVIDIA Container Toolkit for this runtime? (Assume your runtime configuration file is located at '/etc/containerd/config.toml')
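For orientation, the NVIDIA Container Toolkit ships the `nvidia-ctk` utility, whose `runtime configure` subcommand edits a runtime's configuration file. The sketch below only assembles that command line as an argument list; the helper name is invented for this example, and the answer options on the original page are not shown, so treat this as an illustration of the general shape.

```python
def ctk_configure_cmd(runtime, config_path):
    """Build an nvidia-ctk command that registers the NVIDIA runtime for `runtime`."""
    return [
        "nvidia-ctk", "runtime", "configure",
        f"--runtime={runtime}",
        f"--config={config_path}",
    ]

cmd = ctk_configure_cmd("containerd", "/etc/containerd/config.toml")
print(" ".join(cmd))
# nvidia-ctk runtime configure --runtime=containerd --config=/etc/containerd/config.toml
```

The command typically needs root privileges, and the runtime has to be restarted afterwards for the change to take effect.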

You are running a distributed training job on a multi-GPU server. After several hours, the job fails with an NCCL (NVIDIA Collective Communications Library) error. The error message indicates a failure in inter-GPU communication. 'nvidia-smi' shows all GPUs are healthy. What is the MOST probable cause of this issue?
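A usual first step when chasing a failure like this is to rerun the job with NCCL's own logging enabled through the `NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS` environment variables. The sketch below prepares such an environment for a child process; the helper name and the choice of subsystems are assumptions made for illustration.

```python
import os

def nccl_debug_env(level="INFO", subsys="INIT,P2P,NET", base=None):
    """Return a copy of the environment with NCCL debug logging switched on."""
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = level          # e.g. WARN, INFO, TRACE
    env["NCCL_DEBUG_SUBSYS"] = subsys  # comma-separated subsystem filter
    return env

# Hypothetical usage (train.py is a placeholder script name):
# import subprocess, sys
# subprocess.run([sys.executable, "train.py"], env=nccl_debug_env(), check=True)
```

The resulting log usually shows which transport (P2P, shared memory, network) each GPU pair selected, which narrows down where the communication broke.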

Correct answers: A, B

You have configured MIG on your A100 GPU, creating several MIG instances. You now want to allocate a specific MIG instance to a Docker container. How would you specify the necessary device option when running the 'docker run' command to ensure the container uses only that MIG instance? Assume the MIG instance UUID is GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.
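One way to restrict a container to a single device is Docker's `--gpus` option with a `device=` selector. The sketch below assembles such a command as an argument list, assuming the NVIDIA Container Toolkit is installed on the host; the image name is a placeholder and the helper is invented for this example.

```python
def docker_run_cmd(device_uuid, image, command):
    """Build a docker run command limited to one GPU or MIG device by UUID."""
    return ["docker", "run", "--rm", "--gpus", f"device={device_uuid}", image, *command]

cmd = docker_run_cmd(
    "GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # UUID as written in the question
    "your-cuda-image:tag",                        # placeholder image name
    ["nvidia-smi", "-L"],
)
print(cmd)
```

Passing the arguments as a list avoids shell quoting issues around the `device=` value. Running `nvidia-smi -L` inside the container is a simple check that only the intended instance is visible.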

Consider the following 'iptables' rule used in an AI inference server. What is its primary function?
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
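To make the individual fields of the rule explicit, the sketch below splits it into its parts with `argparse`. It handles only the four options that appear in this rule and is not a general iptables parser; `parse_rule` is a name invented for this example.

```python
import argparse
import shlex

def parse_rule(rule):
    """Split a simple iptables append rule into chain, protocol, port, and target."""
    tokens = shlex.split(rule)
    if tokens and tokens[0] == "iptables":
        tokens = tokens[1:]
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-A", dest="chain")      # chain the rule is appended to
    parser.add_argument("-p", dest="protocol")   # protocol to match
    parser.add_argument("--dport", type=int)     # destination port to match
    parser.add_argument("-j", dest="target")     # action taken on a match
    return vars(parser.parse_args(tokens))

print(parse_rule("iptables -A INPUT -p tcp --dport 8080 -j ACCEPT"))
# {'chain': 'INPUT', 'protocol': 'tcp', 'dport': 8080, 'target': 'ACCEPT'}
```

Read field by field, the rule appends an entry to the INPUT chain that accepts incoming TCP packets addressed to port 8080.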

You are troubleshooting slow I/O performance in a deep learning training environment utilizing BeeGFS parallel file system. You suspect the metadata operations are bottlenecking the training process. How can you optimize metadata handling in BeeGFS to potentially improve performance?

After physically installing a new NVIDIA GPU in a server, you boot the system. You notice that the GPU is not recognized by the operating system. You've verified the card is properly seated and powered. What are the MOST LIKELY causes and solutions? (Select TWO)

Correct answers: B, D

During NVLink Switch configuration, you encounter issues where certain GPUs are not being recognized by the system. Which of the following troubleshooting steps are most likely to resolve this problem?

Correct answers: A, B, E