Free NVIDIA NCP-AII Practice Exam with Questions & Answers | Set: 3

Question 21

A system administrator wants to configure MIG for seven slices on an H100 GPU in an NVIDIA HGX system. Which command should be used?

Options:
A.

mig-parted

B.

nvidia-smi

C.

nvcc

D.

nvlink-config

Question 22

As the infrastructure lead for an NVIDIA AI Factory deployment, you have just uploaded the latest supported firmware packages to your DGX system. It is now critical to ensure all hardware components run the new firmware and the DGX returns to full operational capability. Which sequence best guarantees that all relevant components are correctly running updated firmware?

Options:
A.

Perform a software-driven restart on the operating system of every compute node, then use advanced tools to check firmware status, and reissue update commands if any firmware appears inactive afterward.

B.

Execute a single AC power cycle on the DGX after the update process, then reset the software stack and verify status using diagnostic commands on each node for confirmation of all component updates.

C.

Initiate a cold power cycle on all node trays to activate firmware, follow with a DGX reboot procedure, and use the management interface to finish activating CPLD firmware on the host.

D.

Initiate a cold power cycle on the system to activate firmware for components, reset the BMC using the recommended command, and perform an AC power cycle to ensure EROT and CPLD firmware is activated.

Question 23

A team is validating a DGX BasePOD deployment. Using cmsh, they run a command to check GPU health across all nodes. What indicates that the system is ready for AI workloads?

Options:
A.

The command output is ignored if the system powers on without errors.

B.

At least half of the GPUs report Status_Health = OK.

C.

All GPUs report Status_Health = OK and Health = OK for each device.

D.

Only the head node's GPUs need to be healthy.
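The options in this question differ in how many GPUs must report a healthy state. As a rough sketch, the strictest criterion (every GPU reporting OK on both fields) could be checked as shown below. The record layout and the helper names are hypothetical; only the field names Status_Health and Health come from the question, and the code does not parse real cmsh output.

```python
from typing import Dict, List

# Hypothetical per-GPU record. The keys "Status_Health" and "Health" mirror the
# field names in the question; "node" and "gpu" are illustrative assumptions.
GpuRecord = Dict[str, str]


def is_healthy(gpu: GpuRecord) -> bool:
    """A GPU counts as healthy only if both fields report OK."""
    return gpu.get("Status_Health") == "OK" and gpu.get("Health") == "OK"


def all_gpus_healthy(gpus: List[GpuRecord]) -> bool:
    """True only when the list is non-empty and every GPU is healthy."""
    return bool(gpus) and all(is_healthy(g) for g in gpus)


def unhealthy_gpus(gpus: List[GpuRecord]) -> List[GpuRecord]:
    """Return the records that fail the check, for follow-up."""
    return [g for g in gpus if not is_healthy(g)]


sample = [
    {"node": "dgx01", "gpu": "0", "Status_Health": "OK", "Health": "OK"},
    {"node": "dgx01", "gpu": "1", "Status_Health": "OK", "Health": "Critical"},
]

print(all_gpus_healthy(sample))          # False
print(len(unhealthy_gpus(sample)))       # 1
```

Treating an empty list as not healthy is a deliberate choice in this sketch, so that a query returning no GPUs is not mistaken for a passing result.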

Question 24

A cluster administrator needs to validate transceiver firmware versions across 200 ports using UFM. Which GUI-based method provides a consolidated view?

Options:
A.

Navigate to "Devices" > select a switch > "Cables" tab to see ASIC firmware and transceiver versions.

B.

Use "Topology" view to visually inspect cable icons.

C.

Run mlxlink -d lid-<LID> -m on each port manually.

D.

Export all switch logs and grep for "FW Version".

Question 25

What command is needed to measure BER (Bit Error Rate)?

Options:
A.

mlxconfig -d <device> q

B.

ethtool -S <device>

C.

mlxlink -d <device> -c -e

D.

mstflint -d <device> q full
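For context, bit error rate is the number of bit errors divided by the number of bits transferred over the same interval. The sketch below only illustrates that arithmetic. The function names and the sample figures are hypothetical, and the error count is assumed to come from whichever link diagnostic tool is in use.

```python
def bits_transferred(line_rate_gbps: float, seconds: float) -> int:
    """Bits sent at a given line rate (in Gb/s) over a time window."""
    if line_rate_gbps <= 0 or seconds <= 0:
        raise ValueError("line rate and duration must be positive")
    return int(line_rate_gbps * 1e9 * seconds)


def bit_error_rate(bit_errors: int, total_bits: int) -> float:
    """BER = bit errors / total bits transferred."""
    if total_bits <= 0:
        raise ValueError("total_bits must be positive")
    if bit_errors < 0:
        raise ValueError("bit_errors cannot be negative")
    return bit_errors / total_bits


# Illustrative numbers only: 24 errors observed on a 400 Gb/s link in 60 s.
total = bits_transferred(400, 60)
ber = bit_error_rate(24, total)
print(f"{ber:.1e}")  # 1.0e-12
```

A short observation window can only bound the BER from above: seeing zero errors in 60 seconds at this rate shows the rate is low, not that it is zero.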

Question 26

A customer is designing an AI Factory for enterprise-scale deployments and wants to ensure redundancy and load balancing for the management and storage networks. Which feature should be implemented on the Ethernet switches?

Options:
A.

Implement redundant switches with spanning tree protocol.

B.

MLAG for bonded interfaces across redundant switches.

C.

Use only one switch for all management and storage traffic.

D.

Disable VLANs and use unmanaged switches.

Question 27

You are installing the operating system as part of the initial setup for a new NVIDIA Base Command Manager cluster. Which two of the following actions are essential for a successful OS installation on the cluster’s head node?

Pick the 2 correct responses below.

Options:
A.

Download the latest BCM ISO and verify its integrity using the provided checksum, then start the installation.

B.

Configure network switches for PXE boot to all compute nodes before installing the OS on the head node.

C.

Set the desired time zone and configure NTP synchronization during the OS installation wizard.

D.

Start the head node OS installation process with the system BIOS set to legacy boot mode instead of UEFI.
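One of the options above mentions verifying a downloaded ISO against a provided checksum. A minimal sketch of such a check follows, assuming the published checksum is a SHA-256 hex digest; the algorithm actually supplied with the ISO may differ, and the helper names are hypothetical.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte ISO is never loaded whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare the computed digest with the published one, ignoring case and whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Self-check on a small temporary file standing in for the ISO.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
    try:
        expected = hashlib.sha256(b"abc").hexdigest()
        print(checksum_matches(tmp.name, expected))   # True
        print(checksum_matches(tmp.name, "0" * 64))   # False
    finally:
        os.remove(tmp.name)
```

The `algorithm` parameter accepts any name known to `hashlib.new`, such as "md5" or "sha512", in case the checksum file uses a different digest.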

Question 28

An administrator needs to add additional GPUs to an existing server. What are the server requirements to check before installing new GPUs?

Options:
A.

Sufficient networking, water-cooled racks, adequate rack power, sufficient storage, and rack space.

B.

Sufficient storage, sufficient networking, adequate rack power, and compatible hardware.

C.

Sufficient CPU capacity, PCIe slot allocation, sufficient cooling in the data center, and rack space.

D.

Sufficient cooling in the data center, adequate rack power, compatible hardware, and PCIe slot allocation.

Question 29

After NCCL burn-in reports "transport retry count exceeded," which corrective action addresses the underlying fabric issue?

Options:
A.

Switch from Ring to Tree algorithms via NCCL_ALGO=TREE

B.

Reduce message size to decrease network utilization

C.

Increase NCCL_IB_TIMEOUT to tolerate longer latencies

D.

Inspect InfiniBand link quality metrics (BER, symbol errors) and replace faulty cables

Question 30

An infrastructure engineer is preparing a new AI cluster for production use, relying on NVIDIA switches and high-speed optical transceivers for node connectivity. The team is finalizing network validation before launching large-scale training jobs. Why is it critical to confirm and align the firmware version on all switch transceivers prior to production?

Options:
A.

To guarantee that hardware inventory tools can report serial numbers and manufacturer codes for asset management, which is critical for future support and troubleshooting.

B.

To ensure stability, bandwidth, and compatibility across the cluster, avoiding link issues and performance loss.

C.

To allow the network operating system to automatically discover all connected transceivers with heterogeneous firmware.

D.

To reduce GPU memory consumption during distributed training jobs.

Exam Code: NCP-AII
Certification Provider: NVIDIA
Exam Name: NVIDIA AI Infrastructure
Last Update: Jun 6, 2026
Questions: 123