NIM and AI Blueprints Launch, Accelerating Generative AI on Local PCs
NVIDIA today unveiled NVIDIA NIM and AI Blueprints, which accelerate AI workloads on GeForce RTX 50 Series GPUs. NIM and AI Blueprints expand AI accessibility by enabling developers and enthusiasts to build, iterate, and deploy AI locally.
The GeForce RTX 5090 and 5080 GPUs are based on the Blackwell architecture, which supports the new DLSS Multi Frame Generation, using AI to generate up to three additional frames per rendered frame to improve FPS.
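As a rough sketch of the arithmetic this implies (an illustration, not NVIDIA's published formula): one rendered frame plus up to three generated frames means the displayed frame rate can approach four times the rendered rate.

```python
# Rough arithmetic for multi-frame generation (illustrative sketch only).
# One rendered frame plus up to three AI-generated frames are displayed,
# so the displayed frame rate can approach 4x the rendered rate.

def displayed_fps(rendered_fps: float, generated_per_rendered: int = 3) -> float:
    """Upper-bound displayed FPS: rendered plus AI-generated frames."""
    return rendered_fps * (1 + generated_per_rendered)

print(displayed_fps(30.0))  # 30 rendered FPS -> up to 120 displayed FPS
```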
The RTX 50 Series is built to accelerate modern generative AI workloads, delivering up to 3,352 trillion AI operations per second (TOPS), with fifth-generation Tensor Cores and support for FP4 precision to run advanced AI models faster and more efficiently.
At CES last month, NVIDIA unveiled NVIDIA NIM and AI Blueprints for RTX to enable AI developers and enthusiasts to take advantage of these capabilities. NIM and AI Blueprints are optimized for GeForce RTX 50 Series GPUs, enabling cutting-edge AI experiences on AI PCs.
Although AI model development is advancing rapidly, applying these innovations to PCs remains a challenge for many.
Models published on platforms like Hugging Face need to be curated, tuned, and quantized to run on a PC, connected to new AI APIs for compatibility with existing tools, and converted to an optimized inference backend.
NIM microservices for RTX AI PCs and workstations ease this complexity by providing access to community-driven and NVIDIA-developed AI models along with multiple deployment options. They include everything needed to run optimized models on PCs with RTX GPUs: prebuilt engines for specific GPUs, the NVIDIA TensorRT SDK, and the open-source TensorRT-LLM library for accelerated inference on Tensor Cores.
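To make "deploy AI locally" concrete: NIM microservices expose an OpenAI-compatible HTTP API, so a locally running service can be queried with standard client code. A minimal sketch follows; the port, endpoint path, and model name are illustrative assumptions, not details from the article.

```python
# Minimal sketch: querying a locally running NIM microservice.
# NIM containers expose an OpenAI-compatible API; the port (8000) and
# the model name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local NIM endpoint (assumed default)
    api_key="not-needed-locally",         # local services typically ignore the key
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # hypothetical locally deployed model
    messages=[{"role": "user", "content": "Summarize what NIM microservices do."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```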
NVIDIA is working with Microsoft to enable NIM microservices and AI Blueprints for RTX on Windows Subsystem for Linux (WSL), which will let the same AI containers that run on data center GPUs run on RTX PCs.
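To illustrate what that container workflow might look like, here is a hedged sketch using the Docker SDK for Python to start a GPU-enabled container, as one might from inside a WSL distribution with the NVIDIA Container Toolkit installed. The image tag and port mapping are hypothetical placeholders.

```python
# Sketch: starting a NIM-style container with GPU access via the Docker SDK
# (e.g., from inside WSL with the NVIDIA Container Toolkit installed).
# The image tag and port mapping are illustrative assumptions.
import docker

client = docker.from_env()

container = client.containers.run(
    "nvcr.io/nim/example-model:latest",  # hypothetical container image tag
    detach=True,
    device_requests=[                    # request all available GPUs
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ],
    ports={"8000/tcp": 8000},            # expose the inference API locally
)
print(container.id)
```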
Quantization in particular optimizes AI performance by reducing model size. Using FP16 on a GeForce RTX 4090, the FLUX.1 [dev] model takes 15 seconds to generate an image with 30 steps; using FP4 on a GeForce RTX 5090, an image can be generated in about five seconds.
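The memory arithmetic shows why lower precision matters (a back-of-the-envelope sketch; the 12-billion-parameter count for FLUX.1 [dev] is an assumption used for illustration): going from 16 bits to 4 bits per weight cuts the weight footprint to a quarter.

```python
# Back-of-the-envelope weight-memory footprint under quantization.
# The 12B parameter count for FLUX.1 [dev] is an assumption for illustration;
# activations and runtime overhead are ignored.

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB."""
    return num_params * bits_per_weight / 8 / 1e9

params = 12e9  # assumed parameter count
print(f"FP16: {weight_memory_gb(params, 16):.1f} GB")  # ~24 GB
print(f"FP4:  {weight_memory_gb(params, 4):.1f} GB")   # ~6 GB
```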
FP4 is natively supported by the Blackwell architecture, making it easier to deploy high-performance AI on local PCs. FP4 is also integrated into NIM microservices, which helps optimize models that were previously difficult to quantize.
NIM microservices and AI Blueprints are coming soon, with initial hardware support for GeForce RTX 50 Series, GeForce RTX 4090 and 4080, and NVIDIA RTX 6000 and 5000 professional GPUs; support for additional GPUs is planned.