NVIDIA GF100 GPU Fermi Graphics Architecture
Reviews - Featured Reviews: Processors
Written by Olin Coles
Monday, 18 January 2010
NVIDIA GF100 GPU Fermi Architecture

The CPU is nearly obsolete. At least that's the way it looks when you understand how quickly GPUs have matured to overtake CPU evolution. While the list of processor manufacturers numbers into the hundreds, most personal computer enthusiasts are concerned with the Central Processing Unit (CPU). It's true that CPUs have long enjoyed the attention of overclockers and hardware enthusiasts while less-powerful audio, network, and storage processors work in the background, yet it is the GPU that handles everything we see displayed from our computer system. Gamers rely on powerful graphics adapters to deliver the fast frame rates needed to enjoy high-performance video games, and everyone else needs graphics processing power to enjoy movies and multimedia. But the term 'video game' is coming extremely close to being a literal interpretation: games are in fact becoming closer to filmed video footage than ever before.

NVIDIA's latest GPU is codenamed GF100, and it is the first graphics processor based on the Fermi architecture. In this article, Benchmark Reviews explains the technical architecture behind NVIDIA's GF100 graphics processor and offers an insight into upcoming Fermi-based GeForce video cards. For those who are not familiar, NVIDIA's GF100 GPU is their first graphics processor to support DirectX-11 hardware features such as tessellation and DirectCompute, while also adding heavy particle and turbulence effects. The GF100 GPU is also the successor to the GT200 graphics processor, which launched in the GeForce GTX 280 video card back in June 2008. NVIDIA has since redefined their focus, and GF100 demonstrates a dedication towards next-generation gaming effects such as ray tracing, order-independent transparency, and fluid simulations. Rest assured, the new GF100 GPU is more powerful than the GT200 could ever be, and early results indicate a Fermi-based video card delivers far more than twice the gaming performance of a GeForce GTX 280.

New products are expected to get better with each new revision, but it amazes me how different the end goals appear to be between a CPU manufacturer such as Intel and a GPU manufacturer such as NVIDIA. While Intel keeps loading its processors with a growing cache buffer and refining the fabrication process to achieve higher speeds, NVIDIA designs a GPU built to compute at a higher level by increasing the number of actual processor cores in addition to refining the process. As of the GF100 GPU, NVIDIA delivers 512 processor cores, which now earn the term 'CUDA cores', compared to Intel's six cores on the upcoming 'Gulftown' Core-i7 processor. I realize that we're comparing apples to oranges here, but the end goals are identical: higher performance. Remember this point as I near my conclusion, and you'll understand what the future holds for processor architecture.

NVIDIA Fermi GF100 Processor

What's new in Fermi?

With any new technology, consumers want to know what's new in the product. The goal of this article is to share in-depth information surrounding the Fermi architecture, as well as the new functionality unlocked in GF100. For clarity, the 'GF' letters used in the GF100 GPU name are not an abbreviation for 'GeForce'; they actually denote that this GPU is a Graphics solution based on the Fermi architecture. The next generation of NVIDIA GeForce-series desktop video cards will use the GF100 to introduce the new features detailed in the sections that follow.
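To put the CPU-versus-GPU comparison in concrete terms, here is a minimal CUDA sketch (our own illustration, not from NVIDIA's materials) of how a program expresses work as thousands of lightweight threads that the hardware spreads across its CUDA cores, where a CPU would typically walk the same array in a single loop. The kernel name and array size are arbitrary.

```cuda
#include <cuda_runtime.h>

// Each thread scales one element. A GPU such as GF100 spreads these threads
// across hundreds of CUDA cores in parallel, whereas a CPU would usually
// process the same array sequentially (or across a handful of cores).
__global__ void scaleArray(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                       // one million elements
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    int threadsPerBlock = 256;                   // a multiple of the 32-thread warp size
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scaleArray<<<blocks, threadsPerBlock>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_data);
    return 0;
}
```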
About NVIDIA Corporation

NVIDIA (Nasdaq: NVDA) is the world leader in visual computing technologies and the inventor of the GPU, a high-performance processor which generates breathtaking, interactive graphics on workstations, personal computers, game consoles, and mobile devices. NVIDIA serves the entertainment and consumer market with its GeForce products, the professional design and visualization market with its Quadro products, and the high-performance computing market with its Tesla products. These products are transforming visually rich and computationally intensive applications such as video games, film production, broadcasting, industrial design, financial modeling, space exploration, and medical imaging.

NVIDIA Product Lines
GeForce - GPUs dedicated to graphics and video.
Quadro - A complete range of professional solutions engineered to deliver breakthrough performance and quality.
Tesla - A massively-parallel multi-threaded architecture for high-performance computing problems.

Geometric Realism

In my introduction I stated that video games will soon reproduce film-quality video footage as their primary graphic. Expect that gamers will soon enjoy lifelike full-motion graphics that replace animated textures. The 2D style of video games (think original Doom or Duke Nukem) is yesteryear, and the 3D future of video games will look like some of the best CG movies we already enjoy (think Final Fantasy VII: Advent Children or, to a lesser extent, Transformers the Movie). Gamers already experience Toy Story-like CG graphics in many games, and thanks to NVIDIA's Fermi architecture, the GF100 will bridge the gap to playing games that feel like you're controlling a movie.

While programmable shading has allowed PC games to mimic film in per-pixel effects, geometric realism has lagged behind. The most advanced PC games today use one to two million polygons per frame. By contrast, a typical frame in a computer-generated film uses hundreds of millions of polygons. This disparity can be partly traced to hardware: while the number of pixel shaders has grown from one to many hundreds, the triangle setup engine has remained a singular unit, greatly affecting the relative pixel versus geometry processing capabilities of today's GPUs. For example, the GeForce GTX 285 video card has more than 150× the shading horsepower of the GeForce FX, but less than 3× the geometry processing rate. The outcome is such that pixels are shaded meticulously, but geometric detail is comparatively modest.

In tackling geometric realism, NVIDIA looked to the movies for inspiration. The intimately detailed characters in films are made possible by two key techniques: tessellation and displacement mapping. Tessellation refines large triangles into collections of smaller triangles, while displacement mapping changes their relative position. In conjunction, these two techniques allow arbitrarily complex models to be formed from relatively simple descriptions.

GF100's entire graphics pipeline is designed to deliver high performance in tessellation and geometry throughput. The NVIDIA GF100 GPU replaces the traditional geometry processing architecture at the front end of the graphics pipeline with an entirely new distributed geometry processing architecture implemented using multiple "PolyMorph Engines". Each PolyMorph Engine includes a tessellation unit, an attribute setup unit, and other geometry processing units. Each SM has its own dedicated PolyMorph Engine (more details on the PolyMorph Engine come in another section). Newly generated primitives are converted to pixels by four Raster Engines that operate in parallel (compared to a single Raster Engine in prior-generation GPUs). On-chip L1 and L2 caches enable high-bandwidth transfer of primitive attributes between the SM and the tessellation unit as well as between different SMs. Tessellation and all its supporting stages are performed in parallel on GF100, enabling breathtaking geometry throughput. While GF100 includes many enhancements and performance improvements over past GPU architectures, the ability to perform parallel geometry processing is possibly the single most important GF100 architectural improvement. The ability to deliver setup rates exceeding one primitive per clock while maintaining correct rendering order is a significant technical achievement never before accomplished in a GPU.
Photograph? It's actually NVIDIA OptiX technology.

The Bugatti Veyron in the image above was computer-rendered with path tracing techniques using NVIDIA OptiX technology; it is not a photograph. NVIDIA OptiX integrates easily with game engines, allowing racing games to leverage ray tracing for near-photorealistic glamour shots in gallery mode.

Compute Architecture

The rasterization pipeline has come a long way, but as games aspire to film quality, graphics is moving toward advanced algorithms that require the GPU to perform general computation along with programmable shading. G80 was the first NVIDIA GPU to include compute features, and GF100 benefits from what NVIDIA learned on G80 in order to significantly improve compute features for gaming. GF100 leverages Fermi's revolutionary compute architecture for gaming applications. In graphics, threads operate independently, with a predetermined pipeline, and exhibit good memory access locality. Compute threads, on the other hand, often communicate with each other, work in no predetermined fashion, and often read and write to different parts of memory.

Major compute features improved on GF100 that will be useful in games include faster context switching between graphics and PhysX, concurrent compute kernel execution, and an enhanced caching architecture that benefits irregular algorithms such as ray tracing and AI. Atomic operations performance has improved, allowing threads to safely cooperate through work queues and accelerating novel rendering algorithms. For example, fast atomic operations allow transparent objects to be rendered without presorting (order-independent transparency), enabling developers to create levels with complex glass environments. For seamless interoperation with graphics, GF100's GigaThread engine reduces context switch time to about 20 microseconds, making it possible to execute multiple compute and physics kernels for each frame. For example, a game may use DirectX 11 to render the scene, switch to CUDA for selective ray tracing, call a DirectCompute kernel for post-processing, and perform fluid simulations using PhysX.
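As a rough illustration of the work-queue idea described above, the hedged CUDA sketch below lets many threads append fragment records to a single global buffer using an atomic counter, so no presorting or locking is required; a later pass could sort or blend the queued fragments per pixel. The Fragment struct and all names are hypothetical, not part of any NVIDIA API.

```cuda
#include <cuda_runtime.h>

// Hypothetical fragment record, for illustration only.
struct Fragment { float depth; unsigned int color; };

// Each thread appends the fragments it produces to one global queue.
// atomicAdd hands every thread a unique slot, so threads cooperate safely
// without presorting; this is the building block behind techniques such as
// order-independent transparency via per-pixel fragment lists.
__global__ void appendFragments(const Fragment *produced, int count,
                                Fragment *queue, unsigned int *queueTail,
                                unsigned int queueCapacity)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count)
        return;

    unsigned int slot = atomicAdd(queueTail, 1u);  // reserve one slot
    if (slot < queueCapacity)
        queue[slot] = produced[i];
}
```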
Delivering Geometric Realism

According to NVIDIA, tessellation and displacement mapping are the de facto standard for geometric realism in film. While tessellation and displacement mapping are not rendering techniques new to the industry, up until now they have mostly been used in movie production (such as Disney's Pirates of the Caribbean). With the introduction of DirectX 11 and NVIDIA's GF100 GPU, developers will be able to harness these powerful techniques for gaming applications. In this section Benchmark Reviews will discuss the benefits of tessellation and displacement mapping in regard to game development and high-quality real-time graphics rendering.

Game assets (such as world objects and characters) are typically created using software modeling packages like Mudbox, ZBrush, 3D Studio Max, Maya, or SoftImage. These packages provide tools based on surfaces with displacement mapping to aid the artist in creating detailed characters and environments. Today, the artist must manually create polygonal models at various levels of detail, as required by the different rendering scenarios in the game, in order to maintain playable frame rates.

Tessellation

In today's complex graphics, tessellation offers the means to store massive amounts of coarse geometry with expand-on-demand functionality. In the NVIDIA GF100 GPU, tessellation also enables more complex animations. In terms of model scalability, dynamic Level of Detail (LOD) allows quality and performance trade-offs whenever better picture quality can be delivered without a performance penalty. Comprised of three layers (original geometry, tessellation geometry, and displacement map), the final product is far more detailed in shading and data expansion than if it were constructed with bump-map technology. In plain terms, tessellation gives the peaks and valleys with shadow detail in between, while previous-generation technology (bump mapping) would only give the illusion of detail.
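The following CUDA sketch, written purely for illustration, shows the displacement-map layer of that three-layer recipe: once tessellation has produced a dense mesh, each new vertex is pushed along its normal by a height read from the displacement map, turning a smooth surface into real peaks and valleys rather than the shading illusion a bump map provides. The Vertex layout and kernel name are assumptions, and a real game would perform this step inside the tessellation pipeline rather than in a separate kernel.

```cuda
#include <cuda_runtime.h>

// Simplified vertex with position, normal, and UV coordinates (illustrative only).
struct Vertex { float3 pos; float3 nrm; float u, v; };

// Displace each tessellated vertex along its normal by a height sampled from
// a displacement map (stored here as a plain float array for simplicity).
__global__ void displaceVertices(Vertex *verts, int numVerts,
                                 const float *heightMap, int mapW, int mapH,
                                 float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numVerts)
        return;

    // Nearest-neighbour lookup into the displacement map via UV coordinates.
    int x = min(mapW - 1, max(0, (int)(verts[i].u * mapW)));
    int y = min(mapH - 1, max(0, (int)(verts[i].v * mapH)));
    float h = heightMap[y * mapW + x];

    verts[i].pos.x += verts[i].nrm.x * h * scale;
    verts[i].pos.y += verts[i].nrm.y * h * scale;
    verts[i].pos.z += verts[i].nrm.z * h * scale;
}
```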
Using GPU-based tessellation, a game developer can send a compact geometric representation of an object or character, and the tessellation unit can produce the correct geometric complexity for the specific scene. Consider the "Imp" character illustrated above. On the far left we see the initial quad mesh used to model the general outline of the figure; this representation is quite compact even when compared to typical game assets. The two middle images of the character are created by finely tessellating the description at the left. The result is a very smooth appearance, free of any of the faceting that resulted from limited geometry. Unfortunately this character, while smooth, is no more detailed than the coarse mesh. The final image on the right was created by applying a displacement map to the smoothly tessellated character shown third from the left.

Tessellation in DirectX-11

Hull shaders run DX11 pre-expansion routines and operate explicitly in parallel across all control points. Domain shaders run post-expansion operations on maps (u/v or x/y/z/w) and are implicitly parallel. Fixed-function tessellation is configured by Level of Detail (LOD) based on output from the hull shader, and can also produce triangles and lines if requested. Tessellation is new to NVIDIA GPUs, and was not part of GT200 because of geometry bandwidth bottlenecks caused by sequential rendering/execution semantics. For the GF100 graphics processor, NVIDIA has added new PolyMorph and Raster Engines to handle world-space processing (PolyMorph) and screen-space processing (Raster). There are sixteen PolyMorph Engines and four Raster Engines on the GF100, which depend on an improved L2 cache to keep buffered geometric data produced by the pipeline on-die. The end result is an 8x increase in geometric performance over the GT200 GPU in the GeForce GTX 285. Not surprisingly, NVIDIA's investment in increased geometric power has resulted in a giant leap over similar output from ATI's Radeon HD 5870. Using the Unigine DX11 benchmark, the NVIDIA GF100 performed at 43 FPS on average compared to 27 FPS for the Radeon HD 5870. Using the Microsoft DirectX-11 SDK, the NVIDIA GF100 renders six cubemap faces in one pass, and compared to the HD 5870 the geometry was produced 4x faster.

Four-Offset Gather4

The texture units on previous processor architectures operated at the core clock of the GPU. On GF100, the texture units run at a higher clock, leading to improved texturing performance for the same number of units. GF100's texture units now add support for DirectX-11's BC6H and BC7 texture compression formats, reducing the memory footprint of HDR textures and render targets. The texture units also support jittered sampling through DirectX-11's four-offset Gather4 feature, allowing four texels to be fetched from a 128×128 pixel grid with a single texture instruction. NVIDIA's GF100 implements DirectX-11 four-offset Gather4 in hardware, greatly accelerating shadow mapping, ambient occlusion, and post-processing algorithms. With jittered sampling, games can implement smoother soft shadows or custom texture filters efficiently. For example, NVIDIA's own testing indicates that accelerated jitter sampling on the GF100 is 3.3x faster than on the Radeon HD 5870. The previous GT200 GPU did not offer coverage samples, while the GF100 can deliver 32x CSAA. The jump from 8x AA to 32x CSAA appears to reduce GF100 performance by only approximately 7% in NVIDIA's lab tests.
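Gather4 is a DirectX-11 HLSL texture instruction, so it cannot be shown directly here; the CUDA-style sketch below only mimics the idea of jittered shadow-map sampling, with four offset fetches folded into a simple percentage-closer filter. The function names, shadow-map layout, and offset table are illustrative assumptions, not NVIDIA or Microsoft code.

```cuda
#include <cuda_runtime.h>

// Conceptual sketch of jittered shadow-map sampling. In DirectX 11 the four
// offset fetches would be issued as a single four-offset Gather4 texture
// instruction; here they are spelled out as ordinary array reads for clarity.
__device__ float shadowFactor(const float *shadowMap, int mapW, int mapH,
                              int px, int py, float fragDepth,
                              const int2 *jitter /* 4 jittered offsets */)
{
    float lit = 0.0f;
    for (int s = 0; s < 4; ++s) {
        int x = min(mapW - 1, max(0, px + jitter[s].x));
        int y = min(mapH - 1, max(0, py + jitter[s].y));
        // Percentage-closer filtering: count samples the light can reach.
        lit += (fragDepth <= shadowMap[y * mapW + x]) ? 1.0f : 0.0f;
    }
    return lit * 0.25f;   // average of the four jittered samples -> soft edge
}
```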
GF100 Compute for Gaming

As developers continue to search for novel ways to improve their graphics engines, the GPU will need to excel at a diverse and growing set of graphics algorithms. Since these algorithms are executed via general compute APIs, a robust compute architecture is fundamental to a GPU's graphical capabilities. In essence, one can think of compute as the new programmable shader. GF100's compute architecture is designed to address a wider range of algorithms and to facilitate more pervasive use of the GPU for solving parallel problems. Many algorithms, such as ray tracing, physics, and AI, cannot exploit shared memory because their memory access locality is only revealed at runtime. GF100's cache architecture was designed with these problems in mind. With up to 48 KB of L1 cache per Streaming Multiprocessor (SM) and a global L2 cache, threads that access the same memory locations at runtime automatically run faster, irrespective of the choice of algorithm.

NVIDIA's codename Nexus brings CPU and GPU code development together in Microsoft Visual Studio 2008 for a shared process timeline, and also introduces the first hardware-based shader debugger. NVIDIA's GF100 is the first GPU to offer full C++ support, the programming language of choice among game developers. To ease the transition to GPU programming, NVIDIA developed Nexus, a Microsoft Visual Studio programming environment for the GPU. Together with new hardware features that provide better debugging support, developers will be able to enjoy CPU-class application development on the GPU. The end result is C++ and Visual Studio integration that brings HPC users onto the same development platform. NVIDIA offers several paths to deliver compute functionality on the GF100 GPU, such as CUDA C++ for video games.

Image processing, simulation, and hybrid rendering are three primary functions of GPU compute for gaming. Using NVIDIA's GF100 GPU, interactive ray tracing becomes possible for the first time on a standard PC. Ray tracing performance on the NVIDIA GF100 is roughly 4x faster than on the GT200 GPU, according to NVIDIA tests. AI path finding is a compute-intensive process well suited to GPUs; the NVIDIA GF100 handles AI obstacle avoidance approximately 3x better than the GT200, yielding faster collision avoidance and shortest-path searches for higher-performance path finding. In a PhysX fluid simulation example, the GF100 showed a 2x improvement over the previous GPU architecture (67 FPS on GT200 to 141 FPS on GF100). Concurrent kernel calculations receive a 20-40% improvement using PhysX 3.0. In fact, NVIDIA suggests that the new architectural enhancements offer a 2x-3x improvement in compute performance in most cases. Other improvements, such as geometric realism, come in at 8x over GT200, while the 32x CSAA image-quality mode is paired with shadow maps that render roughly 3x faster than on GT200.
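A hedged example of how a developer might lean on that per-SM L1 cache from CUDA: for an irregular, pointer-chasing workload (a stand-in for ray tracing or AI path finding), the runtime can be asked to configure the larger 48 KB L1 split so the cache captures whatever locality appears at runtime. The traversal kernel below is a placeholder of our own, not NVIDIA's ray tracer; only the cudaFuncSetCacheConfig call reflects a real API.

```cuda
#include <cuda_runtime.h>

// A kernel with data-dependent memory access (think of walking a spatial
// acceleration structure per ray) cannot stage its working set in shared
// memory ahead of time, so the larger L1 configuration helps it instead.
__global__ void traverseScene(const int *nodes, int numRays, int *hits)
{
    int ray = blockIdx.x * blockDim.x + threadIdx.x;
    if (ray >= numRays)
        return;

    int node = 0;   // placeholder pointer-chasing loop, bounded for safety
    for (int step = 0; step < 64 && node >= 0; ++step)
        node = nodes[node];
    hits[ray] = node;
}

void configureCache()
{
    // Request 48 KB L1 / 16 KB shared memory for this kernel; a kernel that
    // relies on shared memory would instead ask for cudaFuncCachePreferShared.
    cudaFuncSetCacheConfig(traverseScene, cudaFuncCachePreferL1);
}
```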
NVIDIA Fermi Architecture

NVIDIA promises the GeForce GF100 GPU will be the most powerful graphics processor ever built, and on paper it appears the Fermi GPU architecture can deliver. Benchmark Reviews has tested AMD's latest graphics processor, the 40nm ATI Radeon Cypress GPU, which has held command of the discrete desktop graphics industry with its 1600 shader cores since it first launched on the Radeon HD 5870. But not all GPU cores are the same, and although ATI packs 1600 of them into Cypress XT, the end result is not a sum of its cores but rather a sum of the total package.

In the run-up to the NVIDIA GF100 GPU, the 40nm process has been nurtured to maturity after nine months of production at TSMC. After plenty of practice, patience, and some design tweaks, the NVIDIA GF100 GPU is the highest-performing graphics processor the company has ever built. At the core of Fermi's architecture are improvements to visual quality and GPU-compute productivity, intended to deliver the best gaming experience available from a video card. In this section, we analyze NVIDIA's Fermi architecture and compare the incoming GF100 against the outgoing GT200 graphics processor.

GF100 is not another incremental GPU step-up like we had going from G80 to GT200. Processor cores have grown from 128 (G80) to 240 (GT200), and now reach 512, earning the title of NVIDIA CUDA (Compute Unified Device Architecture) cores. The key here is not only the name, but that the name now implies an emphasis on something more than just graphics. Each Fermi CUDA processor core has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). GF100 implements the new IEEE 754-2008 floating-point standard, providing the fused multiply-add (FMA) instruction for both single- and double-precision arithmetic. FMA improves over a multiply-add (MAD) instruction by performing the multiplication and addition with a single final rounding step, with no loss of precision in the addition. FMA minimizes rendering errors in closely overlapping triangles.
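A small CUDA sketch (our own illustration, with operand values chosen by us) makes the single-rounding benefit visible: the fused path keeps the full-precision product before adding, while the separate multiply-then-add rounds twice and loses the tiny difference entirely.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cmath>

// Compares a fused multiply-add against a separate multiply and add.
// The fused form rounds only once, after the exact product has been added,
// which is why closely overlapping triangles render more consistently.
__global__ void compareRounding(float a, float b, float c, float *out)
{
    out[0] = __fmaf_rn(a, b, c);              // single rounding step (FMA)
    out[1] = __fadd_rn(__fmul_rn(a, b), c);   // round after multiply, then after add
}

int main()
{
    float *d_out, h_out[2];
    cudaMalloc(&d_out, 2 * sizeof(float));

    // (1 + 2^-23) * (1 - 2^-23) = 1 - 2^-46, which single precision cannot hold,
    // so the non-fused path collapses the product to 1.0 before the subtraction.
    float a = 1.0f + ldexpf(1.0f, -23);
    float b = 1.0f - ldexpf(1.0f, -23);
    compareRounding<<<1, 1>>>(a, b, -1.0f, d_out);
    cudaMemcpy(h_out, d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);

    printf("fma: %.10e  mul+add: %.10e\n", h_out[0], h_out[1]);
    cudaFree(d_out);
    return 0;
}
```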
GF100 Specifications

Based on Fermi's third-generation Streaming Multiprocessor (SM) architecture, GF100 more than doubles the number of CUDA cores over the previous architecture. NVIDIA GeForce GF100 Fermi GPUs are based on a scalable array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The NVIDIA GF100 implements four GPCs, sixteen SMs, and six memory controllers. Expect NVIDIA to launch GF100 products with different configurations of GPCs, SMs, and memory controllers to address different price points.

CPU commands are read by the GPU via the Host Interface. The GigaThread Engine fetches the specified data from system memory and copies it to the frame buffer. GF100 implements six 64-bit GDDR5 memory controllers (384-bit total) to facilitate high-bandwidth access to the frame buffer. The GigaThread Engine then creates and dispatches thread blocks to the various SMs, and individual SMs in turn schedule warps (groups of 32 threads) to CUDA cores and other execution units. The GigaThread Engine also redistributes work to the SMs when work expansion occurs in the graphics pipeline, such as after the tessellation and rasterization stages.

GF100 implements 512 CUDA cores, organized as 16 SMs of 32 cores each. Each SM is a highly parallel multiprocessor supporting up to 48 warps at any given time. Each CUDA core is a unified processor core that executes vertex, pixel, geometry, and compute kernels. A unified L2 cache architecture services load, store, and texture operations. GF100 has 48 ROP units for pixel blending, antialiasing, and atomic memory operations. The ROP units are organized in six groups of eight, and each group is serviced by a 64-bit memory controller. The memory controller, L2 cache, and ROP group are closely coupled; scaling one unit automatically scales the others.

GF100 Streaming Multiprocessor

NVIDIA's third-generation Streaming Multiprocessor (SM) introduces several architectural innovations that make it their most programmable and efficient yet. Inside each of the 16 streaming multiprocessors are 32 CUDA processors (a 4x increase over GT200's SM design). Fermi's GF100 CUDA cores are designed for maximum performance and efficiency across all shader workloads. By employing a scalar architecture, full performance is achieved irrespective of input vector size; operations on the z-buffer (1D) or texture access (2D) attain full utilization of the GPU. In GF100, the newly designed integer ALU supports full 32-bit precision for all instructions, consistent with standard programming language requirements. The integer ALU is also optimized to efficiently support 64-bit and extended-precision operations. Various instructions are supported, including Boolean, shift, move, compare, convert, bit-field extract, bit-field insert, bit reverse, and population count.

Each SM has 16 load/store units, allowing source and destination addresses to be calculated for sixteen threads per clock; supporting units load and store the data at each address to cache or DRAM. Four separate Special Function Units (SFUs) execute transcendental instructions such as sine, cosine, reciprocal, and square root, and graphics interpolation instructions are also performed on the SFU. Each SFU executes one instruction per thread, per clock; a warp (32 threads) executes over eight clocks. The SFU pipeline is decoupled from the dispatch unit, allowing the dispatch unit to issue to other execution units while the SFU is occupied. Complex procedural shaders especially benefit from dedicated hardware for special functions.
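Before the per-SM feature summary that follows, here is a minimal CUDA sketch of how those execution resources surface to a programmer: block sizes are chosen as multiples of the 32-thread warp, and fast transcendentals such as __sinf are serviced by the SFUs rather than the regular CUDA cores. The kernel and its wave animation are hypothetical examples, not NVIDIA code.

```cuda
#include <cuda_runtime.h>

// Threads are issued in warps of 32, so block sizes that are multiples of 32
// avoid partially filled warps; __sinf maps to the SM's Special Function Units.
__global__ void animateWave(float *height, int n, float time)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        height[i] = __sinf(0.05f * i + time);   // SFU transcendental
}

void launchWave(float *d_height, int n, float time)
{
    dim3 block(192);                             // 6 warps of 32 threads per block
    dim3 grid((n + block.x - 1) / block.x);
    animateWave<<<grid, block>>>(d_height, n, time);
}
```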
32 CUDA Cores (4x GT200)
48KB or 16KB of shared memory (3x GT200)
48KB or 16KB of dedicated L1 cache (none on GT200)
768KB of L2 cache (256KB on GT200)
ISA improvements: single/double precision with 32-bit integer operations
4 Texture units
New PolyMorph engine
The new Fermi architecture is designed to enable 16/48KB of on-die cache storage for improved performance with a minimal off-chip memory-trip penalty. While the dedicated L1 and texture L1 caches handle most operations, a 768KB L2 cache delivers greater texture coverage and improved compute performance over the previous-generation L2 cache (GT200).

NVIDIA GF100 Shader Model Diagram

The number of ROP (Render Output) units per ROP partition has been doubled to eight, for 48 in total, and fillrate is greatly improved as a result. 8x MSAA graphics performance is also improved through enhanced ROP compression, and the GPU adds a 32x CSAA antialiasing mode. The additional ROP units better balance overall GPU throughput, even for portions of the scene that cannot be compressed. GF100 implements a new 32x CSAA antialiasing mode based on eight multisamples and 24 coverage samples. CSAA has also been extended to support alpha-to-coverage on all samples, enabling smoother rendering of foliage and transparent textures. GF100 produces the highest-quality antialiasing for both polygon edges and alpha textures with minimal performance penalty. Shadow mapping performance is greatly increased with hardware-accelerated DirectX 11 four-offset Gather4.

Fermi GF100 Gaming Features

During the 2010 International Consumer Electronics Show (CES), NVIDIA hosted a day-long deep technical session that covered architecture, technology, and implementation. While the bulk of this article has been spent discussing the former, this section details the latter with some examples of Fermi in real-world gaming action. NVIDIA's Tony Tamasi (a familiar personality from the NVIDIA Editor's Day 2008 event) introduced GF100 gaming, and with the help of development staff Benchmark Reviews sat back and enjoyed the show.

NVIDIA hopes to push a state-of-the-art experience in video games with the GeForce GF100 GPU. While gamers are familiar with NVIDIA GeForce products and drivers, the company partners with developers on a routine basis to ensure that all games developed (regardless of video card support) will operate flawlessly and perform optimally with GeForce products. NVIDIA estimates that 90% of its resources are dedicated towards video game engineering, although exact figures are considered an industry secret. Tony Tamasi mentioned that NVIDIA has often worked with game developers up to 2-4 years before production, helping introduce new tools into their SDKs to enable modern engineering paired with fresh game titles.

There were numerous in-house test results quoted in the NVIDIA press deck during the seminar, but only one DirectX-11 video game actually ran in real time before our eyes. Dark Void is a video game that NVIDIA first began working on with Airtight Games around September 2008, assisting in adding GPU PhysX support to enhance jet-pack turbulence and particle weapon effects. Of course, a game that allows the player to blast into the atmosphere isn't complete without GeForce 3D Vision support. Even some older game titles benefit from the NVIDIA GF100 GPU, beyond just an increase in frame rates. For example, Far Cry 2 will receive 32x CSAA functionality native to the game, and driver updates could further add new features into existing co-developed video games.

Supersonic Sled

At the end of our deep tech-dive day, NVIDIA's Mark Daly held our attention with a demonstration of the Supersonic Sled mini-game. The fundamental purpose of the soon-to-be-public demo is to illustrate new GF100 feature sets such as thrust turbulence and particle behavior.
The premise is rather simple: control your supersonic rocket sled down the railway while keeping the pilot (and thrusters) intact. It's harder than it sounds, especially since you can pelt the sled with chickens and drop boxes at will. I suggested that NVIDIA use corn on the cob instead of chickens, and include popcorn particles if they near the thrust. They said it was a great idea that could be easily implemented. We'll see.

NVIDIA 3D-Vision Surround

The original release of 3D-Vision was already impressive, and it earned our Benchmark Reviews Editor's Choice Award for 2009. So how can 3D-Vision be outdone? The answer: allow it to work with up to three monitors. The newly dubbed NVIDIA 3D-Vision Surround (stereo) requires three 3D-Vision-capable LCD, projector, or DLP devices and offers bezel correction support. Alternatively, NVIDIA Surround (non-stereo) supports mixed displays with common resolution/timing. 3D-Vision Surround is supported on GF100 as well as GTX 200 GPUs, but requires at least two video cards to enable (due to device connection restrictions). NVIDIA GF100-based video cards share the same two-device pipeline limit for video output that previous GeForce GT200-based products had. When I asked Jason Paul why NVIDIA has not overcome this hurdle with Fermi despite three video outputs (2x DVI + HDMI) being available on the video card, he stated that this has been a GeForce GPU limitation for years. Ultimately, you'll be required to use at least two NVIDIA GeForce products in SLI to output three video streams and enjoy NVIDIA 3D-Vision Surround.

Final Thoughts

NVIDIA has a tough decision on its hands, and Fermi is only the start of things to come. If GPGPU is to continue as it has in the GF100, then NVIDIA will one day need to hedge its bets and develop separate processors: one for retail consumers and another for high-performance computing (HPC). Making things even more challenging for chip designers is the push towards APUs, or Accelerated Processing Units. You'll hear this term used more and more once (if and when) products like Intel's Larrabee or AMD's Fusion finally make it to market.

At the beginning of this article I compared the way GPUs have overtaken CPUs in terms of technical development progress with each new architecture. It's easy to make an assertion like this when neither the CPU nor the GPU is ever going to be directly comparable to the other. CPUs are strong at handling a wide variety of operations or controlling applications that require limited threads, while GPUs are still best suited for multi-threaded applications with a limited range of operations. It's true that one day the APU will have both sides of the house under control, but until then NVIDIA is setting the pace for innovation while CPUs merely get bigger and faster.

As for NVIDIA's Fermi architecture and its real-world performance results in GeForce GF100-based video cards: wait and see. We could always take NVIDIA at their word that GF100 is a remarkable improvement, but Benchmark Reviews really enjoys living up to its name. Samples are expected sometime before March 2010, and you'll see our results posted in our Featured Reviews: Video Cards section as soon as they're available. If the in-house test results are correct, NVIDIA will have easily made up for the ground they've lost to ATI's Radeon HD 5000 series. Questions? Comments? Benchmark Reviews really wants your feedback. We invite you to leave your remarks in our Discussion Forum.
Thank you to NVIDIA for supplying the technical white paper and special deep-tech seminar that made this article possible. Our visitors can download the GF100 Graphics Architecture White Paper and GF100 Compute White Paper directly from NVIDIA.