It has been over two years since NVIDIA released its Pascal GPU architecture into the market and it has, for all intents and purposes, led the way in graphics performance in 3D games during that time. When Pascal-based video cards (10 Series) hit the store shelves, they marked a notably different architecture built on 14/16 nm FinFet and a significant reduction in power consumption for the same or greater performance. Today marks the release of NVIDIA’s latest video cards and architecture in the Turing-based RTX GPUs. The RTX family will initially consist of the RTX 2070, RTX 2080, and RTX 2080 Ti. In a departure from past releases, consumers will be able to get their hands on the RTX 2080 and the current flagship RTX 2080 Ti at launch, with the RTX 2070 available a short time later.
The Turing architecture, which NVIDIA says is the biggest leap forward in a decade, is fabricated on TSMC’s 12 nm FFN (FinFet NVIDIA) manufacturing process and includes several new features. The first is a new processor architecture: the Turing SM (Streaming Multiprocessor) is said to deliver a dramatic boost in shading efficiency, achieving a 50% improvement in performance per CUDA Core compared to Pascal.
In addition to the new SMs, NVIDIA has integrated Tensor Cores, which are specialized execution units designed for performing tensor/matrix operations – the core compute functions used in Deep Learning. These Tensor Cores power a new form of Super Sampling dubbed Deep Learning Super Sampling (DLSS). NVIDIA says DLSS “leverages a deep neural network to extract multidimensional features of the rendered scene and intelligently combine details from multiple frames to construct a high-quality final image”. DLSS uses fewer input samples than traditional Anti-Aliasing techniques (such as TAA) and avoids the bottlenecks they face, allowing for improved performance over the traditional methods.
Real-Time Ray Tracing acceleration has also found its way into the Turing chip and enables a single GPU to render games and complex professional models that have physically accurate shadows, reflections, and refractions, yielding a more photorealistic image on the screen. This ability is provided by Turing’s new Ray Tracing (RT) Cores, which accelerate ray tracing and can be leveraged by systems and interfaces outside of NVIDIA’s own ray tracing technology. APIs such as Microsoft DXR, NVIDIA OptiX, and Vulkan use the RT Cores to deliver a real-time ray tracing experience. Entire scenes are not rendered this way: close to all of what is displayed on the screen will still be generated by rasterization. Developers will implement ray tracing on a small part of the image with a few rays, and the voids are filled in with NVIDIA’s denoiser. We’ve seen a couple of impressive implementations of this already, in Battlefield V for example, performance hit be damned.
Last, but not least, Turing will be the first GPU architecture to support the new GDDR6 memory. GDDR6 is the successor to last generation’s GDDR5X and is said to be 20% more power efficient while offering 50% higher effective bandwidth. Along with GDDR6 come new methods of memory compression, also designed to increase bandwidth and performance. NVIDIA continues to shy away from the more expensive (in the consumer space) alternative in HBM2, which seems like a good move considering both AMD and NVIDIA will support GDDR6.
What does this all mean? It means that consumers will get a notably faster and more efficient video card with new technologies designed to improve image quality, through new methods of AA and hardware-accelerated ray tracing, as well as overall performance compared to the last generation. Being on the bleeding edge and leading a technology push does come with drawbacks, namely in pricing. The new cards carry a significant price increase over the previous generation.
Below we will go into more detail about the new implementations as well as test performance in our updated GPU testing suite. Read on to see how the card performed and see if the RTX video cards should be next on your shopping list of upgrades.
Turing in Detail
Turing TU102, TU104, and TU106
Turing GPUs will come in three pieces of silicon: the full TU102, the highest performing GPU of the Turing line, as well as the TU104 and TU106 GPUs. The latter two are scaled-down versions and find their way into other products down the stack such as the RTX 2080 and RTX 2070.
TU102 contains six Graphics Processing Clusters (GPCs), 36 Texture Processing Clusters (TPCs), and 72 Streaming Multiprocessors (SMs). Each of the GPCs includes a dedicated raster engine and six TPCs, with each TPC including two SMs. Inside each SM are 64 CUDA Cores, eight Tensor Cores, a 256 KB register file, four texture units, and 96 KB of L1/shared memory able to be configured for various capacities depending on the graphics or compute workloads. A full implementation of the TU102 GPU includes 4,608 CUDA Cores, 72 RT Cores, 576 Tensor Cores, 288 Texture units, and 12 32-bit GDDR6 memory controllers (384-bits total). Attached to each memory controller are eight ROPs and 512 KB of L2 cache. The TU102 GPU contains a total of 96 ROPs and 6144 KB of L2 cache. The new GPU uses the NVLink in its full capacity of two x8 links providing 50 GB/sec in each direction (100 GB/sec total).
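The TU102 totals above follow directly from the per-SM and per-memory-controller figures. A minimal Python sketch of that arithmetic is shown here; the `DieConfig` class and `totals` function are names made up for this illustration and are not part of any NVIDIA tool.

```python
from dataclasses import dataclass

# Per-SM and per-memory-controller figures quoted in the text.
CUDA_PER_SM = 64
TENSOR_PER_SM = 8
RT_PER_SM = 1
TEXTURE_UNITS_PER_SM = 4
ROPS_PER_CONTROLLER = 8
L2_KB_PER_CONTROLLER = 512
BITS_PER_CONTROLLER = 32


@dataclass(frozen=True)
class DieConfig:
    """Hypothetical helper describing how a die is organized."""
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int
    memory_controllers: int


def totals(cfg: DieConfig) -> dict:
    """Multiply the hierarchy out into chip-wide unit counts."""
    sms = cfg.gpcs * cfg.tpcs_per_gpc * cfg.sms_per_tpc
    return {
        "sms": sms,
        "cuda_cores": sms * CUDA_PER_SM,
        "tensor_cores": sms * TENSOR_PER_SM,
        "rt_cores": sms * RT_PER_SM,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
        "rops": cfg.memory_controllers * ROPS_PER_CONTROLLER,
        "l2_kb": cfg.memory_controllers * L2_KB_PER_CONTROLLER,
        "bus_width_bits": cfg.memory_controllers * BITS_PER_CONTROLLER,
    }


# Full TU102: six GPCs, six TPCs per GPC, two SMs per TPC, twelve controllers.
tu102 = totals(DieConfig(gpcs=6, tpcs_per_gpc=6, sms_per_tpc=2, memory_controllers=12))
for name, value in tu102.items():
    print(f"{name}: {value}")
```

Shipping cards may disable some SMs or memory controllers, so these are figures for the full die rather than for any particular product.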
The TU104 chip has six GPCs and 48 SMs. Like the TU102, each GPC includes a dedicated raster engine, but with four TPCs per GPC rather than six, and each TPC contains two SMs. The full TU104 sports 3072 CUDA cores, 48 RT cores, and 384 Tensor cores; the RTX 2080 ships with 46 of the 48 SMs enabled, for 2944 CUDA cores, 46 RT cores, and 368 Tensor cores. It also supports NVLink, but unlike its big brother, only one x8 NVLink is included, providing 25 GB/sec of bandwidth in each direction (50 GB/sec total). This GPU will be found in both GeForce and Quadro products (RTX 2080 and Quadro RTX 5000).
Last is the TU106 GPU, which will be used in the RTX 2070 shipping in October 2018. This die contains most of the new features found in the Turing architecture, including the RT cores and Turing Tensor Cores. The RTX 2070 is based on the full implementation of this GPU. It contains three GPCs, 36 SMs, and eight 32-bit memory controllers (256-bit total). Within each GPC there is a raster unit and six TPCs. The full TU106 GPU has 2304 CUDA cores, 288 Tensor cores, and 36 RT cores. NVLink is not found on this GPU, so it does not support SLI. NVIDIA seems to be holding those features for the high-end cards, thwarting some users’ dreams of flagship-beating performance at less cost.
Streaming Multiprocessor (SM) Architecture
The Turing architecture incorporates a lot of the features introduced in the Volta GV100 SM architecture, including independent thread scheduling and concurrent execution of FP32 and INT32 operations. Turing implements a major change to the core execution datapaths. Typical modern shader workloads mix FP arithmetic instructions with simpler instructions such as integer adds for addressing and fetching data, and floating-point compares. In previous shader designs, the floating-point math datapath sits idle whenever one of these non-FP-math instructions runs. Turing adds a second parallel execution unit next to every CUDA core so that it is able to execute these instructions in parallel with floating-point math.
Traditional graphics workloads partition the 96 KB L1/shared memory as 64 KB of dedicated graphics shader RAM and 32 KB for texture cache and register file spill area. Compute workloads, on the other hand, can divide the 96 KB into 32 KB shared memory and 64 KB L1 cache, or 64 KB shared memory and 32 KB L1 cache.
Turing’s SM also introduces a new unified architecture for shared memory, L1, and texture caching. The unified design allows the L1 cache to leverage resources, increase its hit bandwidth by 2x per TPC compared to Pascal, and allows it to grow larger when shared memory allocations are not using all of the capacity.
Overall, NVIDIA states the changes in SM enable Turing to achieve 50% improvement in delivered performance per CUDA core.
Tensor cores were first introduced in the Volta GV100 GPU, but Turing includes an enhanced version of these cores. The Turing Tensor Core design adds INT8 and INT4 precision modes for inferencing workloads that can tolerate quantization. FP16 is also supported fully for workloads requiring higher precision.
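To give a feel for what "tolerating quantization" means, here is a small pure-Python sketch of symmetric INT8 quantization applied to a dot product, the building block of the matrix operations Tensor Cores run. This is only an illustration of the arithmetic, not how Tensor Cores are actually programmed; the function names and sample values are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(a, b):
    """Dot product computed on quantized integers, then rescaled to float."""
    qa, scale_a = quantize_int8(a)
    qb, scale_b = quantize_int8(b)
    accumulator = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate
    return accumulator * scale_a * scale_b


# Made-up sample data standing in for a row of weights and a column of activations.
weights = [0.12, -0.53, 0.98, 0.07]
activations = [1.5, -0.25, 0.75, 2.0]

exact = sum(w * x for w, x in zip(weights, activations))
approx = int8_dot(weights, activations)
print(f"FP result:   {exact:.4f}")
print(f"INT8 result: {approx:.4f}")
print(f"abs error:   {abs(exact - approx):.4f}")
```

The integer result lands close to the floating-point one but not exactly on it. That small error is the trade-off an inferencing workload accepts in exchange for cheaper, faster low-precision math.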
The Tensor cores are what makes it possible to bring real-time deep learning to gaming applications. The TCs accelerate the AI-based features of NVIDIA NGX Neural Services that can enhance graphics, rendering, and other types of client-side applications. Some examples of these features include Deep Learning Super Sampling (DLSS), AI InPainting, AI Super Rez, and AI Slow-Mo.
The Turing Tensor cores are said to provide a significant speedup to matrix operations and are used for inference operations and deep learning training in addition to new neural graphics functions.
For gamers, DLSS seems to be the most interesting feature, with the ability to maintain the same image quality yet bring performance up due to the way DLSS is processed through the Tensor Cores. The one caveat is that DLSS is only supported by a few games, but it is likely more will incorporate the technology as time goes on. NVIDIA mentions a total of 25 games that are currently in development.
Ray Tracing Cores
At the center of Turing’s hardware-based ray tracing acceleration is a Ray Tracing (RT) Core, which is included within each Streaming Multiprocessor. The RT cores accelerate real-time ray tracing, with NVIDIA implementing a hybrid approach between rasterization and ray tracing. Using this method, rasterization is used where it is most effective, while ray tracing is used where it provides the most eye-candy/visual benefit compared to rasterization, for example in reflections, refractions, and shadows.
The RT Cores are used to speed up two important operations in ray tracing: Bounding Volume Hierarchy (BVH) traversal and ray-to-triangle intersection testing. BVH traversal will now run on the RT cores instead of the CUDA cores, which are then freed for other functions.
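For readers unfamiliar with those two operations, the sketch below shows them in plain Python: a ray/box "slab" test used to skip whole branches of a BVH, and the Möller–Trumbore ray/triangle test used at the leaves. It is a software illustration of the general technique only. The dictionary-based node layout and all function names are assumptions made for this example, and nothing here reflects how the RT Core hardware is actually built.

```python
import math

EPS = 1e-9


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if the ray passes through the axis-aligned box."""
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < EPS:            # ray parallel to this pair of planes
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def ray_hits_triangle(origin, direction, v0, v1, v2):
    """Moller-Trumbore test: distance along the ray to the hit, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:              # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None


def traverse(node, origin, direction):
    """Walk the BVH, skipping any subtree whose bounding box the ray misses."""
    if not ray_hits_box(origin, direction, node["min"], node["max"]):
        return None
    hits = [ray_hits_triangle(origin, direction, *tri)
            for tri in node.get("triangles", [])]
    hits += [traverse(child, origin, direction)
             for child in node.get("children", [])]
    hits = [t for t in hits if t is not None]
    return min(hits) if hits else None


# Tiny hand-built scene: two leaves, one triangle each.
near_leaf = {"min": (-1, -1, 5), "max": (1, 1, 5),
             "triangles": [((-1, -1, 5), (1, -1, 5), (0, 1, 5))]}
far_leaf = {"min": (10, -1, 5), "max": (12, 1, 5),
            "triangles": [((10, -1, 5), (12, -1, 5), (11, 1, 5))]}
root = {"min": (-1, -1, 5), "max": (12, 1, 5), "children": [near_leaf, far_leaf]}

print(traverse(root, (0, 0, 0), (0, 0, 1)))   # ray straight down +Z
print(traverse(root, (0, 0, 0), (0, 1, 0)))   # ray along +Y misses everything
```

The point of the hierarchy is that the second leaf is rejected with a single box test, so its triangle is never examined. On Turing, this walk and the triangle test are the work that moves from the CUDA cores to the RT cores.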
Software plays a critical role here, with RTX being supported by NVIDIA OptiX, Microsoft DirectX for ray tracing (DXR), and Vulkan RT. The APIs provide a framework for ray traced applications with a process similar to DirectX game programming. In essence, the game engine supplies object data, generates ray directions and starting points, and then receives information back on the ray itself. The rest of the process, such as GPU scheduling, object traversal, memory management, and hardware optimizations, is handled by the library. This is said to significantly reduce the amount of work on the developer’s side.
NVIDIA also provided a list of games that will feature real-time ray tracing with more on the way:
- Assetto Corsa Competizione
- Atomic Heart
- Battlefield V
- Mechwarrior 5: Mercenaries
- Metro Exodus
- Shadow of the Tomb Raider
- Justice (Ni Shui Han)
- Project DH
Time limitations prevent going into everything in more detail so below I have listed several more features of the Turing architecture from the NVIDIA whitepapers:
Mesh Shading – Mesh shading advances NVIDIA’s geometry processing architecture by offering a new shader model for the vertex, tessellation, and geometry shading stages of the graphics pipeline, supporting more flexible and efficient approaches for computation of geometry. This more flexible model makes it possible, for example, to support an order of magnitude more objects per scene, by moving the key performance bottleneck of object list processing off of the CPU and into highly parallel GPU mesh shading programs. Mesh shading also enables new algorithms for advanced geometric synthesis and object LOD management.
Variable Rate Shading (VRS) – VRS allows developers to control shading rate dynamically, shading as little as once per sixteen pixels or as often as eight times per pixel. The application specifies shading rate using a combination of a shading-rate surface and a per-primitive (triangle) value. VRS is a very powerful tool that allows developers to shade more efficiently, reducing work in regions of the screen where full resolution shading would not give any visible image quality benefit, and therefore improving frame rate. Several classes of VRS-based algorithms have already been identified, which can vary shading work based on content level of detail (Content Adaptive Shading), rate of content motion (Motion Adaptive Shading), and for VR applications, lens resolution and eye position (Foveated Rendering).
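To put rough numbers on the VRS range described above, the following Python sketch counts pixel-shader invocations for a 4K frame at the coarsest rate (one shade per sixteen pixels), the default rate (one per pixel), and the finest rate (eight per pixel). The mixed-rate frame at the end is a made-up scenario for illustration, not a measurement from any game.

```python
from fractions import Fraction

WIDTH, HEIGHT = 3840, 2160
PIXELS = WIDTH * HEIGHT


def shader_invocations(regions):
    """Sum invocations over (share_of_screen, shades_per_pixel) regions."""
    return int(sum(PIXELS * share * rate for share, rate in regions))


coarsest = shader_invocations([(Fraction(1), Fraction(1, 16))])
per_pixel = shader_invocations([(Fraction(1), Fraction(1))])
finest = shader_invocations([(Fraction(1), Fraction(8))])

# Hypothetical frame: 60% of the screen shaded per pixel,
# 40% (for example fast-moving or low-detail areas) shaded once per 2x2 block.
mixed = shader_invocations([
    (Fraction(6, 10), Fraction(1)),
    (Fraction(4, 10), Fraction(1, 4)),
])
saving = 1 - Fraction(mixed, per_pixel)

print(f"1 per 16 pixels: {coarsest:,}")
print(f"1 per pixel:     {per_pixel:,}")
print(f"8 per pixel:     {finest:,}")
print(f"mixed frame:     {mixed:,} ({float(saving):.0%} fewer than per-pixel)")
```

Shading work is only one part of frame time, so a reduction in invocations will not translate one-to-one into frame rate.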
Texture-Space Shading – With texture-space shading, objects are shaded in a private coordinate space (a texture space) that is saved to memory, and pixel shaders sample from that space rather than evaluating results directly. With the ability to cache shading results in memory and reuse/resample them, developers can eliminate duplicate shading work or use different sampling approaches that improve quality.
Multi-View Rendering (MVR) – MVR powerfully extends Pascal’s Single Pass Stereo (SPS). While SPS allowed rendering of two views that were common except for an X offset, MVR allows rendering of multiple views in a single pass even if the views are based on totally different origin positions or view directions. Access is via a simple programming model in which the compiler automatically factors out view independent code, while identifying view-dependent attributes for optimal execution.
Deep Learning Features for Graphics – NVIDIA NGX™ is the new deep learning-based neural graphics framework of NVIDIA RTX Technology. NVIDIA NGX utilizes deep neural networks (DNNs) and a set of “Neural Services” to perform AI-based functions that accelerate and enhance graphics, rendering, and other client-side applications. NGX employs the Turing Tensor Cores for deep learning-based operations and accelerates delivery of NVIDIA deep learning research directly to the end-user. Features include ultra-high quality NGX DLSS (Deep Learning Super-Sampling), AI InPainting content-aware image replacement, AI Slow-Mo very high-quality and smooth slow motion, and AI Super Rez smart resolution resizing.
Deep Learning Features for Inference – Turing GPUs deliver exceptional inference performance. The Turing Tensor Cores, along with continual improvements in TensorRT (NVIDIA’s run-time inferencing framework), CUDA, and CuDNN libraries, enable Turing GPUs to deliver outstanding performance for inferencing applications. Turing Tensor Cores also add support for fast INT8 matrix operations to significantly accelerate inference throughput with minimal loss in accuracy. New low-precision INT4 matrix operations are now possible with Turing Tensor Cores and will enable research and development into sub 8-bit neural networks.
Second-Generation NVIDIA NVLink – Turing TU102 and TU104 GPUs incorporate NVIDIA’s NVLink™ high-speed interconnect to provide dependable, high bandwidth and low latency connectivity between pairs of Turing GPUs. With up to 100GB/sec of bi-directional bandwidth, NVLINK makes it possible for many customized workloads to efficiently split across two GPUs and share memory capacity. For gaming workloads, NVLINK’s increased bandwidth and dedicated inter-GPU channel enables new possibilities for SLI, such as new modes or higher resolution display configurations. For large memory workloads, including professional ray tracing applications, scene data can be split across the frame buffer of both GPUs, offering up to 96 GB of shared frame buffer memory (two 48 GB Quadro RTX 8000 GPUs), and memory requests are automatically routed by hardware to the correct GPU based on the location of the memory allocation.
GDDR6 High-Performance Memory Subsystem – Turing is the first GPU architecture to support GDDR6 memory. GDDR6 is the next big advance in high-bandwidth GDDR DRAM memory design. GDDR6 memory interface circuits in Turing GPUs have been completely redesigned for speed, power efficiency and noise reduction, achieving 14 Gbps transfer rates at 20% improved power efficiency compared to GDDR5X memory used in Pascal GPUs.
USB-C and VirtualLink – Turing GPUs include hardware support for USB Type-C™ and VirtualLink™. VirtualLink is a new open industry standard being developed to meet the power, display, and bandwidth demands of next-generation VR headsets through a single USB-C connector. In addition to easing the setup hassles present in today’s VR headsets, VirtualLink will bring VR to more devices.
Video and Display Engine – Consumer demand for higher resolution displays continues to increase with every passing year. For example, 8K resolution (7680 x 4320) requires four times more pixels than 4K (3840 x 2160). Gamers and hardware enthusiasts also desire displays with higher refresh rates in addition to higher resolution to experience the smoothest possible image. Turing GPUs include an all-new display engine designed for the new wave of displays, supporting higher resolutions, faster refresh rates, and HDR. Turing supports DisplayPort 1.4a allowing 8K resolution at 60 Hz and includes VESA’s Display Stream Compression (DSC) 1.2 technology, providing higher compression that is visually lossless.
Turing GPUs are able to drive two 8K displays at 60 Hz with one cable for each display. 8K Resolution can also be sent over USB-C. The new display engine supports HDR processing natively inside the display pipeline. Turing GPUs include an enhanced NVENC encoder unit that adds support for H.265 (HEVC) 8K encode at 30 FPS. NVIDIA states the new encoder provides up to 25% bitrate savings for HEVC and up to 15% savings for H.264.
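A quick back-of-the-envelope calculation shows why DSC matters for 8K at 60 Hz over a single cable. The Python sketch below compares the raw pixel data rate with the payload rate of a four-lane DisplayPort HBR3 link. The HBR3 figures (8.1 Gbit/s per lane, 8b/10b encoding) come from the DisplayPort specification rather than from this article, and the calculation ignores blanking intervals, so treat it as a rough lower bound on what the link must carry.

```python
def pixel_count(width, height):
    return width * height


def raw_video_gbps(width, height, refresh_hz, bits_per_pixel):
    """Uncompressed active-pixel data rate in Gbit/s (blanking ignored)."""
    return pixel_count(width, height) * refresh_hz * bits_per_pixel / 1e9


# Assumption from the DisplayPort spec: HBR3 runs 4 lanes at 8.1 Gbit/s each,
# and 8b/10b line coding leaves 80% of that for payload.
HBR3_PAYLOAD_GBPS = 4 * 8.1 * 0.8

ratio_8k_to_4k = pixel_count(7680, 4320) / pixel_count(3840, 2160)
raw_8k60 = raw_video_gbps(7680, 4320, 60, 24)      # 8 bits per color channel
raw_8k60_hdr = raw_video_gbps(7680, 4320, 60, 30)  # 10 bits per color channel

print(f"8K has {ratio_8k_to_4k:.0f}x the pixels of 4K")
print(f"8K60 at 24 bpp: {raw_8k60:.2f} Gbit/s")
print(f"8K60 at 30 bpp: {raw_8k60_hdr:.2f} Gbit/s")
print(f"HBR3 payload:   {HBR3_PAYLOAD_GBPS:.2f} Gbit/s")
print(f"minimum compression needed at 24 bpp: {raw_8k60 / HBR3_PAYLOAD_GBPS:.2f}:1")
```

Since the uncompressed stream is well above what the link can carry, some form of compression such as DSC is required to fit 8K at 60 Hz on one cable.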
The table below shows the specifications for new Turing-based GPUs as well as the previous generation, GTX 1080 Ti for comparison.
|NVIDIA Turing GTX 1080 Ti / RTX 2070 / RTX 2080 / 2080 Ti Specifications|
|GPU Model||GTX 1080 Ti||RTX 2080 Ti||RTX 2080||RTX 2070|
|GPU Base Clock (MHz), Reference / Founders Ed.||1480 / 1480||1350 / 1350||1515||1410|
|GPU Boost Clock (MHz), Reference / Founders Ed.||1582 / 1582||1545 / 1635||1800||1620 / 1710|
|Frame Buffer Memory Size and Type||11 GB GDDR5X||11 GB GDDR6||8 GB GDDR6||8 GB GDDR6|
|Memory Clock (Data Rate)||11 Gbps||14 Gbps||14 Gbps||14 Gbps|
|Memory Bandwidth (GB/sec)||484||616||448||448|
|Texture Fill Rate (Gigatexels/sec)||354.4 / 354.4||420.2 / 444.7||314.6 / 331.2||233.3 / 246.2|
|GigaRays/sec||N/A||10 GR/s||8 GR/s||6 GR/s|
|L2 Cache Size||3 MB||6 MB||4 MB||4 MB|
|TDP, Reference / Founders Ed.||250 / 250W||250 / 260W||215 / 225W||175 / 185W|
|Transistor Count (Billions)||12||18.6||13.6||10.6|
|Die Size (mm²)||471||754||545||445|
|Manufacturing Process||16 nm FinFet||12 nm FinFet||12 nm FinFet||12 nm FinFet|
|Price – MSRP (Reference / Founders)||$699||$999 / $1199||$699 / $799|
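The memory bandwidth row in the table follows from the bus width and the per-pin data rate. The Python sketch below shows that relationship and also runs it backwards to recover the bus width each card's listed bandwidth implies. The per-card bus widths are derived here from the table's numbers rather than stated in the table, so treat them as a back-calculation.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth: pins x Gbit/s per pin, divided by 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


def implied_bus_width_bits(bandwidth, data_rate_gbps):
    """Invert the formula to find the bus width a listed bandwidth implies."""
    return round(bandwidth * 8 / data_rate_gbps)


# (memory bandwidth in GB/sec, data rate in Gbps) as listed in the table.
cards = {
    "GTX 1080 Ti": (484, 11),
    "RTX 2080 Ti": (616, 14),
    "RTX 2080": (448, 14),
    "RTX 2070": (448, 14),
}

for name, (listed, rate) in cards.items():
    width = implied_bus_width_bits(listed, rate)
    print(f"{name}: {width}-bit bus -> {bandwidth_gb_s(width, rate):.0f} GB/sec")

# A full TU102 with all twelve 32-bit controllers (384-bit) at 14 Gbps.
full_tu102 = bandwidth_gb_s(384, 14)
print(f"Full TU102: {full_tu102:.0f} GB/sec")
```

The RTX 2080 Ti figure works out to a 352-bit bus, which corresponds to eleven of the TU102's twelve 32-bit memory controllers being active, while the RTX 2070's 256-bit result matches the eight controllers described for TU106.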
As always, we dial up GPUz to confirm some of the specifications and clock speeds. Everything matches up, so all is okay here. I reached out to GPUz and asked if they would be adding fields for the new NVIDIA cores, and was told that until they become more popular, it’s doubtful, though they may appear on the advanced tab.
|GPUz – RTX 2080||GPUz – RTX 2080 Ti|
Retail Packaging and Accessories
The retail packaging for the Founder’s Edition cards is newly designed and pretty unique. The packing is rectangular with a slide off top as opposed to the ‘slide out box-in-box’ we are used to. The front of the box displays the GeForce RTX branding prominently in the bottom left-hand corner along with the model number below it. The box color will stick out on store shelves with its design, and a contrasting grey and NVIDIA green color scheme.
Meet the NVIDIA RTX 2080 and RTX 2080Ti
NVIDIA GeForce RTX 2080 and RTX 2080 Ti
Opening up the box reveals the new cards sitting snug in dark grey/black dense foam. Inside the top of the box is another piece of custom cut foam so the card sits with minimal movement inside the packaging. The included accessories, a DP to DVI adapter and Quick Start guide, sit in their own section of the foam, tucked away in a box with a tab that sticks up to make it easy to pull out.
|RTX 2080||RTX 2080 Ti|
A Closer Look
Our first look shows a pretty dramatic design change for the new RTX cards. Gone is the blower-style, angular-shaped shell, replaced by a more traditional dual fan setup that is also more conservatively styled. The outsides of the fans are covered with grey cladding, while between the fans is a black section with the GPU model written vertically.
In an effort to cool the GPU more efficiently, the dual axial fans are powered by a three-phase motor intended to limit vibrational noise. They are equipped with 13 fan blades which should move more air without spinning faster, keeping the noise down. The air is primarily exhausted out of the top and bottom of the card as opposed to the previous generation’s blower which blew most of the heated air out of the case.
|RTX 2080||RTX 2080 Ti|
|RTX 2080 Offset 1||RTX 2080 Offset 2|
|RTX 2080 Bottom||RTX 2080 Ti Backplate|
The backplate is not only for aesthetic purposes but is also functional using thermal pads to make contact with hot spots on the PCB and shed that heat load to the backplate. The pads cool the back of the power delivery bits as well as the GDDR6 memory.
Outputs – 3x DP, 1x HDMI, 1x USB Type-C
Both the 2080 and 2080 Ti Founders Edition boards use three DisplayPort 1.4a outputs supporting up to 8K resolution at 60 Hz from a single cable (with DSC 1.2). Users will also find a single HDMI 2.0b connector (supports HDCP 2.2) and a VirtualLink (USB Type-C) connector designed for next-generation virtual reality headsets or other high-bandwidth applications.
|RTX 2080 – 8+6-pin PCIe||RTX 2080 Ti – Dual 8-Pin PCIe|
In order to get power to the cards, the RTX 2080, with its 225 W TDP, uses one 8-pin and one 6-pin PCIe connector, able to deliver up to 300 W between the two PCIe power leads (150 W + 75 W) and the PCIe slot (75 W). This should allow ample power headroom for overclocking while still being in spec on the power connections. The 2080 Ti, on the other hand, will use two 8-pin PCIe connectors for a total of 375 W of in-spec power capability. Like the RTX 2080, this will allow for plenty of headroom without being out of specification.
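The in-spec power figures are simple sums of the per-connector limits quoted above. A short Python sketch of that arithmetic follows; the headroom line uses the Founders Edition TDP values from the specifications table, and the dictionary and function names are made up for this illustration.

```python
# In-spec limits per power source, in watts, as quoted in the text.
POWER_LIMIT_W = {"slot": 75, "6-pin": 75, "8-pin": 150}


def in_spec_power(connectors):
    """PCIe slot power plus the limit of each auxiliary connector."""
    return POWER_LIMIT_W["slot"] + sum(POWER_LIMIT_W[c] for c in connectors)


# Founders Edition connector layouts and TDPs (TDPs from the spec table).
boards = {
    "RTX 2080": {"connectors": ["8-pin", "6-pin"], "tdp_w": 225},
    "RTX 2080 Ti": {"connectors": ["8-pin", "8-pin"], "tdp_w": 260},
}

for name, board in boards.items():
    available = in_spec_power(board["connectors"])
    headroom = available - board["tdp_w"]
    print(f"{name}: {available} W available, {headroom} W above TDP")
```

How much of that margin is usable in practice also depends on the power limit the card's firmware allows, which this sketch does not model.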
Earlier in the review, we mentioned NVLink making its way into the consumer line of cards (read: not just the professional Quadro series). With it comes copious amounts of bandwidth for inter-card communication, up to 100 GB/sec on the 2080 Ti. The increased bandwidth allows for new SLI modes as well as higher resolution display configurations. Perhaps this can support an SLI resurgence, as the technology really hasn’t had significant traction in the market over the past couple of years.
|RTX 2080 PCB||RTX 2080 Ti PCB|
|RTX 2080 back of PCB||RTX 2080 Ti Back of PCB|
Taking off the redesigned vapor chamber cooler and backplate meant removing around two dozen screws in total, which then exposed the board. We can see the large cores hidden under very liberal applications of thermal paste, the RTX 2080 having the most I’ve seen. We can also spot the GDDR6 memory on both cards as well as two different power delivery configurations.
|RTX 2080 – 8+2 Phase Power (GPU + Memory)||RTX 2080 Ti – 10+3 Phase Power (GPU + Memory)|
The cards have better-designed power delivery bits with the intent to deliver more stable power to GPU and memory according to NVIDIA. Both cards utilize an iMon DrMOS power supply which features a new dynamic power management system capable of quicker current monitoring (sub-millisecond) allowing the power supply to better control power coming into the GPU. This allows the power supply to provide more power headroom for improved overclocking. The iMON MOSFETs are able to enable/disable power phases depending on load or to maintain efficient operations during low-load situations such as when idle.
|MOSFETs and Caps (OnSemiconductor)||Micron GDDR6|
The RTX 2080 Founders Edition has a 10 phase system in an 8+2 configuration, with eight phases dedicated to the GPU and the other two for the GDDR6 memory. The 2080 Ti uses a 13 phase setup with 10 phases for the GPU and three slated for the memory. The OnSemiconductor MOSFETs are Smart Power Stage (SPS) modules rated up to 70 A (@ 25 °C). They provide both power (current) and temperature monitoring. On the memory side, we can see from both the GPUz screenshot and the photo that Micron ICs are used in these samples. Since GDDR6, unlike GDDR5X, will be adopted by both AMD and NVIDIA, we should see different brands on cards down the road.
|TU104 from RTX 2080||TU102 from RTX 2080 Ti|
Last, but not least, are a couple of shots of the TU104 and TU102 dies themselves. The TU102 found in the RTX 2080 Ti measures 754 mm² while the RTX 2080’s TU104 is 545 mm². This compares to the smaller dies of the 1080 Ti at 471 mm² and the 1080 at 314 mm². Even with the process shrink, the addition of the new RT and Tensor cores makes these dies much larger than their Pascal counterparts.
Test System and Benchmark Methods
Our test system is based on the latest mainstream Intel platform, Z370, and uses the i7-8700K 6c/12t CPU. The CPU is overclocked to 4.7 GHz on all cores/threads with the cache set to 4.3 GHz. The clock speeds used provide a good base to minimize any limitations the CPU may have on our titles, particularly at the lower resolutions, and should be attainable with a good air cooler or better. DRAM is in a 2×8 GB configuration at 3200 MHz with CL15-15-15-35-2T timings, which is a middle-of-the-road option that balances performance and cost.
|Test System Components|
|Motherboard||ASRock Z370 Taichi|
|CPU||Intel i7 8700K @ 4.7 GHz / 4.3 GHz Cache|
|CPU Cooler||EVGA CLC 240|
|Memory||2×8 GB G.Skill Trident Z 3200 MHz CL15-15-15-35|
|SSD||Toshiba OCZ TR200 480 GB (OS + Applications)|
|Power Supply||EVGA 750W G3|
|Video Card||NVIDIA RTX 2080 and RTX 2080Ti (411.51 drivers)|
Thanks go out to EVGA for providing the CLC 240 CPU Cooler and 750 W G3 Power Supply to cool and power the system, G.Skill for the Trident Z DRAM, and Toshiba OCZ for the 480 GB TR200 SSD storage running the OS, benchmarks, and games. With our partners helping out, we are able to build matching test systems to mitigate any differences found between using different hardware. This allows multiple reviewers in different locations to use the same test system and compare results without additional variables.
Below are the tests we run with a brief description of the settings. We have made some significant changes since the last update adding a few new titles and dropping some of the older games. More details can be found in the GPU Testing procedure article which we have updated with our latest benchmarks.
- UL 3DMark Time Spy – Default settings
- UL 3DMark Fire Strike (Extreme) – Default settings
- Shadow of the Tomb Raider – DX12, “Highest” preset (will add RTX when it has been patched)
- The Division – DX12, Ultra preset, VSync off
- Ashes of the Singularity: Escalation – DX12, Crazy preset, GPU focused
- Far Cry 5 – Ultra defaults
- F1 2018 – Very High defaults, TAA and x16 AF, Australia track, show FPS counter
- World of Tanks: Encore Benchmark – Ultra defaults
- Final Fantasy XV Benchmark – High defaults
Our first set of benchmarks hails from Underwriters Laboratories, which acquired Futuremark back in 2014. Earlier in 2018 a rebrand occurred, and Futuremark is now UL. The benchmarks have not changed, just the name. We chose to stick with 3DMark Fire Strike (Extreme) and 3DMark Time Spy as these tests give users a good idea of performance on modern titles.
3DMark Time Spy is a DX12 benchmark designed for Windows 10 PCs. It supports new API features such as asynchronous compute, explicit multi-adapter, and multi-threading, and runs at 2560×1440 resolution. 3DMark Fire Strike (Extreme) is a DX11-based test whose graphics, UL says, are rendered with detail and complexity far beyond other DX11 benchmarks and games. This benchmark runs at 1920×1080.
Of note in this graph: the GTX 1070 Ti and GTX 1080 percent values are calculated using the RTX 2080 results as the baseline, while the GTX 1080 Ti percent values use the stock RTX 2080 Ti results as the baseline.
In the Fire Strike results, we see the Founders Edition 2080 score 12575. This result places it close to 18% faster than a highly factory-overclocked EVGA GTX 1080 FTW2. If one were to compare Founders to Founders directly, that gap would grow by several percent. Here the difference between the RTX 2080 and RTX 2080 Ti is almost 21%. The RTX 2080 Ti scores a whopping 15203 here, beating out the 1080 Ti it replaces by over 8%. That gap is smaller than many would expect, but it is due to the overclocked 1080 Ti used as well as the benchmark running at 1080p, where the CPU becomes a significant factor in allowing such powerful cards to stretch their legs.
The Time Spy results have the RTX 2080 scoring 10496 which is nearly 25% faster than the overclocked GTX 1080 used for testing. The RTX 2080 Ti scores 12456 here which is about 19% faster than the overclocked GTX 1080 Ti. I’d expect that value to be closer to 25% or more when comparing founders to founders.
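The percentages quoted for these benchmarks are plain score ratios. The Python sketch below shows the calculation using the RTX scores from the text. Because the article gives the older cards' results only as rounded percentages, the GTX scores computed at the end are back-calculated estimates, not measured values.

```python
def percent_faster(new_score, old_score):
    """How much higher new_score is than old_score, in percent."""
    return (new_score / old_score - 1) * 100


def implied_baseline(new_score, percent):
    """Score the slower card would need for new_score to be `percent` faster."""
    return new_score / (1 + percent / 100)


fire_strike = {"RTX 2080": 12575, "RTX 2080 Ti": 15203}
time_spy = {"RTX 2080": 10496, "RTX 2080 Ti": 12456}

fs_gap = percent_faster(fire_strike["RTX 2080 Ti"], fire_strike["RTX 2080"])
ts_gap = percent_faster(time_spy["RTX 2080 Ti"], time_spy["RTX 2080"])
print(f"Fire Strike: 2080 Ti is {fs_gap:.1f}% faster than the 2080")
print(f"Time Spy:    2080 Ti is {ts_gap:.1f}% faster than the 2080")

# Rough estimates only: the text says "close to 18%" and "over 8%".
est_gtx_1080 = implied_baseline(fire_strike["RTX 2080"], 18)
est_gtx_1080_ti = implied_baseline(fire_strike["RTX 2080 Ti"], 8)
print(f"Estimated Fire Strike score, GTX 1080 FTW2: about {est_gtx_1080:.0f}")
print(f"Estimated Fire Strike score, GTX 1080 Ti:   about {est_gtx_1080_ti:.0f}")
```

Note that "X% faster" and "X% slower" are not symmetric: a card that is 21% faster than another leaves the slower card about 17% behind, which is worth keeping in mind when a graph switches its baseline.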
Moving on to the gaming benchmarks, we have updated our testing suite to bring more modern titles into the mix. Gone are GTA V, Crysis 3, and Rise of the Tomb Raider, which were replaced with Shadow of the Tomb Raider, World of Tanks: enCore benchmark, F1 2018, Final Fantasy XV benchmark, and Far Cry 5. We kept The Division and Ashes of the Singularity (though we updated to AOTS: Escalation). The games should provide a good view of the overall performance as many of these are DX12 games.
Sadly, we will not be able to test DLSS as we are having issues downloading the file (we are working with NVIDIA to get it sorted). Ray Tracing will also not be tested here as none of the titles we have currently support the technology. In the future, SoTR will have it along with many other titles, so we will circle back when appropriate.
The results below are all at 1920×1080 (1080p), which will skew the perception of the cards when comparing them to the others, with the resolution being so ‘low’. The RTX 2080 and RTX 2080 Ti were not intended for 1080p gameplay but for higher resolutions, in particular 4K. Those results are found towards the end of this section.
World of Tanks: Encore / F1 2018
Our World of Tanks: enCore benchmark shows the stock RTX 2080 managing nearly 229 FPS in this title. This puts it around 40 FPS ahead of a GTX 1080 and within 3 FPS of the Asus Strix 1080 Ti OC used, an impressive showing. The RTX 2080 Ti hits over 263 FPS at stock, beating out the 1080 Ti by over 21 FPS.
F1 2018 shows the RTX 2080 reaching 157 FPS while the RTX 2080 Ti manages 194 FPS. These results are 37 and 32 FPS faster than the 1080 and 1080 Ti we tested against, respectively.
Clearly, both cards won’t have an issue on these games with the settings cranked, easily able to handle a 144 Hz monitor.
Far Cry 5 / The Division
In Far Cry 5, the RTX 2080 manages a 133 FPS average with the RTX 2080 Ti at 144 FPS. The difference here between the last gen cards and the new Turing units is fairly small due to the low resolution.
Moving on to The Division, the RTX 2080 reached about 130 FPS while the 2080 Ti hit 160 FPS. Again, the gaps get larger as the resolution increases. Our takeaway here is that the RTX 2080 hangs in there pretty well with an overclocked 1080 Ti at this resolution.
Shadow of the Tomb Raider / Final Fantasy XV
The Shadow of the Tomb Raider results have the RTX 2080 hitting 123 FPS with the 2080 Ti averaging 143 FPS. Here again, the RTX 2080 is keeping up with the 1080 Ti while the 2080 Ti runs away with the show. Even at the lower resolution, the 2080 is over 26% faster than the GTX 1080.
In the Final Fantasy XV benchmark, we were unable to test DLSS as downloading it proved to be difficult from the get-go. Regardless, testing here shows the RTX 2080 hitting 110 FPS with the 2080 Ti going over 130 FPS. As it stands, the RTX 2080 actually beat the 1080 Ti in this benchmark, with the RTX 2080 Ti again stealing the spotlight and running away with things. DLSS should only make those gaps larger, with the Turing-based cards that support it running that much faster.
Ashes of the Singularity: Escalation
Our last game test at the (now painfully low) 1080p resolution is Ashes of the Singularity: Escalation. Here the RTX 2080 hit nearly 83 FPS with the big brother 2080 Ti reaching nearly 102 FPS. Here again, the RTX 2080 is hanging with a 1080 Ti.
2560×1440 and 4K UHD Results
As we said before, NVIDIA does not market these cards for 1080p, but notably higher resolutions. To that end, we tested at 2560×1440 as well as 4K UHD resolutions.
Instead of continuing to give your scroll wheel a workout, we have taken the liberty of combining all the game results into one graph so users can see the difference between the cards. This graph includes the 1080 FTW2, the Strix GTX 1080 Ti OC, and both RTX cards to show the generational gap between them.
Our takeaway here is that the RTX 2080 clearly outpaces the overclocked 1080 and is incredibly close to the overclocked 1080 Ti, even running faster in Final Fantasy. The 2080 Ti runs notably faster than the overclocked 1080 Ti in all tests, to the tune of around 17% on average, while the RTX 2080 is on average 24% faster than the overclocked 1080 in these titles. Again, that gap will grow by a few percent when comparing like cards, given the factory overclocking on the GTX series cards.
The 4K graph drops the GTX 1080 as it is clearly not a 4K card at these settings. Here we can see the RTX 2080, at stock speeds, keeping quite close to the overclocked 1080 Ti in SoTR, Far Cry 5, F1 2018, and FF XV. Overclocking will close any gaps and surpass the 1080 Ti in most cases. The RTX 2080 Ti, on the other hand, is the story here. This card is officially a 4K UHD capable card, reaching 59+ FPS across all titles using the highest settings we currently have in the games. It beats out the overclocked GTX 1080 Ti, and overclocking and comparing like cards would yield an even greater lead.
Overall, the RTX 2080 is a fairly capable 4K card, akin to the last generation GTX 1080 Ti. Some image quality sacrifices will have to be made to reach that magic 60 FPS average. The RTX 2080 Ti, though, is right there even with our high settings. If you would like to game at 4K at 60 FPS, you need look no further than the RTX 2080 Ti. It is here, finally. And with the price of 4K UHD monitors coming down, this may become the next majority resolution in PC gaming over the next few years.
Finally, we get to my favorite part, the overclocking. NVIDIA has added a new feature, the aptly named NVIDIA Scanner, in conjunction with a redesigned EVGA Precision interface.
The NVIDIA Scanner allows users to simply press a button in software (in this case EVGA’s Precision X1) and watch the system automatically find a good overclock. This automation can save a lot of time compared to manually overclocking a card. The process is based on a two-way feedback loop using the NV Scanner API, a test algorithm, and an NVIDIA workload. This is akin to in-BIOS (low-level) stability testing. According to NVIDIA, the test process shouldn’t take longer than 20 minutes. On both the RTX 2080 and 2080 Ti, it took a bit under 10 minutes. After it completes, the slope is updated to the new curve the NV Scanner found.
NVIDIA also made the FE cards a bit more friendly towards overclocking. While there are still power limits in place, the factory BIOS limit is notably higher than we have seen in the past. Typical power limits on FE cards do not let users go above 10% (give or take); however, both GPUs allow up to a 23% increase in power. In addition, NVIDIA has incorporated more robust power delivery on the FE cards to send more and cleaner power to the GPU and memory. Phase count, while important, doesn’t mean much without solidly specified parts in the voltage controller (iMON DrMOS) and MOSFETs used. But it looks like NVIDIA stepped up its game to provide a bit more overclocking headroom.
The screenshots below are from EVGA’s Precision X1 software, which has the ‘one-button’ overclocking capability. The software runs for a while and spits out a number on the left-hand side. It is a score and doesn’t seem to map directly to peak clock speeds, but it is ‘close’. For example, on the 2080 Ti, the result was +145. After applying the result, GPU-Z reported a 1419 MHz base clock (up from 1350 MHz), an increase of less than 100 MHz. The boost clock rose from 1635 to 1704 MHz, with an actual core clock speed of 1960 MHz.
|EVGA Precision X1 – Main||Fan Control|
|Temperatures||NV Scanner Result|
The NV Scanner does a pretty good job finding the maximum clock speeds of the core. While there was some meat left on the bone for both cards, there wasn’t much. As a side note, the scanner does not adjust the memory at all, so that remains a manual process. Overall, the NV Scanner did a solid job on the core and will be a value-add for those who would like to overclock but don’t want to put in the time it takes to test clock speeds. With a quick 10 minutes for the scan and a few more playing with memory, one can find the maximum for these cards pretty quickly.
|Pushing the Limits – RTX 2080||Pushing the Limits – RTX 2080 Ti|
For manual overclocking, the results the scanner came up with were already really close to the 23% power limit increase and flirted with the 83 °C temperature limit, which called for small adjustments to the fan slope. Setting the fan speed manually will lower temperatures and raise the boost bins as well. To that end, we managed to top out the RTX 2080 at +129 on the slider, which yielded 1644 (base) / 1929 (boost) / 2055 MHz (actual) core clock speeds. After manually adjusting the memory, we were able to achieve 8226 MHz (from 7000 MHz) using a +1227 value on the slider. The 2080 Ti was able to reach 1422 (base) / 1707 (boost) / 1935 MHz (actual). We were able to set its memory speed to 8104 MHz (from 7000 MHz) using +1104 on the slider. For both cards, anything past this would cause the core speed to drop as it was hitting the power limit. Much past that and we saw artifacts on screen, so we have in fact reached the limit in our testing.
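For a sense of scale, the clocks quoted above can be turned into percentage gains. The sketch below is only arithmetic on numbers from this article. The helper name `pct_gain` is ours. The 2080 Ti stock clocks are the GPU-Z readings given in the scanner discussion, and the RTX 2080's stock core clocks are left out because this section does not state them.

```python
def pct_gain(stock_mhz, overclocked_mhz):
    """Percentage increase of an overclocked frequency over its stock value."""
    return 100.0 * (overclocked_mhz - stock_mhz) / stock_mhz


# (stock MHz, overclocked MHz) pairs taken from the text
results = {
    "RTX 2080 memory": (7000, 8226),
    "RTX 2080 Ti memory": (7000, 8104),
    "RTX 2080 Ti base clock": (1350, 1422),
    "RTX 2080 Ti boost clock": (1635, 1707),
}

for label, (stock, overclocked) in results.items():
    print(f"{label}: {stock} -> {overclocked} MHz (+{pct_gain(stock, overclocked):.1f}%)")
```

The memory gains land in the 15-18% range while the rated core clocks on the 2080 Ti move by only around 5%, which fits the observation that the power limit, not the memory, is what stops the core.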
Temperatures and Power Use
The RTX 2080 and 2080 Ti have made some significant changes to their cooling system including the move from a single blower fan to using two larger axial fans instead. Along with the fan change, the heatsink uses a large ‘dual capacity’ vapor chamber with direct contact to a base plate in which there are multiple heat pipes embedded.
The result, according to NVIDIA, is a cooler and quieter running card. The fans sit at 40% at idle and are nearly inaudible around other system fans (2). While we were unable to measure noise levels at this time, the fans ramped up a bit when gaming but were not loud at all. Users will likely start to notice the noise once the fans get past the 70% mark or so. Without a doubt, it is an improvement over the single blower fan setup of past generations.
The RTX 2080 is specified as a 225 W card in the Founders Edition form we have in our hands. Coupled with our test system (i7-8700K running at 4.7 GHz and 1.2 V), the system peaked at 384 W at the wall while testing our two games (Shadow of the Tomb Raider and F1 2018). After running the NV Scanner and overclocking, power use jumped to 400 W in SoTR and 416 W in F1 2018. With that, a quality 550-650 W power supply will happily run the new RTX 2080 without issue.
The RTX 2080 Ti used a little more power than the RTX 2080 in our testing, with a peak reading of 392 W with the GPU at stock clocks. That is only around eight watts more than the RTX 2080, a card specified to use around 25 W less. I had expected the spread to be a bit more. When overclocking the 2080 Ti through NV Scanner, our peak power jumped up to 419 W at the wall.
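The power supply recommendation can be sanity-checked with some quick arithmetic. Wall readings include the power supply's own conversion losses, so the load on its DC side is somewhat lower than what the meter shows. The sketch below assumes a 90% efficient unit. That figure is our assumption for illustration, not something we measured, and the function names are ours.

```python
ASSUMED_EFFICIENCY = 0.90  # assumption: roughly a Gold-class unit at this load, not measured


def dc_load(wall_watts, efficiency=ASSUMED_EFFICIENCY):
    """Estimate the DC-side load from a wall reading and an assumed PSU efficiency."""
    return wall_watts * efficiency


def utilisation(wall_watts, psu_rating_watts, efficiency=ASSUMED_EFFICIENCY):
    """Estimated DC load as a percentage of the power supply's rated output."""
    return 100.0 * dc_load(wall_watts, efficiency) / psu_rating_watts


# Peak overclocked wall readings from our testing
peaks = {"RTX 2080 (F1 2018, overclocked)": 416, "RTX 2080 Ti (overclocked)": 419}

for label, wall in peaks.items():
    for rating in (550, 650):
        print(f"{label}: ~{dc_load(wall):.0f} W DC, "
              f"{utilisation(wall, rating):.0f}% of a {rating} W unit")
```

Under that assumption, even the overclocked peaks sit below 70% of a 550 W unit's rating, which leaves comfortable headroom for either card in a system like ours.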
The new cooler kept temperatures under control, and did so quietly, on both cards during testing. The dual axial fans move a fair amount of air, which exhausts out the top and bottom (PCIe slot) of the card. The default fan curve leans toward quiet over performance while still keeping temperatures under the default target temperature of 84 °C during our testing. I’d imagine longer gaming sessions will see that temperature go up a bit, but that would be easily mitigated by small adjustments to the fan curve. Overall, the new cooler looks better than the blower style, and its new heatsink, base plate, and vapor chamber are doing a good job of keeping the card cool in our test environment and should do the same in users’ homes as well.
So what do we think about NVIDIA’s new Turing-based cards? My friends, it is complicated and depends on how you view a couple of factors. One thing we didn’t mention until now (though it is listed in the specifications) is the price. A GeForce RTX 2080 will run users $699 for the reference card or $799 for the Founders Edition. The RTX 2080 Ti starts off at $999 for the reference model and $1199 for the Founders Edition (FE) we tested.
Unlike previous generation FEs we have tested, where power delivery and cooling were more of a ‘good enough’ situation, the Turing cards have a more robust power delivery system and cooling. In essence, compared to the reference cards, the NVIDIA Founders Edition cards have a slight clock speed increase, much like board partner cards. The difference is better thermal management due to the new heatsink and dual vapor chamber along with the twin axial fans. Whether or not that is worth the premium over a reference model is up to the user making the purchase.
But it isn’t all about overclocking. Turing brings to the table several new hardware features which, when software catches up, if software catches up, can be huge for both performance and, in the case of ray tracing, image quality. But that is the thing… will this new way of doing things take hold on the developer side and force AMD to respond with hardware support for ray tracing and by adding Tensor Cores? Only time will tell on that front. But it is clear NVIDIA is trying to lead this charge and has gone all in doing so. We will circle back once we are able to get DLSS working in FF XV and SoTR is patched to see what those results will be.
The value proposition versus the last generation of cards is clearly not the same. For example, we saw significant performance improvements from the GTX 980 Ti to the GTX 1080 Ti (to the tune of 30% or more) with the price going up a mere $50. In this case, we see overall performance gains (without RT or DLSS) sit in the 20% range. However, the price has increased $300, or just over 40%, without the performance metrics in ALL TITLES to back it up. We will see significant increases with DLSS in games that use it, but ray tracing, a technology in its infancy, will actually slow things down compared to cards which do not have the hardware to support it. The positive tradeoff here is much better shadows, refractions, and reflections (image quality in general), but it raises the question for prospective buyers of whether it is ‘worth it’. For those who love the best of the best, this is a hands-down winner. There is nothing in the market that can compete with the RTX 2080 or RTX 2080 Ti. If you want 4K UHD at ~60 FPS, the RTX 2080 Ti is your only single card choice – the first single card choice at that resolution with Ultra settings.
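The generational comparison above can be worked through as a rough performance-per-dollar calculation. The sketch below uses the reference prices implied by the text ($999 for the RTX 2080 Ti, $300 less for the GTX 1080 Ti, and a further $50 less for the GTX 980 Ti) together with the round 30% and 20% performance figures. Those are the article's ballpark numbers rather than precise measurements, and the function names are ours.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


def perf_per_dollar_change(perf_gain_pct, old_price, new_price):
    """Percentage change in performance per dollar between two generations."""
    perf_ratio = 1 + perf_gain_pct / 100.0
    price_ratio = new_price / old_price
    return 100.0 * (perf_ratio / price_ratio - 1)


# (label, old price $, new price $, approximate performance gain %)
# Prices are inferred from the "$50" and "$300" increases quoted in the text.
generations = [
    ("GTX 980 Ti -> GTX 1080 Ti", 649, 699, 30),
    ("GTX 1080 Ti -> RTX 2080 Ti", 699, 999, 20),
]

for label, old_price, new_price, gain in generations:
    print(f"{label}: price {pct_change(old_price, new_price):+.1f}%, "
          f"performance {gain:+d}%, "
          f"performance per dollar {perf_per_dollar_change(gain, old_price, new_price):+.1f}%")
```

On these rough numbers, the previous generation improved performance per dollar by about a fifth, while the RTX 2080 Ti gives up around 16% before ray tracing or DLSS enter the picture.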
So as one can see, whether the card is worth buying really depends on the angle from which you approach it. If a buyer needs to see a 1:1 increase in performance and price like in the past, this simply isn’t the card for that person. However, if one can see the forest for the trees, and in particular plays some of the popular upcoming titles which will feature ray tracing and DLSS technology, the value proposition is a different story altogether. For those on the bubble, as with AMD and Vulkan and all of its promises, a wait-and-see approach may be the better call to see how the landscape shakes out with games supporting the card’s abilities. But again, from a pure performance standpoint, there is nothing faster in the market now than the RTX 2080 Ti. And if DLSS makes it into enough titles in a short amount of time, there is no looking back.