How GPUs Work: Rendering, AI, Energy Use, and Nvidia’s Market Power

Summary

  • GPUs (Graphics Processing Units) are specialised processors designed to perform many simple calculations in parallel, making them ideal for graphics and AI workloads.
  • CPUs vs GPUs: CPUs excel at a few complex, branching tasks; GPUs excel at huge numbers of similar, repetitive tasks.
  • The rendering pipeline has four key stages: vertex processing, rasterisation, fragment/pixel shading, and writing to the frame buffer.
  • Matrix and tensor operations are the core maths behind neural networks, and GPUs (and TPUs) are optimised to perform them extremely fast.
  • A die is the actual piece of silicon containing the processor circuitry; GPUs and CPUs are dies packaged on boards or within system-on-chip designs.
  • Energy use: powerful GPUs can draw a few hundred watts each; continuous AI workloads can consume energy comparable to household appliances such as air conditioners or water heaters.
  • Nvidia and regulation: Nvidia doesn’t have a legal monopoly on GPUs but has dominant market power, especially in AI. European regulators are probing whether it uses this dominance to unfairly lock in customers.

What Is a GPU?

  • Basic idea
    • A Graphics Processing Unit (GPU) is a highly parallel number-cruncher.
    • It is designed to execute the same type of operation on many data items at once.
  • Analogy
    • Imagine checking exam papers for an entire school:
      • One teacher working alone (like a CPU) can do it, but it takes days.
      • Hundreds of teachers working in parallel (like GPU cores) can finish in an hour.
    • Each GPU core is simpler than a CPU core, but the sheer number of cores allows massive parallelism.

GPU vs CPU: Key Differences

  • CPU (Central Processing Unit)
    • Optimised for:
      • A smaller number of complex, branching tasks.
      • Quickly switching between many different types of tasks.
    • Die area is heavily used for:
      • Complex control logic.
      • Large caches (fast on-chip memory).
      • Features that improve single‑threaded performance and decision-making speed.
  • GPU
    • Optimised for:
      • Huge numbers of similar, repetitive operations.
      • Data-parallel tasks such as graphics rendering, machine learning, simulations, and image processing.
    • Die area is heavily used for:
      • Many repeated compute blocks (thousands of simpler cores).
      • Very wide data paths.
      • High-bandwidth memory controllers and on-chip networks.
    • Often has more total transistors than a CPU and can be physically very large.
  • Workload example: drawing a frame on screen
    • A 1920×1080 display has about 2.07 million pixels per frame.
    • At 60 frames per second, that’s over 120 million pixel updates per second.
    • Each pixel’s colour depends on:
      • Lighting, shadows, textures, material properties, reflections, etc.
    • This is ideal for GPUs because the same steps are applied to many pixels in parallel.
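
The pixel counts in this workload example can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative and the figures are the ones quoted above.

```python
# Pixel throughput for a 1920x1080 display refreshed 60 times per second.
WIDTH, HEIGHT = 1920, 1080
FRAMES_PER_SECOND = 60

pixels_per_frame = WIDTH * HEIGHT
pixels_per_second = pixels_per_frame * FRAMES_PER_SECOND

print(f"{pixels_per_frame:,} pixels per frame")            # 2,073,600
print(f"{pixels_per_second:,} pixel updates per second")   # 124,416,000
```

Every one of those updates runs the same colour calculation on different data, which is the pattern GPU hardware is built around.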

The Four Steps in the Rendering Pipeline

  • 1. Vertex Processing
    • Input: 3D objects broken into triangles; each triangle has vertices (corner points).
    • Work done:
      • Use matrix maths to:
        • Rotate objects.
        • Move (translate) them in 3D space.
        • Apply camera perspective (how 3D appears on a 2D screen).
    • Output: positions of triangle vertices as they should appear on the screen.
  • 2. Rasterisation
    • Input: triangle positions on the screen.
    • Work done:
      • Decide which pixels each triangle covers.
      • Convert geometric triangles into pixel-sized fragments.
    • Output: a set of fragments that are candidates to become pixels.
  • 3. Fragment / Pixel Shading
    • Input: fragments (potential pixels) with basic info.
    • Work done for each fragment:
      • Look up textures (images mapped onto surfaces).
      • Compute lighting (angle and intensity of light sources).
      • Apply shadows and reflections.
      • Combine all effects to determine the final colour and transparency.
    • Output: final colour values for each pixel position.
  • 4. Writing to the Frame Buffer
    • Input: final pixel colours.
    • Work done:
      • Write colours into a special memory area called the frame buffer.
      • The display hardware reads this buffer and shows the image on screen.
    • Output: a complete image frame ready for display.
  • Shaders
    • These steps are executed by small programs called shaders.
    • The GPU runs the same shader code across many vertices or fragments in parallel.
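
The four stages can be imitated in plain Python to show how data flows from one stage to the next. This is a toy sketch, not how a real GPU or graphics API is implemented: it runs on the CPU one pixel at a time, skips depth testing and texturing, and every function name, the 16×8 "screen" size, and the left-to-right shading rule are assumptions made for illustration.

```python
import math

WIDTH, HEIGHT = 16, 8  # tiny "screen" so the result can be printed as text

def vertex_stage(vertex, angle, offset_z):
    """Rotate about the y-axis, move away from the camera, then project to 2D."""
    x, y, z = vertex
    c, s = math.cos(angle), math.sin(angle)
    x, z = c * x + s * z, -s * x + c * z   # rotation matrix, written out
    z += offset_z                          # translation along z
    px, py = x / z, y / z                  # perspective divide
    # Map from the [-1, 1] range to pixel coordinates.
    return ((px + 1) * 0.5 * WIDTH, (1 - py) * 0.5 * HEIGHT)

def edge(a, b, p):
    """Signed test for which side of the edge a->b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterise(v0, v1, v2):
    """Yield (x, y) for every pixel whose centre lies inside the triangle."""
    for y in range(HEIGHT):
        for x in range(WIDTH):
            p = (x + 0.5, y + 0.5)
            w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                yield x, y

def fragment_stage(x, y):
    """Toy shader: brightness falls off from left to right."""
    return 1.0 - x / WIDTH

def render(triangle, angle=0.3, offset_z=3.0):
    frame_buffer = [[0.0] * WIDTH for _ in range(HEIGHT)]
    screen = [vertex_stage(v, angle, offset_z) for v in triangle]  # 1. vertex processing
    for x, y in rasterise(*screen):                                # 2. rasterisation
        colour = fragment_stage(x, y)                              # 3. fragment shading
        frame_buffer[y][x] = colour                                # 4. frame buffer write
    return frame_buffer

triangle = [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
frame = render(triangle)
for row in frame:
    print("".join("#" if value > 0 else "." for value in row))
```

The two nested loops in `rasterise` and the per-pixel call to `fragment_stage` are the parts a GPU would spread across many cores at once, since each pixel is computed independently of the others.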

Memory: VRAM, Caches, and Bandwidth

  • VRAM (Video RAM)
    • Dedicated memory on the graphics card.
    • Stores:
      • 3D models.
      • Textures.
      • Intermediate data.
      • Final frame buffer.
    • Designed for high bandwidth – moving large volumes of data per second.
  • Caches and shared memory
    • Smaller, faster memories inside the GPU.
    • Reduce the need to repeatedly fetch the same data from VRAM.
    • Help prevent memory access from becoming a performance bottleneck.
  • Why this matters
    • Many non-graphics tasks (e.g., machine learning, simulations) also involve applying the same operation to huge arrays of numbers.
    • GPUs’ combination of parallel cores and high memory bandwidth makes them ideal for such workloads.
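
A rough sense of the data volumes involved comes from sizing a single frame buffer. The sketch below assumes 4 bytes per pixel (8 bits each for red, green, blue, and transparency), which is a common layout but is not stated in the text above.

```python
# Size of one 1920x1080 frame buffer and the cost of rewriting it 60 times a second.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4        # assumption: 8-bit red, green, blue and alpha
FRAMES_PER_SECOND = 60

frame_buffer_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
write_bytes_per_second = frame_buffer_bytes * FRAMES_PER_SECOND

print(f"Frame buffer: {frame_buffer_bytes / 1e6:.2f} MB")
print(f"Written per second: {write_bytes_per_second / 1e9:.2f} GB/s")
```

This counts only the final write of each frame. Texture lookups and intermediate data add to the traffic, which is why VRAM is designed around bandwidth.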

What Is a Die and Where Is the GPU Located?

  • Die
    • A die is the flat piece of silicon that actually contains the transistors and circuits of the chip.
    • Measured in square millimetres (mm²).
    • Both CPUs and GPUs are silicon dies made with similar fabrication technologies (e.g., 3–5 nm nodes).
  • Discrete graphics card
    • The GPU die sits under a heat sink (and often a fan or liquid cooling) on a graphics card.
    • Surrounded by VRAM chips on the same printed circuit board (PCB).
    • The entire card plugs into the motherboard via a high-speed connector (e.g., PCIe).
  • Integrated graphics
    • In many laptops and smartphones, the GPU and CPU are on the same die or in the same package.
    • Such designs are called systems-on-a-chip (SoCs).
    • They integrate CPU cores, GPU cores, memory controllers, and other components in one compact unit.

Are GPUs Smaller Than CPUs?

  • Not inherently smaller
    • GPUs are not smaller because of different physics or transistor types.
    • Both use similar silicon transistor technologies.
  • Architectural differences
    • CPUs:
      • More die area for control logic and caches.
      • Optimised for general-purpose, low-latency, branching code.
    • GPUs:
      • More die area for repeated compute blocks (many cores).
      • Very wide data paths and large register files.
      • Extra hardware for high-bandwidth memory connections, display controllers, on-chip networks, etc.
  • Transistor counts and size
    • High-end GPUs often have more total transistors than many CPUs.
    • They are not necessarily more densely packed per mm².
    • Some GPU packages place dynamic RAM (e.g., HBM) very close to the GPU die via short, high-bandwidth connections.

Matrix and Tensor Operations

  • Matrix operations
    • A matrix is a 2D grid of numbers (rows and columns).
    • Matrix multiplication combines two matrices (say A and B) into a third (C).
      • Example: an element c₁₂ in C depends on a combination like a₁₁b₁₂ + a₁₂b₂₂ (and so on, depending on the full size of the matrices).
    • Matrix operations are central to many algorithms in graphics and machine learning.
  • Tensor operations
    • A tensor generalises matrices to higher dimensions:
      • 1D: vector.
      • 2D: matrix.
      • 3D or more: higher-order tensor (e.g., width × height × colour channels of an image).
    • Tensor operations are the same basic maths (adds, multiplies) extended to more dimensions.
    • Neural networks repeatedly perform these operations on large tensors.
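
The multiplication rule above can be written out directly. This is a minimal pure-Python sketch for illustration; the function name `matmul` and the sample values are assumptions, and real workloads would use an optimised library rather than nested lists.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    inner, cols = len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("columns of a must match rows of b")
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
C = matmul(A, B)

# C[0][1] is c12 = a11*b12 + a12*b22 = 1*6 + 2*8 = 22
print(C)  # [[19, 22], [43, 50]]

# A 3D tensor: 1 row x 2 pixels x 3 colour channels, doubled element by element.
image = [[[10, 20, 30], [40, 50, 60]]]
brighter = [[[2 * channel for channel in pixel] for pixel in row] for row in image]
print(brighter)  # [[[20, 40, 60], [80, 100, 120]]]
```

Each entry of C is computed independently of the others, so all of them can in principle be worked out at the same time.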

Why Do Neural Networks Use GPUs?

  • 1. Massive parallelism
    • Neural networks consist of layers with many parameters (weights and biases):
      • Modern models can have millions to billions of parameters.
    • Core computation: repeated matrix and tensor multiplications.
    • Same operations (multiplication, addition) applied across large arrays of numbers.
    • GPUs, with thousands of cores, execute these parallel operations very efficiently.
  • 2. High memory bandwidth
    • Training and running neural networks require moving large volumes of data quickly.
    • GPUs are built with very high-bandwidth memory systems (e.g., HBM, GDDR VRAM).
    • This allows them to feed data to compute units fast enough to keep them busy.
  • 3. Specialised tensor hardware
    • Many modern GPUs include tensor cores – units specialised for matrix and tensor operations.
    • Example: NVIDIA H100 Tensor Core GPU can perform around 1.9 quadrillion (1.9 × 10¹⁵) FP16/BF16 tensor operations per second.
    • Google’s Tensor Processing Units (TPUs) are custom chips built specifically for neural network maths.
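
To see why neural network layers reduce to multiply-and-add over arrays, here is a hedged sketch of a single fully connected layer. The weights, biases, and the ReLU activation (clamping negative results to zero) are illustrative assumptions, not taken from any particular model.

```python
def dense_layer(weights, biases, inputs):
    """Compute max(0, W.x + b) for each output neuron."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias  # multiply-adds
        outputs.append(max(0.0, total))                          # ReLU activation
    return outputs

weights = [[0.5, -1.0, 2.0],   # one row of weights per output neuron
           [1.5, 0.0, -0.5]]
biases = [0.1, -0.2]
inputs = [1.0, 2.0, 3.0]

activations = dense_layer(weights, biases, inputs)
print(activations)
```

This layer has 8 parameters and needs 6 multiplications. A model with billions of parameters repeats the same pattern at a scale where thousands of parallel cores and tensor units make the difference.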

How Much Energy Do GPUs Need?

  • Example scenario
    • Task: train a neural network to predict disease risk based on medical data.
    • Hardware:
      • 4× Nvidia A100 PCIe GPUs, each with board power around 250 W during training.
    • Training duration: 12 hours.
  • Energy during training
    • Assume GPUs are nearly fully used.
    • GPU power during training: 4 × 250 W = 1000 W (1 kW).
    • Energy = power × time ≈ 1 kW × 12 h = 12 kWh.
  • Energy during inference (use in production)
    • Assume only 1 GPU is used for inference, and utilisation is lower.
    • Approximate GPU energy for inference: around 2 kWh per day in this example.
  • Other system components
    • Servers also consume power for:
      • CPUs
      • RAM
      • Storage
      • Cooling fans
      • Networking and power conversion losses
    • Typical rule of thumb: add 30–60% of GPU power for these overheads.
    • In the example, total energy for continuous operation is about 6 kWh/day.
  • Household comparison
    • 6 kWh/day is roughly equivalent to:
      • Running an air-conditioner for 4–6 hours at full compressor power.
      • Running a water heater for about 3 hours.
      • Running about 60 small LED bulbs for 10 hours per day.
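
The arithmetic in this section can be reproduced as follows. This is a sketch using only the figures quoted above; applying the 30–60% overhead rule to the training run is an illustrative step, and the appliance wattages are not measured values but what the household comparisons imply.

```python
# Training energy for the example: 4 GPUs at 250 W each for 12 hours.
GPU_COUNT = 4
GPU_POWER_W = 250
TRAINING_HOURS = 12

gpu_power_kw = GPU_COUNT * GPU_POWER_W / 1000          # 1.0 kW
training_energy_kwh = gpu_power_kw * TRAINING_HOURS    # 12.0 kWh

# Rule of thumb from the text: add 30-60% for CPUs, RAM, storage, cooling, etc.
with_overhead_low = training_energy_kwh * 1.30
with_overhead_high = training_energy_kwh * 1.60

# Appliance power ratings implied by the 6 kWh/day household comparison.
DAILY_ENERGY_KWH = 6
ac_power_kw = (DAILY_ENERGY_KWH / 6, DAILY_ENERGY_KWH / 4)   # 4-6 hours of use
water_heater_kw = DAILY_ENERGY_KWH / 3                        # about 3 hours
led_bulb_w = DAILY_ENERGY_KWH * 1000 / (60 * 10)              # 60 bulbs, 10 hours

print(f"GPU training energy: {training_energy_kwh:.1f} kWh")
print(f"With system overhead: {with_overhead_low:.1f}-{with_overhead_high:.1f} kWh")
print(f"Implied air-conditioner power: {ac_power_kw[0]:.1f}-{ac_power_kw[1]:.1f} kW")
print(f"Implied water heater power: {water_heater_kw:.1f} kW")
print(f"Implied LED bulb power: {led_bulb_w:.0f} W each")
```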

Does Nvidia Have a Monopoly?

  • Market position
    • Nvidia does not have a legal monopoly on GPUs in the strict sense.
    • However, it has near-complete dominance in some segments, especially AI computing platforms.
  • Discrete GPUs for personal computers
    • Industry trackers report Nvidia holds roughly 90% market share in discrete GPUs.
    • The remaining share is mostly held by AMD and Intel.
  • Data centre and AI GPUs
    • Nvidia’s strength here comes from:
      • Hardware performance and availability.
      • The CUDA software ecosystem.
    • CUDA is Nvidia’s platform for running general-purpose computation on its GPUs.
    • Many AI frameworks and tools are deeply optimised for CUDA.
    • Switching away from Nvidia often means rewriting or adapting software, which buyers are reluctant to do.
    • As a result, Nvidia GPUs + CUDA are widely treated as the default platform for large-scale neural network training and inference.

Why Are European Regulators Investigating Nvidia?

  • Legal notion of monopoly
    • In competition law, a monopoly is less about having 100% share and more about:
      • Whether a firm can control prices or exclude competitors.
      • Whether it maintains that power through unlawful conduct.
  • Concerns about Nvidia
    • European regulators are examining whether Nvidia uses its dominance to lock in customers.
    • Key issues being probed include:
      • Tying GPU sales to Nvidia software or related components.
      • Providing discounts that depend on buyers also adopting Nvidia software stacks.
      • Practices that could make it harder for rivals to compete on equal terms.
    • The concern is that such behaviour could entrench Nvidia’s market power in AI and data-centre computing.

Key Takeaways

  • GPUs are specialised for parallel, repetitive computation, originally for graphics but now central to AI.
  • The rendering pipeline consists of vertex processing, rasterisation, fragment/pixel shading, and writing to the frame buffer.
  • A die is the piece of silicon that holds CPU or GPU circuits; GPUs can be discrete or integrated with CPUs on the same die or in the same package.
  • Matrix and tensor operations underpin neural networks, and GPUs (and TPUs) are optimised to execute them at massive scale.
  • High-end GPUs consume significant power; continuous AI workloads can use energy comparable to major household appliances.
  • Nvidia dominates key GPU markets, especially AI, and regulators are assessing whether it is leveraging this dominance in anti-competitive ways.

Source: The Hindu