CUDA-Accelerated/CPU-Multithreaded Tile-based Software Renderer

Aeroraven/Ifrit-v2

Ifrit-v2

GPU/CPU-Parallelized tile-based software rasterizer.

Successor to the following repos:

Features

  • Performance:

    • Multithreaded Rasterization
    • SIMD Vectorization
    • CUDA Acceleration (Incomplete)
      • Double Buffering / Overlapped Memory Transfer
      • Device Vector / Dynamic Array
  • Rendering:

    • Homogeneous Space Clipping
    • Programmable VS/FS
    • Z Pre-Pass (CUDA-Only)
  • Presentation:

    • Terminal Rendering (ASCII Characters/Color Image)
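
As an illustration of the homogeneous space clipping listed above, here is a minimal sketch of Sutherland-Hodgman polygon clipping performed directly on clip-space coordinates, before the perspective divide. It is not taken from the Ifrit-v2 sources: the names `Vec4`, `planeDistance`, `clipAgainstPlane`, and `clipToFrustum` are hypothetical, and the OpenGL-style clip volume (-w <= x, y, z <= w) is an assumption; the project may use a different depth range or clip only against a subset of planes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical clip-space vertex (position only, no varyings).
struct Vec4 { float x, y, z, w; };

// A plane is stored as coefficients (a, b, c, d); a vertex v is inside
// when a*v.x + b*v.y + c*v.z + d*v.w >= 0.
inline float planeDistance(const Vec4& plane, const Vec4& v) {
    return plane.x * v.x + plane.y * v.y + plane.z * v.z + plane.w * v.w;
}

// Linear interpolation in homogeneous space (all four components).
inline Vec4 lerp(const Vec4& a, const Vec4& b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t,
            a.z + (b.z - a.z) * t, a.w + (b.w - a.w) * t};
}

// One Sutherland-Hodgman pass: keep the part of a convex polygon that
// lies on the inside of a single plane.
std::vector<Vec4> clipAgainstPlane(const std::vector<Vec4>& poly, const Vec4& plane) {
    std::vector<Vec4> out;
    const std::size_t n = poly.size();
    for (std::size_t i = 0; i < n; ++i) {
        const Vec4& cur = poly[i];
        const Vec4& nxt = poly[(i + 1) % n];
        const float dc = planeDistance(plane, cur);
        const float dn = planeDistance(plane, nxt);
        if (dc >= 0.0f) out.push_back(cur);      // current vertex is inside
        if ((dc >= 0.0f) != (dn >= 0.0f)) {      // edge crosses the plane
            const float t = dc / (dc - dn);      // intersection parameter
            out.push_back(lerp(cur, nxt, t));
        }
    }
    return out;
}

// Clip against all six planes of the assumed volume -w <= x, y, z <= w.
std::vector<Vec4> clipToFrustum(std::vector<Vec4> poly) {
    static const std::array<Vec4, 6> planes = {{
        { 1.0f,  0.0f,  0.0f, 1.0f}, {-1.0f,  0.0f,  0.0f, 1.0f},
        { 0.0f,  1.0f,  0.0f, 1.0f}, { 0.0f, -1.0f,  0.0f, 1.0f},
        { 0.0f,  0.0f,  1.0f, 1.0f}, { 0.0f,  0.0f, -1.0f, 1.0f},
    }};
    for (const Vec4& plane : planes) {
        if (poly.empty()) break;                 // fully clipped away
        poly = clipAgainstPlane(poly, plane);
    }
    return poly;
}
```

For example, a triangle with clip-space vertices (0, 0, 0, 1), (2, 0, 0, 1), and (0, 1, 0, 1) crosses the x = w plane, so `clipToFrustum` returns a four-vertex polygon that would then be re-triangulated. A real pipeline would also interpolate varyings with the same parameter `t`.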

Performance

Tests were performed with a 2048x2048 RGBA FP32 color image and a 2048x2048 FP32 depth attachment. Time spent in the presentation stage (displaying the texture via OpenGL) is excluded.
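
To give a sense of scale for this setup, the arithmetic below works out the attachment sizes and the transfer rate a per-frame copy-back would need. It is only a sketch: the constant and function names are hypothetical, and it assumes that only the color image is copied back to the host each frame, which the measurements here do not confirm.

```cpp
#include <cstdint>

// Dimensions and formats from the test setup; names are illustrative only.
constexpr std::uint64_t kWidth = 2048;
constexpr std::uint64_t kHeight = 2048;
constexpr std::uint64_t kBytesPerFloat = 4;   // FP32
constexpr std::uint64_t kColorChannels = 4;   // RGBA

// Size of the RGBA FP32 color image in bytes.
constexpr std::uint64_t colorBytes() {
    return kWidth * kHeight * kColorChannels * kBytesPerFloat;
}

// Size of the FP32 depth attachment in bytes.
constexpr std::uint64_t depthBytes() {
    return kWidth * kHeight * kBytesPerFloat;
}

// Bytes per second moved if the color image is copied back once per frame
// (an assumption about what the copy-back transfers).
constexpr double copyBackBytesPerSecond(double fps) {
    return static_cast<double>(colorBytes()) * fps;
}

static_assert(colorBytes() == 64ull * 1024 * 1024, "color image is 64 MiB");
static_assert(depthBytes() == 16ull * 1024 * 1024, "depth attachment is 16 MiB");
```

Under that assumption, 100 FPS with copy-back corresponds to roughly 6.25 GiB/s of device-to-host traffic for the color image alone.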

Frame Rate

| Model | Triangles | CPU Single Thread* | CPU Multi-thread* | CUDA w/ Copy-back | CUDA w/o Copy-back** |
|---|---|---|---|---|---|
| Yomiya | 70275 | 38 FPS | 80 FPS | 123 FPS^ | 400 FPS^ |
| Stanford Bunny | 208353 | 20 FPS | 80 FPS | 124 FPS^ | 320 FPS^ |
| Khronos Sponza | 786801 | 2 FPS | 10 FPS | 105 FPS | 239 FPS |
| Intel Sponza | 11241912 | 1 FPS | 7 FPS | 99 FPS | 112 FPS |

*. Under optimization

**. Might be influenced by other applications which utilize GPU

^. Result of previous commit.

Test Environment

  • CPU: 12th Gen Intel(R) Core(TM) i9-12900H

    • Tested with 16 threads and AVX2 instructions
  • GPU: NVIDIA GeForce RTX 3070 Ti Laptop GPU

  • Shading: World-space normal

Dependencies

  • Hardware Requirements:
    • SSE
    • AVX2
    • CUDA 12.4
  • Presentation Dependencies:
    • Terminal (Windows Terminal)
    • OpenGL 3.3
    • GLFW 3.3
    • GLAD
  • Compile Dependencies:
    • CMake 3.28
    • MSVC (Visual Studio 2022)
      • C++17 is required
      • C++20 is recommended for best performance
    • NVCC

Ongoing Plan

  • Bug Fixing
    • Nondeterministic Behavior in CUDA Renderer Pipeline
  • CPU Pipeline Optimization
    • Performance: SIMD for tile-level pixel shading
    • Performance: Z Pre-Pass
    • Performance: Reduce floating-point division
  • CUDA Pipeline Optimization
    • Performance: Batched Geometry Processing
    • Performance: Limited Varyings Or Exhaustive Local Mem Writes
    • Performance: Pixel Processing Bottleneck
    • Performance: Binner / Geometry Processing Low Throughput
    • Performance: Pixel Processing Low Cache Utilization Efficiency
  • Blending
  • Scanline Rasterizer (IMR)
