Enigmatisms/culina

culina

CUDA accelerated linear algebra / CPU acceleration algorithms.

CUDA

  • efficient tiling-based large matrix multiplication
  • warp-reduce-based matrix-vector multiplication
  • warp-reduce-based vector dot product
  • warp reduce
  • Flash attention module 1: fused QKV attention (tiling-based)
    • softmax(Q^T K / scale) can be fused easily
    • the extra multiplication by V... well, it's a pain to fuse, TBH
  • coalesced memory access benchmarking
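
The tiling idea behind the matrix multiplication item above can be illustrated on the CPU. The following is a minimal C++ sketch, not the repository's CUDA kernel: the function name `tiled_matmul`, the row-major layout, and the block size `TILE` are all assumptions made for illustration. On the GPU a tile would typically be staged in shared memory; here the same blocking only improves cache reuse.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block size; not a value taken from the repository.
constexpr std::size_t TILE = 32;

// Computes C (M x N) = A (M x K) * B (K x N); all matrices are row-major.
std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    // The three outer loops walk over tiles of C, A and B.
    for (std::size_t i0 = 0; i0 < M; i0 += TILE) {
        for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
            for (std::size_t j0 = 0; j0 < N; j0 += TILE) {
                // Clamp tile bounds so sizes need not be multiples of TILE.
                const std::size_t i1 = std::min(i0 + TILE, M);
                const std::size_t k1 = std::min(k0 + TILE, K);
                const std::size_t j1 = std::min(j0 + TILE, N);
                // The inner loops accumulate one tile's partial products.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t k = k0; k < k1; ++k) {
                        const float a = A[i * K + k];
                        for (std::size_t j = j0; j < j1; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}
```

Calling `tiled_matmul(A, B, M, K, N)` returns the product as a flat row-major vector. Each tile of `A` and `B` is reused many times while it is still hot in cache, which is the same reuse a tiled GPU kernel aims for.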

CPU

  • thread pool (condition variable and simple multi-threading)
  • double buffer (std::timed_mutex and simple multi-threading) with simple benchmarking
  • cache update (eviction) algorithms:
    • LRU (least recently used)
    • LFU (least frequently used)
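
The LRU policy listed above can be sketched in a few lines of C++17. This is one common way to build it (a doubly linked list ordered by recency, plus a hash map for O(1) lookup) and is not necessarily how the repository implements it. The class name `LRUCache`, the `int` key and value types, and the `get`/`put` interface are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache: most recently used entry sits at the list front.
class LRUCache {
public:
    explicit LRUCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the value if present and marks the key as most recently used.
    std::optional<int> get(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) return std::nullopt;
        // splice relinks the node to the front without invalidating iterators.
        items_.splice(items_.begin(), items_, it->second);
        return it->second->second;
    }

    // Inserts or updates a key; evicts the least recently used entry if full.
    void put(int key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {
            index_.erase(items_.back().first);  // back = least recently used
            items_.pop_back();
        }
        items_.emplace_front(key, value);
        index_[key] = items_.begin();
    }

    std::size_t size() const { return items_.size(); }

private:
    using Entry = std::pair<int, int>;  // (key, value)
    std::size_t capacity_;
    std::list<Entry> items_;
    std::unordered_map<int, std::list<Entry>::iterator> index_;
};
```

With a capacity of 2, inserting keys 1 and 2, reading key 1, and then inserting key 3 evicts key 2, because key 1 was touched more recently. An LFU variant would need a per-entry access count in addition to the recency order.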
