culina

CUDA-accelerated linear algebra / CPU acceleration algorithms.

CUDA

- efficient tiling-based large matrix multiplication
- warp-reduce-based matrix-vector multiplication
- warp-reduce-based vector dot product
- warp reduce
- Flash attention module 1: fused QKV attention (tiling-based)
  - softmax(Q^T K / scale) can be easily fused
  - the extra V... well, it's a pain in the ass, TBH
- coalesced memory access benchmarking
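
To illustrate the tiling idea behind the matrix multiplication item above, here is a minimal CPU-side C++ sketch. It is not the repository's CUDA kernel: the function name `tiled_matmul`, the row-major layout, and the default tile size of 32 are assumptions made for this example. It walks the output in square tiles so that each block of `A` and `B` is reused while it is still hot in cache; a GPU version would typically stage the same tiles in shared memory instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): computes C = A * B for
// row-major matrices, where A is M x K, B is K x N and C is M x N.
// The loops are blocked into tile x tile sub-problems.
std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N,
                                std::size_t tile = 32) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += tile) {
        for (std::size_t k0 = 0; k0 < K; k0 += tile) {
            for (std::size_t j0 = 0; j0 < N; j0 += tile) {
                // Clamp the tile bounds so edge tiles stay inside the matrices.
                const std::size_t i1 = std::min(i0 + tile, M);
                const std::size_t k1 = std::min(k0 + tile, K);
                const std::size_t j1 = std::min(j0 + tile, N);
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t k = k0; k < k1; ++k) {
                        const float a = A[i * K + k];
                        // Innermost loop walks contiguous rows of B and C.
                        for (std::size_t j = j0; j < j1; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}
```

A call such as `tiled_matmul(A, B, 2, 3, 2, 2)` multiplies a 2 x 3 matrix by a 3 x 2 matrix with 2 x 2 tiles, which also exercises the clamped edge tile along the K dimension.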

CPU

- thread pool (condition variable and simple multi-threading)
- double buffer (std::timed_mutex and simple multi-threading) with simple benchmarking
- cache update algorithms:
  - LRU (least recently used)
  - LFU (least frequently used)
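
As a rough illustration of the LRU policy listed above, the sketch below pairs a `std::list` (recency order, most recent at the front) with a `std::unordered_map` (key to list position). The class name `LruCache` and its `get`/`put` interface are assumptions made for this example and are not taken from the `cache` directory; the repository's own implementation may be structured differently.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical LRU cache sketch (not the repository's implementation).
// The list keeps entries ordered from most to least recently used;
// the map gives O(1) average lookup of an entry's position in the list.
template <typename Key, typename Value>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // On a hit, copies the value into `out`, marks the entry as most
    // recently used, and returns true. On a miss, returns false.
    bool get(const Key& key, Value& out) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        items_.splice(items_.begin(), items_, it->second);
        out = it->second->second;
        return true;
    }

    // Inserts or updates an entry; evicts the least recently used entry
    // when the cache is full.
    void put(const Key& key, Value value) {
        if (capacity_ == 0) return;
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (items_.size() == capacity_) {
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, std::move(value));
        index_[key] = items_.begin();
    }

    std::size_t size() const { return items_.size(); }

private:
    using Entry = std::pair<Key, Value>;
    std::size_t capacity_;
    std::list<Entry> items_;
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

`std::list::splice` moves a node to the front without invalidating iterators, so the positions stored in the map stay valid after every access. An LFU variant would additionally need a per-entry use counter and would evict the entry with the lowest count.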

Enigmatisms/culina
