feat: enable flash attention if supported · ollama/ollama@5ab0d7b

Commit

feat: enable flash attention if supported

sammcj committed May 16, 2024

1 parent 6ff2dcc, commit 5ab0d7b

llm/llama.cpp

Submodule llama.cpp updated 50 files

+77 −29		.github/workflows/build.yml
+18 −0		CMakeLists.txt
+45 −0		CMakePresets.json
+4 −1		README.md
+16 −0		cmake/arm64-windows-llvm.cmake
+6 −0		cmake/arm64-windows-msvc.cmake
+10 −0		common/common.cpp
+1 −0		common/common.h
+1 −1		common/grammar-parser.cpp
+6 −6		common/json-schema-to-grammar.cpp
+5 −5		common/log.h
+3 −3		convert-hf-to-gguf-update.py
+31 −47		convert-hf-to-gguf.py
+155 −25		convert.py
+3 −0		examples/CMakeLists.txt
+1 −0		examples/embedding/embedding.cpp
+21 −6		examples/llava/llava-cli.cpp
+0 −15		examples/llava/llava.cpp
+59 −1		examples/perplexity/README.md
+3 −1		examples/quantize/README.md
+2 −0		examples/rpc/CMakeLists.txt
+74 −0		examples/rpc/README.md
+130 −0		examples/rpc/rpc-server.cpp
+1 −1		examples/server/README.md
+7 −0		examples/server/server.cpp
+5 −2		examples/server/tests/features/steps/steps.py
+1 −1		examples/server/utils.hpp
+0 −1		ggml-backend.c
+1 −1		ggml-cuda.cu
+33 −30		ggml-cuda/upscale.cu
+7 −0		ggml-impl.h
+48 −35		ggml-metal.m
+33 −41		ggml-metal.metal
+2,195 −27		ggml-quants.c
+1,023 −0		ggml-rpc.cpp
+24 −0		ggml-rpc.h
+5 −24		ggml-sycl.cpp
+306 −161		ggml.c
+16 −2		ggml.h
+1 −0		gguf-py/gguf/__init__.py
+11 −5		gguf-py/gguf/gguf_writer.py
+20 −9		gguf-py/gguf/lazy.py
+109 −0		gguf-py/gguf/quants.py
+230 −108		llama.cpp
+3 −0		llama.h
+4 −0		scripts/sync-ggml-am.sh
+1 −1		scripts/sync-ggml.last
+2 −0		scripts/sync-ggml.sh
+44 −13		tests/test-backend-ops.cpp
+46 −0		tests/test-grammar-integration.cpp