Show HN: Tiny-vLLM – high performance LLM inference engine in C++ and CUDA

35 points | github.com | yu3zhou4 | 2 hrs