Hacker News

DSpark: Speculative decoding accelerates LLM inference [pdf]

775 points | github.com | aurenvale | 33 hrs