How attention offloading reduces the costs of LLM inference at scale https://t.co/8grpFJ7cFV https://t.co/I7rpu1tvCH (via https://twitter.com/AcerboLivio/status/1790506633300209687)
A free community where you can share your ideas!