Fast Inference from Transformers via Speculative Decoding Transformer Models - Search Videos

How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100

Faster LLMs: Accelerate Inference with Speculative Decoding

Transformer Explainer: LLM Transformer Model Visually Explained

Transformer decoders explained step-by-step from scratch

MSN · Learn With Jay

Building Local AI: Getting Started with vLLM

74 views · 1 month ago

YouTube · Probably Private

AI Explained: Speculative decoding with vLLM

1K views · 3 weeks ago

LK Losses: Optimizing Speculative Decoding

YouTube · AI Research Roundup

GigaWorld-Policy: An Efficient Action-Centered World--Action Model (Mar 2026)

17 views · 1 week ago

YouTube · AI Paper Slop

Speculative Speculative Decoding for Faster LLM Inference

1.3K views · 3 weeks ago

YouTube · Rajistics - data science, AI, and machine learning

NVIDIA's VP of AI Explains Why They Give Away Their Best Models | Kari Briski × Kim Isenberg

1.2K views · 1 week ago

YouTube · Superintelligence

ggml and llama.cpp join Hugging Face & Custom AI chips for fast inference - Hacker News (Feb 20, ...

YouTube · The Automated Daily

GBV: The AI Speed Hack You Need Now (30% Faster Inference) #Shorts

YouTube · CollapsedLatents

In-model computation gets real & Cloud inference shifts beyond GPUs - AI News (Mar 17, 2026)

21 views · 2 weeks ago

YouTube · The Automated Daily

[Paper Explained] [Massive Speedup] Verifying Speculative Decoding with Sparse Computation! Surprising Results Revealed!

12 views · 1 month ago

YouTube · 論文解説チャンネル

What is Speculative decoding - Speculative decoding Explained #generativeai #RAG #ai #llm

273 views · 3 weeks ago

YouTube · Med Bou | AI Tutorials

Step 3.5 Flash: Fast 11B MoE for Agentic Tasks

63 views · 1 month ago

YouTube · AI Research Roundup

EP5: Speculative Decoding with Nadav Timor

116 views · 6 months ago

YouTube · The Information Bottleneck

Encoder Decoder Network - Computerphile

156.6K views · Jun 13, 2018

YouTube · Computerphile

The Hilbert transform

159.2K views · Oct 1, 2017

YouTube · Mike X Cohen

Transformer models: Encoder-Decoders

105.6K views · Jun 14, 2021

YouTube · Hugging Face

The Narrated Transformer Language Model

346.3K views · Oct 26, 2020

YouTube · Jay Alammar

Problems With Encoders And Decoders- Indepth Intuition

44.1K views · Aug 5, 2020

YouTube · Krish Naik

Speculative Decoding Explained

7.8K views · Dec 21, 2023

YouTube · Trelis Research

Vision Transformer Attention

14.2K views · Oct 21, 2021

Vision Transformers explained

69.5K views · Jul 1, 2023

YouTube · Code With Aarohi

LLM Jargons Explained: Part 4 - KV Cache

10.8K views · Mar 24, 2024

YouTube · Sachin Kalsi

Why Isn't ChatGPT Slow? (System Design)

1.2K views · 3 months ago

YouTube · Tech with infographics

Transformer models: Decoders

78.3K views · Jun 14, 2021

YouTube · Hugging Face

Set Block Decoding: Faster LLM Inference

53 views · 6 months ago

YouTube · AI Research Roundup

Deep Dive: Optimizing LLM inference

47K views · Mar 11, 2024

YouTube · Julien Simon
