KubeCon Europe 2026 made AI inference its central focus with major CNCF donations including llm-d, Nvidia's GPU DRA driver ...
Red Hat is pushing Kubernetes inference into the mainstream by contributing llm-d to the CNCF, as enterprises race to run AI ...
The technique aims to ease GPU memory constraints that limit how enterprises scale AI inference and long-context applications ...
NVIDIA Boosts LLM Inference Performance With New TensorRT-LLM Software Library. As companies like d-Matrix squeeze into the lucrative artificial intelligence market with ...
Google has introduced TurboQuant, a compression algorithm that reduces large language model (LLM) memory usage by at least 6x ...
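The memory savings come from quantization: storing each weight in a few bits plus a shared scale factor instead of a full float. The sketch below is a generic symmetric 4-bit weight-quantization example to illustrate where a 6x-or-better reduction comes from; it is not TurboQuant's actual algorithm, whose details are not given here.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-row 4-bit quantization: a generic illustration of
    weight compression, NOT Google's TurboQuant algorithm."""
    # One scale per output row keeps quantization error local to that row.
    scales = np.max(np.abs(weights), axis=1, keepdims=True) / 7.0  # int4 range [-7, 7]
    scales[scales == 0] = 1.0          # avoid division by zero for all-zero rows
    q = np.clip(np.round(weights / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4096)).astype(np.float32)  # a toy weight matrix
q, s = quantize_int4(w)
w_hat = dequantize(q, s)

# fp32 weights cost 4 bytes/param; packed 4-bit weights cost 0.5 bytes/param
# plus one fp32 scale per row, so the ratio approaches 8x for wide rows.
ratio = w.nbytes / (q.size * 0.5 + s.nbytes)
err = np.abs(w - w_hat).max()  # rounding error is bounded by half a scale step
print(round(ratio, 2))
```

In practice the compression ratio depends on the bit width, the granularity of the scales (per-tensor, per-row, per-group), and whether activations and the KV cache are quantized too.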
“Transformer-based Large Language Models (LLMs) have been widely used in many fields, and the efficiency of LLM inference has become a hot topic in real applications. However, LLMs are usually ...
The AI chip giant says the open-source software library, TensorRT-LLM, will double the H100’s performance for running inference on leading large language models when it comes out next month. Nvidia ...
When it comes to deploying local LLMs, many people assume that spending more money will deliver more performance, but that's far from reality. That's ...
Forged in collaboration with founding contributors CoreWeave, Google Cloud, IBM Research and NVIDIA and joined by industry leaders AMD, Cisco, Hugging Face, Intel, Lambda and Mistral AI and university ...