Four Pillars of AI at Enterprise Scale

Running AI at enterprise scale rests on four pillars: the performance of a single inference engine, the scale of distributed serving across many GPUs, the choice of which models to run and how to optimise them, and the trust that comes from controlling where they run. Get any one of those wrong, and the platform stops being economically viable.

Robbie Jerrom (Red Hat) walks through the four pillars, spending most of the time on the engine room itself: vLLM. The session covers what actually happens inside an LLM serving stack: why prefill and decode want different hardware, how llm-d scales serving across GPUs, and how reasoning models are quietly changing the economics of every AI platform. Model choice, optimisation, and the hybrid-cloud platform underneath are woven through to show how the four pillars hold each other up. The focus is operational rather than academic: ideas you can act on, explained without the usual wall of jargon.

Speakers:

Robbie Jerrom
Senior Principal Technologist AI at Red Hat