"Partnering with Alluxio allows us to push the boundaries of LLM inference efficiency," said Junchen Jiang, Head of LMCache Lab at the University of Chicago. "By combining our strengths, we are ...
Chain-of-experts (CoE) chains LLM experts sequentially rather than running them in parallel, outperforming mixture-of-experts (MoE) while using less memory and compute.
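The sequential-versus-routed contrast can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's implementation: the expert sizes, the dense softmax routing for MoE, and the residual chaining for CoE are all assumptions made for clarity.

```python
import torch
import torch.nn as nn

class Expert(nn.Module):
    """A small feed-forward expert block."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        return self.net(x)

class MixtureOfExperts(nn.Module):
    """MoE: a router weights experts that all run in parallel on the input.

    Real MoE layers use sparse top-k routing; dense softmax weighting is
    shown here only to keep the sketch short.
    """
    def __init__(self, dim: int, hidden: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(Expert(dim, hidden) for _ in range(n_experts))
        self.router = nn.Linear(dim, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.router(x), dim=-1)            # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1) # (batch, n_experts, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)

class ChainOfExperts(nn.Module):
    """CoE (illustrative): experts run one after another, each refining the last output."""
    def __init__(self, dim: int, hidden: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(Expert(dim, hidden) for _ in range(n_experts))

    def forward(self, x):
        for expert in self.experts:
            x = x + expert(x)  # sequential residual refinement
        return x

x = torch.randn(4, 64)
print(MixtureOfExperts(64, 128, n_experts=4)(x).shape)  # torch.Size([4, 64])
print(ChainOfExperts(64, 128, n_experts=4)(x).shape)    # torch.Size([4, 64])
```

The intuition behind the reported savings: the sequential chain lets later experts build on earlier ones instead of each expert independently producing a full answer, so comparable quality can be reached with fewer or smaller active experts per token.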
Carnegie Mellon University researchers propose a new LLM training technique that gives developers more control over chain-of-thought length.
Pliops contributes its expertise in shared storage and efficient vLLM cache offloading, while LMCache Lab brings a robust scalability framework for multi-instance execution. The combined solution ...
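The core idea, offloading KV-cache blocks to a shared storage tier so multiple vLLM instances can reuse each other's prefill work, can be sketched as below. Every class and method name here is a hypothetical stand-in; none of this is the Pliops or LMCache API.

```python
from collections import OrderedDict

class SharedKVStore:
    """Stand-in for a shared storage tier (e.g. networked flash) holding KV blocks."""
    def __init__(self):
        self._blocks: dict[str, bytes] = {}

    def put(self, key: str, kv_block: bytes) -> None:
        self._blocks[key] = kv_block

    def get(self, key: str) -> bytes | None:
        return self._blocks.get(key)

class InstanceKVCache:
    """Per-instance cache with LRU eviction that spills to shared storage.

    On a local miss, the block is fetched from the shared store so that
    prefill work done by other instances is reused instead of recomputed.
    """
    def __init__(self, shared: SharedKVStore, capacity: int):
        self.shared = shared
        self.capacity = capacity
        self.local: OrderedDict[str, bytes] = OrderedDict()

    def lookup(self, prefix_hash: str) -> bytes | None:
        if prefix_hash in self.local:            # local hit
            self.local.move_to_end(prefix_hash)
            return self.local[prefix_hash]
        block = self.shared.get(prefix_hash)     # remote hit: reuse another instance's work
        if block is not None:
            self._admit(prefix_hash, block)
        return block

    def insert(self, prefix_hash: str, kv_block: bytes) -> None:
        self._admit(prefix_hash, kv_block)
        self.shared.put(prefix_hash, kv_block)   # publish for other instances

    def _admit(self, key: str, block: bytes) -> None:
        self.local[key] = block
        self.local.move_to_end(key)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)       # evict LRU; the shared copy survives

# Two instances sharing one store: instance B reuses a prefix cached by A.
store = SharedKVStore()
a = InstanceKVCache(store, capacity=4)
b = InstanceKVCache(store, capacity=4)
a.insert("hash(prompt_prefix)", b"kv-tensors")
assert b.lookup("hash(prompt_prefix)") == b"kv-tensors"
```

The design choice the sketch highlights is that eviction from a single instance's fast tier is cheap, because the shared copy remains available to every instance on the next lookup.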
AI21’s newly debuted Maestro platform is designed to address this challenge. The platform, which is described as an AI planning ...
March 20, 2025 (GLOBE NEWSWIRE) -- John Snow Labs, the AI for healthcare company, today announced Medical LLM Reasoner ... making the most plausible inference with limited information, as happens ...
The model was trained using a recipe inspired by that of DeepSeek-R1 [3], introducing self-reflection capabilities through reinforcement learning. Using NVIDIA tools, the company is releasing ...