Companies in the category 'Inference'
These companies provide open-source tools and platforms for machine learning inference: deploying and running trained AI models to serve predictions.
Platform for deploying and managing AI models.
Baseten provides a platform for deploying and managing AI models in production, offering performant model runtimes, cross-cloud high availability, and seamless developer workflows.
Optimized inference platform for Parlant
Emcie is an optimized inference platform for Parlant, an open-source context engineering framework for building customer-facing AI agents. Emcie reduces inference costs by up to 10x by automatically distilling large language model completions into smaller, fine-tuned models that maintain accuracy and behavioral alignment.
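Distillation of this kind generally starts by collecting the large model's completions as supervised training pairs for a smaller student model. A minimal illustrative sketch in Python, where `teacher_complete` is a hypothetical stand-in for a large-model API call (not Emcie's or Parlant's actual code):

```python
# Sketch: build a sequence-level distillation dataset from teacher completions.
# `teacher_complete` is a placeholder standing in for a large-model API call.

def teacher_complete(prompt: str) -> str:
    # Placeholder "teacher": returns an uppercased echo for illustration.
    return prompt.upper()

def build_distillation_dataset(prompts):
    """Pair each prompt with the teacher's completion; the resulting
    (prompt, completion) pairs become fine-tuning data for a smaller
    student model that imitates the teacher's behavior."""
    return [{"prompt": p, "completion": teacher_complete(p)} for p in prompts]

dataset = build_distillation_dataset(["refund policy?", "reset my password"])
```

The student is then fine-tuned on these pairs, trading the teacher's per-token cost for a one-time training cost.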
AI inference platform for production
BentoML is an open-source unified inference platform for deploying and scaling AI models in production. It enables ML teams to build, ship, and scale AI inference services with any model on any cloud, offering full control over deployment without the complexity of managing infrastructure. BentoML was acquired by Modular in February 2026.
Distributed AI inference on local devices
EXO Labs builds open-source software infrastructure that enables AI models to be trained and run across clusters of everyday consumer devices, from a single MacBook to thousands of interconnected machines. Its flagship project, exo, uses pipeline-parallel inference to distribute large language models across heterogeneous hardware without requiring a master-worker architecture. The company was founded in 2024 by Oxford University researchers and is headquartered in Oxford, UK.
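Pipeline-parallel inference of this style can be pictured as slicing a model's layer stack into contiguous segments, one per device, sized to each device's available memory. An illustrative sketch of such a partitioning scheme (not exo's actual code):

```python
def partition_layers(n_layers: int, memory_fractions):
    """Assign contiguous layer ranges to devices in proportion to each
    device's share of total memory. Every device owns a slice; there is
    no master node coordinating the others."""
    total = sum(memory_fractions)
    ranges, start = [], 0
    for i, frac in enumerate(memory_fractions):
        # The last device takes the remainder so every layer is covered.
        if i == len(memory_fractions) - 1:
            count = n_layers - start
        else:
            count = round(n_layers * frac / total)
        ranges.append((start, start + count))
        start += count
    return ranges

# e.g. a 32-layer model across a 16 GB MacBook and two 8 GB phones
print(partition_layers(32, [16, 8, 8]))  # → [(0, 16), (16, 24), (24, 32)]
```

At inference time, activations flow from each device's last layer to the next device's first, so only small intermediate tensors cross the network rather than full model weights.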
On-device AI inference SDK and platform
Nexa AI is an on-device AI deployment and research company that provides a unified inference stack for running large language models and multimodal AI models locally on NPUs, GPUs, and CPUs. Its flagship product, NexaSDK, enables developers to deploy state-of-the-art AI models across mobile (Android and iOS), desktop (Windows, macOS, Linux), automotive, and IoT platforms with a single line of code. The company also offers Hyperlink, a private, offline AI assistant application powered by NexaSDK.
Optimizes GenAI inference for faster models
Pruna AI provides an open-source AI model optimization framework that applies efficiency methods like caching, pruning, quantization, and distillation to enhance AI model performance.
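Of those methods, quantization is the simplest to show in isolation: weights are mapped from float32 to a small integer range, with a scale factor to recover approximate values at compute time. A dependency-free sketch of symmetric int8 quantization (illustrative only, not Pruna's API):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: store one float scale
    plus int8 values in place of full float32 weights (~4x smaller)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return [v * scale for v in q]

q, s = quantize_int8([0.5, -1.27, 0.02])
approx = dequantize(q, s)  # values close to the originals
```

Frameworks like Pruna combine this with pruning, caching, and distillation, trading a small, controlled accuracy loss for lower memory use and faster inference.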
Generative AI platform for media models
FAL.AI offers a platform for building and deploying scalable generative AI applications, with a focus on serverless inference for media models. Its core product runs model endpoints as serverless functions on GPUs, and it serves enterprise clients.

