Categories: Inference

Companies in the category 'Inference'

These are companies that provide open-source tools and platforms for machine learning inference: deploying trained AI models and running them to serve predictions.

Showing 1-7 of 7 items
Baseten

Platform for deploying and managing AI models.

Baseten provides a platform for deploying and managing AI models in production, offering performant model runtimes, cross-cloud high availability, and seamless developer workflows.

Location: San Francisco, CA, USA
Founded: 2019
Industries
Software
Technologies
Model Deployment
LLM Workflows
Sectors
Enterprise
Licenses
MIT
Updated: March 14, 2026
4 people
3 headlines
Emcie

Optimized inference platform for Parlant

Emcie is an optimized inference platform for Parlant, an open-source context engineering framework for building customer-facing AI agents. Emcie reduces inference costs by up to 10x by automatically distilling large language model completions into smaller, fine-tuned models that maintain accuracy and behavioral alignment.

Location: Herzliya, Israel
Founded: 2023
Industries
Software
Technologies
AI Agents
LLMs
Sectors
Enterprise
Licenses
Apache-2.0
Updated: March 14, 2026
2 people
0 headlines
BentoML

AI inference platform for production

BentoML is an open-source unified inference platform for deploying and scaling AI models in production. It enables ML teams to build, ship, and scale AI inference services with any model on any cloud, offering full control over deployment without the complexity of managing infrastructure. BentoML was acquired by Modular in February 2026.

Location: San Francisco, CA, USA
Founded: 2019
Industries
Cloud Infrastructure
Technologies
MLOps
Model Deployment
Sectors
Developers
Licenses
Apache-2.0
Announcement: February 10, 2026
Modular: BentoML Joins Modular
Updated: March 13, 2026
2 people
1 headline
EXO Labs

Distributed AI inference on local devices

EXO Labs builds open-source software infrastructure that enables AI models to be trained and run across clusters of everyday consumer devices, from a single MacBook to thousands of interconnected machines. Its flagship project, exo, uses pipeline-parallel inference to distribute large language models across heterogeneous hardware without requiring a master-worker architecture. The company was founded in 2024 by Oxford University researchers and is headquartered in Oxford, UK.

Location: Oxford, United Kingdom
Founded: 2024
Industries
Software
Technologies
Edge AI
Distributed Computing
Sectors
Developers
Licenses
Apache-2.0
Updated: March 13, 2026
2 people
4 headlines
Nexa AI

On-device AI inference SDK and platform

Nexa AI is an on-device AI deployment and research company that provides a unified inference stack for running large language models and multimodal AI models locally on NPUs, GPUs, and CPUs. Its flagship product, NexaSDK, enables developers to deploy state-of-the-art AI models across mobile (Android and iOS), desktop (Windows, macOS, Linux), automotive, and IoT platforms with a single line of code. The company also offers Hyperlink, a private, offline AI assistant application powered by NexaSDK.

Location: Cupertino, CA, USA
Founded: 2023
Industries
Technologies
Edge AI
LLMs
Sectors
Enterprise
Licenses
Apache-2.0
Updated: March 13, 2026
2 people
3 headlines
Pruna AI

Optimizes GenAI models for faster inference

Pruna AI provides an open-source AI model optimization framework that applies efficiency methods like caching, pruning, quantization, and distillation to enhance AI model performance.

Location: Paris, France
Founded: 2021
Industries
Data & Analytics
Technologies
AI Performance
AI/ML
Sectors
Enterprise
Licenses
Apache-2.0
Updated: July 8, 2025
4 people
5 headlines
FAL.AI

Generative AI platform for media models

FAL.AI offers a platform for building and deploying scalable generative AI applications for media models, with a focus on serverless GPU inference. It serves enterprise clients.

Location: San Francisco, CA, USA
Founded: 2021
Industries
Media & Entertainment
Technologies
MLOps
LLMs
Sectors
Enterprise
Licenses
Apache-2.0
Updated: May 12, 2025
2 people
12 headlines