Companies in the category 'Inference'
These companies provide open-source tools and platforms for machine learning inference: deploying and running trained AI models to serve predictions.
Platform for deploying and managing AI models.
Baseten provides a platform for deploying and managing AI models in production, offering performant model runtimes, cross-cloud high availability, and seamless developer workflows.
Optimized inference platform for Parlant
Emcie is an optimized inference platform for Parlant, an open-source context engineering framework for building customer-facing AI agents. Emcie reduces inference costs by up to 10x by automatically distilling large language model completions into smaller, fine-tuned models that maintain accuracy and behavioral alignment.
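Distillation of this kind generally starts by collecting the large model's completions as supervised training pairs for a smaller student model. A minimal illustrative sketch in Python, where `teacher_complete` is a hypothetical stand-in for a large-model API call (not Emcie's or Parlant's actual code):

```python
# Sketch: build a sequence-level distillation dataset from teacher completions.
# `teacher_complete` is a placeholder standing in for a large-model API call.

def teacher_complete(prompt: str) -> str:
    # Placeholder "teacher": returns an uppercased echo for illustration.
    return prompt.upper()

def build_distillation_dataset(prompts):
    """Pair each prompt with the teacher's completion; the resulting
    (prompt, completion) pairs become fine-tuning data for a smaller
    student model that imitates the teacher's behavior."""
    return [{"prompt": p, "completion": teacher_complete(p)} for p in prompts]

dataset = build_distillation_dataset(["refund policy?", "reset my password"])
```

The student is then fine-tuned on these pairs, trading the teacher's per-token cost for a one-time training cost.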
AI inference platform for production
BentoML is an open-source unified inference platform for deploying and scaling AI models in production. It enables ML teams to build, ship, and scale AI inference services with any model on any cloud, offering full control over deployment without the complexity of managing infrastructure. BentoML was acquired by Modular in February 2026.
Distributed AI inference on local devices
EXO Labs builds open-source software infrastructure that enables AI models to be trained and run across clusters of everyday consumer devices, from a single MacBook to thousands of interconnected machines. Its flagship project, exo, uses pipeline-parallel inference to distribute large language models across heterogeneous hardware without requiring a master-worker architecture. The company was founded in 2024 by Oxford University researchers and is headquartered in Oxford, UK.
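Pipeline-parallel inference of this style can be pictured as slicing a model's layer stack into contiguous segments, one per device, sized to each device's available memory. An illustrative sketch of such a partitioning scheme (not exo's actual code):

```python
def partition_layers(n_layers: int, memory_fractions):
    """Assign contiguous layer ranges to devices in proportion to each
    device's share of total memory. Every device owns a slice; there is
    no master node coordinating the others."""
    total = sum(memory_fractions)
    ranges, start = [], 0
    for i, frac in enumerate(memory_fractions):
        # The last device takes the remainder so every layer is covered.
        if i == len(memory_fractions) - 1:
            count = n_layers - start
        else:
            count = round(n_layers * frac / total)
        ranges.append((start, start + count))
        start += count
    return ranges

# e.g. a 32-layer model across a 16 GB MacBook and two 8 GB phones
print(partition_layers(32, [16, 8, 8]))  # → [(0, 16), (16, 24), (24, 32)]
```

At inference time, activations flow from each device's last layer to the next device's first, so only small intermediate tensors cross the network rather than full model weights.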
On-device AI inference SDK and platform
Nexa AI is an on-device AI deployment and research company that provides a unified inference stack for running large language models and multimodal AI models locally on NPUs, GPUs, and CPUs. Its flagship product, NexaSDK, enables developers to deploy state-of-the-art AI models across mobile (Android and iOS), desktop (Windows, macOS, Linux), automotive, and IoT platforms with a single line of code. The company also offers Hyperlink, a private, offline AI assistant application powered by NexaSDK.
Optimizes GenAI inference for faster models
Pruna AI provides an open-source AI model optimization framework that applies efficiency methods like caching, pruning, quantization, and distillation to enhance AI model performance.
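Of those methods, quantization is the simplest to show in isolation: weights are mapped from float32 to a small integer range, with a scale factor to recover approximate values at compute time. A dependency-free sketch of symmetric int8 quantization (illustrative only, not Pruna's API):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: store one float scale
    plus int8 values in place of full float32 weights (~4x smaller)."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid zero scale
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights at inference time.
    return [v * scale for v in q]

q, s = quantize_int8([0.5, -1.27, 0.02])
approx = dequantize(q, s)  # values close to the originals
```

Frameworks like Pruna combine this with pruning, caching, and distillation, trading a small, controlled accuracy loss for lower memory use and faster inference.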
Generative AI platform for media models
FAL.AI offers a platform for building and deploying scalable generative AI applications, with a focus on serverless inference for media models. Its core product runs model endpoints as serverless functions on GPUs, and it serves enterprise clients.

