Unstructured Technologies
San Francisco, CA, USA
Founded 2022
Unstructured provides an open-source ETL platform and commercial API for ingesting and preprocessing unstructured documents, such as PDFs, HTML, Word files, and images, into formats ready for use with large language models (LLMs). The platform supports over 64 file types and offers connectors to enterprise data sources, enabling organizations to build retrieval-augmented generation (RAG) pipelines and agentic AI workflows at scale.
Last Updated: March 22, 2026
Current Valuation
$230M
as of March 14, 2024
Funding Summary
$65M
Total reported funding
Announcement
March 9, 2026
Business Wire: Unstructured and Teradata Partner to Make Enterprise Data AI-Ready at Scale
Announcement
February 18, 2026
Business Wire: Unstructured Awarded $2M AFWERX TACFI to Advance Multimodal Data Pipelines and Test & Evaluation Frameworks for Generative AI
Announcement
January 16, 2026
Business Wire: Unstructured Awarded $1 Million DAF DTO Contract to Deliver AI Data Layer for Scalable, Cost-Controlled GenAI at the Tactical Edge
Core OSS Projects
Open-source ETL library for converting complex documents into clean, structured formats for language models
License: Apache-2.0
Business Information
Industries
Data & Analytics
Technologies
LLMs
RAG
ETL
Unstructured Data
Data Pipelines
Document Processing
Sectors
Enterprise
Licenses
Apache-2.0
