Companies in the category 'Web Scraping'

These are companies that provide open-source tools for extracting data from websites.

Crawl4AI

Open-source, LLM-friendly web crawler

Crawl4AI is an open-source, LLM-friendly web crawler and scraper that delivers AI-ready web data for large language models, AI agents, retrieval-augmented generation (RAG) pipelines, and data pipelines.

Location: Singapore
Founded: 2024
Industries
Various
Technologies
RAG, Data Extraction
Sectors
Developers
Licenses
Apache-2.0
Updated: March 25, 2026
Apify

Web scraping and data extraction platform

Apify is a full-stack web scraping and data extraction platform that enables developers and enterprises to extract structured data from any website and automate web workflows. The platform offers a marketplace of over 19,000 pre-built Actors for scraping popular sites, as well as tools to build, deploy, and monetize custom scraping solutions. Apify also develops Crawlee, an open-source web crawling and browser automation library for Node.js and Python.

Location: Prague, Czech Republic
Founded: 2015
Industries
Data & Analytics
Technologies
Browser Automation, Data Extraction
Sectors
Developers
Licenses
Apache-2.0
Updated: March 14, 2026
ScrapeGraphAI

AI-powered web scraping API for agents

ScrapeGraphAI provides an AI-powered web scraping API designed for autonomous AI agents and developers. The platform uses large language models and graph logic to extract structured data from any website through natural language prompts, eliminating the need for custom selectors or proxy management. ScrapeGraphAI offers multiple API endpoints including SmartScraper, SearchScraper, SmartCrawler, and an agentic browser automation interface, and is built on top of its open-source Python library of the same name.

Location: Padua, Italy
Founded: 2024
Industries
Data & Analytics
Technologies
AI Agents, LLMs
Sectors
Developers
Licenses
MIT
Updated: March 13, 2026
Firecrawl

Web crawling API that delivers clean data for LLMs.

Firecrawl provides a real-time web crawling API that delivers structured data. It uses techniques such as headless browsers and adaptive extraction to extract and transform web data efficiently, and it lets users extract exactly the data they need. Its customers are primarily enterprises.

Location: San Francisco, CA, USA
Founded: 2022
Industries
Marketing, Market Research
Technologies
LLMs, Data Extraction
Sectors
Enterprise
Licenses
AGPL-3.0
Updated: August 20, 2025

Reworkd

Automated web data extraction platform.

Reworkd is an AI-driven platform for automating business process workflows. It aims to democratize access to AI through community-driven solutions.

Location: San Francisco, CA, USA
Founded: 2022
Industries
Software
Technologies
AI Agents, Multi-Agent Systems
Sectors
Enterprise
Licenses
Unknown
Updated: May 3, 2025