# Firecrawl

Web scraping and crawling tools powered by the Firecrawl API.

The `@tooly/firecrawl` package provides AI-ready web scraping and crawling tools powered by the Firecrawl API. Extract content from websites, crawl entire domains, and perform web searches with AI assistance.
## Installation

```bash
npm install @tooly/firecrawl
```
## Quick Start

```ts
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { createAITools } from '@tooly/firecrawl'

const tools = createAITools(process.env.FIRECRAWL_API_KEY!)

const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape the content from https://example.com and summarize it',
    },
  ],
  tools,
})

console.log(result.text)
```
## Setup

### 1. Get Your Firecrawl API Key

- Sign up at Firecrawl
- Go to your API Keys page
- Create a new API key
- Copy the key for use in your application

### 2. Environment Variables

Store your API key securely:

```bash
FIRECRAWL_API_KEY=fc_your_api_key_here
```
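If the key lives in a local `.env` file, load it before the tools are created. A minimal sketch, assuming the `dotenv` package is installed:

```ts
// Assumes the dotenv package; 'dotenv/config' loads .env into process.env.
import 'dotenv/config'
import { createAITools } from '@tooly/firecrawl'

const tools = createAITools(process.env.FIRECRAWL_API_KEY!)
```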
### 3. Initialize the Tools

```ts
import { createAITools } from '@tooly/firecrawl'

const tools = createAITools(process.env.FIRECRAWL_API_KEY!)
```
## Available Tools

The Firecrawl package provides the following AI tools:

### scrapeUrl

Scrapes a single URL and extracts its content in various formats.

**Parameters:**

- `url` (string, required): The URL to scrape
- `formats` (array, optional): Output formats to return (default: `["markdown"]`). Available formats: `markdown`, `html`, `rawHtml`, `links`, `screenshot`, `screenshot@fullPage`, `extract`, `json`, `changeTracking`
- `headers` (object, optional): Custom headers for the request
- `includeTags` (array, optional): HTML tags to include in output
- `excludeTags` (array, optional): HTML tags to exclude from output
- `onlyMainContent` (boolean, optional): Extract only main content (default: false)
- `timeout` (number, optional): Request timeout in milliseconds
- `waitFor` (number, optional): Time to wait before scraping
- `actions` (array, optional): Actions to perform before scraping
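The model normally fills these parameters in from the conversation, but it helps to see the shape. A minimal sketch of a `ScrapeUrlParams` object (the type is exported by the package; the specific values here are illustrative, not defaults):

```ts
import type { ScrapeUrlParams } from '@tooly/firecrawl'

// Illustrative values only; pick formats and filters to suit the target page.
const scrapeParams: ScrapeUrlParams = {
  url: 'https://example.com/blog/latest',
  formats: ['markdown', 'links'],
  onlyMainContent: true, // strip navigation, footers, and sidebars
  excludeTags: ['aside'], // drop <aside> elements from the output
  timeout: 30000, // abort the request after 30 seconds
  waitFor: 2000, // give client-side rendering 2 seconds to settle
}
```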
**Example:**

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape the latest blog post from https://blog.example.com and extract the main content',
    },
  ],
  tools,
})
```
### crawlUrl

Crawls a website starting from a given URL and extracts content from multiple pages.

**Parameters:**

- `url` (string, required): The starting URL to crawl
- `limit` (number, optional): Maximum number of pages to crawl
- `scrapeOptions` (object, optional): Options for scraping each page
- `maxDepth` (number, optional): Maximum crawl depth
- `allowedDomains` (array, optional): Domains allowed for crawling
- `blockedDomains` (array, optional): Domains to exclude from crawling
- `allowBackwardLinks` (boolean, optional): Allow crawling backward links
- `allowExternalLinks` (boolean, optional): Allow crawling external links
- `ignoreSitemap` (boolean, optional): Ignore sitemap when crawling
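For a scoped crawl, the argument object typically combines a page budget with per-page scrape options. A sketch with illustrative values (shown as a plain object, since the exported parameter type name for crawls isn't listed here):

```ts
// Illustrative crawl arguments for a small documentation crawl.
const crawlArgs = {
  url: 'https://docs.example.com',
  limit: 10, // stop after 10 pages
  maxDepth: 2, // follow links at most 2 hops from the start URL
  allowExternalLinks: false, // stay on the starting site
  scrapeOptions: {
    formats: ['markdown'],
    onlyMainContent: true,
  },
}
```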
**Example:**

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Crawl https://docs.example.com and extract documentation from up to 10 pages',
    },
  ],
  tools,
})
```
### mapUrl

Maps and discovers URLs from a website without scraping content.

**Parameters:**

- `url` (string, required): The URL to map
- `search` (string, optional): Search query to filter URLs
- `ignoreSitemap` (boolean, optional): Ignore sitemap when mapping
- `includeSubdomains` (boolean, optional): Include subdomains in mapping
- `limit` (number, optional): Maximum number of URLs to return
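Because mapping skips content extraction, it is a cheap first pass over a site. A sketch of a filtered map call (illustrative values):

```ts
// Illustrative values; `search` narrows the discovered URLs by keyword.
const mapArgs = {
  url: 'https://example.com',
  search: 'pricing',
  includeSubdomains: false,
  limit: 100,
}
```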
**Example:**

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Map all the URLs on https://example.com to understand the site structure',
    },
  ],
  tools,
})
```
### search

Performs a web search and returns relevant content.

**Parameters:**

- `query` (string, required): Search query
- `limit` (number, optional): Maximum number of results
- `lang` (string, optional): Language code for the search
- `country` (string, optional): Country code for the search
- `location` (string, optional): Location for the search
- `scrapeOptions` (object, optional): Options for scraping search results
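Passing `scrapeOptions` lets the search results themselves be scraped, not just listed. A sketch (illustrative values):

```ts
// Illustrative values; scrapeOptions controls how each result page is scraped.
const searchArgs = {
  query: 'AI tools 2024',
  limit: 5,
  lang: 'en',
  country: 'us',
  scrapeOptions: { formats: ['markdown'], onlyMainContent: true },
}
```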
**Example:**

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Search for "AI tools 2024" and summarize the top 5 results',
    },
  ],
  tools,
})
```
### batchScrape

Scrapes multiple URLs efficiently in a single batch operation.

**Parameters:**

- `urls` (array, required): Array of URLs to scrape (1-1000 URLs)
- `formats` (array, optional): Output formats for each URL
- `headers` (object, optional): Custom headers for requests
- `includeTags` (array, optional): HTML tags to include
- `excludeTags` (array, optional): HTML tags to exclude
- `onlyMainContent` (boolean, optional): Extract only main content
- `timeout` (number, optional): Timeout per URL
**Example:**

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape these product pages and compare their features: https://product1.com, https://product2.com',
    },
  ],
  tools,
})
```
### checkCrawlStatus

Checks the status of a previously initiated crawl operation.

**Parameters:**

- `id` (string, required): The crawl job ID
**Example:**

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: "Check the status of crawl job abc123 and let me know if it's complete",
    },
  ],
  tools,
})
```
## AI Framework Integration

### AI SDK (Recommended)

```ts
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { createAITools } from '@tooly/firecrawl'

const tools = createAITools(process.env.FIRECRAWL_API_KEY!)

const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: "Scrape the competitor's pricing page and analyze their strategy",
    },
  ],
  tools,
})
```
### OpenAI SDK

```ts
import OpenAI from 'openai'
import { createOpenAIFunctions } from '@tooly/firecrawl'

const openai = new OpenAI()
const { tools, executeFunction } = createOpenAIFunctions(process.env.FIRECRAWL_API_KEY!)

const completion = await openai.chat.completions.create({
  model: 'gpt-4.1-nano',
  messages: [
    {
      role: 'user',
      content: 'Extract contact information from this company website',
    },
  ],
  tools,
})
```
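The completion may come back with tool calls instead of text, which you then dispatch yourself. A minimal sketch; the `executeFunction(name, args)` signature is an assumption about this package, so check its exported types:

```ts
// Hedged sketch: assumes executeFunction(name, args) runs the named tool.
const message = completion.choices[0].message
for (const call of message.tool_calls ?? []) {
  if (call.type === 'function') {
    const args = JSON.parse(call.function.arguments) // OpenAI returns args as a JSON string
    const result = await executeFunction(call.function.name, args)
    console.log(call.function.name, result)
  }
}
```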
### Anthropic SDK

```ts
import Anthropic from '@anthropic-ai/sdk'
import { createAnthropicTools } from '@tooly/firecrawl'

const anthropic = new Anthropic()
const { tools, executeFunction } = createAnthropicTools(process.env.FIRECRAWL_API_KEY!)

const completion = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024, // required by the Anthropic Messages API
  messages: [
    {
      role: 'user',
      content: 'Research the latest trends in web design by scraping design blogs',
    },
  ],
  tools,
})
```
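Claude returns tool invocations as `tool_use` content blocks rather than a `tool_calls` array. A sketch of dispatching them; as above, the exact `executeFunction` signature is an assumption:

```ts
// Hedged sketch: block.input already holds the parsed tool arguments.
for (const block of completion.content) {
  if (block.type === 'tool_use') {
    const result = await executeFunction(block.name, block.input)
    console.log(block.name, result)
  }
}
```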
## Usage Examples

### Content Research

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Research the top 3 AI startups by scraping their websites and summarizing their value propositions',
    },
  ],
  tools,
})
```
### Competitive Analysis

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: "Crawl our competitor's documentation site and identify features we don't have",
    },
  ],
  tools,
})
```
### Market Intelligence

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Search for recent news about our industry and extract key insights',
    },
  ],
  tools,
})
```
### Content Monitoring

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Monitor these news sites for mentions of our brand and alert me to any negative coverage',
    },
  ],
  tools,
})
```
## Best Practices

### Rate Limiting

Firecrawl has built-in rate limiting. For high-volume operations:

- Use `batchScrape` for multiple URLs
- Add delays between requests when needed (see the sketch below)
- Monitor your API usage in the dashboard
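When you do pace requests yourself, a promise-based delay is all that's needed. A minimal sketch; `urls` and `processUrl` are hypothetical placeholders for your own work:

```ts
// Promise-based delay for pacing sequential operations.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

for (const url of urls) {
  await processUrl(url) // hypothetical per-URL work (e.g. a scrape call)
  await sleep(1000) // wait one second between requests
}
```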
### Content Extraction

- Use `onlyMainContent: true` to extract just the main content
- Specify `formats` based on your needs (markdown for text, html for structure)
- Use `includeTags` and `excludeTags` to filter content
### Error Handling

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape this URL and handle any errors gracefully: https://example.com',
    },
  ],
  tools,
})

// The tools automatically handle errors and return structured responses
```
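Failures outside the tools themselves (an invalid API key, network problems) still surface as thrown errors from the model call, so it is worth a `try/catch`. A minimal sketch:

```ts
try {
  const result = await generateText({
    model: openai('gpt-4.1-nano'),
    messages: [{ role: 'user', content: 'Scrape https://example.com' }],
    tools,
  })
  console.log(result.text)
} catch (error) {
  // Provider-level failures (auth, network) land here; individual scrape
  // errors come back as structured tool responses instead.
  console.error('Generation failed:', error)
}
```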
### Performance Optimization

- Use `mapUrl` first to discover relevant pages before crawling (see the sketch after this list)
- Set appropriate `timeout` values for slow websites
- Use `limit` parameters to control resource usage
- Consider using `actions` for JavaScript-heavy sites
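The map-then-scrape pattern can run in one call if the model is allowed several tool rounds. A sketch assuming the AI SDK's `maxSteps` option:

```ts
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  maxSteps: 5, // assumed AI SDK option: lets the model map first, then scrape
  messages: [
    {
      role: 'user',
      content: 'Map https://docs.example.com, then scrape the 3 pages most relevant to authentication',
    },
  ],
  tools,
})
```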
## Troubleshooting

### Common Issues

- **403 Forbidden**: The website blocks automated requests
  - Try adding custom headers to mimic browser requests
  - Use different user agents
- **Timeout Errors**: Website is slow to respond
  - Increase the `timeout` parameter
  - Add a `waitFor` delay before scraping
- **Empty Content**: JavaScript-rendered content not captured
  - Use `actions` to interact with the page before scraping
  - Try different formats like `html` instead of `markdown`
- **Rate Limiting**: Too many requests
  - Use `batchScrape` for multiple URLs
  - Add delays between operations
  - Check your API limits in the dashboard
### Getting Help
- Check the Firecrawl Documentation
- Review error messages returned by the tools
- Monitor your usage in the Firecrawl Dashboard
## Type Safety

The package includes full TypeScript support with Zod validation:

```ts
import type { ScrapeUrlParams, ScrapeResponse } from '@tooly/firecrawl'

// All parameters and responses are fully typed
const params: ScrapeUrlParams = {
  url: 'https://example.com',
  formats: ['markdown', 'html'],
  onlyMainContent: true,
}
```