Firecrawl
Web scraping and crawling tools powered by the Firecrawl API
The @tooly/firecrawl package provides AI-ready web scraping and crawling tools powered by the Firecrawl API. Extract content from websites, crawl entire domains, and perform web searches with AI assistance.
Installation
```bash
npm install @tooly/firecrawl
```

Quick Start
```typescript
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { createAITools } from '@tooly/firecrawl'

const tools = createAITools(process.env.FIRECRAWL_API_KEY!)

const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape the content from https://example.com and summarize it',
    },
  ],
  tools,
})

console.log(result.text)
```

Setup
1. Get Your Firecrawl API Key
- Sign up at Firecrawl
- Go to your API Keys page
- Create a new API key
- Copy the key for use in your application
2. Environment Variables
Store your API key securely:
```bash
FIRECRAWL_API_KEY=fc_your_api_key_here
```
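If you keep the key in a `.env` file, you need to load it before calling `createAITools`. A minimal sketch, assuming the widely used `dotenv` package (not bundled with `@tooly/firecrawl`):

```typescript
// Load .env into process.env before reading the key (requires `npm install dotenv`)
import 'dotenv/config'

if (!process.env.FIRECRAWL_API_KEY) {
  throw new Error('FIRECRAWL_API_KEY is not set')
}
```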
3. Initialize the Tools

```typescript
import { createAITools } from '@tooly/firecrawl'

const tools = createAITools(process.env.FIRECRAWL_API_KEY!)
```

Available Tools
The Firecrawl package provides the following AI tools:
scrapeUrl
Scrapes a single URL and extracts its content in various formats.
Parameters:
- `url` (string, required): The URL to scrape
- `formats` (array, optional): Output formats to return (default: `["markdown"]`). Available formats: `markdown`, `html`, `rawHtml`, `links`, `screenshot`, `screenshot@fullPage`, `extract`, `json`, `changeTracking`
- `headers` (object, optional): Custom headers for the request
- `includeTags` (array, optional): HTML tags to include in the output
- `excludeTags` (array, optional): HTML tags to exclude from the output
- `onlyMainContent` (boolean, optional): Extract only the main content (default: `false`)
- `timeout` (number, optional): Request timeout in milliseconds
- `waitFor` (number, optional): Time to wait before scraping
- `actions` (array, optional): Actions to perform before scraping
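For reference, a plain parameters object following the list above might look like the following sketch. It uses the `ScrapeUrlParams` type from the Type Safety section below; the specific values are illustrative.

```typescript
import type { ScrapeUrlParams } from '@tooly/firecrawl'

// Illustrative values; adjust formats and timeout to your use case
const scrapeParams: ScrapeUrlParams = {
  url: 'https://example.com',
  formats: ['markdown', 'links'],
  onlyMainContent: true,
  timeout: 30_000, // 30 seconds
}
```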
Example:
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape the latest blog post from https://blog.example.com and extract the main content',
    },
  ],
  tools,
})
```

crawlUrl
Crawls a website starting from a given URL and extracts content from multiple pages.
Parameters:
- `url` (string, required): The starting URL to crawl
- `limit` (number, optional): Maximum number of pages to crawl
- `scrapeOptions` (object, optional): Options for scraping each page
- `maxDepth` (number, optional): Maximum crawl depth
- `allowedDomains` (array, optional): Domains allowed for crawling
- `blockedDomains` (array, optional): Domains to exclude from crawling
- `allowBackwardLinks` (boolean, optional): Allow crawling backward links
- `allowExternalLinks` (boolean, optional): Allow crawling external links
- `ignoreSitemap` (boolean, optional): Ignore the sitemap when crawling
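A crawl configuration following the parameters above could look like this sketch. The object shape mirrors the parameter list, but check the package's exported types (e.g. whether a `CrawlUrlParams` type exists) before relying on it.

```typescript
// Hypothetical params object mirroring the parameter list above
const crawlParams = {
  url: 'https://docs.example.com',
  limit: 10, // stop after 10 pages
  maxDepth: 2, // follow links at most two levels deep
  allowExternalLinks: false,
  scrapeOptions: { formats: ['markdown'] },
}
```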
Example:
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Crawl https://docs.example.com and extract documentation from up to 10 pages',
    },
  ],
  tools,
})
```

mapUrl
Maps and discovers URLs from a website without scraping content.
Parameters:
- `url` (string, required): The URL to map
- `search` (string, optional): Search query to filter URLs
- `ignoreSitemap` (boolean, optional): Ignore the sitemap when mapping
- `includeSubdomains` (boolean, optional): Include subdomains in the mapping
- `limit` (number, optional): Maximum number of URLs to return
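As a sketch under the same caveat (verify against the package types), a mapping request built from these parameters might be:

```typescript
// Hypothetical mapUrl params: discover blog URLs without scraping them
const mapParams = {
  url: 'https://example.com',
  search: 'blog', // only return URLs matching this query
  includeSubdomains: false,
  limit: 100,
}
```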
Example:
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Map all the URLs on https://example.com to understand the site structure',
    },
  ],
  tools,
})
```

search
Performs web search and returns relevant content.
Parameters:
- `query` (string, required): Search query
- `limit` (number, optional): Maximum number of results
- `lang` (string, optional): Language code for the search
- `country` (string, optional): Country code for the search
- `location` (string, optional): Location for the search
- `scrapeOptions` (object, optional): Options for scraping search results
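A search request built from the list above could look like this sketch (again, the exact object shape should be checked against the package types):

```typescript
// Hypothetical search params mirroring the parameter list above
const searchParams = {
  query: 'AI tools 2024',
  limit: 5,
  lang: 'en',
  country: 'us',
  scrapeOptions: { formats: ['markdown'] }, // also scrape each result page
}
```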
Example:
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Search for "AI tools 2024" and summarize the top 5 results',
    },
  ],
  tools,
})
```

batchScrape
Scrapes multiple URLs efficiently in a single batch operation.
Parameters:
- `urls` (array, required): Array of URLs to scrape (1-1000 URLs)
- `formats` (array, optional): Output formats for each URL
- `headers` (object, optional): Custom headers for requests
- `includeTags` (array, optional): HTML tags to include
- `excludeTags` (array, optional): HTML tags to exclude
- `onlyMainContent` (boolean, optional): Extract only the main content
- `timeout` (number, optional): Timeout per URL
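A batch request following these parameters might look like the sketch below; the object shape mirrors the list above and should be verified against the package types.

```typescript
// Hypothetical batchScrape params: up to 1000 URLs per call
const batchParams = {
  urls: ['https://product1.com', 'https://product2.com'],
  formats: ['markdown'],
  onlyMainContent: true,
  timeout: 30_000, // per-URL timeout in milliseconds
}
```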
Example:
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape these product pages and compare their features: https://product1.com, https://product2.com',
    },
  ],
  tools,
})
```

checkCrawlStatus
Checks the status of a previously initiated crawl operation.
Parameters:
- `id` (string, required): The crawl job ID
Example:
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: "Check the status of crawl job abc123 and let me know if it's complete",
    },
  ],
  tools,
})
```

AI Framework Integration
AI SDK (Recommended)
```typescript
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { createAITools } from '@tooly/firecrawl'

const tools = createAITools(process.env.FIRECRAWL_API_KEY!)

const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: "Scrape the competitor's pricing page and analyze their strategy",
    },
  ],
  tools,
})
```
OpenAI SDK

```typescript
import OpenAI from 'openai'
import { createOpenAIFunctions } from '@tooly/firecrawl'

const openai = new OpenAI()
const { tools, executeFunction } = createOpenAIFunctions(process.env.FIRECRAWL_API_KEY!)

const completion = await openai.chat.completions.create({
  model: 'gpt-4.1-nano',
  messages: [
    {
      role: 'user',
      content: 'Extract contact information from this company website',
    },
  ],
  tools,
})
```
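The completion above may contain tool calls that your code has to execute. A minimal dispatch loop might look like the following; it assumes `executeFunction(name, args)` returns the tool result, so verify the exact signature against the package before using it.

```typescript
// Hypothetical dispatch: run each requested tool and collect results
const message = completion.choices[0].message
for (const toolCall of message.tool_calls ?? []) {
  const args = JSON.parse(toolCall.function.arguments)
  // Assumed signature: executeFunction(toolName, parsedArgs)
  const result = await executeFunction(toolCall.function.name, args)
  console.log(toolCall.function.name, result)
}
```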
Anthropic SDK

```typescript
import Anthropic from '@anthropic-ai/sdk'
import { createAnthropicTools } from '@tooly/firecrawl'

const anthropic = new Anthropic()
const { tools, executeFunction } = createAnthropicTools(process.env.FIRECRAWL_API_KEY!)

const completion = await anthropic.messages.create({
  model: 'claude-3-5-sonnet-20241022',
  max_tokens: 1024, // required by the Anthropic API
  messages: [
    {
      role: 'user',
      content: 'Research the latest trends in web design by scraping design blogs',
    },
  ],
  tools,
})
```

Usage Examples
Content Research
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Research the top 3 AI startups by scraping their websites and summarizing their value propositions',
    },
  ],
  tools,
})
```

Competitive Analysis
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: "Crawl our competitor's documentation site and identify features we don't have",
    },
  ],
  tools,
})
```

Market Intelligence
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Search for recent news about our industry and extract key insights',
    },
  ],
  tools,
})
```

Content Monitoring
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Monitor these news sites for mentions of our brand and alert me to any negative coverage',
    },
  ],
  tools,
})
```

Best Practices
Rate Limiting
Firecrawl has built-in rate limiting. For high-volume operations:
- Use `batchScrape` for multiple URLs
- Add delays between requests when needed (see the sketch below)
- Monitor your API usage in the dashboard
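When you do need client-side pacing, a simple delay between sequential operations is enough. This sketch is generic TypeScript and assumes only the AI SDK setup shown earlier; the URLs and delay are illustrative.

```typescript
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms))

// Process URLs sequentially with a pause between requests
for (const url of ['https://a.example.com', 'https://b.example.com']) {
  await generateText({
    model: openai('gpt-4.1-nano'),
    messages: [{ role: 'user', content: `Scrape ${url} and summarize it` }],
    tools,
  })
  await sleep(1_000) // 1s pause to stay under rate limits
}
```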
Content Extraction
- Use `onlyMainContent: true` to extract just the main content
- Specify `formats` based on your needs (markdown for text, html for structure)
- Use `includeTags` and `excludeTags` to filter content (see the sketch below)
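Combining these options, a content-focused scrape request could look like this; the tag values are illustrative.

```typescript
import type { ScrapeUrlParams } from '@tooly/firecrawl'

// Keep article text, drop navigation and footer chrome (illustrative values)
const articleParams: ScrapeUrlParams = {
  url: 'https://blog.example.com/post',
  formats: ['markdown'],
  onlyMainContent: true,
  includeTags: ['article', 'main'],
  excludeTags: ['nav', 'footer', 'aside'],
}
```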
Error Handling
```typescript
const result = await generateText({
  model: openai('gpt-4.1-nano'),
  messages: [
    {
      role: 'user',
      content: 'Scrape this URL and handle any errors gracefully: https://example.com',
    },
  ],
  tools,
})

// The tools automatically handle errors and return structured responses
```
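If you also want to guard against failures outside the tools themselves (network issues, invalid API keys, model errors), a plain try/catch around the call is enough; this is generic error handling, not a package-specific API.

```typescript
try {
  const result = await generateText({
    model: openai('gpt-4.1-nano'),
    messages: [{ role: 'user', content: 'Scrape https://example.com' }],
    tools,
  })
  console.log(result.text)
} catch (error) {
  // Errors raised before/after tool execution (auth, network, model errors)
  console.error('Request failed:', error)
}
```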
Performance Optimization

- Use `mapUrl` first to discover relevant pages before crawling
- Set appropriate `timeout` values for slow websites
- Use `limit` parameters to control resource usage
- Consider using `actions` for JavaScript-heavy sites
Troubleshooting
Common Issues
- 403 Forbidden: The website blocks automated requests
  - Try adding custom headers to mimic browser requests (see the sketch after this list)
  - Use different user agents
- Timeout Errors: Website is slow to respond
  - Increase the `timeout` parameter
  - Add a `waitFor` delay before scraping
- Empty Content: JavaScript-rendered content not captured
  - Use `actions` to interact with the page before scraping
  - Try different formats like `html` instead of `markdown`
- Rate Limiting: Too many requests
  - Use `batchScrape` for multiple URLs
  - Add delays between operations
  - Check your API limits in the dashboard
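For the 403 case, custom headers can be passed through the `headers` parameter documented under scrapeUrl; the user agent string below is only an example.

```typescript
import type { ScrapeUrlParams } from '@tooly/firecrawl'

// Example browser-like headers for sites that block default clients
const params: ScrapeUrlParams = {
  url: 'https://example.com',
  headers: {
    'User-Agent':
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
  },
}
```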
Getting Help
- Check the Firecrawl Documentation
- Review error messages returned by the tools
- Monitor your usage in the Firecrawl Dashboard
Type Safety
The package includes full TypeScript support with Zod validation:
```typescript
import type { ScrapeUrlParams, ScrapeResponse } from '@tooly/firecrawl'

// All parameters and responses are fully typed
const params: ScrapeUrlParams = {
  url: 'https://example.com',
  formats: ['markdown', 'html'],
  onlyMainContent: true,
}
```