Website scraping lets you turn any website into chatbot knowledge. FIFE.BOT uses Firecrawl to crawl pages, extract clean text, and index it for retrieval.

Adding a Website Source

1. Open Knowledge Base tab

Go to your chatbot → Knowledge Base tab.

2. Click Add Source → Website

Enter the root URL of the website you want to scrape (e.g. https://docs.example.com).

3. Discover pages

FIFE.BOT crawls the sitemap and lists all discovered pages. You’ll see:
  • Page URL
  • Page title (if available)
  • Estimated content size

4. Select pages

Check the pages you want to include. You can select all or pick individual pages.

5. Scrape

Click Scrape selected. Each page is:
  1. Fetched via Firecrawl
  2. Cleaned (HTML → text)
  3. Split into chunks
  4. Embedded as vectors
  5. Indexed for search
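
To make the cleaning and chunking stages listed above more concrete, here is a minimal Python sketch of turning HTML into plain text and splitting it into overlapping chunks. It is not FIFE.BOT's or Firecrawl's actual implementation: the function names `html_to_text` and `chunk_text`, the word-based splitting, and the default sizes are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Hypothetical 'clean' step: strip tags and return space-joined text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Hypothetical 'split' step: word windows that overlap by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

In a pipeline of this shape, each chunk would then be passed to an embedding model and stored in a search index. Overlapping the chunks slightly is a common way to avoid cutting a sentence's context at a chunk boundary.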

Adding More Pages Later

You can add more pages to an existing website source at any time. Open the source, click Add pages, and select additional URLs from the sitemap.

Auto-Reindex

Website sources are automatically re-scraped on a schedule to keep your knowledge base up to date. The scrape worker runs every 30 minutes and processes sources based on their configured reindex interval.
The number of knowledge sources you can attach to a chatbot depends on your plan (see Billing & Plans). Website sources are one type of knowledge source.
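
The relationship between the 30-minute worker run and a per-source reindex interval can be pictured with a short Python sketch. This is only a guess at the general shape of such a scheduler, not FIFE.BOT's actual code: the `WebsiteSource` class, its fields, and the `due_sources` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class WebsiteSource:
    """Hypothetical record for a website source (assumed fields)."""
    name: str
    reindex_interval: timedelta
    last_scraped_at: Optional[datetime]  # None means never scraped


def due_sources(sources: List[WebsiteSource], now: datetime) -> List[WebsiteSource]:
    """Return the sources whose reindex interval has elapsed as of `now`."""
    due = []
    for source in sources:
        never_scraped = source.last_scraped_at is None
        if never_scraped or now - source.last_scraped_at >= source.reindex_interval:
            due.append(source)
    return due


# Example of one worker run (the worker itself would repeat this every 30 minutes).
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
sources = [
    WebsiteSource("docs", timedelta(days=1), datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)),
    WebsiteSource("blog", timedelta(days=1), datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)),
    WebsiteSource("new-site", timedelta(days=7), None),
]
for source in due_sources(sources, now):
    print(f"re-scrape {source.name}")
```

Under this model, a source is picked up on the first worker run after its interval has elapsed, so the actual gap between scrapes can exceed the configured interval by up to one worker cycle.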

Routing Instructions

You can add routing instructions to a website source to give the AI extra context. For example:
“This source contains our product documentation. When referencing it, always include the relevant product version number.”

Processing Status

| Status | Meaning |
| --- | --- |
| Ready | Page is indexed and searchable |
| Processing | Page is being scraped and embedded |
| Error | Scraping failed. Hover to see the error message, click to retry |

Troubleshooting

| Issue | Solution |
| --- | --- |
| No pages discovered | Check if the site has a sitemap.xml. Some SPAs don’t expose one. |
| Page scrape failed | The page might be behind authentication, have anti-bot protection, or return errors. |
| Content seems outdated | Check the last reindex date. You can manually trigger a resync. |
| Too many pages | Select only the most relevant pages. You don’t need to scrape your entire site. |
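
If no pages are discovered, you can check the sitemap yourself before retrying. The Python sketch below uses only the standard library and assumes the sitemap lives at `/sitemap.xml` under the root URL, which is common but not guaranteed (some sites declare a different location in robots.txt). The function names `parse_sitemap` and `fetch_sitemap_urls` are illustrative and not part of FIFE.BOT.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Union

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_data: Union[bytes, str]) -> List[str]:
    """Return every <loc> value found in a sitemap document.

    For a sitemap index file, the returned URLs point to child sitemaps
    rather than to pages.
    """
    root = ET.fromstring(xml_data)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def fetch_sitemap_urls(root_url: str, timeout: float = 10.0) -> List[str]:
    """Download <root_url>/sitemap.xml (assumed location) and list its URLs."""
    sitemap_url = root_url.rstrip("/") + "/sitemap.xml"
    with urllib.request.urlopen(sitemap_url, timeout=timeout) as response:
        return parse_sitemap(response.read())
```

Calling `fetch_sitemap_urls("https://docs.example.com")` raises an `urllib.error.HTTPError` if the file is missing, which would suggest the site does not expose a sitemap at that path. An empty list means the file exists but lists no URLs.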