Website Crawling / Import

Automatically import your existing help center or website content into the Ticket0 knowledge base.

If you have an existing help center, FAQ page, or documentation site, Ticket0 can crawl it and import the content as knowledge-base articles automatically.

Adding a source

  1. Go to Knowledge base in the sidebar
  2. Select the Website sources tab
  3. Click Add source
  4. Configure the crawl:
    • Root URL — the start of the crawl, e.g. https://help.yourcompany.com
    • Max depth — how many link levels to follow from the root (default: 3; set to 0 to crawl only the root page)
    • Include patterns — comma-separated URL path fragments to include (e.g. /articles, /docs). Leave empty to include everything under the root.
    • Exclude patterns — path fragments to skip (e.g. /login, /account). Applied after the include filter.
    • Refresh interval: manual, daily, or weekly. Controls automatic re-crawls.
  5. Save — the first crawl kicks off in the background
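To make the interaction between Max depth and the include/exclude patterns concrete, here is a minimal Python sketch of the rules described above. It is not Ticket0's implementation. The function names, the substring matching of path fragments, and the choice to keep following links from pages that fail the include filter are all assumptions made for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def parse_patterns(raw: str) -> list[str]:
    """Split a comma-separated pattern field into clean path fragments."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def url_matches(url: str, root: str, include: list[str], exclude: list[str]) -> bool:
    """Root check first, then the include filter, then the exclude filter."""
    # Simplified prefix test; a real crawler would compare scheme and host properly.
    if not url.startswith(root):
        return False
    path = urlparse(url).path
    # An empty include list means "everything under the root".
    # Assumption: a fragment matches when it appears anywhere in the path.
    if include and not any(frag in path for frag in include):
        return False
    # Exclude patterns are applied after the include filter.
    return not any(frag in path for frag in exclude)


def plan_crawl(root, links, max_depth, include, exclude):
    """Breadth-first walk over a link map, at most max_depth levels from the root.

    `links` maps each URL to the URLs it links to; it stands in for real
    page fetching so the sketch stays self-contained.
    """
    seen = {root}
    queue = deque([(root, 0)])
    matched = []
    while queue:
        url, depth = queue.popleft()
        if url_matches(url, root, include, exclude):
            matched.append(url)
        if depth == max_depth:
            continue  # depth 0 therefore means "root page only"
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return matched


# Hypothetical site used only for illustration.
ROOT = "https://help.example.com"
LINKS = {
    ROOT: [ROOT + "/articles/reset", ROOT + "/login"],
    ROOT + "/articles/reset": [ROOT + "/articles/reset/sso"],
}
pages = plan_crawl(ROOT, LINKS, 1, parse_patterns("/articles, /docs"), parse_patterns("/login"))
```

With a depth of 1 and the patterns shown, only the /articles/reset page is selected in this sketch: the root fails the include filter, /login is filtered out, and the SSO page sits two levels from the root.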

What gets imported

Ticket0 fetches each matching page, extracts the main content (stripping navigation, footers, and other boilerplate), and creates a knowledge-base article per page — or per chunk, for long pages. Imported articles are:

  • Published automatically so they're immediately eligible for AI retrieval
  • Tagged crawled and filed under the category website so you can separate them from operator-authored articles when filtering
  • Linked to the source crawl page so the next refresh updates the same article in place

Because imported articles are live from the moment they land, spot-check the first few after you add each new source. Extracted content sometimes includes stray navigation, sidebar links, or outdated policies. Edit or delete those articles before AI retrieval surfaces that content to operators or customers.

Reviewing imported articles

In the Articles tab, filter the list by the tag crawled (or by the website category) to isolate auto-imported content. From there you can:

  • Edit any article to correct extraction mistakes
  • Flip an article from Published → Draft to temporarily remove it from AI retrieval
  • Delete individual articles you don't want in the knowledge base

The original crawl source keeps a record of every page it fetched, so deleted articles are recreated on the next refresh unless the underlying page is excluded by your include/exclude patterns or removed from the source site. If you never want a page imported, add its path to the source's Exclude patterns before re-crawling.

Re-crawling

Active sources re-run on their configured Refresh interval:

  • Manual — never re-crawls; you trigger each run from the source's row in the Website sources tab
  • Daily — re-crawls once every 24 hours
  • Weekly — re-crawls once every 7 days
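The three interval settings can be read as a simple "is a re-crawl due?" check. The sketch below assumes the scheduler compares the time since the last run against a fixed period; the names `INTERVALS` and `is_due` are hypothetical, and Ticket0's actual scheduling may differ.

```python
from datetime import datetime, timedelta, timezone

# Period per interval setting; None means "never re-crawl automatically".
INTERVALS = {
    "manual": None,
    "daily": timedelta(hours=24),
    "weekly": timedelta(days=7),
}


def is_due(interval: str, last_run: datetime, now: datetime) -> bool:
    """Return True when an automatic re-crawl should start."""
    period = INTERVALS[interval]
    if period is None:
        return False  # manual sources only run when triggered by hand
    return now - last_run >= period


last = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
check = last + timedelta(hours=25)
due_daily = is_due("daily", last, check)    # 25 hours have passed
due_weekly = is_due("weekly", last, check)  # 7 days have not passed
```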

On each refresh, changed pages update their existing articles in place, and new pages create new articles. Pages that have been deleted from the source site are not automatically removed from the knowledge base: you need to archive or delete those articles manually (or add their paths to Exclude patterns so they stop being refreshed).
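The refresh behavior can be pictured as a comparison between the articles already linked to a source and the pages found by the latest crawl. This is an illustrative sketch only: the function name, the dictionary shapes, and the idea of comparing stored content against freshly fetched content are assumptions, not a description of Ticket0's internals.

```python
def plan_refresh(articles: dict[str, str], crawled: dict[str, str]) -> dict[str, list[str]]:
    """Classify pages for one refresh.

    articles: source URL -> content currently stored in the knowledge base
    crawled:  source URL -> content fetched by this crawl
    """
    plan = {"create": [], "update": [], "unchanged": [], "orphaned": []}
    for url, content in crawled.items():
        if url not in articles:
            # New page, or an article that was deleted by hand: it is created again.
            plan["create"].append(url)
        elif articles[url] != content:
            plan["update"].append(url)  # same article, updated in place
        else:
            plan["unchanged"].append(url)
    for url in articles:
        if url not in crawled:
            # Page no longer found by the crawl: the article is left alone,
            # so cleaning it up is a manual step.
            plan["orphaned"].append(url)
    return plan


# Hypothetical data for illustration.
stored = {"/articles/reset": "v1", "/articles/billing": "v1", "/articles/old": "v1"}
fetched = {"/articles/reset": "v2", "/articles/billing": "v1", "/articles/sso": "v1"}
result = plan_refresh(stored, fetched)
```

In this sketch, an article you deleted by hand falls into the "create" bucket on the next refresh because its page is still crawled, which is why excluding the path is the reliable way to keep it out.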

Which sites are supported

The crawler works on any publicly accessible website that serves plain HTML. There's no special integration for Intercom, Zendesk, Notion, or other help-center platforms — as long as the content is public and reachable over HTTPS, Ticket0 can extract it the same way. Sites that require login, use heavy client-side rendering without an HTML fallback, or block crawlers via robots.txt will produce empty or failed pages in the crawl.
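Before adding a source, you can check whether a site's robots.txt would block a crawler from the pages you care about. The sketch below uses Python's standard `urllib.robotparser` module on robots.txt text you have already downloaded. The user agent Ticket0 sends is not documented here, so the `"*"` default and the `allowed_by_robots` helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for illustration.
ROBOTS = """\
User-agent: *
Disallow: /account
"""

articles_ok = allowed_by_robots(ROBOTS, "https://help.example.com/articles/reset")
account_ok = allowed_by_robots(ROBOTS, "https://help.example.com/account/settings")
```

Here the articles page is allowed and the account page is not. A page that fails this check on your own site is a likely candidate for an empty or failed entry in the crawl.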
