Train your chatbot by adding content it can use to answer questions. SiteGPT supports multiple content sources including websites, files, and cloud storage integrations.

Content sources

SiteGPT supports four main types of training data to build your chatbot’s knowledge base.

1. Website Links

Add website content to train your chatbot:
  • Multiple Links - Import content from multiple URLs at once by pasting a list
  • Sitemap - Import all URLs from your sitemap.xml file automatically
  • Scrape Website - Recursively crawl and extract content from an entire website
  • YouTube - Import transcripts from YouTube videos, playlists, or channels
Learn more: Website Links documentation

2. Files (documents and cloud storage)

Upload files or connect cloud storage:
  • Local Files - Upload PDF, DOCX, TXT, CSV, and other document formats from your computer
  • Notion - Import pages from your Notion workspace
  • Google Drive - Sync documents from Google Drive
  • Dropbox - Import files from Dropbox
  • OneDrive - Connect to Microsoft OneDrive
  • Box - Import files from Box
Learn more: Files documentation

3. Text Snippets

Add plain text content directly without uploading files or adding links:
  • Perfect for FAQs, product descriptions, or company information
  • Supports up to 10,000 characters per chatbot
  • Ideal for content that doesn’t exist on your website or in documents
Learn more: Text Snippets documentation

4. Custom Responses (Q&A)

Override AI responses with pre-written answers for specific questions:
  • Create exact question-answer pairs
  • Ensure consistent responses for pricing, policies, or critical information
  • Takes priority over other content sources when questions match
Learn more: Custom Responses documentation
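
The priority rule above can be pictured as a lookup that runs before the AI answers. This is a minimal sketch, not SiteGPT's implementation: the `normalize` and `answer` helpers, the case-insensitive matching, and the sample refund text are all assumptions made for illustration.

```python
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different phrasings compare equal."""
    return " ".join(question.lower().split())


def answer(question: str,
           custom_responses: Dict[str, str],
           ai_fallback: Callable[[str], str]) -> str:
    """Return the pre-written answer when the question matches, otherwise defer to the AI."""
    lookup = {normalize(q): a for q, a in custom_responses.items()}
    key = normalize(question)
    if key in lookup:
        return lookup[key]  # a Custom Response takes priority over other sources
    return ai_fallback(question)


# Illustrative data only; the policy text is made up for this example.
custom = {"What is your refund policy?": "Refunds are available within 30 days."}
print(answer("what is your refund policy?", custom, lambda q: "(AI-generated answer)"))
print(answer("Do you ship abroad?", custom, lambda q: "(AI-generated answer)"))
```

How closely a visitor's wording must match a saved question is not specified here, so treat the normalization step as one possible choice.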

Adding content

1. Navigate to content

From your chatbot dashboard, choose the content type:
  • Links - For website content
  • Files - For documents and cloud storage
  • Text Snippets - For plain text content
  • Custom Responses - For Q&A pairs

2. Click Add or configure

  • For Links and Files: Click the Add Links or Add Files button
  • For Text Snippets: Paste your content directly in the text area
  • For Custom Responses: Click Add Custom Response

3. Choose your source (for Links and Files)

Select the type of content you want to add from the modal.

4. Configure and import

For Links:
  • Multiple Links: Paste URLs (one per line), configure advanced options, and click Add Links
  • Sitemap: Enter your sitemap URL, set max pages, configure filters, and click Add Links
  • Scrape Website: Enter your website URL, set recursion depth and max pages, configure filters, and click Add Links
  • YouTube: Enter video/playlist/channel URL and click Add YouTube Content
For Files:
  • Local Files: Click Browse File to select files (drag and drop is not supported)
  • Cloud Storage: Authenticate with your account, select files/folders using the picker, and click Sync Selected Files
For Text Snippets:
  • Paste your text content (up to 10,000 characters) and click Save Changes
For Custom Responses:
  • Enter the question and your exact answer, then click Save
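
Before pasting into the Multiple Links box, it can help to clean the list locally. The sketch below is a hypothetical helper and not part of SiteGPT. It assumes only what the step above states (one URL per line); `prepare_url_list` is an illustrative name.

```python
from typing import List
from urllib.parse import urlparse


def prepare_url_list(raw: str) -> List[str]:
    """Return unique, absolute http(s) URLs from pasted text, one per line, in order."""
    seen = set()
    urls = []
    for line in raw.splitlines():
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        parsed = urlparse(candidate)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # skip anything that is not an absolute web URL
        if candidate not in seen:
            seen.add(candidate)
            urls.append(candidate)
    return urls


pasted = "https://example.com/a\n\nhttps://example.com/a\nnot a url\nhttps://example.com/b  \n"
print("\n".join(prepare_url_list(pasted)))
```

Dropping duplicates before import means repeated URLs cannot count against the links quota more than once.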

5. Wait for processing (Links and Files only)

Content is processed in the background:
  • Small amounts (10-50 pages/files): 2-5 minutes
  • Medium amounts (50-200 pages/files): 5-15 minutes
  • Large amounts (200+ pages/files): 15-30+ minutes
You can close the page - processing continues in the background.

Note: Text Snippets and Custom Responses are available immediately after saving.

Content processing

When you add content, SiteGPT:
  1. Extracts text from your pages or files
  2. Chunks content into manageable segments for better retrieval
  3. Creates embeddings using AI for semantic search
  4. Indexes content for fast retrieval during conversations
  5. Makes it available to your chatbot immediately after processing
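
To make the chunking step concrete, here is a minimal sketch of one common approach: fixed-size word windows with a small overlap. SiteGPT's actual chunk size and strategy are not documented here, so `chunk_text` and its defaults are assumptions chosen for illustration.

```python
from typing import List


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into word windows of at most max_words, each sharing `overlap` words with the previous one."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("max_words must be positive and overlap must be in [0, max_words)")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
for chunk in chunk_text(sample, max_words=4, overlap=1):
    print(chunk)
```

Overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of indexing a few words twice.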

Managing content

View content status

Both Links and Files pages show:
  • Trained - Successfully processed and available to the chatbot
  • Pending - Currently being processed
  • Failed - Processing failed (hover for error details)

Bulk actions

Select multiple items to:
  • Resync - Reprocess content to pick up changes
  • Delete - Remove content from your chatbot

Search and filter

Use the search bar to find specific content, and filter by status to see trained, pending, or failed items.

Advanced configuration options

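Advanced options for link imports are grouped under Basic Settings, Auto-sync, URL Filtering, Content Filtering, and Custom Headers. The basic settings for each source type are listed below.

Sitemap

Max pages to scrape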
  • Set the maximum number of pages to import from the sitemap
  • Limited by your remaining link quota
  • Example: If you have 100 links remaining, you can import up to 100 pages
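
To estimate how many pages a sitemap import would bring in, you can parse the sitemap yourself and apply the same cap. This is a rough sketch using Python's standard library. `sitemap_urls` and `pages_to_import` are hypothetical helpers, and it is an assumption that pages are taken in sitemap order.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> List[str]:
    """Extract every <loc> value from a standard sitemap <urlset>."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def pages_to_import(urls: List[str], max_pages: int, remaining_quota: int) -> List[str]:
    """Apply the effective cap: the smaller of max_pages and the remaining link quota."""
    limit = max(0, min(max_pages, remaining_quota))
    return urls[:limit]


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/faq</loc></url>
</urlset>"""

urls = sitemap_urls(SAMPLE)
print(len(urls), "URLs in sitemap")
print(pages_to_import(urls, max_pages=100, remaining_quota=2))
```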

Scrape Website

Recursion depth
  • Number of levels (1-5) to scrape from the website
  • 1 means only the root level pages will be scraped
  • 2 means root level pages and pages linked from them
  • 3 means three levels deep, and so on
  • Higher depth = more pages discovered and scraped
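
The depth levels can be illustrated with a breadth-first walk over a small in-memory link graph. This is a sketch of the idea only, not SiteGPT's crawler. The `crawl` function, the sample graph, and the way the page cap is enforced are assumptions for illustration. The root page counts as level 1, matching the description above.

```python
from collections import deque
from typing import Dict, List


def crawl(links: Dict[str, List[str]], root: str, depth: int, max_pages: int) -> List[str]:
    """Breadth-first walk where the root is level 1 and pages linked from it are level 2, and so on."""
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    if max_pages <= 0:
        return []
    visited = [root]
    seen = {root}
    queue = deque([(root, 1)])
    while queue and len(visited) < max_pages:
        page, level = queue.popleft()
        if level >= depth:
            continue  # pages at the deepest allowed level are kept but not expanded
        for target in links.get(page, []):
            if target not in seen and len(visited) < max_pages:
                seen.add(target)
                visited.append(target)
                queue.append((target, level + 1))
    return visited


# Hypothetical site structure: each key links to the pages in its list.
site = {"/": ["/a", "/b"], "/a": ["/c"], "/b": ["/c", "/"], "/c": ["/d"]}
for d in (1, 2, 3):
    print(d, crawl(site, "/", depth=d, max_pages=100))
```

Each extra level can multiply the number of discovered pages, so raising the depth one step at a time makes quota usage easier to predict.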
Max pages to scrape
  • Set the maximum number of pages to import from the website
  • Limited by your remaining link quota
  • The crawler will stop when it reaches this limit

Local Files

  • Supported formats: PDF, CSV, DOC, DOCX, TXT, and other document formats
  • File size limit: Up to 10 MB per file
  • Multiple files: You can select and upload multiple files at once
  • Drag and drop: Not supported - use the “Browse File” button to select files
  • File management: After selecting files, you can review and remove individual files before uploading

Cloud Storage

Connection management
  • Multiple connections: You can create multiple connections to the same service (e.g., multiple Google Drive accounts)
  • Connection status: Each connection shows whether access is granted or revoked
  • Access control: You can revoke and re-grant access to connections at any time
File selection
  • Picker interface: Use the native picker interface for each service to select files/folders
  • Notion: Select specific pages from your workspace
  • Other services: Select individual files or entire folders to sync
Syncing behavior
  • Initial sync: When you first select files, click “Sync Selected Files” to process them
  • Add more files: You can add more files to an existing connection at any time
  • Modify selection: For Notion, you can modify your page selection; for other services, add more files
  • File management: View all synced files, their status, and last sync time in the connection interface

Auto-sync

Keep your chatbot’s knowledge current by enabling auto-sync for website content:
  1. When adding links, look for the Auto-sync frequency option
  2. Choose sync frequency based on your plan:
    • Never (Manual only) - No automatic syncing (all plans)
    • Monthly (Growth) - Sync every month (Growth plan and above)
    • Weekly (Scale) - Sync every week (Scale plan and above)
    • Daily (Enterprise) - Sync every day (Enterprise plan only)
  3. SiteGPT automatically checks for updates and retrains your chatbot
View and manage auto-sync jobs on the Auto-Sync Jobs page.
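
The plan tiers above amount to a small lookup table. The sketch below encodes them for illustration only. `allowed_frequencies` is a hypothetical helper, and treating any plan name outside Growth, Scale, and Enterprise as a lower tier is an assumption.

```python
from typing import List, Optional

# Plan order as implied by "and above" in the list; lower tiers are not named here.
PLAN_ORDER = ["Growth", "Scale", "Enterprise"]

# Minimum plan for each frequency; None means available on all plans.
MIN_PLAN = {"Never": None, "Monthly": "Growth", "Weekly": "Scale", "Daily": "Enterprise"}


def plan_rank(plan: Optional[str]) -> int:
    """Rank 0 for unlisted (lower) plans, then 1..3 for Growth, Scale, Enterprise."""
    return PLAN_ORDER.index(plan) + 1 if plan in PLAN_ORDER else 0


def allowed_frequencies(plan: str) -> List[str]:
    """List the auto-sync frequencies a plan can select."""
    rank = plan_rank(plan)
    return [freq for freq, min_plan in MIN_PLAN.items()
            if min_plan is None or rank >= plan_rank(min_plan)]


for plan in PLAN_ORDER:
    print(plan, allowed_frequencies(plan))
```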

Content limits

Your plan determines how much content you can add:
  • Links quota: Number of web pages you can train on
  • Files quota: Number of files you can upload
Check your current usage in Account > Usage. Upgrade your plan to add more content.

Best practices

Choose quality content

  • Add pages with clear, well-written information
  • Avoid duplicate or redundant content
  • Include FAQs and common questions
  • Add product documentation and guides

Organize your content

  • Use descriptive file names
  • Remove outdated or irrelevant pages
  • Keep content up to date with auto-sync
  • Group related content together

Optimize for answers

  • Use clear headings and structure
  • Write in a question-and-answer format when possible
  • Include specific details and examples
  • Avoid overly technical jargon unless necessary

Troubleshooting

If your content isn’t training, check these common issues:
  • Verify URLs are publicly accessible (not behind login)
  • Check that pages contain text content (not just images)
  • Ensure you haven’t exceeded your content limits
  • Try adding the URL individually instead of bulk
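
The first check can be automated with a short script. This is a minimal sketch using Python's standard library. `fetch_status` and `diagnose` are hypothetical helpers, and the User-Agent string is an arbitrary placeholder. A site that redirects anonymous visitors to a login page may still return HTTP 200, so a passing result is a hint rather than proof.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for an anonymous GET request."""
    request = urllib.request.Request(url, headers={"User-Agent": "content-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as HTTPError


def diagnose(url: str, fetch: Callable[[str], int] = fetch_status) -> str:
    """Classify a URL as reachable, blocked, or failing, without needing a login."""
    try:
        status = fetch(url)
    except OSError as err:  # URLError and timeouts are OSError subclasses
        return f"unreachable: {err}"
    if status in (401, 403):
        return "blocked: the page appears to require a login or permission"
    if status >= 400:
        return f"error: server returned HTTP {status}"
    return "ok: publicly reachable"
```

Calling `diagnose("https://example.com/pricing")` performs a real request. Passing a custom `fetch` function lets you try the classification logic offline.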

Improve your chatbot’s answer quality with these tips:
  • Add more relevant content sources
  • Remove duplicate or conflicting information
  • Use Custom Responses (Q&A) to override specific answers
  • Adjust your chatbot’s prompt and persona settings

Training large amounts of content takes time. Here’s what to expect:
  • Large sites may take 30+ minutes to process
  • You can close the page - training continues in the background
  • You’ll see status updates when you return to the page
  • Check the status filter to see pending items
Start with your most important pages (homepage, product pages, FAQs) and add more content over time.
