Content sources
SiteGPT supports four main types of training data to build your chatbot’s knowledge base:
1. Website Links (web content)
Add website content to train your chatbot:
- Multiple Links - Import content from multiple URLs at once by pasting a list
- Sitemap - Import all URLs from your sitemap.xml file automatically
- Scrape Website - Recursively crawl and extract content from an entire website
- YouTube - Import transcripts from YouTube videos, playlists, or channels
2. Files (documents and cloud storage)
Upload files or connect cloud storage:
- Local Files - Upload PDF, DOCX, TXT, CSV, and other document formats from your computer
- Notion - Import pages from your Notion workspace
- Google Drive - Sync documents from Google Drive
- Dropbox - Import files from Dropbox
- OneDrive - Connect to Microsoft OneDrive
- Box - Import files from Box
3. Text Snippets
Add plain text content directly without uploading files or adding links:
- Perfect for FAQs, product descriptions, or company information
- Supports up to 10,000 characters per chatbot
- Ideal for content that doesn’t exist on your website or in documents
4. Custom Responses (Q&A)
Override AI responses with pre-written answers for specific questions:
- Create exact question-answer pairs
- Ensure consistent responses for pricing, policies, or critical information
- Takes priority over other content sources when questions match
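The priority rule for Custom Responses can be pictured as a lookup that runs before the AI answer. The Python sketch below is not SiteGPT's implementation: it assumes a "match" means the same question after lowercasing and stripping punctuation, and `answer` and `retrieve_and_generate` are hypothetical names standing in for the chatbot's normal retrieval-based reply.

```python
import re
from typing import Callable, Dict


def normalize(question: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace so trivially
    # different spellings of the same question compare equal (assumption).
    cleaned = re.sub(r"[^\w\s]", "", question.lower())
    return " ".join(cleaned.split())


def answer(question: str,
           custom_responses: Dict[str, str],
           retrieve_and_generate: Callable[[str], str]) -> str:
    # Hypothetical flow: check the pre-written answers first...
    lookup = {normalize(q): a for q, a in custom_responses.items()}
    key = normalize(question)
    if key in lookup:
        return lookup[key]
    # ...and only fall back to the AI answer when nothing matches.
    return retrieve_and_generate(question)
```

A real matcher may be fuzzier than exact normalized equality; the point is only the ordering, with the custom answer winning whenever a match is found.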
Adding content
1. Navigate to content
From your chatbot dashboard, choose the content type:
- Links - For website content
- Files - For documents and cloud storage
- Text Snippets - For plain text content
- Custom Responses - For Q&A pairs
2. Click Add or configure
- For Links and Files: Click the Add Links or Add Files button
- For Text Snippets: Paste your content directly in the text area
- For Custom Responses: Click Add Custom Response
3. Choose your source (for Links and Files)
Select the type of content you want to add from the modal.
4. Configure and import
For Links:
- Multiple Links: Paste URLs (one per line), configure advanced options, and click Add Links
- Sitemap: Enter your sitemap URL, set max pages, configure filters, and click Add Links
- Scrape Website: Enter your website URL, set recursion depth and max pages, configure filters, and click Add Links
- YouTube: Enter a video, playlist, or channel URL and click Add YouTube Content
For Files:
- Local Files: Click Browse File to select files (drag and drop is not supported)
- Cloud Storage: Authenticate with your account, select files or folders using the picker, and click Sync Selected Files
For Text Snippets:
- Paste your text content (up to 10,000 characters) and click Save Changes
For Custom Responses:
- Enter the question and your exact answer, then click Save
5. Wait for processing (Links and Files only)
Content is processed in the background:
- Small amounts (10-50 pages/files): 2-5 minutes
- Medium amounts (50-200 pages/files): 5-15 minutes
- Large amounts (200+ pages/files): 15-30+ minutes
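If you are preparing a long list for Multiple Links, it can help to clean it up first so the one-URL-per-line input has no blank lines, non-web entries, or duplicates. This is a hypothetical local helper, not part of SiteGPT; treating a trailing slash or a `#fragment` as insignificant when spotting duplicates is an assumption that holds for most, but not all, sites.

```python
from urllib.parse import urlsplit


def prepare_link_list(raw: str) -> str:
    """Return one unique http(s) URL per line, preserving first-seen order."""
    seen = set()
    urls = []
    for line in raw.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip anything that is not an absolute web URL
        # Duplicate key ignores host case, a trailing slash, and the fragment.
        key = (parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), parts.query)
        if key in seen:
            continue
        seen.add(key)
        urls.append(url)
    return "\n".join(urls)
```

The returned string can be pasted directly into the Multiple Links text box.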
Content processing
When you add content, SiteGPT:
- Extracts text from your pages or files
- Chunks content into manageable segments for better retrieval
- Creates embeddings using AI for semantic search
- Indexes content for fast retrieval during conversations
- Makes it available to your chatbot immediately after processing
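To make the chunking step concrete, here is a minimal sketch of one common approach: fixed-size character windows with overlap. SiteGPT does not document its chunk size or splitting strategy, so the function and its defaults (1,000 characters, 200 of overlap) are illustrative assumptions. The embedding and indexing steps are not shown.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which tends to help retrieval.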
Managing content
View content status
Both the Links and Files pages show:
- Trained - Successfully processed and available to the chatbot
- Pending - Currently being processed
- Failed - Processing failed (hover for error details)
Bulk actions
Select multiple items to:
- Resync - Reprocess content to pick up changes
- Delete - Remove content from your chatbot
Search and filter
Use the search bar to find specific content, and filter by status to see trained, pending, or failed items.
Advanced configuration options
Multiple Links configuration
- Auto-sync
- Content Filtering
- Custom Headers
Auto-sync frequency
- Never (Manual only) - Default, no automatic syncing (all plans)
- Monthly (Growth) - Sync every month (Growth plan and above)
- Weekly (Scale) - Sync every week (Scale plan and above)
- Daily (Enterprise) - Sync every day (Enterprise plan only)
Sitemap configuration
- Basic Settings
- Auto-sync
- URL Filtering
- Content Filtering
- Custom Headers
Max pages to scrape
- Set the maximum number of pages to import from the sitemap
- Limited by your remaining link quota
- Example: If you have 100 links remaining, you can import up to 100 pages
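The interaction between max pages and the link quota amounts to taking the smaller of the two. As a rough local preview of which URLs a sitemap would contribute, this sketch reads the `<loc>` entries of a standard sitemap.xml and applies that cap. The function name is hypothetical, it does not follow sitemap index files, and it assumes pages are taken in the order they are listed, which SiteGPT does not specify.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_to_import(sitemap_xml: str, max_pages: int, links_remaining: int) -> List[str]:
    """List sitemap <loc> URLs, capped by max pages and the remaining link quota."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    limit = min(max_pages, links_remaining)  # the tighter limit wins
    return locs[:max(limit, 0)]
```

With 100 links remaining and max pages set to 500, this returns at most the first 100 URLs, matching the example above.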
Scrape Website configuration
- Basic Settings
- Auto-sync
- URL Filtering
- Content Filtering
- Custom Headers
Recursion depth
- Number of levels (1-5) to scrape from the website
- 1 means only the root level pages will be scraped
- 2 means root level pages and pages linked from them
- 3 means three levels deep, and so on
- Higher depth = more pages discovered and scraped
Max pages to scrape
- Set the maximum number of pages to import from the website
- Limited by your remaining link quota
- The crawler will stop when it reaches this limit
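The recursion depth and max pages settings can be illustrated with a breadth-first traversal. This hypothetical sketch walks an in-memory link graph instead of fetching real pages, assumes level 1 is the starting URL itself, and stops at whichever limit is hit first, depth or page count. SiteGPT's actual crawl order is not documented here.

```python
from collections import deque
from typing import Dict, List


def crawl_order(links: Dict[str, List[str]], start: str, depth: int, max_pages: int) -> List[str]:
    """Breadth-first crawl over a link graph, bounded by depth and page count."""
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    visited = [start]
    seen = {start}
    queue = deque([(start, 1)])  # level 1 = the starting page
    while queue and len(visited) < max_pages:
        url, level = queue.popleft()
        if level == depth:
            continue  # do not follow links past the configured depth
        for nxt in links.get(url, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            visited.append(nxt)
            if len(visited) >= max_pages:
                break  # stop as soon as the page limit is reached
            queue.append((nxt, level + 1))
    return visited
```

Each extra level can multiply the number of discovered pages, which is why higher depths usually need a max pages cap.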
File upload configuration
- Supported formats: PDF, CSV, DOC, DOCX, TXT, and other document formats
- File size limit: Up to 10 MB per file
- Multiple files: You can select and upload multiple files at once
- Drag and drop: Not supported - use the “Browse File” button to select files
- File management: After selecting files, you can review and remove individual files before uploading
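To catch files that would be rejected before you open the Browse File dialog, you could pre-check them locally. This is a hypothetical helper, not a SiteGPT tool: it only recognizes the formats named above (not the unspecified "other document formats"), and it reads 10 MB as 10 * 1024 * 1024 bytes, which is an assumption.

```python
from pathlib import Path
from typing import Dict, List

# Assumptions: only the formats listed above, and 10 MB = 10 * 1024 * 1024 bytes.
ALLOWED_SUFFIXES = {".pdf", ".csv", ".doc", ".docx", ".txt"}
MAX_BYTES = 10 * 1024 * 1024


def classify(name: str, size_bytes: int) -> str:
    """Return "ok" or the reason a file would likely be rejected."""
    if Path(name).suffix.lower() not in ALLOWED_SUFFIXES:
        return "unsupported format"
    if size_bytes > MAX_BYTES:
        return "larger than 10 MB"
    return "ok"


def check_upload(paths: List[Path]) -> Dict[Path, str]:
    """Classify each local file, reading its size from disk."""
    return {path: classify(path.name, path.stat().st_size) for path in paths}
```

For example, `check_upload(list(Path("docs").iterdir()))` would report on every file in a local `docs` folder (a hypothetical path).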
Cloud storage integration configuration
Connection management
- Multiple connections: You can create multiple connections to the same service (e.g., multiple Google Drive accounts)
- Connection status: Each connection shows whether access is granted or revoked
- Access control: You can revoke and re-grant access to connections at any time
- Picker interface: Use the native picker interface for each service to select files or folders:
  - Notion: Select specific pages from your workspace
  - Other services: Select individual files or entire folders to sync
- Initial sync: When you first select files, click “Sync Selected Files” to process them
- Add more files: You can add more files to an existing connection at any time
- Modify selection: For Notion, you can modify your page selection; for other services, add more files
- File management: View all synced files, their status, and last sync time in the connection interface
Auto-sync
Keep your chatbot’s knowledge current by enabling auto-sync for website content:
- When adding links, look for the Auto-sync frequency option
- Choose sync frequency based on your plan:
  - Never (Manual only) - No automatic syncing (all plans)
  - Monthly (Growth) - Sync every month (Growth plan and above)
  - Weekly (Scale) - Sync every week (Scale plan and above)
  - Daily (Enterprise) - Sync every day (Enterprise plan only)
- SiteGPT then automatically checks the synced pages for updates and retrains your chatbot at the chosen frequency
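The plan requirements above follow an "and above" rule, which can be written as a tier comparison. This is an illustrative sketch, not a SiteGPT API: the tier numbers are assumptions that encode the Growth, Scale, Enterprise ordering, and any plan not listed is assumed to allow manual syncing only.

```python
from typing import List

# Assumed tier order: Growth < Scale < Enterprise; unlisted plans are tier 0.
PLAN_TIER = {"Growth": 1, "Scale": 2, "Enterprise": 3}
# Minimum tier each frequency needs, per the list above.
FREQUENCY_MIN_TIER = {"Never": 0, "Monthly": 1, "Weekly": 2, "Daily": 3}


def allowed_frequencies(plan: str) -> List[str]:
    """Return the auto-sync frequencies available on a given plan."""
    tier = PLAN_TIER.get(plan, 0)
    return [freq for freq, needed in FREQUENCY_MIN_TIER.items() if tier >= needed]
```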
Content limits
Your plan determines how much content you can add:
- Links quota: Number of web pages you can train on
- Files quota: Number of files you can upload
Best practices
Choose quality content
- Add pages with clear, well-written information
- Avoid duplicate or redundant content
- Include FAQs and common questions
- Add product documentation and guides
Organize your content
- Use descriptive file names
- Remove outdated or irrelevant pages
- Keep content up to date with auto-sync
- Group related content together
Optimize for answers
- Use clear headings and structure
- Write in a question-and-answer format when possible
- Include specific details and examples
- Avoid overly technical jargon unless necessary
Troubleshooting
Content not training
If your content isn’t training, check these common issues:
- Verify URLs are publicly accessible (not behind login)
- Check that pages contain text content (not just images)
- Ensure you haven’t exceeded your content limits
- Try adding the URL individually instead of in bulk
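To check whether a page actually contains text rather than only images, you can extract its visible text locally. This sketch uses Python's standard `html.parser`; skipping `<script>`, `<style>`, and `<noscript>` content and the 200-character threshold are assumptions, not SiteGPT's own rules.

```python
from html.parser import HTMLParser
from typing import List

SKIPPED_TAGS = ("script", "style", "noscript")


class _TextExtractor(HTMLParser):
    """Collect text that would be visible on the page, skipping scripts and styles."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def has_trainable_text(html: str, min_chars: int = 200) -> bool:
    # Hypothetical threshold: pages below it are likely image-only or near-empty.
    return len(visible_text(html)) >= min_chars
```

Image `alt` attributes are not counted, so a page made up of images with captions only in `alt` text will show up as empty.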
Poor answer quality
Improve your chatbot’s answer quality with these tips:
- Add more relevant content sources
- Remove duplicate or conflicting information
- Use Custom Responses (Q&A) to override specific answers
- Adjust your chatbot’s prompt and persona settings
Training takes too long
Training large amounts of content takes time. Here’s what to expect:
- Large sites may take 30+ minutes to process
- You can close the page - training continues in the background
- You’ll see status updates when you return to the page
- Check the status filter to see pending items
Start with your most important pages (homepage, product pages, FAQs) and add more content over time.
Next steps
- Integrate with your website - Deploy your chatbot on your website
- Retrain and update - Keep your chatbot’s knowledge up to date
- Text snippets - Add plain text content directly
- Custom responses - Add Q&A pairs for specific questions
- Auto-sync - Automatically sync content changes
- Adding content overview - Complete guide to all content sources