Data Sources

SiteGPT supports training your chatbot from multiple data sources beyond just websites. This allows you to create a comprehensive AI assistant that can answer questions based on your entire knowledge base.

Available Data Sources

Website

Crawl your website automatically via sitemap or page-by-page

Google Drive

Import documents, sheets, and presentations from Google Drive

Notion

Connect your Notion workspace and import pages

YouTube

Train on video transcripts from YouTube channels or playlists

Dropbox

Import files and documents from Dropbox

OneDrive

Connect Microsoft OneDrive to import documents

SharePoint

Import content from Microsoft SharePoint sites

Confluence

Connect Atlassian Confluence for internal documentation

GitBook

Import documentation from GitBook

Box

Connect Box cloud storage

How Data Source Training Works

  1. Connect - Authenticate with your data source (OAuth for most services)
  2. Select - Choose which files, folders, or pages to import
  3. Train - SiteGPT processes the content and trains your chatbot
  4. Sync - Enable auto-sync to keep content updated (where supported)

Supported File Types

When importing from cloud storage services like Google Drive, Dropbox, OneDrive, or Box, SiteGPT can process:

File Type      Extensions
Documents      .pdf, .doc, .docx, .txt, .rtf
Spreadsheets   .xls, .xlsx, .csv
Presentations  .ppt, .pptx
Web            .html, .htm
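
Before pointing SiteGPT at a folder, it can help to check which files have a supported extension. The following is a minimal local sketch, not part of SiteGPT itself: the names `SUPPORTED_EXTENSIONS`, `classify`, and `split_supported` are hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import PurePath
from typing import List, Optional, Tuple

# Extensions copied from the supported file types table, grouped by file type.
SUPPORTED_EXTENSIONS = {
    "Documents": {".pdf", ".doc", ".docx", ".txt", ".rtf"},
    "Spreadsheets": {".xls", ".xlsx", ".csv"},
    "Presentations": {".ppt", ".pptx"},
    "Web": {".html", ".htm"},
}


def classify(filename: str) -> Optional[str]:
    """Return the file type for a filename, or None if its extension is not listed."""
    # Assumption: extensions are compared case-insensitively.
    suffix = PurePath(filename).suffix.lower()
    for file_type, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return file_type
    return None


def split_supported(filenames: List[str]) -> Tuple[List[str], List[str]]:
    """Split filenames into (supported, unsupported) lists, preserving order."""
    supported: List[str] = []
    unsupported: List[str] = []
    for name in filenames:
        if classify(name) is not None:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```

Running `split_supported` over a folder listing gives a quick view of which files belong in a dedicated training folder and which would need converting first.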

Best Practices

Create dedicated folders for chatbot training content. This makes it easier to manage what gets imported and keeps your chatbot focused.
Customers using three or more data sources see 40% fewer “I don’t know” responses. Combine your website with documentation and FAQs.
Enable auto-sync where available, or set reminders to retrain monthly. Outdated content leads to incorrect answers.
Only import content you want the chatbot to reference. Exclude internal-only documents, draft content, or sensitive information.

Adding a Data Source

  1. Navigate to your chatbot’s Training tab
  2. Click Add Data Source
  3. Select your preferred data source type
  4. Follow the authentication prompts
  5. Select the content to import
  6. Click Start Training

Training time depends on the amount of content. A typical import of 50-100 documents takes 2-5 minutes.

Need Help?

If you encounter issues connecting a data source, check the integration guide for that source for troubleshooting steps, or contact our support team.