What is a page?
A page equals 2,500 cleaned characters of text content. This is roughly equivalent to:
- A typical web page with moderate content
- 1-2 pages of a PDF document
- About 400-500 words of text
“Cleaned characters” means the actual text content after removing HTML tags, scripts, styling, and other non-content elements.
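As a rough illustration of the definition above, the sketch below strips markup from an HTML string, counts the remaining characters, and converts that count to pages. It is not SiteGPT's actual implementation: the helper names (`clean_text`, `pages_for`) are hypothetical, the cleaning only drops tags plus script and style bodies, and rounding partial pages up is an assumption.

```python
import math
import re
from html.parser import HTMLParser

CHARS_PER_PAGE = 2_500  # one page = 2,500 cleaned characters


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the bodies of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_text(html: str) -> str:
    """Approximate 'cleaned' text: tags removed, whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def pages_for(char_count: int) -> int:
    """Convert a cleaned character count to pages.

    Assumption: partial pages round up. The documentation only shows
    exact multiples of 2,500, so the real rounding rule may differ.
    """
    return math.ceil(char_count / CHARS_PER_PAGE)
```

For example, `pages_for(5000)` gives 2, matching a blog post with 5,000 characters of clean text.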
Why pages?
The pages-based quota system provides several benefits:
Simplicity
One number to track instead of separate limits for links and files
Flexibility
Use your quota however you want — all web pages, all files, or any mix
Transparency
Clear understanding of exactly how much content you can add
Fairness
You pay for content, not arbitrary file counts
Plan limits
Each plan includes a generous pages quota:

| Plan | Pages Quota | Approximate Content |
|---|---|---|
| Starter | 1,000 pages | ~400,000 words |
| Growth | 10,000 pages | ~4 million words |
| Scale | 50,000 pages | ~20 million words |
| Enterprise | 500,000 pages | ~200 million words |
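To make the table concrete, here is a minimal sketch of checking whether new content fits within a plan's quota. The quota figures come from the table. The function names and the idea of tracking `used_pages` yourself are hypothetical, for illustration only.

```python
# Pages quota per plan, taken from the table above.
PLAN_QUOTAS = {
    "Starter": 1_000,
    "Growth": 10_000,
    "Scale": 50_000,
    "Enterprise": 500_000,
}


def remaining_pages(plan: str, used_pages: int) -> int:
    """Pages still available on the plan (never negative)."""
    return max(PLAN_QUOTAS[plan] - used_pages, 0)


def fits_in_quota(plan: str, used_pages: int, new_pages: int) -> bool:
    """True if adding new_pages would stay within the plan's quota."""
    return new_pages <= remaining_pages(plan, used_pages)
```

With 940 pages already used on Starter, `remaining_pages("Starter", 940)` is 60, so a 60-page upload fits and a 61-page upload does not.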
How pages are calculated
When you add content to your chatbot, SiteGPT automatically calculates how many pages it will consume:
Web pages
Each URL you add is processed to extract the text content. The cleaned text is measured in characters, then divided by 2,500 to determine the page count.
Example: A blog post with 5,000 characters of clean text = 2 pages
Files
Uploaded files (PDFs, DOCXs, etc.) are converted to text and measured the same way.
Example: A 10-page PDF with ~25,000 characters = 10 pages
Raw text
When you paste text directly, the character count determines the pages.
Example: 7,500 characters of pasted content = 3 pages
Viewing your usage
You can check your pages usage in several places, including the Links and Files pages, which show the page count for each item.
Managing your quota
Before adding content
When you add new links or files, SiteGPT estimates the page count before processing. If the content would exceed your quota, you’ll see a warning.
Removing content
Deleting links or files immediately frees up those pages for new content.
Upgrading your plan
If you need more pages, you can upgrade your plan at any time from your billing page.
Tips for optimizing page usage
Be selective with URLs
Instead of adding your entire sitemap, focus on the most relevant pages — product docs, FAQs, and key landing pages.
Use exclude patterns
When adding sitemaps, use exclude patterns to skip pages that aren’t relevant for support (e.g., /blog/* if blog content isn’t needed).
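The effect of an exclude pattern can be sketched as follows. This assumes glob-style matching against the URL path. SiteGPT's actual pattern syntax is not specified here, and `filter_urls` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def filter_urls(urls, exclude_patterns):
    """Keep only URLs whose path matches none of the exclude patterns.

    Assumption: patterns are shell-style globs applied to the path,
    so "/blog/*" matches "/blog/launch" and anything nested below it.
    """
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        if not any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
            kept.append(url)
    return kept


sitemap_urls = [
    "https://example.com/docs/setup",
    "https://example.com/blog/launch",
    "https://example.com/faq",
]
support_urls = filter_urls(sitemap_urls, ["/blog/*"])
```

Here `support_urls` keeps the docs and FAQ URLs and drops the blog post, so only those two pages would count toward the quota.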
Consolidate documents
If you have many small files, consider combining them into fewer, larger documents.
Review periodically
Use the content management pages to identify and remove outdated or low-value content.
FAQs
What counts as a 'cleaned character'?
Cleaned characters are the actual readable text content after removing HTML tags, JavaScript, CSS, navigation menus, footers, and other non-content elements. This ensures you’re only using quota for meaningful training content.
Do images count toward my pages quota?
No. Only text content counts toward your pages quota. Images, videos, and other media are not included in the calculation.
What happens if I exceed my quota?
You won’t be able to add new content until you either remove existing content or upgrade your plan. Your chatbot will continue to work normally with its current training data.
Can I see how many pages each item uses?
Yes! The Links and Files pages show the page count for each item. Hover over the page count for a tooltip explaining the calculation.