Sites
Scheduled Content Automation for External Websites
Import public websites, extract metadata, and turn updates into governed Site content
Run immediate or scheduled Web Imports with URL masks, depth controls, HTML and PDF options, job history, metadata extraction rules, completion triggers for agents, and Public AI Chat freshness.
External content monitoring
Manual Content Monitoring Is Inefficient
Teams waste valuable time on repetitive monitoring tasks that can be automated. The harder part is not visiting a public website once; it is checking the right sources on schedule, tracking what changed, and turning the update into action.
Manual checks are also easy to lose. Someone notices a competitor update, a regulatory page changes, or documentation gets refreshed, but there is no durable version history and no consistent handoff into analysis. Clear Ideas Web Import turns public websites into scheduled Site content, with versioning, search, agent analysis, and Public AI Chat synchronization around the imported record. Metadata extraction workflows can then turn selected content into structured Site attributes for search, filtering, and review.
Content automation
Automated Website Monitoring and Content Synchronization
Clear Ideas Web Import captures content from public websites on your schedule. Version management tracks changes, completion triggers can enqueue governed agent runs, and metadata extraction rules can populate structured attributes while Public AI Chat stays current with the latest imported material.
- Scheduled Automation: Set it and forget it. Import content hourly, daily, weekly, or monthly. Web Import runs automatically in the background and stores results for review.
- Version Management and Change Tracking: An import creates a new version only when content has changed. Track exactly what changed and when, and compare versions side-by-side for competitive analysis.
- Agent Integration: After a manual or scheduled import completes, Clear Ideas can enqueue a governed agent run with the selected Site and variable set so imported content turns into analysis without a separate handoff.
- Metadata Extraction Workflows: Attach extraction rules to a Site or folder so approved agents can populate structured metadata after files arrive, using replace or merge behavior.
- Powers Public AI Chat: Public AI Chat can stay synchronized with imported content so visitors get answers from the latest external sources.



Web import features
What the Import Job Actually Controls
A governed import job defines the source scope, link depth, HTML and PDF handling, destination folder, schedule, version behavior, extraction rules, and downstream AI use so imported content becomes governed Site material.
- Import Configuration: Capture any public HTTP or HTTPS website, restrict imports to specific sections, control link depth, include HTML pages and PDFs, and place results in organized destination folders.
- Scheduling Options: Run imports hourly, daily, weekly, or monthly with timezone-aware timing, active or inactive schedules, last-run and next-run visibility, and execution history.
- Version Management: Maintain one-to-one URL mapping, automatic versioning on changes, access to previous versions, smart change detection, and a historical archive.
- AI Integration: Connect imported content to Public AI Chat, scheduled agent runs, metadata extraction, full-text search, and AI Enhanced Search.
- Metadata Extraction: Run Site or folder-scoped extraction rules that apply a selected agent, track last run status, and keep structured attributes available for search and review.
- Management and Control: Use active or inactive schedules, immediate import testing, job status streams, cancellation, schedule editing, and counts for discovered, processed, succeeded, failed, skipped, active, and total items.
- Security That Grows With You: Handle public content with encrypted storage, role-based access control, Site-scoped isolation, and audit trails.
Automation scenarios
When Scheduled Imports Become Operational Intelligence
The value shows up when a recurring public source matters to a team: competitor websites, regulatory pages, documentation portals, industry news, or other material that needs to be tracked without relying on someone remembering to check.
- Competitive Intelligence Monitoring: Track competitor websites, summarize product or pricing updates, and deliver weekly intelligence without manual review.
- Regulatory Compliance Tracking: Monitor government or standards pages, identify new rules, assess operational impact, and alert compliance teams.
- Structured Metadata Operations: Extract contract terms, portfolio details, policy attributes, renewal dates, or diligence fields into consistent metadata after content is uploaded or imported.
- Documentation Synchronization for Public Chat: Import official docs on a schedule so Public AI Chat answers from current source material.
- Industry News and Research Aggregation: Track multiple public sources, extract recurring topics, and generate daily or weekly digests for research teams.
Rollout
Compare Options and Plan Your Rollout
Use the feature comparison and a product walkthrough to choose the right deployment model.
Governed agents
Automate Analysis With Governed Agents
Web Import creates the data foundation. Governed Agents turn that data into intelligence.
- Competitor Change Summary: A weekly import triggers an agent that compares versions, extracts key changes, summarizes them in executive briefing format, and emails the marketing team.
- Regulatory Risk Assessment: A daily import triggers an agent that identifies new regulations, assesses impact on operations, flags high-risk changes, and creates compliance tasks.
- Content Quality Monitoring: A daily import triggers an agent that checks for broken links, verifies technical accuracy, generates quality reports, and alerts the documentation team.
Connected workflows
Where Imported Content Goes Next
Imported pages become more useful when they feed Clear Ideas Sites, Public AI Chat, and Governed Agents. Public AI Chat can answer from them, agents can analyze changes, and metadata extraction can keep structured fields searchable beside the imported record.
- Public AI Chat: Import external documentation to power public chat on your website so visitors get answers from your latest external sources.
- Governed Agents: Automate analysis of imported content. Trigger governed agent runs on content changes and turn every import into intelligence.
- VDR for Due Diligence: Monitor target company websites during M&A and build a competitive intelligence repository with historical versions.
Related reading
Automation Operations and Scheduled Work
Reference material for scheduled content imports, triggers, and automation operations.
- Workflow Scheduling and the Automation Calendar: Plan recurring automation and see how scheduled work fits into operational calendars.
- Webhook Integrations for No-Code Triggers: Use external events to trigger governed automation without custom code.
- Metadata-Powered Document Search: How extracted metadata helps teams find and review documents using structured attributes.
- Centralized Workflow Operations Scale Automation: Why repeatable automation needs centralized operations, run visibility, and evidence.
Frequently Asked Questions
What types of websites can I import with Web Import?
Web Import works with any publicly accessible HTTP or HTTPS website that requires no authentication. It captures HTML pages and linked PDF documents. Common use cases include competitor websites, regulatory agency pages, industry news sites, documentation portals, and public knowledge bases. The content must be publicly viewable without login credentials.
How do scheduled imports work and what frequencies are available?
Scheduled imports run automatically in the background at your chosen frequency: Hourly (every hour at a specified minute), Daily (once per day at a specified time), Weekly (on specific days of the week at specified times), or Monthly (on specific dates or expressions like "first Monday"). All schedules are timezone-aware and can be activated, deactivated, or modified at any time.
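To make the timezone-aware weekly option concrete, here is a minimal sketch of how a next-run time could be computed. It is not the Clear Ideas scheduler: the function name `next_weekly_run` and its parameters are hypothetical, and it assumes a schedule defined by a weekday, a local time, and a timezone object.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_weekly_run(now: datetime, weekday: int, run_at: time, tz: tzinfo) -> datetime:
    """Return the next weekly occurrence strictly after `now`.

    Hypothetical helper for illustration only.
    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    # Work in the schedule's own timezone so "08:00" means local 08:00.
    local_now = now.astimezone(tz)
    days_ahead = (weekday - local_now.weekday()) % 7
    candidate = datetime.combine(
        local_now.date() + timedelta(days=days_ahead), run_at, tzinfo=tz
    )
    # If today's slot has already passed, move to the same slot next week.
    if candidate <= local_now:
        candidate += timedelta(days=7)
    return candidate


# Example: a "Monday at 8 AM" schedule in a fixed UTC-5 zone (assumed for the demo).
eastern = timezone(timedelta(hours=-5))
now = datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)  # Monday 09:00 local
print(next_weekly_run(now, weekday=0, run_at=time(8, 0), tz=eastern))
```

A named zone such as `zoneinfo.ZoneInfo("America/New_York")` can be passed as `tz` instead of a fixed offset when daylight saving time matters.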
How does version management track changes to imported content?
Clear Ideas maintains a one-to-one mapping between source URLs and imported files. When content changes, a new version is automatically created while preserving all previous versions. You can view any historical version, compare versions side-by-side to see exactly what changed, and track how external content changed over time. Versions are only created when actual content changes are detected, avoiding unnecessary duplication.
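The idea of creating a version only when content changes can be illustrated with a content hash. The sketch below is an assumption about one common approach, not a description of how Clear Ideas detects changes. The `VersionStore` class and its `record` method are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionStore:
    """Hypothetical in-memory store: one version history per source URL."""

    versions: dict[str, list[tuple[str, bytes]]] = field(default_factory=dict)

    def record(self, url: str, content: bytes) -> bool:
        """Store content as a new version only if it differs from the latest one.

        Returns True when a new version was created, False when skipped.
        """
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(url, [])
        # Assumption: "changed" means the bytes differ from the latest version.
        if history and history[-1][0] == digest:
            return False
        history.append((digest, content))
        return True


store = VersionStore()
url = "https://example.com/pricing"  # placeholder URL
print(store.record(url, b"Plan A: $10"))  # first import creates a version
print(store.record(url, b"Plan A: $10"))  # unchanged content is skipped
print(store.record(url, b"Plan A: $12"))  # changed content creates a version
```

A byte-level hash treats any difference as a change, including cosmetic ones such as timestamps in page footers, so a production system would likely normalize content before comparing.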
Can I control which parts of a website get imported?
Yes, you have granular control over import scope. Use URL masking to restrict imports to specific sections (e.g., only pages under "/docs/"), set link depth to control how many levels deep the crawler follows links (1-3 levels), specify destination folders for organized storage, and choose whether to include linked PDF documents. These controls help you capture the content you need without importing unrelated pages.
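As a rough illustration of how a URL mask and a depth limit combine, the following sketch decides whether a discovered link is in scope. The function `in_scope`, the prefix-style mask, and the same-host rule are all assumptions made for the example. The actual mask syntax in Web Import may differ.

```python
from urllib.parse import urlparse


def in_scope(url: str, start_url: str, path_mask: str, depth: int, max_depth: int) -> bool:
    """Hypothetical scope check for a link found `depth` levels from the start page."""
    target = urlparse(url)
    start = urlparse(start_url)
    # Only public HTTP or HTTPS sources are importable.
    if target.scheme not in ("http", "https"):
        return False
    # Assumption: the crawl stays on the host of the starting URL.
    if target.netloc != start.netloc:
        return False
    # Links beyond the configured depth are not followed.
    if depth > max_depth:
        return False
    # Assumption: the mask is a simple path prefix such as "/docs/".
    return target.path.startswith(path_mask)


start = "https://example.com/docs/"  # placeholder source
print(in_scope("https://example.com/docs/setup", start, "/docs/", depth=1, max_depth=2))
print(in_scope("https://example.com/blog/post", start, "/docs/", depth=1, max_depth=2))
print(in_scope("https://example.com/docs/api/v2", start, "/docs/", depth=3, max_depth=2))
```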
How do I trigger Governed Agents automatically when content changes?
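There are two options. After a manual or scheduled import completes, a completion trigger can enqueue a governed agent run with the selected Site and variable set. Agents can also be configured to run on schedules that align with your import schedules. For example, set a weekly import for Monday at 8 AM, then schedule an agent run for Monday at 9 AM to process the imported content. In either case, the agent can compare versions, extract changes, summarize updates, identify risks, and send notifications without manual intervention.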
Can content automation extract structured metadata?
Yes. Site administrators can configure metadata extraction workflow rules for a whole Site or a specific folder. Each rule references an approved agent, can replace or merge metadata, and tracks last run status so teams can see whether extraction completed successfully.
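The difference between replace and merge behavior can be sketched with plain dictionaries. This is an illustration under stated assumptions: the function `apply_metadata` is hypothetical, and the merge rule shown (newly extracted values win on conflicting keys, other existing keys are kept) is a guess at sensible semantics rather than documented behavior.

```python
def apply_metadata(existing: dict, extracted: dict, mode: str) -> dict:
    """Hypothetical helper: combine stored metadata with newly extracted values."""
    if mode == "replace":
        # Replace: discard what was stored and keep only the new extraction.
        return dict(extracted)
    if mode == "merge":
        # Merge (assumed semantics): keep existing keys, overwrite on conflict.
        return {**existing, **extracted}
    raise ValueError(f"unknown mode: {mode!r}")


existing = {"counterparty": "Acme", "renewal_date": "2024-06-30"}
extracted = {"renewal_date": "2025-06-30", "term_months": 12}

print(apply_metadata(existing, extracted, "replace"))
print(apply_metadata(existing, extracted, "merge"))
```

Replace suits rules that own every attribute they write. Merge suits cases where some fields are maintained by hand or by another rule.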
Can I test Web Import before setting up a schedule?
Absolutely. Every site includes an immediate Web Import option for testing. Navigate to your site, click Web Import, enter the URL, configure your settings (URL masking, depth, destination folder), and run it immediately. Review the results to verify settings work as expected, then create a schedule using those same settings for automated recurring imports.
How does Web Import integrate with Public AI Chat?
Content imported via Web Import is indexed for Public AI Chat when the chat is configured to use that Site. Scheduled imports help keep public answers aligned with external documentation after each import and indexing cycle completes.
How are imported files organized and stored?
Imported files are organized in your Clear Ideas site following the source website structure. You specify a destination folder, and Web Import creates subfolders mirroring the URL path structure. Each URL maps to one file location, with versions tracked automatically. Files are searchable through semantic and AI-enhanced search, and subject to the same role-based permissions as all other content in your site.
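The folder mirroring described above might look like the following sketch, which maps a source URL to a single file location under a destination folder. The function `destination_path` is hypothetical, and naming directory-style URLs `index.html` is an assumption made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def destination_path(url: str, destination_folder: str) -> str:
    """Hypothetical mapping from one source URL to one file path in a Site."""
    path = urlparse(url).path
    # Assumption: URLs that end in "/" (or have no path) are stored as index.html.
    if path == "" or path.endswith("/"):
        path += "index.html"
    relative = PurePosixPath(path.lstrip("/"))
    # Subfolders under the destination mirror the URL path segments.
    return str(PurePosixPath(destination_folder) / relative)


print(destination_path("https://example.com/docs/setup.html", "Imports/Vendor Docs"))
print(destination_path("https://example.com/docs/", "Imports/Vendor Docs"))
```

Because the mapping is deterministic, re-importing the same URL always targets the same file, which is what lets versions accumulate on one record.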
Can I import content from multiple websites into one site?
Yes, you can create multiple import schedules within a single Clear Ideas site, each targeting different source websites. Specify different destination folders for each source to keep content organized. This is ideal for competitive intelligence (monitoring multiple competitors), regulatory tracking (multiple government agencies), or news aggregation (multiple industry publications).
How much storage do scheduled imports consume?
Storage consumption depends on the volume of content imported and how often that content changes. Because versions are created only when content actually changes, unchanged pages do not add to storage on repeat imports.