Skip to main content

Web

The web module provides tools for interacting with web content. These tools allow you to fetch and process web pages in various formats.

Available tools

Web-fetch page

Fetches content from a web page and returns it in a clean, readable format with preserved structure.

Parameters

  • url (string, required)

    • The URL to fetch
    • Must be a valid HTTP or HTTPS URL
  • format (string, optional)

    • Output format for the page content
    • Options:
      • text (default): Returns clean, readable text with preserved structure
      • html: Returns raw HTML content
    • Use text for most cases when you want to read or analyze the content
    • Only use html when you need to process the raw HTML structure
  • headers (object, optional)

    • Custom headers to send with the request
    • Useful for:
      • Authentication
      • Setting user agent
      • Custom request headers

Example Usage

from tyler.models import Agent, Thread, Message

# Create an agent with web tools
agent = Agent(
model_name="gpt-4o",
purpose="To help with web content",
tools=["web"]
)

# Create a thread with a request to fetch a web page
thread = Thread()
message = Message(
role="user",
content="Can you fetch and summarize the content from https://example.com?"
)
thread.add_message(message)

# Process the thread - agent will use web-fetch_page tool
processed_thread, new_messages = await agent.go(thread)

Response Format

The tool returns a dictionary with:

  • content: The fetched content in the requested format
  • metadata: Additional information about the page
    • title: Page title if available
    • url: Final URL (after any redirects)
    • status: HTTP status code
    • headers: Response headers

Error Handling

The tool handles common errors:

  • Invalid URLs
  • Network timeouts
  • HTTP errors (4xx, 5xx)
  • Invalid content types

Error responses include:

  • Error message
  • Error type
  • HTTP status code (if applicable)

Best practices

  1. Use Text Format by Default

    • The text format is optimized for readability
    • Preserves important structure while removing clutter
    • Better for content analysis and summarization
  2. Handle Large Pages

    • Consider using pagination for large content
    • Process content in chunks if needed
    • Be aware of rate limiting and robots.txt
  3. Respect Website Policies

    • Check robots.txt
    • Use appropriate delays between requests
    • Include proper user agent headers
  4. Security Considerations

    • Only fetch from trusted sources
    • Be cautious with user-provided URLs
    • Validate and sanitize content

Configuration

No special configuration or environment variables are required for web tools.

Common use cases

  1. Content Extraction

    • Fetch articles for summarization
    • Extract specific information from web pages
    • Gather data for analysis
  2. Web Scraping

    • Collect structured data from websites
    • Monitor page changes
    • Archive content
  3. Site Analysis

    • Check page availability
    • Analyze page structure
    • Validate links
  4. Content Integration

    • Import content from external sources
    • Aggregate information
    • Cross-reference data