Gian cce388a812 feat: add configurable limits for newsletters and search
Co-authored-by: aider (openrouter/deepseek/deepseek-v4-flash) <aider@aider.chat>
2026-05-12 10:37:06 +00:00
LICENSE Initial commit 2026-05-11 11:31:37 +00:00
nc-code.py feat: add configurable limits for newsletters and search 2026-05-12 10:37:06 +00:00
README.md feat: add configurable limits for newsletters and search 2026-05-12 10:37:06 +00:00

newsletter-curator

An Open WebUI tool that reads newsletters from an IMAP mailbox, converts them to clean Markdown, and enriches articles via the Jina AI Reader API. Processed emails are tagged curated so they are never surfaced twice.

Features

| Method | Description |
| --- | --- |
| latest_newsletters(limit) | List unprocessed newsletters from the configured lookback window (limit defaults to the MAX_NEWSLETTERS valve) |
| read_newsletter(uid) | Extract the full content of a newsletter and mark it as curated |
| search_newsletters(query, limit) | Keyword search with optional semantic reranking (Jina Embeddings); limit defaults to the SEARCH_LIMIT valve |
| fetch_article(url) | Fetch a full article via Jina AI Reader (r.jina.ai) |
| extract_research_topics(uid) | Extract headings, anchor texts, and bold phrases as research prompts |
| export_newsletter_json(uid) | Export the full newsletter as JSON (read-only, does not mark as curated) |

HTML cleaning strips scripts, navigation, footers, and forms, then converts the remaining structure to Markdown. Content is truncated at 48000 characters when reading a newsletter (12000 for the initial preview shown in lists) to stay within LLM context limits.
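The cleaning step can be pictured with the sketch below. It is not the tool's HTMLCleaner: it swaps in the standard-library html.parser in place of BeautifulSoup so that it runs without dependencies, and the class name NoiseStripper, the function html_to_markdown, and the heading-only Markdown conversion are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class NoiseStripper(HTMLParser):
    """Hypothetical stand-in for HTMLCleaner: drops noisy elements, keeps text."""

    SKIP = {"script", "style", "nav", "footer", "form"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside an element we discard
        self._prefix = ""      # Markdown prefix for the next text chunk
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._prefix = self.HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.HEADINGS:
            self._prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.lines.append(self._prefix + text)
            self._prefix = ""


def html_to_markdown(html: str, limit: int = 48_000) -> str:
    """Strip noise, emit one block per text chunk, then truncate to limit."""
    parser = NoiseStripper()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)[:limit]
```

This simplified version splits inline markup such as bold text into separate blocks, which a real converter would keep on one line.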

Semantic search is optional. When enabled (SEMANTIC_SEARCH_ENABLED=true) the keyword results are reranked by cosine similarity using Jina Embeddings — no NumPy required.
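A minimal sketch of pure-Python cosine reranking follows. The function names and the (uid, embedding) input shape are assumptions; how the tool obtains embeddings from the Jina API is not shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec: list[float], candidates: list[tuple[str, list[float]]]) -> list[str]:
    """Return candidate uids ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, emb), uid) for uid, emb in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored]
```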

Secure secret management: the IMAP password and Jina API key can be loaded from environment variables (NEWSLETTER_IMAP_PASSWORD, NEWSLETTER_JINA_API_KEY), with valve values as fallback.
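The lookup order could look like the sketch below. The helper name resolve_secret is hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import os


def resolve_secret(env_name: str, valve_value: str) -> str:
    """Prefer a non-empty environment variable; otherwise use the valve value."""
    return os.environ.get(env_name) or valve_value
```

For example, resolve_secret("NEWSLETTER_IMAP_PASSWORD", valves.IMAP_PASSWORD) would return the environment value when it is set.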

Rate limiting: a token-bucket limiter prevents excessive calls to the Jina API (configurable rate and burst size).
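A token bucket can be sketched as follows. This is an illustration of the general technique, not the tool's RateLimiter class; the class name, the try_acquire method, and the injectable clock are assumptions. The rate and burst arguments correspond to the JINA_RATE_LIMIT and JINA_BURST valves.

```python
import threading
import time


class TokenBucket:
    """Hypothetical token bucket: `rate` tokens per second, at most `burst` stored."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)      # start full so a burst is allowed at once
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller must wait."""
        with self.lock:
            now = self.clock()
            refill = (now - self.updated) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False
```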

Retry logic: network operations (IMAP, Jina API) are retried with exponential backoff (up to 3 attempts).
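A generic retry helper along these lines might look like the sketch below. The function name, the 1-second base delay, the doubling factor, and the choice of OSError as the retryable exception are assumptions; only the limit of 3 attempts comes from the description above.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0,
                       exceptions=(OSError,), sleep=time.sleep):
    """Call func(); on a listed exception wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))
```

Note that imaplib.IMAP4.error does not derive from OSError, so a caller wrapping IMAP calls would need to pass it in the exceptions tuple.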

Input validation: UIDs and URLs are validated before being used; PDF attachment filenames are sanitised.
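The exact rules the tool applies are not documented here, so the checks below are only one plausible reading: UIDs as positive 32-bit integers (as IMAP defines them), URLs limited to http and https, and filenames reduced to a safe basename. All three function names and the fallback name attachment.pdf are hypothetical.

```python
import re
from urllib.parse import urlparse


def is_valid_uid(uid: str) -> bool:
    """Accept only ASCII digits forming a non-zero 32-bit unsigned integer."""
    return bool(re.fullmatch(r"[0-9]{1,10}", uid)) and 0 < int(uid) <= 2**32 - 1


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host part."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def sanitise_filename(name: str) -> str:
    """Drop any directory part and replace unusual characters with underscores."""
    base = name.replace("\\", "/").rsplit("/", 1)[-1]
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned or "attachment.pdf"
```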

Thread-safe IMAP: all IMAP operations are serialised through a re-entrant lock (threading.RLock()), so concurrent tool calls run one at a time.
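The sketch below illustrates why a re-entrant lock is a natural fit: a method that already holds the lock can call another locked method without deadlocking. The class MailboxGuard and its methods are hypothetical and only record the commands they would issue.

```python
import threading


class MailboxGuard:
    """Hypothetical wrapper that serialises mailbox operations with an RLock."""

    def __init__(self):
        self._lock = threading.RLock()
        self.log = []

    def fetch(self, uid: str) -> None:
        with self._lock:
            self.log.append(f"FETCH {uid}")

    def read_and_mark(self, uid: str) -> None:
        with self._lock:
            # Nested acquisition by the same thread; a plain Lock would block here.
            self.fetch(uid)
            self.log.append(f"STORE {uid} +FLAGS curated")
```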

PDF attachment limits: attachments larger than MAX_PDF_SIZE_MB (default 10 MB) are skipped and logged.

Article cache: successfully fetched articles are cached in memory for 1 hour (TTL), reducing repeated API calls.
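The requirements list cachetools, which provides a ready-made TTL cache; whether the tool uses it exactly this way is not stated. The dependency-free sketch below only illustrates the expiry behaviour, and the class name ArticleCache and its injectable clock are assumptions.

```python
import time


class ArticleCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}   # url -> (expires_at, text)

    def set(self, url: str, text: str) -> None:
        self._store[url] = (self._clock() + self._ttl, text)

    def get(self, url: str):
        """Return the cached text, or None if missing or expired."""
        entry = self._store.get(url)
        if entry is None:
            return None
        expires_at, text = entry
        if self._clock() >= expires_at:
            del self._store[url]   # evict lazily on read
            return None
        return text
```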

Sender filtering: use FROM_WHITELIST (comma-separated email addresses) to limit processing to specific senders.
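A whitelist check could be sketched as follows. The function names are hypothetical, and case-insensitive matching on the bare address is an assumption; an empty whitelist allows every sender, matching the FROM_WHITELIST description.

```python
from email.utils import parseaddr


def parse_whitelist(raw: str) -> set[str]:
    """Split a comma-separated valve value into a set of lower-cased addresses."""
    return {item.strip().lower() for item in raw.split(",") if item.strip()}


def sender_allowed(from_header: str, whitelist: set[str]) -> bool:
    """True when no filter is configured or the sender's address is listed."""
    if not whitelist:
        return True
    _, address = parseaddr(from_header)   # handles "Name <addr@host>" forms
    return address.lower() in whitelist
```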

Graceful degradation: if the Jina API is unreachable, the tool falls back to the newsletter content alone.

Configurable limits: the maximum number of newsletters listed (MAX_NEWSLETTERS) and of search results returned (SEARCH_LIMIT) are valves rather than hardcoded defaults.

Requirements

  • Python 3.10+
  • Open WebUI (the Tools class is loaded as a custom tool)

Install the Python dependencies:

pip install beautifulsoup4 pydantic cachetools

Configuration

All settings live in the Tools.Valves Pydantic model and are editable from the Open WebUI tool settings panel.
Secrets (IMAP_PASSWORD, JINA_API_KEY) can also be set via environment variables:

  • NEWSLETTER_IMAP_PASSWORD
  • NEWSLETTER_JINA_API_KEY

| Setting | Default | Description |
| --- | --- | --- |
| IMAP_HOST | (required) | IMAP server hostname (comma-separated for multiple accounts) |
| IMAP_PORT | 993 | SSL port |
| IMAP_USER | (required) | Full email address |
| IMAP_PASSWORD | (required) | Account password or app password |
| IMAP_FOLDER | Inbox | Folder to watch (use IMAP path notation, e.g. INBOX/Newsletters) |
| LOOKBACK_DAYS | 7 | How many days back to scan |
| JINA_API_KEY | (optional) | API key for r.jina.ai and Jina Embeddings |
| SEMANTIC_SEARCH_ENABLED | false | Enable embedding-based reranking (requires JINA_API_KEY) |
| FROM_WHITELIST | (empty) | Comma-separated sender email addresses to allow (empty = no filter) |
| MAX_PDF_SIZE_MB | 10 | Maximum PDF attachment size in MB (0 = no limit) |
| JINA_RATE_LIMIT | 5.0 | Maximum Jina API calls per second |
| JINA_BURST | 10 | Burst capacity for Jina API calls |
| NETWORK_TIMEOUT | 30 | Timeout in seconds for all external network requests |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| MAX_NEWSLETTERS | 15 | Maximum number of newsletters to list in one latest_newsletters call |
| SEARCH_LIMIT | 5 | Maximum number of search results returned by search_newsletters |

Gmail users: Enable IMAP in Gmail settings and generate an App Password — do not use your main account password.

Usage

Once installed in Open WebUI the tool is invoked automatically by the assistant. Typical workflow:

  1. Ask "What newsletters arrived this week?" → calls latest_newsletters
  2. Ask "Read newsletter #42" → calls read_newsletter(uid="42")
  3. Ask "Find newsletters about AI agents" → calls search_newsletters
  4. Ask "Fetch this article: https://…" → calls fetch_article
  5. Ask "What topics should I dig into from newsletter #42?" → calls extract_research_topics
  6. Ask "Export newsletter #42 as JSON" → calls export_newsletter_json

Safety Guardrails

fetch_article — explicit consent required

fetch_article must not be called automatically. The assistant should only invoke it when:

  • The user explicitly asks to read or explore a specific link, or
  • A link appears central to understanding the newsletter and the user has confirmed after being asked.

It must not be called:

  • Automatically after read_newsletter without explicit user confirmation
  • On unsubscribe, tracking, social network, or navigation links
  • When the newsletter content alone is sufficient to answer the request
  • On more than one link per conversation turn unless explicitly requested

read_newsletter — side effect: marks email as curated

Calling read_newsletter tags the email with the curated flag in the IMAP mailbox. This is not reversible from within this tool. The email will no longer appear in latest_newsletters results. Only call it when the user actually wants to read that newsletter.
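For orientation, setting a keyword flag with the standard imaplib module can be sketched as below. It assumes conn is a logged-in imaplib.IMAP4_SSL object with the folder selected read-write, and that the server permits custom keywords; the function names are hypothetical. The second helper shows how the flag could be cleared from outside the tool, for example in a one-off script.

```python
def mark_curated(conn, uid: str) -> bool:
    """Add the 'curated' keyword to one message, addressed by UID."""
    status, _ = conn.uid("STORE", uid, "+FLAGS", "(curated)")
    return status == "OK"


def unmark_curated(conn, uid: str) -> bool:
    """Remove the keyword again (manual recovery, not something the tool exposes)."""
    status, _ = conn.uid("STORE", uid, "-FLAGS", "(curated)")
    return status == "OK"
```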

export_newsletter_json — no side effects

export_newsletter_json is read-only and does not mark the email as curated. It is safe to call without altering the mailbox state.

Credential hygiene

  • Never commit IMAP_PASSWORD or JINA_API_KEY to version control.
  • Use an App Password (Gmail/Outlook) rather than your main account password.
  • Restrict the IMAP account to read/write on the newsletters folder only where your provider allows it.
  • Rotate the Jina API key if it is exposed.

Network requests

  • fetch_article sends the full URL to the Jina AI public API (r.jina.ai). Do not pass URLs that contain session tokens, private file references, or other sensitive data.
  • Jina Embeddings calls (api.jina.ai) transmit up to 512 characters of email content. Avoid enabling semantic search on confidential mailboxes.
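A pre-flight check for obviously sensitive URLs could look like the sketch below. It is not part of the tool as documented; the function name and the list of suspect parameter names are assumptions, and a heuristic like this cannot catch tokens embedded in the path.

```python
from urllib.parse import parse_qsl, urlparse

# Hypothetical list of query parameter names that often carry credentials.
SUSPECT_PARAMS = {"token", "access_token", "session", "sessionid",
                  "sig", "signature", "key", "auth"}


def has_sensitive_params(url: str) -> bool:
    """True if the URL embeds credentials or a suspicious query parameter."""
    parsed = urlparse(url)
    if parsed.username or parsed.password:
        return True
    names = {name.lower() for name, _ in parse_qsl(parsed.query, keep_blank_values=True)}
    return bool(names & SUSPECT_PARAMS)
```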

Content truncation

Newsletter bodies are truncated at 48000 characters when read via read_newsletter. The initial preview in latest_newsletters and search snippets use a shorter limit of 12000 characters. Content beyond those limits is silently dropped. If a newsletter is unusually long, the assistant may miss sections that appear late in the email.
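The two limits can be expressed as a small helper. The sketch below also reports how many characters were dropped, which the tool itself does not do according to the paragraph above; the constant and function names are hypothetical.

```python
READ_LIMIT = 48_000      # read_newsletter
PREVIEW_LIMIT = 12_000   # latest_newsletters previews and search snippets


def truncate(text: str, limit: int) -> tuple[str, int]:
    """Return the kept text and the number of characters that were cut off."""
    dropped = max(0, len(text) - limit)
    return text[:limit], dropped
```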

Architecture

code                  ← single-file Open WebUI tool (Python)
├── RateLimiter       ← token-bucket rate limiter for external APIs
├── HTMLCleaner       ← HTML → Markdown, strips noise, truncates
└── Tools
    ├── Valves        ← Pydantic settings model (Open WebUI UI)
    ├── _imap_session ← context manager: connect → select → logout (thread-safe)
    ├── *_sync        ← synchronous IMAP/HTTP methods with retries
    └── public async  ← asyncio.to_thread wrappers exposed to the LLM

Synchronous IMAP operations run inside asyncio.to_thread() to avoid blocking the Open WebUI event loop.
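The wrapper pattern can be sketched as follows. The function bodies are placeholders: the sleep stands in for a blocking IMAP round trip, and the names only imitate the *_sync convention from the diagram above.

```python
import asyncio
import time


def _read_newsletter_sync(uid: str) -> str:
    """Placeholder for a blocking IMAP fetch."""
    time.sleep(0.05)
    return f"newsletter {uid}"


async def read_newsletter(uid: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(_read_newsletter_sync, uid)
```

Calling asyncio.run(read_newsletter("42")) from a script exercises the same path Open WebUI would take from its own event loop.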

Search uses a two-pass keyword strategy: subject-line matches are returned first, then body matches. When semantic search is enabled, cosine-similarity reranking runs as an optional third pass.
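The keyword ordering can be sketched as below, assuming messages are already loaded as dictionaries with uid, subject, and body keys. That shape, the function name, and plain case-insensitive substring matching are assumptions for illustration.

```python
def keyword_search(messages: list[dict], query: str, limit: int = 5) -> list[str]:
    """Return uids with subject matches first, then body-only matches."""
    needle = query.lower()
    subject_hits = [m["uid"] for m in messages if needle in m["subject"].lower()]
    seen = set(subject_hits)
    body_hits = [
        m["uid"] for m in messages
        if m["uid"] not in seen and needle in m["body"].lower()
    ]
    return (subject_hits + body_hits)[:limit]
```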

License

MIT — see LICENSE.