newsletter-curator
An Open WebUI tool that reads newsletters from an IMAP mailbox, converts them to clean Markdown, and enriches articles via the Jina AI Reader API. Processed emails are tagged curated so they are never surfaced twice.
Features
| Method | Description |
|---|---|
| latest_newsletters(limit) | List unprocessed newsletters from the configured lookback window (limit defaults to MAX_NEWSLETTERS valve) |
| read_newsletter(uid) | Extract full content of a newsletter and mark it as curated |
| search_newsletters(query, limit) | Keyword search with optional semantic re‑ranking (Jina Embeddings); limit defaults to SEARCH_LIMIT valve |
| fetch_article(url) | Fetch a full article via Jina AI Reader (r.jina.ai) |
| extract_research_topics(uid) | Extract headings, anchor texts, and bold phrases as research prompts |
| export_newsletter_json(uid) | Export full newsletter as JSON (read‑only, does not mark as curated) |

HTML cleaning strips scripts, navigation, footers, and forms, then converts the remaining structure to Markdown. Content is truncated at 48 000 characters when reading a newsletter (12 000 for the initial preview shown in lists) to stay within LLM context limits.
Semantic search is optional. When enabled (SEMANTIC_SEARCH_ENABLED=true) the keyword results are re‑ranked by cosine similarity using Jina Embeddings — no NumPy required.
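As a rough illustration of re‑ranking without NumPy, the sketch below computes cosine similarity in plain Python. The function names (`cosine_similarity`, `rerank`) and the toy vectors are assumptions made for this example; in the tool the vectors would come from Jina Embeddings, and the actual implementation may differ.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec, candidates):
    """Sort (uid, vector) pairs by similarity to the query, best match first."""
    scored = [(uid, cosine_similarity(query_vec, vec)) for uid, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
query = [1.0, 0.0, 0.0]
hits = [("101", [0.0, 1.0, 0.0]), ("102", [0.9, 0.1, 0.0]), ("103", [0.5, 0.5, 0.0])]
ranked_uids = [uid for uid, _ in rerank(query, hits)]
```

Here `ranked_uids` puts the vector closest in direction to the query first, regardless of the order in which the keyword search returned the hits.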
Secure secret management – IMAP password and Jina API key can be loaded from environment variables (NEWSLETTER_IMAP_PASSWORD, NEWSLETTER_JINA_API_KEY) with valve values as fallback.
Rate limiting – A token‑bucket limiter prevents excessive calls to the Jina API (configurable rate and burst size).
Retry logic – Network operations (IMAP, Jina API) are retried with exponential backoff (up to 3 attempts).
Input validation – UIDs and URLs are validated before being used; PDF attachment filenames are sanitised.
Thread‑safe IMAP – All IMAP operations are serialised through a re‑entrant lock (threading.RLock()), so concurrent tool calls run one at a time against the mailbox.
PDF attachment limits – Attachments larger than MAX_PDF_SIZE_MB (default 10 MB) are skipped and logged.
Article cache – Successfully fetched articles are cached in‑memory for 1 hour (TTL), reducing repeated API calls.
Sender filtering – Use FROM_WHITELIST (comma‑separated email addresses) to limit processing to specific senders.
Graceful degradation – If the Jina API is unreachable, the tool falls back to the newsletter content alone.
Configurable limits – The maximum number of newsletters listed (MAX_NEWSLETTERS) and search results (SEARCH_LIMIT) are now valves instead of hardcoded defaults.
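The retry behaviour listed above (up to 3 attempts with exponential backoff) can be sketched roughly as follows. The helper name `with_retries`, the 1‑second base delay, and the choice to retry only on `OSError` are assumptions for illustration, not details taken from the tool.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,  # assumed value; the README does not state the delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on OSError with delays of base_delay * 2**n."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # OSError covers socket timeouts and connection resets; a real tool
            # would likely also catch protocol errors such as imaplib.IMAP4.abort.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("attempts must be at least 1")


# Demo: an operation that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
delays = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
```

Passing `sleep` as a parameter keeps the example fast and makes the backoff schedule easy to inspect: `delays` records one wait per failed attempt, doubling each time.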
Requirements
- Python 3.10+
- Open WebUI (the Tools class is loaded as a custom tool)
pip install beautifulsoup4 pydantic cachetools
Configuration
All settings live in the Tools.Valves Pydantic model and are editable from the Open WebUI tool settings panel.
Secrets (IMAP_PASSWORD, JINA_API_KEY) can also be set via environment variables:
- NEWSLETTER_IMAP_PASSWORD
- NEWSLETTER_JINA_API_KEY
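
One way to picture the lookup order (environment variable first, valve value as fallback) is the small helper below. `resolve_secret` is a hypothetical name, and treating an empty environment variable as unset is an assumption of this sketch.

```python
import os
from typing import Mapping


def resolve_secret(env_name: str, valve_value: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the environment variable if it is set and non-empty, else the valve value."""
    return environ.get(env_name) or valve_value


# A plain dict stands in for os.environ so the example is deterministic.
fake_env = {"NEWSLETTER_IMAP_PASSWORD": "from-env"}
imap_password = resolve_secret("NEWSLETTER_IMAP_PASSWORD", "from-valve", fake_env)
jina_key = resolve_secret("NEWSLETTER_JINA_API_KEY", "from-valve", fake_env)
```

In this example `imap_password` comes from the environment, while `jina_key` falls back to the valve because the variable is not set.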
| Setting | Default | Description |
|---|---|---|
| IMAP_HOST | (required) | IMAP server hostname (comma‑separated for multiple accounts) |
| IMAP_PORT | 993 | SSL port |
| IMAP_USER | (required) | Full email address |
| IMAP_PASSWORD | (required) | Account password or app password |
| IMAP_FOLDER | Inbox | Folder to watch (use IMAP path notation, e.g. INBOX/Newsletters) |
| LOOKBACK_DAYS | 7 | How many days back to scan |
| JINA_API_KEY | (optional) | API key for r.jina.ai and Jina Embeddings |
| SEMANTIC_SEARCH_ENABLED | false | Enable embedding‑based re‑ranking (requires JINA_API_KEY) |
| FROM_WHITELIST | (empty) | Comma‑separated sender email addresses to allow (empty = no filter) |
| MAX_PDF_SIZE_MB | 10 | Maximum PDF attachment size in MB (0 = no limit) |
| JINA_RATE_LIMIT | 5.0 | Maximum Jina API calls per second |
| JINA_BURST | 10 | Burst capacity for Jina API calls |
| NETWORK_TIMEOUT | 30 | Timeout in seconds for all external network requests |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| MAX_NEWSLETTERS | 15 | Maximum number of newsletters to list in one latest_newsletters call |
| SEARCH_LIMIT | 5 | Maximum number of search results returned by search_newsletters |

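
To show how JINA_RATE_LIMIT and JINA_BURST interact, here is a minimal token‑bucket sketch using the default values 5.0 and 10. The class name `TokenBucket`, the `try_acquire` method, and the injectable clock are assumptions for illustration; the tool's own RateLimiter class may be structured differently, and this version is not thread‑safe.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` calls per second on average, with bursts of up to `burst` calls."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to elapsed time, never above the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# A fake clock keeps the example deterministic.
now = [0.0]
bucket = TokenBucket(rate=5.0, burst=10, clock=lambda: now[0])
burst_results = [bucket.try_acquire() for _ in range(11)]  # 10 succeed, the 11th is refused
now[0] += 0.25  # 0.25 s at 5 tokens/s refills 1.25 tokens
after_refill = bucket.try_acquire()
```

With these defaults, ten calls can go out back to back, after which calls are admitted at roughly five per second as tokens refill.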
Gmail users: Enable IMAP in Gmail settings and generate an App Password — do not use your main account password.
Usage
Once installed in Open WebUI the tool is invoked automatically by the assistant. Typical workflow:
- Ask "What newsletters arrived this week?" → calls latest_newsletters
- Ask "Read newsletter #42" → calls read_newsletter(uid="42")
- Ask "Find newsletters about AI agents" → calls search_newsletters
- Ask "Fetch this article: https://…" → calls fetch_article
- Ask "What topics should I dig into from newsletter #42?" → calls extract_research_topics
- Ask "Export newsletter #42 as JSON" → calls export_newsletter_json
Safety Guardrails
fetch_article — explicit consent required
fetch_article must not be called automatically. The assistant should only invoke it when:
- The user explicitly asks to read or explore a specific link, or
- A link appears central to understanding the newsletter and the user has confirmed after being asked.
It must not be called:
- Automatically after read_newsletter without explicit user confirmation
- On unsubscribe, tracking, social network, or navigation links
- When the newsletter content alone is sufficient to answer the request
- On more than one link per conversation turn unless explicitly requested
read_newsletter — side‑effect: marks email as curated
Calling read_newsletter tags the email with the curated flag in the IMAP mailbox. This is not reversible from within this tool. The email will no longer appear in latest_newsletters results. Only call it when the user actually wants to read that newsletter.
export_newsletter_json — no side effects
export_newsletter_json is read‑only and does not mark the email as curated. It is safe to call without altering the mailbox state.
Credential hygiene
- Never commit IMAP_PASSWORD or JINA_API_KEY to version control.
- Use an App Password (Gmail/Outlook) rather than your main account password.
- Restrict the IMAP account to read/write on the newsletters folder only where your provider allows it.
- Rotate the Jina API key if it is exposed.
Network requests
- fetch_article sends the full URL to the Jina AI public API (r.jina.ai). Do not pass URLs that contain session tokens, private file references, or other sensitive data.
- Jina Embeddings calls (api.jina.ai) transmit up to 512 characters of email content. Avoid enabling semantic search on confidential mailboxes.
Content truncation
Newsletter bodies are truncated at 48 000 characters when read via read_newsletter. The initial preview in latest_newsletters and search snippets use a shorter limit of 12 000 characters. Content beyond those limits is silently dropped. If a newsletter is unusually long, the assistant may miss sections that appear late in the email.
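The two limits can be illustrated with a small helper. This is a sketch only: `truncate` and the constant names are hypothetical, and the returned flag is an addition for illustration, since the tool itself drops the excess silently.

```python
READ_LIMIT = 48_000     # limit quoted above for read_newsletter
PREVIEW_LIMIT = 12_000  # limit quoted above for list previews and search snippets


def truncate(text: str, limit: int) -> tuple[str, bool]:
    """Return (text cut to at most `limit` characters, whether anything was dropped)."""
    if len(text) <= limit:
        return text, False
    return text[:limit], True


body = "x" * 50_000  # stand-in for an unusually long newsletter
full, full_cut = truncate(body, READ_LIMIT)
preview, preview_cut = truncate(body, PREVIEW_LIMIT)
short, short_cut = truncate("brief issue", READ_LIMIT)
```

A flag like `full_cut` is one way a caller could tell the user that the tail of a long newsletter was not seen.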
Architecture
code ← single‑file Open WebUI tool (Python)
├── RateLimiter ← token‑bucket rate limiter for external APIs
├── HTMLCleaner ← HTML → Markdown, strips noise, truncates
└── Tools
├── Valves ← Pydantic settings model (Open WebUI UI)
├── _imap_session ← context manager: connect → select → logout (thread‑safe)
├── *_sync ← synchronous IMAP/HTTP methods with retries
└── public async ← asyncio.to_thread wrappers exposed to the LLM
Synchronous IMAP operations run inside asyncio.to_thread() to avoid blocking the Open WebUI event loop.
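The wrapper pattern looks roughly like the sketch below. `_fetch_uids_sync` and `fetch_uids` are hypothetical stand‑ins (the blocking function sleeps instead of talking to a mail server); only `asyncio.to_thread` itself is taken from the description above.

```python
import asyncio
import time


def _fetch_uids_sync(limit: int) -> list[str]:
    """Stand-in for a blocking IMAP call; sleeps instead of doing network I/O."""
    time.sleep(0.05)
    return [f"uid-{n}" for n in range(1, limit + 1)]


async def fetch_uids(limit: int = 3) -> list[str]:
    """Async wrapper: the blocking work runs in a worker thread."""
    return await asyncio.to_thread(_fetch_uids_sync, limit)


async def main() -> tuple[list[str], int]:
    ticks = 0

    async def heartbeat() -> None:
        # Keeps counting while the blocking call runs, showing the loop stays responsive.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.005)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    uids = await fetch_uids(3)
    beat.cancel()
    return uids, ticks


uids, ticks = asyncio.run(main())
```

Because the sleep happens in a worker thread, the heartbeat task continues to tick during the call; calling `_fetch_uids_sync` directly inside a coroutine would stall it instead.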
Search uses a two‑pass keyword strategy: subject‑line matches are returned first, then body matches. When semantic search is enabled, the combined results are then re‑ranked by cosine similarity.
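The ordering of keyword results can be sketched as follows. The in‑memory `Message` list stands in for the mailbox, and `keyword_search` is a hypothetical name; the actual tool presumably queries the IMAP server rather than filtering a Python list.

```python
from typing import NamedTuple


class Message(NamedTuple):
    uid: str
    subject: str
    body: str


def keyword_search(messages: list[Message], query: str, limit: int = 5) -> list[str]:
    """Return UIDs whose subject matches first, then UIDs matching only in the body."""
    needle = query.casefold()
    subject_hits = [m.uid for m in messages if needle in m.subject.casefold()]
    seen = set(subject_hits)
    body_hits = [
        m.uid for m in messages
        if m.uid not in seen and needle in m.body.casefold()
    ]
    return (subject_hits + body_hits)[:limit]


# Hypothetical sample data.
inbox = [
    Message("1", "Weekly digest", "A long piece on AI agents and tooling."),
    Message("2", "AI agents in production", "Case studies inside."),
    Message("3", "Gardening tips", "Tomatoes and basil."),
    Message("4", "AI agents roundup", "More on AI agents."),
]
results = keyword_search(inbox, "ai agents")
top_two = keyword_search(inbox, "ai agents", limit=2)
```

Messages 2 and 4 match in the subject and are listed first; message 1 matches only in the body and follows them. The default `limit=5` mirrors the SEARCH_LIMIT default.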
License
MIT — see LICENSE.