Configuration

This guide covers configuring lit-monitor by hand, the three common setup recipes, provider routing, notifications, strict mode, and recurring-schedule deployment. To configure through the browser instead, use the setup wizard — it edits all the same files for you.

The two kinds of config

lit-monitor splits configuration into one secrets file and a set of behaviour files. They live in different places and serve different purposes — this is the split most worth understanding up front.

	Secrets	Behaviour
File(s)	`config.toml`	nine `*.yaml` files
Format	TOML	YAML
Holds	API keys, library IDs, emails	what to search, how to rank, where to write
Location	`~/.config/lit-monitor/config.toml`	see below
In git?	never	the `.example.yaml` templates are; your real `.yaml` files are gitignored

Where the YAML files live depends on how you installed:

pip / pipx / uvx install: in your user config dir, ~/.config/lit-monitor/, the same folder as config.toml. First-run (or the wizard) seeds them there. This is the path resolver's primary location once paths.yaml exists.
From a source checkout: in the repo's ./config/ directory (the editable/dev fallback). This is why older docs and the install.sh flow refer to ./config/ — that's the source-checkout location.

So for a normal pip install, everything is side by side under ~/.config/lit-monitor/: config.toml next to paths.yaml, topics.yaml, and the rest. (LIT_MONITOR_ROOT overrides the location entirely if you need a custom layout.)
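
The lookup order described above can be sketched in Python. This is a minimal illustration, not lit-monitor's actual resolver: the function name `resolve_config_dir` is hypothetical, and treating `LIT_MONITOR_ROOT` as the config directory itself (rather than a parent of it) is an assumption.

```python
import os
from pathlib import Path


def resolve_config_dir(env=None, home=None, cwd=None):
    """Return the directory the behaviour YAMLs are assumed to live in.

    Assumed order (illustrative only):
      1. LIT_MONITOR_ROOT, if set
      2. ~/.config/lit-monitor/, once paths.yaml exists there
      3. ./config/ as the source-checkout fallback
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    cwd = Path.cwd() if cwd is None else Path(cwd)

    override = env.get("LIT_MONITOR_ROOT")
    if override:
        return Path(override)

    user_dir = home / ".config" / "lit-monitor"
    if (user_dir / "paths.yaml").exists():
        return user_dir

    return cwd / "config"


if __name__ == "__main__":
    print(resolve_config_dir())
```

The `env`, `home`, and `cwd` parameters exist only so the function can be exercised without touching the real environment.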

1. Create credentials at ~/.config/lit-monitor/config.toml:

[zotero]
api_key    = "YOUR_ZOTERO_API_KEY"
library_id = "YOUR_NUMERIC_LIBRARY_ID"

[pubmed]
email = "you@example.com"

[ollama]
api_key = "YOUR_OLLAMA_CLOUD_KEY"   # optional, only for cloud Ollama

2. Edit the behaviour YAMLs (see the table below). The .example.yaml files shipped in git document every field; your real .yaml files are gitignored.

3. Verify — lit-monitor check.

The config files at a glance

Nine YAML files, each documented by a matching *.example.yaml. Only the first two are required; the rest progressively enrich ranking and the knowledge graph.

File	Required?	What it controls
`paths.yaml`	Yes	Filesystem wiring: Obsidian vault path, Zotero library/collection, and the locations of the state DB, ChromaDB vector store, KuzuDB graph, and logs.
`extraction.yaml`	Yes	The behaviour hub: per-mode LLM model selection (`simple`/`complex`/`brain_build`/`build_vocabulary`), the `ranking:` signal weights, `clustering:`, `embeddings:`, the `discovery:` notification/delivery flags, and `feedback:` (active-learning) keys.
`topics.yaml`	Recommended	The recurring searches that feed discovery — each entry is a named query across PubMed/arXiv/Scopus. Auto-grows from `discovered_topics` over time.
`domain_context.yaml`	Optional	A free-text paragraph describing your research focus. `lit-monitor domain analyze` turns it into the `domain_context` ranking signal.
`concepts.yaml`	Optional	Vocabulary themes (theme → keyword lists) used for auto theme-tagging and cluster write-back. Usually generated by `build-vocabulary` (as `concepts_draft.yaml`) then reviewed into place.
`researchers.yaml`	Optional	Tracked authors for researcher gating — papers by these authors get a ranking nudge.
`predicates.yaml`	Advanced (graph)	The typed-relationship vocabulary the graph extracts (`CITES`, `EXTENDS`, `CONTRADICTS`, …), with `aliases` and `deprecated` support so renames don't force a graph rebuild.
`entity_types.yaml`	Advanced (graph)	The entity-type vocabulary (method, target, assay, …) the NER + LLM extractors map mentions onto, also alias/deprecate-aware.
`entity_aliases.yaml`	Advanced (graph)	Entity normalization aliases (canonical name → known variants) so the same concept under different spellings collapses to one graph node.

You rarely touch the three graph files by hand — the shipped defaults are sane, and lit-monitor graph propose-aliases helps grow entity_aliases.yaml.
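
To make the alias idea concrete, here is a small Python sketch of how a canonical-name-to-variants mapping can collapse different spellings onto one node. The function names and the sample entries are hypothetical, and the case-insensitive matching is an assumption; the real schema is documented in entity_aliases.example.yaml.

```python
def build_alias_index(aliases):
    """Invert {canonical: [variants]} into {normalised name: canonical}."""
    index = {}
    for canonical, variants in aliases.items():
        # The canonical name maps to itself so every known spelling resolves.
        for name in [canonical, *variants]:
            index[name.casefold().strip()] = canonical
    return index


def canonicalise(mention, index):
    """Return the canonical name for a mention, or the mention unchanged."""
    return index.get(mention.casefold().strip(), mention)


# Hypothetical sample data, not taken from the shipped defaults.
ALIASES = {"protein A chromatography": ["ProA", "Protein-A capture"]}

if __name__ == "__main__":
    index = build_alias_index(ALIASES)
    for mention in ["proa", "Protein-A capture", "size exclusion"]:
        print(mention, "->", canonicalise(mention, index))
```

Unknown mentions pass through unchanged, so an incomplete alias file degrades to one node per spelling rather than failing.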

Setup recipes

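Three configurations, from zero to fully customised. The paths below use the source-checkout location config/; for a pip install, the same files sit in ~/.config/lit-monitor/.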

(a) Just run it — minimal config

Enough to get a recurring discovery feed and Obsidian notes.

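Copy each config/*.example.yaml template to its matching .yaml name (if you didn't run install.sh): for f in config/*.example.yaml; do cp "$f" "${f%.example.yaml}.yaml"; done. A plain cp config/*.example.yaml config/ would only copy the templates onto themselves.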
Add Zotero credentials to ~/.config/lit-monitor/config.toml.
Set obsidian_vault_path and zotero_library_id in config/paths.yaml.
Add 2–3 search topics in config/topics.yaml.
lit-monitor check — verify connectivity.
lit-monitor brain-build — index your existing library (one-time).
lit-monitor run — first discovery run.
lit-monitor serve to browse results at http://127.0.0.1:8765.

No domain context, no clustering, no graph signals yet — all optional. The vector similarity signal works on its own.
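
The template-copy step at the top of this recipe can also be scripted. The sketch below is one way to do it in Python, assuming the templates follow the `name.example.yaml` to `name.yaml` convention described earlier; `seed_configs` is a hypothetical helper, not part of lit-monitor, and it deliberately skips files you have already edited.

```python
import shutil
from pathlib import Path


def seed_configs(config_dir):
    """Copy each *.example.yaml to *.yaml unless the real file already exists.

    Returns the names of the files that were created.
    """
    created = []
    for template in sorted(Path(config_dir).glob("*.example.yaml")):
        target = template.with_name(template.name.replace(".example.yaml", ".yaml"))
        if not target.exists():
            shutil.copyfile(template, target)
            created.append(target.name)
    return created


if __name__ == "__main__":
    print(seed_configs("config"))
```

Running it a second time creates nothing, so it is safe to re-run after pulling new templates.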

(b) Clustering on — after 100 papers

Enable after brain-build has indexed at least 100 papers.

lit-monitor cluster recompute — run K-means and name clusters.
lit-monitor cluster view — inspect the resulting themes.
Optionally lit-monitor cluster write-back tags to tag Zotero items.
Set cluster_centroid_weight: 0.2 under ranking: in config/extraction.yaml to include the cluster signal in scoring.

Once clustering.enabled: true is set in config/extraction.yaml, clustering re-runs automatically on subsequent brain-build calls. Adjust clustering.n_clusters for the right granularity (8–15 is a good starting range for a 500-paper library).

lit-monitor cluster recompute        # run K-means + LLM naming
lit-monitor cluster view             # show cluster themes + paper counts
lit-monitor cluster assign           # assign every paper to nearest cluster
lit-monitor cluster write-back tags  # tag Zotero items with theme names
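
For intuition about what assigning every paper to its nearest cluster means, here is a minimal nearest-centroid sketch in plain Python. It assumes Euclidean distance over embedding vectors; the metric lit-monitor actually uses is not stated here, and the function names and toy vectors are illustrative only.

```python
import math


def nearest_cluster(vector, centroids):
    """Return the index of the centroid closest to vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def assign_all(vectors, centroids):
    """Map each paper id to the index of its nearest centroid."""
    return {paper_id: nearest_cluster(vec, centroids) for paper_id, vec in vectors.items()}


if __name__ == "__main__":
    # Toy 2-D vectors; real embeddings have hundreds of dimensions.
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    papers = {"paper-a": [1.0, 1.0], "paper-b": [9.0, 8.0]}
    print(assign_all(papers, centroids))
```

With more clusters (a higher `clustering.n_clusters`), each centroid covers a narrower slice of the library, which is why the setting controls theme granularity.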

(c) Graph signals — tuning the ranking mix

Enable once the knowledge graph has been populated and you want the citation and entity-overlap signals.

Run lit-monitor graph backfill --all (first time only; incremental thereafter).
Set nonzero weights in config/extraction.yaml:

ranking:
  graph_entity_overlap_weight: 0.15
  graph_citation_weight:       0.10
  graph_shared_authors_weight: 0.05

Optionally add a domain_context.yaml paragraph and run lit-monitor domain analyze to activate the domain_context signal:

ranking:
  domain_context_weight: 0.20

Check the score decomposition in the discovery web UI or with lit-monitor discovery view --run latest --breakdown to confirm all signals are contributing as expected.
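
When reading a breakdown, it can help to have a mental model of how weights and signals combine. The sketch below assumes a plain weighted sum, with each signal value in the range 0 to 1 and each weight looked up as `<signal>_weight`; the real scoring formula may differ, and `score_breakdown` is a hypothetical name.

```python
def score_breakdown(signals, weights):
    """Return (total, contributions) for an assumed weighted-sum score.

    A signal whose weight is missing or zero contributes nothing.
    """
    contributions = {
        name: weights.get(f"{name}_weight", 0.0) * value
        for name, value in signals.items()
    }
    return sum(contributions.values()), contributions


if __name__ == "__main__":
    weights = {
        "graph_entity_overlap_weight": 0.15,
        "graph_citation_weight": 0.10,
        "domain_context_weight": 0.20,
    }
    # Hypothetical signal values for one candidate paper.
    signals = {"graph_entity_overlap": 1.0, "graph_citation": 0.5, "domain_context": 0.0}
    total, parts = score_breakdown(signals, weights)
    print(total, parts)
```

Under this model a signal that shows zero in the breakdown either has a zero weight or produced no evidence for that paper, which are the two cases worth telling apart when tuning.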

Trending-concept suggestions and researcher gating are independent features — see lit-monitor trending suggest and config/researchers.yaml respectively.

Starting from a non-biopharma field

lit-monitor's defaults are seeded for biopharma / downstream-process research, but nothing in the engine is domain-specific — it ranks against your library, whatever field that is. To start somewhere closer to your own area:

config/examples/ ships filled-in, synthetic config sets for bioprocessing/, ml-research/, and climate-science/. Pick the closest one and copy its topics.yaml, domain_context.yaml, concepts.yaml, and researchers.yaml into your config dir as a head start, then edit:

cp config/examples/ml-research/*.yaml config/   # or bioprocessing/ climate-science/

The non-domain configs (paths.yaml, extraction.yaml) still come from config/*.example.yaml. The bundled example configs under config/examples/ show the full sets. Whatever your field, the first real step is the same: brain-build indexes your existing library so the vector signal has something to rank against.

LLM and embedding providers

Local Ollama is the default for both LLM inference and embeddings. Embeddings keep running locally unless you explicitly switch them to another provider.

LLM routing

LiteLLM routing is under testing

The LiteLLM provider path (cloud LLMs via Anthropic, OpenAI, Vertex AI, etc.) works but is newer than the well-worn local-Ollama path and hasn't been exercised across as many provider/model combinations. Feedback is appreciated — if you hit a rough edge, please open a GitHub issue. Local Ollama remains the default and most-tested route.

To route any extraction mode through LiteLLM:

pip install "lit-monitor[litellm]"      # or: uv sync --extra litellm

Then per-mode in config/extraction.yaml:

modes:
  simple:
    provider: litellm
    litellm_model: claude-3-5-sonnet-20241022
    # ... other keys unchanged ...
  complex:
    provider: litellm
    litellm_model: claude-opus-4-5

Modes can be mixed, for example local Ollama for simple and cloud Claude for complex. API keys come from your environment, per LiteLLM's provider docs.
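
Because the keys are read from the environment, a missing variable typically only surfaces when the first cloud call is made. The sketch below is a hypothetical pre-flight check: `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are the variable names LiteLLM documents for those providers, but the prefix-to-provider mapping and the `provider: ollama` value are simplifying assumptions for illustration.

```python
import os

# Assumed mapping from model-name prefix to the env var LiteLLM reads.
KEY_FOR_PREFIX = {"claude": "ANTHROPIC_API_KEY", "gpt": "OPENAI_API_KEY"}


def missing_api_keys(modes, env=None):
    """List env vars that litellm-routed modes need but the environment lacks."""
    env = os.environ if env is None else env
    missing = set()
    for mode in modes.values():
        if mode.get("provider") != "litellm":
            continue  # locally routed modes need no cloud key
        model = mode.get("litellm_model", "")
        for prefix, var in KEY_FOR_PREFIX.items():
            if model.startswith(prefix) and not env.get(var):
                missing.add(var)
    return sorted(missing)


if __name__ == "__main__":
    modes = {
        "simple": {"provider": "ollama"},
        "complex": {"provider": "litellm", "litellm_model": "claude-opus-4-5"},
    }
    print(missing_api_keys(modes))
```

The `modes` dictionary mirrors the shape of the `modes:` block above after YAML parsing.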

Embedding routing

Embeddings also support LiteLLM providers (OpenAI, Cohere, Vertex AI, or any provider that exposes an embeddings endpoint):

lit-monitor embeddings switch --provider litellm --model text-embedding-3-small
# Or switch back to local Ollama:
lit-monitor embeddings switch --provider ollama --model mxbai-embed-large

After switching, run lit-monitor embeddings rebuild to re-embed your library under the new model. The lit-monitor embeddings status command shows the current provider, model, embedding dimensionality, and how many papers are indexed.
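
The rebuild is needed because vectors produced by different embedding models are not comparable, and often do not even share a dimensionality. The following sketch shows the failure a similarity lookup would hit if old and new vectors were mixed; it is a generic cosine-similarity illustration, not lit-monitor's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; refuses vectors of different length."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    old_vector = [0.1, 0.2, 0.3]        # stand-in for a vector from the old model
    new_vector = [0.1, 0.2, 0.3, 0.4]   # stand-in for a vector from the new model
    try:
        cosine_similarity(old_vector, new_vector)
    except ValueError as err:
        print("rebuild needed:", err)
```

Even when two models happen to share a dimensionality, their vector spaces differ, so similarity scores across them would be meaningless.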

Notifications and delivery

An OS notification fires when a discovery run completes. Clicking it opens the chooser page on first use; after that it goes straight to your preferred surface (browser, Obsidian, or dismiss). Four delivery flags live under the discovery: key in config/extraction.yaml:

Key	Default	Effect
`notify.enabled`	`true`	Fire OS notification at run end
`notify.preferred_viewer`	`""`	Skip the chooser when set
`notes.auto_write_per_paper`	`true`	`false` → defer per-paper notes to `obsidian sync`
`digest.auto_write`	`true`	`false` → no inline digest; use `discovery export-md`
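
The defaults in the table mean you only need to write the flags you want to change. As a sketch of that behaviour, the Python below overlays a partial `discovery:` mapping on the documented defaults. It assumes the dotted keys correspond to nested YAML mappings (`notify:` containing `enabled:`), which the example files should confirm; `delivery_flags` is a hypothetical helper.

```python
import copy

# Defaults as listed in the table above.
DEFAULTS = {
    "notify": {"enabled": True, "preferred_viewer": ""},
    "notes": {"auto_write_per_paper": True},
    "digest": {"auto_write": True},
}


def delivery_flags(discovery_cfg):
    """Overlay user-set delivery flags on the defaults without mutating them."""
    merged = copy.deepcopy(DEFAULTS)
    for section, values in (discovery_cfg or {}).items():
        if section in merged and isinstance(values, dict):
            merged[section].update(values)
    return merged


if __name__ == "__main__":
    # Equivalent to setting only digest.auto_write: false under discovery:
    print(delivery_flags({"digest": {"auto_write": False}}))
```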

Strict mode and diagnose

lit-monitor --strict run --dry-run        # CLI flag
LIT_MONITOR_STRICT=1 lit-monitor run      # env var
lit-monitor diagnose --config-only        # validate every tracked config
lit-monitor diagnose                      # full health (config + Ollama + Zotero)

Strict mode turns every silent fallback (corrupt config, unreadable attachment, unexpected API response) into a hard error.
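
The difference between a silent fallback and a hard error can be sketched as follows. This is an illustration of the pattern only: it parses JSON instead of YAML to stay within the standard library, the function names are hypothetical, and treating `LIT_MONITOR_STRICT=1` as the only enabling value is an assumption.

```python
import json
import os


def strict_enabled(cli_flag=False, env=None):
    """Strict mode is on if the CLI flag or the env var asks for it."""
    env = os.environ if env is None else env
    return cli_flag or env.get("LIT_MONITOR_STRICT") == "1"


def load_settings(text, strict, default=None):
    """Parse settings; fall back to a default unless strict mode is on."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        if strict:
            raise  # strict: surface the corrupt config immediately
        return {} if default is None else default  # lenient: silent fallback


if __name__ == "__main__":
    corrupt = "{not valid"
    print(load_settings(corrupt, strict=False))
    try:
        load_settings(corrupt, strict=strict_enabled(cli_flag=True))
    except json.JSONDecodeError as err:
        print("strict mode error:", err.msg)
```

The lenient path keeps a scheduled run alive at the cost of hiding the problem; the strict path is the one to use while setting up or debugging.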

Deployment — running on a schedule

A schedule is what turns lit-monitor from a search tool into a monitor — new papers arrive already searched, ranked, extracted, and filed between your visits, each run leaving a dated digest and an OS notification. For why that matters to the workflow, see Scheduling — the monitor loop; this section is the how.

lit-monitor runs on a recurring schedule on a single workstation. Install the schedule from the /schedule page (launchd on macOS, systemd user timer on Linux). Weekly is a sensible default; any cadence works.

Once the local state DB and ChromaDB store have been populated — either by brain-build in one shot, or organically by repeated lit-monitor run calls — the schedule can be handed off to a low-power node (Raspberry Pi 4/5), provided extraction is routed through Ollama Cloud or LiteLLM (see LLM routing above). Embeddings still run wherever the schedule runs, so a cloud-Ollama or LiteLLM embedding provider is the companion piece for a fully headless node.