Configuration

This guide covers configuring lit-monitor by hand, the three common setup recipes, provider routing, notifications, strict mode, and recurring-schedule deployment. To configure through the browser instead, use the setup wizard — it edits all the same files for you.

The two kinds of config

lit-monitor splits configuration into one secrets file and a set of behaviour files. They live in different places and serve different purposes — this is the split most worth understanding up front.

| | Secrets | Behaviour |
|---|---|---|
| File(s) | config.toml | nine *.yaml files |
| Format | TOML | YAML |
| Holds | API keys, library IDs, emails | what to search, how to rank, where to write |
| Location | ~/.config/lit-monitor/config.toml | see below |
| In git? | never | the .example.yaml templates are; your real .yaml files are gitignored |

Where the YAML files live depends on how you installed:

  • pip / pipx / uvx install: in your user config dir, ~/.config/lit-monitor/ — the same folder as config.toml. first-run (or the wizard) seeds them there. This is the path resolver's primary location once paths.yaml exists.
  • From a source checkout: in the repo's ./config/ directory (the editable/dev fallback). This is why older docs and the install.sh flow refer to ./config/ — that's the source-checkout location.

So for a normal pip install, everything is side by side under ~/.config/lit-monitor/: config.toml next to paths.yaml, topics.yaml, and the rest. (LIT_MONITOR_ROOT overrides the location entirely if you need a custom layout.)
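
The lookup order described above can be sketched in a few lines of Python. This is an illustration of the documented behaviour, not lit-monitor's actual resolver: the function name `resolve_config_dir` is hypothetical, and it assumes `LIT_MONITOR_ROOT` points directly at the directory holding the YAML files.

```python
import os
from pathlib import Path


def resolve_config_dir(env=None, home=None, cwd=None):
    """Return the directory the behaviour YAMLs would be read from.

    Illustrative only: mirrors the order described in this guide,
    not the real lit-monitor path resolver.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    cwd = Path.cwd() if cwd is None else Path(cwd)

    # 1. An explicit override wins (assumed to be the config dir itself).
    root = env.get("LIT_MONITOR_ROOT")
    if root:
        return Path(root)

    # 2. The user config dir, once paths.yaml has been seeded there.
    user_dir = home / ".config" / "lit-monitor"
    if (user_dir / "paths.yaml").exists():
        return user_dir

    # 3. The source-checkout fallback.
    return cwd / "config"
```

Calling `resolve_config_dir()` with no arguments uses the real environment, home directory, and working directory; the parameters exist only to make the order easy to try out.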

1. Create credentials at ~/.config/lit-monitor/config.toml:

[zotero]
api_key    = "YOUR_ZOTERO_API_KEY"
library_id = "YOUR_NUMERIC_LIBRARY_ID"

[pubmed]
email = "you@example.com"

[ollama]
api_key = "YOUR_OLLAMA_CLOUD_KEY"   # optional, only for cloud Ollama

2. Edit the behaviour YAMLs (see the table below). The .example.yaml files shipped in git document every field; your real .yaml files are gitignored.

3. Verify: lit-monitor check.

The config files at a glance

Nine YAML files, each documented by a matching *.example.yaml. Only the first two are required; the rest progressively enrich ranking and the knowledge graph.

| File | Required? | What it controls |
|---|---|---|
| paths.yaml | Yes | Filesystem wiring: Obsidian vault path, Zotero library/collection, and the locations of the state DB, ChromaDB vector store, KuzuDB graph, and logs. |
| extraction.yaml | Yes | The behaviour hub: per-mode LLM model selection (simple/complex/brain_build/build_vocabulary), the ranking: signal weights, clustering:, embeddings:, the discovery: notification/delivery flags, and feedback: (active-learning) keys. |
| topics.yaml | Recommended | The recurring searches that feed discovery; each entry is a named query across PubMed/arXiv/Scopus. Auto-grows from discovered_topics over time. |
| domain_context.yaml | Optional | A free-text paragraph describing your research focus. lit-monitor domain analyze turns it into the domain_context ranking signal. |
| concepts.yaml | Optional | Vocabulary themes (theme → keyword lists) used for auto theme-tagging and cluster write-back. Usually generated by build-vocabulary (as concepts_draft.yaml) then reviewed into place. |
| researchers.yaml | Optional | Tracked authors for researcher gating; papers by these authors get a ranking nudge. |
| predicates.yaml | Advanced (graph) | The typed-relationship vocabulary the graph extracts (CITES, EXTENDS, CONTRADICTS, …), with aliases and deprecated support so renames don't force a graph rebuild. |
| entity_types.yaml | Advanced (graph) | The entity-type vocabulary (method, target, assay, …) the NER + LLM extractors map mentions onto, also alias/deprecate-aware. |
| entity_aliases.yaml | Advanced (graph) | Entity normalization aliases (canonical name → known variants) so the same concept under different spellings collapses to one graph node. |

You rarely touch the three graph files by hand — the shipped defaults are sane, and lit-monitor graph propose-aliases helps grow entity_aliases.yaml.

Setup recipes

Three configurations, from zero to fully customised.

(a) Just run it — minimal config

Enough to get a recurring discovery feed and Obsidian notes.

  1. Copy each template to its real name (if you didn't run install.sh): for f in config/*.example.yaml; do cp "$f" "${f%.example.yaml}.yaml"; done
  2. Add Zotero credentials to ~/.config/lit-monitor/config.toml.
  3. Set obsidian_vault_path and zotero_library_id in config/paths.yaml.
  4. Add 2–3 search topics in config/topics.yaml.
  5. lit-monitor check — verify connectivity.
  6. lit-monitor brain-build — index your existing library (one-time).
  7. lit-monitor run — first discovery run.
  8. lit-monitor serve to browse results at http://127.0.0.1:8765.

No domain context, no clustering, no graph signals yet — all optional. The vector similarity signal works on its own.

(b) Clustering on — after 100 papers

Enable after brain-build has indexed at least 100 papers.

  1. lit-monitor cluster recompute — run K-means and name clusters.
  2. lit-monitor cluster view — inspect the resulting themes.
  3. Optionally lit-monitor cluster write-back tags to tag Zotero items.
  4. Set cluster_centroid_weight: 0.2 under ranking: in config/extraction.yaml to include the cluster signal in scoring.

Clustering re-runs automatically on subsequent brain-build calls once clustering.enabled: true is set. Adjust clustering.n_clusters for the right granularity (8–15 is a good starting range for a 500-paper library).

lit-monitor cluster recompute        # run K-means + LLM naming
lit-monitor cluster view             # show cluster themes + paper counts
lit-monitor cluster assign           # assign every paper to nearest cluster
lit-monitor cluster write-back tags  # tag Zotero items with theme names
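
To make the "assign every paper to nearest cluster" step concrete, here is a minimal sketch of nearest-centroid assignment. It assumes cosine similarity between a paper's embedding and each cluster centroid; the metric lit-monitor actually uses is not stated here, and the two-dimensional vectors and cluster names are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_to_nearest(paper_vec, centroids):
    """Return (cluster_name, similarity) for the most similar centroid."""
    return max(
        ((name, cosine(paper_vec, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical centroids; real embeddings have hundreds of dimensions.
centroids = {"chromatography": [1.0, 0.0], "filtration": [0.0, 1.0]}
name, sim = assign_to_nearest([0.9, 0.1], centroids)
print(name, round(sim, 3))
```

A similarity like `sim` is the kind of value a cluster-centroid ranking signal could be built from, which is why the signal only becomes meaningful once clusters exist.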

(c) Graph signals — tuning the ranking mix

Enable once the knowledge graph has been populated and you want the citation and entity-overlap signals.

  1. Run lit-monitor graph backfill --all (first time only; incremental thereafter).
  2. Set nonzero weights in config/extraction.yaml:

     ranking:
       graph_entity_overlap_weight: 0.15
       graph_citation_weight:       0.10
       graph_shared_authors_weight: 0.05

  3. Optionally add a domain_context.yaml paragraph, run lit-monitor domain analyze to activate the domain_context signal, and add its weight to the same ranking: block:

     ranking:
       domain_context_weight: 0.20

  4. Check the score decomposition in the discovery web UI or with lit-monitor discovery view --run latest --breakdown to confirm all signals are contributing as expected.
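
As a mental model for what the weights do, the sketch below treats the final score as a plain weighted sum of per-signal values. That linear form is an assumption for illustration, not a statement of how lit-monitor combines signals. The signal values are invented, and the vector-similarity signal is left out because its weight key is not shown in this guide.

```python
def score_breakdown(signals, weights):
    """Return (total, per-signal contributions) for a weighted sum."""
    contributions = {
        name: weights.get(f"{name}_weight", 0.0) * value
        for name, value in signals.items()
    }
    return sum(contributions.values()), contributions


# Weights copied from the ranking: examples above.
weights = {
    "graph_entity_overlap_weight": 0.15,
    "graph_citation_weight": 0.10,
    "graph_shared_authors_weight": 0.05,
    "domain_context_weight": 0.20,
}
# Hypothetical per-paper signal values in [0, 1].
signals = {
    "graph_entity_overlap": 0.8,
    "graph_citation": 0.5,
    "graph_shared_authors": 0.0,
    "domain_context": 0.6,
}
total, parts = score_breakdown(signals, weights)
for name, value in parts.items():
    print(f"{name:24s} {value:.3f}")
print(f"{'total':24s} {total:.3f}")
```

Under this model a signal with weight 0 (or a value of 0, like shared authors here) contributes nothing, which is the pattern to look for when a breakdown shows a signal that never moves the score.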

Trending-concept suggestions and researcher gating are independent features — see lit-monitor trending suggest and config/researchers.yaml respectively.

Starting from a non-biopharma field

lit-monitor's defaults are seeded for biopharma / downstream-process research, but nothing in the engine is domain-specific — it ranks against your library, whatever field that is. To start somewhere closer to your own area:

config/examples/ ships filled-in, synthetic config sets for bioprocessing/, ml-research/, and climate-science/. Pick the closest one and copy its topics.yaml, domain_context.yaml, concepts.yaml, and researchers.yaml into your config dir as a head start, then edit:

cp config/examples/ml-research/*.yaml config/   # or bioprocessing/ climate-science/

The non-domain configs (paths.yaml, extraction.yaml) still come from config/*.example.yaml. See the bundled example configs. Whatever your field, the first real step is the same: brain-build indexes your existing library so the vector signal has something to rank against.

LLM and embedding providers

Local Ollama is the default for both LLM inference and embeddings. Embeddings run locally unless you explicitly switch them to another provider.

LLM routing

LiteLLM routing is under testing

The LiteLLM provider path (cloud LLMs via Anthropic, OpenAI, Vertex AI, etc.) works but is newer than the well-worn local-Ollama path and hasn't been exercised across as many provider/model combinations. Feedback is appreciated — if you hit a rough edge, please open a GitHub issue. Local Ollama remains the default and most-tested route.

To route any extraction mode through LiteLLM:

pip install "lit-monitor[litellm]"      # or: uv sync --extra litellm

Then per-mode in config/extraction.yaml:

modes:
  simple:
    provider: litellm
    litellm_model: claude-3-5-sonnet-20241022
    # ... other keys unchanged ...
  complex:
    provider: litellm
    litellm_model: claude-opus-4-5

Providers can be mixed per mode, for example local Ollama for simple and cloud Claude for complex. API keys come from your environment, as described in LiteLLM's provider docs.
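
Because the keys come from the environment, a missing variable only shows up when a mode first calls the provider. The sketch below is a hypothetical pre-flight check, not part of lit-monitor. It assumes the `modes:` mapping has been loaded into a dict, that local modes use `provider: ollama`, and it guesses the provider from the model-name prefix. `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are the variables LiteLLM reads for those two providers; other providers would need their own entries.

```python
import os

# Model-name prefix -> environment variable LiteLLM reads for that provider.
PROVIDER_ENV = {"claude": "ANTHROPIC_API_KEY", "gpt": "OPENAI_API_KEY"}


def missing_api_keys(modes, env=None):
    """Return the env vars that litellm-routed modes need but that are unset."""
    env = os.environ if env is None else env
    missing = set()
    for mode in modes.values():
        if mode.get("provider") != "litellm":
            continue  # local Ollama modes need no cloud key
        model = mode.get("litellm_model", "")
        for prefix, var in PROVIDER_ENV.items():
            if model.startswith(prefix) and not env.get(var):
                missing.add(var)
    return sorted(missing)


modes = {
    "simple": {"provider": "ollama"},  # assumed value for the local route
    "complex": {"provider": "litellm", "litellm_model": "claude-opus-4-5"},
}
print(missing_api_keys(modes, env={}))  # ['ANTHROPIC_API_KEY']
```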

Embedding routing

Embeddings also support LiteLLM providers (OpenAI, Cohere, Vertex AI, or any provider that exposes an embeddings endpoint):

lit-monitor embeddings switch --provider litellm --model text-embedding-3-small
# Or switch back to local Ollama:
lit-monitor embeddings switch --provider ollama --model mxbai-embed-large

After switching, run lit-monitor embeddings rebuild to re-embed your library under the new model. The embeddings status command shows the current provider, model, embedding dimensionality, and how many papers are indexed.

Notifications and delivery

An OS notification fires when a discovery run completes. Clicking it opens the chooser page on first use; after that it remembers your preferred surface (browser, Obsidian, or dismiss). Four delivery flags live under discovery: in config/extraction.yaml:

| Key | Default | Effect |
|---|---|---|
| notify.enabled | true | Fire OS notification at run end |
| notify.preferred_viewer | "" | Skip the chooser when set |
| notes.auto_write_per_paper | true | false → defer per-paper notes to obsidian sync |
| digest.auto_write | true | false → no inline digest; use discovery export-md |
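
The defaults in the table mean you only need to write the flags you want to change. The sketch below shows that overlay under two assumptions: that the dotted keys nest as mappings under discovery:, and that "obsidian" is an accepted value for preferred_viewer. Neither is confirmed here, and `effective_flags` is a hypothetical helper, not lit-monitor code.

```python
# Defaults as documented in the table above.
DEFAULTS = {
    "notify": {"enabled": True, "preferred_viewer": ""},
    "notes": {"auto_write_per_paper": True},
    "digest": {"auto_write": True},
}


def effective_flags(user_discovery):
    """Overlay the user's discovery: section on the documented defaults."""
    merged = {group: dict(values) for group, values in DEFAULTS.items()}
    for group, values in (user_discovery or {}).items():
        merged.setdefault(group, {}).update(values)
    return merged


# Equivalent to setting only two keys under discovery: in extraction.yaml.
flags = effective_flags({
    "notify": {"preferred_viewer": "obsidian"},
    "digest": {"auto_write": False},
})
print(flags)
```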

Strict mode and diagnose

lit-monitor --strict run --dry-run        # CLI flag
LIT_MONITOR_STRICT=1 lit-monitor run      # env var
lit-monitor diagnose --config-only        # validate every tracked config
lit-monitor diagnose                      # full health (config + Ollama + Zotero)

Strict mode turns every silent fallback (corrupt config, unreadable attachment, unexpected API response) into a hard error.
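
The difference between the two modes is easiest to see in code. The sketch below shows the general pattern only: it is not lit-monitor's implementation, it uses JSON as a stand-in for YAML so it runs on the standard library alone, and it assumes `LIT_MONITOR_STRICT=1` is the only value that enables strict mode.

```python
import json
import os


def strict_enabled(cli_flag=False, env=None):
    """True if strict mode was requested by flag or environment variable."""
    env = os.environ if env is None else env
    return cli_flag or env.get("LIT_MONITOR_STRICT") == "1"


def load_settings(text, default, strict):
    """Parse settings; on failure fall back to default unless strict."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        if strict:
            raise  # strict mode: surface the problem as a hard error
        return default  # lenient mode: silent fallback


# Lenient: a corrupt file quietly becomes the default.
print(load_settings("{bad", {"n_clusters": 10}, strict=False))
```

With `strict=True` the same call raises `json.JSONDecodeError`, which is why a strict dry run is a cheap way to find configs that have been silently ignored.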

Deployment — running on a schedule

A schedule is what turns lit-monitor from a search tool into a monitor — new papers arrive already searched, ranked, extracted, and filed between your visits, each run leaving a dated digest and an OS notification. For why that matters to the workflow, see Scheduling — the monitor loop; this section is the how.

lit-monitor runs on a recurring schedule on a single workstation. Install the schedule from the /schedule page (launchd on macOS, systemd user timer on Linux). Weekly is a sensible default; any cadence works.

Once the local state DB and ChromaDB store have been populated — either by brain-build in one shot, or organically by repeated lit-monitor run calls — the schedule can be handed off to a low-power node (Raspberry Pi 4/5), provided extraction is routed through Ollama Cloud or LiteLLM (see LLM routing above). Embeddings still run wherever the schedule runs, so a cloud-Ollama or LiteLLM embedding provider is the companion piece for a fully headless node.