Configuration
This guide covers configuring lit-monitor by hand, the three common setup recipes, provider routing, notifications, strict mode, and recurring-schedule deployment. To configure through the browser instead, use the setup wizard — it edits all the same files for you.
The two kinds of config
lit-monitor splits configuration into one secrets file and a set of behaviour files. They live in different places and serve different purposes — this is the split most worth understanding up front.
| | Secrets | Behaviour |
|---|---|---|
| File(s) | `config.toml` | nine `*.yaml` files |
| Format | TOML | YAML |
| Holds | API keys, library IDs, emails | what to search, how to rank, where to write |
| Location | `~/.config/lit-monitor/config.toml` | see below |
| In git? | never | the `.example.yaml` templates are; your real `.yaml` files are gitignored |
Where the YAML files live depends on how you installed:
- pip / pipx / uvx install: in your user config dir, `~/.config/lit-monitor/`, the same folder as `config.toml`. `first-run` (or the wizard) seeds them there. This is the path resolver's primary location once `paths.yaml` exists.
- From a source checkout: in the repo's `./config/` directory (the editable/dev fallback). This is why older docs and the `install.sh` flow refer to `./config/`: that's the source-checkout location.
So for a normal pip install, everything is side by side under
~/.config/lit-monitor/: config.toml next to paths.yaml, topics.yaml, and
the rest. (LIT_MONITOR_ROOT overrides the location entirely if you need a
custom layout.)
1. Create credentials at ~/.config/lit-monitor/config.toml:
```toml
[zotero]
api_key = "YOUR_ZOTERO_API_KEY"
library_id = "YOUR_NUMERIC_LIBRARY_ID"

[pubmed]
email = "you@example.com"

[ollama]
api_key = "YOUR_OLLAMA_CLOUD_KEY"  # optional, only for cloud Ollama
```
2. Edit the behaviour YAMLs (see the table below). The .example.yaml files
shipped in git document every field; your real .yaml files are gitignored.
3. Verify — lit-monitor check.
The config files at a glance
Nine YAML files, each documented by a matching *.example.yaml. Only the first
two are required; the rest progressively enrich ranking and the knowledge graph.
| File | Required? | What it controls |
|---|---|---|
| `paths.yaml` | Yes | Filesystem wiring: Obsidian vault path, Zotero library/collection, and the locations of the state DB, ChromaDB vector store, KuzuDB graph, and logs. |
| `extraction.yaml` | Yes | The behaviour hub: per-mode LLM model selection (`simple`/`complex`/`brain_build`/`build_vocabulary`), the `ranking:` signal weights, `clustering:`, `embeddings:`, the `discovery:` notification/delivery flags, and `feedback:` (active-learning) keys. |
| `topics.yaml` | Recommended | The recurring searches that feed discovery: each entry is a named query across PubMed/arXiv/Scopus. Auto-grows from `discovered_topics` over time. |
| `domain_context.yaml` | Optional | A free-text paragraph describing your research focus. `lit-monitor domain analyze` turns it into the `domain_context` ranking signal. |
| `concepts.yaml` | Optional | Vocabulary themes (theme → keyword lists) used for auto theme-tagging and cluster write-back. Usually generated by `build-vocabulary` (as `concepts_draft.yaml`), then reviewed into place. |
| `researchers.yaml` | Optional | Tracked authors for researcher gating: papers by these authors get a ranking nudge. |
| `predicates.yaml` | Advanced (graph) | The typed-relationship vocabulary the graph extracts (`CITES`, `EXTENDS`, `CONTRADICTS`, …), with alias and deprecation support so renames don't force a graph rebuild. |
| `entity_types.yaml` | Advanced (graph) | The entity-type vocabulary (method, target, assay, …) that the NER and LLM extractors map mentions onto; also alias- and deprecation-aware. |
| `entity_aliases.yaml` | Advanced (graph) | Entity normalization aliases (canonical name → known variants), so the same concept under different spellings collapses to one graph node. |
You rarely touch the three graph files by hand — the shipped defaults are sane,
and lit-monitor graph propose-aliases helps grow entity_aliases.yaml.
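To show what collapsing spellings into one node means, here is a sketch of alias normalization. It assumes `entity_aliases.yaml` has already been loaded into a dict of canonical name → variant list (the shape the table describes), that matching is case-insensitive, and that unknown mentions pass through unchanged. The example entries and function names are hypothetical.

```python
# Hypothetical in-memory form of entity_aliases.yaml: canonical name -> known variants.
ALIASES = {
    "monoclonal antibody": ["mAb", "mAbs", "monoclonal antibodies"],
    "protein A chromatography": ["ProA", "Protein-A capture"],
}


def build_lookup(aliases: dict[str, list[str]]) -> dict[str, str]:
    """Invert the alias map so every spelling (canonical included) points at its canonical name."""
    lookup: dict[str, str] = {}
    for canonical, variants in aliases.items():
        for name in [canonical, *variants]:
            lookup[name.casefold()] = canonical
    return lookup


def normalize(mention: str, lookup: dict[str, str]) -> str:
    """Map a raw mention to its canonical graph-node name; unknown mentions are kept as-is."""
    cleaned = mention.strip()
    return lookup.get(cleaned.casefold(), cleaned)
```

With this lookup, "MAB", "mAbs", and "monoclonal antibodies" all resolve to the single node "monoclonal antibody".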
Setup recipes
Three configurations, from zero to fully customised.
(a) Just run it: minimal config
Enough to get a recurring discovery feed and Obsidian notes.
1. If you didn't run `install.sh`, create your real YAMLs from the templates: `for f in config/*.example.yaml; do cp "$f" "${f%.example.yaml}.yaml"; done`.
2. Add Zotero credentials to `~/.config/lit-monitor/config.toml`.
3. Set `obsidian_vault_path` and `zotero_library_id` in `config/paths.yaml`.
4. Add 2–3 search topics in `config/topics.yaml`.
5. `lit-monitor check`: verify connectivity.
6. `lit-monitor brain-build`: index your existing library (one-time).
7. `lit-monitor run`: first discovery run.
8. `lit-monitor serve` to browse results at `http://127.0.0.1:8765`.
No domain context, no clustering, no graph signals yet — all optional. The vector similarity signal works on its own.
(b) Clustering on: after 100 papers
Enable after brain-build has indexed at least 100 papers.
1. `lit-monitor cluster recompute`: run K-means and name clusters.
2. `lit-monitor cluster view`: inspect the resulting themes.
3. Optionally, `lit-monitor cluster write-back tags` to tag Zotero items.
4. Set `cluster_centroid_weight: 0.2` under `ranking:` in `config/extraction.yaml` to include the cluster signal in scoring.
Clustering re-runs automatically on subsequent brain-build calls once
clustering.enabled: true is set. Adjust clustering.n_clusters for the right
granularity (8–15 is a good starting range for a 500-paper library).
```shell
lit-monitor cluster recompute        # run K-means + LLM naming
lit-monitor cluster view             # show cluster themes + paper counts
lit-monitor cluster assign           # assign every paper to nearest cluster
lit-monitor cluster write-back tags  # tag Zotero items with theme names
```
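`lit-monitor cluster assign` puts every paper in its nearest cluster. The sketch below shows that step for plain vectors, assuming Euclidean distance to each K-means centroid (the distance K-means itself minimizes). The function name and the toy two-dimensional vectors are illustrative only; lit-monitor's own implementation may differ.

```python
import math


def assign_to_nearest(papers: dict[str, list[float]],
                      centroids: dict[str, list[float]]) -> dict[str, str]:
    """Map each paper ID to the name of its closest centroid (hypothetical helper)."""
    return {
        paper_id: min(centroids, key=lambda theme: math.dist(vector, centroids[theme]))
        for paper_id, vector in papers.items()
    }


centroids = {"purification": [1.0, 0.0], "cell culture": [0.0, 1.0]}
papers = {"p1": [0.9, 0.2], "p2": [0.1, 0.8]}
print(assign_to_nearest(papers, centroids))  # {'p1': 'purification', 'p2': 'cell culture'}
```

Real embeddings have hundreds of dimensions, but the rule is the same: a paper belongs to whichever theme centroid it sits closest to.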
(c) Graph signals: tuning the ranking mix
Enable once the knowledge graph has been populated and you want the citation and entity-overlap signals.
1. Run `lit-monitor graph backfill --all` (first time only; incremental thereafter).
2. Set nonzero weights in `config/extraction.yaml`:

```yaml
ranking:
  graph_entity_overlap_weight: 0.15
  graph_citation_weight: 0.10
  graph_shared_authors_weight: 0.05
```

3. Optionally, add a `domain_context.yaml` paragraph and run `lit-monitor domain analyze` to activate the `domain_context` signal.
4. Check the score decomposition in the discovery web UI or with `lit-monitor discovery view --run latest --breakdown` to confirm all signals are contributing as expected.
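Here is one way to read a score decomposition, assuming the final score is a weighted sum of per-signal values in [0, 1]. The document doesn't spell out lit-monitor's actual formula, so treat `breakdown` and the signal values as illustrative. The sketch also shows why the weights must be nonzero: in this model, a signal whose weight is missing or zero contributes nothing, however strong it is.

```python
# Weights copied from the ranking: snippet above (extraction.yaml key names).
WEIGHTS = {
    "graph_entity_overlap_weight": 0.15,
    "graph_citation_weight": 0.10,
    "graph_shared_authors_weight": 0.05,
}


def breakdown(signals: dict[str, float],
              weights: dict[str, float]) -> tuple[dict[str, float], float]:
    """Per-signal contribution (weight x value) and their sum; a missing weight counts as 0."""
    contributions = {
        name: weights.get(f"{name}_weight", 0.0) * value
        for name, value in signals.items()
    }
    return contributions, sum(contributions.values())


parts, total = breakdown(
    {"graph_entity_overlap": 0.4, "graph_citation": 1.0,
     "graph_shared_authors": 0.0, "domain_context": 0.9},
    WEIGHTS,
)
```

Here `domain_context` scores 0.9 but adds nothing to `total`, because the sketch's `WEIGHTS` gives it no weight.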
Trending-concept suggestions and researcher gating are independent features —
see lit-monitor trending suggest and config/researchers.yaml respectively.
Starting from a non-biopharma field
lit-monitor's defaults are seeded for biopharma / downstream-process research, but nothing in the engine is domain-specific — it ranks against your library, whatever field that is. To start somewhere closer to your own area:
config/examples/ ships filled-in, synthetic config sets for bioprocessing/,
ml-research/, and climate-science/. Pick the closest one and copy its
topics.yaml, domain_context.yaml, concepts.yaml, and researchers.yaml
into your config dir as a head start, then edit them to match your own research.
The non-domain configs (paths.yaml, extraction.yaml) still come from
config/*.example.yaml. See the
bundled example configs.
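You can script the copy of the four domain files. Below is a minimal standard-library sketch. `seed_from_example` is a hypothetical helper, and it deliberately skips files that already exist so that a second run never clobbers your edits.

```python
import shutil
from pathlib import Path

# The four domain-specific files each example set provides.
DOMAIN_FILES = ("topics.yaml", "domain_context.yaml", "concepts.yaml", "researchers.yaml")


def seed_from_example(example_dir: Path, config_dir: Path, overwrite: bool = False) -> list[str]:
    """Copy the domain files from an example set; return the names actually copied."""
    config_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in DOMAIN_FILES:
        src, dst = example_dir / name, config_dir / name
        if dst.exists() and not overwrite:
            continue  # keep the user's existing file
        shutil.copy2(src, dst)
        copied.append(name)
    return copied
```

For example, `seed_from_example(Path("config/examples/ml-research"), Path("~/.config/lit-monitor").expanduser())` targets a pip install. In a source checkout, use `Path("config")` as the target instead.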
Whatever your field, the first real step is the same: brain-build indexes your
existing library so the vector signal has something to rank against.
LLM and embedding providers
Local Ollama is the default for both LLM inference and embeddings. Embeddings always run locally unless you explicitly switch them to another provider.
LLM routing
LiteLLM routing is under testing
The LiteLLM provider path (cloud LLMs via Anthropic, OpenAI, Vertex AI, etc.) works but is newer than the well-worn local-Ollama path and hasn't been exercised across as many provider/model combinations. Feedback is appreciated — if you hit a rough edge, please open a GitHub issue. Local Ollama remains the default and most-tested route.
To route an extraction mode through LiteLLM, set its provider and model per mode in config/extraction.yaml:
```yaml
modes:
  simple:
    provider: litellm
    litellm_model: claude-3-5-sonnet-20241022
    # ... other keys unchanged ...
  complex:
    provider: litellm
    litellm_model: claude-opus-4-5
```
Modes can be mixed, for example local Ollama for simple and cloud Claude for complex. API keys
come from your environment, as described in
LiteLLM's provider docs.
Embedding routing
Embeddings also support LiteLLM providers (OpenAI, Cohere, Vertex AI, or any provider that exposes an embeddings endpoint):
```shell
lit-monitor embeddings switch --provider litellm --model text-embedding-3-small
# Or switch back to local Ollama:
lit-monitor embeddings switch --provider ollama --model mxbai-embed-large
```
After switching, run lit-monitor embeddings rebuild to re-embed your library
under the new model. The embeddings status command shows the current provider,
model, embedding dimensionality, and how many papers are indexed.
Notifications and delivery
An OS notification fires when a discovery run completes. Clicking it lands at the
chooser page on first use, then remembers your preferred surface (browser,
Obsidian, or dismiss). Four delivery flags live under `discovery:` in
`config/extraction.yaml`:
| Key | Default | Effect |
|---|---|---|
| `notify.enabled` | `true` | Fire an OS notification at run end |
| `notify.preferred_viewer` | `""` | Skip the chooser when set |
| `notes.auto_write_per_paper` | `true` | `false` → defer per-paper notes to `obsidian sync` |
| `digest.auto_write` | `true` | `false` → no inline digest; use `discovery export-md` |
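The sketch below shows how the four flags combine, following the decision logic in the table. It assumes the dotted keys nest under `discovery:` as shown. `delivery_plan` and its action strings are hypothetical, and the sketch ignores the viewer the chooser remembers after first use.

```python
# Defaults from the table above, nested the way the dotted keys suggest.
DEFAULTS = {
    "notify": {"enabled": True, "preferred_viewer": ""},
    "notes": {"auto_write_per_paper": True},
    "digest": {"auto_write": True},
}


def flag(discovery: dict, section: str, key: str):
    """Read discovery.<section>.<key>, falling back to the documented default."""
    return discovery.get(section, {}).get(key, DEFAULTS[section][key])


def delivery_plan(discovery: dict) -> list[str]:
    """List the end-of-run delivery actions implied by the flags (hypothetical helper)."""
    plan = []
    if flag(discovery, "notes", "auto_write_per_paper"):
        plan.append("write per-paper notes now")
    else:
        plan.append("defer per-paper notes to obsidian sync")
    if flag(discovery, "digest", "auto_write"):
        plan.append("write inline digest")
    else:
        plan.append("skip digest (use discovery export-md)")
    if flag(discovery, "notify", "enabled"):
        viewer = flag(discovery, "notify", "preferred_viewer")
        plan.append(f"notify -> {viewer}" if viewer else "notify -> chooser page")
    return plan
```

An empty `discovery:` block yields the all-defaults plan: notes and digest written inline, then a notification that opens the chooser.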
Strict mode and diagnose
```shell
lit-monitor --strict run --dry-run      # CLI flag
LIT_MONITOR_STRICT=1 lit-monitor run    # env var
lit-monitor diagnose --config-only      # validate every tracked config
lit-monitor diagnose                    # full health (config + Ollama + Zotero)
```
Strict mode turns every silent fallback (corrupt config, unreadable attachment, unexpected API response) into a hard error.
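You can sketch the strict-mode contract as a single fallback helper: in normal mode it logs and continues, and in strict mode it raises. The helper names are hypothetical. Treating only the value `1` as enabling strict mode is an assumption based on the example above.

```python
import logging
import os

log = logging.getLogger("strict-demo")


def strict_enabled(cli_flag: bool = False) -> bool:
    # Assumption: either --strict or LIT_MONITOR_STRICT=1 turns strict mode on.
    return cli_flag or os.environ.get("LIT_MONITOR_STRICT") == "1"


def fallback(error: Exception, default, *, strict: bool):
    """Raise under strict mode; otherwise log the problem and return a safe default."""
    if strict:
        raise error
    log.warning("falling back after: %s", error)
    return default


def read_attachment(path: str, strict: bool) -> bytes:
    """Example call site: an unreadable attachment becomes empty bytes unless strict."""
    try:
        with open(path, "rb") as fh:
            return fh.read()
    except OSError as exc:
        return fallback(exc, b"", strict=strict)
```

Routing every recoverable failure through one helper like this makes "silent" fallbacks visible in logs and trivially promotable to hard errors.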
Deployment: running on a schedule
A schedule is what turns lit-monitor from a search tool into a monitor — new papers arrive already searched, ranked, extracted, and filed between your visits, each run leaving a dated digest and an OS notification. For why that matters to the workflow, see Scheduling — the monitor loop; this section is the how.
lit-monitor runs on a recurring schedule on a single workstation. Install the
schedule from the /schedule page (launchd on macOS,
systemd user timer on Linux). Weekly is a sensible default; any cadence works.
Once the local state DB and ChromaDB store have been populated — either by
brain-build in one shot, or organically by repeated lit-monitor run calls —
the schedule can be handed off to a low-power node (Raspberry Pi 4/5), provided
extraction is routed through Ollama Cloud or LiteLLM (see
LLM routing above). Embeddings still run wherever
the schedule runs, so a cloud-Ollama or LiteLLM embedding provider is the
companion piece for a fully headless node.