[!CAUTION] Actively Refactoring
Flemma (formerly Claudius) is in the middle of a large-scale rename and architecture refresh. Expect new functionality, renamed modules, and occasional breaking changes while the project settles. Pin a commit if you need a steady target.
Flemma turns Neovim into a first-class AI workspace. It gives .chat buffers streaming conversations, reusable prompt templates, attachment support, cost tracking, and ergonomic commands for the three major providers: Anthropic Claude, OpenAI, and Google Vertex AI.
Flemma is for the technical writers, researchers, creators, and tinkerers, for those who occasionally get in hot water and need advice. It's for everyone who wants to experiment with AI.
…accidentally pressing <C-R> and refreshing the page midway through a prompt (or <C-W> trying to delete a word)… or Chrome sending a tab to sleep whilst I had an unsaved session… or having to worry about whether files I shared with Claude Workbench were stored on some Anthropic server indefinitely. I can be fast! I can be reckless! I can tinker! I can use my Vim keybindings and years of muscle memory!
If I have an idea, it's a buffer away. Should I want to branch off and experiment, I'd duplicate the .chat file and go in a different direction. Is the conversation getting too long? I'd summarize a set of instructions and start with them in a new .chat file, then share them each time I need a fresh start. Need backups or history? I have Git for that.
.chat buffers in Flemma.There really is no limit to what you can do with Flemma - if you can write it down and reason about it, you can use Flemma to help you with it.
On a personal level, I've used Flemma to generate bedtime stories with recurring characters for my kids, made small financial decisions based on collected evidence, asked for advice on how to respond to difficult situations, consulted (usual disclaimer, blah blah) it for legal advice and much more.
Flemma can also be a playground for coding experiments - it can help with the occasional small task. I've personally used it to generate Awk scripts, small Node.js jobs, etc. Flemma is not a coding assistant or agent. It's not pretending to be one and it'll never be one. You should keep your Codex, Claude Code, etc. for that purpose - and they'll do a great job at it.
.chat buffers..chat editing tools – get markdown folding, visual rulers, <thinking> highlighting, and message text objects tuned for chat transcripts.{{ expressions }}, and include() helpers to assemble prompts without leaving Neovim.@./path; Flemma handles MIME detection and surfaces warnings when a provider can’t ingest the asset.on_request_* callbacks, auto-write finished chats, and recall the latest usage notification when auditing work.Flemma works with any plugin manager. With lazy.nvim you only need to declare the plugin – opts = {} triggers require("flemma").setup({}) automatically:
{
"Flemma-Dev/flemma.nvim",
opts = {},
}
For managers that do not wire opts, call require("flemma").setup({}) yourself after the plugin is on the runtime path.
| Requirement | Why it matters |
|---|---|
| Neovim 0.11 or newer | Uses Tree-sitter folding APIs introduced in 0.11 and relies on vim.fs helpers. |
curl |
Streaming is handled by spawning curl with Server-Sent Events enabled. |
| Markdown Tree-sitter grammar | Flemma registers .chat buffers to reuse the markdown parser for syntax highlighting and folding. |
file CLI (optional but recommended) |
Provides reliable MIME detection for @./path attachments. When missing, extensions are used as a best effort. |
| Provider | Environment variable | Notes |
|---|---|---|
| Anthropic Claude | ANTHROPIC_API_KEY |
|
| OpenAI | OPENAI_API_KEY |
Supports GPT‑5 family, including reasoning effort settings. |
| Google Vertex AI | VERTEX_AI_ACCESS_TOKEN or service-account credentials |
Requires additional configuration (see below). |
When environment variables are absent Flemma looks for secrets in the Secret Service keyring. Store them once and every Neovim instance can reuse them:
secret-tool store --label="Claude API Key" service anthropic key api
secret-tool store --label="OpenAI API Key" service openai key api
secret-tool store --label="Vertex AI Service Account" service vertex key api project_id your-gcp-project
VERTEX_SERVICE_ACCOUNT='{"type": "..."}', or$PATH; Flemma shells out to gcloud auth application-default print-access-token whenever it needs to refresh the token.:Flemma switch vertex gemini-2.5-pro project_id=my-project location=us-central1.[!NOTE] If you only supply
VERTEX_AI_ACCESS_TOKEN, Flemma uses that token until it expires and skipsgcloud.
Configure the plugin:
require("flemma").setup({})
Create a new file that ends with .chat. Flemma only activates on that extension.
Type a message, for example:
@You: Turn the notes below into a short project update.
- Added Vertex thinking budget support.
- Refactored :Flemma command routing.
- Documented presets in the README.
Press Ctrl-] (normal or insert mode) or run :Flemma send. Flemma freezes the buffer while the request is streaming and shows @Assistant: Thinking....
When the reply finishes, a floating notification lists token counts and cost for the request and the session.
Cancel an in-flight response with Ctrl-c or :Flemma cancel.
[!TIP] Legacy commands (
:FlemmaSend,:FlemmaCancel, …) still work but forward to the new command tree with a deprecation notice.
.chat Buffers```lua
release = {
version = "v25.10-1",
focus = "command presets and UI polish",
}
notes = [[
- Presets appear first in :Flemma switch completion.
- Thinking tags have dedicated highlights.
- Logging toggles now live under :Flemma logging:*.
]]
```
@System: You turn engineering notes into concise changelog entries.
@You: Summarise {{release.version}} with emphasis on {{release.focus}} using the points below:
{{notes}}
@Assistant:
- Changelog bullets...
- Follow-up actions...
<thinking>
Model thoughts stream here and auto-fold.
</thinking>
flemma.frontmatter.parsers.register("yaml", parser_fn).@System:, @You:, or @Assistant:. The parser is whitespace-tolerant and handles blank lines between messages.<thinking> sections; Flemma folds them automatically and keeps dedicated highlights for the tags and body.| Fold level | What folds | Why |
|---|---|---|
| Level 3 | The frontmatter block | Keep templates out of the way while you focus on chat history. |
| Level 2 | <thinking>...</thinking> |
Reasoning traces are useful, but often secondary to the answer. |
| Level 1 | Each message | Collapse long exchanges without losing context. |
Toggle folds with your usual mappings (za, zc, etc.). The fold text shows a snippet of the hidden content so you know whether to expand it.
Between messages, Flemma draws a ruler using the configured ruler.char and highlight. This keeps multi-step chats legible even with folds open.
Inside .chat buffers Flemma defines:
]m / [m – jump to the next/previous message header.im / am (configurable) – select the inside or entire message as a text object. Thinking blocks are skipped so yanking im never includes <thinking> sections unintentionally.<C-]> and <C-c> in normal mode. Insert-mode <C-]> stops insert, sends, and re-enters insert when the response finishes.Disable or remap these through the keymaps section (see Configuration reference).
Use the single entry point :Flemma {command}. Autocompletion lists every available sub-command.
| Command | Purpose | Example |
|---|---|---|
:Flemma send [key=value …] |
Send the current buffer. Optional callbacks run before/after the request. | :Flemma send on_request_start=stopinsert on_request_complete=startinsert! |
:Flemma cancel |
Abort the active request and clean up the spinner. | |
:Flemma switch … |
Choose or override provider/model parameters. | See below. |
:Flemma message:next / :Flemma message:previous |
Jump through message headers. | |
:Flemma logging:enable / :…:disable / :…:open |
Toggle structured logging and open the log file. | |
:Flemma notification:recall |
Reopen the last usage/cost notification. | |
:Flemma import |
Convert Anthropics Claude Workbench code snippets into .chat format. |
:Flemma switch (no arguments) opens two vim.ui.select pickers: first provider, then model.:Flemma switch openai gpt-5 temperature=0.3 changes provider, model, and overrides parameters in one go.:Flemma switch vertex project_id=my-project location=us-central1 thinking_budget=4096 demonstrates long-form overrides. Anything that looks like key=value is accepted; unknown keys are passed to the provider for validation.Define reusable setups under the presets key. Preset names must begin with $; completions prioritise them above built-in providers.
require("flemma").setup({
presets = {
["$fast"] = "vertex gemini-2.5-flash temperature=0.2",
["$review"] = {
provider = "claude",
model = "claude-sonnet-4-5",
max_tokens = 6000,
},
},
})
Switch using :Flemma switch $fast or :Flemma switch $review temperature=0.1 to override individual values.
| Provider | Defaults | Extra parameters | Notes |
|---|---|---|---|
| Claude | claude-sonnet-4-0 |
Standard max_tokens, temperature, timeout, connect_timeout. |
Supports text, image, and PDF attachments. |
| OpenAI | gpt-5 |
reasoning=<low|medium|high> toggles reasoning effort. When set, lualine includes the reasoning level and Flemma keeps your configured max_tokens aligned with OpenAI’s completion limit automatically. |
Cost notifications include reasoning tokens. |
| Vertex AI | gemini-2.5-pro |
project_id (required), location (default global), thinking_budget enables streamed <thinking> traces. |
thinking_budget ≥ 1 activates Google’s experimental thinking output; set to 0 or nil to disable. |
The full model cataloguel (including pricing) is in lua/flemma/models.lua. You can access it from Neovim with:
:lua print(vim.inspect(require("flemma.provider.config").models))
Flemma’s prompt pipeline runs through three stages: parse, evaluate, and send. Errors at any stage surface via diagnostics before the request leaves your editor.
```lua or ```json).```lua
recipient = "QA team"
notes = [[
- Verify presets list before providers.
- Check spinner no longer triggers spell checking.
- Confirm logging commands live under :Flemma logging:*.
]]
```
Use {{ expression }} inside any non-assistant message. Expressions run in a sandbox that exposes:
string, table, math, utf8).vim.fn (fnamemodify, getcwd) and vim.fs (normalize, abspath).Outputs are converted to strings. Tables are JSON-encoded automatically.
@You: Draft a short update for {{recipient}} covering:
{{notes}}
Errors in expressions are downgraded to warnings. The request still sends, and the literal {{ expression }} remains in the prompt so you can see what failed.
include() helperCall include("relative/or/absolute/path") inside frontmatter or an expression to inline another template fragment. Includes are evaluated in isolation (they do not inherit your variables) and support their own {{ }} and @./ references.
Guards in place:
include().Flemma groups diagnostics by type in the notification shown before sending:
{{ }} evaluation.If any blocking error occurs the buffer becomes modifiable again and the request is cancelled before hitting the network.
Embed local context with @./relative/path (or @../up-one/path). Flemma handles:
.chat file (after decoding URL-escaped characters like %20).file or the extension fallback.Examples:
@You: Critique @./patches/fix.lua;type=text/x-lua.
@You: OCR this screenshot @./artifacts/failure.png.
@You: Compare these specs: @./specs/v1.pdf and @./specs/v2.pdf.
Trailing punctuation such as . or ) is ignored so you can keep natural prose. To coerce a MIME type, append ;type=<mime> as in the Lua example above.
| Provider | Text files | Images | PDFs | Behaviour when unsupported |
|---|---|---|---|---|
| Claude | Embedded as plain text parts | Uploaded as base64 image parts | Sent as document parts | The literal @./path is kept and a warning is shown. |
| OpenAI | Embedded as text parts | Sent as image_url entries with data URLs |
Sent as file objects |
Unsupported types become plain text with a diagnostic. |
| Vertex AI | Embedded as text parts | Sent as inlineData |
Sent as inlineData |
Falls back to text with a warning. |
If a file cannot be read or the provider refuses its MIME type, Flemma warns you (including line number) and continues with the raw reference so you can adjust your prompt.
Each completed request emits a floating report that names the provider/model, lists input/output tokens (reasoning tokens are counted under ⊂ thoughts), and – when pricing is enabled – shows the per-request and cumulative session cost derived from lua/flemma/models.lua. Token accounting persists for the lifetime of the Neovim instance; call require("flemma.state").reset_session() if you need to zero the counters without restarting. pricing.enabled = false suppresses the dollar amounts while keeping token totals for comparison.
Flemma keeps the most recent notification available via :Flemma notification:recall, which helps when you close the floating window before capturing the numbers. Logging lives in the same subsystem: toggle it with :Flemma logging:enable / :Flemma logging:disable and open the log file (~/.local/state/nvim/flemma.log or your stdpath("cache")) through :Flemma logging:open whenever you need the redacted curl command and streaming trace.
Configuration keys map to dedicated highlight groups:
| Key | Applies to |
|---|---|
highlights.system |
System messages (FlemmaSystem) |
highlights.user |
User messages (FlemmaUser) |
highlights.assistant |
Assistant messages (FlemmaAssistant) |
highlights.user_lua_expression |
{{ expression }} fragments |
highlights.user_file_reference |
@./path fragments |
highlights.thinking_tag |
<thinking> / </thinking> tags |
highlights.thinking_block |
Content inside thinking blocks |
Each value accepts a highlight name, a hex colour string, or a table of highlight attributes ({ fg = "#ffcc00", bold = true }).
Role markers inherit role_style (comma-separated GUI attributes) so marker styling tracks your message colours.
Set signs.enabled = true to place signs for each message line. Each role (system, user, assistant) can override the character and highlight. Signs default to using the message highlight colour.
While a request runs Flemma appends @Assistant: Thinking... with an animated braille spinner. The line is flagged as non-spellable so spell check integrations stay quiet. Once streaming starts, the spinner is removed and replaced with the streamed content.
Add the bundled component to show the active model (and reasoning effort when set):
require("lualine").setup({
sections = {
lualine_x = {
{ "flemma", icon = "🧠" },
"encoding",
"filetype",
},
},
})
The component only renders in chat buffers. Switching providers or toggling OpenAI reasoning effort causes Flemma to refresh lualine automatically.
Flemma works without arguments, but every option can be overridden:
require("flemma").setup({
provider = "claude",
model = nil, -- provider default
parameters = {
max_tokens = 4000,
temperature = 0.7,
timeout = 120,
connect_timeout = 10,
vertex = {
project_id = nil,
location = "global",
thinking_budget = nil,
},
openai = {
reasoning = nil, -- "low" | "medium" | "high"
},
},
presets = {},
highlights = {
system = "Special",
user = "Normal",
assistant = "Comment",
user_lua_expression = "PreProc",
user_file_reference = "Include",
thinking_tag = "Comment",
thinking_block = "Comment",
},
role_style = "bold,underline",
ruler = { char = "━", hl = "NonText" },
signs = {
enabled = false,
char = "▌",
system = { char = nil, hl = true },
user = { char = "▏", hl = true },
assistant = { char = nil, hl = true },
},
notify = require("flemma.notify").default_opts,
pricing = { enabled = true },
text_object = "m",
editing = {
disable_textwidth = true,
auto_write = false,
},
logging = {
enabled = false,
path = vim.fn.stdpath("cache") .. "/flemma.log",
},
keymaps = {
enabled = true,
normal = {
send = "<C-]>",
cancel = "<C-c>",
next_message = "]m",
prev_message = "[m",
},
insert = {
send = "<C-]>",
},
},
})
Additional notes:
editing.auto_write = true writes the buffer after each successful request or cancellation.text_object = false to disable the message text object entirely.notify.default_opts exposes floating-window appearance (timeout, width, border, title).logging.enabled = true starts the session with logging already active.Flemma can turn Claude Workbench exports into ready-to-send .chat buffers. Follow the short checklist above when you only need a reminder; the full walkthrough below explains each step and the safeguards in place.
Before you start
:Flemma import delegates to the current provider. Keep Claude active (:Flemma switch claude) so the importer knows how to interpret the snippet.Flemma import overwrites the entire buffer with the converted chat.Export from Claude Workbench
anthropic.messages.create({ ... }) call produced by that export.import Anthropic from "@anthropic-ai/sdk" header).Convert inside Neovim
:Flemma import. The command:anthropic.messages.create(...).@You: / @Assistant: lines.chat so folds, highlights, and keymaps activate immediately.Troubleshooting
anthropic.messages.create call, the importer aborts with “No Claude API call found”.flemma_import_debug.log in your temporary directory (e.g. /tmp/flemma_import_debug.log). Open that file to spot mismatched brackets or truncated copies.The repository provides a Nix shell so everyone shares the same toolchain:
nix develop
Inside the shell you gain convenience wrappers:
flemma-fmt – run nixfmt, stylua, and prettier across the repo.flemma-amp – open the Amp CLI, preconfigured for this project.flemma-codex – launch the OpenAI Codex helper.Run the automated tests with:
make test
The suite boots headless Neovim via tests/minimal_init.lua and executes Plenary+Busted specs in tests/flemma/, printing detailed results for each spec so you can follow along.
To exercise the plugin without installing it globally:
nvim --cmd "set runtimepath+=`pwd`" \
-c 'lua require("flemma").setup({})' \
-c ':edit scratch.chat'
[!NOTE] Almost every line of code in Flemma has been authored through AI pair-programming tools (Aider, Amp, and Codex). Traditional contributions are welcome – just keep changes focused, documented, and tested.
.chat and the first message starts with @You: or @System:..chat file and that the provider supports its MIME type. Use ;type= to override when necessary.parameters.vertex.project_id and authentication. Run gcloud auth application-default print-access-token manually to ensure credentials are valid.keymaps.enabled = false and register your own :Flemma commands.Happy prompting!