An AI workspace inside Neovim where every conversation is a document you own.
https://github.com/user-attachments/assets/cd1c509d-faea-48e1-bd4d-d01e234d6856
[!IMPORTANT] Actively Evolving. See the roadmap for what's coming next. Pin a tag if you need a stable target.
Flemma is an AI plugin for Neovim. You write in .chat files -- plain text with simple role markers -- and Flemma handles everything else: streaming responses, running tools, managing providers, and keeping the conversation clean and navigable.
@You:
Turn my rough notes into a project update for the team.
- Auth module now validates JWTs server-side.
- Migrated billing webhook to v2 API.
- Fixed the flaky CI timeout on integration tests.
Use `git log` for commit details.
@Assistant:
(response streams here)
What makes Flemma different from other AI tools is a simple design choice: the .chat file is the conversation. There's no database behind it, no hidden session state, no opaque storage. The file you see is the file the model sees. That one decision unlocks everything else.
-- lazy.nvim (any plugin manager works):
{ "Flemma-Dev/flemma.nvim", opts = {} }
-- If your plugin manager doesn't auto-call setup(), add this to your config:
require("flemma").setup({})
export ANTHROPIC_API_KEY="sk-ant-..." # or OPENAI_API_KEY, MOONSHOT_API_KEY
Create a file ending in .chat. Type your message after @You:. Press Ctrl-].
Requirements: Neovim 0.11+, curl, Markdown Tree-sitter grammar. Optional: bwrap for sandboxing (Linux), file for MIME detection.
Flemma never accepts API keys in your Lua config -- credentials stay in environment variables or your platform's secure keyring.
Environment variables (simplest approach):
| Provider | Variable |
|---|---|
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| Moonshot | MOONSHOT_API_KEY |
| Vertex AI | VERTEX_AI_ACCESS_TOKEN (or service-account flow below) |
Linux keyring (Secret Service) -- store once, reuse across all Neovim sessions:
secret-tool store --label="Anthropic API Key" service anthropic key api
secret-tool store --label="OpenAI API Key" service openai key api
secret-tool store --label="Moonshot API Key" service moonshot key api
macOS Keychain is also supported.
Vertex AI requires a Google Cloud service account:
VERTEX_SERVICE_ACCOUNT='{"type": "..."}', or store them in the Linux keyring with secret-tool store --label="Vertex AI Service Account" service vertex key api project_id your-project.gcloud is on your $PATH -- Flemma uses gcloud auth print-access-token to refresh tokens.project_id in your config or via :Flemma switch vertex gemini-3.1-pro-preview project_id=my-project.Flemma tries each resolver in order and uses the first one that returns a credential. When everything fails, the notification tells you exactly which resolvers were tried and why each one couldn't help. You can also write custom resolvers for tools like Bitwarden or 1Password -- read more in extending.md.
Most AI tools treat conversations as disposable. Some let you resume a session or rewind to a checkpoint, but you can't go back and edit a message you sent two turns ago and have the model treat it as if it was always there. The conversation is the tool's state, not yours. Flemma takes the opposite approach.
Your conversations are files. Save them. Reopen them tomorrow. git commit them. grep across months of work. Share a conversation with a colleague by sending them the file -- they open it in Flemma and pick up exactly where you left off, with the same model settings, the same system prompt, the same everything.
You can edit anything. The model hallucinated? Fix the response and resend. Went down the wrong path? Delete the last few turns and try again. Want to test how a different model handles the same prompt? Switch providers mid-conversation with :Flemma switch openai and press Ctrl-]. There's no hidden state to get out of sync because there is no hidden state.
Every conversation can have its own settings. One .chat file uses Claude for code review with full tool access. Another uses Gemini for brainstorming with thinking turned off. A third is a reusable template your team shares for incident postmortems. The settings live inside each file -- no global config changes needed.
You stay in Neovim. Vim motions, your keybindings, your colour scheme, your workflow. Flemma adds a handful of buffer-local mappings and gets out of the way.
Flemma is more than a chatbot. Here are some of the things people use it for:
@./path/to/file, ask questions, iterate. Switch between Claude and GPT to compare perspectives on the same problem..chat file with a system prompt and variables becomes a template. Share it with your team. Each person fills in their details and gets consistent results.Four built-in providers. Switch at any time -- even mid-conversation:
:Flemma switch openai gpt-5 temperature=0.3
:Flemma switch $fast " named presets
| Provider | Default Model |
|---|---|
| Anthropic | claude-sonnet-4-6 |
| OpenAI | gpt-5.4 |
| Vertex AI | gemini-3.1-pro-preview |
| Moonshot | kimi-k2.5 |
All four support extended thinking/reasoning through a single thinking parameter that Flemma maps to each provider's native format. Set thinking = "high" once and it works everywhere -- see the full mapping table in configuration.md. Prompt caching is handled automatically -- read more in prompt-caching.md.
Credentials are resolved automatically from environment variables or your platform keyring -- see Setting up credentials under Quick Start above.
Define presets for quick switching:
require("flemma").setup({
presets = {
["$fast"] = "vertex gemini-2.5-flash thinking=minimal",
["$review"] = { provider = "anthropic", model = "claude-sonnet-4-6", max_tokens = 6000 },
},
})
Flemma can work autonomously. When the model needs to read a file, edit code, or run a command, it uses tools -- and with autopilot enabled (the default), the entire cycle happens without you pressing a key:
You can watch the whole thing happen in the buffer. Every tool call, every result, every decision is visible text that you can read, edit, or undo.
| Tool | What it does |
|---|---|
bash |
Runs shell commands |
read |
Reads file contents |
edit |
Find-and-replace in files |
write |
Creates or overwrites files |
grep |
Searches with ripgrep (experimental) |
find |
Finds files by pattern (experimental) |
ls |
Lists directory contents (experimental) |
read, edit, write, grep, find, ls) are auto-approved. bash is auto-approved when the sandbox is available, or requires your review otherwise. You see a preview of what the tool will do before approving: bash: running tests -- $ make test. Customize approval with presets ($standard, $readonly) or write your own logic./tmp are writable. Enabled by default. The sandbox is damage control, not a security boundary -- it limits the blast radius of common accidents, not deliberate attacks.Flemma supports the Model Context Protocol (MCP) through MCPorter, a standalone CLI toolkit that handles server discovery, OAuth, and connection management. Rather than reimplement MCP inside a Neovim plugin, Flemma delegates the hard parts -- OAuth flows, token caching, transport negotiation, credential vaults -- and focuses on what it does well: making those tools available to the model in your .chat buffer.
Enable it and point it at the servers you want:
tools = {
mcporter = {
enabled = true,
include = { "slack:*", "linear:*" }, -- glob patterns for which tools to enable
},
}
At startup, Flemma discovers your MCP servers, fetches their tool schemas, and registers each one as a native tool. The model sees them alongside bash, read, and edit -- same approval flow, same autopilot, same .chat visibility. MCPorter auto-imports servers from Claude Code, Cursor, VS Code, and Windsurf, so you can likely enable it and have tools available immediately.
Read the full setup and configuration guide in mcp.md.
Flemma is a document-based AI workspace. There are broadly two kinds of AI coding tools: inline assistants that suggest and apply diffs to your source files, and agent-style tools where you give a task and watch it work. Flemma is the second kind -- closest to the terminal agent pattern, but embedded in your editor.
What it does well:
.chat files stick around. Reopen them, share them, version them. Build a library of reusable prompts and templates.What it doesn't try to do:
@./path or paste context into the conversation.All commands live under :Flemma with tab completion. Misspelled commands get did-you-mean suggestions.
:Flemma Command |
Purpose |
|---|---|
send |
Send the buffer to the provider |
cancel |
Abort the active request or tool |
switch ... |
Change provider, model, or parameters |
status [verbose] |
Show runtime status and resolved configuration |
import |
Import from Claude Workbench format (see importing.md) |
autopilot:enable|disable|status |
Toggle or inspect autonomous mode |
sandbox:enable|disable|status |
Toggle or inspect sandboxing |
tool:execute|cancel|cancel-all|list |
Manage tool executions |
message:next|previous |
Jump between messages |
logging:enable|disable|open |
Structured logging |
diagnostics:enable|disable|diff |
Request diagnostics (useful for debugging cache) |
.chat files, all configurable)| Mode | Key | Action |
|---|---|---|
| NormalInsert | Ctrl-] | Send to provider (or advance the tool approval cycle) |
| Normal | Ctrl-C | Cancel |
| Normal | Alt-Enter | Execute the tool under cursor |
| Normal | ]m / [m |
Next / previous message |
| Normal | Space | Toggle message fold |
| Operator | im / am |
Inner / around message text objects |
Flemma works without arguments -- require("flemma").setup({}) uses sensible defaults. Here's a practical starting point:
require("flemma").setup({
provider = "anthropic", -- "anthropic" | "openai" | "vertex" | "moonshot"
thinking = "high", -- unified across all providers
presets = {
["$fast"] = "vertex gemini-2.5-flash thinking=minimal",
["$opus"] = "anthropic claude-opus-4-6 thinking=max",
},
sandbox = { backend = "required" }, -- warn if no sandbox backend is available
editing = { auto_write = true }, -- save .chat files after each response
})
Individual .chat files can override any of these settings. Detailed references:
Flemma is designed to be extended. Everything plugs in through clean registries:
require("flemma.tools").register(). Read more in tools.md.FlemmaRequestSending, FlemmaToolFinished, etc.) as standard User autocmds. Read more in extending.md.openai_chat for compatible APIs. Read more in extending.md.CLAUDE.md, .cursorrules, etc.). Read more in personalities.md.Integrations with lualine and bufferline.nvim are documented in integrations.md. nvim-web-devicons gets a .chat file icon automatically.
After each response, a floating notification shows the model name, token counts, cost for this request, and cumulative session cost. When prompt caching kicks in, you'll see the cache hit rate -- green when it's saving you money, red when it's not.
Messages fold cleanly: thinking blocks and tool calls collapse automatically so you can focus on the conversation. Press Space to toggle a message fold, za for individual blocks. Rulers separate messages visually, and line highlights give each role a subtle background tint. Everything adapts to your colour scheme.
Flemma ships integrations for lualine (model and cost in the statusline) and bufferline (busy indicator on .chat tabs). Read more in ui.md and integrations.md.
The repository uses a Nix shell for a reproducible development environment. Run nix develop to enter it.
From there, make develop launches Neovim with Flemma loaded from your working directory -- useful for trying out changes. make qa runs every quality gate in parallel (linting, type checking, import conventions, and the full test suite) and is the single command to run before committing.
[!NOTE] Almost every line of code in Flemma has been authored through AI pair-programming tools. Traditional contributions are welcome -- keep changes focused, documented, and tested.
Yes. Each .chat file can set its own provider, model, and parameters. You can also switch mid-conversation with :Flemma switch openai gpt-5. Read more in configuration.md.
Yes. Type @./path/to/file in your message and Flemma inlines the content before sending. Images and PDFs are base64-encoded and sent as multipart attachments where the provider supports it. MIME types are detected automatically. Read more in templates.md.
Yes. A .chat file with a system prompt, variables, and expressions becomes a template. Define variables in a code block at the top of the file and reference them throughout your messages. You can also include other files and compose system prompts from building blocks. Read more in templates.md.
Yes. Tools are governed by an approval system with built-in presets ($standard for file tools, $readonly for read-only access). You can auto-approve specific tools, require manual review for others, or write custom approval logic. Each .chat file can override the global policy. Read more in tools.md.
Flemma checks environment variables first, then your platform keyring (Linux Secret Service or macOS Keychain), then gcloud for Vertex AI. You never have to put keys in a config file. Read more in extending.md.
Yes. Register custom tools, approval resolvers, credential resolvers, sandbox backends, and more -- everything plugs in through registries. For MCP servers (Slack, Linear, GitHub, etc.), enable the built-in MCPorter integration -- see mcp.md. For custom Lua tools, read tools.md and extending.md.
@StanAngeloff. Flemma started as a personal tool for thinking, writing, and experimenting with AI inside Neovim. It's been used for everything from architecture documents and project planning to bedtime stories.
Start with :Flemma status. It shows a tree of everything Flemma knows about the current buffer -- provider, model, resolved parameters, sandbox state, enabled tools, approval policies, and which config layer set each value. Add verbose for the full picture. If something isn't working, this is the fastest way to find out why.
| Problem | Fix |
|---|---|
| Nothing happens on send | Buffer must end with .chat. Messages need @You: on its own line, content below. |
| Vertex refuses requests | Check parameters.vertex.project_id. Run gcloud auth print-access-token manually. |
| Sandbox blocks writes | :Flemma sandbox:status to check writable paths. |
| Keymaps clash | keymaps.enabled = false to disable all built-in mappings. |
| Temperature ignored | Thinking (default "high") disables temperature on Anthropic/OpenAI. Set thinking = false. |
append/remove/prepend semantics. That's how a single .chat file can remove one tool from the approval list without replacing the whole thing.{{ expressions }}, Flemma supports {% code %} blocks for loops, conditionals, and variable assignment -- with whitespace trimming ({%- -%}), graceful error degradation, and strict undefined-variable detection. It compiles templates to Lua and runs them in a sandboxed environment..chat file is parsed into a structured document tree -- messages, segments, tool blocks, thinking blocks, expressions, all with position tracking and diagnostics. During streaming, only the newly appended lines are re-parsed; the rest is frozen in a snapshot..chat buffers. Hover shows AST node details for the element under cursor. Go-to-definition jumps between tool use and result blocks, and resolves include() paths and @./file references. gf works on file references too.anthropic:signature="...", openai:signature="..."). When you switch providers mid-conversation, old signatures stay in the buffer but are filtered out of the new provider's request -- so you can switch back without losing reasoning state.Happy prompting!