Plain Markdown, Stable IDs, and Two-Layer State: Patterns From Building a PM Assistant Plugin

A few weeks ago I built a Claude plugin for a Product Manager on my team. The brief was deceptively simple: she was drowning in seven surfaces — Slack across many channels, Jira tickets, Confluence pages, a calendar full of meetings, Gmail, code reviews, the occasional shared doc — and none of those tools talk to each other. A Slack thread, a Jira ticket, a Figma file, and an email to legal can all be about the same thing without any of them knowing.

The deliverable ended up being three things at once:

A morning briefing that synthesizes all sources into one view
A live dashboard where she actually does work — replies to messages, creates tickets, marks things done
A growing knowledge vault so "what did we decide about X six weeks ago?" has an answer

The thing that made this interesting wasn't the features. It was the single constraint that drove every design decision: I was building it for someone else's machine. Not me. Not a developer. A PM who needs the thing to just work.

This is a write-up of the patterns I'd reach for again, the ones I wouldn't, and a couple of install-time bugs that cost me real hours.

The Architecture: Three Layers, One Job Each

┌──────────────────────────────────────────────────────┐
│  Skill (the brain, runs on demand)                   │
│ Phases: pull → reconcile → correlate → write → render│
│           ↓                          ↓               │
│  ┌──────────────────┐    ┌────────────────────────┐  │
│  │  Vault (history) │    │  Artifact (live UI)    │  │
│  │  Markdown + Obs  │    │  HTML + JS + connectors│  │
│  └──────────────────┘    └────────────────────────┘  │
└──────────────────────────────────────────────────────┘

The skill is the orchestrator. It pulls from connectors, reconciles state across runs, correlates items into subjects, writes to the vault, and renders the dashboard. Five phases, each with one purpose.

The vault is plain markdown. One folder, organized into daily/, subjects/, tickets/, people/, archive/. Cross-references use Obsidian wikilink syntax ([[AUTO-892]], [[billing-migration]]) and tags (#status/in-progress, #area/billing). Open the folder in Obsidian and you get the graph view, backlinks, and full-text search for free. No database. No integration to maintain.

The artifact is a live HTML dashboard that pulls fresh data on open. It re-fetches connectors via a host-provided callMcpTool API, and it can fire chat prompts back via sendPrompt. Two-way bridge.

The non-obvious decision was the substrate. Read on.

Pattern #1: Plain Markdown Beats a Database

I considered a real database (SQLite, a JSON store, a small Postgres instance) and rejected all of them. Markdown is grep-able, version-controllable with git, openable in any editor, and survives every refactor of every tool I might add in the future. The "database query language" is a filename glob plus a regex.
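To make "glob plus regex" concrete, here is a minimal sketch of a vault query. The function names and the exact regexes are my own illustration, not the plugin's code, and the tag pattern is a simplification of what Obsidian actually accepts.

```python
import re
from pathlib import Path

# Target of [[target]] or [[target|alias]]; stops at ']' '|' or '#'.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")
# Tags such as #status/in-progress; the lookbehind skips URL fragments.
TAG = re.compile(r"(?<![\w/])#([A-Za-z][\w/-]*)")


def scan_note(text: str) -> dict:
    """Return the wikilink targets and tags found in one note's text."""
    return {
        "links": sorted({m.strip() for m in WIKILINK.findall(text)}),
        "tags": sorted(set(TAG.findall(text))),
    }


def notes_with_tag(vault: Path, folder: str, tag: str) -> list[str]:
    """Names of notes directly under vault/folder that carry the given tag."""
    hits = []
    for path in sorted((vault / folder).glob("*.md")):
        if tag in scan_note(path.read_text(encoding="utf-8"))["tags"]:
            hits.append(path.stem)
    return hits
```

Something like `notes_with_tag(Path("vault"), "subjects", "status/in-progress")` then answers "which subjects are in flight?" with no index to maintain.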

The vault grew to dozens of files over weeks. A SQLite store would have given me indexed queries but cost me all the rest. For this scale of data and this user, it wasn't worth it.

Practical tip: If you're building a personal assistant for a non-developer, default to plain text. The user can open it in any editor, and you can debug the system by reading files. Both of those matter more than query speed at this scale.

Pattern #2: Cross-Source Correlation in Two Passes

The hard part of a PM assistant isn't pulling data from sources. It's recognizing that a Slack thread, a Jira ticket, a Figma file, and an email are all about the same thing.

I do this in two passes:

Pass 1 — entity matching. Scan every item for stable identifiers: ticket keys (regex [A-Z]+-\d+), PR numbers near "PR" or "pull", channel names, repo names, person names from the team roster, doc titles. Group items that share an entity. This catches roughly 70–80% of correlations for free, deterministically.
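A minimal sketch of Pass 1 for ticket keys only, assuming each pulled item is a dict with hypothetical `id` and `text` fields. The other entity types (PR numbers, channel names, roster names) would follow the same shape with their own patterns.

```python
import re
from collections import defaultdict

TICKET_KEY = re.compile(r"\b[A-Z]+-\d+\b")


def group_by_ticket_key(items: list[dict]) -> tuple[dict[str, list[str]], list[str]]:
    """Group item ids by the ticket keys mentioned in their text.

    Returns (clusters, leftovers): clusters maps a ticket key to the ids of
    the items that mention it; leftovers are ids with no key at all, which
    is the set that would go on to Pass 2.
    """
    clusters = defaultdict(list)
    leftovers = []
    for item in items:
        keys = set(TICKET_KEY.findall(item["text"]))
        if not keys:
            leftovers.append(item["id"])
        for key in sorted(keys):
            clusters[key].append(item["id"])
    return dict(clusters), leftovers
```

Two caveats with this sketch: an item that mentions two keys lands in both clusters, and the bare regex also matches strings like UTF-8 or COVID-19, so filtering hits against the list of real project keys is probably worth the extra line.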

Pass 2 — topic similarity. For items that didn't cluster in Pass 1, ask the language model to group by topic. A Slack thread "billing migration plan?" and a Confluence page "Billing v2 rollout" are obviously the same subject even if neither references the other.

The output is a list of subjects, each with a slug (kebab-case, used as filename), a human-readable title, and a list of cross-source items. A subject is the broader topic; a ticket is one specific Jira item. A subject's page wikilinks to its tickets; ticket pages wikilink back. Obsidian's backlinks panel makes this navigable in both directions.
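As an illustration of the slug and the two-way links, here is a small sketch that turns a subject into a vault file. The page layout and the helper names are assumptions made for the example; only the kebab-case slug, the subjects/ folder, and the wikilink syntax come from the description above.

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse each run of non-alphanumerics to one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def subject_page(title: str, ticket_keys: list[str], tags: list[str]) -> tuple[str, str]:
    """Return (path relative to the vault, markdown body) for one subject note."""
    lines = [f"# {title}", ""]
    lines.append(" ".join(f"#{tag}" for tag in tags))
    lines += ["", "Tickets:"]
    # Each wikilink here is what produces the backlink on the ticket's page.
    lines += [f"- [[{key}]]" for key in ticket_keys]
    return f"subjects/{slugify(title)}.md", "\n".join(lines) + "\n"
```

With this, `subject_page("Billing Migration", ["AUTO-892"], ["area/billing"])` yields the path `subjects/billing-migration.md`, which is exactly the target that `[[billing-migration]]` resolves to from any other note.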

Practical tip: Be conservative on Pass 2. Over-merging is worse than under-merging because it muddies the vault. I tuned the prompt to err on the side of "these are different subjects" unless the evidence is strong.

Pattern #3: Dedup Before Create (And the Source-URL Footer Trick)

Anyone with a busy backlog has filed a duplicate ticket. The fix is a skill that runs a similarity search before creation, not after.

The flow:

Extract 3–6 distinctive search terms from the proposed title and description (proper nouns, domain terms, identifiers — not generic verbs).
Run three JQL searches in order: exact summary match, all top terms ANDed, any-term broad sweep over the last 90 days.
Score each candidate 0–1 for similarity using model judgment, with explicit calibration in the skill's reference docs (≥0.9 = clearly the same; 0.7–0.9 = probably related; <0.5 = coincidental keyword match).
If anything scores ≥0.7, surface the top 3 with status, assignee, and last update — and ask before creating.
Only create after explicit confirmation.

The skill is also the only path the assistant uses for ticket creation. Direct calls to createJiraIssue are explicitly discouraged in the skill instructions. Every ticket goes through dedup, which means duplicates basically don't happen anymore.

The trick that made this even more useful: every created ticket includes Originating from: <source URL> in its description footer (the Slack permalink or email thread that triggered the creation). That footer is what enables server-side resolution detection later. More on that next.

Practical tip: A pre-create skill is worth more than a post-create alert. Once you've filed the duplicate, you have to clean it up. Once you've blocked yourself from filing it, the problem is gone.

Pattern #4: Persistence Is Two Layers, Not One

The first version of the dashboard had a subtle bug. Every briefing was a fresh write — items pulled this morning replaced items from yesterday. Anything the PM hadn't acted on quietly disappeared.

The fix is a two-layer model:

Layer 1 — server-side state file (.pm-state.json in the vault). Tracks every pending item by stable ID across runs:

{
  "items": {
    "<stable-id>": {
      "source": "slack" | "gmail" | "jira" | "github",
      "first_seen": "ISO-timestamp",
      "last_seen": "ISO-timestamp",
      "status": "pending" | "resolved" | "stale" | "dismissed",
      "resolved_reason": null | "user_replied" | "ticket_created" | "merged",
      "snapshot": { ... }
    }
  }
}

Stable IDs are the Slack permalink, Gmail thread_id, Jira ticket key, or owner/repo#number. Never a content hash; always the source's own identifier.
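The reconcile step that replaces the fresh-write behavior might look like the sketch below. The function name and the shape of `pulled` (stable ID mapped to a dict with `source` and `snapshot`) are assumptions for illustration; the entry fields match the state file above.

```python
def reconcile(state: dict, pulled: dict[str, dict], now: str) -> dict:
    """Merge this run's pulled items into the previous state.

    Nothing is dropped: an item that is absent from today's pull keeps its
    old entry, which is what stops unhandled items from disappearing.
    """
    items = {sid: dict(entry) for sid, entry in state.get("items", {}).items()}
    for sid, fresh in pulled.items():
        if sid in items:
            # Seen before: refresh the timestamp and snapshot, keep the status.
            items[sid]["last_seen"] = now
            items[sid]["snapshot"] = fresh["snapshot"]
        else:
            items[sid] = {
                "source": fresh["source"],
                "first_seen": now,
                "last_seen": now,
                "status": "pending",
                "resolved_reason": None,
                "snapshot": fresh["snapshot"],
            }
    return {"items": items}
```

Because the keys are the source's own identifiers, the same Slack thread or PR maps to the same entry on every run. A content hash would change as soon as someone edited the message, and the item would look new.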

Layer 2 — client-side localStorage. When the user clicks Send/Skip/Mark-done in the dashboard, that decision sticks across page reloads via a localStorage map keyed by item ID with a 14-day expiry.

Both layers are needed. The state file persists across briefings even if the artifact is closed. The localStorage layer makes UI actions feel instant without waiting for the next briefing run.

The non-obvious part is server-side resolution detection — for each pending item, the briefing actively checks whether it's been resolved on the source side, before deciding whether to surface it again:

Pending item type	Detection check
Slack DM/mention	`slack_read_thread`; look for user's own message after `last_seen`
Gmail thread	`get_thread`; look for outgoing message from user after `last_seen`
Suggested ticket	JQL: `description ~ "<source_url>" OR comment ~ "<source_url>"`
Blocker	`getJiraIssue`; if status no longer "Blocked", resolved
PR awaiting review	if `merged_at` non-null, resolved

The ticket-created detection only works because of the source-URL footer from Pattern #3. Every ticket created via the dedup skill includes its source URL, so one JQL search lets the next briefing find the ticket whether it was created from the dashboard or from chat. The same search also catches a ticket a teammate filed independently, provided its description or a comment contains that URL.
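The two halves of that trick, writing the footer and searching for it later, fit in a few lines. This is a sketch with hypothetical helper names; the footer text and the JQL shape are the ones described above. JQL's `~` is a tokenized text search, so how reliably it matches a full URL is worth verifying against your own Jira instance.

```python
FOOTER_PREFIX = "Originating from: "


def with_source_footer(description: str, source_url: str) -> str:
    """Append the source URL as the last line of a ticket description."""
    return f"{description.rstrip()}\n\n{FOOTER_PREFIX}{source_url}"


def created_from_jql(source_url: str) -> str:
    """JQL that finds tickets whose description or comments contain the URL."""
    quoted = '"' + source_url.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"description ~ {quoted} OR comment ~ {quoted}"
```

The stable ID from the state file doubles as the search key here, since for Slack items it is the permalink itself.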

Practical tip: Either persistence layer alone produces a bad UX. State file alone, and clicks don't stick until the next briefing. localStorage alone, and items reappear after every refresh. They're cheap to add together if you design for it from the start.

Pattern #5: Sub-Agents Need to Be Emitted in a Single Message

The briefing pulls from up to seven sources. Done serially, that's 60–90 seconds. Done in parallel, it's 15–25 seconds — bounded by the slowest single source.

The trick is one I'd been bitten by before: spawn one sub-agent per source, and emit all the agent tool calls in a single response message. Tool calls within one assistant message run concurrently. If you emit them across multiple messages, they execute sequentially and the parallelism is lost.

Each fetcher returns a fixed JSON contract:

{
  "source": "slack" | "jira" | ...,
  "status": "ok" | "partial" | "failed",
  "error": null | "<short message>",
  "data": { ... source-specific shape ... }
}

Failure isolation is the underrated benefit. One source's connector being flaky doesn't break the rest of the briefing — the orchestrator notes that source as failed in the dashboard's source-health section and continues.
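In the plugin the concurrency comes from the host running all tool calls from one message at once, so there is no code to show for that part. As an analogy only, here is the same contract and failure isolation in plain `asyncio`, with hypothetical fetcher callables standing in for the sub-agents.

```python
import asyncio


async def run_fetcher(source, fetch):
    """Run one fetcher and wrap its outcome in the fixed envelope."""
    try:
        data = await fetch()
        return {"source": source, "status": "ok", "error": None, "data": data}
    except Exception as exc:
        # A broken source becomes a 'failed' envelope instead of an exception.
        return {"source": source, "status": "failed", "error": str(exc)[:200], "data": {}}


async def pull_all(fetchers):
    """Start every fetcher together; wall time is roughly the slowest one."""
    results = await asyncio.gather(*(run_fetcher(s, f) for s, f in fetchers.items()))
    return {r["source"]: r for r in results}
```

The "partial" status is left to the fetcher itself in this sketch, since only the fetcher knows whether it got some of what it asked for. The point of the envelope is that the orchestrator never needs a try/except per source.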

Practical tip: If you're writing skills that orchestrate multiple sub-agents, make "emit all Agent tool calls in a single message" the first rule in the skill's parallelism reference doc. It's the most common mistake when a model writes orchestration logic, and on the timings above it's roughly a 3–4x speedup.

Where Things Go Wrong

I'd be doing you a disservice if I only talked about the wins. Three things hurt enough to remember.

Install validation gave me opaque error messages. The plugin shipped fine in v0.1, then completely failed to install in v0.2 with a single useless message: "Plugin validation failed." The host's CLI validator (claude plugin validate) only checks plugin.json and reported the manifest as fine. The actual install validation was stricter and surfaced no specific error. I had to bisect.

I built a stripped-down test plugin that was identical to the last known good version apart from a bumped version number. It installed. From there I added components back in halves until I found the one that broke install. Six bisect rounds total. The actual failures, in order:

Dotfiles in non-standard paths. A vault-seed/.team-roster-default.json file. The installer accepts only one specific dotfile path. Renamed.
version: field in skill frontmatter. The schema doc lists it as optional; the installer rejects any frontmatter field beyond name and description. Stripped from all SKILL.md files.
Angle brackets in skill descriptions. The brutal one. One skill's description contained "what's the latest on <subject> before the meeting". The literal <subject> looked like an unclosed HTML tag to the installer's parser. Same generic "validation failed" error — six bisect rounds before I narrowed it to a single skill, then a single line, then a single substring.
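A pre-flight check for these three gotchas is cheap to write. The sketch below encodes only what I observed from this one installer, not any documented schema, and it reads flat `key: value` frontmatter lines without a YAML parser, so a multi-line description would slip past it.

```python
import re
from pathlib import Path

ALLOWED_FIELDS = {"name", "description"}  # assumption: what this installer accepted
ANGLE = re.compile(r"<[^>\s]+>")          # e.g. <subject>


def lint_frontmatter(text: str) -> list[str]:
    """Problems found in the leading '---' block of one SKILL.md."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter"]
    problems = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if not line.strip() or line[0] in " \t#":
            continue  # blank, indented continuation, or comment
        key, _, value = line.partition(":")
        key = key.strip()
        if key not in ALLOWED_FIELDS:
            problems.append(f"unexpected field: {key}")
        if key == "description" and ANGLE.search(value):
            problems.append("angle brackets in description")
    return problems


def lint_plugin(root: Path) -> dict[str, list[str]]:
    """Map each suspicious file (relative path) to its list of problems."""
    report = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.name.startswith("."):
            report[rel] = ["dotfile: check the installer accepts this path"]
        elif path.name == "SKILL.md":
            found = lint_frontmatter(path.read_text(encoding="utf-8"))
            if found:
                report[rel] = found
    return report
```

Run before packaging, an empty report does not prove the install will succeed, but a non-empty one would have saved me most of the bisecting.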

Practical tip: Test install on day one with a minimal plugin shell, then incrementally add every component. The closer your dev loop is to "actually install on the target system," the faster everything else moves. If I'd done this from the start, I would have caught all three gotchas immediately instead of batching seven new skills into one version and spending half a day bisecting.

The encoding bug. The dashboard rendered every em-dash (U+2014) as â€". This is classic mojibake: UTF-8 bytes decoded as a single-byte encoding (nominally Latin-1, in practice Windows-1252). One line of HTML fixed it:

<meta charset="utf-8">

Plus a belt-and-suspenders rule in the skill: when serializing the data blob the skill injects into the artifact, escape non-ASCII to \uXXXX (Python: json.dumps(d, ensure_ascii=True)). That way it doesn't matter how the host serves the HTML — the JSON is pure ASCII and decodes correctly.
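Both the bug and the fix are easy to reproduce in a few lines of Python. The data and the helper name below are made up for the example; `ensure_ascii=True` is already the default for `json.dumps`, so passing it explicitly mainly documents the intent.

```python
import json

EM_DASH = "\u2014"

# Reproduce the bug: UTF-8 bytes read back with a single-byte Windows codec.
mojibake = EM_DASH.encode("utf-8").decode("cp1252")


def to_ascii_json(data) -> str:
    """Serialize so every non-ASCII character becomes a \\uXXXX escape."""
    return json.dumps(data, ensure_ascii=True)


blob = to_ascii_json({"title": f"Billing v2 {EM_DASH} rollout"})

# The blob is pure ASCII, so a wrong charset guess cannot corrupt it,
# and parsing it restores the original character.
assert blob.isascii()
assert json.loads(blob)["title"] == f"Billing v2 {EM_DASH} rollout"
```

Printing `mojibake` shows the three-character sequence the dashboard displayed in place of the dash.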

If your dashboard ever shows â€" or Â·, this is your bug.

Drafts that didn't sound like the user. The first version of the suggested-replies feature read like generic LLM output. The fix was a separate one-time skill that reads the user's last 50 sent Slack messages and 20 emails, extracts a tone profile (length, formality, sign-offs, quirks, what they don't do), and writes it to style-guide.md in the vault. The drafter sub-agents read this file once at the start of their batch. With it, drafts read like the user. Without it, they're noise.

What I'd Set Up on Day One

If I were starting a similar plugin fresh tomorrow, here's what I'd configure first:

A minimal install loop. Package and install a near-empty plugin shell on day one. Add components incrementally and re-install after each. Catch validator failures immediately, not in week three.

Stable IDs from the first commit. Slack permalink, Gmail thread_id, Jira ticket key, owner/repo#number. Every persistence pattern downstream depends on these. If you start with content hashes or fuzzy matching, you'll regret it.

A two-layer persistence model from the start. State file in the vault, localStorage in the artifact, communicating via stable IDs. Bolt this in late and you'll be unwinding fresh-write logic everywhere.

Standing safety rules in writing. Gmail replies always go to Drafts, never direct send. Slack sends require an explicit click. If a Slack send fails, surface the text for manual paste — don't retry on a different channel. These rules live in both the skill description and the artifact JS, because either alone is a single point of failure.

What I'd Give the Next Person Doing This

A small kit of patterns, in roughly the order they paid off:

Pattern	Why it matters
Plain markdown vault with wikilinks and tags	History for free, no integration to maintain
Stable IDs everywhere	Foundation for cross-run persistence
Two-pass correlation (entity → topic)	Roughly 70–80% of clustering is free; only the residual needs LLM judgment
Dedup before create + source-URL footer	One pattern eliminates duplicate tickets and enables created-elsewhere detection
Sub-agents emitted in one message	Roughly 3–4x speedup, failure isolation, clean shape
`sendPrompt` from artifact to invoke skills	Bridges UI buttons to dedup-checked skill flows
State file + localStorage two-layer model	The only way to make "did I already handle this?" feel right
`<meta charset="utf-8">` + JSON ensure_ascii	Free fix for a bug you only see after you ship

And a single anti-pattern, learned the slow way: don't put angle-bracket placeholders in YAML frontmatter description values when your installer parses them as markup. The error message you'll get is just "validation failed."

If you're building a similar tool, the architecture is more general than the specific connectors. Swap Jira for Linear, Slack for Discord, Confluence for Notion — the patterns hold. The shape that matters is: skill orchestrates, vault remembers, artifact transacts. Everything else is detail.

The thing that surprised me most wasn't any single technical decision. It was how much the "not for me" constraint changed the work. When you're building for yourself, you can absorb a hundred small frictions because you know your own tools. When you're building for someone else, every one of those frictions becomes a bug. That pressure made the design better.