Cursor, a pile of repos, and two strange fights to make semantic search work

TL;DR

Cursor’s semantic search silently returns nothing when a .code-workspace lists the root folder (.) alongside its own nested subfolders. The fix: drop the workstation root and list only the repos. Re-including gitignored repos for search is a separate fight — .ignore (ripgrep) honors each repo’s nested .gitignore, but .cursorignore/.aiignore (Cursor’s indexer) don’t, so they need an explicit denylist.

Setup

Workspaces and worktrees

I work across multiple git repos from one “workstation” repo. Each repo lives under repos/<slug> as a clone, and repos/* is gitignored so the workstation tree stays clean. I work almost entirely in the IDE. I used to be a JetBrains diehard, but nowadays it’s mostly Cursor.

I have multiple “projects” running in parallel: testing a change or a PR; working on a long-running “moon shot”; a quick fix for a customer bug; and so on. To make that manageable, I’ve written some local scripts to git worktree the root workstation repo together with its subrepos. This way, if a feature needs a multi-repo change, I’m covered.
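
A minimal sketch of what such a script could look like, not my actual tooling. It only builds the git commands for one feature worktree so they can be reviewed before running. The function name, the sibling-directory layout (`<workstation>-<feature>`), and the one-branch-per-feature naming are all assumptions made for illustration.

```python
from pathlib import Path


def worktree_commands(workstation: Path, feature: str, slugs: list[str]) -> list[list[str]]:
    """Build (but do not run) the git commands for one feature worktree.

    Hypothetical layout: the worktree lives next to the main checkout in
    <parent>/<workstation-name>-<feature>, on a new branch named after the feature.
    """
    target = workstation.parent / f"{workstation.name}-{feature}"
    # First the workstation repo itself...
    commands = [
        ["git", "-C", str(workstation), "worktree", "add", "-b", feature, str(target)],
    ]
    # ...then one worktree per subrepo, placed under the new tree's repos/ folder.
    for slug in slugs:
        repo = workstation / "repos" / slug
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", feature, str(target / "repos" / slug)]
        )
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands(Path.cwd(), "feature-a", ["hatchet", "acme-infra"]):
        print(" ".join(cmd))
```

Returning the commands instead of executing them keeps the sketch safe to run anywhere; passing each list to `subprocess.run(cmd, check=True)` would execute them.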

With agentic coding, I want the agents to operate effectively against this multi-repo setup. A few Cursor tools need to work correctly in my worktrees for that to be efficient: Grep and Glob, semantic search, and Source Control (the git client in the UI). With Cursor, getting all of them to work at once is sadly more complex than it should be. This post writes it down for my future self, and for anyone with a similar setup, to save the research effort (and LLM tokens!).

Why semantic search, specifically

Semantic search is one of the headline Cursor features, and a differentiator from Claude Code. They make a big deal of it (see “Improving agent with semantic search”), claiming it improves the model’s results on their benchmarks by up to 23.5% (versus conventional tools like grep/rg). The search runs on turbopuffer, and both companies market the partnership. This is probably one of the reasons you’re a Cursor customer.

Fight one: 100% indexed, zero results

To work across the repos in one Cursor window, I had a multi-root workspace <feature-name>.code-workspace:

{
  "folders": [
    { "path": "." },
    { "path": "repos/hatchet" },
    { "path": "repos/acme-infra" }
  ],
  "settings": { "files.exclude": { "repos": true } }
}

Navigation and the per-repo git client worked, but semantic search did not.

The funny thing about the current generation of models is that they are very resourceful and work around problems. For example, if semantic search returns no results, they reach for grep. That makes the degradation hard to notice.

In my case, I noticed a few times that the agent’s thinking traces said something to the effect of “semantic search returned no results, using grep instead”. I verified this by asking the agent to answer some codebase questions using semantic search only.

To rule out the obvious: indexing was on, and Cursor reported a few thousand indexed files.

Single-folder workspace

This isn’t documented well (at all) in the official docs, but trawling through the Cursor forums I found reports that listing the workspace root in folders next to its own subfolders breaks semantic search (see “use a single-root workspace”, “a workspace with multiple folders for a monorepo … either gets no results or results from folders that are in ROOT”, and several other threads).

In my case the list of workspace folders contains the root . and all the repos, like ./repos/acme-infra and ./repos/hatchet.
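
A quick way to spot this shape is to check whether any listed folder is an ancestor of another. The sketch below is an illustration under assumptions: `overlapping_roots` is a hypothetical helper, it expects plain JSON (a .code-workspace with comments would need them stripped first), and it only handles relative paths without `..` segments.

```python
import json
from pathlib import PurePosixPath


def overlapping_roots(workspace_json: str) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where one listed folder contains another."""
    folders = [PurePosixPath(f["path"]) for f in json.loads(workspace_json)["folders"]]
    pairs = []
    for parent in folders:
        for child in folders:
            # PurePosixPath("repos/hatchet").parents yields "repos" and ".",
            # so a "." entry is reported as the parent of every nested folder.
            if parent in child.parents:
                pairs.append((str(parent), str(child)))
    return pairs


if __name__ == "__main__":
    broken = '{"folders": [{"path": "."}, {"path": "repos/hatchet"}, {"path": "repos/acme-infra"}]}'
    for parent, child in overlapping_roots(broken):
        print(f"{parent!r} overlaps {child!r}")
```

An empty result means no listed folder sits inside another, which is the property the forum reports point to.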

The first try, then, was a single-root workspace: "folders": [{ "path": "." }]. This fixes semantic search but breaks the git client in the UI. With one root at the workstation and the repos under a gitignored path, Cursor doesn’t report the repos as git roots — you see only the workstation repo and nothing else. There’s a git.scanRepositories setting, but it doesn’t help either. This lines up with microsoft/vscode#96372.

So far, we have two shapes, each broken in some way:

  • workstation root + repos → Source Control works, semantic search empty
  • single root, repos nested → search works, Source Control shows one repo

The Cursor workspace shape that works

I ended up dropping the workstation root from the worktree workspaces:

{
  "folders": [
    { "path": "repos/hatchet" },
    { "path": "repos/acme-infra" }
  ]
}

No . entry, and the result:

  • Each folder is its own git repo, and shows in Source Control with its branch and status.
  • No parent root overlaps its children, which works around the Cursor semantic search limitation.

The cost is that the workstation’s own files aren’t a folder in this workspace. That’s an acceptable tradeoff for me: the workstation changes rarely, and I can work on it from the main checkout.

Fight two: ripgrep 🐐 and ⌘⇧F search in Cursor

Search tools honor .gitignore: they won’t descend into ignored paths when searching for a term. For me, repos/* is in .gitignore, so out of the box ripgrep, Cursor, and the rest skip the repos when searching from the root.

The goal is to re-include the repos for search while leaving them out of git, and to do it without dragging in node_modules and build outputs.

.gitignore keeps the repos out of version control:

repos/*

.ignore re-includes them for ripgrep (and Claude Code’s Grep, which seems to rely on ripgrep).

One line per repo, directory only:

# .ignore
!repos/hatchet/
!repos/acme-infra/

ripgrep applies the directory re-include and then walks into it, honoring each repo’s own nested .gitignore. So node_modules, build output, and the rest stay out on their own, because the repo already ignores them.

.cursorignore and .aiignore are supposed to do the same for Cursor’s search and other AI indexers, but they don’t honor nested .gitignores. A naive attempt pulls node_modules straight into ⌘⇧F searches in Cursor:

!repos/hatchet/
!repos/hatchet/**
!repos/acme-infra/
!repos/acme-infra/**

Re-including the repos and then denylisting the usual generated directories works:

# .cursorignore, .aiignore
# Re-include cloned repos for agent search (still gitignored via repos/* in .gitignore).
!repos/hatchet/
!repos/hatchet/**
!repos/acme-infra/
!repos/acme-infra/**

# --- denylist: generated output, re-excluded after the re-includes
**/node_modules/
**/.terraform/
**/dist/
**/.next/
**/__pycache__/
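
Since the denylist is maintained by hand, a small check helps catch directories it misses. The sketch below takes any file listing (for example, the output of `rg --files`, or paths that show up in search results) and flags paths that pass through a generated directory. `leaked_paths` and `GENERATED_DIRS` are names made up for this example, and the check works on plain path strings rather than on Cursor’s actual index.

```python
from pathlib import PurePosixPath

# Directory names taken from the denylist above; extend to match your own stack.
GENERATED_DIRS = frozenset({"node_modules", ".terraform", "dist", ".next", "__pycache__"})


def leaked_paths(paths: list[str], generated: frozenset[str] = GENERATED_DIRS) -> list[str]:
    """Return the paths that pass through a generated directory."""
    # parts[:-1] looks at directory components only, so a plain file that
    # happens to be named "dist" is not flagged.
    return [p for p in paths if generated.intersection(PurePosixPath(p).parts[:-1])]


if __name__ == "__main__":
    listing = [
        "repos/hatchet/src/main.go",
        "repos/hatchet/node_modules/x/index.js",
    ]
    for path in leaked_paths(listing):
        print("leaked:", path)
```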

Three files that look like they should be copies of each other — they serve a similar purpose — have to diverge:

  • .ignore stays directory-only and leans on each nested .gitignore
  • .cursorignore and .aiignore need the explicit /** plus a hand-maintained denylist

With that in place, Grep, Glob, and Cursor’s indexer all see the repos.
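
Because the files diverge but share the same repo list, generating them from one source keeps them in sync. This is a sketch under assumptions, not the exact script I run: `ignore_file` and `cursorignore_file` are hypothetical helpers that return file bodies as strings, and the caller decides where to write them.

```python
def ignore_file(slugs: list[str]) -> str:
    """Body for .ignore: directory-only re-includes, one line per repo."""
    lines = [f"!repos/{slug}/" for slug in slugs]
    return "\n".join(lines) + "\n"


def cursorignore_file(slugs: list[str], denylist: list[str]) -> str:
    """Body for .cursorignore and .aiignore: re-includes, then the denylist."""
    lines = []
    for slug in slugs:
        # Both the directory and its contents are re-included explicitly.
        lines += [f"!repos/{slug}/", f"!repos/{slug}/**"]
    lines.append("")
    # The denylist comes last so it re-excludes generated output.
    lines += [f"**/{name}/" for name in denylist]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    repos = ["hatchet", "acme-infra"]
    print(ignore_file(repos))
    print(cursorignore_file(repos, ["node_modules", ".terraform", "dist"]))
```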

Two shapes for two jobs

I generate two workspace shapes now, and they trade against each other deliberately:

  • Main checkout → workstation root plus repos in the .code-workspace. Source Control and the workstation files, but no semantic search across the repos. The cross-repo control center, and where I can work on the workstation repo itself.
  • Feature worktrees → repos only in the .code-workspace. Semantic search and per-repo Source Control, no workstation folder. Where the coding happens, so semantic search is what has to work.
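
Both shapes can come out of one generator with a single switch. A minimal sketch, assuming the repo slugs are known up front; `workspace_json` and its `include_root` flag are illustrative names, and the `files.exclude` setting mirrors the main-checkout workspace shown earlier.

```python
import json


def workspace_json(slugs: list[str], include_root: bool) -> str:
    """Render a .code-workspace body for either shape."""
    folders = [{"path": f"repos/{slug}"} for slug in slugs]
    workspace: dict = {"folders": folders}
    if include_root:
        # Main checkout: workstation root first, with repos/ hidden under it
        # so the repos do not show up twice in the explorer.
        folders.insert(0, {"path": "."})
        workspace["settings"] = {"files.exclude": {"repos": True}}
    return json.dumps(workspace, indent=2)


if __name__ == "__main__":
    print(workspace_json(["hatchet", "acme-infra"], include_root=False))
```

Calling it with `include_root=True` yields the main-checkout shape; `include_root=False` yields the repos-only shape used in feature worktrees.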

What to take from this

  • Cursor’s semantic search breaks when the .code-workspace lists a root folder alongside folders nested inside it. This looks like a bug to me.
  • Listing only the root folder in the .code-workspace fixes semantic search but breaks the Source Control (git) UI for the nested repos.
  • .ignore (ripgrep) and .cursorignore/.aiignore (Cursor) behave differently with a nested folder structure.

Shape that seems to work:

  • Leave the . folder out of the .code-workspace file.
  • .gitignore the nested repos and re-include them in .ignore for ripgrep (this part actually works as expected).
  • Maintain an elaborate re-include / re-exclude list in .cursorignore and .aiignore.

Simplified setup:

// feature-a.code-workspace
{
  "folders": [
    { "path": "repos/hatchet" },
    { "path": "repos/acme-infra" }
  ],
  "settings": {
    "window.title": "Feature A"
  }
}

# .gitignore
repos/*
*.code-workspace

# .ignore
!repos/hatchet/
!repos/acme-infra/

# .cursorignore, .aiignore
!repos/hatchet/
!repos/hatchet/**
!repos/acme-infra/
!repos/acme-infra/**

**/node_modules/
**/out/
**/.terraform/
**/__pycache__/
**/.venv/
# ..

Dear Cursor

My favorite AI coding assistant ATM, but please fix the multi-repo semantic search. I don’t want to spend as much time configuring the editor as I did in my Emacs days!