Shiny.DocumentDb v6 — Vectors, Filters & a Real Connection Pool
Shiny.DocumentDb v6 just shipped. Same one-line services.AddDocumentStore(...), same zero-schema document model, same AOT story — but v6 closes five gaps that have been sitting on the wish list since v3.
Here’s what’s new and why it matters.
The Five Headliners
- Vector / ANN search that translates to the native engine on every provider: pgvector on PostgreSQL, VECTOR_DISTANCE on SQL Server 2025, DiskANN on Cosmos, $vectorSearch on MongoDB Atlas, vss on DuckDB, sqlite-vec on SQLite.
- Global query filters: AddQueryFilter<T>(u => !u.IsDeleted) and that predicate applies to every read, every single-doc fetch, every bulk operation, and every change-stream subscription. Same shape as EF Core’s HasQueryFilter.
- Composite (multi-column) JSON indexes: CreateIndexAsync<T>(ctx.User, u => u.LastName, u => u.FirstName) across every relational provider.
- Real connection pooling on server SQL: PostgreSQL, MySQL, and SQL Server stop serializing through a per-store semaphore and start using the ADO.NET driver’s pool. One DocumentStore instance can now actually serve a web app.
- Per-query change monitoring: .NotifyOnChange() on any fluent query, filtered by the query’s own Where predicates.
The detail follows.
Vector / ANN Search
Embedding search has been the most-requested feature since the AI tools shipped in v4. v6 lands it everywhere.
Register an embedding property, then query by similarity:
public class Document
{
    public Guid Id { get; set; }
    public string Content { get; set; } = "";
    public ReadOnlyMemory<float> Embedding { get; set; }
}

var store = new DocumentStore(new DocumentStoreOptions
{
    DatabaseProvider = new SqliteDatabaseProvider("Data Source=mydata.db")
    {
        EnableVectorExtension = true // loads sqlite-vec on every connection
    }
}.MapVectorProperty<Document>(
    d => d.Embedding,
    dimensions: 1536,
    metric: VectorDistance.Cosine,
    indexKind: VectorIndexKind.Hnsw));

var hits = await store.Query<Document>()
    .Where(d => d.Content.Contains("invoice")) // pre-filter where supported
    .NearestVectors(queryEmbedding, k: 10);

foreach (var hit in hits)
    Console.WriteLine($"{hit.Score:F4} {hit.Document.Content}");
The vector type is ReadOnlyMemory<float> everywhere — same shape as Microsoft.Extensions.AI.Embedding<float>.Vector, JSON-round-trips through System.Text.Json without a custom converter, and avoids a float[] allocation on every read.
One API, six engines
| Provider | Storage | Index | Filter |
|---|---|---|---|
| PostgreSQL | pgvector sidecar | HNSW, IVF | Pre-filter via JOIN |
| SQL Server 2025 | Native VECTOR(n) sidecar | DiskANN | Pre-filter via JOIN |
| Cosmos DB | Embedded in document JSON | DiskANN, QuantizedFlat, Flat | WHERE + ORDER BY VectorDistance(...) |
| MongoDB (Atlas) | $vectorSearch aggregation | HNSW (Atlas-managed) | Filter inside $vectorSearch |
| DuckDB | vss sidecar | HNSW | Pre-filter via JOIN |
| SQLite | sqlite-vec virtual table | None (flat scan) | Post-filter join back |
| MySQL / LiteDB / IndexedDB | — | — | Throws NotSupportedException |
Cosine / Euclidean / DotProduct are available everywhere; Hamming is pgvector-only. Cosine distance is always surfaced as [0, 2] regardless of which way the underlying engine likes to count, so ORDER BY score ASC does the same thing on every provider.
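To make the [0, 2] convention concrete, here is a minimal sketch of cosine distance computed as one minus cosine similarity. The CosineDistanceSketch helper is hypothetical and not part of Shiny.DocumentDb; the providers do this work inside the engine, and this only illustrates the range that comes back.

```csharp
using System;

public static class CosineDistanceSketch
{
    // Hypothetical helper (not a library API): cosine distance = 1 - cosine similarity.
    // Same direction -> 0, orthogonal -> 1, opposite direction -> 2.
    public static double Distance(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            throw new ArgumentException("Cosine distance is undefined for a zero vector.");

        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Because smaller always means closer under this convention, an ascending sort by score puts the best match first.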
Auto-embed on insert
If you don’t want to call IEmbeddingGenerator by hand on every write, Shiny.DocumentDb.Extensions.AI ships an AutoEmbedOnInsert<T> helper that hooks into a new OnBeforeInsert<T> pipeline:
using Shiny.DocumentDb.Extensions.AI;

opts.MapVectorProperty<Document>(d => d.Embedding, dimensions: 1536)
    .AutoEmbedOnInsert<Document>(
        embeddingGenerator,
        sourceSelector: d => d.Content,
        targetSetter: (d, vec) => d.Embedding = vec,
        targetGetter: d => d.Embedding); // skip when already set

await store.Insert(new Document { Content = "hello world" });
// Embedding is populated automatically before the row hits the wire.
It runs on Insert, BatchInsert, and Upsert, skips when the source is null/empty, and skips when the target already holds a non-default vector — explicit writes always win over the generator.
Global Query Filters
If you have shipped anything on Entity Framework Core, you have written this:
modelBuilder.Entity<User>().HasQueryFilter(u => !u.IsDeleted);
Shiny.DocumentDb v6 gives you the same surface, on a document store, across every provider:
var store = new DocumentStore(new DocumentStoreOptions
{
    DatabaseProvider = new SqliteDatabaseProvider("Data Source=mydata.db")
}
.AddQueryFilter<User>(u => !u.IsDeleted) // unnamed
.AddQueryFilter<Order>("tenant", o => o.TenantId == ctx.Current) // named
.AddQueryFilter<Order>("status", o => o.Status != "Archived"));
Filters AND together; your Where is AND’d on top. Captured variables (ctx.Current) are re-read on every translation, so per-request tenant scopes work without rebuilding the store.
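One way to picture how several filters collapse into a single predicate is to AND their expression trees together. The sketch below is an assumption about the general technique, not the library's actual implementation; FilterCompositionSketch, AndAll, and OrderSketch are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical document type used only for this illustration.
public class OrderSketch
{
    public string TenantId { get; set; } = "";
    public string Status { get; set; } = "";
}

public static class FilterCompositionSketch
{
    // Hypothetical helper: AND a set of predicates into one expression tree.
    public static Expression<Func<T, bool>> AndAll<T>(
        IEnumerable<Expression<Func<T, bool>>> predicates)
    {
        var parameter = Expression.Parameter(typeof(T), "x");
        Expression body = Expression.Constant(true);

        foreach (var predicate in predicates)
        {
            // Each lambda has its own parameter; rebind it to the shared one.
            var rebound = new ParameterReplacer(predicate.Parameters[0], parameter)
                .Visit(predicate.Body);
            body = Expression.AndAlso(body, rebound);
        }

        return Expression.Lambda<Func<T, bool>>(body, parameter);
    }

    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression source;
        private readonly ParameterExpression target;

        public ParameterReplacer(ParameterExpression source, ParameterExpression target)
        {
            this.source = source;
            this.target = target;
        }

        protected override Expression VisitParameter(ParameterExpression node)
            => node == this.source ? this.target : base.VisitParameter(node);
    }
}
```

A captured local lives in a compiler-generated closure object, and the expression tree holds a reference to that object rather than a copy of the value. Anything that evaluates the tree later therefore sees the variable's current value, which is what makes a per-request tenant scope possible without rebuilding the filter.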
What gets filtered
The interesting decision is what isn’t filtered. v6 follows EF Core’s lead: reads, by-id mutations, and change subscriptions enforce the filter, but inserts and raw SQL stay free.
| Path | Filtered? |
|---|---|
| Query<T>() and every terminal (ToList, Count, ExecuteUpdate, …) | Yes |
| query.NotifyOnChange() | Yes; only matching documents emit |
| Get<T>(id) / GetDiff<T>(id, ...) | Yes; returns null if filter fails |
| Update<T> | Yes; throws “not found” if filter fails |
| SetProperty<T> / RemoveProperty<T> / Remove<T>(id) / Clear<T>() | Yes |
| Insert<T> / BatchInsert<T> / Upsert<T> | No; matches EF Core |
| Query<T>(rawSql) / QueryStream<T>(rawSql) | No; your SQL, your call |
Per-query opt-out matches EF Core too:
// Disable every filter
var allUsers = await store.Query<User>().IgnoreQueryFilters().ToList();
// Disable a specific named filter — the others still apply
var anyTenant = await store.Query<Order>().IgnoreQueryFilters("tenant").ToList();
This works on every provider that has a real query translator: relational SQL (DocumentStore), LiteDbDocumentStore, CosmosDbDocumentStore, MongoDbDocumentStore, and IndexedDbDocumentStore.
This single feature collapses a lot of hand-written boilerplate. Soft-delete, multi-tenancy, “active only”, row-level security — all become one line of options config.
Composite JSON Indexes
CreateIndexAsync<T> has accepted a single expression since v3. v6 adds a multi-expression overload:
// Single-column (unchanged)
await store.CreateIndexAsync<User>(u => u.Name, ctx.User);

// Composite: one B-tree over multiple JSON paths
await store.CreateIndexAsync(
    ctx.User,
    u => u.LastName,
    u => u.FirstName);
The composite index name is built by joining the resolved paths with __, so ix_User_LastName__FirstName is what ends up on disk. Drop it with the matching overload:
await store.DropIndexAsync(ctx.User, u => u.LastName, u => u.FirstName);
How each provider implements it:
- SQLite / SQLCipher / PostgreSQL / MySQL / DuckDB: one composite index with one json_extract (or provider equivalent) expression per path. Single statement, single index object.
- SQL Server: JSON expression indexes need PERSISTED computed columns. v6 creates one column per path (cc_{indexName}_0, cc_{indexName}_1, …) and indexes them all. The drop path discovers the backing computed columns from sys.index_columns, so single- and multi-column indexes drop through the same code path with no special-case logic.
Existing single-path index names are preserved bit-for-bit, so v5 indexes survive an upgrade without an OBJECT_DROP_FAILED somewhere in production.
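The naming rule is simple enough to sketch. The helper below is hypothetical: the ix_{Type}_ prefix is inferred from the single example name given above, and how nested JSON paths are resolved into name segments is not covered here.

```csharp
using System;

public static class IndexNameSketch
{
    // Hypothetical helper mirroring the rule described in the text:
    // "ix_" + type name + "_" + resolved paths joined with "__".
    public static string Build(string typeName, params string[] paths)
    {
        if (paths.Length == 0)
            throw new ArgumentException("At least one path is required.", nameof(paths));

        return $"ix_{typeName}_{string.Join("__", paths)}";
    }
}
```

With a single path there is nothing to join, so the name comes out without any double underscore. That is consistent with single-path names staying unchanged.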
Real Connection Pooling on Server SQL
This is the change I’m most relieved about.
v5 was honest about its limit: a single DocumentStore instance serialized every operation through one semaphore around one long-lived connection. Fine on a phone. Miserable on a server.
v6 splits behaviour along the provider:
- PostgreSQL, MySQL, SQL Server: open a connection per operation. The ADO.NET driver’s pool multiplexes callers. One store, many concurrent calls, no in-process queueing.
- SQLite, SQLCipher, DuckDB: embedded engines that take a database-wide write lock. These keep the v5 model: one long-lived connection, one per-store semaphore. The provider declares which mode it wants via IDatabaseProvider.RequiresSingleConnection.
RunInTransaction pins one connection for the duration of the user callback regardless of provider, so every nested operation shares the transaction.
Table init is now backed by a ConcurrentDictionary<string, Lazy<Task>> — first-touch DDL runs exactly once per table even under concurrent first calls. No more “is the schema there yet?” races on cold start.
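The run-once guarantee comes from pairing ConcurrentDictionary with Lazy<Task>. Here is a minimal sketch of that pattern, assuming a createTable delegate that stands in for the real DDL call; TableInitSketch and EnsureTableAsync are hypothetical names, not the library's internals.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class TableInitSketch
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> initialized =
        new ConcurrentDictionary<string, Lazy<Task>>();

    // Stand-in for whatever issues the CREATE TABLE statement (an assumption).
    private readonly Func<string, Task> createTable;

    public TableInitSketch(Func<string, Task> createTable)
    {
        this.createTable = createTable;
    }

    public Task EnsureTableAsync(string tableName)
    {
        // GetOrAdd may invoke the factory more than once under a race, but only
        // one Lazy is ever stored, and only the stored Lazy has its Value read.
        var lazy = this.initialized.GetOrAdd(
            tableName,
            name => new Lazy<Task>(
                () => this.createTable(name),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // Every caller awaits the same Task, so the DDL starts exactly once.
        return lazy.Value;
    }
}
```

One caveat with this pattern as sketched: a faulted Task stays cached, so a production version would need to decide whether to evict the entry and retry after a failure.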
A small but important consequence for streaming: on the pooled providers, await foreach (... in store.Query<T>().ToAsyncEnumerable()) holds one connection out of the pool for the lifetime of the iterator instead of holding the entire store. Other callers don’t block. On the embedded engines, behaviour is unchanged — finish the enumeration before issuing another store call.
Per-Query Change Monitoring
IObservableDocumentStore shipped in v5.3 with a global, type-scoped stream of DocumentChange<T>. v6 adds a query-scoped overload — every fluent query now exposes a .NotifyOnChange() that filters the change feed by the query’s own Where predicates:
var pending = store.Query<Order>().Where(o => o.Status == "Pending");

await foreach (var change in pending.NotifyOnChange(ct))
{
    // Fires when an Order matching Status == "Pending" is inserted or updated.
    // Events without a document body (SetProperty, RemoveProperty, Remove, Clear) also arrive here.
    UpdateUi(change);
}
OrderBy, Paginate, and GroupBy are ignored because they change result shape, not membership. Calling Select(...) first throws — projecting away the document body breaks the filter.
SetProperty, RemoveProperty, Remove, and Clear don’t carry the full document, so DocumentChange<T>.Document is null for those events. The per-query filter passes them through unconditionally so the consumer can re-query and decide for itself whether the document still matches.
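The pass-through rule boils down to a two-branch check. The sketch below uses a hypothetical ChangeSketch<T> as a stand-in for DocumentChange<T>, keeping only the nullable Document, and shows the decision as described above rather than the library's actual code.

```csharp
using System;

// Hypothetical stand-in for DocumentChange<T>; only Document matters here.
public sealed class ChangeSketch<T> where T : class
{
    public ChangeSketch(T document)
    {
        this.Document = document;
    }

    // Null for events that do not carry the full document.
    public T Document { get; }
}

public static class ChangeFilterSketch
{
    // Emit when there is no document to test (the consumer re-queries),
    // otherwise emit only when the query's predicate matches.
    public static bool ShouldEmit<T>(ChangeSketch<T> change, Func<T, bool> predicate)
        where T : class
        => change.Document is null || predicate(change.Document);
}
```

The trade-off is that a consumer may see events for documents that never matched the query, so handlers should treat a null Document as a prompt to re-check rather than as a confirmed match.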
Combined with the new global query filters and the existing IChangeFeedDocumentStore (cross-process change feeds backed by PostgreSQL LISTEN/NOTIFY, SQL Server Change Tracking, and Cosmos DB Change Feed), change observation is now end-to-end coherent: every read goes through the same filter; every change subscription sees only the changes that match.
Smaller Wins
A few items that landed without their own section but are worth knowing about:
- MapIdProperty<T>(...): standalone Id-property override that no longer requires MapTypeToTable. Use it when the Id is named Slug or DeviceKey but you still want the type stored in the default shared table.
- OnBeforeInsert<T>: an async pre-write hook on DocumentStoreOptions. AutoEmbedOnInsert<T> is the headline consumer but it’s a general “compute derived fields” extension point.
- SupportsVector on IDocumentStore and IDatabaseProvider, matching the existing SupportsSpatial.
- PostgreSQL optimistic concurrency fix: the version check now extracts as a typed int (::BIGINT), no more 42883: operator does not exist.
- PostgreSQL and DuckDB multi-tenancy fix: the CAST(@data AS JSONB) / CAST(@data AS JSON) envelopes no longer break the tenant-column rewrite.
Upgrading
v6 is API-compatible with v5 in every place that matters. The semaphore on the server-SQL providers is gone — if you relied on it to serialize writes from one store instance, switch to RunInTransaction for that semantic. Everything else is purely additive.
dotnet add package Shiny.DocumentDb.PostgreSql --version 6.0.0
dotnet add package Shiny.DocumentDb.Sqlite --version 6.0.0
dotnet add package Shiny.DocumentDb.SqlServer --version 6.0.0
# etc.
This is the release that takes Shiny.DocumentDb from “great for mobile and embedded” to “actually fine for a real ASP.NET Core service” without losing any of the original ergonomics. Pull it down, kick the tires, and tell me what breaks.