ragnar_find_links() now defaults to
children_only = FALSE, so it returns all links on a page. If
you relied on the previous default, set
children_only = TRUE (#115).
ragnar_register_tool_retrieve() now uses
search_{store@name} as the default tool name prefix
(instead of rag_retrieve_from_{store@name}), so you may
need to update any code that refers to the tool name explicitly (#123,
#127).
New embed_azure_openai() supports embeddings from
Azure AI Foundry (#144).
New embed_snowflake() supports embeddings via the
Snowflake Cortex Embedding API (#148).
New mcp_serve_store() lets local MCP clients
(e.g. Codex CLI or Claude Code) search a RagnarStore
(#123).
ragnar_retrieve() (and the corresponding ellmer
retrieval tool) now accepts a vector of queries (#150).
New ragnar_store_atlas() visualizes store embeddings
(#124).
New ragnar_store_ingest() prepares documents in
parallel with mirai and inserts them into a store (#133).
embed_ollama() now defaults to the
embeddinggemma model (#121).
embed_openai() error messages are now surfaced to
the user (#112).
Embedding helpers now share a generalized request retry policy,
configurable via options(ragnar.embed.req_retry = ...)
(#138).
ragnar now requires mirai >= 2.5.1 (#139).
print() on a RagnarStore now shows the
store location (#116).
ragnar_retrieve_bm25() now orders results by
descending score (#122).
ragnar_retrieve() no longer returns duplicate rows
when called with multiple queries (#153).
The ellmer retrieval tool now omits score columns from its output (#130).
ragnar_store_inspect() now includes keyboard
shortcuts, a draggable divider, improved preview linkification, better
metadata display, and other UI tweaks (#120, #117, #118).
ragnar_find_links() works better with local HTML
files (#115).
ragnar_store_insert() and
ragnar_store_update() (v2 stores) now handle stores that
are missing store@schema metadata (#146).
read_as_markdown() once again fetches YouTube
transcripts and now supports youtube_transcript_formatter,
so you can add timestamps or links to the transcript output
(#149).
read_as_markdown() gains an origin
argument to customize the @origin recorded on returned
documents (#128).
read_as_markdown() now correctly reads plain-text
files with non-ASCII characters (#151).
Vignette heading levels were fixed (#129).
Added an example using sentence-transformers
embeddings (#131).
ragnar_register_tool_retrieve() now registers a tool
that will not return previously returned chunks, enabling the LLM to
perform deeper searches of a ragnar store with repeated tool calls
(#106).
Updates for ellmer v0.3.0 and duckdb v1.3.1 (#99)
Improved docs and error message in
ragnar_store_insert() (@mattwarkentin, #88)
ragnar_find_links() can now parse
sitemap.xml files. It also gains a validate
argument that sends a HEAD request to each link
and filters out broken links (#83).
ragnar_inspector() now renders all URLs as clickable
links in the chunk Markdown viewer, even if a URL is not a formal Markdown
link (#82).
Before running examples and tests, we now check whether ragnar can load DuckDB extensions. This fixes issues in environments where the pre-built DuckDB extension binaries are not compatible with the installed DuckDB version (#94).
Added embed_lm_studio() to use LM Studio as an
embedding provider (#100).
Fixed a bug causing ragnar_retrieve() to fail when
documents were inserted without an origin (#102).
read_as_markdown() now suppresses the “Couldn’t find ffmpeg or avconv”
warning raised when importing markitdown. The warning is only relevant
for users doing audio transcription
(#103).
Added embed_google_gemini() to use the Google Gemini API
as an embedding provider (#105).
ragnar_store_create() gains a new argument,
version, which defaults to 2. Store version 2 adds
support for chunk deoverlapping on retrieval and automatic chunk
augmentation with headings. To support these features, the internal
schema and ingestion requirements differ from version 1. See
markdown_chunk() and the new S7 classes
MarkdownDocument and MarkdownDocumentChunks.
Backwards compatibility is maintained with version = 1 (#58, #39,
#36).
ragnar_store_create() now supports Date and POSIXct
columns supplied in extra_cols.
ragnar_store_create() now supports remote MotherDuck
databases, specified by passing md:<dbname> as the
location argument (#50).
ragnar_retrieve() and friends gain a
filter argument, adding support for efficiently filtering
retrieval results.
ragnar_retrieve_bm25() gains arguments
b, k, and conjunctive
(#56).
ragnar_retrieve_vss() gains argument
query_vector, supporting workflows that preprocess the
query string before embedding.
The valid method choices for ragnar_retrieve_vss()
have been narrowed to ensure that an HNSW index scan
is used.
Passing a tbl(store) to
ragnar_retrieve() is deprecated.
New chunker markdown_chunk() with support for chunk
heading context generation, semantic boundary selection, overlapping
chunks, document segmentation, and more (#56).
New function embed_google_vertex() (@dfalbel, #49)
New function embed_databricks() (@atheriel, #45)
New function ragnar_chunks_view() for quickly
previewing chunks (#42)
ragnar_register_tool_retrieve() gains optional
name and title arguments to allow for more
descriptive tool registration. These values can also be set in
ragnar_store_create() (#43).
ragnar_read() and read_as_markdown()
now accept paths that begin with ~ (@topepo, #46, #48).
Changes to read_as_markdown() HTML conversion (#40,
#51):
html_extract_selectors and
html_zap_selectors provide a flexible way to control which
HTML page elements are included in the converted Markdown.