Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing
Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
Proposes TACHIOM, a multivector retrieval system that uses Token-Aware Clustering (TAC) for accurate and scalable token clustering. By combining hierarchical indexing with a MaxSim-optimized Product Quantization layout, TACHIOM achieves up to 247x faster clustering than standard k-means and delivers up to 9.8x faster retrieval compared to state-of-the-art systems.
Sparton: Fast and Memory-Efficient Triton Kernel for Learned Sparse Retrieval
Thong Nguyen, Cosimo Rulli, Franco Maria Nardini, Rossano Venturini, Andrew Yates
Sparton is a Triton kernel for the Language Model head in Learned Sparse Retrieval models that fuses tiled matrix multiplication, ReLU, log1p, and max-reduction into a single GPU kernel, achieving up to 4.8x speedup and an order-of-magnitude reduction in peak memory usage compared to PyTorch baselines.
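For orientation, the sequence of operations named in the Sparton summary can be written as a slow, unfused reference. This is a minimal pure-Python sketch of a SPLADE-style head, not the Sparton kernel itself. The function name `lsr_head_reference`, the argument shapes, and the assumption that the max-reduction runs over sequence positions are illustrative and do not come from the paper.

```python
import math


def lsr_head_reference(hidden, vocab_weights):
    """Reference LSR head: max over tokens of log1p(relu(hidden @ W^T)).

    hidden:        list of token vectors, shape (seq_len, dim)
    vocab_weights: list of vocabulary rows, shape (vocab_size, dim)
    Returns one non-negative weight per vocabulary term.
    """
    term_weights = [0.0] * len(vocab_weights)
    for token_vec in hidden:
        for v, w_row in enumerate(vocab_weights):
            # Matrix multiplication, one (token, vocabulary term) entry at a time.
            logit = sum(h * w for h, w in zip(token_vec, w_row))
            # ReLU followed by log1p.
            activated = math.log1p(max(logit, 0.0))
            # Running max-reduction over the sequence (assumed reduction axis).
            if activated > term_weights[v]:
                term_weights[v] = activated
    return term_weights


# Toy input: 2 tokens, dimension 2, vocabulary of 3 terms.
weights = lsr_head_reference(
    [[1.0, 0.0], [0.0, 2.0]],
    [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
)
```

A straightforward tensor implementation would typically materialize the full (seq_len x vocab_size) logit matrix before reducing it; keeping only a running maximum, as above, hints at why a fused kernel can lower peak memory.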
Forward Index Compression for Learned Sparse Retrieval
Sebastian Bruch, Martino Fontana, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
Introduces DotVByte, a compression technique optimized for inner product computation that achieves significant space savings while maintaining sparse retrieval efficiency.
Multivector Reranking in the Era of Strong First-Stage Retrievers
Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
Demonstrates that replacing token-level gatherer phases with learned sparse retrieval yields a speedup of more than 24x over state-of-the-art multivector retrieval systems.
Under review at the Journal of the ACM | Sparse Retrieval | Sketching | Inverted Index
Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
Introduces a theoretically grounded sketching algorithm that reduces effective dimensionality while preserving inner product-induced ranks, and shows its link to the Seismic data structure.
A Rust-based ANN research library combining state-of-the-art indexing for dense and sparse vectors with vector quantization, designed for easy prototyping.
Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini, Leonardo Venuta
Compares graph-based and inverted index-based sparse retrieval methods on the 138M-passage MS MARCO v2 dataset, uncovering scalability challenges and efficiency trade-offs.
Effective Inference-Free Retrieval for Learned Sparse Representations
Franco Maria Nardini, Thong Nguyen, Cosimo Rulli, Rossano Venturini, Andrew Yates
Proposes Li-LSR, which replaces the query encoder with a fast lookup table by learning a static relevance score per token at training time, achieving state-of-the-art inference-free sparse retrieval and surpassing SPLADE-v3-Doc by 1 MRR@10 point on MS MARCO and 1.8 nDCG@10 points on BEIR.
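The lookup-table idea in the Li-LSR summary can be illustrated with a small sketch of inference-free query encoding. This is a hedged illustration, not the paper's implementation. The names `encode_query`, `score`, and `token_scores` are hypothetical, and the handling of repeated or unseen tokens shown here is an assumption the source does not specify.

```python
def encode_query(token_ids, token_scores):
    """Build a sparse query vector by table lookup, with no neural encoder.

    token_ids:    token ids produced by a tokenizer (assumed given)
    token_scores: dict mapping token id -> static score learned at training time
    """
    query = {}
    for tid in token_ids:
        # Assumption: a repeated token keeps one entry; unseen tokens score 0.
        query[tid] = token_scores.get(tid, 0.0)
    return query


def score(query, doc):
    """Inner product between two sparse vectors stored as dicts (term id -> weight)."""
    return sum(w * doc.get(tid, 0.0) for tid, w in query.items())


# Toy example with made-up scores.
token_scores = {1: 0.5, 2: 2.0}
query = encode_query([1, 2, 2, 3], token_scores)
relevance = score(query, {2: 1.5, 3: 4.0})
```

Because the query side reduces to dictionary lookups, query-time cost is dominated by tokenization and the retrieval step itself rather than by a model forward pass.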
Pairing Clustered Inverted Indexes with k-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
Enhances Seismic with k-NN graph integration and a clustering hypothesis, achieving a nearly 2.2x speedup over standard Seismic while maintaining accuracy.
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
Presents Seismic, a novel inverted index organization that enables fast retrieval over learned sparse embeddings, competitive with dense retrieval on BigANN benchmarks.