Semantic Search with Vector Embeddings: Implementation Using FAISS and Annoy – Neuronix Technology LLC

Semantic search uses vector embeddings to retrieve information based on the meaning of queries and documents, rather than simple keyword matching. Tools like FAISS (Facebook AI Similarity Search) and Annoy (Approximate Nearest Neighbors Oh Yeah) are popular libraries for efficient similarity search in high-dimensional embedding spaces.

This guide explores the implementation of semantic search using both FAISS and Annoy.

Why Use Semantic Search?

Improved Relevance:

Retrieves documents or items based on meaning, not just keywords.

Scalability:

Handles millions of vectors with approximate nearest neighbor (ANN) algorithms.

Versatility:

Applicable to diverse use cases like text retrieval, product recommendations, and image search.

Tools Overview

Tool	Description	Best For
FAISS	Optimized for large-scale similarity search on CPU, with optional GPU acceleration.	Large datasets, especially when a GPU is available.
Annoy	Uses random projection trees for fast approximate nearest neighbor search.	Smaller datasets or scenarios requiring fast setup and lightweight indexing.

Pipeline Overview

Generate Embeddings:

Convert text or data into dense vector representations using models like Sentence Transformers or OpenAI Embeddings.

Build an Index:

Use FAISS or Annoy to create an index for the embeddings.

Perform Search:

Search the index to retrieve the nearest vectors to a query embedding.
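
To make the search step concrete, here is a minimal sketch of what an index computes conceptually: an exhaustive nearest-neighbor scan. It uses plain Python and hand-written 3-dimensional vectors as stand-ins for real embeddings. The names brute_force_search, toy_vectors, and toy_query are hypothetical and are not part of FAISS or Annoy.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query, vectors, k):
    """Return (index, distance) pairs for the k vectors closest to query."""
    scored = [(i, l2_distance(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# Hypothetical toy "embeddings"; real models produce hundreds of dimensions.
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.1, 0.9],
]
toy_query = [0.8, 0.2, 0.0]

results = brute_force_search(toy_query, toy_vectors, k=2)
for rank, (idx, dist) in enumerate(results, start=1):
    print(f"{rank}: vector {idx} (distance {dist:.3f})")
```

This scan touches every vector for every query, so its cost grows linearly with the collection size. Libraries like FAISS and Annoy exist to make the same lookup fast at scale.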

1. Semantic Search with FAISS

Installation

Install FAISS:

pip install faiss-cpu
# For GPU support:
# pip install faiss-gpu

Code Implementation

a. Generate Embeddings

Use a pre-trained model to generate embeddings (e.g., Sentence Transformers):

from sentence_transformers import SentenceTransformer

# Load pre-trained model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Sample data
documents = [
    "The Eiffel Tower is located in Paris.",
    "The Colosseum is in Rome.",
    "The Great Wall of China is in Beijing."
]

# Generate embeddings
embeddings = model.encode(documents)

b. Build FAISS Index

Create and populate a FAISS index:

import faiss
import numpy as np

# Convert embeddings to a float32 NumPy array (FAISS expects float32)
embeddings = np.array(embeddings).astype('float32')
embedding_dim = embeddings.shape[1]

# Initialize FAISS index
index = faiss.IndexFlatL2(embedding_dim)  # Exact (brute-force) search; reports squared L2 distances

# Add embeddings to the index
index.add(embeddings)
print(f"Number of vectors in the index: {index.ntotal}")

c. Perform Search

Query the index:

# Query text
query = "Where is the Eiffel Tower?"
query_embedding = model.encode([query]).astype('float32')

# Search for the nearest neighbors
k = 2  # Number of results to retrieve
distances, indices = index.search(query_embedding, k)

# Print results
print("Top results:")
for i, idx in enumerate(indices[0]):
    print(f"{i+1}: {documents[idx]} (Squared L2 distance: {distances[0][i]:.2f})")
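
One detail worth guarding against: when k exceeds the number of vectors in the index, FAISS fills the unused result slots with the index -1. The sketch below pairs results with documents while skipping those slots. The helper name collect_hits is hypothetical, and the example rows are made-up values shaped like one row of the search output, not real results.

```python
def collect_hits(documents, distances_row, indices_row):
    """Pair each valid result index with its document and distance."""
    hits = []
    for idx, dist in zip(indices_row, distances_row):
        if idx < 0:  # padded slot: no neighbor was found for this position
            continue
        hits.append((documents[int(idx)], float(dist)))
    return hits

docs = ["doc A", "doc B"]

# Made-up values imitating a k=3 search over a 2-vector index.
# The third slot is padding, so its distance is not meaningful.
example_indices = [1, 0, -1]
example_distances = [0.25, 1.5, float("inf")]

hits = collect_hits(docs, example_distances, example_indices)
for rank, (text, dist) in enumerate(hits, start=1):
    print(f"{rank}: {text} ({dist:.2f})")
```

With the search code above, the equivalent call would pass documents, distances[0], and indices[0].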

2. Semantic Search with Annoy

Installation

Install Annoy:

pip install annoy

Code Implementation

a. Generate Embeddings

(Use the same embedding generation as above.)

b. Build Annoy Index

Create and populate an Annoy index:

from annoy import AnnoyIndex

# Initialize Annoy index
embedding_dim = embeddings.shape[1]
index = AnnoyIndex(embedding_dim, metric='angular')  # Angular distance: sqrt(2 * (1 - cosine similarity))

# Add embeddings to the index
for i, embedding in enumerate(embeddings):
    index.add_item(i, embedding)

# Build the index
n_trees = 10  # Number of trees (more trees = higher accuracy, but a larger index and slower build)
index.build(n_trees)
index.save('annoy_index.ann')

c. Perform Search

Query the Annoy index:

# Load the saved index into a fresh AnnoyIndex (e.g., in a separate process)
index = AnnoyIndex(embedding_dim, metric='angular')
index.load('annoy_index.ann')

# Query text
query = "Where is the Eiffel Tower?"
query_embedding = model.encode([query])[0]

# Search for the nearest neighbors
k = 2  # Number of results to retrieve
indices, distances = index.get_nns_by_vector(query_embedding, k, include_distances=True)

# Print results
print("Top results:")
for i, idx in enumerate(indices):
    print(f"{i+1}: {documents[idx]} (Distance: {distances[i]:.2f})")
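
The distances Annoy reports for the angular metric are not cosine similarities. Annoy defines angular distance as the Euclidean distance between the normalized vectors, that is, sqrt(2 * (1 - cos)). A minimal sketch of the conversion follows. The two function names are hypothetical helpers, not part of Annoy.

```python
import math

def angular_distance_to_cosine(distance):
    """Invert d = sqrt(2 * (1 - cos)) to recover the cosine similarity."""
    return 1.0 - (distance ** 2) / 2.0

def cosine_to_angular_distance(cosine):
    """Forward direction, included only to check the round trip."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine)))

for cosine in (1.0, 0.5, -1.0):
    d = cosine_to_angular_distance(cosine)
    recovered = angular_distance_to_cosine(d)
    print(f"cosine={cosine:+.2f} -> angular distance={d:.4f} -> recovered={recovered:+.2f}")
```

An angular distance of 0 therefore means the vectors point in the same direction, and the maximum value of 2 means they point in opposite directions.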

Comparison of FAISS and Annoy

Aspect	FAISS	Annoy
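Accuracy	Exact with flat indexes (e.g., IndexFlatL2); approximate index types trade some recall for speed.	Approximate; tunable with `n_trees` and `search_k`.
Speed	Fast on large datasets, particularly with GPU support.	Fast on small to mid-sized datasets.
Index Size	Flat indexes store every vector in full; compressed index types (e.g., product quantization) reduce memory use.	Grows with `n_trees`; the index file is memory-mapped from disk.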
Ease of Use	Slightly steeper learning curve.	Easy to implement and deploy.
Best Use Case	Large-scale, high-performance applications.	Lightweight, quick setup for smaller projects.

Tips for Effective Semantic Search

Choose the Right Embedding Model:

Use pre-trained models like Sentence Transformers (all-MiniLM-L6-v2) for general-purpose tasks.
Fine-tune models for domain-specific data.

Optimize Index Parameters:

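For FAISS: Experiment with clustering-based (inverted file) indexes such as IndexIVFFlat, which speed up search by scanning only the clusters closest to the query.
For Annoy: Increase n_trees to improve accuracy, at the cost of a larger index and a longer build time.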
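
To illustrate the idea behind clustering-based indexes, here is a toy sketch of an inverted-file search in plain Python. It is not FAISS's implementation. The centroids are fixed by hand for the example, whereas FAISS learns them during a training step. All function names are hypothetical, and nprobe here simply mirrors the FAISS parameter that controls how many clusters are scanned.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_lists(vectors, centroids):
    """Assign each vector id to the list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_l2(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe_order = sorted(range(len(centroids)),
                         key=lambda c: squared_l2(query, centroids[c]))
    candidates = [vec_id for c in probe_order[:nprobe] for vec_id in lists[c]]
    candidates.sort(key=lambda vec_id: squared_l2(query, vectors[vec_id]))
    return candidates[:k]

# Toy 2-D data forming two well-separated clusters.
vectors = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
           [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centroids = [[0.1, 0.1], [5.0, 5.0]]  # hand-picked for illustration

lists = build_inverted_lists(vectors, centroids)
nearest_ids = ivf_search([4.8, 5.0], vectors, centroids, lists, k=2, nprobe=1)
print(lists)
print(nearest_ids)
```

With nprobe=1 only half of the toy vectors are compared against the query. Raising nprobe scans more clusters, which recovers accuracy at the cost of speed.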

Normalize Embeddings:

Normalize vectors to unit length, especially when using cosine similarity: for unit vectors, the inner product equals the cosine similarity.
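
A minimal sketch of L2 normalization in plain Python shows why this matters. The helper names l2_normalize and dot are hypothetical.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # avoid dividing by zero
    return [x / norm for x in vector]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

a_unit, b_unit = l2_normalize(a), l2_normalize(b)
print(dot(a, b))            # raw inner product depends on vector length
print(dot(a_unit, b_unit))  # after normalization it equals the cosine similarity
```

In FAISS, faiss.normalize_L2 normalizes a float32 array in place, and an inner-product index such as IndexFlatIP built over normalized vectors then ranks results by cosine similarity.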

Handle Large Datasets:

For FAISS, use GPU support to scale to millions of vectors.
For Annoy, shard the index if memory becomes a bottleneck.
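
When an index is sharded, each shard is queried separately and the per-shard results are merged into one global top-k list. The sketch below assumes each shard returns (distance, doc_id) pairs sorted by ascending distance, and that shard-local ids have already been mapped to globally unique ids. The function name and the sample values are hypothetical.

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, k):
    """Merge sorted per-shard (distance, doc_id) lists into a global top-k."""
    # heapq.merge lazily interleaves already-sorted inputs in ascending order.
    return list(islice(heapq.merge(*shard_results), k))

# Made-up results from two shards, each sorted by distance.
shard_a = [(0.12, 4), (0.40, 9)]
shard_b = [(0.05, 1003), (0.33, 1010)]

top_hits = merge_shard_results([shard_a, shard_b], k=3)
print(top_hits)
```

Requesting k results from every shard guarantees that the merged list contains the true top k across all shards, since no shard can contribute more than k of them.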

Combine with Metadata:

Enhance search results by combining vector embeddings with metadata filters (e.g., tags, categories).
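
One simple way to combine the two is post-filtering: retrieve more candidates than needed from the vector index, then keep only those whose metadata matches. The sketch below assumes hits arrive as (doc_id, distance) pairs sorted by ascending distance. The function name, tags, and distances are hypothetical.

```python
def filter_hits(hits, metadata, required_tag, k):
    """Keep the first k hits whose metadata contains required_tag."""
    kept = []
    for doc_id, distance in hits:
        if required_tag in metadata.get(doc_id, set()):
            kept.append((doc_id, distance))
            if len(kept) == k:
                break
    return kept

# Hypothetical tags for the three sample documents.
metadata = {
    0: {"landmark", "europe"},
    1: {"landmark", "europe"},
    2: {"landmark", "asia"},
}

# Made-up candidate list, over-fetched from the vector index.
candidate_hits = [(0, 0.31), (2, 0.92), (1, 1.10)]

filtered = filter_hits(candidate_hits, metadata, "europe", k=2)
print(filtered)
```

Post-filtering can return fewer than k results when few candidates match, so the number of candidates fetched from the index usually needs to be larger than k.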

Conclusion

FAISS is ideal for large-scale, high-accuracy applications, especially when GPU acceleration is available.
Annoy is lightweight and well-suited for smaller datasets or scenarios requiring quick setup.