# AI Search

AI Search provides a powerful Search API backend implementation that leverages
vector databases and AI embeddings to enable semantic search capabilities in
Drupal. Unlike traditional keyword-based search, AI Search understands the
meaning and context of content, delivering highly relevant results based on
conceptual similarity rather than exact word matching.

For a full description of the module, visit the
[project page](https://www.drupal.org/project/ai_search).

Submit bug reports and feature suggestions, or track changes in the
[issue queue](https://www.drupal.org/project/issues/ai_search).


## Table of contents

- Requirements
- Recommended modules
- Installation
- Configuration
- Features
- Advanced usage
- Troubleshooting
- Maintainers


## Requirements

This module requires the following:

### Drupal modules
- **AI** (ai:ai ^2.0) - Provides embedding generation and vector database
  provider infrastructure
- **Search API** (search_api:search_api >=8.x-1.40) - Core search indexing and
  query framework

### PHP libraries
- **league/html-to-markdown** (^5.1) - Converts HTML content to markdown format
  for improved embedding quality

### External services
- **AI Provider** - Access to an AI service that supports embedding generation
  (e.g., OpenAI, or local models via Ollama)
- **Vector Database** - A vector database for storing and searching embeddings
  (e.g., Pinecone, Qdrant, Chroma, Elasticsearch with vector support, or
  MySQL with vector extension)

### System requirements
- Drupal 10.4+ or Drupal 11+
- PHP 8.1 or higher


## Recommended modules

- **league/commonmark** - Enables a preview of chunked content when configuring
  fields, and formats messages in the AI Chatbot module
- **Search API Solr** (search_api_solr ^4.0) - Enables hybrid search by
  combining traditional Solr search with AI-powered semantic search


## Installation

Download the AI Search module using Composer:

```bash
composer require drupal/ai_search
```

Then install it:

```bash
drush en ai_search -y
```

Or through the Drupal admin interface at `/admin/modules`.

**Important**: During installation, the module extends the Search API's
`search_api_item` database table with fields to track content chunking. If you
uninstall the module, these fields will be automatically removed.


## Configuration

### Step 1: Configure an AI Provider

Before creating a search index, you need to configure an AI provider that will
generate embeddings for your content.

1. Navigate to **Configuration > AI > Providers** (`/admin/config/ai/providers`)
2. Click **Add provider**
3. Select an AI service that supports embeddings (e.g., OpenAI, or local
   models via Ollama)
4. Configure the provider with your API credentials
5. Test the connection and save

### Step 2: Configure a Vector Database Provider

Configure where your embeddings will be stored:

1. Navigate to **Configuration > AI > VDB Providers**
   (`/admin/config/ai/vdb_providers`)
2. Click **Add VDB provider**
3. Select your vector database service
4. Configure connection settings (database name, collection/index name, etc.)
5. Choose the distance metric (cosine, euclidean, etc.)
6. Save the configuration

### Step 3: Create a Search API Server

1. Navigate to **Configuration > Search and metadata > Search API**
   (`/admin/config/search/search-api`)
2. Click **Add server**
3. Enter a server name
4. Select **AI Search** as the backend
5. Configure the server:
   - **Vector Database**: Select your configured VDB provider
   - **Embeddings Engine**: Choose the AI provider and model for generating
     embeddings
   - **Embedding Dimensions**: Set manually or leave blank for automatic
     detection
   - **Embedding Strategy**: Choose how content is processed:
     - **Contextual Chunks**: Splits content into chunks with contextual
       information (recommended)
     - **Average Pool**: Creates a single averaged embedding per item
   - **Strategy Configuration**: Set chunk size, overlap, and contextual
     content percentage
6. Save the server

### Step 4: Create a Search Index

1. Click **Add index** on the Search API page
2. Enter an index name and select the content types to index
3. Select your AI Search server
4. Configure the index tracker: the **AI Search Tracker** is selected
   automatically for AI Search servers
5. Save the index

### Step 5: Configure Index Fields

The AI Search module provides specialized field indexing options:

1. Click the **Fields** tab on your index
2. Add the fields you want to index
3. For each field, select an indexing option:
   - **Main Content**: Content to be chunked and embedded (e.g., body fields,
     full text)
   - **Contextual Content**: Added to every chunk for additional context
     (e.g., title, taxonomy terms)
   - **Filterable Attribute**: Stored as metadata for filtering results
     (e.g., content type, author, dates)
   - **Not Indexed**: Excluded from the vector database

4. Configure advanced options:
   - **Maximum field length**: Control metadata field length to optimize storage
   - **Exclude chunks from metadata**: Don't include individual chunk text in
     metadata
   - **Exclude title from chunk metadata**: Prevent title duplication if
     already in contextual content

5. Save the field configuration
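The field roles above determine what ends up in each vector database record. The following is a minimal sketch under stated assumptions: the record shape (`id`, `vector`, `metadata`), the `drupal_entity_id` and `content` metadata keys, and the `build_records()` helper are illustrative only and are not the module's actual storage format.

```php
<?php

/**
 * Builds vector database records for one item (hypothetical sketch).
 *
 * @param string $item_id Search API item ID, e.g. 'entity:node/1:en'.
 * @param array $chunks Embedded Main Content chunks, each ['text' => string,
 *   'vector' => float[]].
 * @param array $filterable Filterable Attribute values keyed by field name.
 * @param int $max_length Maximum length of string metadata values (bytes).
 * @param bool $exclude_chunk_text Whether to leave chunk text out of metadata.
 */
function build_records(string $item_id, array $chunks, array $filterable, int $max_length, bool $exclude_chunk_text): array {
  // Truncate string metadata; substr() counts bytes, mb_substr() would count
  // characters for multibyte text.
  $metadata = array_map(
    static fn ($value) => is_string($value) ? substr($value, 0, $max_length) : $value,
    $filterable
  );
  $records = [];
  foreach (array_values($chunks) as $delta => $chunk) {
    // Every chunk carries the same filterable metadata plus its parent ID.
    $record_metadata = $metadata + ['drupal_entity_id' => $item_id];
    if (!$exclude_chunk_text) {
      $record_metadata['content'] = substr($chunk['text'], 0, $max_length);
    }
    $records[] = [
      'id' => $item_id . ':' . $delta,
      'vector' => $chunk['vector'],
      'metadata' => $record_metadata,
    ];
  }
  return $records;
}
```

Storing filterable values on every chunk is what lets the vector database filter before ranking, at the cost of some duplicated metadata per item.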

### Step 6: Index Your Content

Index content using one of these methods:

```bash
# Index all enabled indexes
drush search-api:index

# Index a specific index
drush search-api:index [index_id]

# Clear and re-index
drush search-api:clear [index_id]
drush search-api:index [index_id]
```

Or use the admin interface at **Configuration > Search and metadata > Search
API** and click **Index now** on your index.

**Note**: Indexing with AI Search may take longer than traditional search due
to embedding generation and vector database operations. Monitor progress in the
index status display.


## Features

### Semantic Search
AI Search understands meaning and context, not just keywords. It can find
content related to a query even when exact words don't match.

### Content Chunking
Large content is automatically split into manageable chunks, ensuring each
piece is within the token limits of your embedding model while maintaining
context.

### Contextual Enrichment
Each chunk can include repeated contextual information (title, categories,
etc.) to improve search relevance and provide context for standalone chunks.
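Contextual enrichment and the **Contextual Content Percentage** setting can be pictured with a minimal sketch. It assumes whitespace-separated words approximate tokens and, for brevity, splits main content into non-overlapping pieces; `enrich_chunks()` is a hypothetical helper, not the module's implementation.

```php
<?php

/**
 * Prepends contextual content to each main-content chunk (hypothetical sketch).
 *
 * $context_percentage is the share of each chunk's budget reserved for the
 * repeated context; the remainder is used for main content. Words stand in
 * for tokens.
 *
 * @return string[]
 */
function enrich_chunks(string $context, string $main, int $chunk_size, int $context_percentage): array {
  $context_budget = intdiv($chunk_size * $context_percentage, 100);
  $main_budget = $chunk_size - $context_budget;
  if ($main_budget <= 0) {
    throw new InvalidArgumentException('Contextual content leaves no room for main content.');
  }
  $words = static fn (string $s): array => preg_split('/\s+/', trim($s), -1, PREG_SPLIT_NO_EMPTY);
  // Truncate the context so it never exceeds its reserved share.
  $prefix = implode(' ', array_slice($words($context), 0, $context_budget));
  $chunks = [];
  foreach (array_chunk($words($main), $main_budget) as $piece) {
    $chunks[] = trim($prefix . "\n\n" . implode(' ', $piece));
  }
  return $chunks;
}

$chunks = enrich_chunks('Title: Pricing', 'Plans start at ten euros per month', 6, 50);
// $chunks[0] === "Title: Pricing\n\nPlans start at"
```

A higher percentage gives every chunk more shared context but leaves less room for the unique text that distinguishes one chunk from another.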

### Hybrid Search
Combine traditional keyword search (Solr or Database) with AI semantic search
using the **Boost by AI Search** processor:

1. Add the processor to your traditional search index
2. Configure the AI Search index to query
3. Set minimum relevance score threshold
4. AI Search results are prepended to boost highly relevant semantic matches
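The boosting steps can be sketched as a merge of two ranked lists. This is a simplified model under assumptions: both inputs map item IDs to scores and are sorted best-first, semantic matches at or above the threshold move to the top (including matches the keyword search missed), and duplicates are dropped. The processor's actual merging logic may differ, and `boost_with_ai()` is hypothetical.

```php
<?php

/**
 * Prepends high-scoring semantic results to keyword results (sketch).
 *
 * @param array<string, float> $keyword_results Keyword hits, best first.
 * @param array<string, float> $ai_results Semantic hits, best first.
 *
 * @return string[] Item IDs in final display order.
 */
function boost_with_ai(array $keyword_results, array $ai_results, float $min_score): array {
  // Semantic matches that clear the threshold go first, in their own order.
  $boosted = array_keys(array_filter($ai_results, static fn (float $score): bool => $score >= $min_score));
  // Remaining keyword matches follow, without repeating boosted items.
  $rest = array_diff(array_keys($keyword_results), $boosted);
  return array_merge($boosted, $rest);
}

$order = boost_with_ai(
  ['entity:node/1:en' => 12.0, 'entity:node/2:en' => 9.5, 'entity:node/3:en' => 4.0],
  ['entity:node/3:en' => 0.91, 'entity:node/4:en' => 0.85, 'entity:node/5:en' => 0.40],
  0.8
);
// ['entity:node/3:en', 'entity:node/4:en', 'entity:node/1:en', 'entity:node/2:en']
```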

### Access Control
Entity-level access checking ensures users only see content they have
permission to view, with iterative result fetching for performance.
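Iterative fetching with access checks can be illustrated generically: fetch a batch, drop what the user cannot view, and fetch again until the page is full or results run out. In this sketch `$fetch` and `$can_view` are hypothetical callables standing in for the vector database query and Drupal's entity access check; it is not the module's code.

```php
<?php

/**
 * Collects up to $limit accessible results by fetching batches (sketch).
 *
 * @param callable $fetch Hypothetical ($offset, $count) => list of IDs.
 * @param callable $can_view Hypothetical ($id) => bool access check.
 * @param int $max_rounds Safety cap on the number of backend queries.
 *
 * @return list<mixed>
 */
function fetch_accessible(callable $fetch, callable $can_view, int $limit, int $batch_size, int $max_rounds = 10): array {
  $accessible = [];
  $offset = 0;
  for ($round = 0; $round < $max_rounds && count($accessible) < $limit; $round++) {
    $batch = $fetch($offset, $batch_size);
    foreach ($batch as $id) {
      if ($can_view($id)) {
        $accessible[] = $id;
        if (count($accessible) === $limit) {
          break;
        }
      }
    }
    // A short batch means the backend has no more results.
    if (count($batch) < $batch_size) {
      break;
    }
    $offset += $batch_size;
  }
  return $accessible;
}
```

Fetching in batches avoids loading every candidate up front, while the round cap keeps a mostly inaccessible result set from triggering unbounded queries.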

### RAG (Retrieval Augmented Generation) Integration
Use your search index with AI assistants:

- **RAG Action**: Enables AI assistants to search your content and provide
  informed responses
- **RAG Tool**: Allows AI models to use function calling to query your indexes
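At its core, RAG retrieves relevant chunks and passes them to the model as context. A minimal sketch of that prompt-building step follows; the `build_rag_prompt()` helper, the chunk array shape, and the prompt wording are assumptions for illustration, not how the RAG Action or RAG Tool format their prompts.

```php
<?php

/**
 * Builds a prompt that grounds an answer in retrieved chunks (sketch).
 *
 * @param array $chunks Hypothetical list of ['title' => string,
 *   'text' => string], e.g. top results from an AI Search index.
 */
function build_rag_prompt(string $question, array $chunks): string {
  $context = [];
  foreach (array_values($chunks) as $i => $chunk) {
    // Number each source so the model can refer back to it.
    $context[] = sprintf("[%d] %s\n%s", $i + 1, $chunk['title'], $chunk['text']);
  }
  return "Answer the question using only the sources below.\n\n"
    . implode("\n\n", $context)
    . "\n\nQuestion: " . $question;
}
```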

### Flexible Result Filtering
- Filter by minimum score threshold
- Filter using VDB metadata attributes
- Entity-level result grouping (when supported by the VDB provider)
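Because **Contextual Chunks** stores several vectors per item, one query can return multiple chunks of the same entity. When the VDB provider cannot group results itself, one way to collapse them is to keep each entity's best-scoring chunk. The hit array shape and `group_by_entity()` below are hypothetical.

```php
<?php

/**
 * Collapses chunk-level hits to one score per entity (hypothetical sketch).
 *
 * @param array $hits Each hit is ['entity_id' => string, 'score' => float].
 *
 * @return array<string, float> Entity IDs mapped to their best chunk score,
 *   sorted best first.
 */
function group_by_entity(array $hits): array {
  $best = [];
  foreach ($hits as $hit) {
    $id = $hit['entity_id'];
    // Keep the highest score seen so far for this entity.
    $best[$id] = max($best[$id] ?? $hit['score'], $hit['score']);
  }
  arsort($best);
  return $best;
}
```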

### Multiple Embedding Strategies

**Contextual Chunks (Recommended)**:
- Splits content into overlapping chunks
- Enriches each chunk with contextual information
- Creates multiple vectors per content item
- Best for comprehensive content retrieval

**Average Pool**:
- Generates a single averaged embedding
- More efficient for simple use cases
- Single vector per content item
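Average pooling is an element-wise mean over vectors of equal dimension. The sketch below assumes the inputs are per-chunk embeddings of one item; `average_pool()` is a hypothetical helper, and whether the module normalizes the pooled vector afterward is not covered here.

```php
<?php

/**
 * Averages embedding vectors element-wise (mean pooling).
 *
 * @param list<list<float>> $vectors Vectors of equal dimension.
 *
 * @return list<float>
 */
function average_pool(array $vectors): array {
  if ($vectors === []) {
    throw new InvalidArgumentException('At least one vector is required.');
  }
  $dimensions = count($vectors[0]);
  $sum = array_fill(0, $dimensions, 0.0);
  foreach ($vectors as $vector) {
    if (count($vector) !== $dimensions) {
      throw new InvalidArgumentException('All vectors must have the same dimension.');
    }
    foreach ($vector as $i => $value) {
      $sum[$i] += $value;
    }
  }
  // Divide each summed component by the number of vectors.
  return array_map(static fn (float $total): float => $total / count($vectors), $sum);
}

$pooled = average_pool([[1.0, 2.0], [3.0, 4.0]]); // [2.0, 3.0]
```

Averaging blends all chunks into one point, which is why a single pooled vector can under-represent a passage that only one part of a long item discusses.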


## Advanced usage

### Customizing Chunk Size and Overlap

When configuring your server's embedding strategy, adjust:

- **Chunk Size**: Number of tokens per chunk (default varies by model)
- **Chunk Overlap**: Number of overlapping tokens between chunks
- **Contextual Content Percentage**: Portion of chunk reserved for repeated
  context

Smaller chunks provide more precise results but require more storage. Larger
chunks provide more context but may dilute relevance.
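The interaction between chunk size and overlap can be shown with a minimal sketch. It treats whitespace-separated words as tokens, which is a simplification (embedding models use subword tokenizers), and `chunk_tokens()` is a hypothetical helper, not the module's chunker.

```php
<?php

/**
 * Splits text into overlapping chunks (hypothetical helper, not module API).
 *
 * @return string[]
 */
function chunk_tokens(string $text, int $chunk_size, int $overlap): array {
  if ($chunk_size <= 0 || $overlap < 0 || $overlap >= $chunk_size) {
    throw new InvalidArgumentException('Require 0 <= overlap < chunk_size.');
  }
  $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
  $chunks = [];
  // Each new chunk starts (chunk_size - overlap) tokens after the previous one.
  $step = $chunk_size - $overlap;
  for ($start = 0; $start < count($tokens); $start += $step) {
    $chunks[] = implode(' ', array_slice($tokens, $start, $chunk_size));
    // Stop once the current chunk reaches the end of the text.
    if ($start + $chunk_size >= count($tokens)) {
      break;
    }
  }
  return $chunks;
}

$chunks = chunk_tokens('one two three four five six seven', 3, 1);
// ['one two three', 'three four five', 'five six seven']
```

The shared boundary tokens ("three", "five") are the overlap: they keep a sentence that straddles two chunks retrievable from either one, at the cost of storing a few extra tokens per chunk.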

### Accessing Raw Embedding Vectors

Enable **Include raw embedding vector** in your server configuration to retrieve
the actual vector arrays in search results. Useful for custom similarity
calculations or debugging.
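With raw vectors available, similarity can be computed directly in PHP. A minimal sketch of two common metrics follows, assuming plain arrays of floats with matching keys; the helper names are hypothetical, and scores returned by your VDB may be transformed differently depending on the provider and the distance metric configured for it.

```php
<?php

/**
 * Cosine similarity between two vectors (1.0 = same direction).
 */
function cosine_similarity(array $a, array $b): float {
  if ($a === [] || count($a) !== count($b)) {
    throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
  }
  $dot = $norm_a = $norm_b = 0.0;
  foreach ($a as $i => $value) {
    $dot += $value * $b[$i];
    $norm_a += $value ** 2;
    $norm_b += $b[$i] ** 2;
  }
  if ($norm_a === 0.0 || $norm_b === 0.0) {
    throw new InvalidArgumentException('Cosine similarity is undefined for zero vectors.');
  }
  return $dot / (sqrt($norm_a) * sqrt($norm_b));
}

/**
 * Euclidean distance between two vectors (0.0 = identical).
 */
function euclidean_distance(array $a, array $b): float {
  if (count($a) !== count($b)) {
    throw new InvalidArgumentException('Vectors must be of equal length.');
  }
  $sum = 0.0;
  foreach ($a as $i => $value) {
    $sum += ($value - $b[$i]) ** 2;
  }
  return sqrt($sum);
}
```

Note the opposite directions: higher cosine similarity means more alike, while lower Euclidean distance means more alike.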

### Altering Search Results

Implement `hook_ai_search_boost_results_alter()` to customize result ranking:

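```php
/**
 * Implements hook_ai_search_boost_results_alter().
 */
function mymodule_ai_search_boost_results_alter(array &$results, $keywords, $ai_index, $target_index) {
  // Objects are passed by handle, so no reference is needed to change scores.
  foreach ($results as $result) {
    // Boost articles. getField() returns NULL if 'type' is not indexed.
    $type_field = $result->getField('type');
    if ($type_field && in_array('article', $type_field->getValues(), TRUE)) {
      $result->setScore($result->getScore() * 1.5);
    }
  }
  // Re-sort so the adjusted scores are reflected in the result order.
  uasort($results, fn ($a, $b) => $b->getScore() <=> $a->getScore());
}
```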

### Using with Views

Create Views using your Search API index as the data source. AI Search works
seamlessly with Views filters, sorts, and displays.

### Metadata Filtering

Configure fields as **Filterable Attributes** to enable VDB-level filtering.
This is more efficient than post-query filtering for large result sets.

### Score Thresholds

Add the **Score Threshold** processor to your index to automatically filter
out low-relevance results. Configure the minimum score (0.0 to 1.0) in the
processor settings.

Override the score threshold via the Search API query, e.g.:

```php
/** @var \Drupal\search_api\Query\QueryInterface $query */
$query->setOption('ai_search_score_threshold_override', 0.8);
```
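The threshold logic amounts to a filter on scores. The sketch below assumes results are item IDs mapped to scores in the 0.0 to 1.0 range, that the boundary is inclusive, and that an override value (like one set through `ai_search_score_threshold_override`) replaces the configured minimum; `apply_score_threshold()` is hypothetical.

```php
<?php

/**
 * Drops results below a minimum score (sketch of a score threshold).
 *
 * @param array<string, float> $results Item IDs mapped to scores.
 * @param float $configured Threshold from the processor settings.
 * @param float|null $override Per-query override; replaces $configured if set.
 *
 * @return array<string, float>
 */
function apply_score_threshold(array $results, float $configured, ?float $override = NULL): array {
  $threshold = $override ?? $configured;
  if ($threshold < 0.0 || $threshold > 1.0) {
    throw new InvalidArgumentException('Threshold must be between 0.0 and 1.0.');
  }
  return array_filter($results, static fn (float $score): bool => $score >= $threshold);
}
```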

### Similarity search

A Search API query can take a `vector_input` option instead of keywords, which
lets you find entities similar to a given one. Use a module like
[AI Related Content](https://drupal.org/project/ai_related_content), or do this
programmatically, for example:

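```php
/** @var \Drupal\search_api\IndexInterface $index */

// Get the source vector for a node. This requires "Include raw embedding
// vector" to be enabled on the AI Search server.
$query = $index->query();
$query->addCondition('drupal_entity_id', 'entity:node/123:en');
$items = $query->execute()->getResultItems();
// Result items are keyed by item ID, so take the first one with reset().
$first_item = reset($items);
$source_vectors = $first_item ? $first_item->getExtraData('raw_vector') : NULL;

// Find similar nodes.
if ($source_vectors) {
  $query = $index->query();
  $query->setOption('vector_input', $source_vectors);
  $results = $query->execute();
}
```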

## Troubleshooting

### Indexing is slow
- Embedding generation requires API calls to your AI provider
- Consider indexing during off-peak hours
- Check your AI provider's rate limits
- Ensure your VDB provider has adequate performance capacity

### Results not appearing
- Verify entity access permissions
- Check the minimum score threshold isn't too high
- Ensure content was successfully indexed (check Search API status)
- Review VDB provider logs for errors

### Out of memory errors
- Reduce chunk size in embedding strategy configuration
- Limit the number of items indexed at once
- Increase PHP memory limit if necessary

### Unexpected or poor quality results
- Verify field configuration (Main Content vs Contextual Content)
- Review your embedding model choice (larger models often perform better)
- Adjust chunk size and overlap settings
- Ensure HTML content is being properly converted to markdown
- Consider the quality and completeness of your indexed content

### Chunk tracking issues
- The AI Search Tracker must be enabled on AI Search indexes
- Run database updates (`drush updatedb`) after upgrading from an earlier
  version
- Clear and re-index if chunk counts appear incorrect

### Integration with traditional search
- Ensure the **Boost by AI Search** processor is properly configured
- Verify the AI Search index referenced exists and is indexed
- Check processor weight/order in the processor configuration


## Maintainers

- Scott Euser - [scott_euser](https://www.drupal.org/u/scott_euser)
- David Galeano - [gxleano](https://www.drupal.org/u/gxleano)
- Marcus Johansson - [marcus_johansson](https://www.drupal.org/u/marcus_johansson)
- Andrew Belcher - [andrewbelcher](https://www.drupal.org/u/andrewbelcher)
- Kevin Quillen - [kevinquillen](https://www.drupal.org/u/kevinquillen)
- Michal Gow - [seogow](https://www.drupal.org/u/seogow)
- James Abrahams - [yautja_cetanu](https://www.drupal.org/u/yautja_cetanu)
- AI Module Team - [AI Project](https://www.drupal.org/project/ai)

This module is part of the AI module suite. For support, documentation, and
updates, visit the [AI project page](https://www.drupal.org/project/ai).

For comprehensive documentation, see the
[AI Search module guide](https://project.pages.drupalcode.org/ai/latest/modules/ai_search/).
