# AI Search Block

An AI-powered search block for Drupal that gives conversational answers to user queries using Retrieval-Augmented Generation (RAG).

## Overview

This module combines vector database search with Large Language Models (LLMs) to deliver contextual, AI-generated responses based on your site's content. Users can ask natural language questions and receive streaming responses powered by content retrieved from your Drupal site.

## Features

- **Natural Language Search**: Users can ask questions in plain language
- **Streaming Responses**: AI-generated answers are streamed to the page as they are produced
- **Vector Database Integration**: Uses Search API with vector embeddings for semantic search
- **Flexible Content Rendering**: Support for both chunked and full entity render modes
- **Customizable Prompts**: Full control over system prompts and response formatting
- **Access Control**: Optional access checking for search results
- **Extensible Hooks**: Multiple alter hooks for customizing HTML, markdown, and prompts
- **Logging Support**: Optional logging module for tracking queries and responses

## Requirements

- Drupal 10.2+ or 11.x
- **AI module** (`drupal/ai`) - Version 1.2 or higher, provides LLM integration
- **Search API** (`drupal/search_api`) - Version 1.39 or higher, for content indexing
- **Vector database backend** - Such as:
  - Pinecone
  - Weaviate
  - ChromaDB
  - Or any Search API compatible vector database
- **league/html-to-markdown** - Version 5.1 or higher (installed automatically via Composer)
- **league/commonmark** - Version 2.6 or higher (installed automatically via Composer)

## Installation

1. Install the module and its dependencies:
   ```bash
   composer require drupal/ai_search_block
   drush en ai_search_block
   ```

2. Configure your AI provider (via the AI module)

3. Configure your vector database backend as a Search API server

4. Set up a Search API index with vector embeddings on that server

## Configuration

### 1. Block Setup

1. Navigate to **Structure > Block layout**
2. Place two blocks on your page:
   - **AI Search Form Block**: The search input form
   - **AI Search Response Block**: The response display area
3. **Important**: Only add ONE response area block per page

### 2. Search Block Configuration

Configure the following in the search block settings:

#### Basic Settings
- **Search API Index**: Select your configured vector database index
- **AI Model**: Choose your LLM provider and model
- **Render Mode**:
  - `chunks`: Use pre-chunked content from vector database
  - `node`: Render full entities with complete URLs and metadata

#### Search Parameters
- **Max Results**: Maximum number of results to retrieve
- **Min Results**: Minimum results required to generate a response
- **Score Threshold**: Minimum relevance score for including results
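
To make the interaction between these three settings concrete, here is a minimal sketch of one way they could be applied to a list of search hits. The function name `example_select_results()`, the array shape of a hit, and the order of operations (threshold first, then cap, then minimum check) are assumptions for illustration, not the module's actual code.

```php
/**
 * Hypothetical helper: applies score threshold, max results, and min results.
 *
 * @param array $results
 *   Search hits, each an array with 'id' (string) and 'score' (float) keys.
 * @param float $score_threshold
 *   Hits scoring below this value are discarded (assumed inclusive bound).
 * @param int $min_results
 *   Minimum number of hits needed to generate a response.
 * @param int $max_results
 *   Maximum number of hits to keep.
 *
 * @return array|null
 *   The hits to pass to the LLM, or NULL when too few hits remain.
 */
function example_select_results(array $results, float $score_threshold, int $min_results, int $max_results): ?array {
  // Drop hits below the relevance threshold.
  $kept = array_values(array_filter(
    $results,
    static fn(array $hit): bool => $hit['score'] >= $score_threshold
  ));

  // Sort by score, highest first, then cap at the maximum.
  usort($kept, static fn(array $a, array $b): int => $b['score'] <=> $a['score']);
  $kept = array_slice($kept, 0, $max_results);

  // Too few hits: the caller would show the "No Results Message" instead.
  return count($kept) >= $min_results ? $kept : NULL;
}
```

In this sketch, raising the score threshold can push the number of remaining hits below **Min Results**, in which case no prompt is built at all.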

#### Prompt Configuration
- **System Prompt**: Define how the AI should respond
- **Aggregated Prompt Template**: Template for combining search results
- **No Results Message**: Message shown when no relevant content is found
- **Blocked Words**: Optional list of blocked terms

#### Advanced Options
- **Streaming**: Enable/disable response streaming
- **Temperature**: Controls response randomness (0.0 - 1.0); lower values give more deterministic answers
- **View Mode**: Drupal view mode for rendering full entities
- **Query Prefix**: Optional query transformation template

### Available Tokens

Use these tokens in your prompt templates:

- `[question]` - The user's search query
- `[entity]` - The retrieved content (formatted per render mode)
- `[is_logged_in]` - User authentication status
- `[user_roles]` - Current user's roles
- `[user_id]` - Current user ID
- `[user_name]` - Current user display name
- `[user_language]` - User's preferred language
- `[user_timezone]` - User's timezone
- `[page_path]` - Current page path
- `[page_language]` - Current page language
- `[site_name]` - Site name
- `[time_now]` - Current time
- `[date_today]` - Today's date
- `[date_tomorrow]` - Tomorrow's date
- `[date_yesterday]` - Yesterday's date
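
As an illustration of how bracketed tokens like these can be expanded in a template, the following sketch substitutes values with PHP's `strtr()`. The helper name `example_replace_tokens()` and the sample values are hypothetical; the module's own replacement logic may differ.

```php
/**
 * Hypothetical helper: substitutes [token] placeholders in a prompt template.
 *
 * @param string $template
 *   The prompt template containing tokens such as [question].
 * @param array $values
 *   Map of token name (without brackets) to replacement string.
 *
 * @return string
 *   The template with all known tokens replaced.
 */
function example_replace_tokens(string $template, array $values): string {
  $replacements = [];
  foreach ($values as $name => $value) {
    $replacements['[' . $name . ']'] = $value;
  }
  // strtr() replaces all placeholders in a single pass, so a value that
  // itself contains "[question]" is not substituted a second time.
  return strtr($template, $replacements);
}

$prompt = example_replace_tokens(
  "Site: [site_name]\nQuestion: [question]\nContext:\n[entity]",
  [
    'site_name' => 'Example Site',
    'question' => 'How do I reset my password?',
    'entity' => 'Passwords can be reset from the login page.',
  ]
);
```

A single-pass replacement matters here because `[question]` carries user input, which should not be able to inject further tokens into the prompt.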

## Render Modes

### Chunks Mode
Uses pre-chunked content from the vector database. Content is already split into smaller segments optimized for vector search.

### Full Entity Mode (Node)
Renders complete entities with full context. Each entity is formatted as:

```
>>>>>> BEGIN ENTITY {ID} <<<<<<<<
ENTITY_URL:  https://example.com/node/123

ENTITY CONTENT:

[Full rendered content in markdown]

>>>>>> END ENTITY {ID} <<<<<<<<
```

This provides the AI with complete entity URLs for citation and full content context.
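
The wrapper format above can be reproduced with a few lines of PHP. This is a sketch only: `example_format_entity()` is a hypothetical name, and it assumes the entity ID, absolute URL, and markdown body are already available as strings.

```php
/**
 * Hypothetical helper: wraps one entity's markdown in the delimiter format.
 *
 * @param string $id
 *   The entity ID.
 * @param string $url
 *   The absolute URL of the entity.
 * @param string $markdown
 *   The rendered entity content, already converted to markdown.
 *
 * @return string
 *   The delimited block for inclusion in the prompt.
 */
function example_format_entity(string $id, string $url, string $markdown): string {
  return implode("\n", [
    ">>>>>> BEGIN ENTITY {$id} <<<<<<<<",
    "ENTITY_URL:  {$url}",
    '',
    'ENTITY CONTENT:',
    '',
    // Trim surrounding whitespace so blocks are spaced consistently.
    trim($markdown),
    '',
    ">>>>>> END ENTITY {$id} <<<<<<<<",
  ]);
}

$block = example_format_entity('123', 'https://example.com/node/123', "# Title\n\nBody text.\n");
```

Explicit begin and end markers that repeat the ID make it easier for the LLM to attribute a statement to the right entity when several are concatenated in one prompt.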

## Usage

1. Users type a natural language question into the search form
2. The module performs a vector similarity search
3. Relevant content is retrieved and formatted
4. Content is sent to the LLM with the configured prompt
5. AI-generated response streams back to the response area
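
The steps above can be sketched as a small generator function. Everything here is hypothetical: `example_answer()` and the `$search` and `$llm` callables stand in for the module's vector search and LLM calls, which are not shown in this README.

```php
/**
 * Hypothetical sketch of the request flow; not the module's actual code.
 *
 * @param string $question
 *   The user's question (step 1).
 * @param string $template
 *   Prompt template containing [question] and [entity] tokens.
 * @param callable $search
 *   Takes the question, returns an array of formatted content strings.
 * @param callable $llm
 *   Takes the prompt, returns an iterable of response chunks.
 *
 * @return \Generator
 *   Yields response chunks as they become available.
 */
function example_answer(string $question, string $template, callable $search, callable $llm): \Generator {
  // Steps 2-3: retrieve relevant content and join it into one context string.
  $context = implode("\n\n", $search($question));

  // Step 4: build the prompt from the configured template.
  $prompt = strtr($template, ['[question]' => $question, '[entity]' => $context]);

  // Step 5: pass chunks through to the caller as they arrive.
  yield from $llm($prompt);
}

// Stand-ins for the real search and LLM calls.
$search = static fn(string $q): array => ['Doc A', 'Doc B'];
$llm = static fn(string $p): array => ['Prompt was: ', $p];

$output = '';
foreach (example_answer('Q?', "[entity]\n---\n[question]", $search, $llm) as $chunk) {
  $output .= $chunk;
}
```

Using a generator keeps the caller in control of output: each chunk can be flushed to the response area as soon as it is yielded.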

## Extending the Module

### Available Hooks

The module provides three alter hooks for customization:

#### 1. Alter HTML Before Markdown Conversion

Modify the rendered HTML before it's converted to markdown:

```php
/**
 * Implements hook_ai_search_block_entity_html_alter().
 */
function mymodule_ai_search_block_entity_html_alter(&$rendered_entity, $entity) {
  // Remove HTML comments (the "s" modifier lets "." span line breaks).
  $rendered_entity = preg_replace('/<!--.*?-->/s', '', $rendered_entity);

  // Remove select elements and their options. preg_replace() replaces every
  // match by default; the lazy ".*?" stops at the first closing tag.
  $rendered_entity = preg_replace(
    '/<select\b[^>]*>.*?<\/select>/is',
    '',
    $rendered_entity
  );

  // Collapse runs of whitespace into a single space.
  $rendered_entity = preg_replace('/\s\s+/', ' ', $rendered_entity);
}
```

#### 2. Alter Markdown Before Prompt

Modify the markdown content before it's included in the prompt:

```php
/**
 * Implements hook_ai_search_block_entity_markdown_alter().
 */
function mymodule_ai_search_block_entity_markdown_alter(&$markdown, $entity) {
  // Collapse consecutive empty lines into one. Split on any line-break
  // sequence rather than the platform-specific PHP_EOL.
  $lines = preg_split('/\R/', $markdown);
  $newlines = [];
  $prev = NULL;

  foreach ($lines as $line) {
    // Strip tabs from both ends of the line.
    $newline = trim($line, "\t");
    // Skip duplicate empty lines.
    if ($prev === $newline && $newline === '') {
      continue;
    }
    $newlines[] = $newline;
    $prev = $newline;
  }

  $markdown = implode("\n", $newlines);

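  // Add custom entity metadata when the field exists and has a value.
  if ($entity instanceof \Drupal\Core\Entity\FieldableEntityInterface
    && $entity->hasField('field_custom_info')
    && !$entity->get('field_custom_info')->isEmpty()) {
    $info = $entity->get('field_custom_info')->value;
    $markdown = "**Info:** {$info}\n\n" . $markdown;
  }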
}
```

#### 3. Alter the Final Prompt

Modify the complete prompt before sending to the LLM:

```php
/**
 * Implements hook_ai_search_block_prompt_alter().
 */
function mymodule_ai_search_block_prompt_alter(string &$prompt) {
  // Add custom tokens
  $current_time = date('H:i:s');
  $prompt = str_replace('[current_time]', $current_time, $prompt);

  // Add context-specific instructions
  if (str_contains($prompt, 'pricing')) {
    $prompt .= "\n\nIMPORTANT: Always mention that prices may vary by region.";
  }

  // Add citation requirements
  $prompt .= "\n\nWhen answering, cite specific entity URLs when referencing information.";
}
```

### Theming

The module provides two theme templates for customization:

- `ai-search-block-wrapper.html.twig` - The search form wrapper
- `ai-search-block-response.html.twig` - The response display area

Copy these templates to your theme and customize as needed.

## Logging

Install the optional `ai_search_block_log` submodule to track:
- User queries
- Generated prompts
- AI responses
- Retrieved entity IDs
- Response scoring data

This is useful for:
- Improving prompt engineering
- Understanding user needs
- Debugging search relevance
- Compliance and auditing


## Related Modules

- **AI** (`drupal/ai`) - Required for LLM integration
- **Search API** (`drupal/search_api`) - Required for content indexing
- **ai_search_block_log** - Optional logging submodule

## Support

- Issue queue: https://www.drupal.org/project/issues/ai_search_block
- Documentation: https://www.drupal.org/docs/contributed-modules/ai-search-block

## License

GPL-2.0-or-later


