# OpenSearch NLP

Provides Natural Language Processing capabilities for OpenSearch, enabling semantic search, hybrid search, and ML-powered query understanding.

## Features

- **Semantic Search**: Vector-based search using text embeddings
- **Hybrid Search**: Combines keyword matching with semantic search
- **Semantic Caching**: Intelligent query caching based on similarity
- **ML Model Management**: Deploy and manage OpenSearch ML models
- **External Model Support**: Connect to OpenAI and other ML services

## Requirements

- Drupal 10 or 11
- `search_api_opensearch` module
- OpenSearch server 2.19 with ML Commons plugin ([Documentation](https://docs.opensearch.org/2.19))
- PHP 8.1+

**Important**:
- This module currently supports OpenSearch version 2.19. Other versions may have compatibility issues with ML Commons features.
- You can use OpenSearch pretrained models for semantic search. See [OpenSearch Pretrained Models](https://docs.opensearch.org/2.19/ml-commons-plugin/pretrained-models/) for available models and their configurations.

## Installation

```bash
composer require drupal/opensearch_nlp
drush en opensearch_nlp
```

### Required Patch for OpenSearch PHP Client 2.19 version

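This module requires a patch to the `opensearch-project/opensearch-php` library to support ML model search queries and automatic KNN index creation. Applying it relies on a Composer patching plugin that reads `extra.patches`, such as `cweagans/composer-patches`. Save the patch shown further down as `patches/opensearch-php-ml-model-search-knn.patch` in your project root, then add the following to your `composer.json`: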

```json
{
  "extra": {
    "patches": {
      "opensearch-project/opensearch-php": {
        "Support POST method for ML model search and auto-enable KNN on index creation": "patches/opensearch-php-ml-model-search-knn.patch"
      }
    }
  }
}
```

**Patch contents** (`patches/opensearch-php-ml-model-search-knn.patch`):

```diff
diff --git a/src/OpenSearch/Endpoints/Ml/SearchModels.php b/src/OpenSearch/Endpoints/Ml/SearchModels.php
index 25d7e06b..868d295c 100644
--- a/src/OpenSearch/Endpoints/Ml/SearchModels.php
+++ b/src/OpenSearch/Endpoints/Ml/SearchModels.php
@@ -40,7 +40,7 @@ class SearchModels extends AbstractEndpoint

     public function getMethod(): string
     {
-        return 'GET';
+        return isset($this->body) ? 'POST' : 'GET';
     }

     public function setBody($body): static

diff --git a/src/OpenSearch/Namespaces/IndicesNamespace.php b/src/OpenSearch/Namespaces/IndicesNamespace.php
index 7d6b5ca5..fbbbe40b 100644
--- a/src/OpenSearch/Namespaces/IndicesNamespace.php
+++ b/src/OpenSearch/Namespaces/IndicesNamespace.php
@@ -209,6 +209,19 @@ class IndicesNamespace extends AbstractNamespace
      */
     public function create(array $params = [])
     {
+        // Ensure 'body' exists and is an array
+        if (!isset($params['body']) || !is_array($params['body'])) {
+            $params['body'] = [];
+        }
+        // Ensure 'settings' and 'index' keys exist
+        if (!isset($params['body']['settings']) || !is_array($params['body']['settings'])) {
+            $params['body']['settings'] = [];
+        }
+        if (!isset($params['body']['settings']['index']) || !is_array($params['body']['settings']['index'])) {
+            $params['body']['settings']['index'] = [];
+        }
+        // Set knn to true
+        $params['body']['settings']['index']['knn'] = true;
         $index = $this->extractArgument($params, 'index');
         $body = $this->extractArgument($params, 'body');
```

**What this patch does:**
1. **ML Model Search**: Makes the `SearchModels` endpoint use POST instead of GET whenever a request body is set (required for filtering models by query)
2. **Auto-enable KNN**: Sets `index.knn` to `true` on every index created through the client, which vector search requires. Note that this applies to all indexes created this way, not only those with embedding fields.
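The settings normalization in the second hunk can be read as a small standalone function. The sketch below mirrors that logic outside the library so the effect on a request is easy to inspect; `withKnnEnabled` and the `articles` index name are illustrative and not part of `opensearch-php`.

```php
<?php
declare(strict_types=1);

/**
 * Returns index-creation params with body.settings.index.knn forced to true.
 * Hypothetical helper that mirrors the patched logic; not a library function.
 */
function withKnnEnabled(array $params): array
{
    // Replace any missing or non-array level with an empty array.
    if (!is_array($params['body'] ?? null)) {
        $params['body'] = [];
    }
    if (!is_array($params['body']['settings'] ?? null)) {
        $params['body']['settings'] = [];
    }
    if (!is_array($params['body']['settings']['index'] ?? null)) {
        $params['body']['settings']['index'] = [];
    }
    // Existing index settings are kept; only the knn flag is overwritten.
    $params['body']['settings']['index']['knn'] = true;

    return $params;
}

$params = withKnnEnabled([
    'index' => 'articles',
    'body' => ['settings' => ['index' => ['number_of_shards' => 1]]],
]);
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

Because the flag is always overwritten, a caller cannot opt out by passing `knn => false`.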

## Setup

### 1. Add Vector Field to Index

1. Go to your Search API index: `/admin/config/search/search-api/index/{index_id}/fields`
2. Add a new field:
   - **Type**: String
   - **Data Type**: **Opensearch Text embedding Vector**
   - **Name**: Based on your mapping field (e.g., `title` → `title_embedding`)
3. Save the index

### 2. Configure NLP Settings

1. Go to `/admin/config/search/opensearch-nlp`
2. Enable NLP and set model configuration
3. Under "Index-specific NLP settings":
   - Enable NLP for your index
   - Set mapping-embedding pairs: `title|title_embedding`
   - Set pipeline IDs, search type, etc.
4. Save the configuration (this creates the model and pipelines automatically)
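To make the `title|title_embedding` pair more concrete, here is a rough sketch of how such a pair could translate into an OpenSearch ingest pipeline that uses the `text_embedding` processor. The module creates its pipelines for you, so this is only an illustration of the shape involved: `parsePair`, `buildIngestPipeline`, and the model ID are hypothetical, and the module's actual pipeline body may differ.

```php
<?php
declare(strict_types=1);

/**
 * Splits one "source|embedding" pair into a [source => embedding] map.
 * Hypothetical helper; the module's own parsing may differ.
 */
function parsePair(string $pair): array
{
    $parts = array_map('trim', explode('|', $pair));
    if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
        throw new InvalidArgumentException("Expected 'source|embedding', got '$pair'");
    }

    return [$parts[0] => $parts[1]];
}

/**
 * Builds an ingest pipeline body using OpenSearch's text_embedding processor.
 */
function buildIngestPipeline(string $modelId, array $fieldMap): array
{
    return [
        'description' => 'Text embedding pipeline (illustrative)',
        'processors' => [
            ['text_embedding' => ['model_id' => $modelId, 'field_map' => $fieldMap]],
        ],
    ];
}

// 'example-model-id' is a placeholder, not a real model ID.
$pipeline = buildIngestPipeline('example-model-id', parsePair('title|title_embedding'));
echo json_encode($pipeline, JSON_PRETTY_PRINT), PHP_EOL;
```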

### 3. Set Vector Dimension

1. Go to `/admin/config/search/opensearch-vector-nlp-config-form`
2. Set dimension based on your model:
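   - **768**: HuggingFace models such as `all-mpnet-base-v2` and `msmarco-distilbert-base-tas-b` (default)
   - **384**: HuggingFace `all-MiniLM-L6-v2`
   - **1536**: OpenAI `text-embedding-ada-002`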
3. Save configuration
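The dimension has to match the length of the vectors the model produces, because it ends up in the `knn_vector` field mapping. The following sketch shows the kind of mapping fragment this setting feeds into; `knnVectorMapping` and the lookup table are illustrative only, and the mapping the module generates may carry additional options.

```php
<?php
declare(strict_types=1);

// Output dimensions of a few commonly used embedding models.
const MODEL_DIMENSIONS = [
    'huggingface/sentence-transformers/all-MiniLM-L6-v2' => 384,
    'huggingface/sentence-transformers/all-mpnet-base-v2' => 768,
    'huggingface/sentence-transformers/msmarco-distilbert-base-tas-b' => 768,
    'text-embedding-ada-002' => 1536,
];

/**
 * Builds a minimal knn_vector field mapping (hypothetical helper).
 */
function knnVectorMapping(int $dimension): array
{
    if ($dimension < 1) {
        throw new InvalidArgumentException('Dimension must be a positive integer.');
    }

    return ['type' => 'knn_vector', 'dimension' => $dimension];
}

$model = 'huggingface/sentence-transformers/all-mpnet-base-v2';
$mappings = [
    'properties' => [
        'title_embedding' => knnVectorMapping(MODEL_DIMENSIONS[$model]),
    ],
];
echo json_encode($mappings, JSON_PRETTY_PRINT), PHP_EOL;
```

A mismatch between this value and the model's real output size typically surfaces as an indexing error, so it is worth checking before re-indexing.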

### 4. Apply Settings to Index

1. Go back to your Search API index
2. **Re-save the index** (this applies NLP settings to field mappings and index settings)

### 5. Re-index Content

```bash
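# Mark all items in the index for re-indexing
drush search-api:reindex {index_id}
# Index the pending items (this generates the embeddings)
drush search-api:index {index_id}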
```

### 6. Multi-Index Search (Optional)

For multi-index search to work:
1. Create a multi-index search pipeline at `/admin/config/search/opensearch-search-pipeline-config-form`
2. Set the pipeline ID to `multi_index_hybrid_search_pipeline`
3. Configure universal mapping and embedding fields in NLP settings

## Configuration

### Search Types
- `keyword`: Standard search (default)
- `semantic`: Pure vector search
- `hybrid`: Combined approach (recommended)
- `script_score`: Custom KNN scoring

### Index Configuration Example
```yaml
my_index:
  enable_nlp: true
  mapping_embedding_pairs: 'title|title_embedding'
  ingestion_pipeline_id: 'my-ingestion-pipeline'
  search_pipeline_id: 'my-search-pipeline'
  search_type: 'hybrid'
  pagination_depth: 20
  nearest_neighbors: 50
```
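As a rough illustration of how the search types differ at the query level, the sketch below builds OpenSearch request bodies for `keyword`, `semantic`, and `hybrid` from a configuration like the one above. It assumes that `nearest_neighbors` maps to the `k` parameter of the `neural` query and that `pagination_depth` maps to the hybrid query option of the same name; neither mapping is confirmed by this README. `buildQueryBody` and the model ID are hypothetical, and `script_score` is left out.

```php
<?php
declare(strict_types=1);

/**
 * Builds a search request body for one search type (hypothetical helper).
 */
function buildQueryBody(string $text, string $modelId, array $config): array
{
    [$field, $embeddingField] = explode('|', $config['mapping_embedding_pairs']);

    // Lexical clause: a plain match query on the source field.
    $keyword = ['match' => [$field => ['query' => $text]]];
    // Vector clause: OpenSearch embeds the text with the model, then runs k-NN.
    $neural = ['neural' => [$embeddingField => [
        'query_text' => $text,
        'model_id' => $modelId,
        'k' => $config['nearest_neighbors'],  // assumed mapping
    ]]];

    return match ($config['search_type']) {
        'keyword' => ['query' => $keyword],
        'semantic' => ['query' => $neural],
        'hybrid' => ['query' => ['hybrid' => [
            'queries' => [$keyword, $neural],
            'pagination_depth' => $config['pagination_depth'],  // assumed mapping
        ]]],
        default => throw new InvalidArgumentException(
            "Unsupported search type in this sketch: {$config['search_type']}"
        ),
    };
}

$config = [
    'mapping_embedding_pairs' => 'title|title_embedding',
    'search_type' => 'hybrid',
    'pagination_depth' => 20,
    'nearest_neighbors' => 50,
];
// 'example-model-id' is a placeholder, not a real model ID.
$body = buildQueryBody('search query', 'example-model-id', $config);
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Hybrid queries in OpenSearch also need a search pipeline with a normalization processor to combine the two score sets, which is the role of `search_pipeline_id` in the configuration.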

## Usage

### Programmatic Search
```php
$nlpSearch = \Drupal::service('opensearch_nlp.nlp_search');
$results = $nlpSearch->search('my_index', 'search query', 0, 10);
```

### Available Services
- `opensearch_nlp.nlp_ingestion` - Model/pipeline management
- `opensearch_nlp.nlp_search` - Search functionality
- `opensearch_nlp.semantic_cache` - Query caching

## Configuration Management

All settings are exportable via Drupal's configuration system:

```bash
drush config:export
drush config:import
```

Config files:
- `opensearch_nlp.nlp_settings.yml`
- `opensearch_nlp.vector_settings.yml`
- `opensearch_nlp.search_pipeline_settings.yml`
- `opensearch_nlp.semantic_cache_settings.yml`

### Environment-Specific Configuration

You can override configuration per environment using `settings.php` or `settings.local.php`. This is useful for dev/stage/prod environments with different models:

```php
// settings.php (or settings.local.php, settings.prod.php, etc.)

// Enable/disable NLP
$settings['opensearch_nlp.enable_nlp'] = TRUE;

// External model configuration
$settings['opensearch_nlp.is_externally_hosted_model'] = FALSE;
$settings['opensearch_nlp.connector_id'] = '';

// Model type (local, openai, rag, aws)
$settings['opensearch_nlp.model_type'] = 'local';

// Override model path per environment
$settings['opensearch_nlp.model_path'] = getenv('OPENSEARCH_MODEL_PATH') ?: 'huggingface/sentence-transformers/all-MiniLM-L6-v2';

// Override model version
$settings['opensearch_nlp.model_version'] = getenv('OPENSEARCH_MODEL_VERSION') ?: '1.0.2';

// Override model description
$settings['opensearch_nlp.model_description'] = 'Production semantic search model';

// Override model group
$settings['opensearch_nlp.model_group'] = 'prod_nlp_models';
$settings['opensearch_nlp.model_group_description'] = 'Production NLP model group';

// Override model format
$settings['opensearch_nlp.model_format'] = 'TORCH_SCRIPT';
```

**How it works:**
1. The form checks for a `settings.php` override first
2. If no override exists, it falls back to the stored configuration value
3. The UI shows the effective value (from `settings.php` or from config)
4. Saving the form still updates config, but any `settings.php` override continues to take precedence at runtime
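The precedence described above can be sketched with plain arrays standing in for `settings.php` and stored config. `effectiveValue` is a hypothetical helper, not the module's code, and treating an explicitly set `FALSE` or empty string as a valid override is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Resolves a value: settings.php override first, stored config second.
 * Hypothetical helper; plain arrays stand in for Drupal's settings and config.
 */
function effectiveValue(string $key, array $settings, array $config): mixed
{
    $settingsKey = 'opensearch_nlp.' . $key;

    // array_key_exists() lets overrides such as FALSE or '' still win (assumption).
    if (array_key_exists($settingsKey, $settings)) {
        return $settings[$settingsKey];
    }

    return $config[$key] ?? null;
}

$settings = ['opensearch_nlp.model_version' => '1.0.2'];
$config = [
    'model_version' => '1.0.1',
    'model_format' => 'TORCH_SCRIPT',
];

// Overridden in settings: the settings value is used.
var_dump(effectiveValue('model_version', $settings, $config));
// Not overridden: the stored config value is used.
var_dump(effectiveValue('model_format', $settings, $config));
```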

**Example per-environment setup:**

```php
// Stage environment (settings.stage.php)
$settings['opensearch_nlp.model_path'] = 'huggingface/sentence-transformers/msmarco-distilbert-base-tas-b';
$settings['opensearch_nlp.model_version'] = '1.0.1';
$settings['opensearch_nlp.model_group'] = 'stage_nlp_models';

// Production environment (settings.prod.php)
$settings['opensearch_nlp.model_path'] = 'huggingface/sentence-transformers/all-mpnet-base-v2';
$settings['opensearch_nlp.model_version'] = '1.0.2';
$settings['opensearch_nlp.model_group'] = 'prod_nlp_models';
```

This allows you to:
- Use different models per environment
- Test new models in stage before prod
- Keep sensitive config out of version control
- Use environment variables from CI/CD

## Testing

```bash
# All tests (use web/modules/contrib/opensearch_nlp if the module was installed via Composer)
vendor/bin/phpunit web/modules/custom/opensearch_nlp

# Kernel tests only (recommended)
vendor/bin/phpunit web/modules/custom/opensearch_nlp/tests/src/Kernel
```
