# Schema Metatag AI

AI-powered automatic generation of Schema.org structured data for Drupal metatags.

## Overview

This module extends the Schema Metatag module by providing AI-powered automatic generation of schema metadata. It **dynamically discovers** available schema types from your enabled schema_metatag submodules and generates appropriate structured data based on your content using **custom AI prompts with type-specific instructions**.

## Key Features

### 1. Dynamic Schema Type Discovery & Auto-Configuration
- **No Hard-coded Schema Types**: Automatically discovers all enabled schema types from your schema_metatag installation
- **Auto-configures** on cache clear: Custom prompt templates are generated automatically for each discovered schema type
- Works with any schema type: Article, Course, Product, Event, Person, Organization, Place, LocalBusiness, Recipe, Book, VideoObject, HowTo, FAQPage, WebPage, JobPosting, and more
- Only shows schema types that are actually installed and enabled on your site

### 2. Intelligent AI Prompts
- **Auto-generated custom prompts** for each schema type with type-specific instructions
- **Content relevance checking**: AI determines if content is actually relevant to the schema type before generating data
- **Anti-hallucination safeguards**: Explicit instructions to only extract information present in the content
- References official Schema.org documentation URLs for each type
- Prompts are fully customizable via the admin UI

### 3. Multi-Schema Type Support
- **Generates data for all enabled schema types** on a content type simultaneously
- One button click processes Article, Course, Person, Place, and any other enabled schemas
- Smart filtering: Skips schemas that aren't relevant to the content

### 4. Flexible Field Mapping
- **Automatically maps** AI-generated properties to schema form fields
- **Generic mapping algorithm** converts property names (e.g., "courseCode" → "course-code")
- Tries multiple selector patterns to find the correct form field
- Works with any schema type without hardcoding field mappings
- Handles nested objects (Person, Organization, Place) automatically

### 5. Smart Form Integration
- **"Generate Schema Metatag" button** automatically appears in form actions for configured content types
- Only appears when user has appropriate permissions
- Works with any content type that has schema metatag fields
- Extracts data from all node fields automatically

### 6. Enhanced Schema Data
- Automatically adds **@context** ("https://schema.org")
- Automatically generates **@id** with full entity URL (including path aliases)
- Proper Schema.org type values (@type field)
- Handles nested objects and complex data types

### 7. Custom Configuration (Optional)
Map Drupal fields to schema properties for more accurate data extraction:

```yaml
field_mappings:
  courseCode: 'field_ucas_code'        # Extract courseCode from field_ucas_code
  educationalLevel: 'field_level'      # Extract educationalLevel from field_level
  teaches: 'field_learning_outcomes'   # Extract teaches from field_learning_outcomes
  timeRequired: 'field_duration'       # Extract timeRequired from field_duration
  provider: 'field_provider'           # Extract provider from field_provider
```

Configure custom schema form field IDs if the auto-generated ones don't match:

```yaml
schema_field_mappings:
  name: 'schema-course-name'
  courseCode: 'schema-course-course-code'
```

## Requirements

- Drupal 10+
- Schema Metatag module and desired submodules (e.g., schema_article, schema_course, schema_product)
- AI module (for AI provider integration)
- An AI provider (e.g., OpenAI with API key)

## Installation

1. Install the module: `composer require drupal/schema_metatag_ai` (or place it in `modules/custom`)
2. Enable the module: `drush en schema_metatag_ai`
3. Configure your AI provider in the AI module settings
4. Configure Schema Metatag AI settings at: `/admin/config/search/schema-metatag-ai`

## Quick Start

### 1. Configure AI Provider
Navigate to `/admin/config/search/schema-metatag-ai`:
- **Select Content Types**: Choose which content types should have AI generation (leave empty to enable for all)
- **AI Provider**: Select your AI provider (e.g., OpenAI)
- **AI Model**: Specify the model (e.g., gpt-4, gpt-3.5-turbo)
- Review available schema types (automatically discovered from enabled modules)
- Click on individual schema type links to customize prompts and field mappings (optional)

### 2. Clear Cache to Auto-Configure
After enabling schema_metatag submodules, clear cache to auto-configure them:
```bash
drush cr
```

This automatically:
- Discovers all enabled schema types
- Generates custom prompt templates with type-specific instructions
- Creates default configuration for each schema type

### 3. Use the Feature
1. Edit or create a node with schema metatag fields
2. Fill in your content (title, body, custom fields)
3. Click the **"Generate Schema Metatag"** button in the form actions
4. Review the AI-generated schema values in all schema tabs
5. Make any adjustments needed
6. Save

## How It Works

The module works in five automated steps:

1. **Discovery**: Dynamically discovers all enabled schema types from your schema_metatag modules
2. **Auto-Configuration**: On cache clear, generates custom AI prompts with type-specific instructions for each schema type
3. **Content Extraction**: Extracts content from all node fields (title, body, custom fields)
4. **AI Generation**:
   - Sends content to AI with custom prompts for each enabled schema type
   - AI checks content relevance to each schema type
   - AI is instructed to extract only information explicitly present in the content, which reduces (but cannot fully rule out) hallucination
   - Generates valid Schema.org JSON for relevant schemas
5. **Smart Mapping**:
   - Automatically maps AI response properties to schema form fields using a generic algorithm
   - Tries multiple selector patterns to find the correct fields
   - Handles nested objects and complex data types
   - Skips schemas where content isn't relevant

## Configuration

### Admin Interface
Navigate to `/admin/config/search/schema-metatag-ai`:

1. **Global Settings**:
   - **Content Types**: Select which content types have AI generation (empty = all)
   - **AI Provider**: Choose your AI provider (e.g., OpenAI, Anthropic)
   - **AI Model**: Specify the model (e.g., gpt-4o, gpt-3.5-turbo, claude-3-5-sonnet)
   - View list of all discovered schema types

2. **Per-Schema Type Configuration** (click on schema type links):
   - **Enable/Disable**: Toggle AI generation for specific schema types
   - **Custom Prompt Template**: Edit the auto-generated prompt or create your own
   - **Field Mappings**: Map Drupal fields to Schema.org properties
   - **Schema Form Field IDs**: Override auto-generated form field IDs if needed

### Auto-Generated Prompt Templates

When you clear cache, the module automatically generates intelligent prompts for each schema type. These prompts include:

- **Relevance checking**: AI first determines if content is relevant to the schema type
- **Type-specific instructions**: Custom guidance for each schema type (Article, Course, Person, etc.)
- **Anti-hallucination safeguards**: Explicit instructions to only extract information present in content
- **Schema.org references**: Links to official documentation
- **Data type guidance**: Instructions for nested objects, arrays, dates, etc.

Example auto-generated prompt structure:
```
You are a Schema.org structured data expert. Generate valid Schema.org JSON-LD markup for a Course type.

Reference: https://schema.org/Course

IMPORTANT: First, determine if the content below is actually about a Course.
If NOT, return ONLY: {"@type": "Course"}

For Course schema, extract ONLY the following if present:
- name (exact course name/title from content)
- description (exact course description from content)
- provider (only if institution/provider name is mentioned)
- courseCode (only if code is explicitly stated)
...

[Additional requirements and content follow]
```
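
The prompt structure above could be assembled along the following lines. This is a minimal sketch, not the module's actual `schema_metatag_ai_generate_prompt_template()` implementation; the function name `build_schema_prompt()`, its parameters, and the trailing `Content:` section are assumptions made for illustration.

```php
<?php

/**
 * Builds a prompt for one schema type (hypothetical helper).
 *
 * @param string $type
 *   Schema.org type name, e.g. "Course".
 * @param string[] $properties
 *   Extraction hints keyed by property name.
 * @param string $content
 *   Plain-text content extracted from the node.
 */
function build_schema_prompt(string $type, array $properties, string $content): string {
  $lines = [
    "You are a Schema.org structured data expert. Generate valid Schema.org JSON-LD markup for a {$type} type.",
    '',
    "Reference: https://schema.org/{$type}",
    '',
    "IMPORTANT: First, determine if the content below is actually about a {$type}.",
    // The "not relevant" marker is an object that carries only @type.
    'If NOT, return ONLY: ' . json_encode(['@type' => $type]),
    '',
    "For {$type} schema, extract ONLY the following if present:",
  ];
  foreach ($properties as $property => $hint) {
    $lines[] = "- {$property} ({$hint})";
  }
  // Assumed layout: the node content follows the instructions.
  $lines[] = '';
  $lines[] = 'Content:';
  $lines[] = $content;
  return implode("\n", $lines);
}

$prompt = build_schema_prompt(
  'Course',
  [
    'name' => 'exact course name/title from content',
    'courseCode' => 'only if code is explicitly stated',
  ],
  'BSc Computer Science (UCAS code G400).'
);
```

Keeping the relevance instruction ahead of the property list mirrors the order of the example prompt, so the model decides on relevance before it starts extracting values.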

### Custom Prompt Templates with Placeholders

You can customize prompts to reference specific Drupal fields:

```yaml
prompt_template: |
  Generate Course schema using the following information:
  - Course name: [FIELD:title]
  - Course code: [FIELD:field_ucas_code]
  - Level: [FIELD:field_level]
  - Description: [FIELD:body]

  Return valid JSON with @type set to "Course".
  Only include properties for which data is available.
```

### Field Mappings Configuration

Map Drupal fields to Schema.org properties for more accurate extraction:

```yaml
field_mappings:
  name: 'title'                        # Course name from node title
  description: 'body'                  # Course description from body
  courseCode: 'field_ucas_code'        # UCAS code from custom field
  educationalLevel: 'field_level'      # Level from taxonomy/list field
  teaches: 'field_learning_outcomes'   # Learning outcomes
  timeRequired: 'field_duration'       # Duration field
  provider: 'field_institution'        # Institution reference
```

### Schema Form Field Mappings

Override auto-generated form field IDs if they don't match your form structure:

```yaml
schema_field_mappings:
  name: 'schema-course-name'
  courseCode: 'schema-course-course-code'
  educationalLevel: 'schema-course-educational-level'
  description: 'schema-course-description'
```
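
To illustrate how an override and an auto-generated ID could interact, here is a small sketch. It assumes the generated IDs follow the `schema-{type}-{property-in-kebab-case}` pattern seen in the examples above; both function names are hypothetical and the module may try additional patterns.

```php
<?php

/**
 * Converts a camelCase Schema.org property name to kebab-case.
 */
function property_to_kebab(string $property): string {
  // Insert a hyphen at each lowercase/digit-to-uppercase boundary, then
  // lowercase the whole string: "courseCode" becomes "course-code".
  return strtolower(preg_replace('/(?<=[a-z0-9])(?=[A-Z])/', '-', $property));
}

/**
 * Resolves the form field ID for a property (hypothetical helper).
 *
 * @param string $schema_type
 *   Lowercase schema type key as used in configuration, e.g. "course".
 * @param string $property
 *   Schema.org property name, e.g. "educationalLevel".
 * @param string[] $overrides
 *   Values from schema_field_mappings, keyed by property name.
 */
function resolve_form_field_id(string $schema_type, string $property, array $overrides = []): string {
  // An explicit override always wins over the generated ID.
  if (isset($overrides[$property])) {
    return $overrides[$property];
  }
  return 'schema-' . $schema_type . '-' . property_to_kebab($property);
}

$generated = resolve_form_field_id('course', 'educationalLevel');
$overridden = resolve_form_field_id('course', 'name', ['name' => 'my-custom-name-field']);
```

With this pattern, an override is only needed when the form's real element IDs differ from the generated ones.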

### Programmatic Configuration

Configuration is stored in `schema_metatag_ai.settings`:

```yaml
schema_metatag_ai:
  content_types:
    - article
    - course
    - page
  ai_provider: 'openai'
  ai_model: 'gpt-4o'
  schema_types:
    course:
      enabled: true
      prompt_template: |
        [Your custom prompt template]
      field_mappings:
        courseCode: 'field_ucas_code'
        educationalLevel: 'field_level'
      schema_field_mappings:
        courseCode: 'schema-course-course-code'
```

## Field Mapping Examples

### Course Schema Example
```yaml
course:
  field_mappings:
    name: 'title'                      # Course title from node title
    description: 'body'                # Course description from body
    courseCode: 'field_ucas_code'      # UCAS code from custom field
    educationalLevel: 'field_level'    # Level from taxonomy field
    teaches: 'field_learning_outcomes' # Learning outcomes
    timeRequired: 'field_duration'     # Duration field
    provider: 'field_institution'      # Institution reference
    keywords: 'field_tags'            # Tags for keywords
```

### Article Schema Example
```yaml
article:
  field_mappings:
    name: 'title'
    description: 'body'
    author: 'field_author'
    publisher: 'field_publisher'
    keywords: 'field_tags'
```

## Prompt Template Placeholders

Use `[FIELD:field_name]` placeholders in prompt templates:

- `[FIELD:title]` - Node title
- `[FIELD:body]` - Node body
- `[FIELD:field_ucas_code]` - Custom field value
- `[FIELD:field_level]` - Taxonomy term label
- `[FIELD:field_author]` - Referenced entity label
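
Placeholder substitution of this kind can be done with a single regular expression. The sketch below is an assumption about the general approach, not the module's code: `replace_field_placeholders()` is a hypothetical name, and replacing unknown fields with an empty string is a design choice made here for illustration.

```php
<?php

/**
 * Replaces [FIELD:field_name] placeholders with values (hypothetical helper).
 *
 * @param string $template
 *   Prompt template containing placeholders.
 * @param array $values
 *   Plain-text field values keyed by field name.
 */
function replace_field_placeholders(string $template, array $values): string {
  return preg_replace_callback(
    '/\[FIELD:([a-z0-9_]+)\]/i',
    // Unknown or empty fields become an empty string so that no raw
    // placeholder is sent to the AI provider.
    static fn(array $m): string => trim((string) ($values[$m[1]] ?? '')),
    $template
  );
}

$prompt = replace_field_placeholders(
  "Course name: [FIELD:title]\nCourse code: [FIELD:field_ucas_code]",
  ['title' => 'BSc Computer Science', 'field_ucas_code' => 'G400']
);
```

The values passed in are expected to be plain text already, for example a term label for a taxonomy field or an entity label for a reference field.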

## Supported Schema Types

The module **dynamically supports** all schema types that are:
- Installed as schema_metatag submodules
- Enabled on your site

Common schema types include:
- Article, Course, Product, Event
- Person, Organization, Place
- Recipe, Book, Movie, Review
- WebPage, WebSite, HowTo, QAPage
- JobPosting, VideoObject, Service
- And any other schema_metatag submodule you install

## Architecture

### Services

**SchemaTypeDiscoveryService** (`schema_metatag_ai.schema_discovery`)
- Dynamically discovers available schema types from the metatag group plugin manager
- Extracts schema type information (ID, label, Schema.org URL) from plugin definitions
- Provides methods to get available types, check type availability, and get type options
- No hard-coded schema type lists
- Monitors installed schema_metatag submodules

**SchemaFieldMapperService** (`schema_metatag_ai.field_mapper`)
- Generic field mapping service that works with any schema type
- Automatically converts Schema.org property names to form field IDs using multiple patterns
- Tries multiple selector formats to find the correct form field
- Handles nested objects (Person, Organization, Place) and extracts relevant values
- Handles arrays, strings, numbers, and complex data types
- Populates core fields (@type, @id) and all schema-specific properties
- Works without hardcoding field mappings for each schema type
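
As an illustration of the nested-object handling, the following sketch reduces an AI-returned value to a string suitable for a text form field. The function name `flatten_schema_value()` and the rule of preferring the `name` property are assumptions; the service may extract other properties.

```php
<?php

/**
 * Reduces an AI-returned property value to a string (hypothetical helper).
 *
 * @param mixed $value
 *   A scalar, a nested object (associative array), or a list of either.
 */
function flatten_schema_value($value): string {
  if (is_scalar($value)) {
    return (string) $value;
  }
  if (!is_array($value)) {
    return '';
  }
  // Nested object such as {"@type": "Organization", "name": "..."}.
  if (isset($value['name']) && is_scalar($value['name'])) {
    return (string) $value['name'];
  }
  // Nested object without a usable name: nothing safe to put in the field.
  if (isset($value['@type'])) {
    return '';
  }
  // List of values: flatten each item and join the non-empty results.
  $parts = array_filter(array_map('flatten_schema_value', $value), 'strlen');
  return implode(', ', $parts);
}

$provider = flatten_schema_value(['@type' => 'Organization', 'name' => 'Example University']);
$teaches = flatten_schema_value(['Python', 'SQL']);
```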

**GenerateSchemaMetatag** (`schema_metatag_ai.generator`)
- Core AI generation service
- Uses SchemaTypeDiscoveryService to get schema type information
- Builds custom prompts with type-specific instructions
- Supports custom prompt templates with [FIELD:field_name] placeholders
- Processes AI responses and validates JSON
- Enhances schema data with @context and @id (full entity URL with alias)
- Extracts field values from entities handling various field types

### Hooks & Form Alter

**hook_modules_installed()**
- Detects when schema_metatag modules are installed
- Triggers auto-configuration for new schema types

**hook_cache_flush()**
- Runs auto-discovery and configuration on cache clear
- Generates custom prompt templates for newly discovered schema types
- Updates configuration automatically

**hook_form_alter()**
- Adds "Generate Schema Metatag" button to node forms
- Checks user permissions (`generate schema metatag`)
- Checks if content type is configured for AI generation
- Verifies schema metatag fields exist on the form
- Adds AJAX callback for button click

**AJAX Callback** (`schema_metatag_ai_generate_submit_form`)
- Extracts field data from form state (title, body, custom fields)
- Detects all enabled schema types on the form
- Processes each schema type through the generator service
- Checks content relevance to each schema type
- Uses field mapper to populate form fields
- Returns success/warning messages for each schema type

### Auto-Configuration Functions

**schema_metatag_ai_discover_and_configure_all_types()**
- Main discovery and configuration function
- Calls SchemaTypeDiscoveryService to get available types
- Triggers auto-configuration for all discovered types

**schema_metatag_ai_auto_configure_schema_types()**
- Creates configuration for schema types that don't have it yet
- Generates custom prompt templates for each type
- Saves configuration to `schema_metatag_ai.settings`

**schema_metatag_ai_generate_prompt_template()**
- Generates intelligent prompts for each schema type
- Includes relevance checking instructions
- Adds type-specific instructions from a comprehensive map
- Includes anti-hallucination safeguards

**schema_metatag_ai_get_type_specific_instructions()**
- Provides custom instructions for 15+ schema types
- Defines which properties to extract for each type
- Includes guidance on data formats and nested objects

### Key Improvements

1. **Dynamic Discovery**: No hard-coded schema type lists, works with any schema_metatag module
2. **Auto-Configuration**: Automatic prompt generation and configuration on cache clear
3. **Intelligent Prompts**: Type-specific instructions with anti-hallucination safeguards
4. **Multi-Schema Support**: Processes all enabled schemas in a single button click
5. **Smart Mapping**: Tries multiple selector patterns to find fields automatically
6. **Relevance Checking**: AI determines if content matches each schema type
7. **Enhanced Data**: Automatic @context and @id with path alias support
8. **Flexible Configuration**: Full control over prompts, field mappings, and form field IDs
9. **Maintainable**: Adding new schema types just requires enabling the module and clearing cache

## Permissions

- `administer schema metatag ai`: Administer module configuration
- `generate schema metatag`: Use the AI generation feature

## Developer Notes

### Adding New Schema Types

**Good News**: You don't need to modify this module to add new schema types!

The module automatically discovers and configures new schema types:

1. **Install** the Schema Metatag package, which ships the individual schema submodules (skip this step if it is already installed):
   ```bash
   composer require drupal/schema_metatag
   ```

2. **Enable** the submodule:
   ```bash
   drush en schema_job_posting
   ```

3. **Clear cache** to auto-configure:
   ```bash
   drush cr
   ```

The module will:
- Automatically discover the new schema type
- Generate a custom prompt template with type-specific instructions
- Create default configuration
- Make it available for use immediately

Optional customization:
1. Navigate to `/admin/config/search/schema-metatag-ai`
2. Click on the schema type link to customize:
   - Edit the auto-generated prompt template
   - Add field mappings for more accurate data extraction
   - Override schema form field IDs if needed

### Extending the Module

**Custom Field Mapper**:
```php
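use Drupal\Core\Ajax\AjaxResponse;
// Assumed namespace; adjust to where the module declares the service class.
use Drupal\schema_metatag_ai\SchemaFieldMapperService;

class MyCustomFieldMapper extends SchemaFieldMapperService {
  protected function applyGenericMapping(AjaxResponse $response, array $result, $schema_type, $elem_id) {
    // Your custom mapping logic, then defer to the default mapping.
    parent::applyGenericMapping($response, $result, $schema_type, $elem_id);
  }
}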
```

**Custom Schema Discovery**:
```php
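// Assumed namespace; adjust to where the module declares the service class.
use Drupal\schema_metatag_ai\SchemaTypeDiscoveryService;

class MySchemaDiscovery extends SchemaTypeDiscoveryService {
  public function getAvailableSchemaTypes() {
    $types = parent::getAvailableSchemaTypes();
    // Add, remove, or alter entries here before returning them.
    return $types;
  }
}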
```

### Field Value Extraction

The module automatically handles different field types:
- **String/Text fields**: Direct value extraction
- **Entity reference**: Referenced entity label
- **List fields**: Selected value
- **Date fields**: Formatted date string
- **Complex fields**: String conversion

### Troubleshooting

**Button not appearing?**
1. Verify the user has the `generate schema metatag` permission
2. Confirm content type is selected in module configuration (or leave empty for all)
3. Check that schema metatag fields exist on the content type form
4. Ensure at least one schema_metatag group is enabled (e.g., schema_article)
5. Look for errors in Drupal logs: `drush watchdog:show`

**Fields not populating correctly?**
1. Check browser console for JavaScript errors
2. Review the AJAX response in browser network tab
3. Check Drupal logs for mapping debug messages: `drush watchdog:show --type=schema_metatag_ai`
4. Verify field ID format matches expected pattern (view page source to see actual field IDs)
5. Add custom `schema_field_mappings` in configuration to override field IDs
6. Ensure the AI is returning valid JSON (check logs)

**Schema type not appearing in list?**
1. Ensure the schema_metatag submodule is enabled: `drush pm:list | grep schema`
2. Clear Drupal cache to trigger auto-discovery: `drush cr`
3. Check that the metatag group plugin is properly registered
4. Look for discovery errors in logs

**Content not relevant to schema type warnings?**
This is expected behavior! The module checks if content matches each schema type:
- A Course page won't generate Person or Place schema (intentional)
- Only relevant schema types will be populated
- This prevents generating invalid or inappropriate structured data

**AI generating incorrect data?**
1. Review and customize the auto-generated prompt template for that schema type
2. Add field mappings to guide the AI to the correct Drupal fields
3. Use [FIELD:field_name] placeholders in custom prompts
4. Add more specific instructions in the prompt template
5. Try a more advanced AI model (e.g., gpt-4o instead of gpt-3.5-turbo)

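**Configuration not saving?**
1. Verify the user has the `administer schema metatag ai` permission
2. Check for PHP errors in Drupal logs
3. Saving from the admin UI writes to Drupal's active configuration, not to the file system, so permissions on the sync directory (`sites/default/files/config_*/sync`) only matter when exporting
4. If exporting with `drush config:export` fails, ensure the config sync directory is writable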

## Type-Specific Instructions

The module includes built-in type-specific instructions for the following schema types:

### Content Types
- **Article**: Extracts name, description, author, datePublished, dateModified, publisher, image, articleBody
- **Book**: Focuses on title, author, isbn, publisher, pages, format
- **Recipe**: Extracts ingredients, instructions, prep/cook times, yield, nutrition
- **HowTo**: Captures steps, tools, supplies, total time
- **VideoObject**: Gets title, description, thumbnail, duration, upload date, content URL

### People & Organizations
- **Person**: Extracts name, job title, bio, contact info, affiliation, address
- **Organization**: Gets name, description, contact info, address, founding date, social media

### Places & Businesses
- **Place**: Captures name, address, geo coordinates, contact info, opening hours
- **LocalBusiness**: Similar to Place with additional business-specific fields

### Educational & Professional
- **Course**: Extracts course name, code, level, learning outcomes, duration, provider
- **JobPosting**: Gets job title, description, employer, location, employment type, salary

### Events, Products & Reviews
- **Event**: Captures name, dates, location, organizer, performer, offers, status
- **Product**: Extracts name, description, brand, SKU, offers, ratings, reviews
- **Review**: Gets rating, item reviewed, review body

### Web Content
- **WebPage**: Focuses on name, description, URL, breadcrumb, main entity
- **FAQPage**: Structures questions and answers in mainEntity array

All instructions include:
- **Relevance checking**: Only generate if content matches the schema type
- **Anti-hallucination**: Only extract explicitly stated information
- **No invention**: Don't generate data that isn't in the content

## Anti-Hallucination Features

This module implements multiple safeguards to reduce the risk of AI hallucination:

1. **Relevance Pre-Check**: AI must first determine if content is relevant to the schema type
2. **Explicit Instructions**: Prompts explicitly state "DO NOT hallucinate, invent, or infer"
3. **Omit Missing Data**: AI is instructed to omit properties if data isn't available
4. **Type-Specific Guidance**: Each schema type has clear rules about what to extract
5. **Field Mappings**: Direct the AI to specific Drupal fields for accurate extraction
6. **Content-Only Rule**: "CRITICAL: Only extract information clearly stated in the content"
7. **Validation**: JSON validation ensures proper structure
8. **Skip Irrelevant Schemas**: Module skips population if only @type is returned

Example safeguards in prompts:
```
IMPORTANT: First, determine if the content below is actually about a [Type].
If NOT, return ONLY: {"@type": "[Type]"}

Requirements:
- Populate properties ONLY using information explicitly provided
- DO NOT hallucinate, invent, or infer information
- If a property value is not available, omit that property entirely
- CRITICAL: Only extract information clearly stated in the content
```
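
The "only @type returned" rule lends itself to a simple check on the decoded response. The sketch below shows one way to express it; `schema_result_is_relevant()` is a hypothetical name, and ignoring every `@`-prefixed keyword (not just `@type`) is an assumption made so the check still works after `@context` and `@id` have been added.

```php
<?php

/**
 * Tells whether a decoded AI response carries usable properties.
 *
 * Hypothetical helper: a response holding only JSON-LD keywords such as
 * @type, or only empty values, is treated as "content not relevant".
 */
function schema_result_is_relevant(array $data): bool {
  foreach ($data as $key => $value) {
    $is_keyword = is_string($key) && strncmp($key, '@', 1) === 0;
    if (!$is_keyword && $value !== NULL && $value !== '' && $value !== []) {
      return TRUE;
    }
  }
  return FALSE;
}

$skip = !schema_result_is_relevant(['@type' => 'Person']);
$populate = schema_result_is_relevant(['@type' => 'Course', 'name' => 'BSc Computer Science']);
```

A caller would skip form population and show a "not relevant" warning when the function returns `FALSE`.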

## Summary

Schema Metatag AI is a flexible module that:

1. **Works with any schema type** through dynamic discovery
2. **Auto-configures** with type-specific prompts on cache clear
3. **Reduces hallucination risk** with multiple safeguards
4. **Processes multiple schemas** in a single button click
5. **Maps fields generically** without hardcoding
6. **Checks content relevance** before generating data
7. **Adds @context and @id** automatically
8. **Is customizable** via the admin UI for prompts and mappings
9. **Is maintainable**: adding new schema types requires zero code changes
10. **Includes error handling and logging** to help diagnose generation and mapping problems

It suits sites with rich structured content that need AI-generated Schema.org markup without manual configuration for each schema type. Generated values should still be reviewed before saving.