# PDF Services

This module integrates the [Adobe PDF Services API](https://developer.adobe.com/document-services/apis/pdf-services/) with Drupal, providing enterprise-grade PDF analysis, optimization, and accessibility checking directly within your Drupal site.

- Analysis provides detailed information about PDF properties, including page count, size, and structure.
- Accessibility checking validates PDFs against PDF/UA standards and generates detailed reports of compliance issues.
- Optimization reduces file sizes through configurable compression levels while maintaining document quality.
- Field-level configuration allows for different handling of different content types and field contexts.
- Automated queue processing ensures that PDF analysis and optimization do not impact site performance during content creation.
- Status monitoring dashboard provides visibility into the processing queue and allows for easy management of PDF processing operations.
- Email notifications alert content editors when accessibility issues are detected in their PDFs.
- Views integration allows for easy display of PDF analysis and accessibility reports in custom views.

## Features

- **PDF Document Analysis**: Automatically analyze PDF properties including page count, size, structure, and metadata.

- **Accessibility Compliance Checking**: Validate PDFs against PDF/UA accessibility standards and generate detailed reports of compliance issues. Enhance accessibility by providing customized remediation suggestions that can link to external resources.

- **PDF Optimization and Compression**: Reduce file sizes through configurable compression levels while maintaining document quality. Optionally process only PDFs that exceed a configured average page size threshold, and bypass compression for PDFs that are already linearized.

- **Field-Level Configuration**: Configure PDF processing on a per-field basis, allowing different handling for different content types and field contexts.

- **Automated Queue Processing**: Process PDFs asynchronously through Drupal's queue system to prevent performance impact during content creation.

- **Status Monitoring Dashboard**: Monitor the processing queue status, view statistics, and manage PDF processing operations.

- **Email Notifications**: Automatically notify content editors when accessibility issues are detected in their PDFs.

- **Field Widget Support**: Custom file widget integration for enhanced PDF handling in content forms.

## Post-Installation

After installing the module:

1. Configure API credentials at `/admin/config/content/pdf-services`
   - You'll need a Client ID and Client Secret from Adobe PDF Services

2. Configure processing settings:
   - Batch size for cron processing
   - Retry limits for failed operations (corrupt files, network glitches, etc.)
   - Email notification settings

3. Set up field-level PDF processing:
   - Edit any file field on a content type
   - Configure PDF analysis and optimization options
   - Optionally, enable the PDF preview widget in the form display settings to view PDF accessibility reports directly from the content edit form.

4. Edit content as usual. When PDF files are detected, the module automatically queues them for processing on the next cron run.

5. Monitor processing at `/admin/config/content/pdf-services/queue`
   - View processing status of all PDFs
   - Check accessibility and analysis reports
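
Because step 4 depends on cron, queued PDFs are not processed until cron runs. As a minimal sketch, assuming the module's queue worker runs as part of a normal Drupal cron run, cron can be triggered from code (for example in a deploy script or through `drush php:script`) using Drupal core's cron service. The `mymodule` logger channel is a placeholder name.

```php
// Run all cron tasks through Drupal core's cron service.
// Assumption: the PDF Services queue is processed as part of this run,
// limited by the batch size configured in step 2.
$succeeded = \Drupal::service('cron')->run();

if (!$succeeded) {
  // 'mymodule' is a placeholder logger channel.
  \Drupal::logger('mymodule')->warning('Cron run did not complete.');
}
```

On production sites a scheduled cron job is usually preferable; a manual run like this is mainly useful for testing the configuration.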

## Additional Requirements

- [Adobe PDF Services API](https://developer.adobe.com/document-services/apis/pdf-services/) credentials
  - Free tier available with 500 document transactions per month
  - Paid plans available for higher volume

## Similar projects

- **PDF Preview** (pdf_preview): Focuses on generating thumbnails/previews of PDFs
- **Entity Print** (entity_print): Generates PDFs from Drupal entities rather than analyzing uploaded PDFs
- **PDF Reader** (pdf_reader): Embeds a PDF reader in pages but doesn't provide analysis or optimization
- **PDF** (pdf): pdf.js integration for displaying PDFs in the browser

The PDF Services module is unique in providing deep integration with Adobe's enterprise-grade PDF analysis tools, focusing specifically on accessibility compliance and performance optimization of PDF documents.

## Supporting this Module

This module is maintained by the Tampa.gov development team. If you find it useful, please consider:

- Contributing code improvements through pull requests
- Reporting issues in the issue queue
- Sharing your use cases and feature ideas

## Community Documentation

- [Adobe PDF Services API Documentation](https://developer.adobe.com/document-services/docs/apis/)

## Installation

1. Install the module using Composer:
   ```
   composer require drupal/pdf_services
   ```

2. Enable the module:
   ```
   drush en pdf_services
   ```

3. Configure your Adobe PDF Services API credentials and settings at `/admin/config/content/pdf-services`.

## Local Development Environment

Maintainers can use [DDEV](https://ddev.readthedocs.io/) with the [`ddev/ddev-drupal-contrib`](https://github.com/ddev/ddev-drupal-contrib) project for easy module development. This provides a pre-configured local development environment for Drupal.

To add development-specific dependencies, add them to the `require-dev` section of the `composer.json` file and then run `ddev poser`.

## API Usage

The module provides an alter hook and several services for programmatic PDF processing.

### Alter Hook: hook_pdf_services_should_process_file_alter

Other modules can alter whether a file should be processed by implementing `hook_pdf_services_should_process_file_alter()`. This allows custom logic, such as skipping processing for files attached to unpublished nodes.

**Example implementation:**

```php
/**
 * Implements hook_pdf_services_should_process_file_alter().
 */
function mymodule_pdf_services_should_process_file_alter(&$should_process, $context) {
  // Example: Only process files if the parent node is published.
  $entity = $context['entity'] ?? NULL;
  if ($entity && $entity->getEntityTypeId() === 'node' && method_exists($entity, 'isPublished') && !$entity->isPublished()) {
    $should_process = FALSE;
  }
}
```

**Parameters:**
- `&$should_process` (bool): Whether the file should be processed. The incoming value is the default determined by the module's own logic.
- `$context` (array): Context array with keys:
  - `file` (\Drupal\file\FileInterface): The file entity being evaluated.
  - `entity` (\Drupal\Core\Entity\EntityInterface): The parent entity (e.g., node).
  - `field_name` (string): The field name on the entity.
  - `field_settings` (array): PDF Services field settings for this field.
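
The `field_name` key can drive field-specific rules in the same way. The following is a minimal sketch that skips processing for files uploaded through one particular field. The module name `archive_rules` and the field name `field_archive_documents` are hypothetical, chosen only for illustration.

```php
/**
 * Implements hook_pdf_services_should_process_file_alter().
 *
 * Skips PDF processing for files uploaded through a single field.
 * 'archive_rules' and 'field_archive_documents' are hypothetical names.
 */
function archive_rules_pdf_services_should_process_file_alter(&$should_process, $context) {
  // Fall back to an empty string if the key is missing from the context.
  $field_name = $context['field_name'] ?? '';

  if ($field_name === 'field_archive_documents') {
    $should_process = FALSE;
  }
}
```

Files from any other field keep whatever value `$should_process` had when the hook was invoked.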

### Services

The following example queues a file for processing, loads its accessibility reports, and checks it against a size threshold:

```php
use Drupal\file\Entity\File;

// Load the file entity to work with (123 is a placeholder file ID).
$file_id = 123;
$file = File::load($file_id);

// Queue a PDF file for processing with custom settings.
$pdf_services_manager = \Drupal::service('pdf_services.manager');
$settings = [
  'check_properties' => TRUE,
  'check_accessibility' => TRUE,
  'compression_level' => 'MEDIUM',
];
$pdf_services_manager->createProcessingStatus($file, $settings);

// Get the accessibility reports for the PDF file.
$accessibility_reports = \Drupal::entityTypeManager()
  ->getStorage('pdf_accessibility_result')
  ->loadByProperties(['fid' => $file_id]);

// Check whether the file exceeds the size threshold.
$analysis_service = \Drupal::service('pdf_services.client');
$needs_optimization = $analysis_service->exceedsSizeThreshold($file, 500000);
```
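
Since `loadByProperties()` returns an array of entities that is empty when nothing matches, the report lookup can be wrapped in a small helper. This is a sketch only: `mymodule_has_accessibility_result()` is a hypothetical function, not part of the module, and the storage is passed in as an argument so the helper can be exercised without a full Drupal bootstrap.

```php
/**
 * Checks whether any accessibility result exists for a file.
 *
 * Hypothetical helper; not provided by the PDF Services module.
 *
 * @param object $storage
 *   The 'pdf_accessibility_result' entity storage. On a real site this is a
 *   \Drupal\Core\Entity\EntityStorageInterface instance.
 * @param int $file_id
 *   The file ID to look up.
 *
 * @return bool
 *   TRUE if at least one result entity references the file.
 */
function mymodule_has_accessibility_result($storage, int $file_id): bool {
  // An empty array means no report has been stored for this file.
  return !empty($storage->loadByProperties(['fid' => $file_id]));
}

// Usage inside Drupal:
// $storage = \Drupal::entityTypeManager()->getStorage('pdf_accessibility_result');
// $has_report = mymodule_has_accessibility_result($storage, 123);
```

A missing result does not necessarily mean the PDF passed its checks; the file may still be waiting in the processing queue.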
