# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Common Development Commands

### Build and Test
- **Build project**: `./gradlew clean build`
- **Run locally**: `./gradlew bootRun`
- **Full test suite**: `./test.sh` (builds all Docker variants and runs comprehensive tests)
- **Code formatting**: `./gradlew spotlessApply` (runs automatically before compilation)

### Docker Development
- **Build ultra-lite**: `docker build -t stirlingtools/stirling-pdf:latest-ultra-lite -f ./Dockerfile.ultra-lite .`
- **Build standard**: `docker build -t stirlingtools/stirling-pdf:latest -f ./Dockerfile .`
- **Build fat version**: `docker build -t stirlingtools/stirling-pdf:latest-fat -f ./Dockerfile.fat .`
- **Example compose files**: Located in `exampleYmlFiles/` directory

### Security Mode Development
Set the `DOCKER_ENABLE_SECURITY=true` environment variable to enable security features during development. This is required for testing the full version locally.

### Frontend Development
- **Frontend dev server**: `cd frontend && npm run dev` (requires backend on localhost:8080)
- **Tech Stack**: Vite + React + TypeScript + Mantine UI + TailwindCSS
- **Proxy Configuration**: Vite proxies `/api/*` calls to backend (localhost:8080)
- **Build Process**: DO NOT run build scripts manually - builds are handled by CI/CD pipelines
- **Package Installation**: DO NOT run npm install commands - package management handled separately
- **Deployment Options**:
  - **Desktop App**: `npm run tauri-build` (native desktop application)
  - **Web Server**: `npm run build` then serve dist/ folder
  - **Development**: `npm run tauri-dev` for desktop dev mode
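
The `/api/*` proxy listed above might look roughly like the following in `vite.config.ts`. This is a minimal sketch, not the repository's actual config: the `devServerConfig` name and the `changeOrigin` setting are assumptions, and only the `/api` prefix and the two ports come from this document. In a real config the object would be passed to `defineConfig` from `vite`.

```typescript
// Sketch of a Vite dev-server proxy; devServerConfig is a hypothetical name.
// In vite.config.ts this would be wrapped as: export default defineConfig(devServerConfig)
const devServerConfig = {
  server: {
    port: 5173, // frontend dev server port (see Development Workflow)
    proxy: {
      // Forward any request whose path starts with /api to the Spring Boot backend.
      '/api': {
        target: 'http://localhost:8080',
        changeOrigin: true, // assumption: rewrite the Host header to match the target
      },
    },
  },
};
```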

#### Multi-Tool Workflow Architecture
Frontend designed for **stateful document processing**:
- Users upload PDFs once, then chain tools (split → merge → compress → view)
- File state and processing results persist across tool switches
- No file reloading between tools - performance critical for large PDFs (up to 100GB+)

#### FileContext - Central State Management
**Location**: `src/contexts/FileContext.tsx`
- **Active files**: Currently loaded PDFs and their variants
- **Tool navigation**: Current mode (viewer/pageEditor/fileEditor/toolName)
- **Memory management**: PDF document cleanup, blob URL lifecycle, Web Worker management
- **IndexedDB persistence**: File storage with thumbnail caching
- **Preview system**: Tools can preview results (e.g., Split → Viewer → back to Split) without context pollution

**Critical**: All file operations go through FileContext. Don't bypass with direct file handling.

#### Processing Services
- **enhancedPDFProcessingService**: Background PDF parsing and manipulation
- **thumbnailGenerationService**: Web Worker-based with main-thread fallback
- **fileStorage**: IndexedDB with LRU cache management
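
To illustrate the LRU idea behind `fileStorage`, here is a minimal in-memory sketch. It is not the project's implementation: the `LruCache` class and its entry-count eviction policy are assumptions, and the real service persists to IndexedDB rather than a `Map`.

```typescript
// Hypothetical in-memory LRU cache; the real fileStorage service uses IndexedDB.
class LruCache<V> {
  // Map preserves insertion order, so the first key is the least recently used.
  private entries = new Map<string, V>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    // Delete first so an existing key moves to the most-recent position.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}
```

Reading an entry counts as a use, so a thumbnail that was just displayed outlives one that was only written earlier.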

#### Memory Management Strategy
**Why manual cleanup exists**: Large PDFs (up to 100GB+) passed through multiple tools accumulate:
- PDF.js documents that need explicit `.destroy()` calls
- Blob URLs from tool outputs that need revocation
- Web Workers that need termination

Without cleanup, these resources leak memory until the browser crashes.
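
One way to keep the three kinds of cleanup together is a per-file registry of dispose callbacks. The sketch below is illustrative only: `ResourceRegistry`, `track`, and `release` are hypothetical names, not part of FileContext, and the actual cleanup code in the repository may be organized differently.

```typescript
// Hypothetical helper, not part of the Stirling-PDF codebase.
type Disposer = () => void;

class ResourceRegistry {
  private disposers = new Map<string, Disposer[]>();

  // Register a cleanup callback for a file, for example:
  //   () => { void pdfDocument.destroy(); }  // PDF.js document
  //   () => URL.revokeObjectURL(blobUrl)     // blob URL from a tool output
  //   () => worker.terminate()               // Web Worker
  track(fileId: string, dispose: Disposer): void {
    const list = this.disposers.get(fileId) ?? [];
    list.push(dispose);
    this.disposers.set(fileId, list);
  }

  // Run every callback for the file in reverse registration order.
  // A failing callback is recorded but does not stop the remaining ones.
  release(fileId: string): { released: number; errors: unknown[] } {
    const list = this.disposers.get(fileId) ?? [];
    this.disposers.delete(fileId);
    const errors: unknown[] = [];
    let released = 0;
    for (const dispose of [...list].reverse()) {
      try {
        dispose();
        released += 1;
      } catch (err) {
        errors.push(err);
      }
    }
    return { released, errors };
  }
}
```

Calling `release` a second time for the same file is a no-op, which makes it safe to invoke from both a tool-switch handler and a component unmount.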

#### Tool Development

**Architecture**: Modular hook-based system with clear separation of concerns:

- **useToolOperation** (`frontend/src/hooks/tools/shared/useToolOperation.ts`): Main orchestrator hook
  - Coordinates all tool operations with consistent interface
  - Integrates with FileContext for operation tracking
  - Handles validation, error handling, and UI state management

- **Supporting Hooks**:
  - **useToolState**: UI state management (loading, progress, error, files)
  - **useToolApiCalls**: HTTP requests and file processing
  - **useToolResources**: Blob URLs, thumbnails, ZIP downloads

- **Utilities**:
  - **toolErrorHandler**: Standardized error extraction and i18n support
  - **toolResponseProcessor**: API response handling (single/zip/custom)
  - **toolOperationTracker**: FileContext integration utilities

**Three Tool Patterns**:

**Pattern 1: Single-File Tools** (Individual processing)
- Backend processes one file per API call
- Set `multiFileEndpoint: false`
- Examples: Compress, Rotate
```typescript
return useToolOperation({
  operationType: 'compress',
  endpoint: '/api/v1/misc/compress-pdf',
  buildFormData: (params, file: File) => { /* single file */ },
  multiFileEndpoint: false,
});
```

**Pattern 2: Multi-File Tools** (Batch processing)
- Backend accepts `MultipartFile[]` arrays in single API call
- Set `multiFileEndpoint: true`
- Examples: Split, Merge, Overlay
```typescript
return useToolOperation({
  operationType: 'split',
  endpoint: '/api/v1/general/split-pages',
  buildFormData: (params, files: File[]) => { /* all files */ },
  multiFileEndpoint: true,
  filePrefix: 'split_',
});
```
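
The practical difference between Pattern 1 and Pattern 2 is how many API calls are issued for a given selection of files. The sketch below illustrates that idea only: `planRequests` is a hypothetical helper, not a function in `useToolOperation`, and it works on file names rather than `File` objects to stay self-contained.

```typescript
// Hypothetical helper: groups inputs into one batch per API call.
function planRequests<T>(files: T[], multiFileEndpoint: boolean): T[][] {
  if (files.length === 0) return [];
  // Multi-file endpoints receive every file in a single request;
  // single-file endpoints get one request per file.
  return multiFileEndpoint ? [files.slice()] : files.map((file) => [file]);
}

const exampleNames = ['a.pdf', 'b.pdf', 'c.pdf'];
const singleFileCalls = planRequests(exampleNames, false); // three batches of one file
const multiFileCall = planRequests(exampleNames, true); // one batch of three files
```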

**Pattern 3: Complex Tools** (Custom processing)
- Tools with complex routing logic or non-standard processing
- Provide `customProcessor` for full control
- Examples: Convert, OCR
```typescript
return useToolOperation({
  operationType: 'convert',
  customProcessor: async (params, files) => { /* custom logic */ },
});
```

**Benefits**:
- **No Timeouts**: Operations run until completion (supports 100GB+ files)
- **Consistent**: All tools follow same pattern and interface
- **Maintainable**: Single responsibility hooks, easy to test and modify
- **i18n Ready**: Built-in internationalization support
- **Type Safe**: Full TypeScript support with generic interfaces
- **Memory Safe**: Automatic resource cleanup and blob URL management

## Architecture Overview

### Project Structure
- **Backend**: Spring Boot application with Thymeleaf templating
- **Frontend**: React-based SPA in `/frontend` directory (Thymeleaf templates fully replaced)
  - **File Storage**: IndexedDB for client-side file persistence and thumbnails
  - **Internationalization**: JSON-based translations (converted from backend .properties)
- **PDF Processing**: PDFBox for core PDF operations, LibreOffice for conversions, PDF.js for client-side rendering
- **Security**: Spring Security with optional authentication (controlled by `DOCKER_ENABLE_SECURITY`)
- **Configuration**: YAML-based configuration with environment variable overrides

### Controller Architecture
- **API Controllers** (`src/main/java/.../controller/api/`): REST endpoints for PDF operations
  - Organized by function: converters, security, misc, pipeline
  - Follow pattern: `@RestController` + `@RequestMapping("/api/v1/...")`
- **Web Controllers** (`src/main/java/.../controller/web/`): Serve Thymeleaf templates
  - Pattern: `@Controller` + return template names

### Key Components
- **SPDFApplication.java**: Main application class with desktop UI and browser launching logic
- **ConfigInitializer**: Handles runtime configuration and settings files
- **Pipeline System**: Automated PDF processing workflows via `PipelineController`
- **Security Layer**: Authentication, authorization, and user management (when enabled)

### Component Architecture
- **React Components**: Located in `frontend/src/components/` and `frontend/src/tools/`
- **Static Assets**: CSS, JS, and resources in `src/main/resources/static/` (legacy) + `frontend/public/` (modern)
- **Internationalization**:
  - Backend: `messages_*.properties` files
  - Frontend: JSON files in `frontend/public/locales/` (converted from .properties)
  - Conversion Script: `scripts/convert_properties_to_json.py`

### Configuration Modes
- **Ultra-lite**: Basic PDF operations only
- **Standard**: Full feature set
- **Fat**: Pre-downloaded dependencies for air-gapped environments
- **Security Mode**: Adds authentication, user management, and enterprise features

### Testing Strategy
- **Integration Tests**: Cucumber tests in `testing/cucumber/`
- **Docker Testing**: `test.sh` validates all Docker variants
- **Manual Testing**: No unit tests currently - relies on UI and API testing

## Development Workflow

1. **Local Development**:
   - Backend: `./gradlew bootRun` (runs on localhost:8080)
   - Frontend: `cd frontend && npm run dev` (runs on localhost:5173, proxies to backend)
2. **Docker Testing**: Use `./test.sh` before submitting PRs
3. **Code Style**: Spotless enforces Google Java Format automatically
4. **Translations**:
   - Backend: Use helper scripts in `/scripts` for multi-language updates
   - Frontend: Update JSON files in `frontend/public/locales/` or use conversion script
5. **Documentation**: API docs auto-generated and available at `/swagger-ui/index.html`

## Frontend Architecture Status

- **Core Status**: React SPA architecture complete with multi-tool workflow support
- **State Management**: FileContext handles all file operations and tool navigation
- **File Processing**: Production-ready with memory management for large PDF workflows (up to 100GB+)
- **Tool Integration**: Modular hook architecture with `useToolOperation` orchestrator
  - Individual hooks: `useToolState`, `useToolApiCalls`, `useToolResources`
  - Utilities: `toolErrorHandler`, `toolResponseProcessor`, `toolOperationTracker`
  - Pattern: Each tool creates focused operation hook, UI consumes state/actions
- **Preview System**: Tool results can be previewed without polluting file context (Split tool example)
- **Performance**: Web Worker thumbnails, IndexedDB persistence, background processing

## Translation Rules

- **CRITICAL**: Always update translations in `en-GB` only, never `en-US`
- Translation files are located in `frontend/public/locales/`

## Important Notes

- **Java Version**: Minimum JDK 17; JDK 21 is supported and recommended
- **Lombok**: Used extensively - ensure IDE plugin is installed
- **Desktop Mode**: Set `STIRLING_PDF_DESKTOP_UI=true` for desktop application mode
- **File Persistence**:
  - **Backend**: Designed to be stateless - files are processed in memory/temp locations only
  - **Frontend**: Uses IndexedDB for client-side file storage and caching (with thumbnails)
- **Security**: When `DOCKER_ENABLE_SECURITY=false`, security-related classes are excluded from compilation
- **FileContext**: All file operations MUST go through FileContext - never bypass with direct File handling
- **Memory Management**: Manual cleanup required for PDF.js documents and blob URLs - don't remove cleanup code
- **Tool Development**: New tools should follow `useToolOperation` hook pattern (see `useCompressOperation.ts`)
- **Performance Target**: Must handle PDFs up to 100GB+ without browser crashes
- **Preview System**: Tools can preview results without polluting main file context (see Split tool implementation)
- **Adding Tools**: See `ADDING_TOOLS.md` for complete guide to creating new PDF tools

## Communication Style
- Be direct and to the point
- No apologies or conversational filler
- Answer questions directly without preamble
- Explain reasoning concisely when asked
- Avoid unnecessary elaboration

## Decision Making
- Ask clarifying questions before making assumptions
- Stop and ask when uncertain about project-specific details
- Confirm approach before making structural changes
- Request guidance on preferences (cross-platform vs specific tools, etc.)
- Verify understanding of requirements before proceeding