CipherStream Enterprise Data Extraction Platform
About CipherStream
CipherStream is Centaur Software's proprietary enterprise platform for high-performance bulk data extraction from cloud-hosted databases, with intelligent processing optimisation.
Key Features
- 38 Specialised Data Endpoints: Complete coverage of Dental4Web data
- Intelligent Processing: Auto-switching between streaming (≤100K rows) and job processing (>100K rows)
- Enterprise Security: AES-256-GCM encryption, API key authentication, IP whitelisting
- Multiple Formats: JSON, NDJSON, CSV with optional GZIP/ZIP compression
- Real-time & Batch: Sub-second streaming for small datasets, S3-backed jobs for large datasets
- Webhook Integration: Real-time notifications for job completion
Authentication
API Key Types
| Key Type | Format | Access Level | Use Case |
| --- | --- | --- | --- |
| Customer Live | | Production API access | Live data extraction |
| Customer Demo | | Demo/testing access | Testing and evaluation |
Headers Required
```
Authorization: Bearer <your-api-key>
Content-Type: application/json
```
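A minimal authenticated request sketch in Python using the requests library. The `/api/v1/extract/appointments` path is a hypothetical illustration, not a route documented on this page; substitute one of the 38 documented endpoints.
```python
import requests

BASE_URL = "https://cipherstream.centaur.software"
API_KEY = "<your-api-key>"

HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Hypothetical route -- substitute a real endpoint from the endpoint reference.
resp = requests.post(
    f"{BASE_URL}/api/v1/extract/appointments",
    headers=HEADERS,
    json={"from_date": "2024-01-01", "to_date": "2024-12-31"},
)
resp.raise_for_status()
```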
Execution Modes
CipherStream automatically determines the optimal processing method based on data volume; a request sketch showing how to select a mode follows the three mode descriptions below.
Auto Mode (Recommended)
- ≤100K rows: Streaming response (immediate)
- >100K rows: Background job with S3 download
- >50MB estimated: Automatic job mode regardless of row count
- System decides: Based on current load and data characteristics
Stream Mode
- Force streaming: Real-time response for all requests
- Recommended limit: ≤100K rows for optimal performance
- Timeout: 30 seconds maximum
- Use case: Small datasets requiring immediate response
Job Mode
- Force background processing: All requests create jobs
- S3 storage: Secure file storage with presigned URLs
- Extended timeout: 180 seconds for large extractions
- Use case: Large datasets, scheduled extractions
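A request-body sketch for selecting a mode. The `execution_mode` field name and its values are assumptions inferred from the mode names above, not confirmed field names.
```python
base_body = {"from_date": "2024-01-01", "to_date": "2024-12-31"}

# "execution_mode" is an assumed field name based on the modes above.
auto_body = {**base_body, "execution_mode": "auto"}      # platform decides
stream_body = {**base_body, "execution_mode": "stream"}  # force streaming (<=100K rows recommended)
job_body = {**base_body, "execution_mode": "job"}        # force an S3-backed background job
```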
Date Modifiers
Control which records are returned based on their creation and modification dates:
Available Options
- All (default): Returns all records within the date range regardless of creation/update dates
- Created: Returns only records created within the specified date range
- Updated: Returns only records updated within the specified date range
Usage Examples
```json
{
  "date_modifier": "Created",
  "from_date": "2024-01-01",
  "to_date": "2024-12-31"
}
```
Behavior by Procedure Type
- Reference Data: Date modifiers not applicable (always returns current data)
- Data Procedures: Uses ts_4_insert (Created) and ts_4_update (Updated) fields
- Special Procedures: Applies to underlying table timestamp fields
Output Formats
NDJSON (Default)
- Format: Newline-delimited JSON
- Structure: One JSON object per line
- Benefits: Streaming-friendly, memory efficient
- Use case: Large datasets, real-time processing
```
{"id": 1, "name": "John Doe", "date": "2024-01-01"}
{"id": 2, "name": "Jane Smith", "date": "2024-01-02"}
```
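Because NDJSON puts one complete object per line, a client can parse records as they arrive. A minimal streaming consumer, reusing `BASE_URL` and `HEADERS` from the authentication sketch above; the route is again hypothetical.
```python
import json
import requests

url = f"{BASE_URL}/api/v1/extract/appointments"  # hypothetical route, as above
body = {"from_date": "2024-01-01", "to_date": "2024-12-31"}

# Stream the response and parse one record per line, so memory use stays
# flat no matter how large the extraction is.
with requests.post(url, headers=HEADERS, json=body, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:                      # skip keep-alive blank lines
            record = json.loads(line)
            print(record["id"])       # replace with your own handler
```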
JSON
- Format: Standard JSON array
- Structure: Array of objects with metadata
- Benefits: Standard format, easy parsing
- Use case: Small to medium datasets, standard integrations
```json
{
  "data": [
    {"id": 1, "name": "John Doe", "date": "2024-01-01"},
    {"id": 2, "name": "Jane Smith", "date": "2024-01-02"}
  ],
  "metadata": {"execution_time": 2.5, "record_count": 2}
}
```
CSV
- Format: Comma-separated values
- Structure: Header row followed by data rows
- Benefits: Universal compatibility, Excel-friendly
- Use case: Reporting, data analysis, spreadsheet import
```
id,name,date
1,John Doe,2024-01-01
2,Jane Smith,2024-01-02
```
Compression Options
None (Default)
- No compression: Raw output format
- Use case: Small datasets, immediate processing
- Available in: Streaming and Job modes
GZIP
- Compression ratio: 70-96% size reduction (tested)
- Format: .gz compressed files
- Smart threshold: Only compresses data >8KB for optimal performance
- Available in: Streaming mode and Job mode
- Use case: Large datasets, bandwidth optimisation
- HTTP Header: Content-Encoding: gzip
ZIP
- Compression ratio: 60-97% size reduction (tested)
- Format: .zip archive files with proper structure
- Smart threshold: Only compresses data >8KB for optimal performance
- Available in: Streaming mode and Job mode
- Use case: Multiple files, Windows compatibility
- HTTP Header: Content-Encoding: deflate
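A client-side sketch covering both delivery paths, reusing `url` and `HEADERS` from the sketches above. The `compression` field name is an assumption inferred from the options above, not a confirmed parameter.
```python
import gzip
import requests

# "compression" is an assumed field name inferred from the options above.
body = {"from_date": "2024-01-01", "to_date": "2024-12-31", "compression": "gzip"}

# Streaming mode: requests decodes Content-Encoding: gzip automatically,
# so resp.text is already the plain NDJSON/CSV payload.
resp = requests.post(url, headers=HEADERS, json=body)
resp.raise_for_status()

# Job mode: results download from S3 as .gz files and need an explicit step.
with open("result.ndjson.gz", "rb") as f:
    records = gzip.decompress(f.read())
```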
Security & Encryption
Data in Transit
- TLS 1.3: All API communications encrypted
- Certificate pinning: Enhanced security for production
- HSTS: HTTP Strict Transport Security enabled
Data at Rest
- AES-256-GCM: S3 bucket encryption for job results
- Presigned URLs: Time-limited access (12 hours)
- Automatic cleanup: Files removed after expiry
End-to-End Encryption
- AES-256-GCM: Customer-specific encryption keys
- Available in: Streaming mode and Job mode
- Chunk-based: Each data chunk encrypted individually
- Key rotation: Supported for customer keys
- Usage: Set `"encrypted": true` in the request body
- Format: `{"encrypted_data": "key_id:iv:ciphertext"}`
Authentication
- Bearer tokens: API key-based authentication
- IP whitelisting: Optional IP restriction
- Rate limiting: Prevents abuse and ensures fair usage
Webhook System
Supported Events
- job.completed: Job finished successfully with download URL
- job.failed: Job encountered an error with details
- job.cancelled: Job was manually cancelled
Security Features
- HMAC-SHA256: Payload signature verification (a verification sketch follows the payload example below)
- Automatic retries: Up to 3 attempts with exponential backoff
- Delivery tracking: Monitor success/failure rates
- Timeout handling: 30-second response timeout
Webhook Payload Example
```json
{
  "event": "job.completed",
  "timestamp": "2024-09-26T10:02:15Z",
  "signature": "sha256=abc123...",
  "data": {
    "job_id": "appointments_1758630144658_c7511999",
    "customer_id": "your-customer-id",
    "status": "completed",
    "rows_processed": 45678,
    "execution_time_seconds": 64.75,
    "s3_url": "https://secure-download-url",
    "expires_at": "2024-09-26T22:02:15Z",
    "file_size_bytes": 2048576,
    "output_format": "ndjson",
    "compression": "gzip"
  }
}
```
Job Management Lifecycle
Job States
- Queued: Job created and waiting for processing
- Running: Data extraction in progress with real-time progress updates
- Completed: Data available for download via S3 presigned URL
- Failed: Error occurred during processing with detailed error information
- Cancelled: Job manually cancelled by user or system timeout
Job Features
- Progress tracking: Real-time percentage completion (0-100%)
- Row counting: Live count of processed records
- Time estimation: Estimated completion time based on current progress
- Resource monitoring: Memory and CPU usage tracking
- Error handling: Detailed error messages and recovery suggestions
Download Management
- S3 presigned URLs: Secure, time-limited download links
- 12-hour expiry: URLs automatically expire for security
- Resume support: Partial download recovery for large files
- Bandwidth optimisation: CDN-accelerated downloads
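The lifecycle above maps naturally onto a poll-then-download loop. In this sketch the `/api/v1/jobs/{job_id}` route is a hypothetical illustration; the `status` and `s3_url` fields mirror the webhook payload shown earlier.
```python
import time
import requests

BASE_URL = "https://cipherstream.centaur.software"

def wait_and_download(job_id: str, headers: dict) -> bytes:
    """Poll a job until it reaches a terminal state, then fetch the result."""
    while True:
        # Hypothetical status route -- not a documented path on this page.
        job = requests.get(f"{BASE_URL}/api/v1/jobs/{job_id}", headers=headers).json()
        if job["status"] == "completed":
            # Presigned URLs need no Authorization header; they expire after 12 hours.
            return requests.get(job["s3_url"]).content
        if job["status"] in ("failed", "cancelled"):
            raise RuntimeError(f"Job {job_id} ended as {job['status']}")
        time.sleep(5)  # still queued or running -- poll again
```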
Data Procedures Overview
Reference Data
- Always streamed: Immediate response (no job creation)
- 5-minute cache: Optimal performance for frequently accessed data
- No parameters required: Simple, consistent access pattern
- Small datasets: Typically <1000 records per procedure
- Use cases: Dropdown population, validation, system configuration
Data Procedures
- Intelligent mode selection: Auto streaming/job based on volume
- Date range filtering: Flexible from_date/to_date parameters
- Date modifier support: Created, Updated, or All records
- High-volume capable: Handles millions of records efficiently
- Use cases: Bulk data extraction, reporting, analytics
Special Procedures
- CALL syntax: Advanced stored procedure execution
- Direct table access: Extract from any accessible database table
- Extended timeout: 180-second timeout for large extractions
- Flexible parameters: Custom table names and date filtering
- Use cases: Custom extractions, ad-hoc queries, data migration
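A purely illustrative request body for a special-procedure extraction. Every field name here (`procedure`, `table_name`) is hypothetical, since this page documents the capabilities but not the request schema.
```python
# All field names below are hypothetical -- this page does not document
# the special-procedure request schema.
special_body = {
    "procedure": "CALL extract_table",  # CALL-syntax stored procedure execution
    "table_name": "custom_table",       # direct table access
    "from_date": "2024-01-01",
    "to_date": "2024-12-31",
}
```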
Performance Characteristics
Streaming Response
- Latency: <2 seconds for reference data
- Throughput: Up to 10,000 records/second
- Memory usage: Constant memory footprint
- Concurrent requests: Up to 10 simultaneous streams
Job Processing
- Throughput: Up to 100,000 records/second
- Scalability: Auto-scaling based on queue depth
- Resource allocation: Dedicated processing resources
- Monitoring: Real-time progress and performance metrics
Error Handling
HTTP Status Codes
- 200: Success - Data returned or job created
- 400: Bad Request - Invalid parameters or format
- 401: Unauthorized - Invalid or missing API key
- 403: Forbidden - Access denied or rate limited
- 404: Not Found - Endpoint or resource not found
- 429: Too Many Requests - Rate limit exceeded
- 500: Internal Server Error - System error occurred
Error Response Format
```json
{
  "error": {
    "code": "INVALID_DATE_RANGE",
    "message": "The specified date range is invalid",
    "details": "from_date must be earlier than to_date",
    "timestamp": "2024-09-26T10:00:00Z",
    "request_id": "req_abc123"
  }
}
```
Environment URLs
- Production: https://cipherstream.centaur.software