Concept of Operations (ConOps) - Deep Water
Advanced operational modeling for complex systems.
Operational Context Diagrams (Users, Systems, Data Flows)
Create visual diagrams showing:
- All user types and their roles
- All systems that interact with yours
- Data flows between components
- Trust boundaries and security zones
This becomes the authoritative reference for how everything fits together. When someone asks “where does customer data come from?”, you point to the diagram.
What to include:
- External systems (with ownership/contact info)
- Data stores (with classification levels)
- Network boundaries (DMZ, internal, external)
- User interaction points
- Automated processes and scheduled jobs
Mission Thread Analysis for Critical Paths
A mission thread is an end-to-end critical workflow that must work reliably.
Example - Healthcare system:
Emergency medication order thread:
1. Doctor searches for medication (must complete <2s)
2. System checks patient allergies (must be real-time, no cached data)
3. Doctor confirms order (must work even if billing system is down)
4. Order appears in pharmacy queue (must not be lost, even if network hiccups)
5. Pharmacist verifies and dispenses
For each mission thread:
- Identify every component involved
- Document failure modes at each step
- Define recovery procedures
- Specify performance requirements
- Determine acceptable degraded operation
Mission-critical threads get extra scrutiny, redundancy, monitoring.
Error Recovery and Exception Handling Scenarios
Don’t just document happy path. Document what happens when things break:
Data entry errors:
- User enters invalid date format
- User submits form with missing required field
- User uploads file that’s too large
System errors:
- Database connection timeout
- External API returns 500 error
- Disk space runs out mid-operation
Business logic errors:
- User tries to approve their own request
- Conflicting concurrent edits
- Stale data from caching
For each error scenario:
- What does the user see?
- What gets logged?
- Can the user retry or recover?
- Who gets notified?
Handoff Points Between Human and Automated Processes
Map where automation starts and stops:
Example - Invoice processing:
Human: Uploads scanned invoice PDF
→ Automated: OCR extracts text, identifies vendor, amount, date
→ Automated: Checks if amount exceeds approval threshold
→ Human (if over threshold): Reviews and approves
→ Automated: Routes to accounting system
→ Automated: Schedules payment
→ Human: Receives confirmation email
At each handoff:
- What information transfers?
- What if automation fails partway through?
- Can human override or intervene?
- How does human know it’s their turn?
Poor handoff design causes work to fall through cracks.
Regulatory/Compliance Workflow Requirements
Some workflows have regulatory requirements:
Healthcare (HIPAA):
- Audit every access to patient records
- Require reason for access
- Alert patient of emergency access overrides
- Retain audit logs for 7 years
Finance (SOX):
- Segregation of duties (creator can’t be approver)
- Multi-level approval for high-value transactions
- Immutable audit trail
Privacy (GDPR):
- User can request all their data
- User can request deletion
- Track consent for each data processing purpose
Document these requirements in ConOps so they’re baked into workflows, not bolted on later.
Multi-Tenant or Multi-Organization Scenarios
If your system serves multiple customers/organizations:
Data isolation:
- Can users from Org A see Org B’s data? (Should be never)
- Can admin user see across orgs? (Maybe, depends on your model)
- What happens when user belongs to multiple orgs?
Configuration:
- Does each org have different approval workflows?
- Different integration endpoints?
- Different branding/customization?
Operations:
- Can you deploy updates to one org at a time?
- Do you need per-org rate limiting?
- How do you handle org-specific bugs?
Multi-tenancy sounds simple until you map all the scenarios where “tenant X” and “tenant Y” interact differently with the same code.
Disaster Recovery and Business Continuity Operations
What happens in truly bad scenarios?
Data center outage:
- How do users fail over to backup region?
- Automatic or manual failover?
- How do you prevent split-brain scenarios?
- What’s the recovery point objective (how much data loss is acceptable)?
Security breach:
- Who has authority to take system offline?
- How do you notify users?
- What’s the procedure for forensic analysis?
- How do you restore from clean backup?
Major bug in production:
- Can you roll back? How quickly?
- What’s the procedure for emergency hotfix?
- Who must approve emergency change?
Document these scenarios. In a crisis, nobody can think clearly. Having a documented procedure means people follow the plan instead of making it up under pressure.