Abstract
Cataract surgery is one of the most commonly performed procedures worldwide, with over 26 million surgeries performed annually. Despite standardized techniques, intraoperative complications still occur due to surgeon variability and patient-related factors. Such complications pose significant risks to visual outcomes and, in severe cases, may cause permanent vision loss. We propose CataCompDetect, a complication detection framework that combines complication-specific risk scoring with vision-language reasoning. The framework first restricts analysis to the relevant surgical phases using expert-derived priors, then scores risk to locate video segments exhibiting anatomical cues suggestive of complications, and finally applies a vision-language model for classification. To evaluate CataCompDetect, we introduce CataComp-104, the first cataract surgery video dataset annotated for intraoperative complications, comprising 104 surgeries with 44 complication cases across three clinically significant types: iris prolapse, posterior capsule rupture (PCR), and vitreous loss. CataCompDetect achieves an average F1 score of 60.05%, with per-complication F1 scores of 75.00% (iris prolapse), 57.14% (PCR), and 48.00% (vitreous loss).
Target Complications
CataCompDetect targets three clinically significant intraoperative complications in manual small-incision cataract surgery (MSICS).
Iris Prolapse
Protrusion of iris tissue through the corneal incision, appearing as dark brownish-red tissue extending beyond the incision margin. Caused by fluctuations in intraocular pressure or incision instability.
F1: 75.00%
Posterior Capsule Rupture (PCR)
A rupture in the thin posterior capsule supporting the lens, typically occurring during lens removal or cortical wash. Appears as a visible tear in the capsule. Can lead to vitreous loss if unmanaged.
F1: 57.14%
Vitreous Loss
Occurs when the gel-like vitreous prolapses into the anterior chamber, often following PCR. Appears as translucent, web-like strands through the pupil, causing characteristic tear-drop pupil distortion.
F1: 48.00%
Complication Examples
Representative frames from CataComp-104 illustrating each intraoperative complication.
Method: CataCompDetect
A three-stage pipeline integrating surgical phase-aware localization, anatomical risk scoring, and vision-language classification.
Phase Localization
Surgical phase predictions (MS-TCN++) narrow analysis to phases where each complication is most likely. PCR/Vitreous Loss: cortical wash. Iris Prolapse: entire video.
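The phase-aware restriction can be sketched as below. This is a minimal illustration, assuming per-frame phase labels are already available as strings (for example, decoded from MS-TCN++ output). The label `"cortical_wash"`, the `PHASES_OF_INTEREST` table, and the `candidate_frames` helper are hypothetical names chosen for this example, not identifiers from the authors' code.

```python
from typing import Dict, List, Optional, Set

# Hypothetical phase labels. None means "analyze the entire video".
PHASES_OF_INTEREST: Dict[str, Optional[Set[str]]] = {
    "iris_prolapse": None,
    "pcr": {"cortical_wash"},
    "vitreous_loss": {"cortical_wash"},
}


def candidate_frames(phase_per_frame: List[str], complication: str) -> List[int]:
    """Return indices of frames whose predicted phase is relevant to the complication."""
    allowed = PHASES_OF_INTEREST[complication]
    if allowed is None:
        return list(range(len(phase_per_frame)))
    return [i for i, phase in enumerate(phase_per_frame) if phase in allowed]


phases = ["incision", "incision", "cortical_wash", "cortical_wash", "lens_insertion"]
print(candidate_frames(phases, "pcr"))            # [2, 3]
print(candidate_frames(phases, "iris_prolapse"))  # [0, 1, 2, 3, 4]
```

Keeping the mapping in a plain table makes the expert prior explicit and easy to adjust per complication.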
Anatomical Risk Scoring
Per-frame risk scores computed from iris/pupil segmentation masks using SAM 2 and TernausNet. Temporal sliding window identifies high-risk segments.
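One way a temporal sliding window over per-frame risk scores might look is sketched here. The window length, stride, and mean aggregation are assumptions made for illustration; the page does not state the values or the aggregation rule used. `window_scores` is a hypothetical helper name.

```python
from typing import List, Tuple


def window_scores(risk: List[float], window: int, stride: int) -> List[Tuple[int, int, float]]:
    """Slide a fixed-length window over per-frame risk scores.

    Returns (start, end_exclusive, mean_risk) for every window position.
    Mean aggregation is an assumption, not the documented rule.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    out = []
    # If the video is shorter than one window, a single truncated window is used.
    for start in range(0, max(len(risk) - window, 0) + 1, stride):
        chunk = risk[start:start + window]
        if chunk:
            out.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    return out


per_frame_risk = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(window_scores(per_frame_risk, window=2, stride=2))
# [(0, 2, 0.0), (2, 4, 1.0), (4, 6, 0.0)]
```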
VLM Classification
Top-5 high-risk segments per video are verified by GPT-5 with complication-specific few-shot prompts describing precise clinical visual indicators.
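The selection-then-verification step can be sketched as follows. The `verify` callable stands in for the vision-language model call and is purely hypothetical, as is the rule that a video is flagged when any of its top segments is confirmed; the page does not describe how per-segment decisions are aggregated.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int, float]  # (start_frame, end_frame_exclusive, risk_score)


def top_k_segments(segments: List[Segment], k: int = 5) -> List[Segment]:
    """Keep the k highest-risk segments, breaking ties by earlier start frame."""
    return sorted(segments, key=lambda s: (-s[2], s[0]))[:k]


def classify_video(segments: List[Segment],
                   verify: Callable[[Segment], bool],
                   k: int = 5) -> bool:
    """Flag the video if the verifier confirms any of the top-k segments.

    `verify` is a placeholder for the VLM; any-segment aggregation is an assumption.
    """
    return any(verify(seg) for seg in top_k_segments(segments, k))


segs = [(0, 10, 0.1), (10, 20, 0.9), (20, 30, 0.4), (30, 40, 0.8),
        (40, 50, 0.2), (50, 60, 0.7), (60, 70, 0.3)]
print([s[0] for s in top_k_segments(segs)])  # [10, 30, 50, 20, 60]
```

Because only the top-ranked segments reach the verifier, a segment with a low risk score is never examined, which bounds the number of model calls per video.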
Risk Scoring Modules
Iris Prolapse: SAM 2 identifies candidate masks at the iris periphery, which are filtered by size and color thresholds. The largest validated mask area serves as the risk score.
PCR: Histogram equalization followed by edge detection inside the pupil mask. The risk score is the length of the longest edge (measured as its bounding-box diagonal), normalized by pupil area.
Vitreous Loss: The pupil boundary is partitioned into angular sectors; risk score = max sector radius / global mean radius, capturing localized wedge-shaped deformation.
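The sector-based ratio (max sector radius / global mean radius) can be illustrated with a short sketch. It assumes the pupil boundary is given as a list of (x, y) points, that radii are measured from the boundary centroid, and that "sector radius" means the mean radius within a sector; the sector count of 12 and the function name `sector_deformation_score` are likewise assumptions for this example.

```python
import math
from typing import List, Tuple


def sector_deformation_score(boundary: List[Tuple[float, float]], n_sectors: int = 12) -> float:
    """Ratio of the largest per-sector mean radius to the global mean radius."""
    if not boundary:
        raise ValueError("boundary must contain at least one point")
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    sums = [0.0] * n_sectors
    counts = [0] * n_sectors
    total = 0.0
    for x, y in boundary:
        r = math.hypot(x - cx, y - cy)
        theta = math.atan2(y - cy, x - cx) % (2 * math.pi)
        idx = min(int(theta / (2 * math.pi) * n_sectors), n_sectors - 1)
        sums[idx] += r
        counts[idx] += 1
        total += r
    global_mean = total / len(boundary)
    if global_mean == 0:
        raise ValueError("degenerate boundary: all points coincide")
    # Empty sectors are skipped rather than counted as zero radius.
    sector_means = [s / c for s, c in zip(sums, counts) if c > 0]
    return max(sector_means) / global_mean


def synthetic_pupil(bulge_radius: float) -> List[Tuple[float, float]]:
    """Circle of radius 10 whose first 30 degrees are pushed out to bulge_radius."""
    pts = []
    for deg in range(360):
        r = bulge_radius if deg < 30 else 10.0
        a = math.radians(deg)
        pts.append((50 + r * math.cos(a), 50 + r * math.sin(a)))
    return pts


round_score = sector_deformation_score(synthetic_pupil(10.0))
wedge_score = sector_deformation_score(synthetic_pupil(15.0))
print(round_score, wedge_score)
```

A round pupil yields a score close to 1, while a localized bulge raises the score because one sector's mean radius exceeds the global mean.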
CataComp-104 Dataset
The first cataract surgery video dataset annotated for intraoperative complications. Collected under routine surgical conditions using a smartphone-mounted microscope (1920×1080 px, 30 fps).
| Complication | Train: Videos | Train: Avg. Duration | Val: Videos | Val: Avg. Duration |
|---|---|---|---|---|
| None | 32 | 18m 16s ± 7m 56s | 28 | 17m 01s ± 6m 19s |
| Iris Prolapse | 11 | 25m 37s ± 21m 46s | 13 | 31m 56s ± 25m 54s |
| PCR | 13 | 50m 15s ± 23m 34s | 11 | 37m 06s ± 14m 44s |
| Vitreous Loss | 13 | 50m 15s ± 23m 34s | 12 | 35m 58s ± 14m 34s |
| Total | 53 | 26m 16s ± 19m 21s | 51 | 24m 46s ± 17m 22s |
Results
Per-complication and average detection performance on CataComp-104 (validation split).
| Complication · Method | Accuracy | Sensitivity | Specificity | F1 Score |
|---|---|---|---|---|
| Iris Prolapse | | | | |
| Random | 50.00% | 50.00% | 50.00% | 33.77% |
| Naive Classifier (I3D) | 72.55% | 69.23% | 73.68% | 56.25% |
| Risk-scoring only | 37.25% | 100.00% | 15.79% | 44.83% |
| VLM-only (GPT-5) | 88.24% | 53.85% | 100.00% | 70.00% |
| CataCompDetect (GPT-5) | 88.24% | 69.23% | 94.74% | 75.00% |
| PCR | | | | |
| Random | 50.00% | 50.00% | 50.00% | 30.14% |
| Naive Classifier (I3D) | 76.47% | 36.36% | 87.50% | 40.00% |
| Risk-scoring only | 72.55% | 63.64% | 75.00% | 50.00% |
| VLM-only (GPT-5) | 45.10% | 81.82% | 35.00% | 39.19% |
| CataCompDetect (GPT-5) | 82.35% | 54.55% | 90.00% | 57.14% |
| Vitreous Loss | | | | |
| Random | 50.00% | 50.00% | 50.00% | 32.00% |
| Naive Classifier (I3D) | 76.47% | 0.00% | 100.00% | 0.00% |
| Risk-scoring only | 68.63% | 75.00% | 66.67% | 52.94% |
| CataCompDetect (GPT-5) | 74.51% | 46.15% | 82.05% | 48.00% |
| Average — CataCompDetect | 81.70% | 56.64% | 88.93% | 60.05% |
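For readers who want to relate the table entries to each other, the sketch below computes the four metrics from confusion-matrix counts and the expected F1 of a coin-flip classifier. The counts used for iris prolapse (TP=9, FP=2, FN=4, TN=36) are not stated on this page; they are inferred here from the reported percentages and the 13 positive videos among 51 validation videos, and should be read as an assumption. The function names are hypothetical.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, sensitivity, specificity, and F1 (all in percent, 2 decimals)."""
    def pct(num: float, den: float) -> float:
        return round(100.0 * num / den, 2) if den else 0.0
    return {
        "accuracy": pct(tp + tn, tp + fp + fn + tn),
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "f1": pct(2 * tp, 2 * tp + fp + fn),
    }


def random_baseline_f1(n_pos: int, n_total: int) -> float:
    """F1 of a coin flip in expectation: precision = prevalence, recall = 0.5."""
    precision = n_pos / n_total
    recall = 0.5
    return round(100.0 * 2 * precision * recall / (precision + recall), 2)


# Inferred counts (assumption): CataCompDetect on iris prolapse, validation split.
print(binary_metrics(tp=9, fp=2, fn=4, tn=36))
# {'accuracy': 88.24, 'sensitivity': 69.23, 'specificity': 94.74, 'f1': 75.0}

# Positive counts per complication are taken from the Val column of the dataset table.
print(random_baseline_f1(13, 51), random_baseline_f1(11, 51), random_baseline_f1(12, 51))
# 33.77 30.14 32.0

# The reported average F1 is the unweighted mean of the three per-complication scores.
print(round((75.00 + 57.14 + 48.00) / 3, 2))  # 60.05
```

The Random rows appear consistent with this expected-value calculation, which explains why their F1 falls below 50% even though accuracy, sensitivity, and specificity are all 50%: precision equals the low prevalence of each complication.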