CataCompDetect: Intraoperative Complication Detection in Cataract Surgery

Bhuvan Sachdeva, Sneha Kumari, Rudransh Agarwal, Shalaka Kumaraswamy,
Niharika Prasad, Raphael Lechtenboehmer, Simon Mueller, Maximilian W. M. Wintergerst,
Thomas Schultz, Kaushik Murali, Mohit Jain
Sankara Eye Hospital, Bengaluru, India; Microsoft Research, Bengaluru, India; University of Bonn, Bonn, Germany

Abstract

Cataract surgery is one of the most commonly performed procedures worldwide, with over 26 million surgeries annually. Despite standardized techniques, intraoperative complications may still occur due to surgeon variability and patient-related factors. Such complications pose significant risks to visual outcomes and, in severe cases, may cause permanent vision loss. We propose CataCompDetect, a complication detection framework that combines complication-specific risk scoring with vision-language reasoning for classification. The framework first identifies relevant surgical phases via expert-derived priors, then performs risk assessment to identify video segments exhibiting anatomical cues suggestive of complications, and finally applies a vision-language model for classification. To evaluate CataCompDetect, we introduce CataComp-104, the first cataract surgery video dataset annotated for intraoperative complications, comprising 104 surgeries with 44 complication cases across three clinically significant types: iris prolapse, posterior capsule rupture (PCR), and vitreous loss. CataCompDetect achieves an average F1 score of 60.05%, with per-complication F1 scores of 75.00% (iris prolapse), 57.14% (PCR), and 48.00% (vitreous loss).

Target Complications

CataCompDetect targets three clinically significant intraoperative complications in manual small-incision cataract surgery (MSICS).

Iris Prolapse

Protrusion of iris tissue through the corneal incision, appearing as dark brownish-red tissue extending beyond the incision margin. Caused by fluctuations in intraocular pressure or incision instability.

F1: 75.00%

Posterior Capsule Rupture (PCR)

A rupture in the thin posterior capsule supporting the lens, typically occurring during lens removal or cortical wash. Appears as a visible tear in the capsule. Can lead to vitreous loss if unmanaged.

F1: 57.14%

Vitreous Loss

Occurs when the gel-like vitreous prolapses into the anterior chamber, often following PCR. Appears as translucent, web-like strands through the pupil, causing characteristic tear-drop pupil distortion.

F1: 48.00%

Complication Examples

Representative frames from CataComp-104 illustrating each intraoperative complication.

Iris Prolapse: dark brownish-red tissue protruding beyond the incision margin.
Posterior Capsule Rupture: visible tear in the posterior capsule during cortical wash.
Vitreous Loss: translucent web-like vitreous strands with tear-drop pupil distortion.

Method: CataCompDetect

A three-stage pipeline integrating surgical phase-aware localization, anatomical risk scoring, and vision-language classification.

1. Phase Localization

Surgical phase predictions from MS-TCN++ restrict the analysis to the phases where each complication is most likely: cortical wash for PCR and vitreous loss, and the entire video for iris prolapse.
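
The phase gating can be illustrated with a minimal sketch, assuming per-frame phase labels are already available as a list of strings. The label names, the mapping RELEVANT_PHASES, and the function candidate_frames are hypothetical names chosen for illustration; they are not taken from the CataCompDetect code.

```python
# Hypothetical mapping from complication to the phases searched for it.
# None means "no restriction": the entire video is analysed.
RELEVANT_PHASES = {
    "pcr": {"cortical_wash"},
    "vitreous_loss": {"cortical_wash"},
    "iris_prolapse": None,
}


def candidate_frames(phase_per_frame, complication):
    """Return the indices of frames whose predicted phase is relevant."""
    phases = RELEVANT_PHASES[complication]
    if phases is None:
        return list(range(len(phase_per_frame)))
    return [i for i, p in enumerate(phase_per_frame) if p in phases]


# Toy per-frame predictions standing in for the output of a phase model.
phases = ["incision", "incision", "cortical_wash", "cortical_wash", "lens_insertion"]
print(candidate_frames(phases, "pcr"))            # [2, 3]
print(candidate_frames(phases, "iris_prolapse"))  # [0, 1, 2, 3, 4]
```

Only the frames returned here would be passed on to risk scoring, which keeps the later stages from spending effort on phases where the complication is unlikely.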

2. Anatomical Risk Scoring

Per-frame risk scores are computed from iris/pupil segmentation masks obtained using SAM 2 and TernausNet. A temporal sliding window then identifies high-risk segments.
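
A temporal sliding window over per-frame scores might look like the sketch below. The aggregation (mean score per window), the window length, and the stride are assumptions made for illustration; the page does not state which values or aggregation the authors use, and window_scores is a hypothetical helper name.

```python
def window_scores(frame_scores, window, stride=1):
    """Score each sliding window by its mean per-frame risk.

    Returns a list of (start, end, score) tuples with end exclusive.
    Using the mean is an assumption of this sketch.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    segments = []
    for start in range(0, len(frame_scores) - window + 1, stride):
        chunk = frame_scores[start:start + window]
        segments.append((start, start + window, sum(chunk) / window))
    return segments


# Toy risk scores for six frames.
scores = [0.0, 0.25, 1.0, 0.5, 0.25, 0.0]
segments = window_scores(scores, window=2)
print(max(segments, key=lambda s: s[2]))  # (2, 4, 0.75)
```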

3. VLM Classification

Top-5 high-risk segments per video are verified by GPT-5 with complication-specific few-shot prompts describing precise clinical visual indicators.
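
Selecting the top-scoring segments for verification can be sketched as follows. The segments are (start, end, score) tuples with end exclusive; the greedy rule that skips overlapping segments is an assumption of this sketch, and top_k_segments is a hypothetical helper name. The call to the vision-language model itself is deliberately left out.

```python
def top_k_segments(segments, k=5):
    """Pick up to k highest-scoring segments that do not overlap.

    The non-overlap constraint is an assumption of this sketch, added so
    that the selected segments cover distinct parts of the video.
    """
    chosen = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        start, end, _ = seg
        if all(end <= c[0] or start >= c[1] for c in chosen):
            chosen.append(seg)
        if len(chosen) == k:
            break
    return chosen


# Toy scored segments; k=2 keeps the example small (the pipeline uses 5).
segments = [(0, 2, 0.125), (1, 3, 0.625), (2, 4, 0.75), (3, 5, 0.375), (4, 6, 0.125)]
print(top_k_segments(segments, k=2))  # [(2, 4, 0.75), (0, 2, 0.125)]
```

Each selected segment would then be shown to the model together with the complication-specific few-shot prompt.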

Risk Scoring Modules

Iris Prolapse

SAM 2 identifies candidate masks at the iris periphery, which are then filtered by size and color thresholds. The area of the largest validated mask serves as the risk score.
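
The filtering step can be sketched as below, assuming each candidate mask has already been summarized by its pixel area and mean RGB color. The threshold values and the color rule for "dark brownish-red" are placeholders invented for this sketch, not the authors' settings, and iris_prolapse_risk is a hypothetical name.

```python
def iris_prolapse_risk(candidates, min_area=200, max_area=20000, max_brightness=120):
    """Return the largest area among candidates that pass both checks.

    Each candidate is a dict with "area" (pixels) and "mean_rgb".
    All thresholds are illustrative placeholders. Returns 0 if nothing passes.
    """
    valid_areas = []
    for c in candidates:
        r, g, b = c["mean_rgb"]
        size_ok = min_area <= c["area"] <= max_area
        # Placeholder rule for dark brownish-red: red-dominant and dark.
        color_ok = r > g >= b and (r + g + b) / 3 <= max_brightness
        if size_ok and color_ok:
            valid_areas.append(c["area"])
    return max(valid_areas, default=0)


candidates = [
    {"area": 150, "mean_rgb": (110, 60, 40)},     # rejected: too small
    {"area": 900, "mean_rgb": (110, 60, 40)},     # passes
    {"area": 5000, "mean_rgb": (230, 230, 230)},  # rejected: bright, not red-dominant
    {"area": 1200, "mean_rgb": (95, 50, 35)},     # passes
]
print(iris_prolapse_risk(candidates))  # 1200
```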

PCR

Histogram equalization and edge detection are applied inside the pupil mask. The length of the longest edge, measured as its bounding-box diagonal and normalized by pupil area, is the risk score.
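
The scoring arithmetic can be sketched as follows, assuming histogram equalization and edge detection have already run and produced connected edge components as lists of (row, col) pixels inside the pupil mask. The function names and the toy inputs are hypothetical.

```python
import math


def bbox_diagonal(pixels):
    """Diagonal length of the axis-aligned bounding box of an edge component."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return math.hypot(max(rows) - min(rows), max(cols) - min(cols))


def pcr_risk(edge_components, pupil_area):
    """Longest bounding-box diagonal divided by pupil area (0.0 if undefined)."""
    if pupil_area <= 0 or not edge_components:
        return 0.0
    longest = max(bbox_diagonal(p) for p in edge_components)
    return longest / pupil_area


# Two toy edge components inside a pupil of 1000 pixels.
components = [
    [(10, 10), (10, 11), (11, 12)],  # bounding box 1 x 2, diagonal about 2.24
    [(20, 20), (23, 24)],            # bounding box 3 x 4, diagonal 5.0
]
print(pcr_risk(components, pupil_area=1000))  # 0.005
```

Normalizing by pupil area makes the score comparable across frames in which the pupil appears at different sizes.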

Vitreous Loss

Pupil boundary partitioned into angular sectors; risk score = max sector radius / global mean radius, capturing localized wedge-shape deformation.
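
The ratio above can be sketched as follows, assuming the pupil boundary is given as a list of (x, y) points. Taking the centroid of the boundary points as the pupil center, using the mean radius within each sector as the "sector radius", and using 12 sectors are all assumptions of this sketch; vitreous_loss_risk is a hypothetical name.

```python
import math


def vitreous_loss_risk(boundary, n_sectors=12):
    """Max per-sector mean radius divided by the global mean radius."""
    if not boundary:
        raise ValueError("boundary must contain at least one point")
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    sectors = [[] for _ in range(n_sectors)]
    radii = []
    for x, y in boundary:
        r = math.hypot(x - cx, y - cy)
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        # Clamp guards against angle rounding up to exactly 2*pi.
        idx = min(int(angle / (2 * math.pi) * n_sectors), n_sectors - 1)
        sectors[idx].append(r)
        radii.append(r)
    global_mean = sum(radii) / len(radii)
    sector_means = [sum(s) / len(s) for s in sectors if s]
    return max(sector_means) / global_mean


def ring(radius_at):
    """Build a 360-point boundary from a radius function of the angle in degrees."""
    return [(radius_at(d) * math.cos(math.radians(d)),
             radius_at(d) * math.sin(math.radians(d))) for d in range(360)]


round_pupil = ring(lambda d: 1.0)
peaked_pupil = ring(lambda d: 1.5 if d <= 20 else 1.0)  # local wedge-like bulge
print(vitreous_loss_risk(round_pupil))   # close to 1.0
print(vitreous_loss_risk(peaked_pupil))  # clearly above 1.0
```

A round pupil yields a ratio near 1, while a localized bulge raises the maximum sector radius much more than the global mean, which is why the ratio responds to tear-drop distortion.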

CataComp-104 Dataset

The first cataract surgery video dataset annotated for intraoperative complications. Collected under routine surgical conditions using a smartphone-mounted microscope (1920×1080 px, 30 fps).

MSICS Videos: 104
Total Duration: 43h 49m
Complication Cases: 44
Expert Annotators: 2
Complication    Train Videos   Train Avg. Duration   Val Videos   Val Avg. Duration
None            32             18m 16s ± 7m 56s      28           17m 01s ± 6m 19s
Iris Prolapse   11             25m 37s ± 21m 46s     13           31m 56s ± 25m 54s
PCR             13             50m 15s ± 23m 34s     11           37m 06s ± 14m 44s
Vitreous Loss   13             50m 15s ± 23m 34s     12           35m 58s ± 14m 34s
Total           53             26m 16s ± 19m 21s     51           24m 46s ± 17m 22s

Results

Per-complication and average detection performance on CataComp-104 (validation split). Best results per complication in bold.

Complication · Method         Accuracy     Sensitivity  Specificity  F1 Score

Iris Prolapse
  Random                      50.00%       50.00%       50.00%       33.77%
  Naive Classifier (I3D)      72.55%       69.23%       73.68%       56.25%
  Risk-scoring only           37.25%       100.00%      15.79%       44.83%
  VLM-only (GPT-5)            88.24%       53.85%       100.00%      70.00%
  CataCompDetect (GPT-5)      88.24%       69.23%       94.74%       75.00%

PCR
  Random                      50.00%       50.00%       50.00%       30.14%
  Naive Classifier (I3D)      76.47%       36.36%       87.50%       40.00%
  Risk-scoring only           72.55%       63.64%       75.00%       50.00%
  VLM-only (GPT-5)            45.10%       81.82%       35.00%       39.19%
  CataCompDetect (GPT-5)      82.35%       54.55%       90.00%       57.14%

Vitreous Loss
  Random                      50.00%       50.00%       50.00%       32.00%
  Naive Classifier (I3D)      76.47%       0.00%        100.00%      0.00%
  Risk-scoring only           68.63%       75.00%       66.67%       52.94%
  CataCompDetect (GPT-5)      74.51%       46.15%       82.05%       48.00%

Average
  CataCompDetect              81.70%       56.64%       88.93%       60.05%
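
For readers who want to relate the four reported metrics to one another, the sketch below computes them from a binary confusion matrix using the standard definitions. The counts in the example (9 true positives, 2 false positives, 36 true negatives, 4 false negatives) are not reported in the source; they were back-calculated for this sketch as one set of counts over 51 validation videos that reproduces the CataCompDetect row for iris prolapse. The second part checks that the reported average F1 is the unweighted mean of the three per-complication F1 scores.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity, f1) as fractions."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, sensitivity, specificity, f1


# Back-calculated counts (an assumption, not reported data).
metrics = binary_metrics(tp=9, fp=2, tn=36, fn=4)
print(" ".join(f"{100 * m:.2f}" for m in metrics))  # 88.24 69.23 94.74 75.00

# Reported per-complication F1 scores for CataCompDetect, in percent.
reported_f1 = [75.00, 57.14, 48.00]
macro_f1 = sum(reported_f1) / len(reported_f1)
print(f"{macro_f1:.2f}")  # 60.05
```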