Echo-Motion
Project Overview
EchoMotion is a deep learning framework for automatic left ventricle (LV) segmentation and ejection fraction (EF) estimation from echocardiography videos. The pipeline is built in two stages:
- First, a segmentation model is trained using only the manually labeled ED and ES frames (as provided in public datasets like EchoNet-Dynamic).
- Then, the model is used to generate segmentation masks for all intermediate frames, enabling full-sequence analysis and temporal learning.
This strategy allows us to bootstrap a full cardiac-cycle representation from partial annotations, a practical approach for clinical datasets where dense labeling is expensive.
EchoMotion combines transformer-based encoders, temporal modeling, and multi-view learning, while preserving explainability and clinical alignment by mimicking expert workflows (segmentation → volume → EF).
Clinical Background
Ejection Fraction (EF) is a critical indicator of heart function, used to diagnose and monitor heart failure. Clinically, EF is calculated by measuring volume change of the left ventricle (LV) between end-diastole (ED) and end-systole (ES).
This is usually done by manually tracing the LV in apical four-chamber (A4C) and apical two-chamber (A2C) echo views, a process that is slow, subjective, and prone to inter-observer variability.
Deep learning offers the potential to automate this process. However, challenges remain:
- Speckle noise and motion blur in ultrasound images
- Anatomical variability between patients
- Limited annotated data, with typically only the ED and ES frames labeled in each video
EchoMotion addresses these challenges by combining high-capacity segmentation models, temporal context, and view fusion to produce a clinically relevant and scalable EF analysis system.
Research Objective
EchoMotion is designed to:
- Learn to segment the LV using only ED/ES-labeled frames
- Automatically generate intermediate-frame masks across full cardiac cycles
- Use those outputs to model temporal dynamics of heart motion
- Estimate EF from derived volume curves using a clinical method
- Support interpretability and trust via Grad-CAM and transparent calculations
This approach balances clinical realism, data efficiency, and modern ML techniques.
Core Features
- **Transformer + U-Net Hybrid Architectures**: Built on SegFormer backbones with integrated attention modules (SE, CBAM, DANet); a minimal SE block sketch follows this list
- **Sparse-to-Dense Learning Pipeline**: Trained on ED/ES pairs, generalized to full-sequence segmentation for full cardiac-cycle modeling
- **Spatio-Temporal Frame Analysis**: Sequence modeling (planned) for smooth, consistent heart motion representation
- **Multi-View Fusion (in progress)**: Dual-encoder design to jointly leverage A4C and A2C views for improved EF estimation
- **Explainability via Grad-CAM**: Visualizes model attention to support clinical trust
- **Active Learning Loop (planned)**: Uses uncertainty-based selection to identify frames for further expert annotation
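To make the attention component concrete, here is a minimal squeeze-and-excitation (SE) block in PyTorch. It is a generic sketch, not EchoMotion's exact module: the channel count and reduction ratio are illustrative.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: re-weights feature channels using global context."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # squeeze: global average pooling
        self.fc = nn.Sequential(                      # excitation: bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)                   # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)               # per-channel weights in [0, 1]
        return x * w                                  # re-scale the feature map

# Example: re-weight a 64-channel decoder feature map
features = torch.randn(2, 64, 28, 28)
print(SEBlock(64)(features).shape)                    # torch.Size([2, 64, 28, 28])
```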
Methodology
The pipeline consists of four main stages:
1. Preprocessing
- Extract ED and ES labeled frames from echocardiography videos
- Apply CLAHE for contrast enhancement and bilateral filtering to reduce speckle noise
- Resize, normalize, and split into training/validation/test sets (a preprocessing sketch follows below)
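The following is a minimal sketch of the preprocessing step above, using OpenCV. The CLAHE clip limit, bilateral filter parameters, 224×224 target size, and file name are assumptions for illustration, not the project's exact settings.

```python
import cv2
import numpy as np

def preprocess_frame(frame_gray: np.ndarray, size: int = 224) -> np.ndarray:
    """CLAHE contrast enhancement + bilateral speckle smoothing + resize/normalize."""
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(frame_gray)                 # local contrast enhancement
    smoothed = cv2.bilateralFilter(enhanced, d=5, sigmaColor=50, sigmaSpace=50)  # edge-preserving denoising
    resized = cv2.resize(smoothed, (size, size), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0          # normalize to [0, 1]

# Example: preprocess one grayscale echo frame (hypothetical file name)
frame = cv2.imread("frame_ed.png", cv2.IMREAD_GRAYSCALE)
if frame is not None:
    print(preprocess_frame(frame).shape)               # (224, 224)
```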
2. Segmentation Network (EchoMotionNet)
- Encoder: SegFormer B0–B2 backbones
- Decoder: U-Net-style with skip connections
- Attention: SE / CBAM / DANet modules
- Trained only on ED/ES frames (sparse supervision); a minimal encoder/decoder sketch follows below
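A hedged sketch of how such an encoder/decoder pairing can be assembled: a pretrained SegFormer (MiT-B0) encoder from Hugging Face Transformers feeding a small U-Net-style decoder. The checkpoint name, decoder widths, and output head are illustrative; the actual EchoMotionNet decoder also integrates the SE/CBAM/DANet attention modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import SegformerModel

class EchoSegSketch(nn.Module):
    """SegFormer (MiT-B0) encoder + lightweight U-Net-style decoder (illustrative sketch)."""

    def __init__(self, checkpoint: str = "nvidia/mit-b0"):
        super().__init__()
        self.encoder = SegformerModel.from_pretrained(checkpoint)
        ch = [32, 64, 160, 256]                         # MiT-B0 stage widths (strides 4, 8, 16, 32)
        self.up_convs = nn.ModuleList([
            nn.Conv2d(ch[i] + ch[i - 1], ch[i - 1], kernel_size=3, padding=1)
            for i in range(3, 0, -1)
        ])
        self.head = nn.Conv2d(ch[0], 1, kernel_size=1)  # binary LV mask logits

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # Grayscale echo frames are repeated to 3 channels before this call.
        feats = self.encoder(pixel_values, output_hidden_states=True).hidden_states
        x = feats[3]                                    # deepest stage
        for conv, skip in zip(self.up_convs, [feats[2], feats[1], feats[0]]):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            x = torch.relu(conv(torch.cat([x, skip], dim=1)))   # U-Net-style skip connection
        return F.interpolate(self.head(x), scale_factor=4, mode="bilinear", align_corners=False)

# Example: one 224x224 input produces one 224x224 map of mask logits
model = EchoSegSketch()
print(model(torch.randn(1, 3, 224, 224)).shape)         # torch.Size([1, 1, 224, 224])
```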
3. Temporal Expansion & EF Estimation
- Use trained model to infer masks for all frames in the video
- Compute LV area per frame, extract EDV/ESV, and calculate (see the EF sketch after this stage):
$$EF = \frac{EDV - ESV}{EDV} \times 100$$
- (Planned) Learn temporal features directly using Bi-LSTM or Temporal Transformers
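A minimal sketch of the area-curve step above, assuming per-frame LV pixel areas have already been derived from the predicted masks. Using the raw pixel area as a volume proxy (largest area ≈ EDV, smallest area ≈ ESV) is a simplification for illustration; a clinical pipeline would convert masks to volumes first.

```python
import numpy as np

def ejection_fraction(lv_areas: np.ndarray) -> float:
    """EF (%) from a per-frame LV area curve: EDV ~ max area, ESV ~ min area."""
    edv = float(lv_areas.max())      # end-diastolic proxy (largest LV)
    esv = float(lv_areas.min())      # end-systolic proxy (smallest LV)
    return (edv - esv) / edv * 100.0

# Example: per-frame LV areas (pixels) over one cardiac cycle (made-up numbers)
areas = np.array([5200, 5050, 4300, 3600, 3100, 3350, 4200, 4900, 5150])
print(f"EF = {ejection_fraction(areas):.1f}%")   # (5200 - 3100) / 5200 * 100 ≈ 40.4%
```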
4. Evaluation & Visualization
- Metrics: Dice, IoU, MAE (EF), R², Pearson correlation (see the sketch below)
- Visual outputs: segmentation overlays, Grad-CAM maps, EF scatter plots
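A short sketch of how these metrics can be computed with NumPy and scikit-learn; the mask shapes, EF values, and variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

def dice_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Dice and IoU for binary masks of the same shape."""
    inter = np.logical_and(pred, target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

# Segmentation example (random masks, illustration only)
pred_mask = np.random.rand(112, 112) > 0.5
true_mask = np.random.rand(112, 112) > 0.5
print(dice_iou(pred_mask, true_mask))

# EF agreement example (ground-truth vs. predicted EF, in %)
ef_true = np.array([62.0, 55.0, 35.0, 48.0, 70.0])
ef_pred = np.array([60.5, 57.0, 38.0, 46.5, 67.0])
print(mean_absolute_error(ef_true, ef_pred))        # MAE
print(r2_score(ef_true, ef_pred))                   # R²
print(np.corrcoef(ef_true, ef_pred)[0, 1])          # Pearson correlation
```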
Technologies Used
- PyTorch
- SegFormer (via Hugging Face Transformers / timm)
- Hydra for configuration
- MLflow for experiment tracking
- OpenCV, PIL, Matplotlib for visualization
- Scikit-learn for evaluation metrics
Evaluation Highlights
- Segmentation Metrics: Dice, IoU
- EF Accuracy: MAE, R², Pearson correlation
- Explainable Outputs: Grad-CAM maps, EF area plots, segmentation overlays (a minimal Grad-CAM sketch follows below)
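To illustrate how the Grad-CAM maps can be produced for a segmentation network, here is a minimal hook-based sketch in PyTorch. The choice of target layer and the use of the mean mask logit as the scalar objective are assumptions, not necessarily EchoMotion's exact Grad-CAM setup.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, pixel_values):
    """Grad-CAM heatmap for a segmentation model, w.r.t. the mean foreground logit."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, output):
        activations["value"] = output                  # feature maps at the target layer

    def bwd_hook(_, __, grad_output):
        gradients["value"] = grad_output[0]            # gradients flowing into that layer

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        logits = model(pixel_values)                   # (B, 1, H, W) mask logits
        model.zero_grad()
        logits.mean().backward()                       # scalar objective for backprop
    finally:
        h1.remove()
        h2.remove()

    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)            # pooled gradients
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=pixel_values.shape[-2:], mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-7)                    # normalized heatmap in [0, 1]

# Example (with the EchoSegSketch shown earlier): heatmap over the last decoder conv
# heatmap = grad_cam(model, model.up_convs[-1], torch.randn(1, 3, 224, 224))
```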
Why EchoMotion?
This project showcases:
- Sparse-to-dense training strategies for realistic clinical datasets
- Transformer-based architectures in medical segmentation
- Temporal modeling potential for dynamic anatomy
- Strong focus on explainability, clinical relevance, and future scalability
