# Facial Parts Segmentation with Deep Learning

A comprehensive study comparing U-Net, PSPNet, and SegNet for pixel-wise facial parts segmentation
## Table of Contents

- Overview
- Team Members
- Objective
- Dataset
- Models & Architectures
- Installation
- Usage
- Project Structure
- Methodology
- Evaluation Metrics
- Results
- Key Findings
- Applications
- Contributing
- References
- License
## Overview

This project focuses on facial parts segmentation using state-of-the-art deep learning techniques. We conducted a comprehensive comparative analysis of several prominent CNN architectures to evaluate their effectiveness in detecting and segmenting facial features, including:
- 👁️ Eyes
- 👃 Nose
- 👄 Mouth
- ✨ Eyebrows
- 🦴 Jawline
- 👂 Ears
- 🎨 Skin regions
- 💇 Hair
The project implements and compares three powerful semantic segmentation architectures: U-Net, PSPNet, and SegNet, all leveraging MobileNetV2 as the backbone encoder for efficient feature extraction.
## Team Members

- Youness Boumlik
- Abdellah Boulidam
- Zakaria El Houari
- Imane El Warraqi
- Nassima Rhannouch
## Objective

The primary aims of this project are to:
- Explore and compare different convolutional neural network (CNN) architectures for accurate and efficient facial feature segmentation
- Identify the most suitable models in terms of precision, robustness, and performance across various conditions
- Evaluate model performance on pixel-wise segmentation tasks with 11 distinct facial part classes
- Provide insights into the strengths and weaknesses of each architecture for real-world applications
## Dataset

We use the LAPA dataset, a comprehensive dataset for facial part segmentation:
- Images: High-resolution facial images
- Annotations: Pixel-wise annotations for 11 facial part classes
- Classes: Background, skin, left eyebrow, right eyebrow, left eye, right eye, nose, upper lip, inner mouth, lower lip, hair
- Split: Training and validation sets for robust model evaluation
- Preprocessing: Images resized to 256×256 pixels for efficient training
The dataset can be downloaded from Kaggle - LAPA Face Parsing Dataset.
## Models & Architectures

### U-Net

U-Net is a popular encoder-decoder architecture originally designed for biomedical image segmentation:
- Encoder: MobileNetV2 pretrained on ImageNet
- Skip Connections: Direct connections from encoder to decoder at multiple scales
- Decoder: Progressive upsampling with concatenation of encoder features
- Strengths: Excellent at preserving fine-grained details and spatial information
- Use Case: Ideal for applications requiring high precision in boundary detection
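The `build_unet` function used in the notebook follows this design; the sketch below shows one plausible implementation. The skip-connection layer names are standard MobileNetV2 layer names, but the exact layers, filter counts, and decoder details in the notebook may differ. `weights=None` keeps the sketch self-contained; the project uses ImageNet-pretrained weights.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_unet(input_shape=(256, 256, 3), num_classes=11):
    """U-Net with a MobileNetV2 encoder and skip connections (sketch)."""
    inputs = layers.Input(shape=input_shape)
    # weights=None for a self-contained sketch; the project uses "imagenet".
    encoder = tf.keras.applications.MobileNetV2(
        input_tensor=inputs, include_top=False, weights=None)

    # Encoder feature maps at 128, 64, 32, and 16 px (for a 256x256 input)
    skip_names = ["block_1_expand_relu",   # 128x128
                  "block_3_expand_relu",   # 64x64
                  "block_6_expand_relu",   # 32x32
                  "block_13_expand_relu"]  # 16x16
    skips = [encoder.get_layer(n).output for n in skip_names]
    x = encoder.get_layer("out_relu").output  # 8x8 bottleneck

    # Decoder: upsample, concatenate the matching skip, then convolve
    for filters, skip in zip([512, 256, 128, 64], reversed(skips)):
        x = layers.UpSampling2D()(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    x = layers.UpSampling2D()(x)  # back to full 256x256 resolution
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(inputs, outputs)
```

The concatenations are what preserve the fine spatial detail noted above: each decoder stage sees both upsampled semantics and the encoder's high-resolution features.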
### PSPNet

PSPNet incorporates a pyramid pooling module to capture multi-scale contextual information:
- Encoder: MobileNetV2 backbone
- Pyramid Pooling Module: Aggregates context at multiple scales (1×1, 2×2, 3×3, 6×6)
- Progressive Upsampling: Five-stage decoder for full resolution reconstruction
- Strengths: Superior at capturing global context and preserving overall structure
- Use Case: Best for scenarios requiring understanding of facial composition
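The pyramid pooling module can be sketched as follows. This is an illustrative implementation, not the notebook's exact code: `tf.image.resize` with `method="area"` is used as an adaptive average pool, which sidesteps the fact that fixed pool sizes do not divide the 8×8 bottleneck evenly into 3×3 or 6×6 bins; `reduced_filters` is an assumed value.

```python
import tensorflow as tf
from tensorflow.keras import layers

def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6), reduced_filters=64):
    """Pyramid pooling module (sketch): pool the feature map to several bin
    sizes, project each with a 1x1 conv, resize back, and concatenate."""
    h, w = features.shape[1], features.shape[2]
    pooled = [features]
    for bins in bin_sizes:
        # "area" resize acts as adaptive average pooling to a bins x bins grid
        x = tf.image.resize(features, (bins, bins), method="area")
        x = layers.Conv2D(reduced_filters, 1, activation="relu")(x)
        x = tf.image.resize(x, (h, w), method="bilinear")
        pooled.append(x)
    return layers.Concatenate()(pooled)
```

Applied to the MobileNetV2 bottleneck, the output carries both the original features and context summarized at four scales, which is what gives PSPNet its grasp of overall facial composition.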
### SegNet

SegNet uses a symmetric encoder-decoder structure (the original design reuses max-pooling indices for upsampling; our variant uses skip connections instead):
- Architecture: Five encoder-decoder blocks
- Encoding: Convolutional layers with max pooling
- Decoding: Upsampling with skip connections from corresponding encoder layers
- Strengths: Memory efficient and good for real-time applications
- Use Case: Suitable for deployment on resource-constrained devices
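A minimal sketch of this `build_segnet` variant is shown below. It is an assumption-laden illustration: classic SegNet unpools with stored max-pooling indices, which Keras has no built-in layer for, so this sketch upsamples with skip connections as described above; filter counts and block depth are guesses.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.BatchNormalization()(x)

def build_segnet(input_shape=(256, 256, 3), num_classes=11, depth=5):
    """Symmetric five-block encoder-decoder (sketch)."""
    inputs = layers.Input(shape=input_shape)
    x, skips = inputs, []
    for i in range(depth):                    # encoder: conv + max pool
        x = conv_block(x, 32 * 2 ** min(i, 3))
        skips.append(x)
        x = layers.MaxPooling2D()(x)
    for i in reversed(range(depth)):          # decoder: upsample + skip
        x = layers.UpSampling2D()(x)
        x = layers.Concatenate()([x, skips[i]])
        x = conv_block(x, 32 * 2 ** min(i, 3))
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(inputs, outputs)
```

Because the decoder mirrors the encoder with lightweight upsampling rather than learned transposed convolutions, the parameter count stays modest, which is the source of SegNet's memory efficiency.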
## Installation

### Prerequisites

- Python 3.7 or higher
- CUDA-capable GPU (recommended for training, but CPU training is also supported, just slower)
- 8 GB+ RAM (16 GB+ recommended for GPU training)
### Setup

```bash
git clone https://github.com/Younessboumlik/Facial-Parts-Segmentation-with-Deep-Learning.git
cd Facial-Parts-Segmentation-with-Deep-Learning
pip install "tensorflow>=2.8.0"   # quotes keep the shell from treating >= as redirection
pip install opencv-python numpy matplotlib kagglehub scikit-learn
```

The training notebook includes code to download the LAPA dataset automatically using kagglehub. Alternatively, you can manually download it from Kaggle.
## Usage

The project includes a comprehensive Jupyter notebook (`training-source-code.ipynb`) that contains all the code for:
- Dataset Loading: Automatic download and preprocessing
- Model Building: Implementation of all three architectures
- Training: Complete training pipeline with callbacks
- Evaluation: Performance metrics and visualization
### Running the Notebook

```bash
jupyter notebook training-source-code.ipynb
```

Or upload to Kaggle Notebooks for GPU acceleration.
### Key Configuration

```python
image_h, image_w = 256, 256  # Image dimensions
num_classes = 11             # Number of facial part classes
batch_size = 8               # Batch size for training
lr = 1e-4                    # Learning rate
num_epochs = 10              # Training epochs
```

### Building the Models

```python
unet_model = build_unet(input_shape=(256, 256, 3), num_classes=11)
pspnet_model = build_pspnet(input_shape=(256, 256, 3), num_classes=11)
segnet_model = build_segnet(input_shape=(256, 256, 3), num_classes=11)
```

### Inference

```python
# Load a trained model
model = tf.keras.models.load_model(
    'path/to/model.keras',
    custom_objects={
        'iou': iou,
        'dice_coefficient': dice_coefficient,
        'precision': precision,
        'recall': recall,
    })

# Load and preprocess an image
image = load_and_preprocess_image('path/to/image.jpg')
image_batch = np.expand_dims(image, axis=0)

# Make prediction
prediction = model.predict(image_batch)
mask = np.argmax(prediction[0], axis=-1)
```

The notebook includes visualization functions to compare model predictions:
```python
visualize_predictions(
    models=[unet_model, pspnet_model, segnet_model],
    model_names=["U-Net", "PSPNet", "SegNet"],
    image_paths=test_images,
)
```

## Project Structure

```
Facial-Parts-Segmentation-with-Deep-Learning/
│
├── training-source-code.ipynb     # Main training notebook with all implementations
├── report.pdf                     # Detailed project report
├── README.md                      # This file
├── .gitattributes                 # Git attributes configuration
│
└── files/                         # Generated during training
    ├── unet_model.keras           # Trained U-Net model
    ├── pspnet_model.keras         # Trained PSPNet model
    ├── segnet_model.keras         # Trained SegNet model
    ├── unet_data.csv              # U-Net training logs
    ├── pspnet_data.csv            # PSPNet training logs
    └── segnet_data.csv            # SegNet training logs
```
## Methodology

Our comprehensive approach includes:

### 1. Data Preprocessing

- Image resizing to 256×256 pixels
- Normalization to the [0, 1] range
- One-hot encoding of segmentation masks
- Data augmentation (optional)
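The normalization and one-hot steps can be sketched in a few lines of NumPy. `preprocess_pair` is a hypothetical helper for illustration, not the notebook's function; resizing to 256×256 is assumed to have happened beforehand (e.g. with OpenCV).

```python
import numpy as np

def preprocess_pair(image, mask, num_classes=11):
    """Normalize an image to [0, 1] and one-hot encode its mask (sketch)."""
    image = image.astype(np.float32) / 255.0               # [0, 255] -> [0, 1]
    one_hot = np.eye(num_classes, dtype=np.float32)[mask]  # (H, W) -> (H, W, C)
    return image, one_hot
```

Indexing an identity matrix with the integer mask is a compact way to one-hot encode: each pixel's class id selects the corresponding unit row.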
### 2. Model Selection

- Three state-of-the-art architectures selected
- MobileNetV2 pretrained encoder for transfer learning
- Custom decoder implementations for each architecture
- Mixed precision training for efficiency
### 3. Training Configuration

- Categorical cross-entropy loss
- Adam optimizer with learning rate scheduling
- Early stopping and model checkpointing
- Learning rate reduction on plateau
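The training configuration above maps directly onto standard Keras callbacks. The sketch below uses a tiny stand-in model and synthetic data so it runs end to end; in the notebook, `model` is one of the three segmentation networks, the data comes from the LAPA pipeline, and a `ModelCheckpoint` callback additionally saves the best weights (e.g. to `files/unet_model.keras`).

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model and data purely so the loop below executes
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 3)),
    tf.keras.layers.Conv2D(11, 1, activation="softmax"),
])
x = np.random.rand(4, 8, 8, 3).astype("float32")
y = np.eye(11, dtype="float32")[np.random.randint(0, 11, (4, 8, 8))]

callbacks = [
    # Stop when validation loss stops improving, keeping the best weights
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    # Cut the learning rate when validation loss plateaus
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                         patience=3),
]
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(x, y, validation_split=0.25, epochs=2,
                    callbacks=callbacks, verbose=0)
```

The `patience` and `factor` values shown are assumptions; the notebook's actual schedule may differ.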
### 4. Evaluation

- Quantitative metrics: IoU, Dice coefficient, precision, recall
- Qualitative analysis: visual comparison of predictions
- Performance assessment under normal and occluded conditions
## Evaluation Metrics

We employ multiple metrics to comprehensively evaluate model performance:

### IoU (Intersection over Union)

- Measures overlap between predicted and ground-truth masks
- Range: 0 (no overlap) to 1 (perfect overlap)
- Primary metric for segmentation quality

### Dice Coefficient

- Similar to IoU but more sensitive to small regions
- Harmonic mean of precision and recall
- Range: 0 to 1

### Precision

- Measures accuracy of positive predictions
- Important for minimizing false positives

### Recall

- Measures completeness of positive predictions
- Important for minimizing false negatives

### Categorical Cross-Entropy Loss

- Training objective function
- Measures pixel-wise classification accuracy
## Results

The models were assessed on their ability to segment facial components under various conditions:

### U-Net

- ✅ Strengths: excellent at handling fine-grained details and preserving spatial information
- ✅ Best performance on boundary detection
- ✅ Superior for small facial features (eyes, eyebrows)
- ⚠️ Moderate performance on global structure

### PSPNet

- ✅ Strengths: superior at preserving global facial structure
- ✅ Better contextual understanding through pyramid pooling
- ✅ Robust to scale variations
- ⚠️ Slightly slower inference time

### SegNet

- ✅ Strengths: memory-efficient architecture
- ✅ Faster inference for real-time applications
- ✅ Good balance between accuracy and efficiency
- ⚠️ Moderate performance on complex occlusions
### Summary

- All models achieved competitive performance on the LAPA dataset
- U-Net excels at detail preservation
- PSPNet performs best for overall facial structure understanding
- SegNet offers the best speed-accuracy tradeoff
## Key Findings

1. **Architecture-Specific Strengths**: Each architecture has unique strengths and weaknesses depending on the type of facial features and occlusions being segmented.
2. **Transfer Learning Benefits**: Using pretrained MobileNetV2 as the encoder significantly improves convergence speed and final performance.
3. **Multi-Scale Context**: PSPNet's pyramid pooling module provides advantages in understanding facial composition as a whole.
4. **Skip Connections**: U-Net's skip connections are crucial for preserving fine-grained spatial details.
5. **Real-World Applicability**: Ensemble or hybrid approaches may improve robustness in real-world applications with varied lighting conditions and occlusions.
6. **Computational Efficiency**: SegNet offers a good balance for deployment in resource-constrained environments.
## Applications

Facial parts segmentation has numerous practical applications:

**Face Recognition**
- Enhanced feature extraction
- Robust to partial occlusions

**Augmented Reality**
- Face filters and effects
- Virtual makeup application
- Real-time face modification

**Medical Imaging**
- Facial reconstruction planning
- Anomaly detection
- Cosmetic surgery simulation

**Animation & Gaming**
- Motion capture for facial animation
- Character design and modeling
- Video game development

**Security**
- Enhanced authentication systems
- Surveillance and monitoring
- Identity verification

**Affective Computing**
- Emotion recognition
- Facial expression analysis
- Human-computer interaction
## Contributing

We welcome contributions to improve this project! Here's how you can help:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
Areas where contributions are especially welcome:

- Additional model architectures (DeepLabV3+, HRNet, etc.)
- Data augmentation strategies
- Post-processing techniques
- Real-time inference optimization
- Mobile deployment (TensorFlow Lite)
- Web deployment (TensorFlow.js)
- Additional evaluation metrics
- Documentation improvements
## References

- LAPA Dataset: Kaggle - LAPA Face Parsing Dataset
- U-Net: Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015.
- PSPNet: Zhao, H., Shi, J., Qi, X., Wang, X., & Jia, J. (2017). Pyramid Scene Parsing Network. CVPR 2017.
- SegNet: Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. TPAMI 2017.
- MobileNetV2: Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. CVPR 2018.
## License

This project is licensed under the MIT License; see the LICENSE file for details.
## Acknowledgments

- Thanks to the creators of the LAPA dataset for providing high-quality annotations
- TensorFlow and Keras teams for excellent deep learning frameworks
- The research community for developing these powerful architectures
- Kaggle for providing computational resources
## Contact

For questions, suggestions, or collaborations, please contact the team members or open an issue in this repository.
⭐ If you find this project useful, please consider giving it a star! ⭐
Made with ❤️ by the Facial Segmentation Team