Open Vocabulary Video Semantic Segmentation (master thesis)

Open vocabulary video semantic segmentation (OV-VSS) aims to assign a semantic label to every pixel of every video frame, given an arbitrary set of open-vocabulary category names. There have been a number of attempts at open vocabulary image semantic segmentation (OV-ISS). However, OV-VSS has received much less attention, because video understanding tasks must additionally model local redundancy and global correlation across frames. In this Master's thesis project, we plan to fill this gap by extending existing OV-ISS methods to OV-VSS. Specifically, we aim to develop an OV-VSS method that achieves high accuracy by exploiting temporal information while remaining efficient.
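
To make the task definition concrete, the following is a minimal sketch of what a naive OV-VSS baseline could look like. It is not the method to be developed in this project. It assumes that per-pixel embeddings and category-name embeddings have already been computed by some vision-language model (for example a CLIP-style encoder) and live in the same space. The function names `cosine` and `label_video`, the `window` parameter, and the simple averaging of scores over recent frames are all illustrative assumptions; in particular, the averaging assumes that a pixel index refers to the same scene content in neighbouring frames, which ignores motion.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def label_video(pixel_feats, text_feats, class_names, window=1):
    """Assign one category name to every pixel of every frame.

    pixel_feats: list of frames; each frame is a list of per-pixel embeddings
                 (hypothetical precomputed inputs, same pixel count per frame)
    text_feats:  one embedding per category name, in the same order as class_names
    window:      number of most recent frames whose scores are averaged;
                 window=1 reduces to independent per-frame (OV-ISS) labeling
    """
    # Similarity of every pixel in every frame to every category name.
    scores = [[[cosine(p, t) for t in text_feats] for p in frame]
              for frame in pixel_feats]

    labels = []
    for f in range(len(scores)):
        start = max(0, f - window + 1)  # first frame inside the temporal window
        count = f + 1 - start
        frame_labels = []
        for i in range(len(scores[f])):
            # Average each category's score over the frames in the window.
            avg = [sum(scores[g][i][c] for g in range(start, f + 1)) / count
                   for c in range(len(class_names))]
            frame_labels.append(class_names[avg.index(max(avg))])
        labels.append(frame_labels)
    return labels


if __name__ == "__main__":
    # Toy example: 2 frames, 2 pixels per frame, 2-dimensional embeddings.
    names = ["road", "sky"]
    text = [[1.0, 0.0], [0.0, 1.0]]
    video = [[[1.0, 0.1], [0.1, 1.0]],
             [[0.4, 0.6], [0.0, 1.0]]]
    print(label_video(video, text, names, window=1))
    print(label_video(video, text, names, window=2))
```

Because the category names enter only through `text_feats`, the label set can be changed at inference time without retraining, which is what makes the setting open-vocabulary. The `window` parameter illustrates, in the crudest possible way, how temporal information can stabilise noisy per-frame predictions.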

Description
Requirements: Familiarity with Python and PyTorch. Prior experience with computer vision, e.g. having taken computer vision courses at ETH Zurich. Knowledge of image/video semantic segmentation is a plus.
Goal
Goal:
1. Become familiar with OV-ISS and OV-VSS.
2. Adapt existing video semantic segmentation methods to OV-ISS.
3. Propose an algorithm for OV-VSS.
4. Possibly submit the work to a top AI conference such as NeurIPS 2024 or ICLR 2024.
Contact Details
Dr. Guolei Sun, guolei.sun@vision.ee.ethz.ch

References:
[1] Guolei Sun et al., “Mining Relations among Cross-Frame Affinities for Video Semantic Segmentation”, ECCV 2022
[2] Guolei Sun et al., “Coarse-to-Fine Feature Mining for Video Semantic Segmentation”, ECCV 2022
[3] Feng Liang et al., “Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP”, CVPR 2023

Calendar

Earliest start	No date
Latest end	No date

Location

Medical imaging (ETHZ)

Labels

Master Thesis

Topics

Information, Computing and Communication Sciences