Original paper
RegionCLIP: Region-based Language-Image Pretraining
Pages: 16772 - 16782
Published: Jun 1, 2022
Abstract
Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to unsatisfactory performance due to a major domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained...