RegionCLIP: Region-based Language-Image Pretraining

Yiwu Zhong; Jianwei Yang; Pengchuan Zhang; Chunyuan Li; Noel Codella; Liunian Harold Li; Luowei Zhou; Xiyang Dai; Lu Yuan; Yin Li; Jianfeng Gao

doi:https://doi.org/10.1109/cvpr52688.2022.01629

Pages: 16772 - 16782

Published: Jun 1, 2022

Abstract

Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to unsatisfactory performance due to a major domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained...
