arXiv: Computer Vision and Pattern Recognition
Last. Seba Susan (DCE: Delhi Technological University), H-Index: 13
view all 4 authors...
A hybrid model is proposed that integrates two popular image captioning methods to generate a text-based summary describing the contents of the image. The two image captioning models are the Neural Image Caption (NIC) and the k-nearest neighbor approach, each trained individually on the training set. We extract a set of five features from the validation set to evaluate the results of the two models; these features are in turn used to train a logistic regression classifier. The BLEU-4 scores of the...
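The selection step described above, a logistic regression over five evaluation features that picks between the two captioners, can be sketched as follows. The weights, bias, and feature values are hypothetical placeholders, not the paper's trained parameters:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def choose_caption(features, weights, bias):
    """Score the five evaluation features with a logistic regression
    model and return which captioner's output to keep."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return "NIC" if sigmoid(z) >= 0.5 else "kNN"

# hypothetical trained parameters and per-image feature values
weights = [0.8, -0.3, 0.5, 0.1, -0.2]
features = [0.9, 0.2, 0.7, 0.4, 0.1]
print(choose_caption(features, weights, 0.0))  # → NIC
```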
Our recent study uses historic data of paddy yield and associated conditions, including humidity, luminescence, and temperature. By incorporating regression models and neural networks (NN), one can produce highly satisfactory forecasts of paddy yield. Simulations indicate that our model can predict paddy yield with high accuracy while concurrently detecting diseases that may exist and are invisible to the human eye. Crop Yield Prediction Using Regression and Neural Networks (CYPUR-NN) is develop...
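A minimal sketch of the regression component, assuming a plain linear least-squares fit of yield against the three conditions named above; the data values below are invented for illustration, not the study's records:

```python
import numpy as np

# hypothetical historic records: humidity (%), luminescence, temperature (°C)
X = np.array([[70, 0.82, 28],
              [65, 0.91, 30],
              [80, 0.60, 26],
              [75, 0.74, 31]], dtype=float)
y = np.array([5.1, 5.6, 4.4, 4.9])   # paddy yield in t/ha (made up)

# fit yield ≈ X @ w + b via least squares; append a ones column for the bias
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_yield(humidity, luminescence, temperature):
    """Forecast yield for one set of conditions with the fitted model."""
    return np.array([humidity, luminescence, temperature, 1.0]) @ coef
```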
#1 Renshen Wang, H-Index: 1
#2 Yasuhisa Fujii, H-Index: 11
Last. Ashok C. Popat (Google), H-Index: 8
view all 3 authors...
Paragraphs are an important class of document entities. We propose a new approach for paragraph identification by spatial graph convolutional neural networks (GCN) applied on OCR text boxes. Two steps, namely line splitting and line clustering, are performed to extract paragraphs from the lines in OCR results. Each step uses a beta-skeleton graph constructed from bounding boxes, where the graph edges provide efficient support for graph convolution operations. With only pure layout input features...
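The beta-skeleton construction the abstract relies on can be illustrated for beta = 1, the Gabriel graph, shown here over box centers rather than full OCR bounding boxes (a simplification for illustration):

```python
def gabriel_edges(points):
    """Beta-skeleton with beta = 1 (Gabriel graph): connect p and q iff
    no other point lies strictly inside the circle with diameter pq."""
    edges = []
    for i, p in enumerate(points):
        for j in range(i + 1, len(points)):
            q = points[j]
            cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
            r2 = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / 4
            if all((x - cx) ** 2 + (y - cy) ** 2 >= r2
                   for k, (x, y) in enumerate(points) if k not in (i, j)):
                edges.append((i, j))
    return edges

# hypothetical OCR box centers: the middle point blocks a direct 0-1 edge
centers = [(0, 0), (10, 0), (5, 1)]
print(gabriel_edges(centers))  # → [(0, 2), (1, 2)]
```

Because a point inside the diameter circle suppresses the long edge, the graph stays sparse, which is what makes the subsequent graph convolutions efficient.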
Sign language helps people with speaking and hearing disabilities communicate with others efficiently. Sign language identification is a challenging area in the field of computer vision; recent developments have achieved near-perfect results for the task, though some challenges are yet to be solved. In this paper, we propose a novel machine learning based pipeline for American Sign Language identification using hand track points. We convert a hand gesture into a series of hand trac...
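A plausible first step in such a pipeline is making the track points translation- and scale-invariant before classification. This normalization scheme is an assumption for illustration, not the paper's exact preprocessing:

```python
def normalize_track_points(points):
    """Make a gesture's hand keypoints translation- and scale-invariant:
    subtract the wrist (first point), then divide by the max extent."""
    wx, wy = points[0]
    shifted = [(x - wx, y - wy) for x, y in points]
    scale = max(max(abs(x), abs(y)) for x, y in shifted) or 1.0
    return [(x / scale, y / scale) for x, y in shifted]

# hypothetical 3-keypoint hand: wrist, index tip, thumb tip
print(normalize_track_points([(2, 2), (4, 2), (2, 6)]))
# → [(0.0, 0.0), (0.5, 0.0), (0.0, 1.0)]
```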
#1 Ron Mokady, H-Index: 3
#2 Amir Hertz, H-Index: 5
Last. Amit Bermano, H-Index: 13
view all 3 authors...
Image captioning is a fundamental task in vision-language understanding, where the model predicts a textual, informative caption for a given input image. In this paper, we present a simple approach to address this task. We use a CLIP encoding as a prefix to the caption, by employing a simple mapping network, and then fine-tune a language model to generate the image captions. The recently proposed CLIP model contains rich semantic features which were trained with textual context, making it best for ...
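The mapping-network idea can be sketched as a single linear layer that expands one CLIP image embedding into a fixed-length prefix of language-model token embeddings. The dimensions and random weights below are placeholders; the paper's network is trained, and may be a deeper architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_clip, d_lm, prefix_len = 512, 768, 10   # assumed embedding sizes

# hypothetical mapping network: one linear layer turning a CLIP embedding
# into `prefix_len` pseudo-token embeddings for the language model
W = rng.standard_normal((d_clip, prefix_len * d_lm)) * 0.02

def clip_to_prefix(clip_embedding):
    """Project one CLIP embedding to a (prefix_len, d_lm) prefix that is
    prepended before the caption's token embeddings."""
    flat = clip_embedding @ W
    return flat.reshape(prefix_len, d_lm)

prefix = clip_to_prefix(rng.standard_normal(d_clip))
print(prefix.shape)  # → (10, 768)
```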
Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, has shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., limited receptive field and inadaptability to input content), its computatio...
#1 Mingfei Gao (SF: Salesforce.com), H-Index: 9
#2 Chen Xing (SF: Salesforce.com)
Last. Caiming Xiong (SF: Salesforce.com), H-Index: 54
view all 7 authors...
Despite great progress in object detection, most existing methods are limited to a small set of object categories, due to the tremendous human effort needed for instance-level bounding-box annotation. To alleviate the problem, recent open vocabulary and zero-shot detection methods attempt to detect object categories not seen during training. However, these approaches still rely on manually provided bounding-box annotations on a set of base classes. We propose an open vocabulary detection framewo...
#1 Ze Liu (Microsoft), H-Index: 6
#2 Han Hu (Microsoft), H-Index: 26
Last. Baining Guo (Microsoft), H-Index: 71
view all 12 authors...
We present techniques for scaling Swin Transformer up to 3 billion parameters and making it capable of training with images of up to 1,536×1,536 resolution. By scaling up capacity and resolution, Swin Transformer sets new records on four representative vision benchmarks: 84.0% top-1 accuracy on ImageNet-V2 image classification, 63.1/54.4 box/mask mAP on COCO object detection, 59.9 mIoU on ADE20K semantic segmentation, and 86.8% top-1 accuracy on Kinetics-400 video action classification. O...
We introduce latency-aware network acceleration (LANA) - an approach that builds on neural architecture search techniques and teacher-student distillation to accelerate neural networks. LANA consists of two phases: in the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature map distillation. In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization (ILP) ap...
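The second phase's per-layer selection can be illustrated with a brute-force stand-in for the ILP: pick one candidate operation per layer to maximize a quality score under a latency budget. The candidate numbers are invented, and a real solver would replace the exhaustive search:

```python
from itertools import product

def select_ops(candidates, latency_budget):
    """Toy stand-in for LANA's ILP phase: for each layer, pick one
    (quality, latency) candidate, maximizing total quality subject
    to a total latency budget, by exhaustive enumeration."""
    best, best_quality = None, float("-inf")
    for choice in product(*[range(len(c)) for c in candidates]):
        quality = sum(candidates[i][j][0] for i, j in enumerate(choice))
        latency = sum(candidates[i][j][1] for i, j in enumerate(choice))
        if latency <= latency_budget and quality > best_quality:
            best, best_quality = choice, quality
    return best, best_quality

# hypothetical (quality, latency-ms) candidates for two teacher layers
layers = [[(0.9, 5), (0.7, 2)],
          [(0.8, 4), (0.6, 1)]]
print(select_ops(layers, latency_budget=6))
```

Brute force is exponential in the number of layers, which is exactly why the paper formulates the selection as a constrained integer linear program instead.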
In the real world, the degradation of images taken under haze can be quite complex, where the spatial distribution of haze varies from image to image. Recent methods adopt deep neural networks to recover clean scenes from hazy images directly. However, due to the paradox caused by the variation of real captured haze and the fixed degradation parameters of the current networks, the generalization ability of recent dehazing methods on real-world hazy images is not satisfying. To address the prob...
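For context, most dehazing work builds on the classic atmospheric scattering model I = J·t + A·(1 − t). Given estimates of the transmission t and airlight A, the clean scene J can be recovered by inverting it; this is a sketch of the standard model, not this paper's learned method:

```python
import numpy as np

def dehaze_pixelwise(I, t, A):
    """Invert the atmospheric scattering model I = J*t + A*(1-t) to
    recover the clean scene J, given transmission t and airlight A
    (assumed already estimated, e.g. by a network or a prior)."""
    t = np.maximum(t, 0.1)          # avoid amplifying noise where t ≈ 0
    return (I - A) / t + A

hazy = np.array([0.7, 0.8])         # two hypothetical pixel intensities
trans = np.array([0.5, 0.25])       # denser haze → lower transmission
print(dehaze_pixelwise(hazy, trans, A=0.9))
```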