Página 17 dos resultados de 471 itens digitais encontrados em 0.077 segundos

NetVLAD: CNN architecture for weakly supervised place recognition

Arandjelović, Relja; Gronat, Petr; Torii, Akihiko; Pajdla, Tomas; Sivic, Josef
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 23/11/2015 Português
Relevância na Pesquisa
37.081572%
We tackle the problem of large scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn parameters of the architecture in an end-to-end manner from images depicting the same places over time downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture obtains a large improvement in performance over non-learnt image representations as well as significantly outperforms off-the-shelf CNN descriptors on two challenging place recognition benchmarks.

Object Level Deep Feature Pooling for Compact Image Representation

Mopuri, Konda Reddy; Babu, R. Venkatesh
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 24/04/2015 Português
Relevância na Pesquisa
37.081572%
Convolutional Neural Network (CNN) features have been successfully employed in recent works as an image descriptor for various vision tasks. But the inability of the deep CNN features to exhibit invariance to geometric transformations and object compositions poses a great challenge for image search. In this work, we demonstrate the effectiveness of the objectness prior over the deep CNN features of image regions for obtaining an invariant image representation. The proposed approach represents the image as a vector of pooled CNN features describing the underlying objects. This representation provides robustness to spatial layout of the objects in the scene and achieves invariance to general geometric transformations, such as translation, rotation and scaling. The proposed approach also leads to a compact representation of the scene, making each image occupy a smaller memory footprint. Experiments show that the proposed representation achieves state of the art retrieval results on a set of challenging benchmark image datasets, while maintaining a compact representation.; Comment: Deep Vision 2015

Wavelet and Curvelet Moments for Image Classification: Application to Aggregate Mixture Grading

Murtagh, Fionn; Starck, Jean-Luc
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 24/02/2008 Português
Relevância na Pesquisa
37.081572%
We show the potential for classifying images of mixtures of aggregate, based themselves on varying, albeit well-defined, sizes and shapes, in order to provide a far more effective approach compared to the classification of individual sizes and shapes. While a dominant (additive, stationary) Gaussian noise component in image data will ensure that wavelet coefficients are of Gaussian distribution, long tailed distributions (symptomatic, for example, of extreme values) may well hold in practice for wavelet coefficients. Energy (2nd order moment) has often been used for image characterization for image content-based retrieval, and higher order moments may be important also, not least for capturing long tailed distributional behavior. In this work, we assess 2nd, 3rd and 4th order moments of multiresolution transform -- wavelet and curvelet transform -- coefficients as features. As analysis methodology, taking account of image types, multiresolution transforms, and moments of coefficients in the scales or bands, we use correspondence analysis as well as k-nearest neighbors supervised classification.; Comment: Submitted to Pattern Recognition Letters

Hierarchical Graphical Models for Multigroup Shape Analysis using Expectation Maximization with Sampling in Kendall's Shape Space

Yu, Yen-Yun; Fletcher, P. Thomas; Awate, Suyash P.
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Português
Relevância na Pesquisa
37.081572%
This paper proposes a novel framework for multi-group shape analysis relying on a hierarchical graphical statistical model on shapes within a population.The framework represents individual shapes as point setsmodulo translation, rotation, and scale, following the notion in Kendall shape space.While individual shapes are derived from their group shape model, each group shape model is derived from a single population shape model. The hierarchical model follows the natural organization of population data and the top level in the hierarchy provides a common frame of reference for multigroup shape analysis, e.g. classification and hypothesis testing. Unlike typical shape-modeling approaches, the proposed model is a generative model that defines a joint distribution of object-boundary data and the shape-model variables. Furthermore, it naturally enforces optimal correspondences during the process of model fitting and thereby subsumes the so-called correspondence problem. The proposed inference scheme employs an expectation maximization (EM) algorithm that treats the individual and group shape variables as hidden random variables and integrates them out before estimating the parameters (population mean and variance and the group variances). The underpinning of the EM algorithm is the sampling of pointsets...

Comparing persistence diagrams through complex vectors

Di Fabio, Barbara; Ferri, Massimo
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Português
Relevância na Pesquisa
37.081572%
The natural pseudo-distance of spaces endowed with filtering functions is precious for shape classification and retrieval; its optimal estimate coming from persistence diagrams is the bottleneck distance, which unfortunately suffers from combinatorial explosion. A possible algebraic representation of persistence diagrams is offered by complex polynomials; since far polynomials represent far persistence diagrams, a fast comparison of the coefficient vectors can reduce the size of the database to be classified by the bottleneck distance. This article explores experimentally three transformations from diagrams to polynomials and three distances between the complex vectors of coefficients.; Comment: 11 pages, 4 figures, 2 tables

Maximum a posteriori estimation of piecewise arcs in tempo time-series

Stowell, Dan; Chew, Elaine
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 01/02/2013 Português
Relevância na Pesquisa
37.081572%
In musical performances with expressive tempo modulation, the tempo variation can be modelled as a sequence of tempo arcs. Previous authors have used this idea to estimate series of piecewise arc segments from data. In this paper we describe a probabilistic model for a time-series process of this nature, and use this to perform inference of single- and multi-level arc processes from data. We describe an efficient Viterbi-like process for MAP inference of arcs. Our approach is score-agnostic, and together with efficient inference allows for online analysis of performances including improvisations, and can predict immediate future tempo trajectories.; Comment: Submitted to postprint volume for Computer Music Modeling and Retrieval (CMMR) 2012

3D Pose Estimation from a Single Monocular Image

Yasin, Hashim; Iqbal, Umar; Krüger, Björn; Weber, Andreas; Gall, Juergen
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 22/09/2015 Português
Relevância na Pesquisa
37.081572%
One major challenge for 3D pose estimation from a single RGB image is the acquisition of sufficient training data. In particular, collecting large amounts of training data that contain unconstrained images and are annotated with accurate 3D poses is infeasible. We therefore propose to use two independent training sources. The first source consists of images with annotated 2D poses and the second source consists of accurate 3D motion capture data. To integrate both sources, we propose a dual-source approach that combines 2D pose estimation with efficient and robust 3D pose retrieval. In our experiments, we show that our approach achieves state-of-the-art results when both sources are from the same dataset, but it also achieves competitive results when the motion capture data is taken from a different dataset.

Multi-view Face Detection Using Deep Convolutional Neural Networks

Farfade, Sachin Sudhakar; Saberian, Mohammad; Li, Li-Jia
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Português
Relevância na Pesquisa
37.081572%
In this paper we consider the problem of multi-view face detection. While there has been significant research on this problem, current state-of-the-art approaches for this task require annotation of facial landmarks, e.g. TSM [25], or annotation of face poses [28, 22]. They also require training dozens of models to fully capture faces in all orientations, e.g. 22 models in HeadHunter method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method that does not require pose/landmark annotation and is able to detect faces in a wide range of orientations using a single model based on deep convolutional neural networks. The proposed method has minimal complexity; unlike other recent deep learning object detection methods [9], it does not require additional components such as segmentation, bounding-box regression, or SVM classifiers. Furthermore, we analyzed scores of the proposed face detector for faces in different orientations and found that 1) the proposed method is able to detect faces from different angles and can handle occlusion to some extent, 2) there seems to be a correlation between dis- tribution of positive examples in the training set and scores of the proposed face detector. The latter suggests that the proposed methods performance can be further improved by using better sampling strategies and more sophisticated data augmentation techniques. Evaluations on popular face detection benchmark datasets show that our single-model face detector algorithm has similar or better performance compared to the previous methods...

FPGA Based Assembling of Facial Components for Human Face Construction

Halder, Santanu; Bhattacharjee, Debotosh; Nasipuri, Mita; Basu, Dipak Kumar; Kundu, Mahantapas
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 30/06/2010 Português
Relevância na Pesquisa
37.081572%
This paper aims at VLSI realization for generation of a new face from textual description. The FASY (FAce SYnthesis) System is a Face Database Retrieval and new Face generation System that is under development. One of its main features is the generation of the requested face when it is not found in the existing database. The new face generation system works in three steps - searching phase, assembling phase and tuning phase. In this paper the tuning phase using hardware description language and its implementation in a Field Programmable Gate Array (FPGA) device is presented.

Face Synthesis (FASY) System for Determining the Characteristics of a Face Image

Halder, Santanu; Bhattacharjee, Debotosh; Nasipuri, Mita; Basu, Dipak Kumar; Kundu, Mahantapas
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 05/07/2010 Português
Relevância na Pesquisa
37.081572%
This paper aims at determining the characteristics of a face image by extracting its components. The FASY (FAce SYnthesis) System is a Face Database Retrieval and new Face generation System that is under development. One of its main features is the generation of the requested face when it is not found in the existing database, which allows a continuous growing of the database also. To generate the new face image, we need to store the face components in the database. So we have designed a new technique to extract the face components by a sophisticated method. After extraction of the facial feature points we have analyzed the components to determine their characteristics. After extraction and analysis we have stored the components along with their characteristics into the face database for later use during the face construction.

Learned versus Hand-Designed Feature Representations for 3d Agglomeration

Bogovic, John A.; Huang, Gary B.; Jain, Viren
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 20/12/2013 Português
Relevância na Pesquisa
37.081572%
For image recognition and labeling tasks, recent results suggest that machine learning methods that rely on manually specified feature representations may be outperformed by methods that automatically derive feature representations based on the data. Yet for problems that involve analysis of 3d objects, such as mesh segmentation, shape retrieval, or neuron fragment agglomeration, there remains a strong reliance on hand-designed feature descriptors. In this paper, we evaluate a large set of hand-designed 3d feature descriptors alongside features learned from the raw data using both end-to-end and unsupervised learning techniques, in the context of agglomeration of 3d neuron fragments. By combining unsupervised learning techniques with a novel dynamic pooling scheme, we show how pure learning-based methods are for the first time competitive with hand-designed 3d shape descriptors. We investigate data augmentation strategies for dramatically increasing the size of the training set, and show how combining both learned and hand-designed features leads to the highest accuracy.

Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos

Yeung, Serena; Russakovsky, Olga; Jin, Ning; Andriluka, Mykhaylo; Mori, Greg; Fei-Fei, Li
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Português
Relevância na Pesquisa
37.081572%
Every moment counts in action recognition. A comprehensive understanding of human activity in video requires labeling every frame according to the actions occurring, placing multiple labels densely over a video sequence. To study this problem we extend the existing THUMOS dataset and introduce MultiTHUMOS, a new dataset of dense labels over unconstrained internet videos. Modeling multiple, dense labels benefits from temporal relations within and across classes. We define a novel variant of long short-term memory (LSTM) deep networks for modeling these temporal relations via multiple input and output connections. We show that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.

Understanding Intra-Class Knowledge Inside CNN

Wei, Donglai; Zhou, Bolei; Torrabla, Antonio; Freeman, William
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Português
Relevância na Pesquisa
37.081572%
Convolutional Neural Network (CNN) has been successful in image recognition tasks, and recent works shed lights on how CNN separates different classes with the learned inter-class knowledge through visualization. In this work, we instead visualize the intra-class knowledge inside CNN to better understand how an object class is represented in the fully-connected layers. To invert the intra-class knowledge into more interpretable images, we propose a non-parametric patch prior upon previous CNN visualization models. With it, we show how different "styles" of templates for an object class are organized by CNN in terms of location and content, and represented in a hierarchical and ensemble way. Moreover, such intra-class knowledge can be used in many interesting applications, e.g. style-based image retrieval and style-based object completion.; Comment: tech report for: http://vision03.csail.mit.edu/cnn_art/index.html

Group $K$-Means

Wang, Jianfeng; Yan, Shuicheng; Yang, Yi; Kankanhalli, Mohan S; Li, Shipeng; Wang, Jingdong
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 05/01/2015 Português
Relevância na Pesquisa
37.081572%
We study how to learn multiple dictionaries from a dataset, and approximate any data point by the sum of the codewords each chosen from the corresponding dictionary. Although theoretically low approximation errors can be achieved by the global solution, an effective solution has not been well studied in practice. To solve the problem, we propose a simple yet effective algorithm \textit{Group $K$-Means}. Specifically, we take each dictionary, or any two selected dictionaries, as a group of $K$-means cluster centers, and then deal with the approximation issue by minimizing the approximation errors. Besides, we propose a hierarchical initialization for such a non-convex problem. Experimental results well validate the effectiveness of the approach.; Comment: The developed algorithm is similar with "Christopher F. Barnes, A new multiple path search technique for residual vector quantizers, 1994", but we conduct the research independently and apply it in data/feature compression and image retrieval

Towards a solid solution of real-time fire and flame detection

Jiang, Bo; Lu, Yongyi; Li, Xiying; Lin, Liang
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 02/02/2015 Português
Relevância na Pesquisa
37.081572%
Although the object detection and recognition has received growing attention for decades, a robust fire and flame detection method is rarely explored. This paper presents an empirical study, towards a general and solid approach to fast detect fire and flame in videos, with the applications in video surveillance and event retrieval. Our system consists of three cascaded steps: (1) candidate regions proposing by a background model, (2) fire region classifying with color-texture features and a dictionary of visual words, and (3) temporal verifying. The experimental evaluation and analysis are done for each step. We believe that it is a useful service to both academic research and real-world application. In addition, we release the software of the proposed system with the source code, as well as a public benchmark and data set, including 64 video clips covered both indoor and outdoor scenes under different conditions. We achieve an 82% Recall with 93% Precision on the data set, and greatly improve the performance by state-of-the-arts methods.; Comment: 14 pages, 6 figures

An empirical study of software reuse by experts in object-oriented design

Burkhardt, Jean-Marie; Détienne, Françoise
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 01/02/2007 Português
Relevância na Pesquisa
37.081572%
This paper presents an empirical study of the software reuse activity by expert designers in the context of object-oriented design. Our study focuses on the three following aspects of reuse : (1) the interaction between some design processes, e.g. constructing a problem representation, searching for and evaluating solutions, and reuse processes, i.e. retrieving and using previous solutions, (2) the mental processes involved in reuse, e.g. example-based retrieval or bottom-up versus top-down expanding of the solution, and (3) the mental representations constructed throughout the reuse activity, e.g. dynamic versus static representations. Some implications of these results for the specification of software reuse support environments are discussed.

Video Text Localization with an emphasis on Edge Features

Shekar, B. H.; L., Smitha M.
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 22/02/2015 Português
Relevância na Pesquisa
37.081572%
The text detection and localization plays a major role in video analysis and understanding. The scene text embedded in video consist of high-level semantics and hence contributes significantly to visual content analysis and retrieval. This paper proposes a novel method to robustly localize the texts in natural scene images and videos based on sobel edge emphasizing approach. The input image is preprocessed and edge emphasis is done to detect the text clusters. Further, a set of rules have been devised using morphological operators for false positive elimination and connected component analysis is performed to detect the text regions and hence text localization is performed. The experimental results obtained on publicly available standard datasets illustrate that the proposed method can detect and localize the texts of various sizes, fonts and colors.; Comment: 8 pages, Eighth International Conference on Image and Signal Processing, Elsevier Publications, ISBN: 9789351072522, pp: 324-330, held at UVCE, Bangalore in July 2014. arXiv admin note: text overlap with arXiv:1502.03913

Image Tag Completion and Refinement by Subspace Clustering and Matrix Completion

Hou, Yuqing; Lin, Zhouchen
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 10/06/2015 Português
Relevância na Pesquisa
37.081572%
Tag-based image retrieval (TBIR) has drawn much attention in recent years due to the explosive amount of digital images and crowdsourcing tags. However, the TBIR applications still suffer from the deficient and inaccurate tags provided by users. Inspired by the subspace clustering methods, we formulate the tag completion problem in a subspace clustering model which assumes that images are sampled from subspaces, and complete the tags using the state-of-the-art Low Rank Representation (LRR) method. And we propose a matrix completion algorithm to further refine the tags. Our empirical results on multiple benchmark datasets for image annotation show that the proposed algorithm outperforms state-of-the-art approaches when handling missing and noisy tags.

Shape Characterization via Boundary Distortion

Descombes, Xavier; Komech, Serguei
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 24/02/2013 Português
Relevância na Pesquisa
37.081572%
In this paper, we derive new shape descriptors based on a directional characterization. The main idea is to study the behavior of the shape neighborhood under family of transformations. We obtain a description invariant with respect to rotation, reflection, translation and scaling. A well-defined metric is then proposed on the associated feature space. We show the continuity of this metric. Some results on shape retrieval are provided on two databases to show the accuracy of the proposed shape metric.; Comment: 14 pages, 5 figures

Entropy Computation of Document Images in Run-Length Compressed Domain

Nagabhushan, P.; Javed, Mohammed; Chaudhuri, B. B.
Fonte: Universidade Cornell Publicador: Universidade Cornell
Tipo: Artigo de Revista Científica
Publicado em 08/04/2014 Português
Relevância na Pesquisa
37.081572%
Compression of documents, images, audios and videos have been traditionally practiced to increase the efficiency of data storage and transfer. However, in order to process or carry out any analytical computations, decompression has become an unavoidable pre-requisite. In this research work, we have attempted to compute the entropy, which is an important document analytic directly from the compressed documents. We use Conventional Entropy Quantifier (CEQ) and Spatial Entropy Quantifiers (SEQ) for entropy computations [1]. The entropies obtained are useful in applications like establishing equivalence, word spotting and document retrieval. Experiments have been performed with all the data sets of [1], at character, word and line levels taking compressed documents in run-length compressed domain. The algorithms developed are computational and space efficient, and results obtained match 100% with the results reported in [1].; Comment: Published in IEEE Proceedings 2014 Fifth International Conference on Signals and Image Processing