Fork me on GitHub

Trending arXiv

Note: this version is tailored to @Smerity - though you can run your own! Trending arXiv may eventually be extended to multiple users ...


1 2 7 8 9 10 11 12 13 28 29

Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks

Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, Dilip Krishnan

Collecting well-annotated image datasets to train modern machine learning algorithms is prohibitively expensive for many tasks. One appealing alternative is rendering synthetic data where ground-truth annotations are generated automatically. Unfortunately, models trained purely on rendered images often fail to generalize to real images. To address this shortcoming, prior work introduced unsupervised domain adaptation algorithms that attempt to map representations between the two domains or learn to extract features that are domain-invariant. In this work, we present a new approach that learns, in an unsupervised manner, a transformation in the pixel space from one domain to the other. Our generative adversarial network (GAN)-based method adapts source-domain images to appear as if drawn from the target domain. Our approach not only produces plausible samples, but also outperforms the state-of-the-art on a number of unsupervised domain adaptation scenarios by large margins. Finally, we demonstrate that the adaptation process generalizes to object classes unseen during training.

Captured tweets and retweets: 2

Learning Residual Images for Face Attribute Manipulation

Wei Shen, Rujie Liu

Face attributes are interesting due to their detailed description of human faces. Unlike prior research working on attribute prediction, we address an inverse and more challenging problem called face attribute manipulation which aims at modifying a face image according to a given attribute value. In order to obtain an efficient representation for the manipulation, we propose to learn the corresponding residual image which is defined as the difference between images after and before the manipulation. Using the residual image, the manipulation can be operated efficiently with modest pixel modification. The framework of our approach is based on the Generative Adversarial Network. It consists of two image transformation networks imitating the attribute manipulation and its dual operation and a shared discriminative network distinguishing the generated images from different reference images. We also apply dual learning to allow the two transformation networks to learn from each other. Experiments show that the learned residual images successfully simulate the manipulations and the generated images retain most of the details in attribute-irrelevant areas.

Captured tweets and retweets: 7

Medical Image Synthesis with Context-Aware Generative Adversarial Networks

Dong Nie, Roger Trullo, Caroline Petitjean, Su Ruan, Dinggang Shen

Computed tomography (CT) is critical for various clinical applications, e.g., radiotherapy treatment planning and also PET attenuation correction. However, CT exposes radiation during acquisition, which may cause side effects to patients. Compared to CT, magnetic resonance imaging (MRI) is much safer and does not involve any radiations. Therefore, recently, researchers are greatly motivated to estimate CT image from its corresponding MR image of the same subject for the case of radiotherapy planning. In this paper, we propose a data-driven approach to address this challenging problem. Specifically, we train a fully convolutional network to generate CT given an MR image. To better model the nonlinear relationship from MRI to CT and to produce more realistic images, we propose to use the adversarial training strategy and an image gradient difference loss function. We further apply AutoContext Model to implement a context-aware generative adversarial network. Experimental results show that our method is accurate and robust for predicting CT images from MRI images, and also outperforms three state-of-the-art methods under comparison.

Captured tweets and retweets: 3

Bayesian Optimization for Machine Learning : A Practical Guidebook

Ian Dewancker, Michael McCourt, Scott Clark

The engineering of machine learning systems is still a nascent field; relying on a seemingly daunting collection of quickly evolving tools and best practices. It is our hope that this guidebook will serve as a useful resource for machine learning practitioners looking to take advantage of Bayesian optimization techniques. We outline four example machine learning problems that can be solved using open source machine learning libraries, and highlight the benefits of using Bayesian optimization in the context of these common machine learning applications.

Captured tweets and retweets: 2

Attentive Explanations: Justifying Decisions and Pointing to the Evidence

Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, Marcus Rohrbach

Deep models are the defacto standard in visual decision models due to their impressive performance on a wide array of visual tasks. However, they are frequently seen as opaque and are unable to explain their decisions. In contrast, humans can justify their decisions with natural language and point to the evidence in the visual world which led to their decisions. We postulate that deep models can do this as well and propose our Pointing and Justification (PJ-X) model which can justify its decision with a sentence and point to the evidence by introspecting its decision and explanation process using an attention mechanism. Unfortunately there is no dataset available with reference explanations for visual decision making. We thus collect two datasets in two domains where it is interesting and challenging to explain decisions. First, we extend the visual question answering task to not only provide an answer but also a natural language explanation for the answer. Second, we focus on explaining human activities which is traditionally more challenging than object classification. We extensively evaluate our PJ-X model, both on the justification and pointing tasks, by comparing it to prior models and ablations using both automatic and human evaluations.

Captured tweets and retweets: 2

Harmonic Networks: Deep Translation and Rotation Equivariance

Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow

Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360-rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch. H-Nets use a rich, parameter-efficient and low computational complexity representation, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.

Captured tweets and retweets: 2

Improving Neural Language Models with a Continuous Cache

Edouard Grave, Armand Joulin, Nicolas Usunier

We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural network and cache models used with count based language models. We demonstrate on several language model datasets that our approach performs significantly better than recent memory augmented networks.

Captured tweets and retweets: 2

Finding Tiny Faces

Peiyun Hu, Deva Ramanan

Though tremendous strides have been made in object recognition, one of the remaining open challenges is detecting small objects. We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. While most recognition approaches aim to be scale-invariant, the cues for recognizing a 3px tall face are fundamentally different than those for recognizing a 300px tall face. We take a different approach and train separate detectors for different scales. To maintain efficiency, detectors are trained in a multi-task fashion: they make use of features extracted from multiple layers of single (deep) feature hierarchy. While training detectors for large objects is straightforward, the crucial challenge remains training detectors for small objects. We show that context is crucial, and define templates that make use of massively-large receptive fields (where 99% of the template extends beyond the object of interest). Finally, we explore the role of scale in pre-trained deep networks, providing ways to extrapolate networks tuned for limited scales to rather extreme ranges. We demonstrate state-of-the-art results on massively-benchmarked face datasets (FDDB and WIDER FACE). In particular, when compared to prior art on WIDER FACE, our results reduce error by a factor of 2 (our models produce an AP of 81% while prior art ranges from 29-64%).

Captured tweets and retweets: 2

Automated Inference on Sociopsychological Impressions of Attractive Female Faces

Xiaolin Wu, Xi Zhang, Chang Liu

This article is a sequel to our earlier paper [24]. Our main objective is to explore the potential of supervised machine learning in face-induced social computing and cognition, riding on the momentum of much heralded successes of face processing, analysis and recognition on the tasks of biometric-based identification. We present a case study of automated statistical inference on sociopsychological perceptions of female faces controlled for race, attractiveness, age and nationality. Like in [24], our empirical evidences point to the possibility of teaching computer vision and machine learning algorithms, using example face images, to predict personality traits and behavioral predisposition.

Captured tweets and retweets: 1

Learning representations through stochastic gradient descent in cross-validation error

Richard S. Sutton, Vivek Veeriah

Representations are fundamental to Artificial Intelligence. Typically, the performance of a learning system depends on its data representation. These data representations are usually hand-engineered based on some prior domain knowledge regarding the task. More recently, the trend is to learn these representations through deep neural networks as these can produce dramatical performance improvements over hand-engineered data representations. In this paper, we present a new incremental learning algorithm, called crossprop, for learning representations based on prior learning experiences. Unlike backpropagation, crossprop minimizes the cross-validation error. Specifically, our algorithm considers the influences of all the past weights on the current squared error, and uses this gradient for incrementally learning the weights in a neural network. This idea is similar to that of tuning the learning system through an offline cross-validation procedure. Crossprop is applicable to incremental learning tasks, where a sequence of examples are encountered by the learning system and they need to be processed one by one and then discarded. The learning system can use each example only once and can spend only a limited amount of computation for an example. From our preliminary experiments, we concluce that crossprop is a promising alternative for backprop for representation learning.

Captured tweets and retweets: 1

Quantum autoencoders for efficient compression of quantum data

Jonathan Romero, Jonathan Olson, Alan Aspuru-Guzik

Classical autoencoders are neural networks that can learn efficient codings of large datasets. The task of an autoencoder is, given an input $x$, to simply reproduce $x$ at the output with the smallest possible error. For one class of autoencoders, the structure of the underlying network forces the autoencoder to represent the data on a smaller number of bits than the input length, effectively compressing the input. Inspired by this idea, we introduce the model of a quantum autoencoder to perform similar tasks on quantum data. The quantum autoencoder is trained to compress a particular dataset, where an otherwise efficient compression algorithm cannot be employed. The training of the quantum autoencoder is done using classical optimization algorithms. We show that efficient quantum autoencoders can be obtained using simple circuit heuristics. We apply our model to compress molecular wavefunctions with applications in quantum simulation of chemistry.

Captured tweets and retweets: 1

Learning in the Machine: Random Backpropagation and the Learning Channel

Pierre Baldi, Peter Sadowski, Zhiqin Lu

Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transpose of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and the learning channel. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRPB), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g. ASRBP) and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 bechnmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we study also the low-end of the number of bits required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. Finally, we prove several mathematical results, including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.

Captured tweets and retweets: 1

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher

Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of". Other words that may seem visual can often be predicted reliably just from the language model e.g., "sign" after "behind a red stop" or "phone" following "talking on a cell". In this paper, we propose a novel adaptive attention model with a visual sentinel. At each time step, our model decides whether to attend to the image (and if so, to which regions) or to the visual sentinel. The model decides whether to attend to the image and where, in order to extract meaningful information for sequential word generation. We test our method on the COCO image captioning 2015 challenge dataset and Flickr30K. Our approach sets the new state-of-the-art by a significant margin.

Captured tweets and retweets: 99

Condensed Memory Networks for Clinical Diagnostic Inferencing

Aaditya Prakash, Siyuan Zhao, Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, Oladimeji Farri

Diagnosis of a clinical condition is a challenging task, which often requires significant medical investigation. Previous work related to diagnostic inferencing problems mostly consider multivariate observational data (e.g. physiological signals, lab tests etc.). In contrast, we explore the problem using free-text medical notes recorded in an electronic health record (EHR). Complex tasks like these can benefit from structured knowledge bases, but those are not scalable. We instead exploit raw text from Wikipedia as a knowledge source. Memory networks have been demonstrated to be effective in tasks which require comprehension of free-form text. They use the final iteration of the learned representation to predict probable classes. We introduce condensed memory neural networks (C-MemNNs), a novel model with iterative condensation of memory representations that preserves the hierarchy of features in the memory. Experiments on the MIMIC-III dataset show that the proposed model outperforms other variants of memory networks to predict the most probable diagnoses given a complex clinical scenario.

Captured tweets and retweets: 2

MarioQA: Answering Questions by Watching Gameplay Videos

Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han

We present a new benchmark dataset for video question answering (VideoQA) designed to evaluate algorithms' capability of spatio-temporal event understanding. Existing datasets either require very high-level reasoning from multi-modal information to find answers, or is mostly composed of the questions that can be answered by watching a single frame. Therefore, they are not suitable to evaluate models' real capacity and flexibility for VideoQA. To overcome such critical limitations, we focus on event-centric questions that require understanding temporal relation between multiple events in videos. An interesting idea in dataset construction process is that question-answer pairs are automatically generated from Super Mario video gameplays given a set of question templates. We also tackle VideoQA problem in the new dataset, referred to as MarioQA, by proposing spatio-temporal attention models based on deep neural networks. Our experiments show that the proposed deep neural network models with attention have meaningful performance improvement over several baselines.

Captured tweets and retweets: 2

DeepBach: a Steerable Model for Bach chorales generation

Gaëtan Hadjeres, François Pachet

The composition of polyphonic chorale music in the style of J.S Bach has represented a major challenge in automatic music composition over the last decades. The art of Bach chorales composition involves combining four-part harmony with characteristic rhythmic patterns and typical melodic movements to produce musical phrases which begin, evolve and end (cadences) in a harmonious way. To our knowledge, no model so far was able to solve all these problems simultaneously using an agnostic machine-learning approach. This paper introduces DeepBach, a statistical model aimed at modeling polyphonic music and specifically four parts, hymn-like pieces. We claim that, after being trained on the chorale harmonizations by Johann Sebastian Bach, our model is capable of generating highly convincing chorales in the style of Bach. We evaluate how indistinguishable our generated chorales are from existing Bach chorales with a listening test. The results corroborate our claim. A key strength of DeepBach is that it is agnostic and flexible. Users can constrain the generation by imposing some notes, rhythms or cadences in the generated score. This allows users to reharmonize user-defined melodies. DeepBach's generation is fast, making it usable for interactive music composition applications. Several generation examples are provided and discussed from a musical point of view.

Captured tweets and retweets: 44

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, Devi Parikh

Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset (Antol et al., ICCV 2015) by collecting complementary images such that every question in our balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question. Our dataset is by construction more balanced than the original VQA dataset and has approximately twice the number of image-question pairs. Our complete balanced dataset will be publicly released as part of the 2nd iteration of the Visual Question Answering Challenge (VQA v2.0). We further benchmark a number of state-of-art VQA models on our balanced dataset. All models perform significantly worse on our balanced dataset, suggesting that these models have indeed learned to exploit language priors. This finding provides the first concrete empirical evidence for what seems to be a qualitative sense among practitioners. Finally, our data collection protocol for identifying complementary images enables us to develop a novel interpretable model, which in addition to providing an answer to the given (image, question) pair also provides a counter-example based explanation - specifically, it identifies an image that is similar to the original image, but it believes has a different answer to the same question. This can help in building trust for machines among their users.

Captured tweets and retweets: 2

Overcoming catastrophic forgetting in neural networks

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell

The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST hand written digit dataset and by learning several Atari 2600 games sequentially.

Captured tweets and retweets: 1

Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts

Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem

Automatic image synthesis research has been rapidly growing with deep networks getting more and more expressive. In the last couple of years, we have observed images of digits, indoor scenes, birds, chairs, etc. being automatically generated. The expressive power of image generators have also been enhanced by introducing several forms of conditioning variables such as object names, sentences, bounding box and key-point locations. In this work, we propose a novel deep conditional generative adversarial network architecture that takes its strength from the semantic layout and scene attributes integrated as conditioning variables. We show that our architecture is able to generate realistic outdoor scene images under different conditions, e.g. day-night, sunny-foggy, with clear object boundaries.

Captured tweets and retweets: 23

Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution

Il Jun Ahn, Woo Hyun Nam

Recently, various deep-neural-network (DNN)-based approaches have been proposed for single-image super-resolution (SISR). Despite their promising results on major structure regions such as edges and lines, they still suffer from limited performance on texture regions that consist of very complex and fine patterns. This is because, during the acquisition of a low-resolution (LR) image via down-sampling, these regions lose most of the high frequency information necessary to represent the texture details. In this paper, we present a novel texture enhancement framework for SISR to effectively improve the spatial resolution in the texture regions as well as edges and lines. We call our method, high-resolution (HR) style transfer algorithm. Our framework consists of three steps: (i) generate an initial HR image from an interpolated LR image via an SISR algorithm, (ii) generate an HR style image from the initial HR image via down-scaling and tiling, and (iii) combine the HR style image with the initial HR image via a customized style transfer algorithm. Here, the HR style image is obtained by down-scaling the initial HR image and then repetitively tiling it into an image of the same size as the HR image. This down-scaling and tiling process comes from the idea that texture regions are often composed of small regions that similar in appearance albeit sometimes different in scale. This process creates an HR style image that is rich in details, which can be used to restore high-frequency texture details back into the initial HR image via the style transfer algorithm. Experimental results on a number of texture datasets show that our proposed HR style transfer algorithm provides more visually pleasing results compared with competitive methods.

Captured tweets and retweets: 25

1 2 7 8 9 10 11 12 13 28 29