# Trending arXiv

Note: this version is tailored to @Smerity - though you can run your own! Trending arXiv may eventually be extended to multiple users ...

### Papers

1 2 3 4 5 26 27

#### Not All Ops Are Created Equal!

Liangzhen Lai, Naveen Suda, Vikas Chandra

Efficient and compact neural network models are essential for enabling the deployment on mobile and embedded devices. In this work, we point out that typical design metrics for gauging the efficiency of neural network architectures -- total number of operations and parameters -- are not sufficient. These metrics may not accurately correlate with the actual deployment metrics such as energy and memory footprint. We show that throughput and energy varies by up to 5X across different neural network operation types on an off-the-shelf Arm Cortex-M7 microcontroller. Furthermore, we show that the memory required for activation data also need to be considered, apart from the model parameters, for network architecture exploration studies.

Captured tweets and retweets: 2

#### MINE: Mutual Information Neural Estimation

Ishmael Belghazi, Sai Rajeswar, Aristide Baratin, R Devon Hjelm, Aaron Courville

We argue that the estimation of the mutual information between high dimensional continuous random variables is achievable by gradient descent over neural networks. This paper presents a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size. MINE is back-propable and we prove that it is strongly consistent. We illustrate a handful of applications in which MINE is succesfully applied to enhance the property of generative models in both unsupervised and supervised settings. We apply our framework to estimate the information bottleneck, and apply it in tasks related to supervised classification problems. Our results demonstrate substantial added flexibility and improvement in these settings.

Captured tweets and retweets: 2

#### Neural Program Synthesis with Priority Queue Training

Daniel A. Abolafia, Mohammad Norouzi, Quoc V. Le

We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by sampling from the RNN. We benchmark our algorithm, called priority queue training (or PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human readable programs.

Captured tweets and retweets: 2

#### Modeling urbanization patterns with generative adversarial networks

Adrian Albert, Emanuele Strano, Jasleen Kaur, Marta Gonzalez

In this study we propose a new method to simulate hyper-realistic urban patterns using Generative Adversarial Networks trained with a global urban land-use inventory. We generated a synthetic urban "universe" that qualitatively reproduces the complex spatial organization observed in global urban patterns, while being able to quantitatively recover certain key high-level urban spatial metrics.

Captured tweets and retweets: 1

#### SenseNet: 3D Objects Database and Tactile Simulator

Jason Toy

The majority of artificial intelligence research, as it relates from which to biological senses has been focused on vision. The recent explosion of machine learning and in particular, dee p learning, can be partially attributed to the release of high quality data sets for algorithm s from which to model the world on. Thus, most of these datasets are comprised of images. We believe that focusing on sensorimotor systems and tactile feedback will create algorithms that better mimic human intelligence. Here we present SenseNet: a collection of tactile simulators and a large scale dataset of 3D objects for manipulation. SenseNet was created for the purpose of researching and training Artificial Intelligences (AIs) to interact with the environment via sensorimotor neural systems and tactile feedback. We aim to accelerate that same explosion in image processing, but for the domain of tactile feedback and sensorimotor research. We hope that SenseNet can offer researchers in both the machine learning and computational neuroscience communities brand new opportunities and avenues to explore.

Captured tweets and retweets: 2

#### Recent Advances in Recurrent Neural Networks

Hojjat Salehinejad, Julianne Baarbe, Sharan Sankar, Joseph Barfett, Errol Colak, Shahrokh Valaee

Recurrent neural networks (RNNs) are capable of learning features and long term dependencies from sequential and time-series data. The RNNs have a stack of non-linear units where at least one connection between units forms a directed cycle. A well-trained RNN can model any dynamical system; however, training RNNs is mostly plagued by issues in learning long-term dependencies. In this paper, we present a survey on RNNs and several new advances for newcomers and professionals in the field. The fundamentals and recent advances are explained and the research challenges are introduced.

Captured tweets and retweets: 45

#### A Flexible Approach to Automated RNN Architecture Generation

Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher

The process of designing neural architectures requires expert knowledge and extensive trial and error. While automated architecture search may simplify these requirements, the recurrent neural network (RNN) architectures generated by existing methods are limited in both flexibility and components. We propose a domain-specific language (DSL) for use in automated architecture search which can produce novel RNNs of arbitrary depth and width. The DSL is flexible enough to define standard architectures such as the Gated Recurrent Unit and Long Short Term Memory and allows the introduction of non-standard RNN components such as trigonometric curves and layer normalization. Using two different candidate generation techniques, random search with a ranking function and reinforcement learning, we explore the novel architectures produced by the RNN DSL for language modeling and machine translation domains. The resulting architectures do not follow human intuition yet perform well on their targeted tasks, suggesting the space of usable RNN architectures is far larger than previously assumed.

Captured tweets and retweets: 1

#### Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text. The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize timedomain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech. To validate our design choices, we present ablation studies of key components of our system and evaluate the impact of using mel spectrograms as the input to WaveNet instead of linguistic, duration, and F0 features. We further demonstrate that using a compact acoustic intermediate representation enables significant simplification of the WaveNet architecture.

Captured tweets and retweets: 2

#### Peephole: Predicting Network Performance Before Training

Boyang Deng, Junjie Yan, Dahua Lin

The quest for performant networks has been a significant force that drives the advancements of deep learning in recent years. While rewarding, improving network design has never been an easy journey. The large design space combined with the tremendous cost required for network training poses a major obstacle to this endeavor. In this work, we propose a new approach to this problem, namely, predicting the performance of a network before training, based on its architecture. Specifically, we develop a unified way to encode individual layers into vectors and bring them together to form an integrated description via LSTM. Taking advantage of the recurrent network's strong expressive power, this method can reliably predict the performances of various network architectures. Our empirical studies showed that it not only achieved accurate predictions but also produced consistent rankings across datasets -- a key desideratum in performance prediction.

Captured tweets and retweets: 24

#### Progressive Neural Architecture Search

Chenxi Liu, Barret Zoph, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan Yuille, Jonathan Huang, Kevin Murphy

We propose a method for learning CNN structures that is more efficient than previous approaches: instead of using reinforcement learning (RL) or genetic algorithms (GA), we use a sequential model-based optimization (SMBO) strategy, in which we search for architectures in order of increasing complexity, while simultaneously learning a surrogate function to guide the search, similar to A* search. On the CIFAR-10 dataset, our method finds a CNN structure with the same classification accuracy (3.41% error rate) as the RL method of Zoph et al. (2017), but 2 times faster (in terms of number of models evaluated). It also outperforms the GA method of Liu et al. (2017), which finds a model with worse performance (3.63% error rate), and takes 5 times longer. Finally we show that the model we learned on CIFAR also works well at the task of ImageNet classification. In particular, we match the state-of-the-art performance of 82.9% top-1 and 96.1% top-5 accuracy.

Captured tweets and retweets: 2

#### Deep Learning Scaling is Predictable, Empirically

Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou

Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art. This paper presents a large scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, resulting in power-law exponents---the "steepness" of the learning curve---yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications on deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.

Captured tweets and retweets: 8

#### Wavenet based low rate speech coding

W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters

Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.

Captured tweets and retweets: 2

#### Neural Text Generation: A Practical Guide

Ziang Xie

Deep learning methods have recently achieved great empirical success on machine translation, dialogue response generation, summarization, and other text generation tasks. At a high level, the technique has been to train end-to-end neural network models consisting of an encoder model to produce a hidden representation of the source text, followed by a decoder model to generate the target. While such models have significantly fewer pieces than earlier systems, significant tuning is still required to achieve good performance. For text generation models in particular, the decoder can behave in undesired ways, such as by generating truncated or repetitive outputs, outputting bland and generic responses, or in some cases producing ungrammatical gibberish. This paper is intended as a practical guide for resolving such undesired behavior in text generation models, with the aim of helping enable real-world applications.

Captured tweets and retweets: 3

#### Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals

John Alberg, Zachary C. Lipton

On a periodic basis, publicly traded companies are required to report fundamentals: financial data such as revenue, operating income, debt, among others. These data points provide some insight into the financial health of a company. Academic research has identified some factors, i.e. computed features of the reported data, that are known through retrospective analysis to outperform the market average. Two popular factors are the book value normalized by market capitalization (book-to-market) and the operating income normalized by the enterprise value (EBIT/EV). In this paper: we first show through simulation that if we could (clairvoyantly) select stocks using factors calculated on future fundamentals (via oracle), then our portfolios would far outperform a standard factor approach. Motivated by this analysis, we train deep neural networks to forecast future fundamentals based on a trailing 5-years window. Quantitative analysis demonstrates a significant improvement in MSE over a naive strategy. Moreover, in retrospective analysis using an industry-grade stock portfolio simulator (backtester), we show an improvement in compounded annual return to 17.1% (MLP) vs 14.4% for a standard factor model.

Captured tweets and retweets: 2

#### Learning Explanatory Rules from Noisy Data

Richard Evans, Edward Grefenstette

Artificial Neural Networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised. As their size and expressivity increases, so too does the variance of the model, yielding a nearly ubiquitous overfitting problem. Although mitigated by a variety of model regularisation methods, the common cure is to seek large amounts of training data---which is not necessarily easily obtained---that sufficiently approximates the data distribution of the domain we wish to test on. In contrast, logic programming methods such as Inductive Logic Programming offer an extremely data-efficient process by which models can be trained to reason on symbolic domains. However, these methods are unable to deal with the variety of domains neural networks can be applied to: they are not robust to noise in or mislabelling of inputs, and perhaps more importantly, cannot be applied to non-symbolic domains where the data is ambiguous, such as operating on raw pixels. In this paper, we propose a Differentiable Inductive Logic framework ($\partial$ILP), which can not only solve tasks which traditional ILP systems are suited for, but shows a robustness to noise and error in the training data which ILP cannot cope with. Furthermore, as it is trained by backpropagation against a likelihood objective, it can be hybridised by connecting it with neural networks over ambiguous data in order to be applied to domains which ILP cannot address, while providing data efficiency and generalisation beyond what neural networks on their own can achieve.

Captured tweets and retweets: 1

#### Unbounded cache model for online language modeling with open vocabulary

Edouard Grave, Moustapha Cisse, Armand Joulin

Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution. These models only capture the local context, of up to a few thousands tokens. In this paper, we propose an extension of continuous cache models, which can scale to larger contexts. In particular, we use a large scale non-parametric memory component that stores all the hidden activations seen in the past. We leverage recent advances in approximate nearest neighbor search and quantization algorithms to store millions of representations while searching them efficiently. We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models.

Captured tweets and retweets: 2

#### An Information-Theoretic Analysis of Deep Latent-Variable Models

Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy

We present an information-theoretic framework for understanding trade-offs in unsupervised learning of deep latent-variables models using variational inference. This framework emphasizes the need to consider latent-variable models along two dimensions: the ability to reconstruct inputs (distortion) and the communication cost (rate). We derive the optimal frontier of generative models in the two-dimensional rate-distortion plane, and show how the standard evidence lower bound objective is insufficient to select between points along this frontier. However, by performing targeted optimization to learn generative models with different rates, we are able to learn many models that can achieve similar generative performance but make vastly different trade-offs in terms of the usage of the latent variable. Through experiments on MNIST and Omniglot with a variety of architectures, we show how our framework sheds light on many recent proposed extensions to the variational autoencoder family.

Captured tweets and retweets: 1

#### Still not systematic after all these years: On the compositional skills of sequence-to-sequence recurrent networks

Brenden M. Lake, Marco Baroni

Humans can understand and produce new utterances effortlessly, thanks to their systematic compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can generalize well when the differences between training and test commands are small, so that they can apply "mix-and-match" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, supporting the conjecture that lack of systematicity is an important factor explaining why neural networks need very large training sets.

Captured tweets and retweets: 2

#### JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis

Ryosuke Sonobe, Shinnosuke Takamichi, Hiroshi Saruwatari

Thanks to improvements in machine learning techniques including deep learning, a free large-scale speech corpus that can be shared between academic institutions and commercial companies has an important role. However, such a corpus for Japanese speech synthesis does not exist. In this paper, we designed a novel Japanese speech corpus, named the "JSUT corpus," that is aimed at achieving end-to-end speech synthesis. The corpus consists of 10 hours of reading-style speech data and its transcription and covers all of the main pronunciations of daily-use Japanese characters. In this paper, we describe how we designed and analyzed the corpus. The corpus is freely available online.

Captured tweets and retweets: 2

#### Deep Residual Learning for Small-Footprint Keyword Spotting

Raphael Tang, Jimmy Lin

We explore the application of deep residual learning and dilated convolutions to the keyword spotting task, using the recently-released Google Speech Commands Dataset as our benchmark. Our best residual network (ResNet) implementation significantly outperforms Google's previous convolutional neural networks in terms of accuracy. By varying model depth and width, we can achieve compact models that also outperform previous small-footprint variants. To our knowledge, we are the first to examine these approaches for keyword spotting, and our results establish an open-source state-of-the-art reference to support the development of future speech-based interfaces.

Captured tweets and retweets: 1

1 2 3 4 5 26 27