Trending arXiv

Note: this version is tailored to @Smerity - though you can run your own! Trending arXiv may eventually be extended to multiple users ...

Papers


The challenge of realistic music generation: modelling raw audio at scale

Sander Dieleman, Aäron van den Oord, Karen Simonyan

Realistic music generation is a challenging task. When building generative models of music that are learnt from data, typically high-level representations such as scores or MIDI are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so in this work we embark on modelling music in the raw audio domain. It has been shown that autoregressive models excel at generating raw audio waveforms of speech, but when applied to music, we find them biased towards capturing local signal structure at the expense of modelling long-range correlations. This is problematic because music exhibits structure at many different timescales. In this work, we explore autoregressive discrete autoencoders (ADAs) as a means to enable autoregressive models to capture long-range correlations in waveforms. We find that they allow us to unconditionally generate piano music directly in the raw audio domain, which shows stylistic consistency across tens of seconds.

Captured tweets and retweets: 1


The Physics of baking good Pizza

Andrey Varlamov, Andreas Glatz, Sergio Grasso

Physical principles are involved in almost any aspect of cooking. Here we analyze the specific process of baking pizzas, deriving in simple terms the baking times for two different situations: For a brick oven in a pizzeria and a modern metallic oven at home. Our study is based on basic thermodynamic principles relevant to the cooking process and is accessible to undergraduate students. We start with a historical overview of the development and art of pizza baking, illustrate the underlying physics by some simple common examples, and then apply them in detail to the example of baking pizza.

Captured tweets and retweets: 2


StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing

Pengcheng Yin, Chunting Zhou, Junxian He, Graham Neubig

Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures. Annotating NL utterances with their corresponding MRs is expensive and time-consuming, and thus the limited availability of labeled data often becomes the bottleneck of data-driven, supervised models. We introduce StructVAE, a variational auto-encoding model for semi-supervised semantic parsing, which learns both from limited amounts of parallel data and from readily available unlabeled NL utterances. StructVAE models latent MRs not observed in the unlabeled data as tree-structured latent variables. Experiments on semantic parsing on the ATIS domain and Python code generation show that with extra unlabeled data, StructVAE outperforms strong supervised models.

Captured tweets and retweets: 2


A Probabilistic U-Net for Segmentation of Ambiguous Images

Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Ali Eslami, Danilo Jimenez Rezende, Olaf Ronneberger

Many real-world vision problems suffer from inherent ambiguities. In clinical applications for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end we propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses. We show on a lung abnormalities segmentation task and on a Cityscapes segmentation task that our model reproduces the possible segmentation variants as well as the frequencies with which they occur, doing so significantly better than published approaches. These models could have a high impact in real-world applications, such as being used as clinical decision-making algorithms accounting for multiple plausible semantic segmentation hypotheses to provide possible diagnoses and recommend further actions to resolve the present ambiguities.

Captured tweets and retweets: 2


Opportunities in Machine Learning for Healthcare

Marzyeh Ghassemi, Tristan Naumann, Peter Schulam, Andrew L. Beam, Rajesh Ranganath

Healthcare is a natural arena for the application of machine learning, especially as modern electronic health records (EHRs) provide increasingly large amounts of data to answer clinically meaningful questions. However, clinical data and practice present unique challenges that complicate the use of common methodologies. This article serves as a primer on addressing these challenges and highlights opportunities for members of the machine learning and data science communities to contribute to this growing domain.

Captured tweets and retweets: 1


Learning a Latent Space of Multitrack Measures

Ian Simon, Adam Roberts, Colin Raffel, Jesse Engel, Curtis Hawthorne, Douglas Eck

Discovering and exploring the underlying structure of multi-instrumental music using learning-based approaches remains an open problem. We extend the recent MusicVAE model to represent multitrack polyphonic measures as vectors in a latent space. Our approach enables several useful operations such as generating plausible measures from scratch, interpolating between measures in a musically meaningful way, and manipulating specific musical attributes. We also introduce chord conditioning, which allows all of these operations to be performed while keeping harmony fixed, and allows chords to be changed while maintaining musical "style". By generating a sequence of measures over a predefined chord progression, our model can produce music with convincing long-term structure. We demonstrate that our latent space model makes it possible to intuitively control and generate musical sequences with rich instrumentation (see https://goo.gl/s2N7dV for generated audio).

Captured tweets and retweets: 2


Variational Measure Preserving Flows

Yichuan Zhang, Jose Miguel Hernandez-Lobato, Zoubin Ghahramani

Probabilistic modelling is a general and elegant framework to capture the uncertainty, ambiguity and diversity of hidden structures in data. Probabilistic inference is the key operation on probabilistic models to obtain the distribution over the latent representations given data. Unfortunately, the computation of inference on complex models is extremely challenging. In spite of the success of existing inference methods, like Markov chain Monte Carlo (MCMC) and variational inference (VI), many powerful models are not available for large scale problems because inference is simply computationally intractable. The recent advances in using neural networks for probabilistic inference have shown promising results on this challenge. In this work, we propose a novel general inference framework that combines the strengths of both MCMC and VI. The proposed method is not only computationally scalable and efficient, but is also rooted in the ergodicity theorem, which provides a guarantee of better performance with more computational power. Our experimental results suggest that our method can outperform state-of-the-art methods on generative models and Bayesian neural networks on some popular benchmark problems.

Captured tweets and retweets: 2


Model Selection in Time Series Analysis: Using Information Criteria as an Alternative to Hypothesis Testing

R. Scott Hacker, Abdulnasser Hatemi-J

The issue of model selection in applied research is of vital importance. Since the true model in such research is not known, which model should be used from among various potential ones is an empirical question. There might exist several competitive models. A typical approach to dealing with this is classic hypothesis testing using an arbitrarily chosen significance level based on the underlying assumption that a true null hypothesis exists. In this paper we investigate how successful this approach is in determining the correct model for different data generating processes using time series data. An alternative approach based on more formal model selection techniques using an information criterion or cross-validation is suggested and evaluated in the time series environment via Monte Carlo experiments. This paper also explores the effectiveness of deciding what type of general relation exists between two variables (e.g. relation in levels or relation in first differences) using various strategies based on hypothesis testing and on information criteria with the presence or absence of unit roots.

Captured tweets and retweets: 2
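
As a rough illustration of the information-criterion approach mentioned in the abstract above, the sketch below ranks candidate models by AIC and BIC. The abstract does not say which criterion the authors use, so AIC and BIC are assumptions chosen because they are the most common ones. The model names, log-likelihoods, parameter counts, and sample size are hypothetical numbers made up for the example.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fitted models: name -> (maximised log-likelihood, parameter count).
candidates = {
    "levels": (-120.0, 3),
    "first_differences": (-118.5, 5),
}
n_obs = 100  # hypothetical number of observations

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}

# Pick the model with the smallest criterion value.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
print(best_by_aic, best_by_bic)
```

Unlike a hypothesis test, this selection needs no significance level: the penalty on the parameter count does the work of deciding whether the extra fit is worth the added complexity.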


Implicit Reparameterization Gradients

Michael Figurnov, Shakir Mohamed, Andriy Mnih

By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distributions. We introduce an alternative approach to computing reparameterization gradients based on implicit differentiation and demonstrate its broader applicability by applying it to Gamma, Beta, Dirichlet, and von Mises distributions, which cannot be used with the classic reparameterization trick. Our experiments show that the proposed approach is faster and more accurate than the existing gradient estimators for these distributions.

Captured tweets and retweets: 2
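
The idea of differentiating a sample through its CDF, as the abstract above describes, can be checked on a case where the answer is already known. The sketch below assumes the standard implicit-differentiation identity dz/dθ = -(∂F/∂θ) / q(z), where F is the CDF and q the density, and applies it to a Gaussian. The Gaussian is used only because the classic trick z = mu + sigma * eps gives the exact gradients (1 and (z - mu) / sigma) to compare against; the distributions the paper targets are not implemented here, and the finite-difference CDF derivative is a stand-in for whatever the authors actually compute.

```python
import math


def normal_cdf(z, mu, sigma):
    """CDF F(z | mu, sigma) of a Gaussian."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(z, mu, sigma):
    """Density q(z | mu, sigma) of a Gaussian."""
    u = (z - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


def implicit_sample_gradients(z, mu, sigma, h=1e-5):
    """Return (dz/dmu, dz/dsigma) via dz/dtheta = -(dF/dtheta) / q(z).

    The parameter derivatives of the CDF are approximated with central
    finite differences, purely to keep the sketch short.
    """
    dF_dmu = (normal_cdf(z, mu + h, sigma) - normal_cdf(z, mu - h, sigma)) / (2 * h)
    dF_dsigma = (normal_cdf(z, mu, sigma + h) - normal_cdf(z, mu, sigma - h)) / (2 * h)
    q = normal_pdf(z, mu, sigma)
    return -dF_dmu / q, -dF_dsigma / q


# Compare against the explicit reparameterization z = mu + sigma * eps.
z, mu, sigma = 1.3, 0.5, 2.0
dz_dmu, dz_dsigma = implicit_sample_gradients(z, mu, sigma)
print(dz_dmu, dz_dsigma)  # should be close to 1 and (z - mu) / sigma
```

The point of the identity is that it only needs the CDF and the density, not an invertible sampling transform, which is why it can extend to distributions where the explicit trick is unavailable.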


Learning Permutations with Sinkhorn Policy Gradient

Patrick Emami, Sanjay Ranka

Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective function that data-driven algorithms can use as a learning signal. In this paper, we propose the Sinkhorn Policy Gradient (SPG) algorithm for learning policies on permutation matrices. The actor-critic neural network architecture we introduce for SPG uniquely decouples representation learning of the state space from the highly-structured action space of permutations with a temperature-controlled Sinkhorn layer. The Sinkhorn layer produces continuous relaxations of permutation matrices so that the actor-critic architecture can be trained end-to-end. Our empirical results show that agents trained with SPG can perform competitively on sorting, the Euclidean TSP, and matching tasks. We also observe that SPG is significantly more data efficient at the matching task than the baseline methods, which indicates that SPG is conducive to learning representations that are useful for reasoning about permutations.

Captured tweets and retweets: 2
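
The temperature-controlled Sinkhorn layer mentioned in the abstract above can be sketched with the textbook Sinkhorn iteration: exponentiate the scores divided by a temperature, then alternately normalise rows and columns. This is a minimal pure-Python sketch under that assumption, not the authors' layer; the function name, the fixed iteration count, and the example scores are hypothetical.

```python
import math


def sinkhorn(scores, temperature=1.0, n_iters=100):
    """Map a square score matrix to an (approximately) doubly stochastic matrix.

    Lower temperatures push the result closer to a hard permutation matrix.
    """
    # Positive kernel: larger scores get larger weights.
    k = [[math.exp(s / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Normalise each row to sum to 1.
        normalised_rows = []
        for row in k:
            row_sum = sum(row)
            normalised_rows.append([x / row_sum for x in row])
        # Normalise each column to sum to 1.
        col_sums = [sum(col) for col in zip(*normalised_rows)]
        k = [[x / c for x, c in zip(row, col_sums)] for row in normalised_rows]
    return k


# Hypothetical 3 x 3 scores, e.g. how well item i fits position j.
scores = [[2.0, 0.5, 0.1],
          [0.3, 1.5, 0.2],
          [0.1, 0.4, 1.0]]
relaxed = sinkhorn(scores, temperature=1.0)
for row in relaxed:
    print([round(x, 3) for x in row])
```

Because every step is a smooth operation on the scores, gradients can flow through the relaxed matrix, which is what allows an actor-critic setup of the kind the abstract describes to be trained end-to-end.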


Neural Factor Graph Models for Cross-lingual Morphological Tagging

Chaitanya Malaviya, Matthew R. Gormley, Graham Neubig

Morphological analysis involves predicting the syntactic traits of a word (e.g. {POS: Noun, Case: Acc, Gender: Fem}). Previous work in morphological tagging improves performance for low-resource languages (LRLs) through cross-lingual training with a high-resource language (HRL) from the same family, but is limited by the strict, often false, assumption that tag sets exactly overlap between the HRL and LRL. In this paper we propose a method for cross-lingual morphological tagging that aims to improve information sharing between languages by relaxing this assumption. The proposed model uses factorial conditional random fields with neural network potentials, making it possible to (1) utilize the expressive power of neural network representations to smooth over superficial differences in the surface forms, (2) model pairwise and transitive relationships between tags, and (3) accurately generate tag sets that are unseen or rare in the training data. Experiments on four languages from the Universal Dependencies Treebank demonstrate superior tagging accuracies over existing cross-lingual approaches.

Captured tweets and retweets: 2


Learning to See in the Dark

Chen Chen, Qifeng Chen, Jia Xu, Vladlen Koltun

Imaging in low light is challenging due to low photon count and low SNR. Short-exposure images suffer from noise, while long exposure can induce blur and is often impractical. A variety of denoising, deblurring, and enhancement techniques have been proposed, but their effectiveness is limited in extreme conditions, such as video-rate imaging at night. To support the development of learning-based pipelines for low-light image processing, we introduce a dataset of raw short-exposure low-light images, with corresponding long-exposure reference images. Using the presented dataset, we develop a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network. The network operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data. We report promising results on the new dataset, analyze factors that affect performance, and highlight opportunities for future work. The results are shown in the supplementary video at https://youtu.be/qWKUFK7MWvg

Captured tweets and retweets: 4


AGI Safety Literature Review

Tom Everitt, Gary Lea, Marcus Hutter

The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.

Captured tweets and retweets: 2


Generative Temporal Models with Spatial Memory for Partially Observed Environments

Marco Fraccaro, Danilo Jimenez Rezende, Yori Zwols, Alexander Pritzel, S. M. Ali Eslami, Fabio Viola

In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.

Captured tweets and retweets: 2


Phrase-Based & Neural Unsupervised Machine Translation

Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of bitexts, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage automatic generation of parallel data by backtranslating with a backward model operating in the other direction, and the denoising effect of a language model trained on the target side. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT14 English-French and WMT16 German-English benchmarks, our models respectively obtain 27.1 and 23.6 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points.

Captured tweets and retweets: 4


EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models

Hyrum S. Anderson, Phil Roth

This paper describes EMBER: a labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files. The dataset includes features extracted from 1.1M binary files: 900K training samples (300K malicious, 300K benign, 300K unlabeled) and 200K test samples (100K malicious, 100K benign). To accompany the dataset, we also release open source code for extracting features from additional binaries so that additional sample features can be appended to the dataset. This dataset fills a void in the information security machine learning community: a benign/malicious dataset that is large, open and general enough to cover several interesting use cases. We enumerate several use cases that we considered when structuring the dataset. Additionally, we demonstrate one use case wherein we compare a baseline gradient boosted decision tree model trained using LightGBM with default settings to MalConv, a recently published end-to-end (featureless) deep learning model for malware detection. Results show that even without hyper-parameter optimization, the baseline EMBER model outperforms MalConv. The authors hope that the dataset, code and baseline model provided by EMBER will help invigorate machine learning research for malware detection, in much the same way that benchmark datasets have advanced computer vision research.

Captured tweets and retweets: 1


Training a Ranking Function for Open-Domain Question Answering

Phu Mon Htut, Samuel R. Bowman, Kyunghyun Cho

In recent years, there have been amazing advances in deep learning methods for machine reading. In machine reading, the machine reader has to extract the answer from the given ground truth paragraph. Recently, state-of-the-art machine reading models have achieved human-level performance on SQuAD, a reading comprehension-style question answering (QA) task. The success of machine reading has inspired researchers to combine information retrieval with machine reading to tackle open-domain QA. However, these systems perform poorly compared to reading comprehension-style QA because it is difficult to retrieve the pieces of paragraphs that contain the answer to the question. In this study, we propose two neural network rankers that assign scores to different passages based on their likelihood of containing the answer to a given question. Additionally, we analyze the relative importance of semantic similarity and word-level relevance matching in open-domain QA.

Captured tweets and retweets: 2


Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Noam Shazeer, Mitchell Stern

In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter second-moment estimators requires memory equal to the number of parameters. For the case of neural network weight matrices, we propose maintaining only the per-row and per-column sums of these moving averages, and estimating the per-parameter second moments based on these sums. We demonstrate empirically that this method produces similar results to the baseline. Secondly, we show that adaptive methods can produce larger-than-desired updates when the decay rate of the second moment accumulator is too slow. We propose update clipping and a gradually increasing decay rate scheme as remedies. Combining these methods and dropping momentum, we achieve comparable results to the published Adam regime in training the Transformer model on the WMT 2014 English-German machine translation task, while using very little auxiliary storage in the optimizer. Finally, we propose scaling the parameter updates based on the scale of the parameters themselves.

Captured tweets and retweets: 2
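
The memory saving described in the Adafactor abstract above can be illustrated with a small sketch: store only the row sums and column sums of a second-moment matrix, then rebuild a per-parameter estimate from them. The estimator used here (row sum times column sum, divided by the grand total) is an assumption about how the sums are combined, and the exponential moving averages, update clipping, and decay schedule from the abstract are left out. The function names and the example matrix are hypothetical.

```python
def row_and_column_sums(second_moments):
    """Return the per-row and per-column sums of a non-negative matrix."""
    row_sums = [sum(row) for row in second_moments]
    col_sums = [sum(col) for col in zip(*second_moments)]
    return row_sums, col_sums


def factored_estimate(row_sums, col_sums):
    """Rebuild an n x m estimate from only n + m stored numbers."""
    total = sum(row_sums)  # same value as sum(col_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


# Hypothetical 2 x 3 matrix of averaged squared gradients (rank 1 on purpose,
# so the factored estimate recovers it exactly).
v = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0]]
rows, cols = row_and_column_sums(v)
v_hat = factored_estimate(rows, cols)
print(rows, cols)
print(v_hat)
```

For an n x m weight matrix this keeps n + m numbers instead of n * m. The reconstruction is exact only when the matrix is rank 1; in general it is an approximation.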


Associative Compression Networks for Representation Learning

Alex Graves, Jacob Menick, Aaron van den Oord

This paper introduces Associative Compression Networks (ACNs), a new framework for variational autoencoding with neural networks. The system differs from existing variational autoencoders (VAEs) in that the prior distribution used to model each code is conditioned on a similar code from the dataset. In compression terms this equates to sequentially transmitting the dataset using an ordering determined by proximity in latent space. Since the prior need only account for local, rather than global variations in the latent space, the coding cost is greatly reduced, leading to rich, informative codes. Crucially, the codes remain informative when powerful, autoregressive decoders are used, which we argue is fundamentally difficult with normal VAEs. Experimental results on MNIST, CIFAR-10, ImageNet and CelebA show that ACNs discover high-level latent features such as object class, writing style, pose and facial expression, which can be used to cluster and classify the data, as well as to generate diverse and convincing samples. We conclude that ACNs are a promising new direction for representation learning: one that steps away from IID modelling, and towards learning a structured description of the dataset as a whole.

Captured tweets and retweets: 25


Copula Variational Bayes inference via information geometry

Viet Hung Tran

Variational Bayes (VB), also known as independent mean-field approximation, has become a popular method for Bayesian network inference in recent years. Its applications are vast, including neural networks, compressed sensing, and clustering, to name just a few. In this paper, the independence constraint in VB will be relaxed to a conditional constraint class, called copula in statistics. Since a joint probability distribution always belongs to a copula class, the novel copula VB (CVB) approximation is a generalized form of VB. Via information geometry, we will see that the CVB algorithm iteratively projects the original joint distribution to a copula constraint space until it reaches a local minimum Kullback-Leibler (KL) divergence. In this way, all mean-field approximations, e.g. iterative VB, Expectation-Maximization (EM), Iterated Conditional Mode (ICM) and k-means algorithms, are special cases of CVB approximation. For a generic Bayesian network, an augmented hierarchy form of CVB will also be designed. While mean-field algorithms can only return a locally optimal approximation for a correlated network, the augmented CVB network, which is an optimally weighted average of a mixture of simpler network structures, can potentially achieve the globally optimal approximation for the first time. Via simulations of Gaussian mixture clustering, the classification accuracy of CVB will be shown to be far superior to that of state-of-the-art VB, EM and k-means algorithms.

Captured tweets and retweets: 2

