The fisherman caught a quantum fish. "Fisherman, please let me go", begged the fish, "and I will grant you three wishes". The fisherman agreed. The fish gave the fisherman a quantum computer, three quantum signing tokens and his classical public key. The fish explained: "to sign your three wishes, use the tokenized signature scheme on this quantum computer, then show your valid signature to the king, who owes me a favor". The fisherman used one of the signing tokens to sign the document "give me a castle!" and rushed to the palace. The king executed the classical verification algorithm using the fish's public key, and since it was valid, the king complied. The fisherman's wife wanted to sign ten wishes using their two remaining signing tokens. The fisherman did not want to cheat, and secretly sailed to meet the fish. "Fish, my wife wants to sign ten more wishes". But the fish was not worried: "I have learned quantum cryptography following the previous story (The Fisherman and His Wife by the brothers Grimm). The quantum tokens are consumed during the signing. Your polynomial wife cannot even sign four wishes using the three signing tokens I gave you". "How does it work?" wondered the fisherman. "Have you heard of quantum money? These are quantum states which can be easily verified but are hard to copy. This tokenized quantum signature scheme extends Aaronson and Christiano's quantum money scheme, which is why the signing tokens cannot be copied". "Does your scheme have additional fancy properties?" the fisherman asked. "Yes, the scheme has other security guarantees: revocability, testability and everlasting security. Furthermore, If you're at the sea and your quantum phone has only classical reception, you can use this scheme to transfer the value of the quantum money to shore", said the fish, and swam his way.
Captured tweets and retweets: 3
No Free Lunch theorems show that the average performance across any closed-under-permutation set of problems is fixed for all algorithms, under appropriate conditions. Extending these results, we demonstrate that the proportion of favorable problems is itself strictly bounded, such that no single algorithm can perform well over a large fraction of possible problems. Our results explain why we must either continue to develop new learning methods year after year or move towards highly parameterized models that are both flexible and sensitive to their hyperparameters.
Captured tweets and retweets: 1
Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly-available, pre-trained sets of word embeddings, but they contain few or no emoji representations even as emoji usage in social media has increased. In this paper we release emoji2vec, pre-trained embeddings for all Unicode emoji which are learned from their description in the Unicode emoji standard. The resulting emoji embeddings can be readily used in downstream social natural language processing applications alongside word2vec. We demonstrate, for the downstream task of sentiment analysis, that emoji embeddings learned from short descriptions outperforms a skip-gram model trained on a large collection of tweets, while avoiding the need for contexts in which emoji need to appear frequently in order to estimate a representation.
Captured tweets and retweets: 10
This work explores hypernetworks: an approach of using a small network, also known as a hypernetwork, to generate the weights for a larger network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype - the hypernetwork - and a phenotype - the main network. Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernetworks can be viewed as relaxed form of weight-sharing across layers. Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve state-of-art results on a variety of language modeling tasks with Character-Level Penn Treebank and Hutter Prize Wikipedia datasets, challenging the weight-sharing paradigm for recurrent networks. Our results also show that hypernetworks applied to convolutional networks still achieve respectable results for image recognition tasks compared to state-of-the-art baseline models while requiring fewer learnable parameters.
Captured tweets and retweets: 6
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units ("wordpieces") for both input and output. This method provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60% compared to Google's phrase-based production system.
Captured tweets and retweets: 13
This paper introduces multiplicative LSTM, a novel hybrid recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. Multiplicative LSTM is motivated by its flexibility to have very different recurrent transition functions for each possible input, which we argue helps make it more expressive in autoregressive density estimation. We show empirically that multiplicative LSTM outperforms standard LSTM and deep variants for a range of character level modelling tasks. We also found that this improvement increases as the complexity of the task scales up. This model achieves a validation error of 1.20 bits/character on the Hutter prize dataset when combined with dynamic evaluation.
Captured tweets and retweets: 2
Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state of the art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora we also introduce the freely available WikiText corpus.
Captured tweets and retweets: 130
In this work we explore deep generative models of text in which the latent representation of a document is itself drawn from a discrete language model distribution. We formulate a variational auto-encoder for inference in this model and apply it to the task of compressing sentences. In this application the generative model first draws a latent summary sentence from a background language model, and then subsequently draws the observed sentence conditioned on this latent summary. In our empirical evaluation we show that generative formulations of both abstractive and extractive compression yield state-of-the-art results when trained on a large amount of supervised data. Further, we explore semi-supervised compression scenarios where we show that it is possible to achieve performance competitive with previously proposed supervised models while training on a fraction of the supervised data.
Captured tweets and retweets: 1
What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the "observed" space) and outputs (the "decision" space), we introduce the notion of a construct space: a space that captures unobservable, but meaningful variables for the prediction. We show that in order to prove desirable properties of the entire decision-making process, different mechanisms for fairness require different assumptions about the nature of the mapping from construct space to decision space. The results in this paper imply that future treatments of algorithmic fairness should more explicitly state assumptions about the relationship between constructs and observations.
Captured tweets and retweets: 3
We present the Neural Photo Editor, an interface for exploring the latent space of generative image models and making large, semantically coherent changes to existing images. Our interface is powered by the Introspective Adversarial Network, a hybridization of the Generative Adversarial Network and the Variational Autoencoder designed for use in the editor. Our model makes use of a novel computational block based on dilated convolutions, and Orthogonal Regularization, a novel weight regularization method. We validate our model on CelebA, SVHN, and ImageNet, and produce samples and reconstructions with high visual fidelity.
Captured tweets and retweets: 104
Reinforcement learning (RL) can automate a wide variety of robotic skills, but learning each new skill requires considerable real-world data collection and manual representation engineering to design policy classes or features. Using deep reinforcement learning to train general purpose neural network policies alleviates some of the burden of manual representation engineering by using expressive policy classes, but exacerbates the challenge of data collection, since such methods tend to be less efficient than RL with low-dimensional, hand-designed representations. Transfer learning can mitigate this problem by enabling us to transfer information from one skill to another and even from one robot to another. We show that neural network policies can be decomposed into "task-specific" and "robot-specific" modules, where the task-specific modules are shared across robots, and the robot-specific modules are shared across all tasks on that robot. This allows for sharing task information, such as perception, between robots and sharing robot information, such as dynamics and kinematics, between tasks. We exploit this decomposition to train mix-and-match modules that can solve new robot-task combinations that were not seen during training. Using a novel neural network architecture, we demonstrate the effectiveness of our transfer method for enabling zero-shot generalization with a variety of robots and tasks in simulation for both visual and non-visual tasks.
Captured tweets and retweets: 1
In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, What is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much depth and clarity as we would have liked in the space allowance. To better answer these questions in this note, we first discuss the relationships between the deconvolution layer in the forms of the transposed convolution layer, the sub-pixel convolutional layer and our efficient sub-pixel convolutional layer. We will refer to our efficient sub-pixel convolutional layer as a convolutional layer in LR space to distinguish it from the common sub-pixel convolutional layer. We will then show that for a fixed computational budget and complexity, a network with convolutions exclusively in LR space has more representation power at the same speed than a network that first upsamples the input in high resolution space.
Captured tweets and retweets: 2
We explore architectures for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though computationally efficient, we point out that such approaches are not statistically efficient during learning precisely because spatial redundancy limits the information learned from neighboring pixels. We demonstrate that (1) stratified sampling allows us to add diversity during batch updates and (2) sampled multi-scale features allow us to explore more nonlinear predictors (multiple fully-connected layers followed by ReLU) that improve overall accuracy. Finally, our objective is to show how a architecture can get performance better than (or comparable to) the architectures designed for a particular task. Interestingly, our single architecture produces state-of-the-art results for semantic segmentation on PASCAL-Context, surface normal estimation on NYUDv2 dataset, and edge detection on BSDS without contextual post-processing.
Captured tweets and retweets: 20
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. Finally, given the recent surge of interest in this task, a competition was organized in 2015 using the newly released COCO dataset. We describe and analyze the various improvements we applied to our own baseline and show the resulting performance in the competition, which we won ex-aequo with a team from Microsoft Research, and provide an open source implementation in TensorFlow.
Captured tweets and retweets: 1
Reasoning and inference are central to human and artificial intelligence. Modeling inference in human language is notoriously challenging but is fundamental to natural language understanding and many applications. With the availability of large annotated data, neural network models have recently advanced the field significantly. In this paper, we present a new state-of-the-art result, achieving the accuracy of 88.3% on the standard benchmark, the Stanford Natural Language Inference dataset. This result is achieved first through our enhanced sequential encoding model, which outperforms the previous best model that employs more complicated network architectures, suggesting that the potential of sequential LSTM-based models have not been fully explored yet in previous work. We further show that by explicitly considering recursive architectures, we achieve additional improvement. Particularly, incorporating syntactic parse information contributes to our best result; it improves the performance even when the parse information is added to an already very strong system.
Captured tweets and retweets: 4
Recent discussion in the public sphere about algorithmic classification has involved tension between competing notions of what it means for a probabilistic classification to be fair to different groups. We formalize three fairness conditions that lie at the heart of these debates, and we prove that except in highly constrained special cases, there is no method that can satisfy these three conditions simultaneously. Moreover, even satisfying all three conditions approximately requires that the data lie in an approximate version of one of the constrained special cases identified by our theorem. These results suggest some of the ways in which key notions of fairness are incompatible with each other, and hence provide a framework for thinking about the trade-offs between them.
Captured tweets and retweets: 3
In this article, we take one step toward understanding the learning behavior of deep residual networks, and supporting the hypothesis that deep residual networks are exponential ensembles by construction. We examine the effective range of ensembles by introducing multi-residual networks that significantly improve classification accuracy of residual networks. The multi-residual networks increase the number of residual functions in the residual blocks. This is shown to improve the accuracy of the residual network when the network is deeper than a threshold. Based on a series of empirical studies on CIFAR-10 and CIFAR-100 datasets, the proposed multi-residual network yield $6\%$ and $10\%$ improvement with respect to the residual networks with identity mappings. Comparing with other state-of-the-art models, the proposed multi-residual network obtains a test error rate of $3.92\%$ on CIFAR-10 that outperforms all existing models.
Captured tweets and retweets: 15
Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, that involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as humans in deathmatch scenarios.
Captured tweets and retweets: 4
As a new way of training generative models, Generative Adversarial Nets (GAN) that uses a discriminative model to guide the training of the generative model has enjoyed considerable success in generating real-valued data. However, it has limitations when the goal is for generating sequences of discrete tokens. A major reason lies in that the discrete outputs from the generative model make it difficult to pass the gradient update from the discriminative model to the generative model. Also, the discriminative model can only assess a complete sequence, while for a partially generated sequence, it is non-trivial to balance its current score and the future one once the entire sequence has been generated. In this paper, we propose a sequence generation framework, called SeqGAN, to solve the problems. Modeling the data generator as a stochastic policy in reinforcement learning (RL), SeqGAN bypasses the generator differentiation problem by directly performing gradient policy update. The RL reward signal comes from the GAN discriminator judged on a complete sequence, and is passed back to the intermediate state-action steps using Monte Carlo search. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over strong baselines.
Captured tweets and retweets: 11
We prove that gradient descent efficiently converges to the global optimizer of the maximum likelihood objective of an unknown linear time-invariant dynamical system from a sequence of noisy observations generated by the system. Even though the objective function is non-convex, we provide polynomial running time and sample complexity bounds under strong but natural assumptions. Linear systems identification has been studied for many decades, yet, to the best of our knowledge, these are the first polynomial guarantees for the problem we consider.
Captured tweets and retweets: 12