Rank | Cited by | Paper name |
0 | 265 | Large Scale GAN Training for High Fidelity Natural Image Synthesis |
1 | 210 | DARTS: Differentiable Architecture Search |
2 | 135 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding |
3 | 108 | Gradient Descent Provably Optimizes Over-parameterized Neural Networks |
4 | 90 | Robustness May Be at Odds with Accuracy |
5 | 86 | ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness |
6 | 85 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks |
7 | 81 | Diversity is All You Need: Learning Skills without a Reward Function |
8 | 81 | Large-Scale Study of Curiosity-Driven Learning |
9 | 79 | Evaluating Robustness of Neural Networks with Mixed Integer Programming |
10 | 72 | The relativistic discriminator: a key element missing from standard GAN |
11 | 72 | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware |
12 | 68 | FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models |
13 | 66 | Learning deep representations by mutual information estimation and maximization |
14 | 63 | Universal Transformers |
15 | 61 | Meta-Learning with Latent Embedding Optimization |
16 | 60 | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech |
17 | 59 | How Powerful are Graph Neural Networks? |
18 | 53 | Learning a SAT Solver from Single-Bit Supervision |
19 | 50 | Rethinking the Value of Network Pruning |
20 | 50 | Exploration by random network distillation |
21 | 44 | Benchmarking Neural Network Robustness to Common Corruptions and Perturbations |
22 | 44 | SNAS: stochastic neural architecture search |
23 | 37 | Do Deep Generative Models Know What They Don’t Know? |
24 | 36 | What do you learn from context? Probing for sentence structure in contextualized word representations |
25 | 36 | Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet |
26 | 34 | A Universal Music Translation Network |
27 | 34 | Are adversarial examples inevitable? |
28 | 32 | A Variational Inequality Perspective on Generative Adversarial Networks |
29 | 30 | On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization |
30 | 29 | Pay Less Attention with Lightweight and Dynamic Convolutions |
31 | 29 | Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning |
32 | 29 | Towards the first adversarially robust neural network model on MNIST |
33 | 28 | Deep Graph Infomax |
34 | 27 | Analyzing Inverse Problems with Invertible Neural Networks |
35 | 27 | A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks |
36 | 27 | Meta-learning with differentiable closed-form solvers |
37 | 26 | Decoupled Weight Decay Regularization |
38 | 26 | Local SGD Converges Fast and Communicates Little |
39 | 25 | Adversarial Audio Synthesis |
40 | 24 | Wizard of Wikipedia: Knowledge-Powered Conversational Agents |
41 | 24 | Adaptive Gradient Methods with Dynamic Bound of Learning Rate |
42 | 24 | Deep Convolutional Networks as shallow Gaussian Processes |
43 | 23 | Sample Efficient Adaptive Text-to-Speech |
44 | 23 | Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability |
45 | 23 | Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution |
46 | 22 | Adaptive Input Representations for Neural Language Modeling |
47 | 22 | Deep Anomaly Detection with Outlier Exposure |
48 | 22 | Episodic Curiosity through Reachability |
49 | 22 | Differentiable Learning-to-Normalize via Switchable Normalization |
50 | 21 | Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach |
51 | 20 | Hierarchical Generative Modeling for Controllable Speech Synthesis |
52 | 20 | Hyperbolic Attention Networks |
53 | 20 | Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks |
54 | 20 | Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors |
55 | 19 | Lagging Inference Networks and Posterior Collapse in Variational Autoencoders |
56 | 19 | The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision |
57 | 19 | Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset |
58 | 18 | Attentive Neural Processes |
59 | 18 | Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency |
60 | 18 | Slimmable Neural Networks |
61 | 18 | Gradient descent aligns the layers of deep linear networks |
62 | 18 | How to train your MAML |
63 | 18 | Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware |
64 | 18 | The Singular Values of Convolutional Layers |
65 | 18 | Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling |
66 | 17 | Recurrent Experience Replay in Distributed Reinforcement Learning |
67 | 17 | SNIP: Single-shot Network Pruning based on Connection Sensitivity |
68 | 17 | Unsupervised Learning via Meta-Learning |
69 | 17 | code2seq: Generating Sequences from Structured Representations of Code |
70 | 17 | GAN Dissection: Visualizing and Understanding Generative Adversarial Networks |
71 | 16 | Efficient Lifelong Learning with A-GEM |
72 | 16 | Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks |
73 | 15 | FlowQA: Grasping Flow in History for Conversational Machine Comprehension |
74 | 15 | Three Mechanisms of Weight Decay Regularization |
75 | 15 | Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile |
76 | 15 | Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions |
77 | 15 | A Mean Field Theory of Batch Normalization |
78 | 15 | Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL |
79 | 15 | RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space |
80 | 15 | Self-Monitoring Navigation Agent via Auxiliary Progress Estimation |
81 | 15 | Time-Agnostic Prediction: Predicting Predictable Video Frames |
82 | 14 | Attention, Learn to Solve Routing Problems! |
83 | 14 | GamePad: A Learning Environment for Theorem Proving |
84 | 14 | Learning Factorized Multimodal Representations |
85 | 14 | Quaternion Recurrent Neural Networks |
86 | 14 | Fixup Initialization: Residual Learning Without Normalization |
87 | 14 | Meta-Learning Probabilistic Inference for Prediction |
88 | 14 | Graph HyperNetworks for Neural Architecture Search |
89 | 14 | Defensive Quantization: When Efficiency Meets Robustness |
90 | 14 | Adversarial Attacks on Graph Neural Networks via Meta Learning |
91 | 14 | Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer |
92 | 14 | Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach |
93 | 14 | Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes |
94 | 13 | Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs |
95 | 13 | No Training Required: Exploring Random Encoders for Sentence Classification |
96 | 13 | DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder |
97 | 13 | Critical Learning Periods in Deep Networks |
98 | 13 | Excessive Invariance Causes Adversarial Vulnerability |
99 | 13 | On Self Modulation for Generative Adversarial Networks |
100 | 13 | L2-Nonexpansive Neural Networks |
101 | 13 | Recall Traces: Backtracking Models for Efficient Reinforcement Learning |
102 | 13 | BA-Net: Dense Bundle Adjustment Networks |
103 | 12 | Temporal Difference Variational Auto-Encoder |
104 | 12 | Diagnosing and Enhancing VAE Models |
105 | 12 | TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer |
106 | 12 | Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration |
107 | 12 | Identifying and Controlling Important Neurons in Neural Machine Translation |
108 | 12 | Aggregated Momentum: Stability Through Passive Damping |
109 | 12 | The role of over-parametrization in generalization of neural networks |
110 | 12 | There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average |
111 | 12 | Small nonlinearities in activation functions create bad local minima in neural networks |
112 | 12 | Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control |
113 | 12 | Near-Optimal Representation Learning for Hierarchical Reinforcement Learning |
114 | 12 | Improving the Generalization of Adversarial Training with Domain Adaptation |
115 | 12 | Supervised Community Detection with Line Graph Neural Networks |
116 | 12 | Discriminator Rejection Sampling |
117 | 12 | GANSynth: Adversarial Neural Audio Synthesis |
118 | 11 | Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives |
119 | 11 | Sliced Wasserstein Auto-Encoders |
120 | 11 | Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning |
121 | 11 | Unsupervised Hyper-alignment for Multilingual Word Embeddings |
122 | 11 | Riemannian Adaptive Optimization Methods |
123 | 11 | Hindsight policy gradients |
124 | 11 | CEM-RL: Combining evolutionary and gradient-based methods for policy search |
125 | 11 | Emergent Coordination Through Competition |
126 | 11 | Deterministic Variational Inference for Robust Bayesian Neural Networks |
127 | 11 | Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees |
128 | 11 | Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering |
129 | 11 | Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network |
130 | 11 | Approximability of Discriminators Implies Diversity in GANs |
131 | 11 | Generative Code Modeling with Graphs |
132 | 11 | PeerNets: Exploiting Peer Wisdom Against Adversarial Attacks |
133 | 11 | Visual Semantic Navigation using Scene Priors |
134 | 11 | Residual Non-local Attention Networks for Image Restoration |
135 | 11 | Diffusion Scattering Transforms on Graphs |
136 | 10 | Unsupervised Control Through Non-Parametric Discriminative Rewards |
137 | 10 | Functional Variational Bayesian Neural Networks |
138 | 10 | Trellis Networks for Sequence Modeling |
139 | 10 | Poincaré GloVe: Hyperbolic Word Embeddings |
140 | 10 | Towards Understanding Regularization in Batch Normalization |
141 | 10 | Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference |
142 | 10 | Reward Constrained Policy Optimization |
143 | 10 | L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data |
144 | 10 | Invariant and Equivariant Graph Networks |
145 | 10 | Improving Generalization and Stability of Generative Adversarial Networks |
146 | 10 | InstaGAN: Instance-aware Image-to-Image Translation |
147 | 9 | Stable Recurrent Models |
148 | 9 | From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following |
149 | 9 | Learning to Represent Edits |
150 | 9 | Hierarchical interpretations for neural network predictions |
151 | 9 | Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality |
152 | 9 | Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds |
153 | 9 | Relaxed Quantization for Discretized Neural Networks |
154 | 9 | Optimal Completion Distillation for Sequence Learning |
155 | 9 | Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow |
156 | 9 | Adversarial Imitation via Variational Inverse Reinforcement Learning |
157 | 9 | Predict then Propagate: Graph Neural Networks meet Personalized PageRank |
158 | 9 | Learning to Infer and Execute 3D Shape Programs |
159 | 9 | Context-adaptive Entropy Model for End-to-end Optimized Image Compression |
160 | 9 | Biologically-Plausible Learning Algorithms Can Scale to Large Datasets |
161 | 9 | Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic |
162 | 8 | Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids |
163 | 8 | Deep learning generalizes because the parameter-function map is biased towards simple functions |
164 | 8 | On the loss landscape of a class of deep neural networks with no bad local valleys |
165 | 8 | Stable Opponent Shaping in Differentiable Games |
166 | 8 | Detecting Egregious Responses in Neural Sequence-to-sequence Models |
167 | 8 | Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder |
168 | 8 | ALISTA: Analytic Weights Are As Good As Learned Weights in LISTA |
169 | 8 | An analytic theory of generalization dynamics and transfer learning in deep linear networks |
170 | 8 | Automatically Composing Representation Transformations as a Means for Generalization |
171 | 8 | An Empirical Study of Example Forgetting during Deep Neural Network Learning |
172 | 8 | Hierarchical Visuomotor Control of Humanoids |
173 | 8 | Generalizable Adversarial Training via Spectral Normalization |
174 | 8 | Deep reinforcement learning with relational inductive biases |
175 | 8 | Structured Adversarial Attack: Towards General Implementation and Better Interpretability |
176 | 8 | Capsule Graph Neural Network |
177 | 8 | Whitening and Coloring Batch Transform for GANs |
178 | 8 | Dynamic Channel Pruning: Feature Boosting and Suppression |
179 | 8 | On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data |
180 | 8 | LanczosNet: Multi-Scale Deep Graph Convolutional Networks |
181 | 8 | Explaining Image Classifiers by Counterfactual Generation |
182 | 7 | Amortized Bayesian Meta-Learning |
183 | 7 | Preventing Posterior Collapse with delta-VAEs |
184 | 7 | Music Transformer: Generating Music with Long-Term Structure |
185 | 7 | AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks |
186 | 7 | Multilingual Neural Machine Translation with Knowledge Distillation |
187 | 7 | ProxQuant: Quantized Neural Networks via Proximal Operators |
188 | 7 | Regularized Learning for Domain Adaptation under Label Shifts |
189 | 7 | Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning |
190 | 7 | Predicting the Generalization Gap in Deep Networks with Margin Distributions |
191 | 7 | Selfless Sequential Learning |
192 | 7 | Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning |
193 | 7 | Neural Probabilistic Motor Primitives for Humanoid Control |
194 | 7 | Meta-Learning For Stochastic Gradient MCMC |
195 | 7 | Reasoning About Physical Interactions with Object-Oriented Prediction and Planning |
196 | 7 | Soft Q-Learning with Mutual-Information Regularization |
197 | 7 | DyRep: Learning Representations over Dynamic Graphs |
198 | 7 | Building Dynamic Knowledge Graphs from Text using Machine Reading Comprehension |
199 | 7 | The Unusual Effectiveness of Averaging in GAN Training |
200 | 7 | SOM-VAE: Interpretable Discrete Representation Learning on Time Series |
201 | 7 | Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer |
202 | 7 | Spherical CNNs on Unstructured Grids |
203 | 7 | Disjoint Mapping Network for Cross-modal Matching of Voices and Faces |
204 | 7 | Generating Multiple Objects at Spatially Distinct Locations |
205 | 6 | Spreading vectors for similarity search |
206 | 6 | Learning Multimodal Graph-to-Graph Translation for Molecule Optimization |
207 | 6 | Structured Neural Summarization |
208 | 6 | Multiple-Attribute Text Rewriting |
209 | 6 | InfoBot: Transfer and Exploration via the Information Bottleneck |
210 | 6 | On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks |
211 | 6 | Learning Self-Imitating Diverse Policies |
212 | 6 | Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization |
213 | 6 | Fluctuation-dissipation relations for stochastic gradient descent |
214 | 6 | Efficient Training on Very Large Corpora via Gramian Estimation |
215 | 6 | ProMP: Proximal Meta-Policy Search |
216 | 6 | Analysing Mathematical Reasoning Abilities of Neural Models |
217 | 6 | Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks |
218 | 6 | Learning Multi-Level Hierarchies with Hindsight |
219 | 6 | A comprehensive, application-oriented study of catastrophic forgetting in DNNs |
220 | 6 | M^3RL: Mind-aware Multi-agent Management Reinforcement Learning |
221 | 6 | Neural Logic Machines |
222 | 6 | LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators |
223 | 6 | SPIGAN: Privileged Adversarial Learning from Simulation |
224 | 6 | Verification of Non-Linear Specifications for Neural Networks |
225 | 6 | Learning Protein Structure with a Differentiable Simulator |
226 | 6 | ADef: an Iterative Algorithm to Construct Adversarial Deformations |
227 | 5 | Beyond Greedy Ranking: Slate Optimization via List-CVAE |
228 | 5 | Variance Networks: When Expectation Does Not Meet Your Expectations |
229 | 5 | Modeling Uncertainty with Hedged Instance Embeddings |
230 | 5 | How Important is a Neuron |
231 | 5 | Optimal Transport Maps For Distribution Preserving Operations on Latent Spaces of Generative Models |
232 | 5 | Spectral Inference Networks: Unifying Deep and Spectral Learning |
233 | 5 | Learning to Understand Goal Specifications by Modelling Reward |
234 | 5 | Imposing Category Trees Onto Word-Embeddings Using A Geometric Construction |
235 | 5 | Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors |
236 | 5 | Multilingual Neural Machine Translation With Soft Decoupled Encoding |
237 | 5 | Smoothing the Geometry of Probabilistic Box Embeddings |
238 | 5 | Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling |
239 | 5 | Neural Speed Reading with Structural-Jump-LSTM |
240 | 5 | Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering |
241 | 5 | Subgradient Descent Learns Orthogonal Dictionaries |
242 | 5 | Learning Two-layer Neural Networks with Symmetric Inputs |
243 | 5 | signSGD with Majority Vote is Communication Efficient and Fault Tolerant |
244 | 5 | SGD Converges to Global Minimum in Deep Learning via Star-convex Path |
245 | 5 | Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience |
246 | 5 | Learning to Learn with Conditional Class Dependencies |
247 | 5 | Policy Transfer with Strategy Optimization |
248 | 5 | Learning to Schedule Communication in Multi-agent Reinforcement Learning |
249 | 5 | Measuring and regularizing networks in function space |
250 | 5 | Learning Exploration Policies for Navigation |
251 | 5 | Relational Forward Models for Multi-Agent Learning |
252 | 5 | Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search |
253 | 5 | KnockoffGAN: Generating Knockoffs for Feature Selection using Generative Adversarial Networks |
254 | 5 | Combinatorial Attacks on Binarized Neural Networks |
255 | 5 | Stochastic Optimization of Sorting Networks via Continuous Relaxations |
256 | 5 | On the Sensitivity of Adversarial Robustness to Input Data Distributions |
257 | 5 | RelGAN: Relational Generative Adversarial Networks for Text Generation |
258 | 5 | Learning Robust Representations by Projecting Superficial Statistics Out |
259 | 5 | PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees |
260 | 5 | Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods |
261 | 5 | Multi-class classification without multi-class labels |
262 | 5 | DPSNet: End-to-end Deep Plane Sweep Stereo |
263 | 5 | Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer |
264 | 5 | Learning To Simulate |
265 | 5 | Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks |
266 | 5 | Random mesh projectors for inverse problems |
267 | 4 | MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders |
268 | 4 | Meta-Learning Update Rules for Unsupervised Representation Learning |
269 | 4 | Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers |
270 | 4 | The Deep Weight Prior |
271 | 4 | Learning Factorized Representations for Open-set Domain Adaptation |
272 | 4 | Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology |
273 | 4 | Learning Neural PDE Solvers with Convergence Guarantees |
274 | 4 | Unsupervised Domain Adaptation for Distance Metric Learning |
275 | 4 | ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks |
276 | 4 | Generative Question Answering: Learning to Answer the Whole Question |
277 | 4 | Stochastic Prediction of Multi-Agent Interactions from Partial Observations |
278 | 4 | Learning Programmatically Structured Representations with Perceptor Gradients |
279 | 4 | Global-to-local Memory Pointer Networks for Task-Oriented Dialogue |
280 | 4 | Multi-Agent Dual Learning |
281 | 4 | Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs |
282 | 4 | Learning protein sequence embeddings using information from structure |
283 | 4 | Harmonic Unpaired Image-to-image Translation |
284 | 4 | Characterizing Audio Adversarial Examples Using Temporal Dependency |
285 | 4 | Systematic Generalization: What Is Required and Can It Be Learned? |
286 | 4 | On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length |
287 | 4 | Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets |
288 | 4 | Caveats for information bottleneck in deterministic scenarios |
289 | 4 | A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation |
290 | 4 | AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods |
291 | 4 | Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams |
292 | 4 | Information-Directed Exploration for Deep Reinforcement Learning |
293 | 4 | Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning |
294 | 4 | Variance Reduction for Reinforcement Learning in Input-Driven Environments |
295 | 4 | Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information |
296 | 4 | Probabilistic Planning with Sequential Monte Carlo methods |
297 | 4 | The Limitations of Adversarial Training and the Blind-Spot Attack |
298 | 4 | Large Scale Graph Learning From Smooth Signals |
299 | 4 | Feature Intertwiner for Object Detection |
300 | 4 | StrokeNet: A Neural Painting Environment |
301 | 4 | Eidetic 3D LSTM: A Model for Video Prediction and Beyond |
302 | 4 | Learning Localized Generative Models for 3D Point Clouds via Graph Convolution |
303 | 4 | Generating Multi-Agent Trajectories using Programmatic Weak Supervision |
304 | 4 | Diversity-Sensitive Conditional Generative Adversarial Networks |
305 | 4 | A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs |
306 | 3 | Efficiently testing local optimality and escaping saddles for ReLU networks |
307 | 3 | Bayesian Policy Optimization for Model Uncertainty |
308 | 3 | Variational Autoencoder with Arbitrary Conditioning |
309 | 3 | DHER: Hindsight Experience Replay for Dynamic Goals |
310 | 3 | Dimensionality Reduction for Representing the Knowledge of Probabilistic Models |
311 | 3 | Learning-Based Frequency Estimation Algorithms |
312 | 3 | Practical lossless compression with latent variables using bits back coding |
313 | 3 | Label super-resolution networks |
314 | 3 | Measuring Compositionality in Representation Learning |
315 | 3 | Distribution-Interpolation Trade off in Generative Models |
316 | 3 | Improving Sequence-to-Sequence Learning via Optimal Transport |
317 | 3 | Guiding Policies with Language via Meta-Learning |
318 | 3 | Learning to Design RNA |
319 | 3 | Learning what and where to attend |
320 | 3 | Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching |
321 | 3 | Learning Finite State Representations of Recurrent Policy Networks |
322 | 3 | Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity |
323 | 3 | Theoretical Analysis of Auto Rate-Tuning by Batch Normalization |
324 | 3 | Towards Robust, Locally Linear Deep Networks |
325 | 3 | Quasi-hyperbolic momentum and Adam for deep learning |
326 | 3 | Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network |
327 | 3 | Optimal Control Via Neural Networks: A Convex Approach |
328 | 3 | Anytime Minibatch: Exploiting Stragglers in Online Distributed Optimization |
329 | 3 | A2BCD: Asynchronous Acceleration with Optimal Complexity |
330 | 3 | Learning to Make Analogies by Contrasting Abstract Relational Structure |
331 | 3 | Adaptive Posterior Learning: few-shot learning with a surprise-based memory module |
332 | 3 | AutoLoss: Learning Discrete Schedule for Alternate Optimization |
333 | 3 | Universal Successor Features Approximators |
334 | 3 | Information asymmetry in KL-regularized RL |
335 | 3 | Learning Actionable Representations with Goal Conditioned Policies |
336 | 3 | Two-Timescale Networks for Nonlinear Value Function Approximation |
337 | 3 | Sample Efficient Imitation Learning for Continuous Control |
338 | 3 | A Direct Approach to Robust Deep Learning Using Adversarial Networks |
339 | 3 | Scalable Unbalanced Optimal Transport using Generative Adversarial Networks |
340 | 3 | Graph Wavelet Neural Network |
341 | 3 | CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild |
342 | 3 | Learnable Embedding Space for Efficient Neural Architecture Compression |
343 | 3 | Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach |
344 | 3 | Towards GAN Benchmarks Which Require Generalization |
345 | 3 | Learning Mixed-Curvature Representations in Product Spaces |
346 | 3 | Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation |
347 | 3 | Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces |
348 | 3 | Learning to Describe Scenes with Programs |
349 | 3 | DELTA: Deep Learning Transfer using Feature Map with Attention for Convolutional Networks |
350 | 3 | Mode Normalization |
351 | 3 | A rotation-equivariant convolutional neural network model of primary visual cortex |
352 | 3 | Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition |
353 | 3 | Human-level Protein Localization with Convolutional Neural Networks |
354 | 3 | Value Propagation Networks |
355 | 3 | Diversity and Depth in Per-Example Routing Models |
356 | 2 | Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks |
357 | 2 | Kernel Change-point Detection with Auxiliary Deep Generative Models |
358 | 2 | Learning a Meta-Solver for Syntax-Guided Program Synthesis |
359 | 2 | On the Turing Completeness of Modern Neural Network Architectures |
360 | 2 | Active Learning with Partial Feedback |
361 | 2 | Toward Understanding the Impact of Staleness in Distributed Machine Learning |
362 | 2 | Feature-Wise Bias Amplification |
363 | 2 | Transferring Knowledge across Learning Processes |
364 | 2 | Deep, Skinny Neural Networks are not Universal Approximators |
365 | 2 | Interpolation-Prediction Networks for Irregularly Sampled Time Series |
366 | 2 | Learning Representations of Sets through Optimized Permutations |
367 | 2 | Variational Bayesian Phylogenetic Inference |
368 | 2 | BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning |
369 | 2 | Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks |
370 | 2 | Posterior Attention Models for Sequence to Sequence Learning |
371 | 2 | Learning Implicitly Recurrent CNNs Through Parameter Sharing |
372 | 2 | A Generative Model For Electron Paths |
373 | 2 | RNNs implicitly implement tensor-product representations |
374 | 2 | h-detach: Modifying the LSTM Gradient Towards Better Optimization |
375 | 2 | Adaptive Estimators Show Information Compression in Deep Neural Networks |
376 | 2 | Learning sparse relational transition models |
377 | 2 | From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference |
378 | 2 | Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion |
379 | 2 | Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm |
380 | 2 | Deep Layers as Stochastic Solvers |
381 | 2 | Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking |
382 | 2 | An Empirical study of Binary Neural Networks’ Optimisation |
383 | 2 | Deep Frank-Wolfe For Neural Network Optimization |
384 | 2 | G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space |
385 | 2 | Contingency-Aware Exploration in Reinforcement Learning |
386 | 2 | Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization |
387 | 2 | The Laplacian in RL: Learning Representations with Efficient Approximations |
388 | 2 | Execution-Guided Neural Program Synthesis |
389 | 2 | Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications |
390 | 2 | Modeling the Long Term Future in Model-Based Reinforcement Learning |
391 | 2 | Environment Probing Interaction Policies |
392 | 2 | Neural Program Repair by Jointly Learning to Localize and Repair |
393 | 2 | Distributional Concavity Regularization for GANs |
394 | 2 | Dynamic Sparse Graph for Efficient Deep Learning |
395 | 2 | A Statistical Approach to Assessing Neural Network Robustness |
396 | 2 | Multi-Domain Adversarial Learning |
397 | 2 | Conditional Network Embeddings |
398 | 2 | signSGD via Zeroth-Order Oracle |
399 | 2 | Robust Conditional Generative Adversarial Networks |
400 | 2 | Deep Learning 3D Shapes Using Alt-az Anisotropic 2-Sphere Convolution |
401 | 2 | AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking |
402 | 2 | Latent Convolutional Models |
403 | 2 | LeMoNADe: Learned Motif and Neuronal Assembly Detection in calcium imaging videos |
404 | 2 | STCN: Stochastic Temporal Convolutional Networks |
405 | 2 | Improving MMD-GAN Training with Repulsive Loss Function |
406 | 1 | Wasserstein Barycenter Model Ensembling |
407 | 1 | Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion |
408 | 1 | Generating Liquid Simulations with Deformation-aware Neural Networks |
409 | 1 | Efficient Augmentation via Data Subsampling |
410 | 1 | Generative predecessor models for sample-efficient imitation learning |
411 | 1 | Auxiliary Variational MCMC |
412 | 1 | Variational Autoencoders with Jointly Optimized Latent Dependency Structure |
413 | 1 | Function Space Particle Optimization for Bayesian Neural Networks |
414 | 1 | Marginalized Average Attentional Network for Weakly-Supervised Learning |
415 | 1 | Neural TTS Stylization with Adversarial and Collaborative Games |
416 | 1 | Representation Degeneration Problem in Training Natural Language Generation Models |
417 | 1 | Transfer Learning for Sequences via Learning to Collocate |
418 | 1 | Understanding Composition of Word Embeddings via Tensor Decomposition |
419 | 1 | Complement Objective Training |
420 | 1 | DOM-Q-NET: Grounded RL on Structured Language |
421 | 1 | Generalized Tensor Models for Recurrent Neural Networks |
422 | 1 | textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior |
423 | 1 | Kernel RNN Learning (KeRNL) |
424 | 1 | Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models |
425 | 1 | Analysis of Quantized Models |
426 | 1 | A Kernel Random Matrix-Based Approach for Sparse PCA |
427 | 1 | DeepOBS: A Deep Learning Optimizer Benchmark Suite |
428 | 1 | Learning concise representations for regression by evolving networks of trees |
429 | 1 | Initialized Equilibrium Propagation for Backprop-Free Training |
430 | 1 | Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters |
431 | 1 | Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions |
432 | 1 | On Random Deep Weight-Tied Autoencoders: Exact Asymptotic Analysis, Phase Transitions, and Implications to Training |
433 | 1 | Overcoming Catastrophic Forgetting for Continual Learning via Model Adaptation |
434 | 1 | Preferences Implicit in the State of the World |
435 | 1 | Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies |
436 | 1 | Solving the Rubik’s Cube with Approximate Policy Iteration |
437 | 1 | Towards Metamerism via Foveated Style Transfer |
438 | 1 | Learning to Navigate the Web |
439 | 1 | Knowledge Flow: Improve Upon Your Teachers |
440 | 1 | Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures |
441 | 1 | Cost-Sensitive Robustness against Adversarial Examples |
442 | 1 | MisGAN: Learning from Incomplete Data with Generative Adversarial Networks |
443 | 1 | Don’t let your Discriminator be fooled |
444 | 1 | Learning to Remember More with Less Memorization |
445 | 1 | Boosting Robustness Certification of Neural Networks |
446 | 1 | Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator |
447 | 1 | RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks |
448 | 1 | ProbGAN: Towards Probabilistic GAN with Theoretical Guarantees |
449 | 1 | K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning |
450 | 1 | Neural network gradient-based learning of black-box function interfaces |
451 | 1 | Adversarial Domain Adaptation for Stable Brain-Machine Interfaces |
452 | 1 | Unsupervised Adversarial Image Reconstruction |
453 | 0 | Unsupervised Learning of the Set of Local Maxima |
454 | 0 | Feed-forward Propagation in Probabilistic Neural Networks with Categorical and Max Layers |
455 | 0 | Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection |
456 | 0 | Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure |
457 | 0 | Information Theoretic lower bounds on negative log likelihood |
458 | 0 | Learning from Positive and Unlabeled Data with a Selection Bias |
459 | 0 | Integer Networks for Data Compression with Latent-Variable Models |
460 | 0 | Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control |
461 | 0 | A Max-Affine Spline Perspective of Recurrent Neural Networks |
462 | 0 | Discovery of Natural Language Concepts in Individual Units of CNNs |
463 | 0 | Learning Recurrent Binary/Ternary Weights |
464 | 0 | Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation |
465 | 0 | Variational Smoothing in Recurrent Neural Network Language Models |
466 | 0 | Top-Down Neural Model For Formulae |
467 | 0 | Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks |
468 | 0 | CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model |
469 | 0 | Sparse Dictionary Learning by Dynamical Neural Networks |
470 | 0 | Learning Embeddings into Entropic Wasserstein Spaces |
471 | 0 | Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds |
472 | 0 | Preconditioner on Matrix Lie Group for SGD |
473 | 0 | NOODL: Provable Online Dictionary Learning and Sparse Coding |
474 | 0 | The Comparative Power of ReLU Networks and Polynomial Kernels in the Presence of Sparse Latent Structure |
475 | 0 | Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy |
476 | 0 | A Closer Look at Few-shot Classification |
477 | 0 | NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning |
478 | 0 | Composing Complex Skills by Learning Transition Policies |
479 | 0 | Supervised Policy Update for Deep Reinforcement Learning |
480 | 0 | Synthetic Datasets for Neural Program Synthesis |
481 | 0 | A new dog learns old tricks: RL finds classic optimization algorithms |
482 | 0 | Neural Graph Evolution: Automatic Robot Design |
483 | 0 | Competitive experience replay |
484 | 0 | Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards |
485 | 0 | Learning Latent Superstructures in Variational Autoencoders for Deep Multidimensional Clustering |
486 | 0 | On Computation and Generalization of Generative Adversarial Networks under Spectrum Control |
487 | 0 | Adversarial Reprogramming of Neural Networks |
488 | 0 | INVASE: Instance-wise Variable Selection using Neural Networks |
489 | 0 | GO Gradient for Expectation-Based Objectives |
490 | 0 | Revealing interpretable object representations from human behavior |
491 | 0 | Learning what you can do before doing anything |
492 | 0 | A Data-Driven and Distributed Approach to Sparse Signal Representation and Recovery |
493 | 0 | Convolutional Neural Networks on Non-uniform Geometrical Signals Using Euclidean Spectral Transformation |
494 | 0 | Equi-normalization of Neural Networks |
495 | 0 | Robust Estimation via Generative Adversarial Networks |
496 | 0 | Unsupervised Discovery of Parts, Structure, and Dynamics |
497 | 0 | Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation |
498 | 0 | Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images |
499 | 0 | Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision |
500 | 0 | Visual Reasoning by Progressive Module Networks |