| Rank | Cited by | Paper name |
|---|---|---|
| 0 | 265 | Large Scale GAN Training for High Fidelity Natural Image Synthesis |
| 1 | 210 | DARTS: Differentiable Architecture Search |
| 2 | 135 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding |
| 3 | 108 | Gradient Descent Provably Optimizes Over-parameterized Neural Networks |
| 4 | 90 | Robustness May Be at Odds with Accuracy |
| 5 | 86 | ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness |
| 6 | 85 | The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks |
| 7 | 81 | Diversity is All You Need: Learning Skills without a Reward Function |
| 8 | 81 | Large-Scale Study of Curiosity-Driven Learning |
| 9 | 79 | Evaluating Robustness of Neural Networks with Mixed Integer Programming |
| 10 | 72 | The relativistic discriminator: a key element missing from standard GAN |
| 11 | 72 | ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware |
| 12 | 68 | FFJORD: Free-Form Continuous Dynamics for Scalable Reversible Generative Models |
| 13 | 66 | Learning deep representations by mutual information estimation and maximization |
| 14 | 63 | Universal Transformers |
| 15 | 61 | Meta-Learning with Latent Embedding Optimization |
| 16 | 60 | ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech |
| 17 | 59 | How Powerful are Graph Neural Networks? |
| 18 | 53 | Learning a SAT Solver from Single-Bit Supervision |
| 19 | 50 | Rethinking the Value of Network Pruning |
| 20 | 50 | Exploration by random network distillation |
| 21 | 44 | Benchmarking Neural Network Robustness to Common Corruptions and Perturbations |
| 22 | 44 | SNAS: stochastic neural architecture search |
| 23 | 37 | Do Deep Generative Models Know What They Don’t Know? |
| 24 | 36 | What do you learn from context? Probing for sentence structure in contextualized word representations |
| 25 | 36 | Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet |
| 26 | 34 | A Universal Music Translation Network |
| 27 | 34 | Are adversarial examples inevitable? |
| 28 | 32 | A Variational Inequality Perspective on Generative Adversarial Networks |
| 29 | 30 | On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization |
| 30 | 29 | Pay Less Attention with Lightweight and Dynamic Convolutions |
| 31 | 29 | Learning to Propagate Labels: Transductive Propagation Network for Few-shot Learning |
| 32 | 29 | Towards the first adversarially robust neural network model on MNIST |
| 33 | 28 | Deep Graph Infomax |
| 34 | 27 | Analyzing Inverse Problems with Invertible Neural Networks |
| 35 | 27 | A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks |
| 36 | 27 | Meta-learning with differentiable closed-form solvers |
| 37 | 26 | Decoupled Weight Decay Regularization |
| 38 | 26 | Local SGD Converges Fast and Communicates Little |
| 39 | 25 | Adversarial Audio Synthesis |
| 40 | 24 | Wizard of Wikipedia: Knowledge-Powered Conversational Agents |
| 41 | 24 | Adaptive Gradient Methods with Dynamic Bound of Learning Rate |
| 42 | 24 | Deep Convolutional Networks as shallow Gaussian Processes |
| 43 | 23 | Sample Efficient Adaptive Text-to-Speech |
| 44 | 23 | Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability |
| 45 | 23 | Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution |
| 46 | 22 | Adaptive Input Representations for Neural Language Modeling |
| 47 | 22 | Deep Anomaly Detection with Outlier Exposure |
| 48 | 22 | Episodic Curiosity through Reachability |
| 49 | 22 | Differentiable Learning-to-Normalize via Switchable Normalization |
| 50 | 21 | Query-Efficient Hard-label Black-box Attack: An Optimization-based Approach |
| 51 | 20 | Hierarchical Generative Modeling for Controllable Speech Synthesis |
| 52 | 20 | Hyperbolic Attention Networks |
| 53 | 20 | Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks |
| 54 | 20 | Prior Convictions: Black-box Adversarial Attacks with Bandits and Priors |
| 55 | 19 | Lagging Inference Networks and Posterior Collapse in Variational Autoencoders |
| 56 | 19 | The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision |
| 57 | 19 | Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset |
| 58 | 18 | Attentive Neural Processes |
| 59 | 18 | Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency |
| 60 | 18 | Slimmable Neural Networks |
| 61 | 18 | Gradient descent aligns the layers of deep linear networks |
| 62 | 18 | How to train your MAML |
| 63 | 18 | Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware |
| 64 | 18 | The Singular Values of Convolutional Layers |
| 65 | 18 | Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling |
| 66 | 17 | Recurrent Experience Replay in Distributed Reinforcement Learning |
| 67 | 17 | SNIP: Single-shot Network Pruning based on Connection Sensitivity |
| 68 | 17 | Unsupervised Learning via Meta-Learning |
| 69 | 17 | code2seq: Generating Sequences from Structured Representations of Code |
| 70 | 17 | GAN Dissection: Visualizing and Understanding Generative Adversarial Networks |
| 71 | 16 | Efficient Lifelong Learning with A-GEM |
| 72 | 16 | Deep Decoder: Concise Image Representations from Untrained Non-convolutional Networks |
| 73 | 15 | FlowQA: Grasping Flow in History for Conversational Machine Comprehension |
| 74 | 15 | Three Mechanisms of Weight Decay Regularization |
| 75 | 15 | Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile |
| 76 | 15 | Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions |
| 77 | 15 | A Mean Field Theory of Batch Normalization |
| 78 | 15 | Deep Online Learning Via Meta-Learning: Continual Adaptation for Model-Based RL |
| 79 | 15 | RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space |
| 80 | 15 | Self-Monitoring Navigation Agent via Auxiliary Progress Estimation |
| 81 | 15 | Time-Agnostic Prediction: Predicting Predictable Video Frames |
| 82 | 14 | Attention, Learn to Solve Routing Problems! |
| 83 | 14 | GamePad: A Learning Environment for Theorem Proving |
| 84 | 14 | Learning Factorized Multimodal Representations |
| 85 | 14 | Quaternion Recurrent Neural Networks |
| 86 | 14 | Fixup Initialization: Residual Learning Without Normalization |
| 87 | 14 | Meta-Learning Probabilistic Inference for Prediction |
| 88 | 14 | Graph HyperNetworks for Neural Architecture Search |
| 89 | 14 | Defensive Quantization: When Efficiency Meets Robustness |
| 90 | 14 | Adversarial Attacks on Graph Neural Networks via Meta Learning |
| 91 | 14 | Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer |
| 92 | 14 | Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach |
| 93 | 14 | Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes |
| 94 | 13 | Janossy Pooling: Learning Deep Permutation-Invariant Functions for Variable-Size Inputs |
| 95 | 13 | No Training Required: Exploring Random Encoders for Sentence Classification |
| 96 | 13 | DialogWAE: Multimodal Response Generation with Conditional Wasserstein Auto-Encoder |
| 97 | 13 | Critical Learning Periods in Deep Networks |
| 98 | 13 | Excessive Invariance Causes Adversarial Vulnerability |
| 99 | 13 | On Self Modulation for Generative Adversarial Networks |
| 100 | 13 | L2-Nonexpansive Neural Networks |
| 101 | 13 | Recall Traces: Backtracking Models for Efficient Reinforcement Learning |
| 102 | 13 | BA-Net: Dense Bundle Adjustment Networks |
| 103 | 12 | Temporal Difference Variational Auto-Encoder |
| 104 | 12 | Diagnosing and Enhancing VAE Models |
| 105 | 12 | TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer |
| 106 | 12 | Dynamically Unfolding Recurrent Restorer: A Moving Endpoint Control Method for Image Restoration |
| 107 | 12 | Identifying and Controlling Important Neurons in Neural Machine Translation |
| 108 | 12 | Aggregated Momentum: Stability Through Passive Damping |
| 109 | 12 | The role of over-parametrization in generalization of neural networks |
| 110 | 12 | There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average |
| 111 | 12 | Small nonlinearities in activation functions create bad local minima in neural networks |
| 112 | 12 | Plan Online, Learn Offline: Efficient Learning and Exploration via Model-Based Control |
| 113 | 12 | Near-Optimal Representation Learning for Hierarchical Reinforcement Learning |
| 114 | 12 | Improving the Generalization of Adversarial Training with Domain Adaptation |
| 115 | 12 | Supervised Community Detection with Line Graph Neural Networks |
| 116 | 12 | Discriminator Rejection Sampling |
| 117 | 12 | GANSynth: Adversarial Neural Audio Synthesis |
| 118 | 11 | Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives |
| 119 | 11 | Sliced Wasserstein Auto-Encoders |
| 120 | 11 | Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning |
| 121 | 11 | Unsupervised Hyper-alignment for Multilingual Word Embeddings |
| 122 | 11 | Riemannian Adaptive Optimization Methods |
| 123 | 11 | Hindsight policy gradients |
| 124 | 11 | CEM-RL: Combining evolutionary and gradient-based methods for policy search |
| 125 | 11 | Emergent Coordination Through Competition |
| 126 | 11 | Deterministic Variational Inference for Robust Bayesian Neural Networks |
| 127 | 11 | Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees |
| 128 | 11 | Multi-step Retriever-Reader Interaction for Scalable Open-domain Question Answering |
| 129 | 11 | Adv-BNN: Improved Adversarial Defense through Robust Bayesian Neural Network |
| 130 | 11 | Approximability of Discriminators Implies Diversity in GANs |
| 131 | 11 | Generative Code Modeling with Graphs |
| 132 | 11 | PeerNets: Exploiting Peer Wisdom Against Adversarial Attacks |
| 133 | 11 | Visual Semantic Navigation using Scene Priors |
| 134 | 11 | Residual Non-local Attention Networks for Image Restoration |
| 135 | 11 | Diffusion Scattering Transforms on Graphs |
| 136 | 10 | Unsupervised Control Through Non-Parametric Discriminative Rewards |
| 137 | 10 | Functional Variational Bayesian Neural Networks |
| 138 | 10 | Trellis Networks for Sequence Modeling |
| 139 | 10 | Poincaré GloVe: Hyperbolic Word Embeddings |
| 140 | 10 | Towards Understanding Regularization in Batch Normalization |
| 141 | 10 | Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference |
| 142 | 10 | Reward Constrained Policy Optimization |
| 143 | 10 | L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data |
| 144 | 10 | Invariant and Equivariant Graph Networks |
| 145 | 10 | Improving Generalization and Stability of Generative Adversarial Networks |
| 146 | 10 | InstaGAN: Instance-aware Image-to-Image Translation |
| 147 | 9 | Stable Recurrent Models |
| 148 | 9 | From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following |
| 149 | 9 | Learning to Represent Edits |
| 150 | 9 | Hierarchical interpretations for neural network predictions |
| 151 | 9 | Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality |
| 152 | 9 | Data-Dependent Coresets for Compressing Neural Networks with Applications to Generalization Bounds |
| 153 | 9 | Relaxed Quantization for Discretized Neural Networks |
| 154 | 9 | Optimal Completion Distillation for Sequence Learning |
| 155 | 9 | Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow |
| 156 | 9 | Adversarial Imitation via Variational Inverse Reinforcement Learning |
| 157 | 9 | Predict then Propagate: Graph Neural Networks meet Personalized PageRank |
| 158 | 9 | Learning to Infer and Execute 3D Shape Programs |
| 159 | 9 | Context-adaptive Entropy Model for End-to-end Optimized Image Compression |
| 160 | 9 | Biologically-Plausible Learning Algorithms Can Scale to Large Datasets |
| 161 | 9 | Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic |
| 162 | 8 | Learning Particle Dynamics for Manipulating Rigid Bodies, Deformable Objects, and Fluids |
| 163 | 8 | Deep learning generalizes because the parameter-function map is biased towards simple functions |
| 164 | 8 | On the loss landscape of a class of deep neural networks with no bad local valleys |
| 165 | 8 | Stable Opponent Shaping in Differentiable Games |
| 166 | 8 | Detecting Egregious Responses in Neural Sequence-to-sequence Models |
| 167 | 8 | Differentiable Perturb-and-Parse: Semi-Supervised Parsing with a Structured Variational Autoencoder |
| 168 | 8 | ALISTA: Analytic Weights Are As Good As Learned Weights in LISTA |
| 169 | 8 | An analytic theory of generalization dynamics and transfer learning in deep linear networks |
| 170 | 8 | Automatically Composing Representation Transformations as a Means for Generalization |
| 171 | 8 | An Empirical Study of Example Forgetting during Deep Neural Network Learning |
| 172 | 8 | Hierarchical Visuomotor Control of Humanoids |
| 173 | 8 | Generalizable Adversarial Training via Spectral Normalization |
| 174 | 8 | Deep reinforcement learning with relational inductive biases |
| 175 | 8 | Structured Adversarial Attack: Towards General Implementation and Better Interpretability |
| 176 | 8 | Capsule Graph Neural Network |
| 177 | 8 | Whitening and Coloring Batch Transform for GANs |
| 178 | 8 | Dynamic Channel Pruning: Feature Boosting and Suppression |
| 179 | 8 | On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data |
| 180 | 8 | LanczosNet: Multi-Scale Deep Graph Convolutional Networks |
| 181 | 8 | Explaining Image Classifiers by Counterfactual Generation |
| 182 | 7 | Amortized Bayesian Meta-Learning |
| 183 | 7 | Preventing Posterior Collapse with delta-VAEs |
| 184 | 7 | Music Transformer: Generating Music with Long-Term Structure |
| 185 | 7 | AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks |
| 186 | 7 | Multilingual Neural Machine Translation with Knowledge Distillation |
| 187 | 7 | ProxQuant: Quantized Neural Networks via Proximal Operators |
| 188 | 7 | Regularized Learning for Domain Adaptation under Label Shifts |
| 189 | 7 | Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning |
| 190 | 7 | Predicting the Generalization Gap in Deep Networks with Margin Distributions |
| 191 | 7 | Selfless Sequential Learning |
| 192 | 7 | Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning |
| 193 | 7 | Neural Probabilistic Motor Primitives for Humanoid Control |
| 194 | 7 | Meta-Learning For Stochastic Gradient MCMC |
| 195 | 7 | Reasoning About Physical Interactions with Object-Oriented Prediction and Planning |
| 196 | 7 | Soft Q-Learning with Mutual-Information Regularization |
| 197 | 7 | DyRep: Learning Representations over Dynamic Graphs |
| 198 | 7 | Building Dynamic Knowledge Graphs from Text using Machine Reading Comprehension |
| 199 | 7 | The Unusual Effectiveness of Averaging in GAN Training |
| 200 | 7 | SOM-VAE: Interpretable Discrete Representation Learning on Time Series |
| 201 | 7 | Beyond Pixel Norm-Balls: Parametric Adversaries using an Analytically Differentiable Renderer |
| 202 | 7 | Spherical CNNs on Unstructured Grids |
| 203 | 7 | Disjoint Mapping Network for Cross-modal Matching of Voices and Faces |
| 204 | 7 | Generating Multiple Objects at Spatially Distinct Locations |
| 205 | 6 | Spreading vectors for similarity search |
| 206 | 6 | Learning Multimodal Graph-to-Graph Translation for Molecule Optimization |
| 207 | 6 | Structured Neural Summarization |
| 208 | 6 | Multiple-Attribute Text Rewriting |
| 209 | 6 | InfoBot: Transfer and Exploration via the Information Bottleneck |
| 210 | 6 | On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks |
| 211 | 6 | Learning Self-Imitating Diverse Policies |
| 212 | 6 | Stochastic Gradient/Mirror Descent: Minimax Optimality and Implicit Regularization |
| 213 | 6 | Fluctuation-dissipation relations for stochastic gradient descent |
| 214 | 6 | Efficient Training on Very Large Corpora via Gramian Estimation |
| 215 | 6 | ProMP: Proximal Meta-Policy Search |
| 216 | 6 | Analysing Mathematical Reasoning Abilities of Neural Models |
| 217 | 6 | Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks |
| 218 | 6 | Learning Multi-Level Hierarchies with Hindsight |
| 219 | 6 | A comprehensive, application-oriented study of catastrophic forgetting in DNNs |
| 220 | 6 | M^3RL: Mind-aware Multi-agent Management Reinforcement Learning |
| 221 | 6 | Neural Logic Machines |
| 222 | 6 | LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators |
| 223 | 6 | SPIGAN: Privileged Adversarial Learning from Simulation |
| 224 | 6 | Verification of Non-Linear Specifications for Neural Networks |
| 225 | 6 | Learning Protein Structure with a Differentiable Simulator |
| 226 | 6 | ADef: an Iterative Algorithm to Construct Adversarial Deformations |
| 227 | 5 | Beyond Greedy Ranking: Slate Optimization via List-CVAE |
| 228 | 5 | Variance Networks: When Expectation Does Not Meet Your Expectations |
| 229 | 5 | Modeling Uncertainty with Hedged Instance Embeddings |
| 230 | 5 | How Important is a Neuron |
| 231 | 5 | Optimal Transport Maps For Distribution Preserving Operations on Latent Spaces of Generative Models |
| 232 | 5 | Spectral Inference Networks: Unifying Deep and Spectral Learning |
| 233 | 5 | Learning to Understand Goal Specifications by Modelling Reward |
| 234 | 5 | Imposing Category Trees Onto Word-Embeddings Using A Geometric Construction |
| 235 | 5 | Don’t Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors |
| 236 | 5 | Multilingual Neural Machine Translation With Soft Decoupled Encoding |
| 237 | 5 | Smoothing the Geometry of Probabilistic Box Embeddings |
| 238 | 5 | Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling |
| 239 | 5 | Neural Speed Reading with Structural-Jump-LSTM |
| 240 | 5 | Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering |
| 241 | 5 | Subgradient Descent Learns Orthogonal Dictionaries |
| 242 | 5 | Learning Two-layer Neural Networks with Symmetric Inputs |
| 243 | 5 | signSGD with Majority Vote is Communication Efficient and Fault Tolerant |
| 244 | 5 | SGD Converges to Global Minimum in Deep Learning via Star-convex Path |
| 245 | 5 | Deterministic PAC-Bayesian generalization bounds for deep networks via generalizing noise-resilience |
| 246 | 5 | Learning to Learn with Conditional Class Dependencies |
| 247 | 5 | Policy Transfer with Strategy Optimization |
| 248 | 5 | Learning to Schedule Communication in Multi-agent Reinforcement Learning |
| 249 | 5 | Measuring and regularizing networks in function space |
| 250 | 5 | Learning Exploration Policies for Navigation |
| 251 | 5 | Relational Forward Models for Multi-Agent Learning |
| 252 | 5 | Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search |
| 253 | 5 | KnockoffGAN: Generating Knockoffs for Feature Selection using Generative Adversarial Networks |
| 254 | 5 | Combinatorial Attacks on Binarized Neural Networks |
| 255 | 5 | Stochastic Optimization of Sorting Networks via Continuous Relaxations |
| 256 | 5 | On the Sensitivity of Adversarial Robustness to Input Data Distributions |
| 257 | 5 | RelGAN: Relational Generative Adversarial Networks for Text Generation |
| 258 | 5 | Learning Robust Representations by Projecting Superficial Statistics Out |
| 259 | 5 | PATE-GAN: Generating Synthetic Data with Differential Privacy Guarantees |
| 260 | 5 | Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods |
| 261 | 5 | Multi-class classification without multi-class labels |
| 262 | 5 | DPSNet: End-to-end Deep Plane Sweep Stereo |
| 263 | 5 | Emerging Disentanglement in Auto-Encoder Based Unsupervised Image Content Transfer |
| 264 | 5 | Learning To Simulate |
| 265 | 5 | Visual Explanation by Interpretation: Improving Visual Feedback Capabilities of Deep Neural Networks |
| 266 | 5 | Random mesh projectors for inverse problems |
| 267 | 4 | MAE: Mutual Posterior-Divergence Regularization for Variational AutoEncoders |
| 268 | 4 | Meta-Learning Update Rules for Unsupervised Representation Learning |
| 269 | 4 | Bias-Reduced Uncertainty Estimation for Deep Neural Classifiers |
| 270 | 4 | The Deep Weight Prior |
| 271 | 4 | Learning Factorized Representations for Open-set Domain Adaptation |
| 272 | 4 | Neural Persistence: A Complexity Measure for Deep Neural Networks Using Algebraic Topology |
| 273 | 4 | Learning Neural PDE Solvers with Convergence Guarantees |
| 274 | 4 | Unsupervised Domain Adaptation for Distance Metric Learning |
| 275 | 4 | ARM: Augment-REINFORCE-Merge Gradient for Stochastic Binary Networks |
| 276 | 4 | Generative Question Answering: Learning to Answer the Whole Question |
| 277 | 4 | Stochastic Prediction of Multi-Agent Interactions from Partial Observations |
| 278 | 4 | Learning Programmatically Structured Representations with Perceptor Gradients |
| 279 | 4 | Global-to-local Memory Pointer Networks for Task-Oriented Dialogue |
| 280 | 4 | Multi-Agent Dual Learning |
| 281 | 4 | Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs |
| 282 | 4 | Learning protein sequence embeddings using information from structure |
| 283 | 4 | Harmonic Unpaired Image-to-image Translation |
| 284 | 4 | Characterizing Audio Adversarial Examples Using Temporal Dependency |
| 285 | 4 | Systematic Generalization: What Is Required and Can It Be Learned? |
| 286 | 4 | On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length |
| 287 | 4 | Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets |
| 288 | 4 | Caveats for information bottleneck in deterministic scenarios |
| 289 | 4 | A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation |
| 290 | 4 | AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods |
| 291 | 4 | Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams |
| 292 | 4 | Information-Directed Exploration for Deep Reinforcement Learning |
| 293 | 4 | Probabilistic Recursive Reasoning for Multi-Agent Reinforcement Learning |
| 294 | 4 | Variance Reduction for Reinforcement Learning in Input-Driven Environments |
| 295 | 4 | Directed-Info GAIL: Learning Hierarchical Policies from Unsegmented Demonstrations using Directed Information |
| 296 | 4 | Probabilistic Planning with Sequential Monte Carlo methods |
| 297 | 4 | The Limitations of Adversarial Training and the Blind-Spot Attack |
| 298 | 4 | Large Scale Graph Learning From Smooth Signals |
| 299 | 4 | Feature Intertwiner for Object Detection |
| 300 | 4 | StrokeNet: A Neural Painting Environment |
| 301 | 4 | Eidetic 3D LSTM: A Model for Video Prediction and Beyond |
| 302 | 4 | Learning Localized Generative Models for 3D Point Clouds via Graph Convolution |
| 303 | 4 | Generating Multi-Agent Trajectories using Programmatic Weak Supervision |
| 304 | 4 | Diversity-Sensitive Conditional Generative Adversarial Networks |
| 305 | 4 | A Unified Theory of Early Visual Representations from Retina to Cortex through Anatomically Constrained Deep CNNs |
| 306 | 3 | Efficiently testing local optimality and escaping saddles for ReLU networks |
| 307 | 3 | Bayesian Policy Optimization for Model Uncertainty |
| 308 | 3 | Variational Autoencoder with Arbitrary Conditioning |
| 309 | 3 | DHER: Hindsight Experience Replay for Dynamic Goals |
| 310 | 3 | Dimensionality Reduction for Representing the Knowledge of Probabilistic Models |
| 311 | 3 | Learning-Based Frequency Estimation Algorithms |
| 312 | 3 | Practical lossless compression with latent variables using bits back coding |
| 313 | 3 | Label super-resolution networks |
| 314 | 3 | Measuring Compositionality in Representation Learning |
| 315 | 3 | Distribution-Interpolation Trade off in Generative Models |
| 316 | 3 | Improving Sequence-to-Sequence Learning via Optimal Transport |
| 317 | 3 | Guiding Policies with Language via Meta-Learning |
| 318 | 3 | Learning to Design RNA |
| 319 | 3 | Learning what and where to attend |
| 320 | 3 | Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching |
| 321 | 3 | Learning Finite State Representations of Recurrent Policy Networks |
| 322 | 3 | Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity |
| 323 | 3 | Theoretical Analysis of Auto Rate-Tuning by Batch Normalization |
| 324 | 3 | Towards Robust, Locally Linear Deep Networks |
| 325 | 3 | Quasi-hyperbolic momentum and Adam for deep learning |
| 326 | 3 | Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network |
| 327 | 3 | Optimal Control Via Neural Networks: A Convex Approach |
| 328 | 3 | Anytime MiniBatch: Exploiting Stragglers in Online Distributed Optimization |
| 329 | 3 | A2BCD: Asynchronous Acceleration with Optimal Complexity |
| 330 | 3 | Learning to Make Analogies by Contrasting Abstract Relational Structure |
| 331 | 3 | Adaptive Posterior Learning: few-shot learning with a surprise-based memory module |
| 332 | 3 | AutoLoss: Learning Discrete Schedule for Alternate Optimization |
| 333 | 3 | Universal Successor Features Approximators |
| 334 | 3 | Information asymmetry in KL-regularized RL |
| 335 | 3 | Learning Actionable Representations with Goal Conditioned Policies |
| 336 | 3 | Two-Timescale Networks for Nonlinear Value Function Approximation |
| 337 | 3 | Sample Efficient Imitation Learning for Continuous Control |
| 338 | 3 | A Direct Approach to Robust Deep Learning Using Adversarial Networks |
| 339 | 3 | Scalable Unbalanced Optimal Transport using Generative Adversarial Networks |
| 340 | 3 | Graph Wavelet Neural Network |
| 341 | 3 | CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild |
| 342 | 3 | Learnable Embedding Space for Efficient Neural Architecture Compression |
| 343 | 3 | Learning To Solve Circuit-SAT: An Unsupervised Differentiable Approach |
| 344 | 3 | Towards GAN Benchmarks Which Require Generalization |
| 345 | 3 | Learning Mixed-Curvature Representations in Product Spaces |
| 346 | 3 | Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation |
| 347 | 3 | Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces |
| 348 | 3 | Learning to Describe Scenes with Programs |
| 349 | 3 | DELTA: Deep Learning Transfer using Feature Map with Attention for Convolutional Networks |
| 350 | 3 | Mode Normalization |
| 351 | 3 | A rotation-equivariant convolutional neural network model of primary visual cortex |
| 352 | 3 | Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition |
| 353 | 3 | Human-level Protein Localization with Convolutional Neural Networks |
| 354 | 3 | Value Propagation Networks |
| 355 | 3 | Diversity and Depth in Per-Example Routing Models |
| 356 | 2 | Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks |
| 357 | 2 | Kernel Change-point Detection with Auxiliary Deep Generative Models |
| 358 | 2 | Learning a Meta-Solver for Syntax-Guided Program Synthesis |
| 359 | 2 | On the Turing Completeness of Modern Neural Network Architectures |
| 360 | 2 | Active Learning with Partial Feedback |
| 361 | 2 | Toward Understanding the Impact of Staleness in Distributed Machine Learning |
| 362 | 2 | Feature-Wise Bias Amplification |
| 363 | 2 | Transferring Knowledge across Learning Processes |
| 364 | 2 | Deep, Skinny Neural Networks are not Universal Approximators |
| 365 | 2 | Interpolation-Prediction Networks for Irregularly Sampled Time Series |
| 366 | 2 | Learning Representations of Sets through Optimized Permutations |
| 367 | 2 | Variational Bayesian Phylogenetic Inference |
| 368 | 2 | BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning |
| 369 | 2 | Learning to Screen for Fast Softmax Inference on Large Vocabulary Neural Networks |
| 370 | 2 | Posterior Attention Models for Sequence to Sequence Learning |
| 371 | 2 | Learning Implicitly Recurrent CNNs Through Parameter Sharing |
| 372 | 2 | A Generative Model For Electron Paths |
| 373 | 2 | RNNs implicitly implement tensor-product representations |
| 374 | 2 | h-detach: Modifying the LSTM Gradient Towards Better Optimization |
| 375 | 2 | Adaptive Estimators Show Information Compression in Deep Neural Networks |
| 376 | 2 | Learning sparse relational transition models |
| 377 | 2 | From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference |
| 378 | 2 | Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion |
| 379 | 2 | Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm |
| 380 | 2 | Deep Layers as Stochastic Solvers |
| 381 | 2 | Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking |
| 382 | 2 | An Empirical study of Binary Neural Networks’ Optimisation |
| 383 | 2 | Deep Frank-Wolfe For Neural Network Optimization |
| 384 | 2 | G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space |
| 385 | 2 | Contingency-Aware Exploration in Reinforcement Learning |
| 386 | 2 | Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization |
| 387 | 2 | The Laplacian in RL: Learning Representations with Efficient Approximations |
| 388 | 2 | Execution-Guided Neural Program Synthesis |
| 389 | 2 | Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications |
| 390 | 2 | Modeling the Long Term Future in Model-Based Reinforcement Learning |
| 391 | 2 | Environment Probing Interaction Policies |
| 392 | 2 | Neural Program Repair by Jointly Learning to Localize and Repair |
| 393 | 2 | Distributional Concavity Regularization for GANs |
| 394 | 2 | Dynamic Sparse Graph for Efficient Deep Learning |
| 395 | 2 | A Statistical Approach to Assessing Neural Network Robustness |
| 396 | 2 | Multi-Domain Adversarial Learning |
| 397 | 2 | Conditional Network Embeddings |
| 398 | 2 | signSGD via Zeroth-Order Oracle |
| 399 | 2 | Robust Conditional Generative Adversarial Networks |
| 400 | 2 | Deep Learning 3D Shapes Using Alt-az Anisotropic 2-Sphere Convolution |
| 401 | 2 | AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking |
| 402 | 2 | Latent Convolutional Models |
| 403 | 2 | LeMoNADe: Learned Motif and Neuronal Assembly Detection in calcium imaging videos |
| 404 | 2 | STCN: Stochastic Temporal Convolutional Networks |
| 405 | 2 | Improving MMD-GAN Training with Repulsive Loss Function |
| 406 | 1 | Wasserstein Barycenter Model Ensembling |
| 407 | 1 | Learning Grid Cells as Vector Representation of Self-Position Coupled with Matrix Representation of Self-Motion |
| 408 | 1 | Generating Liquid Simulations with Deformation-aware Neural Networks |
| 409 | 1 | Efficient Augmentation via Data Subsampling |
| 410 | 1 | Generative predecessor models for sample-efficient imitation learning |
| 411 | 1 | Auxiliary Variational MCMC |
| 412 | 1 | Variational Autoencoders with Jointly Optimized Latent Dependency Structure |
| 413 | 1 | Function Space Particle Optimization for Bayesian Neural Networks |
| 414 | 1 | Marginalized Average Attentional Network for Weakly-Supervised Learning |
| 415 | 1 | Neural TTS Stylization with Adversarial and Collaborative Games |
| 416 | 1 | Representation Degeneration Problem in Training Natural Language Generation Models |
| 417 | 1 | Transfer Learning for Sequences via Learning to Collocate |
| 418 | 1 | Understanding Composition of Word Embeddings via Tensor Decomposition |
| 419 | 1 | Complement Objective Training |
| 420 | 1 | DOM-Q-NET: Grounded RL on Structured Language |
| 421 | 1 | Generalized Tensor Models for Recurrent Neural Networks |
| 422 | 1 | textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior |
| 423 | 1 | Kernel RNN Learning (KeRNL) |
| 424 | 1 | Minimum Divergence vs. Maximum Margin: an Empirical Comparison on Seq2Seq Models |
| 425 | 1 | Analysis of Quantized Models |
| 426 | 1 | A Kernel Random Matrix-Based Approach for Sparse PCA |
| 427 | 1 | DeepOBS: A Deep Learning Optimizer Benchmark Suite |
| 428 | 1 | Learning concise representations for regression by evolving networks of trees |
| 429 | 1 | Initialized Equilibrium Propagation for Backprop-Free Training |
| 430 | 1 | Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters |
| 431 | 1 | Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions |
| 432 | 1 | On Random Deep Weight-Tied Autoencoders: Exact Asymptotic Analysis, Phase Transitions, and Implications to Training |
| 433 | 1 | Overcoming Catastrophic Forgetting for Continual Learning via Model Adaptation |
| 434 | 1 | Preferences Implicit in the State of the World |
| 435 | 1 | Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies |
| 436 | 1 | Solving the Rubik’s Cube with Approximate Policy Iteration |
| 437 | 1 | Towards Metamerism via Foveated Style Transfer |
| 438 | 1 | Learning to Navigate the Web |
| 439 | 1 | Knowledge Flow: Improve Upon Your Teachers |
| 440 | 1 | Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures |
| 441 | 1 | Cost-Sensitive Robustness against Adversarial Examples |
| 442 | 1 | MisGAN: Learning from Incomplete Data with Generative Adversarial Networks |
| 443 | 1 | Don’t let your Discriminator be fooled |
| 444 | 1 | Learning to Remember More with Less Memorization |
| 445 | 1 | Boosting Robustness Certification of Neural Networks |
| 446 | 1 | Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator |
| 447 | 1 | RotDCF: Decomposition of Convolutional Filters for Rotation-Equivariant Deep Networks |
| 448 | 1 | ProbGAN: Towards Probabilistic GAN with Theoretical Guarantees |
| 449 | 1 | K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning |
| 450 | 1 | Neural network gradient-based learning of black-box function interfaces |
| 451 | 1 | Adversarial Domain Adaptation for Stable Brain-Machine Interfaces |
| 452 | 1 | Unsupervised Adversarial Image Reconstruction |
| 453 | 0 | Unsupervised Learning of the Set of Local Maxima |
| 454 | 0 | Feed-forward Propagation in Probabilistic Neural Networks with Categorical and Max Layers |
| 455 | 0 | Maximal Divergence Sequential Autoencoder for Binary Software Vulnerability Detection |
| 456 | 0 | Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure |
| 457 | 0 | Information Theoretic lower bounds on negative log likelihood |
| 458 | 0 | Learning from Positive and Unlabeled Data with a Selection Bias |
| 459 | 0 | Integer Networks for Data Compression with Latent-Variable Models |
| 460 | 0 | Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control |
| 461 | 0 | A Max-Affine Spline Perspective of Recurrent Neural Networks |
| 462 | 0 | Discovery of Natural Language Concepts in Individual Units of CNNs |
| 463 | 0 | Learning Recurrent Binary/Ternary Weights |
| 464 | 0 | Large-Scale Answerer in Questioner’s Mind for Visual Dialog Question Generation |
| 465 | 0 | Variational Smoothing in Recurrent Neural Network Language Models |
| 466 | 0 | Top-Down Neural Model For Formulae |
| 467 | 0 | Representing Formal Languages: A Comparison Between Finite Automata and Recurrent Neural Networks |
| 468 | 0 | CBOW Is Not All You Need: Combining CBOW with the Compositional Matrix Space Model |
| 469 | 0 | Sparse Dictionary Learning by Dynamical Neural Networks |
| 470 | 0 | Learning Embeddings into Entropic Wasserstein Spaces |
| 471 | 0 | Max-MIG: an Information Theoretic Approach for Joint Learning from Crowds |
| 472 | 0 | Preconditioner on Matrix Lie Group for SGD |
| 473 | 0 | NOODL: Provable Online Dictionary Learning and Sparse Coding |
| 474 | 0 | The Comparative Power of ReLU Networks and Polynomial Kernels in the Presence of Sparse Latent Structure |
| 475 | 0 | Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy |
| 476 | 0 | A Closer Look at Few-shot Classification |
| 477 | 0 | NADPEx: An on-policy temporally consistent exploration method for deep reinforcement learning |
| 478 | 0 | Composing Complex Skills by Learning Transition Policies |
| 479 | 0 | Supervised Policy Update for Deep Reinforcement Learning |
| 480 | 0 | Synthetic Datasets for Neural Program Synthesis |
| 481 | 0 | A new dog learns old tricks: RL finds classic optimization algorithms |
| 482 | 0 | Neural Graph Evolution: Automatic Robot Design |
| 483 | 0 | Competitive experience replay |
| 484 | 0 | Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards |
| 485 | 0 | Learning Latent Superstructures in Variational Autoencoders for Deep Multidimensional Clustering |
| 486 | 0 | On Computation and Generalization of Generative Adversarial Networks under Spectrum Control |
| 487 | 0 | Adversarial Reprogramming of Neural Networks |
| 488 | 0 | INVASE: Instance-wise Variable Selection using Neural Networks |
| 489 | 0 | GO Gradient for Expectation-Based Objectives |
| 490 | 0 | Revealing interpretable object representations from human behavior |
| 491 | 0 | Learning what you can do before doing anything |
| 492 | 0 | A Data-Driven and Distributed Approach to Sparse Signal Representation and Recovery |
| 493 | 0 | Convolutional Neural Networks on Non-uniform Geometrical Signals Using Euclidean Spectral Transformation |
| 494 | 0 | Equi-normalization of Neural Networks |
| 495 | 0 | Robust Estimation via Generative Adversarial Networks |
| 496 | 0 | Unsupervised Discovery of Parts, Structure, and Dynamics |
| 497 | 0 | Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation |
| 498 | 0 | Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images |
| 499 | 0 | Overcoming the Disentanglement vs Reconstruction Trade-off via Jacobian Supervision |
| 500 | 0 | Visual Reasoning by Progressive Module Networks |