| Rank | Cited by | Paper name |
|---:|---:|---|
| 0 | 210 | Glow: Generative Flow with Invertible 1×1 Convolutions |
| 1 | 186 | Are GANs Created Equal? A Large-Scale Study |
| 2 | 180 | Neural Ordinary Differential Equations |
| 3 | 176 | Visualizing the Loss Landscape of Neural Nets |
| 4 | 123 | How Does Batch Normalization Help Optimization? |
| 5 | 114 | Isolating Sources of Disentanglement in Variational Autoencoders |
| 6 | 110 | Video-to-Video Synthesis |
| 7 | 98 | Natasha 2: Faster Non-Convex Optimization Than SGD |
| 8 | 95 | PointCNN: Convolution On X-Transformed Points |
| 9 | 93 | Adversarially Robust Generalization Requires More Data |
| 10 | 84 | Realistic Evaluation of Deep Semi-Supervised Learning Algorithms |
| 11 | 81 | Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data |
| 12 | 77 | Scaling provable adversarial defenses |
| 13 | 76 | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis |
| 14 | 74 | Derivative Estimation in Random Design |
| 15 | 73 | Neural Tangent Kernel: Convergence and Generalization in Neural Networks |
| 16 | 70 | An intriguing failing of convolutional neural networks and the CoordConv solution |
| 17 | 70 | Neural Architecture Optimization |
| 18 | 70 | Data-Efficient Hierarchical Reinforcement Learning |
| 19 | 69 | Sanity Checks for Saliency Maps |
| 20 | 68 | Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents |
| 21 | 67 | Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models |
| 22 | 65 | Neural Architecture Search with Bayesian Optimisation and Optimal Transport |
| 23 | 62 | TADAM: Task dependent adaptive metric for improved few-shot learning |
| 24 | 58 | Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels |
| 25 | 57 | Probabilistic Model-Agnostic Meta-Learning |
| 26 | 57 | On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport |
| 27 | 56 | Searching for Efficient Multi-Scale Architectures for Dense Image Prediction |
| 28 | 56 | SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator |
| 29 | 55 | Playing hard exploration games by watching YouTube |
| 30 | 55 | Recurrent World Models Facilitate Policy Evolution |
| 31 | 55 | Conditional Adversarial Domain Adaptation |
| 32 | 54 | CatBoost: unbiased boosting with categorical features |
| 33 | 54 | Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation |
| 34 | 53 | Co-teaching: Robust training of deep neural networks with extremely noisy labels |
| 35 | 52 | Neural Voice Cloning with a Few Samples |
| 36 | 52 | Adversarial vulnerability for any classifier |
| 37 | 51 | Hierarchical Graph Representation Learning with Differentiable Pooling |
| 38 | 50 | Gradient Sparsification for Communication-Efficient Distributed Optimization |
| 39 | 49 | Stochastic Cubic Regularization for Fast Nonconvex Optimization |
| 40 | 48 | Bilinear Attention Networks |
| 41 | 47 | SNIPER: Efficient Multi-Scale Training |
| 42 | 46 | NEON2: Finding Local Minima via First-Order Oracles |
| 43 | 44 | First-order Stochastic Algorithms for Escaping From Saddle Points in Almost Linear Time |
| 44 | 43 | DropBlock: A regularization method for convolutional networks |
| 45 | 43 | Visual Reinforcement Learning with Imagined Goals |
| 46 | 42 | Empirical Risk Minimization Under Fairness Constraints |
| 47 | 41 | Link Prediction Based on Graph Neural Networks |
| 48 | 41 | Learning to Navigate in Cities Without a Map |
| 49 | 41 | PacGAN: The power of two samples in generative adversarial networks |
| 50 | 41 | Gradient Descent for Spiking Neural Networks |
| 51 | 40 | Implicit Bias of Gradient Descent on Linear Convolutional Networks |
| 52 | 40 | Learning to Infer Graphics Programs from Hand-Drawn Images |
| 53 | 40 | Is Q-Learning Provably Efficient? |
| 54 | 38 | Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks |
| 55 | 38 | Meta-Reinforcement Learning of Structured Exploration Strategies |
| 56 | 37 | A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks |
| 57 | 37 | Pelee: A Real-Time Object Detection System on Mobile Devices |
| 58 | 36 | Understanding Batch Normalization |
| 59 | 36 | Unsupervised Text Style Transfer using Language Models as Discriminators |
| 60 | 36 | DeepProbLog: Neural Probabilistic Logic Programming |
| 61 | 36 | Recurrent Relational Networks |
| 62 | 35 | Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures |
| 63 | 35 | Predictive Uncertainty Estimation via Prior Networks |
| 64 | 35 | Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks |
| 65 | 35 | Why Is My Classifier Discriminatory? |
| 66 | 35 | Non-Local Recurrent Network for Image Restoration |
| 67 | 35 | Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding |
| 68 | 34 | Generalisation in humans and deep neural networks |
| 69 | 34 | Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator |
| 70 | 34 | Discrimination-aware Channel Pruning for Deep Neural Networks |
| 71 | 34 | Long short-term memory and Learning-to-learn in networks of spiking neurons |
| 72 | 34 | Implicit Reparameterization Gradients |
| 73 | 33 | Joint Autoregressive and Hierarchical Priors for Learned Image Compression |
| 74 | 33 | Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise |
| 75 | 33 | Efficient Neural Network Robustness Certification with General Activation Functions |
| 76 | 33 | Multi-Task Learning as Multi-Objective Optimization |
| 77 | 32 | Constrained Graph Variational Autoencoders for Molecule Design |
| 78 | 32 | A Probabilistic U-Net for Segmentation of Ambiguous Images |
| 79 | 32 | Assessing Generative Models via Precision and Recall |
| 80 | 31 | Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs |
| 81 | 31 | Randomized Prior Functions for Deep Reinforcement Learning |
| 82 | 31 | LF-Net: Learning Local Features from Images |
| 83 | 31 | Adversarial Examples that Fool both Computer Vision and Time-Limited Humans |
| 84 | 31 | Meta-Gradient Reinforcement Learning |
| 85 | 31 | Image-to-image translation for cross-domain disentanglement |
| 86 | 31 | Large Margin Deep Networks for Classification |
| 87 | 30 | Semidefinite relaxations for certifying robustness to adversarial examples |
| 88 | 30 | Reinforcement Learning for Solving the Vehicle Routing Problem |
| 89 | 30 | Evolved Policy Gradients |
| 90 | 30 | Byzantine Stochastic Gradient Descent |
| 91 | 30 | Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization |
| 92 | 29 | A Unified View of Piecewise Linear Neural Network Verification |
| 93 | 29 | Sparsified SGD with Memory |
| 94 | 29 | Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization |
| 95 | 29 | Tree-to-tree Neural Networks for Program Translation |
| 96 | 28 | Unsupervised Attention-guided Image-to-Image Translation |
| 97 | 28 | Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning |
| 98 | 27 | 3D Steerable CNNs: Learning Rotationally Equivariant Features in Volumetric Data |
| 99 | 27 | The challenge of realistic music generation: modelling raw audio at scale |
| 100 | 27 | Speaker-Follower Models for Vision-and-Language Navigation |
| 101 | 27 | Entropy and mutual information in models of deep neural networks |
| 102 | 27 | FRAGE: Frequency-Agnostic Word Representation |
| 103 | 26 | Fast and Effective Robustness Certification |
| 104 | 26 | Flexible neural representation for physics prediction |
| 105 | 26 | Does mitigating ML’s impact disparity require treatment disparity? |
| 106 | 26 | Verifiable Reinforcement Learning via Policy Extraction |
| 107 | 25 | Balanced Policy Evaluation and Learning |
| 108 | 25 | Reinforcement Learning of Theorem Proving |
| 109 | 25 | Learning Plannable Representations with Causal InfoGAN |
| 110 | 25 | A Lyapunov-based Approach to Safe Reinforcement Learning |
| 111 | 25 | Neural Arithmetic Logic Units |
| 112 | 25 | Training Deep Neural Networks with 8-bit Floating Point Numbers |
| 113 | 25 | Relational recurrent neural networks |
| 114 | 25 | ResNet with one-neuron hidden layers is a Universal Approximator |
| 115 | 25 | Optimal Algorithms for Non-Smooth Distributed Optimization in Networks |
| 116 | 25 | How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD |
| 117 | 25 | Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients? |
| 118 | 25 | Learning to Decompose and Disentangle Representations for Video Prediction |
| 119 | 24 | Deep State Space Models for Time Series Forecasting |
| 120 | 24 | Towards Robust Interpretability with Self-Explaining Neural Networks |
| 121 | 24 | Learning Attentional Communication for Multi-Agent Cooperation |
| 122 | 24 | The Convergence of Sparsified Gradient Methods |
| 123 | 24 | Task-Driven Convolutional Recurrent Models of the Visual System |
| 124 | 24 | SimplE Embedding for Link Prediction in Knowledge Graphs |
| 125 | 24 | How to Start Training: The Effect of Initialization and Architecture |
| 126 | 24 | IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis |
| 127 | 23 | Bayesian Model-Agnostic Meta-Learning |
| 128 | 23 | Memory Replay GANs: Learning to Generate New Categories without Forgetting |
| 129 | 23 | LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning |
| 130 | 23 | Hessian-based Analysis of Large Batch Training and Robustness to Adversaries |
| 131 | 23 | Online Learning with an Unknown Fairness Metric |
| 132 | 23 | Neural Nearest Neighbors Networks |
| 133 | 22 | ATOMO: Communication-efficient Learning via Atomic Sparsification |
| 134 | 22 | GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration |
| 135 | 22 | End-to-End Differentiable Physics for Learning and Control |
| 136 | 22 | Efficient Formal Safety Analysis of Neural Networks |
| 137 | 22 | Probabilistic Matrix Factorization for Automated Machine Learning |
| 138 | 22 | Re-evaluating evaluation |
| 139 | 22 | Delta-encoder: an effective sample synthesis method for few-shot object recognition |
| 140 | 22 | GIANT: Globally Improved Approximate Newton Method for Distributed Optimization |
| 141 | 22 | Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate |
| 142 | 22 | Neighbourhood Consensus Networks |
| 143 | 22 | Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search |
| 144 | 21 | Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks |
| 145 | 21 | Insights on representational similarity in neural networks with canonical correlation |
| 146 | 21 | Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation |
| 147 | 21 | Direct Runge-Kutta Discretization Achieves Acceleration |
| 148 | 21 | SLAYER: Spike Layer Error Reassignment in Time |
| 149 | 20 | Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization |
| 150 | 20 | Phase Retrieval Under a Generative Prior |
| 151 | 20 | Differentiable MPC for End-to-end Planning and Control |
| 152 | 20 | On gradient regularizers for MMD GANs |
| 153 | 20 | To Trust Or Not To Trust A Classifier |
| 154 | 20 | Fairness Through Computationally-Bounded Awareness |
| 155 | 20 | Learning to Optimize Tensor Programs |
| 156 | 20 | Evidential Deep Learning to Quantify Classification Uncertainty |
| 157 | 20 | Moonshine: Distilling with Cheap Convolutions |
| 158 | 20 | A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem |
| 159 | 20 | Deep Attentive Tracking via Reciprocative Learning |
| 160 | 20 | A^2-Nets: Double Attention Networks |
| 161 | 19 | Clebsch–Gordan Nets: a Fully Fourier Space Spherical Convolutional Neural Network |
| 162 | 19 | Latent Alignment and Variational Attention |
| 163 | 19 | Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects |
| 164 | 19 | Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion |
| 165 | 19 | Reward learning from human preferences and demonstrations in Atari |
| 166 | 19 | Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces |
| 167 | 19 | Banach Wasserstein GAN |
| 168 | 19 | Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching |
| 169 | 19 | Amortized Inference Regularization |
| 170 | 19 | MetaGAN: An Adversarial Approach to Few-Shot Learning |
| 171 | 19 | Reinforced Continual Learning |
| 172 | 19 | Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives |
| 173 | 18 | Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds |
| 174 | 18 | Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization |
| 175 | 18 | Constructing Unrestricted Adversarial Examples with Generative Models |
| 176 | 18 | Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks |
| 177 | 18 | Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo |
| 178 | 18 | Spectral Filtering for General Linear Dynamical Systems |
| 179 | 18 | Adaptive Sampling Towards Fast Graph Representation Learning |
| 180 | 18 | On the Dimensionality of Word Embedding |
| 181 | 18 | Are ResNets Provably Better than Linear Predictors? |
| 182 | 18 | Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced |
| 183 | 17 | Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies |
| 184 | 17 | Adaptive Methods for Nonconvex Optimization |
| 185 | 17 | Learning to Play With Intrinsically-Motivated, Self-Aware Agents |
| 186 | 17 | Communication Compression for Decentralized Training |
| 187 | 17 | Masking: A New Perspective of Noisy Supervision |
| 188 | 17 | Hyperbolic Neural Networks |
| 189 | 17 | Faster Neural Networks Straight from JPEG |
| 190 | 17 | Online Structured Laplace Approximations for Overcoming Catastrophic Forgetting |
| 191 | 17 | Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization |
| 192 | 17 | Actor-Critic Policy Optimization in Partially Observable Multiagent Environments |
| 193 | 17 | The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal |
| 194 | 17 | Norm matters: efficient and accurate normalization schemes in deep networks |
| 195 | 17 | Generalized Zero-Shot Learning with Deep Calibration Network |
| 196 | 17 | FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification |
| 197 | 16 | Deep Generative Models with Learnable Knowledge Constraints |
| 198 | 16 | Deepcode: Feedback Codes via Deep Learning |
| 199 | 16 | The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization |
| 200 | 16 | Watch Your Step: Learning Node Embeddings via Graph Attention |
| 201 | 16 | FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network |
| 202 | 16 | Adversarial Multiple Source Domain Adaptation |
| 203 | 16 | Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments |
| 204 | 16 | Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples |
| 205 | 16 | Multi-Agent Generative Adversarial Imitation Learning |
| 206 | 16 | A Bayes-Sard Cubature Method |
| 207 | 16 | Generalizing to Unseen Domains via Adversarial Data Augmentation |
| 208 | 16 | On Learning Intrinsic Rewards for Policy Gradient Methods |
| 209 | 16 | Towards Robust Detection of Adversarial Examples |
| 210 | 16 | Adding One Neuron Can Eliminate All Bad Local Minima |
| 211 | 16 | Unsupervised Learning of Shape and Pose with Differentiable Point Clouds |
| 212 | 16 | Representation Balancing MDPs for Off-policy Policy Evaluation |
| 213 | 16 | A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation |
| 214 | 16 | A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication |
| 215 | 16 | Embedding Logical Queries on Knowledge Graphs |
| 216 | 16 | Learning Deep Disentangled Embeddings With the F-Statistic Loss |
| 217 | 15 | Mesh-TensorFlow: Deep Learning for Supercomputers |
| 218 | 15 | A Stein variational Newton method |
| 219 | 15 | Learning Conditioned Graph Structures for Interpretable Visual Question Answering |
| 220 | 15 | cpSGD: Communication-efficient and differentially-private distributed SGD |
| 221 | 15 | Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction |
| 222 | 15 | Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders |
| 223 | 15 | On the Convergence and Robustness of Training GANs with Regularized Optimal Transport |
| 224 | 15 | Multimodal Generative Models for Scalable Weakly-Supervised Learning |
| 225 | 15 | RetGK: Graph Kernels based on Return Probabilities of Random Walks |
| 226 | 15 | Multi-Layered Gradient Boosting Decision Trees |
| 227 | 15 | Domain-Invariant Projection Learning for Zero-Shot Recognition |
| 228 | 15 | Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis |
| 229 | 15 | MetaAnchor: Learning to Detect Objects with Customized Anchors |
| 230 | 15 | Visual Object Networks: Image Generation with Disentangled 3D Representations |
| 231 | 14 | Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters |
| 232 | 14 | Robust Learning of Fixed-Structure Bayesian Networks |
| 233 | 14 | Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions |
| 234 | 14 | Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing |
| 235 | 14 | Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias |
| 236 | 14 | Dendritic cortical microcircuits approximate the backpropagation algorithm |
| 237 | 14 | Spectral Signatures in Backdoor Attacks |
| 238 | 14 | VideoCapsuleNet: A Simplified Network for Action Detection |
| 239 | 14 | Simple, Distributed, and Accelerated Probabilistic Programming |
| 240 | 14 | Learning towards Minimum Hyperspherical Energy |
| 241 | 14 | On GANs and GMMs |
| 242 | 14 | Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model |
| 243 | 14 | Scalable methods for 8-bit training of neural networks |
| 244 | 14 | Adversarial Text Generation via Feature-Mover’s Distance |
| 245 | 14 | Importance Weighting and Variational Inference |
| 246 | 14 | Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering |
| 247 | 14 | Learning to Reconstruct Shapes from Unseen Classes |
| 248 | 14 | Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation |
| 249 | 14 | Where Do You Think You’re Going?: Inferring Beliefs about Dynamics from Behavior |
| 250 | 14 | Fairness Behind a Veil of Ignorance: A Welfare Analysis for Automated Decision Making |
| 251 | 14 | FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction |
| 252 | 13 | Co-regularized Alignment for Unsupervised Domain Adaptation |
| 253 | 13 | RenderNet: A deep convolutional network for differentiable rendering from 3D shapes |
| 254 | 13 | Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization |
| 255 | 13 | Distributed Multi-Player Bandits – a Game of Thrones Approach |
| 256 | 13 | Learning to Teach with Dynamic Loss Functions |
| 257 | 13 | Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance |
| 258 | 13 | Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization |
| 259 | 13 | Minimax Statistical Learning with Wasserstein distances |
| 260 | 13 | Non-monotone Submodular Maximization in Exponentially Fewer Iterations |
| 261 | 13 | Evolution-Guided Policy Gradient in Reinforcement Learning |
| 262 | 13 | Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited |
| 263 | 13 | Self-Erasing Network for Integral Object Attention |
| 264 | 13 | PAC-learning in the presence of adversaries |
| 265 | 12 | Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound |
| 266 | 12 | Neural Proximal Gradient Descent for Compressive Imaging |
| 267 | 12 | Confounding-Robust Policy Improvement |
| 268 | 12 | Reducing Network Agnostophobia |
| 269 | 12 | DeepPINK: reproducible feature selection in deep neural networks |
| 270 | 12 | Data-dependent PAC-Bayes priors via differential privacy |
| 271 | 12 | Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding |
| 272 | 12 | Knowledge Distillation by On-the-Fly Native Ensemble |
| 273 | 12 | Dual Policy Iteration |
| 274 | 12 | Differentially Private Testing of Identity and Closeness of Discrete Distributions |
| 275 | 12 | On Fast Leverage Score Sampling and Optimal Learning |
| 276 | 12 | How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery? |
| 277 | 12 | A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization |
| 278 | 12 | COLA: Decentralized Linear Learning |
| 279 | 12 | A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization |
| 280 | 12 | DVAE#: Discrete Variational Autoencoders with Relaxed Boltzmann Priors |
| 281 | 12 | Simple random search of static linear policies is competitive for reinforcement learning |
| 282 | 12 | Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization |
| 283 | 12 | Overcoming Language Priors in Visual Question Answering with Adversarial Regularization |
| 284 | 12 | KDGAN: Knowledge Distillation with Generative Adversarial Networks |
| 285 | 12 | Do Less, Get More: Streaming Submodular Maximization with Subsampling |
| 286 | 12 | Learning Disentangled Joint Continuous and Discrete Representations |
| 287 | 12 | Dialog-based Interactive Image Retrieval |
| 288 | 11 | Context-aware Synthesis and Placement of Object Instances |
| 289 | 11 | Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution |
| 290 | 11 | Learning with SGD and Random Features |
| 291 | 11 | A Retrieve-and-Edit Framework for Predicting Structured Outputs |
| 292 | 11 | Deep Dynamical Modeling and Control of Unsteady Fluid Flows |
| 293 | 11 | Group Equivariant Capsule Networks |
| 294 | 11 | Adversarial Regularizers in Inverse Problems |
| 295 | 11 | Depth-Limited Solving for Imperfect-Information Games |
| 296 | 11 | Online Adaptive Methods, Universality and Acceleration |
| 297 | 11 | Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences |
| 298 | 11 | Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming |
| 299 | 11 | Unsupervised Video Object Segmentation for Deep Reinforcement Learning |
| 300 | 11 | Fast Greedy MAP Inference for Determinantal Point Process to Improve Recommendation Diversity |
| 301 | 11 | Can We Gain More from Orthogonality Regularizations in Training Deep Networks? |
| 302 | 11 | Unsupervised Learning of Object Landmarks through Conditional Image Generation |
| 303 | 11 | Data center cooling using model-predictive control |
| 304 | 11 | Adversarial Attacks on Stochastic Bandits |
| 305 | 11 | Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals |
| 306 | 11 | Deep Reinforcement Learning of Marked Temporal Point Processes |
| 307 | 11 | One-Shot Unsupervised Cross Domain Translation |
| 308 | 11 | Distilled Wasserstein Learning for Word Embedding and Topic Modeling |
| 309 | 11 | Deep Defense: Training DNNs with Improved Adversarial Robustness |
| 310 | 11 | Sparse DNNs with Improved Adversarial Robustness |
| 311 | 11 | Learning long-range spatial dependencies with horizontal gated recurrent units |
| 312 | 10 | The Price of Fair PCA: One Extra dimension |
| 313 | 10 | Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions |
| 314 | 10 | Human-in-the-Loop Interpretability Prior |
| 315 | 10 | Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation |
| 316 | 10 | Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport |
| 317 | 10 | Deep Anomaly Detection Using Geometric Transformations |
| 318 | 10 | Hardware Conditioned Policies for Multi-Robot Transfer Learning |
| 319 | 10 | Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes |
| 320 | 10 | Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation |
| 321 | 10 | Plug-in Estimation in High-Dimensional Linear Inverse Problems: A Rigorous Analysis |
| 322 | 10 | Chaining Mutual Information and Tightening Generalization Bounds |
| 323 | 10 | Causal Inference with Noisy and Missing Covariates via Matrix Factorization |
| 324 | 10 | Scalable Hyperparameter Transfer Learning |
| 325 | 10 | Generative Probabilistic Novelty Detection with Adversarial Autoencoders |
| 326 | 10 | BRITS: Bidirectional Recurrent Imputation for Time Series |
| 327 | 10 | Compact Generalized Non-local Network |
| 328 | 10 | Recurrent Transformer Networks for Semantic Correspondence |
| 329 | 10 | A Dual Framework for Low-rank Tensor Completion |
| 330 | 10 | Policy Optimization via Importance Sampling |
| 331 | 10 | Boolean Decision Rules via Column Generation |
| 332 | 10 | MiME: Multilevel Medical Embedding of Electronic Health Records for Predictive Healthcare |
| 333 | 10 | Leveraging the Exact Likelihood of Deep Latent Variable Models |
| 334 | 10 | The committee machine: Computational to statistical gaps in learning a two-layers neural network |
| 335 | 10 | Paraphrasing Complex Network: Network Compression via Factor Transfer |
| 336 | 10 | Learning Hierarchical Semantic Image Manipulation through Structured Representations |
| 337 | 10 | Leveraged volume sampling for linear regression |
| 338 | 10 | 3D-Aware Scene Manipulation via Inverse Graphics |
| 339 | 10 | On Oracle-Efficient PAC RL with Rich Observations |
| 340 | 10 | Adaptive Online Learning in Dynamic Environments |
| 341 | 10 | Posterior Concentration for Sparse Deep Learning |
| 342 | 10 | Deep Neural Nets with Interpolating Function as Output Activation |
| 343 | 10 | Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning |
| 344 | 9 | Learning to Share and Hide Intentions using Information Regularization |
| 345 | 9 | Deep Predictive Coding Network with Local Recurrent Processing for Object Recognition |
| 346 | 9 | Reversible Recurrent Neural Networks |
| 347 | 9 | Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition |
| 348 | 9 | Transfer Learning with Neural AutoML |
| 349 | 9 | Inference in Deep Gaussian Processes using Stochastic Gradient Hamiltonian Monte Carlo |
| 350 | 9 | Training Neural Networks Using Features Replay |
| 351 | 9 | On Coresets for Logistic Regression |
| 352 | 9 | Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation |
| 353 | 9 | Deep Generative Models for Distribution-Preserving Lossy Compression |
| 354 | 9 | Learning Task Specifications from Demonstrations |
| 355 | 9 | Deep Generative Markov State Models |
| 356 | 9 | TopRank: A practical algorithm for online stochastic ranking |
| 357 | 9 | Escaping Saddle Points in Constrained Optimization |
| 358 | 9 | Zeroth-order (Non)-Convex Stochastic Optimization via Conditional Gradient and Gradient Updates |
| 359 | 9 | Nearly tight sample complexity bounds for learning mixtures of Gaussians via sample compression schemes |
| 360 | 9 | Multivariate Time Series Imputation with Generative Adversarial Networks |
| 361 | 9 | Toddler-Inspired Visual Object Learning |
| 362 | 9 | Image Inpainting via Generative Multi-column Convolutional Neural Networks |
| 363 | 8 | Learning Temporal Point Processes via Reinforcement Learning |
| 364 | 8 | With Friends Like These, Who Needs Adversaries? |
| 365 | 8 | Analysis of Krylov Subspace Solutions of Regularized Non-Convex Quadratic Problems |
| 366 | 8 | Learning Abstract Options |
| 367 | 8 | Improving Simple Models with Confidence Profiles |
| 368 | 8 | Robustness of conditional GANs to noisy labels |
| 369 | 8 | Blockwise Parallel Decoding for Deep Autoregressive Models |
| 370 | 8 | Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams |
| 371 | 8 | Maximizing acquisition functions for Bayesian optimization |
| 372 | 8 | Global Non-convex Optimization with Discretized Diffusions |
| 373 | 8 | Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation |
| 374 | 8 | Beyond Grids: Learning Graph Representations for Visual Recognition |
| 375 | 8 | Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data |
| 376 | 8 | Efficient High Dimensional Bayesian Optimization with Additivity and Quadrature Fourier Features |
| 377 | 8 | Online Learning of Quantum States |
| 378 | 8 | Automatic differentiation in ML: Where we are and where we should be going |
| 379 | 8 | Generalisation of structural knowledge in the hippocampal-entorhinal system |
| 380 | 8 | Hamiltonian Variational Auto-Encoder |
| 381 | 8 | Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training |
| 382 | 8 | Approximate Knowledge Compilation by Online Collapsed Importance Sampling |
| 383 | 8 | Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo |
| 384 | 8 | Distributed k-Clustering for Data with Heavy Noise |
| 385 | 8 | Learning Libraries of Subroutines for Neurally–Guided Bayesian Program Induction |
| 386 | 8 | Learning Loop Invariants for Program Verification |
| 387 | 8 | Towards Text Generation with Adversarially Learned Neural Outlines |
| 388 | 8 | Out-of-Distribution Detection using Multiple Semantic Label Representations |
| 389 | 8 | Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks |
| 390 | 8 | M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search |
| 391 | 8 | Incorporating Context into Language Encoding Models for fMRI |
| 392 | 8 | Approximating Real-Time Recurrent Learning with Random Kronecker Factors |
| 393 | 8 | Turbo Learning for CaptionBot and DrawingBot |
| 394 | 8 | L4: Practical loss-based stepsize adaptation for deep learning |
| 395 | 8 | Online convex optimization for cumulative constraints |
| 396 | 8 | Stacked Semantics-Guided Attention Model for Fine-Grained Zero-Shot Learning |
| 397 | 8 | CapProNet: Deep Feature Learning via Orthogonal Projections onto Capsule Subspaces |
| 398 | 8 | Content preserving text generation with attribute controls |
| 399 | 8 | On the Local Minima of the Empirical Risk |
| 400 | 8 | End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems |
| 401 | 8 | Mean-field theory of graph neural networks in graph partitioning |
| 402 | 8 | Differentially Private Uniformly Most Powerful Tests for Binomial Data |
| 403 | 8 | Heterogeneous Bitwidth Binarization in Convolutional Neural Networks |
| 404 | 8 | Acceleration through Optimistic No-Regret Dynamics |
| 405 | 8 | Bayesian Inference of Temporal Task Specifications from Demonstrations |
| 406 | 8 | BinGAN: Learning Compact Binary Descriptors with a Regularized GAN |
| 407 | 8 | Neural Code Comprehension: A Learnable Representation of Code Semantics |
| 408 | 8 | Inequity aversion improves cooperation in intertemporal social dilemmas |
| 409 | 8 | Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks |
| 410 | 8 | Local Differential Privacy for Evolving Data |
| 411 | 8 | Attention in Convolutional LSTM for Gesture Recognition |
| 412 | 8 | Symbolic Graph Reasoning Meets Convolutions |
| 413 | 8 | Collaborative Learning for Deep Neural Networks |
| 414 | 8 | Understanding the Role of Adaptivity in Machine Teaching: The Case of Version Space Learners |
| 415 | 8 | Global Geometry of Multichannel Sparse Blind Deconvolution on the Sphere |
| 416 | 8 | MetaReg: Towards Domain Generalization using Meta-Regularization |
| 417 | 8 | Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks |
| 418 | 8 | LinkNet: Relational Embedding for Scene Graph |
| 419 | 8 | Nonlocal Neural Networks, Nonlocal Diffusion and Nonlocal Modeling |
| 420 | 8 | Deep Functional Dictionaries: Learning Consistent Semantic Structures on 3D Models from Functions |
| 421 | 8 | Self-Supervised Generation of Spatial Audio for 360° Video |
| 422 | 8 | See and Think: Disentangling Semantic Scene Completion |
| 423 | 8 | Geometrically Coupled Monte Carlo Sampling |
| 424 | 7 | Understanding Regularized Spectral Clustering via Graph Conductance |
| 425 | 7 | Connecting Optimization and Regularization Paths |
| 426 | 7 | Nonparametric Density Estimation under Adversarial Losses |
| 427 | 7 | Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming |
| 428 | 7 | Generalization Bounds for Uniformly Stable Algorithms |
| 429 | 7 | Towards Deep Conversational Recommendations |
| 430 | 7 | Ex ante coordination and collusion in zero-sum multi-player extensive-form games |
| 431 | 7 | Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization |
| 432 | 7 | Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis |
| 433 | 7 | DAGs with NO TEARS: Continuous Optimization for Structure Learning |
| 434 |
7 |
Quadrature-based features for kernel approximation |
| 435 |
7 |
Differential Privacy for Growing Databases |
| 436 |
7 |
HOUDINI: Lifelong Learning as Program Synthesis |
| 437 |
7 |
A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks |
| 438 |
7 |
How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective |
| 439 |
7 |
Robust Hypothesis Testing Using Wasserstein Uncertainty Sets |
| 440 |
7 |
Streaming Kernel PCA with Õ(√n) Random Features |
| 441 |
7 |
Learning Latent Subspaces in Variational Autoencoders |
| 442 |
7 |
Distributed Learning without Distress: Privacy-Preserving Empirical Risk Minimization |
| 443 |
7 |
Information Constraints on Auto-Encoding Variational Bayes |
| 444 |
7 |
Dual Swap Disentangling |
| 445 |
7 |
A Convex Duality Framework for GANs |
| 446 |
7 |
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions |
| 447 |
7 |
Neural Networks Trained to Solve Differential Equations Learn General Representations |
| 448 |
7 |
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning |
| 449 |
7 |
But How Does It Work in Theory? Linear SVM with Random Features |
| 450 |
7 |
Faithful Inversion of Generative Models for Effective Amortized Inference |
| 451 |
7 |
Weakly Supervised Dense Event Captioning in Videos |
| 452 |
7 |
Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes |
| 453 |
7 |
Wasserstein Variational Inference |
| 454 |
7 |
BourGAN: Generative Networks with Metric Embeddings |
| 455 |
7 |
The Description Length of Deep Learning models |
| 456 |
7 |
Trajectory Convolution for Action Recognition |
| 457 |
7 |
Distributed Stochastic Optimization via Adaptive SGD |
| 458 |
7 |
Bayesian Semi-supervised Learning with Graph Gaussian Processes |
| 459 |
7 |
Multi-Class Learning: From Theory to Algorithm |
| 460 |
7 |
Hybrid Knowledge Routed Modules for Large-scale Object Detection |
| 461 |
7 |
A Game-Theoretic Approach to Recommendation Systems with Strategic Content Providers |
| 462 |
7 |
Greedy Hash: Towards Fast Optimization for Accurate Hash Coding in CNN |
| 463 |
7 |
A Model for Learned Bloom Filters and Optimizing by Sandwiching |
| 464 |
7 |
How Many Samples are Needed to Estimate a Convolutional Neural Network? |
| 465 |
7 |
Doubly Robust Bayesian Inference for Non-Stationary Streaming Data with β-Divergences |
| 466 |
6 |
Forward Modeling for Partial Observation Strategy Games – A StarCraft Defogger |
| 467 |
6 |
The Sparse Manifold Transform |
| 468 |
6 |
Learning to Solve SMT Formulas |
| 469 |
6 |
Bayesian Nonparametric Spectral Estimation |
| 470 |
6 |
Thwarting Adversarial Examples: An L₀-Robust Sparse Fourier Transform |
| 471 |
6 |
Online Robust Policy Learning in the Presence of Unknown Adversaries |
| 472 |
6 |
Object-Oriented Dynamics Predictor |
| 473 |
6 |
Improving Explorability in Variational Inference with Annealed Variational Objectives |
| 474 |
6 |
Learning Compressed Transforms with Low Displacement Rank |
| 475 |
6 |
Orthogonally Decoupled Variational Gaussian Processes |
| 476 |
6 |
Wasserstein Distributionally Robust Kalman Filtering |
| 477 |
6 |
Teaching Inverse Reinforcement Learners via Features and Demonstrations |
| 478 |
6 |
Credit Assignment For Collective Multiagent RL With Global Rewards |
| 479 |
6 |
Learning to Repair Software Vulnerabilities with Generative Adversarial Networks |
| 480 |
6 |
Generative modeling for protein structures |
| 481 |
6 |
Disconnected Manifold Learning for Generative Adversarial Networks |
| 482 |
6 |
REFUEL: Exploring Sparse Features in Deep Reinforcement Learning for Fast Disease Diagnosis |
| 483 |
6 |
BRUNO: A Deep Recurrent Model for Exchangeable Data |
| 484 |
6 |
Manifold-tiling Localized Receptive Fields are Optimal in Similarity-preserving Neural Networks |
| 485 |
6 |
Bayesian Alignments of Warped Multi-Output Gaussian Processes |
| 486 |
6 |
Sharp Bounds for Generalized Uniformity Testing |
| 487 |
6 |
Constructing Fast Network through Deconstruction of Convolution |
| 488 |
6 |
Adversarially Robust Optimization with Gaussian Processes |
| 489 |
6 |
Bandit Learning in Concave N-Person Games |
| 490 |
6 |
Occam’s razor is insufficient to infer the preferences of irrational agents |
| 491 |
6 |
The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network |
| 492 |
6 |
Unsupervised Adversarial Invariance |
| 493 |
6 |
Densely Connected Attention Propagation for Reading Comprehension |
| 494 |
6 |
Training deep learning based denoisers without ground truth data |
| 495 |
6 |
NAIS-Net: Stable Deep Networks from Non-Autonomous Differential Equations |
| 496 |
6 |
Norm-Ranging LSH for Maximum Inner Product Search |
| 497 |
6 |
Learning a High Fidelity Pose Invariant Model for High-resolution Face Frontalization |
| 498 |
6 |
Answerer in Questioner’s Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog |
| 499 |
6 |
Model Agnostic Supervised Local Explanations |
| 500 |
6 |
Modular Networks: Learning to Decompose Neural Computation |
| 501 |
6 |
Structured Local Minima in Sparse Blind Deconvolution |
| 502 |
6 |
Smoothed analysis of the low-rank approach for smooth semidefinite programs |
| 503 |
6 |
Efficient Stochastic Gradient Hard Thresholding |
| 504 |
6 |
Random Feature Stein Discrepancies |
| 505 |
6 |
Variational Memory Encoder-Decoder |
| 506 |
6 |
On Misinformation Containment in Online Social Networks |
| 507 |
6 |
Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation |
| 508 |
6 |
Sigsoftmax: Reanalysis of the Softmax Bottleneck |
| 509 |
6 |
Supervised autoencoders: Improving generalization performance with unsupervised regularizers |
| 510 |
6 |
Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language |
| 511 |
6 |
Structure-Aware Convolutional Neural Networks |
| 512 |
6 |
Efficient Algorithms for Non-convex Isotonic Regression through Submodular Optimization |
| 513 |
5 |
Bias and Generalization in Deep Generative Models: An Empirical Study |
| 514 |
5 |
Benefits of over-parameterization with EM |
| 515 |
5 |
Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices |
| 516 |
5 |
Diversity-Driven Exploration Strategy for Deep Reinforcement Learning |
| 517 |
5 |
Gaussian Process Prior Variational Autoencoders |
| 518 |
5 |
Learning To Learn Around A Common Mean |
| 519 |
5 |
Low-Rank Tucker Decomposition of Large Tensors Using TensorSketch |
| 520 |
5 |
Blind Deconvolutional Phase Retrieval via Convex Programming |
| 521 |
5 |
Coupled Variational Bayes via Optimization Embedding |
| 522 |
5 |
Improving Online Algorithms via ML Predictions |
| 523 |
5 |
e-SNLI: Natural Language Inference with Natural Language Explanations |
| 524 |
5 |
Invariant Representations without Adversarial Training |
| 525 |
5 |
SING: Symbol-to-Instrument Neural Generator |
| 526 |
5 |
A Structured Prediction Approach for Label Ranking |
| 527 |
5 |
Uniform Convergence of Gradients for Non-Convex Learning and Optimization |
| 528 |
5 |
Almost Optimal Algorithms for Linear Stochastic Bandits with Heavy-Tailed Payoffs |
| 529 |
5 |
Deep Network for the Integrated 3D Sensing of Multiple People in Natural Images |
| 530 |
5 |
On preserving non-discrimination when combining expert advice |
| 531 |
5 |
Algorithms and Theory for Multiple-Source Adaptation |
| 532 |
5 |
Variational Bayesian Monte Carlo |
| 533 |
5 |
Adversarial Scene Editing: Automatic Object Removal from Weak Supervision |
| 534 |
5 |
Non-Adversarial Mapping with VAEs |
| 535 |
5 |
Stochastic Chebyshev Gradient Descent for Spectral Optimization |
| 536 |
5 |
Implicit Probabilistic Integrators for ODEs |
| 537 |
5 |
Provably Correct Automatic Sub-Differentiation for Qualified Programs |
| 538 |
5 |
Heterogeneous Multi-output Gaussian Process Prediction |
| 539 |
5 |
Contamination Attacks and Mitigation in Multi-Party Machine Learning |
| 540 |
5 |
Bayesian Distributed Stochastic Gradient Descent |
| 541 |
5 |
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling |
| 542 |
5 |
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models |
| 543 |
5 |
Dirichlet-based Gaussian Processes for Large-scale Calibrated Classification |
| 544 |
5 |
Automating Bayesian optimization with Bayesian optimization |
| 545 |
5 |
Exact natural gradient in deep linear networks and its application to the nonlinear case |
| 546 |
5 |
Binary Classification from Positive-Confidence Data |
| 547 |
5 |
Learning to Multitask |
| 548 |
5 |
Variational Inference with Tail-adaptive f-Divergence |
| 549 |
5 |
Learning Others’ Intentional Models in Multi-Agent Settings Using Interactive POMDPs |
| 550 |
5 |
Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing |
| 551 |
5 |
Estimating Learnability in the Sublinear Data Regime |
| 552 |
5 |
Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance |
| 553 |
5 |
Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning |
| 554 |
5 |
Supervising Unsupervised Learning |
| 555 |
5 |
The Physical Systems Behind Optimization Algorithms |
| 556 |
5 |
The Price of Privacy for Low-rank Factorization |
| 557 |
5 |
Distributed Weight Consolidation: A Brain Segmentation Case Study |
| 558 |
5 |
Learning sparse neural networks via sensitivity-driven regularization |
| 559 |
5 |
Lipschitz regularity of deep neural networks: analysis and efficient estimation |
| 560 |
5 |
A Bandit Approach to Sequential Experimental Design with False Discovery Control |
| 561 |
5 |
Optimal Subsampling with Influence Functions |
| 562 |
5 |
Modern Neural Networks Generalize on Small Data Sets |
| 563 |
5 |
Boosting Black Box Variational Inference |
| 564 |
5 |
Single-Agent Policy Tree Search With Guarantees |
| 565 |
5 |
Q-learning with Nearest Neighbors |
| 566 |
5 |
Near-Optimal Policies for Dynamic Multinomial Logit Assortment Selection Models |
| 567 |
5 |
Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base |
| 568 |
5 |
Mirrored Langevin Dynamics |
| 569 |
5 |
Computing Higher Order Derivatives of Matrix and Tensor Expressions |
| 570 |
5 |
Gaussian Process Conditional Density Estimation |
| 571 |
5 |
Sequential Context Encoding for Duplicate Removal |
| 572 |
5 |
Precision and Recall for Time Series |
| 573 |
5 |
Partially-Supervised Image Captioning |
| 574 |
5 |
Temporal Regularization for Markov Decision Process |
| 575 |
5 |
Neural Guided Constraint Logic Programming for Program Synthesis |
| 576 |
5 |
Learning Versatile Filters for Efficient Convolutional Neural Networks |
| 577 |
5 |
Found Graph Data and Planted Vertex Covers |
| 578 |
5 |
Generative Neural Machine Translation |
| 579 |
5 |
Revisiting Multi-Task Learning with ROCK: a Deep Residual Auxiliary Block for Visual Detection |
| 580 |
5 |
Unsupervised Learning of View-invariant Action Representations |
| 581 |
5 |
A flexible model for training action localization with varying levels of supervision |
| 582 |
5 |
Solving Large Sequential Games with the Excessive Gap Technique |
| 583 |
5 |
On Learning Markov Chains |
| 584 |
5 |
Rest-Katyusha: Exploiting the Solution’s Structure via Scheduled Restart Schemes |
| 585 |
5 |
Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC |
| 586 |
5 |
Chain of Reasoning for Visual Question Answering |
| 587 |
5 |
Snap ML: A Hierarchical Framework for Machine Learning |
| 588 |
5 |
Cooperative Holistic Scene Understanding: Unifying 3D Object, Layout, and Camera Pose Estimation |
| 589 |
4 |
GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking |
| 590 |
4 |
Smoothed Analysis of Discrete Tensor Decomposition and Assemblies of Neurons |
| 591 |
4 |
Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language |
| 592 |
4 |
Data-Driven Clustering via Parameterized Lloyd’s Families |
| 593 |
4 |
Complex Gated Recurrent Neural Networks |
| 594 |
4 |
Regret bounds for meta Bayesian optimization with an unknown Gaussian process prior |
| 595 |
4 |
Temporal alignment and latent Gaussian process factor inference in population spike trains |
| 596 |
4 |
PCA of high dimensional random walks with comparison to neural network training |
| 597 |
4 |
Using Large Ensembles of Control Variates for Variational Inference |
| 598 |
4 |
Non-delusional Q-learning and value-iteration |
| 599 |
4 |
Adaptive Skip Intervals: Temporal Abstraction for Recurrent Dynamical Models |
| 600 |
4 |
Entropy Rate Estimation for Markov Chains with Large State Space |
| 601 |
4 |
Invertibility of Convolutional Generative Networks from Partial Measurements |
| 602 |
4 |
Multi-objective Maximization of Monotone Submodular Functions with Cardinality Constraint |
| 603 |
4 |
Learning and Testing Causal Models with Interventions |
| 604 |
4 |
The Global Anchor Method for Quantifying Linguistic Shifts and Domain Adaptation |
| 605 |
4 |
Learning Attractor Dynamics for Generative Memory |
| 606 |
4 |
PAC-Bayes bounds for stable algorithms with instance-dependent priors |
| 607 |
4 |
Learning Safe Policies with Expert Guidance |
| 608 |
4 |
Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes |
| 609 |
4 |
Exploration in Structured Reinforcement Learning |
| 610 |
4 |
Data Amplification: A Unified and Competitive Approach to Property Estimation |
| 611 |
4 |
Contextual Stochastic Block Models |
| 612 |
4 |
Robust Detection of Adversarial Attacks by Modeling the Intrinsic Properties of Deep Neural Networks |
| 613 |
4 |
Diffusion Maps for Textual Network Embedding |
| 614 |
4 |
Constrained Cross-Entropy Method for Safe Reinforcement Learning |
| 615 |
4 |
Bandit Learning with Implicit Feedback |
| 616 |
4 |
Model-Agnostic Private Learning |
| 617 |
4 |
Causal Inference via Kernel Deviance Measures |
| 618 |
4 |
Scaling Gaussian Process Regression with Derivatives |
| 619 |
4 |
A no-regret generalization of hierarchical softmax to extreme multi-label classification |
| 620 |
4 |
Deep Structured Prediction with Nonlinear Output Transformations |
| 621 |
4 |
Transfer of Value Functions via Variational Methods |
| 622 |
4 |
Variational Learning on Aggregate Outputs with Gaussian Processes |
| 623 |
4 |
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks |
| 624 |
4 |
Multi-Task Zipping via Layer-wise Neuron Sharing |
| 625 |
4 |
Computing Kantorovich-Wasserstein Distances on d-dimensional histograms using (d+1)-partite graphs |
| 626 |
4 |
Reparameterization Gradient for Non-differentiable Models |
| 627 |
4 |
Dropping Symmetry for Fast Symmetric Nonnegative Matrix Factorization |
| 628 |
4 |
Geometry-Aware Recurrent Neural Networks for Active Visual Recognition |
| 629 |
4 |
Bandit Learning with Positive Externalities |
| 630 |
4 |
Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections |
| 631 |
4 |
Differentially Private Contextual Linear Bandits |
| 632 |
4 |
Scalable Coordinated Exploration in Concurrent Reinforcement Learning |
| 633 |
4 |
Bilevel Distance Metric Learning for Robust Image Recognition |
| 634 |
4 |
An Information-Theoretic Analysis for Thompson Sampling with Many Actions |
| 635 |
4 |
GumBolt: Extending Gumbel trick to Boltzmann priors |
| 636 |
4 |
Variational PDEs for Acceleration on Manifolds and Application to Diffeomorphisms |
| 637 |
4 |
Direct Estimation of Differences in Causal Graphs |
| 638 |
4 |
Convergence of Cubic Regularization for Nonconvex Optimization under KL Property |
| 639 |
4 |
Tight Bounds for Collaborative PAC Learning via Multiplicative Weights |
| 640 |
4 |
Differentially Private Bayesian Inference for Exponential Families |
| 641 |
4 |
Representation Learning for Treatment Effect Estimation from Observational Data |
| 642 |
4 |
Revisiting Decomposable Submodular Function Minimization with Incidence Relations |
| 643 |
4 |
SEGA: Variance Reduction via Gradient Sketching |
| 644 |
4 |
Virtual Class Enhanced Discriminative Embedding Learning |
| 645 |
4 |
Relating Leverage Scores and Density using Regularized Christoffel Functions |
| 646 |
4 |
DifNet: Semantic Segmentation by Diffusion Networks |
| 647 |
4 |
Regularization Learning Networks: Deep Learning for Tabular Datasets |
| 648 |
4 |
Joint Active Feature Acquisition and Classification with Variable-Size Set Encoding |
| 649 |
4 |
Quadratic Decomposable Submodular Function Minimization |
| 650 |
4 |
A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents |
| 651 |
4 |
Uncertainty-Aware Attention for Reliable Interpretation and Prediction |
| 652 |
4 |
Generalizing Graph Matching beyond Quadratic Assignment Model |
| 653 |
4 |
Informative Features for Model Comparison |
| 654 |
4 |
Training DNNs with Hybrid Block Floating Point |
| 655 |
4 |
Learning Pipelines with Limited Data and Domain Knowledge: A Study in Parsing Physics Problems |
| 656 |
4 |
An Off-policy Policy Gradient Theorem Using Emphatic Weightings |
| 657 |
4 |
Generalized Inverse Optimization through Online Learning |
| 658 |
4 |
Kalman Normalization: Normalizing Internal Representations Across Network Layers |
| 659 |
3 |
Transfer of Deep Reactive Policies for MDP Planning |
| 660 |
3 |
Multiple Instance Learning for Efficient Sequential Data Classification on Resource-constrained Devices |
| 661 |
3 |
Point process latent variable models of larval zebrafish behavior |
| 662 |
3 |
Differentially Private Change-Point Detection |
| 663 |
3 |
Learning Beam Search Policies via Imitation Learning |
| 664 |
3 |
Learning a Warping Distance from Unlabeled Time Series Using Sequence Autoencoders |
| 665 |
3 |
A Simple Cache Model for Image Recognition |
| 666 |
3 |
On Markov Chain Gradient Descent |
| 667 |
3 |
Unsupervised Depth Estimation, 3D Face Rotation and Replacement |
| 668 |
3 |
Learning convex bounds for linear quadratic control policy synthesis |
| 669 |
3 |
The Effect of Network Width on the Performance of Large-batch Training |
| 670 |
3 |
The Importance of Sampling in Meta-Reinforcement Learning |
| 671 |
3 |
Coordinate Descent with Bandit Sampling |
| 672 |
3 |
Multilingual Anchoring: Interactive Topic Modeling and Alignment Across Languages |
| 673 |
3 |
Learning without the Phase: Regularized PhaseMax Achieves Optimal Sample Complexity |
| 674 |
3 |
A convex program for bilinear inversion of sparse vectors |
| 675 |
3 |
The promises and pitfalls of Stochastic Gradient Langevin Dynamics |
| 676 |
3 |
Efficient Online Portfolio with Logarithmic Regret |
| 677 |
3 |
Proximal Graphical Event Models |
| 678 |
3 |
Learning Signed Determinantal Point Processes through the Principal Minor Assignment Problem |
| 679 |
3 |
GILBO: One Metric to Measure Them All |
| 680 |
3 |
Bayesian Adversarial Learning |
| 681 |
3 |
Extracting Relationships by Multi-Domain Matching |
| 682 |
3 |
Unsupervised Learning of Artistic Styles with Archetypal Style Analysis |
| 683 |
3 |
The Limits of Post-Selection Generalization |
| 684 |
3 |
SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient |
| 685 |
3 |
Deep Neural Networks with Box Convolutions |
| 686 |
3 |
Graphical Generative Adversarial Networks |
| 687 |
3 |
Neural Interaction Transparency (NIT): Disentangling Learned Interactions for Improved Interpretability |
| 688 |
3 |
Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation |
| 689 |
3 |
Learning a latent manifold of odor representations from neural responses in piriform cortex |
| 690 |
3 |
GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training |
| 691 |
3 |
Scalable Robust Matrix Factorization with Nonconvex Loss |
| 692 |
3 |
Practical exact algorithm for trembling-hand equilibrium refinements in games |
| 693 |
3 |
Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks |
| 694 |
3 |
Adaptive Negative Curvature Descent with Applications in Non-convex Optimization |
| 695 |
3 |
Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions |
| 696 |
3 |
Exponentiated Strongly Rayleigh Distributions |
| 697 |
3 |
A Bridging Framework for Model Optimization and Deep Propagation |
| 698 |
3 |
Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models |
| 699 |
3 |
Meta-Learning MCMC Proposals |
| 700 |
3 |
The streaming rollout of deep networks – towards fully model-parallel execution |
| 701 |
3 |
Solving Non-smooth Constrained Programs with Lower Complexity than O(1/ε): A Primal-Dual Homotopy Smoothing Approach |
| 702 |
3 |
Learning from discriminative feature feedback |
| 703 |
3 |
Bipartite Stochastic Block Models with Tiny Clusters |
| 704 |
3 |
Equality of Opportunity in Classification: A Causal Approach |
| 705 |
3 |
Sequence-to-Segment Networks for Segment Detection |
| 706 |
3 |
Hybrid-MST: A Hybrid Active Sampling Strategy for Pairwise Preference Aggregation |
| 707 |
3 |
Step Size Matters in Deep Learning |
| 708 |
3 |
From Stochastic Planning to Marginal MAP |
| 709 |
3 |
Constructing Deep Neural Networks by Bayesian Network Structure Learning |
| 710 |
3 |
Optimization over Continuous and Multi-dimensional Decisions with Observational Data |
| 711 |
3 |
Metric on Nonlinear Dynamical Systems with Perron-Frobenius Operators |
| 712 |
3 |
Safe Active Learning for Time-Series Modeling with Gaussian Processes |
| 713 |
3 |
Processing of missing data by neural networks |
| 714 |
3 |
A Practical Algorithm for Distributed Clustering and Outlier Detection |
| 715 |
3 |
Dual Principal Component Pursuit: Improved Analysis and Efficient Algorithms |
| 716 |
3 |
DeepExposure: Learning to Expose Photos with Asynchronously Reinforced Adversarial Learning |
| 717 |
3 |
Regularizing by the Variance of the Activations’ Sample-Variances |
| 718 |
3 |
Automatic Program Synthesis of Long Programs with a Learned Garbage Collector |
| 719 |
3 |
Nonparametric learning from Bayesian models with randomized objective functions |
| 720 |
3 |
Learning Optimal Reserve Price against Non-myopic Bidders |
| 721 |
3 |
Enhancing the Accuracy and Fairness of Human Decision Making |
| 722 |
3 |
Learning to Exploit Stability for 3D Scene Parsing |
| 723 |
3 |
Parsimonious Quantile Regression of Financial Asset Tail Dynamics via Sequential Learning |
| 724 |
3 |
Geometry Based Data Generation |
| 725 |
3 |
New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity |
| 726 |
3 |
Alternating optimization of decision trees, with application to learning sparse oblique trees |
| 727 |
3 |
Synthesized Policies for Transfer and Adaptation across Tasks and Environments |
| 728 |
3 |
Interactive Structure Learning with Structural Query-by-Committee |
| 729 |
3 |
Efficient nonmyopic batch active search |
| 730 |
3 |
ℓ₁-regression with Heavy-tailed Distributions |
| 731 |
3 |
Frequency-Domain Dynamic Pruning for Convolutional Neural Networks |
| 732 |
3 |
Visual Memory for Robust Path Following |
| 733 |
3 |
Maximum-Entropy Fine Grained Classification |
| 734 |
3 |
A Unified Framework for Extensive-Form Game Abstraction with Bounds |
| 735 |
3 |
HitNet: Hybrid Ternary Recurrent Neural Network |
| 736 |
3 |
Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution |
| 737 |
3 |
HOGWILD!-Gibbs can be PanAccurate |
| 738 |
2 |
Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements |
| 739 |
2 |
Multi-value Rule Sets for Interpretable Classification with Feature-Efficient Representations |
| 740 |
2 |
Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues |
| 741 |
2 |
Robust Subspace Approximation in a Stream |
| 742 |
2 |
Bayesian Structure Learning by Recursive Bootstrap |
| 743 |
2 |
Total stochastic gradient algorithms and applications in reinforcement learning |
| 744 |
2 |
Synaptic Strength For Convolutional Neural Network |
| 745 |
2 |
A Spectral View of Adversarially Robust Features |
| 746 |
2 |
Testing for Families of Distributions via the Fourier Transform |
| 747 |
2 |
Scalable Laplacian K-modes |
| 748 |
2 |
Learning to Reason with Third Order Tensor Products |
| 749 |
2 |
Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization |
| 750 |
2 |
Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach |
| 751 |
2 |
Identification and Estimation of Causal Effects from Dependent Data |
| 752 |
2 |
Representer Point Selection for Explaining Deep Neural Networks |
| 753 |
2 |
Learning SMaLL Predictors |
| 754 |
2 |
Iterative Value-Aware Model Learning |
| 755 |
2 |
Improving Neural Program Synthesis with Inferred Execution Traces |
| 756 |
2 |
Estimators for Multivariate Information Measures in General Probability Spaces |
| 757 |
2 |
Stochastic Primal-Dual Method for Empirical Risk Minimization with O(1) Per-Iteration Complexity |
| 758 |
2 |
Distributionally Robust Graphical Models |
| 759 |
2 |
Bilevel learning of the Group Lasso structure |
| 760 |
2 |
Graphical model inference: Sequential Monte Carlo meets deterministic approximations |
| 761 |
2 |
Learning to Specialize with Knowledge Distillation for Visual Question Answering |
| 762 |
2 |
Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data |
| 763 |
2 |
A General Method for Amortizing Variational Filtering |
| 764 |
2 |
Scalar Posterior Sampling with Applications |
| 765 |
2 |
Improved Algorithms for Collaborative PAC Learning |
| 766 |
2 |
Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks |
| 767 |
2 |
Training Deep Models Faster with Robust, Approximate Importance Sampling |
| 768 |
2 |
Efficient Loss-Based Decoding on Graphs for Extreme Classification |
| 769 |
2 |
Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies |
| 770 |
2 |
Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks |
| 771 |
2 |
Provable Gaussian Embedding with One Observation |
| 772 |
2 |
Model-based targeted dimensionality reduction for neuronal population data |
| 773 |
2 |
Representation Learning of Compositional Data |
| 774 |
2 |
Modeling Dynamic Missingness of Implicit Feedback for Recommendation |
| 775 |
2 |
Query K-means Clustering and the Double Dixie Cup Problem |
| 776 |
2 |
On the Local Hessian in Back-propagation |
| 777 |
2 |
On Controllable Sparse Alternatives to Softmax |
| 778 |
2 |
Multi-domain Causal Structure Learning in Linear Systems |
| 779 |
2 |
Deep State Space Models for Unconditional Word Generation |
| 780 |
2 |
Predict Responsibly: Improving Fairness and Accuracy by Learning to Defer |
| 781 |
2 |
Diverse Ensemble Evolution: Curriculum Data-Model Marriage |
| 782 |
2 |
Loss Functions for Multiset Prediction |
| 783 |
2 |
Efficient inference for time-varying behavior during learning |
| 784 |
2 |
Contextual Pricing for Lipschitz Buyers |
| 785 |
2 |
Manifold Structured Prediction |
| 786 |
2 |
Middle-Out Decoding |
| 787 |
2 |
Differentially Private k-Means with Constant Multiplicative Error |
| 788 |
2 |
Fully Understanding The Hashing Trick |
| 789 |
2 |
Contour location via entropy reduction leveraging multiple information sources |
| 790 |
2 |
Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task |
| 791 |
2 |
Porcupine Neural Networks: Approximating Neural Network Landscapes |
| 792 |
2 |
Non-Ergodic Alternating Proximal Augmented Lagrangian Algorithms with Optimal Rates |
| 793 |
2 |
Context-dependent upper-confidence bounds for directed exploration |
| 794 |
2 |
Recurrently Controlled Recurrent Networks |
| 795 |
2 |
Hunting for Discriminatory Proxies in Linear Regression Models |
| 796 |
2 |
Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima |
| 797 |
2 |
Explaining Deep Learning Models — A Bayesian Non-parametric Approach |
| 798 |
2 |
Semi-Supervised Learning with Declaratively Specified Entropy Constraints |
| 799 |
2 |
Maximum Causal Tsallis Entropy Imitation Learning |
| 800 |
2 |
Mallows Models for Top-k Lists |
| 801 |
2 |
Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates |
| 802 |
2 |
Binary Rating Estimation with Graph Side Information |
| 803 |
2 |
Inexact trust-region algorithms on Riemannian manifolds |
| 804 |
2 |
Differentially Private Robust Low-Rank Approximation |
| 805 |
2 |
Probabilistic Neural Programmed Networks for Scene Generation |
| 806 |
2 |
Faster Online Learning of Optimal Threshold for Consistent F-measure Optimization |
| 807 |
2 |
Sublinear Time Low-Rank Approximation of Distance Matrices |
| 808 |
2 |
Scaling the Poisson GLM to massive neural datasets through polynomial approximations |
| 809 |
2 |
Infinite-Horizon Gaussian Processes |
| 810 |
2 |
Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds |
| 811 |
2 |
Deep, complex, invertible networks for inversion of transmission effects in multimode optical fibres |
| 812 |
2 |
Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward |
| 813 |
2 |
Learning latent variable structured prediction models with Gaussian perturbations |
| 814 |
2 |
Practical Methods for Graph Two-Sample Testing |
| 815 |
2 |
Demystifying excessively volatile human learning: A Bayesian persistent prior and a neural approximation |
| 816 |
2 |
Causal Discovery from Discrete Data using Hidden Compact Representation |
| 817 |
2 |
Contextual bandits with surrogate losses: Margin bounds and efficient algorithms |
| 818 |
2 |
Structural Causal Bandits: Where to Intervene? |
| 819 |
2 |
Active Learning for Non-Parametric Regression Using Purely Random Trees |
| 820 |
2 |
Breaking the Span Assumption Yields Fast Finite-Sum Minimization |
| 821 |
2 |
Universal Growth in Production Economies |
| 822 |
2 |
High Dimensional Linear Regression using Lattice Basis Reduction |
| 823 |
2 |
Fighting Boredom in Recommender Systems with Linear Reinforcement Learning |
| 824 |
2 |
Generalizing Tree Probability Estimation via Bayesian Networks |
| 825 |
2 |
Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks |
| 826 |
2 |
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization |
| 827 |
2 |
Boosted Sparse and Low-Rank Tensor Regression |
| 828 |
2 |
DropMax: Adaptive Variational Softmax |
| 829 |
2 |
Connectionist Temporal Classification with Maximum Entropy Regularization |
| 830 |
2 |
A Neural Compositional Paradigm for Image Captioning |
| 831 |
2 |
An Efficient Pruning Algorithm for Robust Isotonic Regression |
| 832 |
2 |
Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units |
| 833 |
1 |
Sparse PCA from Sparse Linear Regression |
| 834 |
1 |
Computationally and statistically efficient learning of causal Bayes nets using path queries |
| 835 |
1 |
Removing Hidden Confounding by Experimental Grounding |
| 836 |
1 |
MixLasso: Generalized Mixed Regression via Convex Atomic-Norm Regularization |
| 837 |
1 |
Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning |
| 838 |
1 |
Fast deep reinforcement learning using online adjustments from the past |
| 839 |
1 |
Streamlining Variational Inference for Constraint Satisfaction Problems |
| 840 |
1 |
Convex Elicitation of Continuous Properties |
| 841 |
1 |
Learning and Inference in Hilbert Space with Quantum Graphical Models |
| 842 |
1 |
Uplift Modeling from Separate Labels |
| 843 |
1 |
Dynamic Network Model from Partial Observations |
| 844 |
1 |
Theoretical guarantees for EM under misspecified Gaussian mixture models |
| 845 |
1 |
Statistical and Computational Trade-Offs in Kernel K-Means |
| 846 |
1 |
GLoMo: Unsupervised Learning of Transferable Relational Graphs |
| 847 |
1 |
Adaptive Path-Integral Autoencoders: Representation Learning and Planning for Dynamical Systems |
| 848 |
1 |
A Statistical Recurrent Model on the Manifold of Symmetric Positive Definite Matrices |
| 849 |
1 |
Stein Variational Gradient Descent as Moment Matching |
| 850 |
1 |
A Bayesian Nonparametric View on Count-Min Sketch |
| 851 |
1 |
Deep Poisson gamma dynamical systems |
| 852 |
1 |
Information-theoretic Limits for Community Detection in Network Models |
| 853 |
1 |
Online Reciprocal Recommendation with Theoretical Performance Guarantees |
| 854 |
1 |
Statistical mechanics of low-rank tensor decomposition |
| 855 |
1 |
Modelling and unsupervised learning of symmetric deformable object categories |
| 856 |
1 |
Efficient Anomaly Detection via Matrix Sketching |
| 857 |
1 |
Improved Expressivity Through Dendritic Neural Networks |
| 858 |
1 |
Stochastic Expectation Maximization with Variance Reduction |
| 859 |
1 |
Monte-Carlo Tree Search for Constrained POMDPs |
| 860 |
1 |
Breaking the Activation Function Bottleneck through Adaptive Parameterization |
| 861 |
1 |
Rectangular Bounding Process |
| 862 |
1 |
Adaptive Learning with Unknown Information Flows |
| 863 |
1 |
A Bayesian Approach to Generative Adversarial Imitation Learning |
| 864 |
1 |
Constant Regret, Generalized Mixability, and Mirror Descent |
| 865 |
1 |
How to tell when a clustering is (approximately) correct using convex relaxations |
| 866 |
1 |
Stimulus domain transfer in recurrent models for large scale cortical population prediction on video |
| 867 |
1 |
Efficient online algorithms for fast-rate regret bounds under sparsity |
| 868 |
1 |
Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation |
| 869 |
1 |
Unorganized Malicious Attacks Detection |
| 870 |
1 |
Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss |
| 871 |
1 |
ρ-POMDPs have Lipschitz-Continuous ε-Optimal Value Functions |
| 872 |
1 |
Maximizing Induced Cardinality Under a Determinantal Point Process |
| 873 |
1 |
Efficient Convex Completion of Coupled Tensors using Coupled Nuclear Norms |
| 874 |
1 |
Stochastic Nonparametric Event-Tensor Decomposition |
| 875 |
1 |
Diminishing Returns Shape Constraints for Interpretability and Regularization |
| 876 |
1 |
Policy Regret in Repeated Games |
| 877 |
1 |
Large-Scale Stochastic Sampling from the Probability Simplex |
| 878 |
1 |
An Improved Analysis of Alternating Minimization for Structured Multi-Response Regression |
| 879 |
1 |
Proximal SCOPE for Distributed Sparse Learning |
| 880 |
1 |
The Everlasting Database: Statistical Validity at a Fair Price |
| 881 |
1 |
Size-Noise Tradeoffs in Generative Networks |
| 882 |
1 |
Exponentially Weighted Imitation Learning for Batched Historical Data |
| 883 |
1 |
The Cluster Description Problem – Complexity Results, Formulations and Approximations |
| 884 |
1 |
MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models |
| 885 |
1 |
Approximation algorithms for stochastic clustering |
| 886 |
1 |
Gamma-Poisson Dynamic Matrix Factorization Embedded with Metadata Influence |
| 887 |
1 |
Mental Sampling in Multimodal Representations |
| 888 |
1 |
Critical initialisation for deep signal propagation in noisy rectifier neural networks |
| 889 |
1 |
Learning convex polytopes with margin |
| 890 |
1 |
Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames |
| 891 |
1 |
Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation |
| 892 |
1 |
Horizon-Independent Minimax Linear Regression |
| 893 |
1 |
Causal Inference and Mechanism Clustering of A Mixture of Additive Noise Models |
| 894 |
1 |
Learning in Games with Lossy Feedback |
| 895 |
1 |
Learning Confidence Sets using Support Vector Machines |
| 896 |
1 |
Fast greedy algorithms for dictionary selection with generalized sparsity constraints |
| 897 |
1 |
Non-metric Similarity Graphs for Maximum Inner Product Search |
| 898 |
1 |
A Mathematical Model For Optimal Decisions In A Representative Democracy |
| 899 |
1 |
Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels |
| 900 |
1 |
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits |
| 901 |
1 |
Learning filter widths of spectral decompositions with wavelets |
| 902 |
1 |
Lifelong Inverse Reinforcement Learning |
| 903 |
1 |
Expanding Holographic Embeddings for Knowledge Completion |
| 904 |
1 |
Submodular Field Grammars: Representation, Inference, and Application to Image Parsing |
| 905 |
1 |
BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training |
| 906 |
1 |
Flexible and accurate inference and learning for deep generative models |
| 907 |
1 |
KONG: Kernels for ordered-neighborhood graphs |
| 908 |
1 |
Minimax Estimation of Neural Net Distance |
| 909 |
1 |
Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization |
| 910 |
1 |
Multiplicative Weights Updates with Constant Step-Size in Graphical Constant-Sum Games |
| 911 |
1 |
Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization |
| 912 |
1 |
Stochastic Spectral and Conjugate Descent Methods |
| 913 |
1 |
Semi-crowdsourced Clustering with Deep Generative Models |
| 914 |
1 |
Parsimonious Bayesian deep networks |
| 915 |
1 |
Asymptotic optimality of adaptive importance sampling |
| 916 |
1 |
When do random forests fail? |
| 917 |
1 |
Adaptation to Easy Data in Prediction with Limited Advice |
| 918 |
1 |
Gradient Descent Meets Shift-and-Invert Preconditioning for Eigenvector Computation |
| 919 |
1 |
Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces |
| 920 |
1 |
Provable Variational Inference for Constrained Log-Submodular Models |
| 921 |
1 |
Ridge Regression and Provable Deterministic Ridge Leverage Score Sampling |
| 922 |
1 |
Zero-Shot Transfer with Deictic Object-Oriented Representation in Reinforcement Learning |
| 923 |
1 |
Mixture Matrix Completion |
| 924 |
1 |
Algorithmic Linearly Constrained Gaussian Processes |
| 925 |
1 |
SplineNets: Continuous Neural Decision Graphs |
| 926 |
1 |
The Pessimistic Limits and Possibilities of Margin-based Losses in Semi-supervised Learning |
| 927 |
1 |
Video Prediction via Selective Sampling |
| 928 |
1 |
Stochastic Composite Mirror Descent: Optimal Bounds with High Probabilities |
| 929 |
1 |
A loss framework for calibrated anomaly detection |
| 930 |
1 |
Designing by Training: Acceleration Neural Network for Fast High-Dimensional Convolution |
| 931 |
1 |
Multitask Boosting for Survival Analysis with Competing Risks |
| 932 |
1 |
The Lingering of Gradients: How to Reuse Gradients Over Time |
| 933 |
1 |
(Probably) Concave Graph Matching |
| 934 |
1 |
Fast Similarity Search via Optimal Sparse Lifting |
| 935 |
0 |
Contrastive Learning from Pairwise Measurements |
| 936 |
0 |
Support Recovery for Orthogonal Matching Pursuit: Upper and Lower bounds |
| 937 |
0 |
Sketching Method for Large Scale Combinatorial Inference |
| 938 |
0 |
Regret Bounds for Online Portfolio Selection with a Cardinality Constraint |
| 939 |
0 |
Improved Network Robustness with Adversary Critic |
| 940 |
0 |
Discretely Relaxing Continuous Variables for tractable Variational Inference |
| 941 |
0 |
Bounded-Loss Private Prediction Markets |
| 942 |
0 |
Lifted Weighted Mini-Bucket |
| 943 |
0 |
Predictive Approximate Bayesian Computation via Saddle Points |
| 944 |
0 |
Learning Invariances using the Marginal Likelihood |
| 945 |
0 |
Variance-Reduced Stochastic Gradient Descent on Streaming Data |
| 946 |
0 |
Trading robust representations for sample complexity through self-supervised visual experience |
| 947 |
0 |
PAC-Bayes Tree: weighted subtrees with guarantees |
| 948 |
0 |
The emergence of multiple retinal cell types through efficient coding of natural movies |
| 949 |
0 |
The Sample Complexity of Semi-Supervised Learning with Nonparametric Mixture Models |
| 950 |
0 |
Inferring Latent Velocities from Weather Radar Data using Gaussian Processes |
| 951 |
0 |
Wavelet regression and additive models for irregularly spaced data |
| 952 |
0 |
Distributed Multitask Reinforcement Learning with Quadratic Convergence |
| 953 |
0 |
Legendre Decomposition for Tensors |
| 954 |
0 |
Compact Representation of Uncertainty in Clustering |
| 955 |
0 |
Clustering Redemption–Beyond the Impossibility of Kleinberg’s Axioms |
| 956 |
0 |
Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds |
| 957 |
0 |
Submodular Maximization via Gradient Ascent: The Case of Deep Submodular Functions |
| 958 |
0 |
Dirichlet belief networks for topic structure learning |
| 959 |
0 |
A Reduction for Efficient LDA Topic Reconstruction |
| 960 |
0 |
Preference Based Adaptation for Learning Objectives |
| 961 |
0 |
On Neuronal Capacity |
| 962 |
0 |
Revisiting (ε, γ, τ)-similarity learning for domain adaptation |
| 963 |
0 |
Deep Homogeneous Mixture Models: Representation, Separation, and Approximation |
| 964 |
0 |
A probabilistic population code based on neural samples |
| 965 |
0 |
Efficient Gradient Computation for Structured Output Learning with Rational and Tropical Losses |
| 966 |
0 |
A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice |
| 967 |
0 |
Algebraic tests of general Gaussian latent tree models |
| 968 |
0 |
Online Improper Learning with an Approximation Oracle |
| 969 |
0 |
Community Exploration: From Offline Optimization to Online Learning |
| 970 |
0 |
Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra |
| 971 |
0 |
Experimental Design for Cost-Aware Learning of Causal Graphs |
| 972 |
0 |
Exploiting Numerical Sparsity for Efficient Learning: Faster Eigenvector Computation and Regression |
| 973 |
0 |
Multi-armed Bandits with Compensation |
| 974 |
0 |
Power-law efficient neural codes provide general link between perceptual bias and discriminability |
| 975 |
0 |
Learning from Group Comparisons: Exploiting Higher Order Interactions |
| 976 |
0 |
Objective and efficient inference for couplings in neuronal networks |
| 977 |
0 |
Neural Edit Operations for Biological Sequences |
| 978 |
0 |
Measures of distortion for machine learning |
| 979 |
0 |
Information-based Adaptive Stimulus Selection to Optimize Communication Efficiency in Brain-Computer Interfaces |
| 980 |
0 |
A Smoother Way to Train Structured Prediction Models |
| 981 |
0 |
Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making |
| 982 |
0 |
Active Matting |
| 983 |
0 |
Limited Memory Kelley’s Method Converges for Composite Convex and Submodular Objectives |
| 984 |
0 |
Completing State Representations using Spectral Learning |
| 985 |
0 |
Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification |
| 986 |
0 |
TETRIS: TilE-matching the TRemendous Irregular Sparsity |
| 987 |
0 |
Efficient Projection onto the Perfect Phylogeny Model |
| 988 |
0 |
Beauty-in-averageness and its contextual modulations: A Bayesian statistical account |
| 989 |
0 |
Early Stopping for Nonparametric Testing |
| 990 |
0 |
Inferring Networks From Random Walk-Based Node Similarities |
| 991 |
0 |
Communication Efficient Parallel Algorithms for Optimization on Manifolds |
| 992 |
0 |
Latent Gaussian Activity Propagation: Using Smoothness and Structure to Separate and Localize Sounds in Large Noisy Environments |
| 993 |
0 |
On Binary Classification in Extreme Regions |
| 994 |
0 |
Optimistic optimization of a Brownian |
| 995 |
0 |
Fast Estimation of Causal Interactions using Wold Processes |
| 996 |
0 |
Factored Bandits |
| 997 |
0 |
Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net |
| 998 |
0 |
Query Complexity of Bayesian Private Learning |
| 999 |
0 |
Modelling sparsity, heterogeneity, reciprocity and community structure in temporal interaction data |
| 1000 |
0 |
MULAN: A Blind and Off-Grid Method for Multichannel Echo Retrieval |
| 1001 |
0 |
Overlapping Clustering Models, and One (class) SVM to Bind Them All |
| 1002 |
0 |
Bayesian Model Selection Approach to Boundary Detection with Non-Local Priors |
| 1003 |
0 |
Genetic-Gated Networks for Deep Reinforcement Learning |
| 1004 |
0 |
Foreground Clustering for Joint Segmentation and Localization in Videos and Images |
| 1005 |
0 |
Learning semantic similarity in a continuous space |
| 1006 |
0 |
Quantifying Learning Guarantees for Convex but Inconsistent Surrogates |
| 1007 |
0 |
Removing the Feature Correlation Effect of Multiplicative Noise |
| 1008 |
0 |
Optimization for Approximate Submodularity |