| Rank | Cited by | Paper name |
| --- | --- | --- |
| 0 | 280 | Self-Attention Generative Adversarial Networks |
| 1 | 95 | A Convergence Theory for Deep Learning via Over-Parameterization |
| 2 | 95 | Gradient Descent Finds Global Minima of Deep Neural Networks |
| 3 | 50 | Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks |
| 4 | 49 | Learning Latent Dynamics for Planning from Pixels |
| 5 | 41 | Adversarial examples from computational constraints |
| 6 | 38 | Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations |
| 7 | 35 | Quantifying Generalization in Reinforcement Learning |
| 8 | 33 | Theoretically Principled Trade-off between Robustness and Accuracy |
| 9 | 32 | Sever: A Robust Meta-Algorithm for Stochastic Optimization |
| 10 | 27 | Invertible Residual Networks |
| 11 | 27 | EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks |
| 12 | 27 | AdaGrad stepsizes: sharp convergence over nonconvex landscapes |
| 13 | 26 | TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing |
| 14 | 26 | Certified Adversarial Robustness via Randomized Smoothing |
| 15 | 25 | Graphite: Iterative Generative Modeling of Graphs |
| 16 | 24 | Do ImageNet Classifiers Generalize to ImageNet? |
| 17 | 22 | AReS and MaRS – Adversarial and MMD-Minimizing Regression for SDEs |
| 18 | 20 | Adversarial Examples Are a Natural Consequence of Test Error in Noise |
| 19 | 20 | Simplifying Graph Convolutional Networks |
| 20 | 19 | On the Spectral Bias of Neural Networks |
| 21 | 18 | Optimal Auctions through Deep Learning |
| 22 | 17 | The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects |
| 23 | 15 | Adaptive Neural Trees |
| 24 | 14 | MASS: Masked Sequence to Sequence Pre-training for Language Generation |
| 25 | 14 | Obtaining Fairness using Optimal Transport Theory |
| 26 | 14 | Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? |
| 27 | 13 | NAS-Bench-101: Towards Reproducible Neural Architecture Search |
| 28 | 13 | Rademacher Complexity for Adversarially Robust Generalization |
| 29 | 12 | Multi-Object Representation Learning with Iterative Variational Inference |
| 30 | 12 | Imitating Latent Policies from Observation |
| 31 | 12 | The Evolved Transformer |
| 32 | 12 | SGD: General Analysis and Improved Rates |
| 33 | 11 | Actor-Attention-Critic for Multi-Agent Reinforcement Learning |
| 34 | 11 | Noise2Self: Blind Denoising by Self-Supervision |
| 35 | 11 | Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design |
| 36 | 11 | Stochastic Gradient Push for Distributed Deep Learning |
| 37 | 11 | Optimal Transport for structured data with application on graphs |
| 38 | 11 | On the Universality of Invariant Networks |
| 39 | 10 | Random Shuffling Beats SGD after Finite Epochs |
| 40 | 10 | Analyzing Federated Learning through an Adversarial Lens |
| 41 | 10 | Learning a Prior over Intent via Meta-Inverse Reinforcement Learning |
| 42 | 10 | Online Meta-Learning |
| 43 | 10 | On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms |
| 44 | 9 | Learning to Generalize from Sparse and Underspecified Rewards |
| 45 | 9 | Insertion Transformer: Flexible Sequence Generation via Insertion Operations |
| 46 | 9 | CoT: Cooperative Training for Generative Modeling of Discrete Data |
| 47 | 9 | Variational Implicit Processes |
| 48 | 9 | Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables |
| 49 | 9 | Emerging Convolutions for Generative Normalizing Flows |
| 50 | 9 | Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously |
| 51 | 9 | Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition |
| 52 | 9 | FloWaveNet : A Generative Flow for Raw Audio |
| 53 | 9 | Policy Certificates: Towards Accountable Reinforcement Learning |
| 54 | 9 | Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds |
| 55 | 9 | Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication |
| 56 | 9 | Gauge Equivariant Convolutional Networks and the Icosahedral CNN |
| 57 | 9 | High-Fidelity Image Generation With Fewer Labels |
| 58 | 9 | Safe Policy Improvement with Baseline Bootstrapping |
| 59 | 9 | Off-Policy Deep Reinforcement Learning without Exploration |
| 60 | 9 | Using Pre-Training Can Improve Model Robustness and Uncertainty |
| 61 | 9 | Manifold Mixup: Better Representations by Interpolating Hidden States |
| 62 | 8 | Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning |
| 63 | 8 | Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret |
| 64 | 8 | Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning |
| 65 | 8 | On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization |
| 66 | 8 | Open-ended learning in symmetric zero-sum games |
| 67 | 8 | Error Feedback Fixes SignSGD and other Gradient Compression Schemes |
| 68 | 7 | TarMAC: Targeted Multi-Agent Communication |
| 69 | 7 | Latent Normalizing Flows for Discrete Sequences |
| 70 | 7 | Provably Efficient Maximum Entropy Exploration |
| 71 | 7 | Sorting Out Lipschitz Function Approximation |
| 72 | 7 | Understanding Geometry of Encoder-Decoder CNNs |
| 73 | 7 | A Theory of Regularized Markov Decision Processes |
| 74 | 7 | Graph U-Nets |
| 75 | 7 | A Kernel Theory of Modern Data Augmentation |
| 76 | 7 | Learning deep kernels for exponential family densities |
| 77 | 7 | On Learning Invariant Representations for Domain Adaptation |
| 78 | 7 | Towards a Unified Analysis of Random Fourier Features |
| 79 | 7 | Deep Counterfactual Regret Minimization |
| 80 | 7 | Training Neural Networks with Local Error Signals |
| 81 | 7 | HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving |
| 82 | 7 | ELF OpenGo: an analysis and open reimplementation of AlphaZero |
| 83 | 6 | Geometry and Symmetry in Short-and-Sparse Deconvolution |
| 84 | 6 | Agnostic Federated Learning |
| 85 | 6 | On the Limitations of Representing Functions on Sets |
| 86 | 6 | Parameter-Efficient Transfer Learning for NLP |
| 87 | 6 | Escaping Saddle Points with Adaptive Gradient Methods |
| 88 | 6 | Batch Policy Learning under Constraints |
| 89 | 6 | Understanding the Impact of Entropy on Policy Optimization |
| 90 | 6 | An Instability in Variational Inference for Topic Models |
| 91 | 6 | Understanding the Origins of Bias in Word Embeddings |
| 92 | 6 | Making Convolutional Networks Shift-Invariant Again |
| 93 | 6 | Fast Context Adaptation via Meta-Learning |
| 94 | 6 | SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning |
| 95 | 6 | The Odds are Odd: A Statistical Test for Detecting Adversarial Examples |
| 96 | 6 | Complexity of Linear Regions in Deep Networks |
| 97 | 6 | Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints |
| 98 | 6 | Scalable Fair Clustering |
| 99 | 6 | Learning Action Representations for Reinforcement Learning |
| 100 | 6 | An Investigation into Neural Net Optimization via Hessian Eigenvalue Density |
| 101 | 6 | Natural Analysts in Adaptive Data Analysis |
| 102 | 6 | Collaborative Evolutionary Reinforcement Learning |
| 103 | 6 | Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number |
| 104 | 6 | Nonconvex Variance Reduced Optimization with Arbitrary Sampling |
| 105 | 5 | Loss Landscapes of Regularized Linear Autoencoders |
| 106 | 5 | A Theoretical Analysis of Contrastive Unsupervised Representation Learning |
| 107 | 5 | Guarantees for Spectral Clustering with Fairness Constraints |
| 108 | 5 | Online Control with Adversarial Disturbances |
| 109 | 5 | Width Provably Matters in Optimization for Deep Linear Neural Networks |
| 110 | 5 | Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions |
| 111 | 5 | MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing |
| 112 | 5 | Remember and Forget for Experience Replay |
| 113 | 5 | The advantages of multiple classes for reducing overfitting from test set reuse |
| 114 | 5 | Model-Based Active Exploration |
| 115 | 5 | Efficient Dictionary Learning with Gradient Descent |
| 116 | 5 | Near optimal finite time identification of arbitrary linear dynamical systems |
| 117 | 5 | EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE |
| 118 | 5 | On the Impact of the Activation function on Deep Neural Networks Training |
| 119 | 5 | Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits |
| 120 | 5 | Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning |
| 121 | 5 | Variational Inference for sparse network reconstruction from count data |
| 122 | 5 | GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects |
| 123 | 5 | SAGA with Arbitrary Sampling |
| 124 | 5 | Robust Decision Trees Against Adversarial Examples |
| 125 | 5 | First-Order Adversarial Vulnerability of Neural Networks and Input Dimension |
| 126 | 4 | On Variational Bounds of Mutual Information |
| 127 | 4 | Differentially Private Fair Learning |
| 128 | 4 | Fair k-Center Clustering for Data Summarization |
| 129 | 4 | Mixture Models for Diverse Machine Translation: Tricks of the Trade |
| 130 | 4 | Non-Monotonic Sequential Text Generation |
| 131 | 4 | Gromov-Wasserstein Learning for Graph Matching and Node Embedding |
| 132 | 4 | Counterfactual Visual Explanations |
| 133 | 4 | Optimal Mini-Batch and Step Sizes for SAGA |
| 134 | 4 | Infinite Mixture Prototypes for Few-shot Learning |
| 135 | 4 | A Dynamical Systems Perspective on Nesterov Acceleration |
| 136 | 4 | On the Complexity of Approximating Wasserstein Barycenters |
| 137 | 4 | SGD without Replacement: Sharper Rates for General Smooth Convex Functions |
| 138 | 4 | Learning interpretable continuous-time models of latent stochastic dynamical systems |
| 139 | 4 | Bayesian Nonparametric Federated Learning of Neural Networks |
| 140 | 4 | BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning |
| 141 | 4 | Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence |
| 142 | 4 | Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations |
| 143 | 4 | Provable Guarantees for Gradient-Based Meta-Learning |
| 144 | 4 | Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules |
| 145 | 4 | Generalized Majorization-Minimization |
| 146 | 4 | Simple Black-box Adversarial Attacks |
| 147 | 4 | Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization |
| 148 | 4 | NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks |
| 149 | 4 | Are Generative Classifiers More Robust to Adversarial Attacks? |
| 150 | 4 | Information-Theoretic Considerations in Batch Reinforcement Learning |
| 151 | 4 | Provably efficient RL with Rich Observations via Latent State Decoding |
| 152 | 4 | Locally Private Bayesian Inference for Count Models |
| 153 | 4 | Bayesian Joint Spike-and-Slab Graphical Lasso |
| 154 | 4 | Graph Matching Networks for Learning the Similarity of Graph Structured Objects |
| 155 | 4 | Diagnosing Bottlenecks in Deep Q-learning Algorithms |
| 156 | 4 | An Investigation of Model-Free Planning |
| 157 | 4 | Contextual Memory Trees |
| 158 | 4 | Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks |
| 159 | 4 | Data Shapley: Equitable Valuation of Data for Machine Learning |
| 160 | 4 | SelectiveNet: A Deep Neural Network with an Integrated Reject Option |
| 161 | 3 | Multi-Frequency Phase Synchronization |
| 162 | 3 | Sublinear quantum algorithms for training linear and kernel-based classifiers |
| 163 | 3 | Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering |
| 164 | 3 | Similarity of Neural Network Representations Revisited |
| 165 | 3 | What is the Effect of Importance Weighting in Deep Learning? |
| 166 | 3 | Analyzing and Improving Representations with the Soft Nearest Neighbor Loss |
| 167 | 3 | Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints |
| 168 | 3 | A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks |
| 169 | 3 | Geometric Scattering for Graph Data Analysis |
| 170 | 3 | Stable and Fair Classification |
| 171 | 3 | Analogies Explained: Towards Understanding Word Embeddings |
| 172 | 3 | Finding Options that Minimize Planning Time |
| 173 | 3 | Hybrid Models with Deep and Invertible Features |
| 174 | 3 | Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation |
| 175 | 3 | Distributed Learning over Unreliable Networks |
| 176 | 3 | Learning Optimal Fair Policies |
| 177 | 3 | Metropolis-Hastings Generative Adversarial Networks |
| 178 | 3 | Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization |
| 179 | 3 | Multi-Frequency Vector Diffusion Maps |
| 180 | 3 | Fairwashing: the risk of rationalization |
| 181 | 3 | Finding Mixed Nash Equilibria of Generative Adversarial Networks |
| 182 | 3 | Learning Generative Models across Incomparable Spaces |
| 183 | 3 | Learning-to-Learn Stochastic Gradient Descent with Biased Regularization |
| 184 | 3 | Plug-and-Play Methods Provably Converge with Properly Trained Denoisers |
| 185 | 3 | Control Regularization for Reduced Variance Reinforcement Learning |
| 186 | 3 | The Natural Language of Actions |
| 187 | 3 | Almost surely constrained convex optimization |
| 188 | 3 | Traditional and Heavy Tailed Self Regularization in Neural Network Models |
| 189 | 3 | Self-Supervised Exploration via Disagreement |
| 190 | 3 | Direct Uncertainty Prediction for Medical Second Opinions |
| 191 | 3 | Wasserstein Adversarial Examples via Projected Sinkhorn Iterations |
| 192 | 3 | Conditioning by adaptive sampling for robust design |
| 193 | 3 | Does Data Augmentation Lead to Positive Margin? |
| 194 | 3 | Greedy Layerwise Learning Can Scale To ImageNet |
| 195 | 3 | DL2: Training and Querying Neural Networks with Logic |
| 196 | 3 | The Value Function Polytope in Reinforcement Learning |
| 197 | 3 | Action Robust Reinforcement Learning and Applications in Continuous Control |
| 198 | 3 | Automatic Posterior Transformation for Likelihood-Free Inference |
| 199 | 3 | Rao-Blackwellized Stochastic Gradients for Discrete Distributions |
| 200 | 3 | Subspace Robust Wasserstein Distances |
| 201 | 3 | Importance Sampling Policy Evaluation with an Estimated Behavior Policy |
| 202 | 3 | Lipschitz Generative Adversarial Nets |
| 203 | 3 | Homomorphic Sensing |
| 204 | 3 | A Conditional-Gradient-Based Augmented Lagrangian Framework |
| 205 | 3 | Deep Factors for Forecasting |
| 206 | 3 | Learning to bid in revenue-maximizing auctions |
| 207 | 3 | Molecular Hypergraph Grammar with Its Application to Molecular Optimization |
| 208 | 3 | Topological Data Analysis of Decision Boundaries with Application to Model Selection |
| 209 | 3 | Statistical Foundations of Virtual Democracy |
| 210 | 3 | Lower Bounds for Smooth Nonconvex Finite-Sum Optimization |
| 211 | 3 | Improving Adversarial Robustness via Promoting Ensemble Diversity |
| 212 | 3 | Metric-Optimized Example Weights |
| 213 | 3 | Nonlinear Distributional Gradient Temporal-Difference Learning |
| 214 | 2 | Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning |
| 215 | 2 | Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment |
| 216 | 2 | Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces |
| 217 | 2 | Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness |
| 218 | 2 | Guided evolutionary strategies: augmenting random search with surrogate gradients |
| 219 | 2 | Autoregressive Energy Machines |
| 220 | 2 | Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback |
| 221 | 2 | Online Algorithms for Rent-Or-Buy with Expert Advice |
| 222 | 2 | Submodular Maximization beyond Non-negativity: Guarantees, Fast Algorithms, and Applications |
| 223 | 2 | Rates of Convergence for Sparse Variational Gaussian Process Regression |
| 224 | 2 | CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network |
| 225 | 2 | MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization |
| 226 | 2 | Adaptive Sensor Placement for Continuous Spaces |
| 227 | 2 | Global Convergence of Block Coordinate Descent in Deep Learning |
| 228 | 2 | Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions |
| 229 | 2 | Discovering Context Effects from Raw Choice Data |
| 230 | 2 | Fairness without Harm: Decoupled Classifiers with Preference Guarantees |
| 231 | 2 | POLITEX: Regret Bounds for Policy Iteration using Expert Prediction |
| 232 | 2 | Fair Regression: Quantitative Definitions and Reduction-Based Algorithms |
| 233 | 2 | Flexibly Fair Representation Learning by Disentanglement |
| 234 | 2 | Proportionally Fair Clustering |
| 235 | 2 | Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement |
| 236 | 2 | On the Connection Between Adversarial Robustness and Saliency Map Interpretability |
| 237 | 2 | Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation |
| 238 | 2 | $\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression |
| 239 | 2 | Almost Unsupervised Text to Speech and Automatic Speech Recognition |
| 240 | 2 | Target-Based Temporal-Difference Learning |
| 241 | 2 | Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets |
| 242 | 2 | Toward Controlling Discrimination in Online Ad Auctions |
| 243 | 2 | Learning to Infer Program Sketches |
| 244 | 2 | Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation |
| 245 | 2 | Classification from Positive, Unlabeled and Biased Negative Data |
| 246 | 2 | Neural Network Attributions: A Causal Perspective |
| 247 | 2 | Learning Discrete Structures for Graph Neural Networks |
| 248 | 2 | Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group |
| 249 | 2 | CompILE: Compositional Imitation Learning and Execution |
| 250 | 2 | Statistics and Samples in Distributional Reinforcement Learning |
| 251 | 2 | Exploring the Landscape of Spatial Robustness |
| 252 | 2 | EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis |
| 253 | 2 | Provably Efficient Imitation Learning from Observation Alone |
| 254 | 2 | Alternating Minimizations Converge to Second-Order Optimal Solutions |
| 255 | 2 | On the statistical rate of nonlinear recovery in generative models with heavy-tailed data |
| 256 | 2 | Sensitivity Analysis of Linear Structural Causal Models |
| 257 | 2 | Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization |
| 258 | 2 | Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio |
| 259 | 2 | Band-limited Training and Inference for Convolutional Neural Networks |
| 260 | 2 | Multivariate Submodular Optimization |
| 261 | 2 | Domain Agnostic Learning with Disentangled Representations |
| 262 | 2 | Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling |
| 263 | 2 | Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization |
| 264 | 2 | Robust Learning from Untrusted Sources |
| 265 | 2 | Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization |
| 266 | 2 | On Connected Sublevel Sets in Deep Learning |
| 267 | 2 | Sum-of-Squares Polynomial Flow |
| 268 | 2 | On the Convergence and Robustness of Adversarial Training |
| 269 | 2 | Active Learning for Decision-Making from Imbalanced Observational Data |
| 270 | 2 | Low Latency Privacy Preserving Inference |
| 271 | 2 | Weak Detection of Signal in the Spiked Wigner Model |
| 272 | 2 | The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study |
| 273 | 2 | Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization |
| 274 | 2 | Graphical-model based estimation and inference for differential privacy |
| 275 | 2 | Differentiable Linearized ADMM |
| 276 | 2 | CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration |
| 277 | 2 | Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm |
| 278 | 2 | Fingerprint Policy Optimisation for Robust Reinforcement Learning |
| 279 | 2 | Safe Grid Search with Optimal Complexity |
| 280 | 2 | Dynamic Weights in Multi-Objective Deep Reinforcement Learning |
| 281 | 2 | DeepMDP: Learning Continuous Latent Space Models for Representation Learning |
| 282 | 2 | On Symmetric Losses for Learning from Corrupted Labels |
| 283 | 2 | A Kernel Perspective for Regularizing Deep Neural Networks |
| 284 | 2 | Random Matrix Improved Covariance Estimation for a Large Class of Metrics |
| 285 | 2 | Task-Agnostic Dynamics Priors for Deep Reinforcement Learning |
| 286 | 2 | Adversarial Generation of Time-Frequency Features with application in audio synthesis |
| 287 | 2 | Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models |
| 288 | 2 | Correlated Variational Auto-Encoders |
| 289 | 2 | Maximum Likelihood Estimation for Learning Populations of Parameters |
| 290 | 2 | Self-Attention Graph Pooling |
| 291 | 2 | Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise |
| 292 | 2 | Learning to Prove Theorems via Interacting with Proof Assistants |
| 293 | 2 | A Composite Randomized Incremental Gradient Method |
| 294 | 2 | GMNN: Graph Markov Neural Networks |
| 295 | 2 | Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI |
| 296 | 2 | When Samples Are Strategically Selected |
| 297 | 2 | Processing Megapixel Images with Deep Attention-Sampling Models |
| 298 | 2 | Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models |
| 299 | 2 | PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization |
| 300 | 2 | A Contrastive Divergence for Combining Variational Inference and MCMC |
| 301 | 2 | Adversarial Attacks on Node Embeddings via Graph Poisoning |
| 302 | 1 | Understanding Priors in Bayesian Neural Networks at the Unit Level |
| 303 | 1 | Semi-Cyclic Stochastic Gradient Descent |
| 304 | 1 | Learning Dependency Structures for Weak Supervision Models |
| 305 | 1 | Faster Attend-Infer-Repeat with Tractable Probabilistic Models |
| 306 | 1 | Hierarchical Importance Weighted Autoencoders |
| 307 | 1 | Unsupervised Label Noise Modeling and Loss Correction |
| 308 | 1 | QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning |
| 309 | 1 | The information-theoretic value of unlabeled data in semi-supervised learning |
| 310 | 1 | Cross-Domain 3D Equivariant Image Embeddings |
| 311 | 1 | Neural Collaborative Subspace Clustering |
| 312 | 1 | PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits |
| 313 | 1 | Sequential Facility Location: Approximate Submodularity and Greedy Algorithm |
| 314 | 1 | Good Initializations of Variational Bayes for Deep Models |
| 315 | 1 | Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities |
| 316 | 1 | Nonparametric Bayesian Deep Networks with Local Competition |
| 317 | 1 | Communication-Constrained Inference and the Role of Shared Randomness |
| 318 | 1 | Decentralized Exploration in Multi-Armed Bandits |
| 319 | 1 | Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations |
| 320 | 1 | On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference |
| 321 | 1 | Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians |
| 322 | 1 | DAG-GNN: DAG Structure Learning with Graph Neural Networks |
| 323 | 1 | Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning |
| 324 | 1 | Partially Linear Additive Gaussian Graphical Models |
| 325 | 1 | Learning Context-dependent Label Permutations for Multi-label Classification |
| 326 | 1 | Approximation and non-parametric estimation of ResNet-type convolutional neural networks |
| 327 | 1 | Robust Inference via Generative Classifiers for Handling Noisy Labels |
| 328 | 1 | Robust Estimation of Tree Structured Gaussian Graphical Models |
| 329 | 1 | Graph Resistance and Learning from Pairwise Comparisons |
| 330 | 1 | Coresets for Ordered Weighted Clustering |
| 331 | 1 | Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations |
| 332 | 1 | Zero-Shot Knowledge Distillation in Deep Networks |
| 333 | 1 | Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms |
| 334 | 1 | Spectral Clustering of Signed Graphs via Matrix Power Means |
| 335 | 1 | Adaptive Regret of Convex and Smooth Functions |
| 336 | 1 | Scaling Up Ordinal Embedding: A Landmark Approach |
| 337 | 1 | Understanding and correcting pathologies in the training of learned optimizers |
| 338 | 1 | On Scalable and Efficient Computation of Large Scale Optimal Transport |
| 339 | 1 | A fully differentiable beam search decoder |
| 340 | 1 | Online Variance Reduction with Mixtures |
| 341 | 1 | MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets |
| 342 | 1 | A Polynomial Time MCMC Method for Sampling from Continuous Determinantal Point Processes |
| 343 | 1 | Fairness risk measures |
| 344 | 1 | Fairness-Aware Learning for Continuous Attributes and Treatments |
| 345 | 1 | Neural Separation of Observed and Unobserved Distributions |
| 346 | 1 | Reinforcement Learning in Configurable Continuous Environments |
| 347 | 1 | Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables |
| 348 | 1 | Adaptive Scale-Invariant Online Algorithms for Learning Linear Models |
| 349 | 1 | Bridging Theory and Algorithm for Domain Adaptation |
| 350 | 1 | MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement |
| 351 | 1 | Learning Discrete and Continuous Factors of Data via Alternating Disentanglement |
| 352 | 1 | CAB: Continuous Adaptive Blending for Policy Evaluation and Learning |
| 353 | 1 | Learning Structured Decision Problems with Unawareness |
| 354 | 1 | Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits |
| 355 | 1 | Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games |
| 356 | 1 | Complementary-Label Learning for Arbitrary Losses and Models |
| 357 | 1 | Neuron birth-death dynamics accelerates gradient descent and converges asymptotically |
| 358 | 1 | Unifying Orthogonal Monte Carlo Methods |
| 359 | 1 | Differentially Private Empirical Risk Minimization with Non-convex Loss Functions |
| 360 | 1 | Towards a Deep and Unified Understanding of Deep Neural Models in NLP |
| 361 | 1 | State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations |
| 362 | 1 | Geometric Losses for Distributional Learning |
| 363 | 1 | Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting |
| 364 | 1 | Co-manifold learning with missing data |
| 365 | 1 | Compositional Fairness Constraints for Graph Embeddings |
| 366 | 1 | Improved Convergence for $\ell_1$ and $\ell_\infty$ Regression via Iteratively Reweighted Least Squares |
| 367 | 1 | Transfer of Samples in Policy Search via Multiple Importance Sampling |
| 368 | 1 | Sample-Optimal Parametric Q-Learning Using Linearly Additive Features |
| 369 | 1 | Bias Also Matters: Bias Attribution for Deep Neural Network Explanation |
| 370 | 1 | Combining parametric and nonparametric models for off-policy evaluation |
| 371 | 1 | Disentangled Graph Convolutional Networks |
| 372 | 1 | Differentiable Dynamic Normalization for Learning Deep Representation |
| 373 | 1 | Relational Pooling for Graph Representations |
| 374 | 1 | Hessian Aided Policy Gradient |
| 375 | 1 | Estimate Sequences for Variance-Reduced Stochastic Composite Optimization |
| 376 | 1 | Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment |
| 377 | 1 | Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm |
| 378 | 1 | Tensor Variable Elimination for Plated Factor Graphs |
| 379 | 1 | Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances |
| 380 | 1 | Position-aware Graph Neural Networks |
| 381 | 1 | How does Disagreement Help Generalization against Label Corruption? |
| 382 | 1 | IMEXnet – A Forward Stable Deep Neural Network |
| 383 | 1 | Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding |
| 384 | 1 | Bayesian Optimization Meets Bayesian Optimal Stopping |
| 385 | 1 | Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity |
| 386 | 1 | Equivariant Transformer Networks |
| 387 | 1 | Submodular Observation Selection and Information Gathering for Quadratic Models |
| 388 | 1 | Conditional Independence in Testing Bayesian Networks |
| 389 | 1 | MONK — Outlier-Robust Mean Embedding Estimation by Median-of-Means |
| 390 | 1 | Improved Parallel Algorithms for Density-Based Network Clustering |
| 391 | 1 | Graph Element Networks: adaptive, structured computation and memory |
| 392 | 1 | Learning Models from Data with Measurement Error: Tackling Underreporting |
| 393 | 1 | Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization |
| 394 | 1 | A Deep Reinforcement Learning Perspective on Internet Congestion Control |
| 395 | 1 | Orthogonal Random Forest for Causal Inference |
| 396 | 1 | Classifying Treatment Responders Under Causal Effect Monotonicity |
| 397 | 1 | On the Generalization Gap in Reparameterizable Reinforcement Learning |
| 398 | 1 | Approximated Oracle Filter Pruning for Destructive CNN Width Optimization |
| 399 | 1 | Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models |
| 400 | 1 | Better generalization with less data using robust gradient descent |
| 401 | 1 | Monge blunts Bayes: Hardness Results for Adversarial Training |
| 402 | 1 | Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with double power-law behavior |
| 403 | 1 | Variational Annealing of GANs: A Langevin Perspective |
| 404 | 1 | On the Design of Estimators for Bandit Off-Policy Evaluation |
| 405 | 1 | A Large-Scale Study on Regularization and Normalization in GANs |
| 406 | 1 | Automated Model Selection with Bayesian Quadrature |
| 407 | 1 | Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance |
| 408 | 1 | Deep Gaussian Processes with Importance-Weighted Variational Inference |
| 409 | 1 | Noisy Dual Principal Component Pursuit |
| 410 | 1 | Transferable Clean-Label Poisoning Attacks on Deep Neural Nets |
| 411 | 1 | Bilinear Bandits with Low-rank Structure |
| 412 | 1 | Structured agents for physical construction |
| 413 | 1 | Estimating Information Flow in Deep Neural Networks |
| 414 | 1 | Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels |
| 415 | 1 | GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver |
| 416 | 1 | Maximum Entropy-Regularized Multi-Goal Reinforcement Learning |
| 417 | 1 | Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards |
| 418 | 1 | Distribution calibration for regression |
| 419 | 1 | Distributed Learning with Sublinear Communication |
| 420 | 1 | Temporal Gaussian Mixture Layer for Videos |
| 421 | 1 | Stochastic Deep Networks |
| 422 | 1 | Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA |
| 423 | 1 | Efficient optimization of loops and limits with randomized telescoping sums |
| 424 | 1 | Robust Influence Maximization for Hyperparametric Models |
| 425 | 1 | Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters |
| 426 | 1 | Convolutional Poisson Gamma Belief Network |
| 427 | 1 | SWALP : Stochastic Weight Averaging in Low Precision Training |
| 428 | 1 | Improving Neural Network Quantization without Retraining using Outlier Channel Splitting |
| 429 | 1 | Beyond Backprop: Online Alternating Minimization with Auxiliary Variables |
| 430 | 1 | Discovering Options for Exploration by Minimizing Cover Time |
| 431 | 1 | Static Automatic Batching In TensorFlow |
| 432 | 1 | Rotation Invariant Householder Parameterization for Bayesian PCA |
| 433 | 1 | Fault Tolerance in Iterative-Convergent Machine Learning |
| 434 | 1 | SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver |
| 435 | 1 | Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications |
| 436 | 1 | Generalized Linear Rule Models |
| 437 | 1 | Optimal Minimal Margin Maximization with Boosting |
| 438 | 1 | GDPP: Learning Diverse Generations using Determinantal Point Processes |
| 439 |
1 |
Per-Decision Option Discounting |
| 440 |
1 |
Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search |
| 441 |
1 |
BayesNAS: A Bayesian Approach for Neural Architecture Search |
| 442 |
1 |
Collaborative Channel Pruning for Deep Networks |
| 443 |
1 |
Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff |
| 444 |
1 |
Learning from a Learner |
| 445 |
1 |
Rate Distortion For Model Compression: From Theory To Practice |
| 446 |
1 |
Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty |
| 447 |
1 |
Imitation Learning from Imperfect Demonstration |
| 448 |
1 |
Switching Linear Dynamics for Variational Bayes Filtering |
| 449 |
1 |
Feature-Critic Networks for Heterogeneous Domain Generalization |
| 450 |
1 |
Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs |
| 451 |
1 |
Predictor-Corrector Policy Optimization |
| 452 |
1 |
EMI: Exploration with Mutual Information |
| 453 |
1 |
Wasserstein of Wasserstein Loss for Learning Generative Models |
| 454 |
1 |
Learning Optimal Linear Regularizers |
| 455 |
1 |
A Statistical Investigation of Long Memory in Language and Music |
| 456 |
1 |
Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD |
| 457 |
1 |
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System |
| 458 |
1 |
Inference and Sampling of $K_{33}$-free Ising Models |
| 459 |
1 |
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning |
| 460 |
1 |
A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation |
| 461 |
1 |
Learning to Optimize Multigrid PDE Solvers |
| 462 |
1 |
LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning |
| 463 |
1 |
Combating Label Noise in Deep Learning using Abstention |
| 464 |
1 |
On The Power of Curriculum Learning in Training Deep Networks |
| 465 |
1 |
Learning to Clear the Market |
| 466 |
1 |
Online learning with kernel losses |
| 467 |
1 |
Teaching a black-box learner |
| 468 |
1 |
Learning to Groove with Inverse Sequence Transformations |
| 469 |
1 |
Stable-Predictive Optimistic Counterfactual Regret Minimization |
| 470 |
1 |
Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization |
| 471 |
1 |
Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization |
| 472 |
1 |
Making Deep Q-learning methods robust to time discretization |
| 473 |
1 |
Validating Causal Inference Models via Influence Functions |
| 474 |
0 |
Lorentzian Distance Learning for Hyperbolic Representations |
| 475 |
0 |
Pareto Optimal Streaming Unsupervised Classification |
| 476 |
0 |
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition |
| 477 |
0 |
Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization |
| 478 |
0 |
Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation |
| 479 |
0 |
Hyperbolic Disk Embeddings for Directed Acyclic Graphs |
| 480 |
0 |
Faster Algorithms for Binary Matrix Factorization |
| 481 |
0 |
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model |
| 482 |
0 |
ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables |
| 483 |
0 |
Unsupervised Deep Learning by Neighbourhood Discovery |
| 484 |
0 |
Discovering Conditionally Salient Features with Statistical Guarantees |
| 485 |
0 |
Dropout as a Structured Shrinkage Prior |
| 486 |
0 |
Categorical Feature Compression via Submodular Optimization |
| 487 |
0 |
Exploiting structure of uncertainty for efficient matroid semi-bandits |
| 488 |
0 |
Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity |
| 489 |
0 |
Learning and Data Selection in Big Datasets |
| 490 |
0 |
The Wasserstein Transform |
| 491 |
0 |
Distributed, Egocentric Representations of Graphs for Detecting Critical Structures |
| 492 |
0 |
COMIC: Multi-view Clustering Without Parameter Selection |
| 493 |
0 |
Random Walks on Hypergraphs with Edge-Dependent Vertex Weights |
| 494 |
0 |
Supervised Hierarchical Clustering with Exponential Linkage |
| 495 |
0 |
Scale-free adaptive planning for deterministic dynamics & discounted rewards |
| 496 |
0 |
Learning Distance for Sequences by Learning a Ground Metric |
| 497 |
0 |
Efficient Training of BERT by Progressively Stacking |
| 498 |
0 |
Making Decisions that Reduce Discriminatory Impacts |
| 499 |
0 |
On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning |
| 500 |
0 |
Kernel Normalized Cut: a Theoretical Revisit |
| 501 |
0 |
Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops |
| 502 |
0 |
Trainable Decoding of Sets of Sequences for Neural Sequence Models |
| 503 |
0 |
Spectral Approximate Inference |
| 504 |
0 |
Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models |
| 505 |
0 |
LIT: Learned Intermediate Representation Training for Model Compression |
| 506 |
0 |
A Better k-means++ Algorithm via Local Search |
| 507 |
0 |
Anytime Online-to-Batch, Optimism and Acceleration |
| 508 |
0 |
Improving Neural Language Modeling via Adversarial Training |
| 509 |
0 |
Fast Algorithm for Generalized Multinomial Models with Ranking Data |
| 510 |
0 |
Fast and Stable Maximum Likelihood Estimation for Incomplete Multinomial Models |
| 511 |
0 |
Unreproducible Research is Reproducible |
| 512 |
0 |
Deep Residual Output Layers for Neural Language Generation |
| 513 |
0 |
Online Adaptive Principal Component Analysis and Its extensions |
| 514 |
0 |
Meta-Learning Neural Bloom Filters |
| 515 |
0 |
Efficient Full-Matrix Adaptive Regularization |
| 516 |
0 |
Recursive Sketches for Modular Deep Learning |
| 517 |
0 |
Efficient On-Device Models using Neural Projections |
| 518 |
0 |
Ladder Capsule Network |
| 519 |
0 |
Mallows ranking models: maximum likelihood estimate and regeneration |
| 520 |
0 |
Learning to select for a predefined ranking |
| 521 |
0 |
Dimensionality Reduction for Tukey Regression |
| 522 |
0 |
Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems |
| 523 |
0 |
Demystifying Dropout |
| 524 |
0 |
Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs |
| 525 |
0 |
Concrete Autoencoders: Differentiable Feature Selection and Reconstruction |
| 526 |
0 |
Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem |
| 527 |
0 |
Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering |
| 528 |
0 |
DBSCAN++: Towards fast and scalable density clustering |
| 529 |
0 |
Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case |
| 530 |
0 |
Accelerated Flow for Probability Distributions |
| 531 |
0 |
Model Function Based Conditional Gradient Method with Armijo-like Line Search |
| 532 |
0 |
Iterative Linearized Control: Stable Algorithms and Complexity Guarantees |
| 533 |
0 |
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss |
| 534 |
0 |
Adaptive Antithetic Sampling for Variance Reduction |
| 535 |
0 |
State-Regularized Recurrent Neural Networks |
| 536 |
0 |
Learning What and Where to Transfer |
| 537 |
0 |
Adversarial Online Learning with noise |
| 538 |
0 |
Replica Conditional Sequential Monte Carlo |
| 539 |
0 |
Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute |
| 540 |
0 |
Calibrated Model-Based Deep Reinforcement Learning |
| 541 |
0 |
Power k-Means Clustering |
| 542 |
0 |
Hierarchically Structured Meta-learning |
| 543 |
0 |
Incremental Randomized Sketching for Online Kernel Learning |
| 544 |
0 |
Exploring interpretable LSTM neural networks over multi-variable data |
| 545 |
0 |
RaFM: Rank-Aware Factorization Machines |
| 546 |
0 |
Functional Transparency for Structured Data: a Game-Theoretic Approach |
| 547 |
0 |
Projections for Approximate Policy Iteration Algorithms |
| 548 |
0 |
Differentially Private Learning of Geometric Concepts |
| 549 |
0 |
Online Learning with Sleeping Experts and Feedback Graphs |
| 550 |
0 |
Multi-objective training of Generative Adversarial Networks with multiple discriminators |
| 551 |
0 |
Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy |
| 552 |
0 |
Model Comparison for Semantic Grouping |
| 553 |
0 |
Linear-Complexity Data-Parallel Earth Mover’s Distance Approximations |
| 554 |
0 |
Variational Laplace Autoencoders |
| 555 |
0 |
Online Convex Optimization in Adversarial Markov Decision Processes |
| 556 |
0 |
Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization |
| 557 |
0 |
Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random |
| 558 |
0 |
On Sparse Linear Regression in the Local Differential Privacy Model |
| 559 |
0 |
Matrix-Free Preconditioning in Online Learning |
| 560 |
0 |
Data Poisoning Attacks on Stochastic Bandits |
| 561 |
0 |
Learning Neurosymbolic Generative Models via Program Synthesis |
| 562 |
0 |
Kernel-Based Reinforcement Learning in Robust Markov Decision Processes |
| 563 |
0 |
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning |
| 564 |
0 |
Differential Inclusions for Modeling Nonsmooth ADMM Variants: A Continuous Limit Theory |
| 565 |
0 |
Stochastic Blockmodels meet Graph Neural Networks |
| 566 |
0 |
Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization |
| 567 |
0 |
A Recurrent Neural Cascade-based Model for Continuous-Time Diffusion |
| 568 |
0 |
Exploration Conscious Reinforcement Learning Revisited |
| 569 |
0 |
The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions |
| 570 |
0 |
Interpreting Adversarially Trained Convolutional Neural Networks |
| 571 |
0 |
Deep Generative Learning via Variational Gradient Flow |
| 572 |
0 |
Breaking Inter-Layer Co-Adaptation by Classifier Anonymization |
| 573 |
0 |
Bayesian Optimization of Composite Functions |
| 574 |
0 |
First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems |
| 575 |
0 |
Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data |
| 576 |
0 |
Open Vocabulary Learning on Source Code with a Graph-Structured Cache |
| 577 |
0 |
Toward Understanding the Importance of Noise in Training Neural Networks |
| 578 |
0 |
Invariant-Equivariant Representation Learning for Multi-Class Data |
| 579 |
0 |
Active Learning with Disagreement Graphs |
| 580 |
0 |
Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap |
| 581 |
0 |
Learning to Route in Similarity Graphs |
| 582 |
0 |
Active Learning for Probabilistic Structured Prediction of Cuts and Matchings |
| 583 |
0 |
The Variational Predictive Natural Gradient |
| 584 |
0 |
Deep Compressed Sensing |
| 585 |
0 |
Minimal Achievable Sufficient Statistic Learning |
| 586 |
0 |
Bayesian Generative Active Deep Learning |
| 587 |
0 |
Hierarchical Decompositional Mixtures of Variational Autoencoders |
| 588 |
0 |
Efficient learning of smooth probability functions from Bernoulli tests with guarantees |
| 589 |
0 |
Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments |
| 590 |
0 |
Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography |
| 591 |
0 |
Understanding and Accelerating Particle-Based Variational Inference |
| 592 |
0 |
Connectivity-Optimized Representation Learning via Persistent Homology |
| 593 |
0 |
Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models |
| 594 |
0 |
Dead-ends and Secure Exploration in Reinforcement Learning |
| 595 |
0 |
Predicate Exchange: Inference with Declarative Knowledge |
| 596 |
0 |
Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning |
| 597 |
0 |
Adversarially Learned Representations for Information Obfuscation and Inference |
| 598 |
0 |
Active Embedding Search via Noisy Paired Comparisons |
| 599 |
0 |
A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes |
| 600 |
0 |
Hiring Under Uncertainty |
| 601 |
0 |
On Medians of (Randomized) Pairwise Means |
| 602 |
0 |
Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation |
| 603 |
0 |
Overcoming Multi-model Forgetting |
| 604 |
0 |
Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation |
| 605 |
0 |
Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size! |
| 606 |
0 |
More Efficient Off-Policy Evaluation through Regularized Targeted Learning |
| 607 |
0 |
A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs |
| 608 |
0 |
Scalable Training of Inference Networks for Gaussian-Process Models |
| 609 |
0 |
Submodular Cost Submodular Cover with an Approximate Oracle |
| 610 |
0 |
Riemannian adaptive stochastic gradient algorithms on matrix manifolds |
| 611 |
0 |
Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN |
| 612 |
0 |
Training CNNs with Selective Allocation of Channels |
| 613 |
0 |
Neural Inverse Knitting: From Images to Manufacturing Instructions |
| 614 |
0 |
Discovering Latent Covariance Structures for Multiple Time Series |
| 615 |
0 |
Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation |
| 616 |
0 |
Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers |
| 617 |
0 |
Adjustment Criteria for Generalizing Experimental Findings |
| 618 |
0 |
Kernel Mean Matching for Content Addressability of GANs |
| 619 |
0 |
Incorporating Grouping Information into Bayesian Decision Tree Ensembles |
| 620 |
0 |
Towards Understanding Knowledge Distillation |
| 621 |
0 |
New results on information theoretic clustering |
| 622 |
0 |
Anomaly Detection With Multiple-Hypotheses Predictions |
| 623 |
0 |
Trajectory-Based Off-Policy Deep Reinforcement Learning |
| 624 |
0 |
LegoNet: Efficient Convolutional Neural Networks with Lego Filters |
| 625 |
0 |
Lossless or Quantized Boosting with Integer Arithmetic |
| 626 |
0 |
Variational Russian Roulette for Deep Bayesian Nonparametrics |
| 627 |
0 |
Approximating Orthogonal Matrices with Effective Givens Factorization |
| 628 |
0 |
Random Function Priors for Correlation Modeling |
| 629 |
0 |
Learning Classifiers for Target Domain with Limited or No Labels |
| 630 |
0 |
On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization |
| 631 |
0 |
Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models |
| 632 |
0 |
Composing Value Functions in Reinforcement Learning |
| 633 |
0 |
DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures |
| 634 |
0 |
Distributed Weighted Matching via Randomized Composable Coresets |
| 635 |
0 |
Causal Identification under Markov Equivalence: Completeness Results |
| 636 |
0 |
Context-Aware Zero-Shot Learning for Object Recognition |
| 637 |
0 |
Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem |
| 638 |
0 |
DeepNose: Using artificial neural networks to represent the space of odorants |
| 639 |
0 |
Data Poisoning Attacks in Multi-Party Learning |
| 640 |
0 |
Screening rules for Lasso with non-convex Sparse Regularizers |
| 641 |
0 |
Concentration Inequalities for Conditional Value at Risk |
| 642 |
0 |
Characterizing Well-Behaved vs. Pathological Deep Neural Networks |
| 643 |
0 |
Dynamic Measurement Scheduling for Event Forecasting using Deep RL |
| 644 |
0 |
Taming MAML: Efficient unbiased meta-reinforcement learning |
| 645 |
0 |
Online Learning to Rank with Features |
| 646 |
0 |
A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning |
| 647 |
0 |
Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data |
| 648 |
0 |
SELFIE: Refurbishing Unclean Samples for Robust Deep Learning |
| 649 |
0 |
Learning Novel Policies For Tasks |
| 650 |
0 |
End-to-End Probabilistic Inference for Nonstationary Audio Analysis |
| 651 |
0 |
Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning |
| 652 |
0 |
Disentangling Disentanglement in Variational Autoencoders |
| 653 |
0 |
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning |
| 654 |
0 |
Cognitive model priors for predicting human decisions |
| 655 |
0 |
Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models |
| 656 |
0 |
A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization |
| 657 |
0 |
Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging |
| 658 |
0 |
Fast and Flexible Inference of Joint Distributions from their Marginals |
| 659 |
0 |
Collective Model Fusion for Multiple Black-Box Experts |
| 660 |
0 |
Correlated bandits or: How to minimize mean-squared error online |
| 661 |
0 |
On discriminative learning of prediction uncertainty |
| 662 |
0 |
A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology |
| 663 |
0 |
Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation |
| 664 |
0 |
ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation |
| 665 |
0 |
Learning with Bad Training Data via Iterative Trimmed Loss Minimization |
| 666 |
0 |
Target Tracking for Contextual Bandits: Application to Demand Side Management |
| 667 |
0 |
Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems |
| 668 |
0 |
Graph Convolutional Gaussian Processes |
| 669 |
0 |
Exploiting Worker Correlation for Label Aggregation in Crowdsourcing |
| 670 |
0 |
Self-similar Epochs: Value in arrangement |
| 671 |
0 |
HyperGAN: A Generative Model for Diverse, Performant Neural Networks |
| 672 |
0 |
A Personalized Affective Memory Model for Improving Emotion Recognition |
| 673 |
0 |
Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications |
| 674 |
0 |
Poission Subsampled Rényi Differential Privacy |
| 675 |
0 |
Jumpout : Improved Dropout for Deep Neural Networks with ReLUs |
| 676 |
0 |
Geometry Aware Convolutional Filters for Omnidirectional Images Representation |
| 677 |
0 |
A Framework for Bayesian Optimization in Embedded Subspaces |
| 678 |
0 |
Area Attention |
| 679 |
0 |
The Implicit Fairness Criterion of Unconstrained Learning |
| 680 |
0 |
Co-Representation Network for Generalized Zero-Shot Learning |
| 681 |
0 |
Sublinear Space Private Algorithms Under the Sliding Window Model |
| 682 |
0 |
Optimality Implies Kernel Sum Classifiers are Statistically Efficient |
| 683 |
0 |
Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator |
| 684 |
0 |
Shallow-Deep Networks: Understanding and Mitigating Network Overthinking |
| 685 |
0 |
Neurally-Guided Structure Inference |
| 686 |
0 |
An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule |
| 687 |
0 |
A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent |
| 688 |
0 |
Active Manifolds: A non-linear analogue to Active Subspaces |
| 689 |
0 |
Bayesian Counterfactual Risk Minimization |
| 690 |
0 |
Compressing Gradient Optimizers via Count-Sketches |
| 691 |
0 |
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks |
| 692 |
0 |
White-box vs Black-box: Bayes Optimal Strategies for Membership Inference |
| 693 |
0 |
Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction |
| 694 |
0 |
Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models |
| 695 |
0 |
Sublinear Time Nearest Neighbor Search over Generalized Weighted Space |
| 696 |
0 |
Bayesian leave-one-out cross-validation for large data |
| 697 |
0 |
Formal Privacy for Functional Data with Gaussian Perturbations |
| 698 |
0 |
Separable value functions across time-scales |
| 699 |
0 |
Dirichlet Simplex Nest and Geometric Inference |
| 700 |
0 |
Scalable Learning in Reproducing Kernel Krein Spaces |
| 701 |
0 |
Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin |
| 702 |
0 |
HexaGAN: Generative Adversarial Nets for Real World Classification |
| 703 |
0 |
Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces |
| 704 |
0 |
On Dropout and Nuclear Norm Regularization |
| 705 |
0 |
Phaseless PCA: Low-Rank Matrix Recovery from Column-wise Phaseless Measurements |
| 706 |
0 |
Understanding and Controlling Memory in Recurrent Neural Networks |
| 707 |
0 |
kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection |
| 708 |
0 |
Improved Dynamic Graph Learning through Fault-Tolerant Sparsification |
| 709 |
0 |
Non-Parametric Priors For Generative Adversarial Networks |
| 710 |
0 |
Regularization in directable environments with application to Tetris |
| 711 |
0 |
Imputing Missing Events in Continuous-Time Event Streams |
| 712 |
0 |
Learning to Convolve: A Generalized Weight-Tying Approach |
| 713 |
0 |
Large-Scale Sparse Kernel Canonical Correlation Analysis |
| 714 |
0 |
Curvature-Exploiting Acceleration of Elastic Net Computations |
| 715 |
0 |
Doubly-Competitive Distribution Estimation |
| 716 |
0 |
AUCµ: A Performance Metric for Multi-Class Machine Learning Models |
| 717 |
0 |
Neural Joint Source-Channel Coding |
| 718 |
0 |
Flat Metric Minimization with Applications in Generative Modeling |
| 719 |
0 |
Weakly-Supervised Temporal Localization via Occurrence Count Learning |
| 720 |
0 |
Rehashing Kernel Evaluation in High Dimensions |
| 721 |
0 |
Learning to Collaborate in Markov Decision Processes |
| 722 |
0 |
Dual Entangled Polynomial Code: Three-Dimensional Coding for Distributed Matrix Multiplication |
| 723 |
0 |
A Persistent Weisfeiler–Lehman Procedure for Graph Classification |
| 724 |
0 |
Neural Logic Reinforcement Learning |
| 725 |
0 |
Revisiting precision recall definition for generative modeling |
| 726 |
0 |
Acceleration of SVRG and Katyusha X by Inexact Preconditioning |
| 727 |
0 |
Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation |
| 728 |
0 |
Bayesian Deconditional Kernel Mean Embeddings |
| 729 |
0 |
Optimistic Policy Optimization via Multiple Importance Sampling |
| 730 |
0 |
Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching |
| 731 |
0 |
Learning Hawkes Processes Under Synchronization Noise |
| 732 |
0 |
Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth |
| 733 |
0 |
Blended Conditonal Gradients |
| 734 |
0 |
Boosted Density Estimation Remastered |
| 735 |
0 |
Distributional Reinforcement Learning for Efficient Exploration |
| 736 |
0 |
Generalized Approximate Survey Propagation for High-Dimensional Estimation |
| 737 |
0 |
Projection onto Minkowski Sums with Application to Constrained Learning |
| 738 |
0 |
Revisiting the Softmax Bellman Operator: New Benefits and New Perspective |
| 739 |
0 |
Voronoi Boundary Classification: A High-Dimensional Geometric Approach via Weighted Monte Carlo Integration |
| 740 |
0 |
PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach |
| 741 |
0 |
Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension |
| 742 |
0 |
Circuit-GNN: Graph Neural Networks for Distributed Circuit Design |
| 743 |
0 |
Particle Flow Bayes’ Rule |
| 744 |
0 |
Multiplicative Weights Updates as a distributed constrained optimization algorithm: Convergence to second-order stationary points almost always |
| 745 |
0 |
Generalized No Free Lunch Theorem for Adversarial Robustness |
| 746 |
0 |
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation |
| 747 |
0 |
Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations |
| 748 |
0 |
Shape Constraints for Set Functions |
| 749 |
0 |
Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference |
| 750 |
0 |
Sparse Extreme Multi-label Learning with Oracle Property |
| 751 |
0 |
Stein Point Markov Chain Monte Carlo |
| 752 |
0 |
Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates |
| 753 |
0 |
Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance |
| 754 |
0 |
Policy Consolidation for Continual Reinforcement Learning |
| 755 |
0 |
POPQORN: Quantifying Robustness of Recurrent Neural Networks |
| 756 |
0 |
Multi-Agent Adversarial Inverse Reinforcement Learning |
| 757 |
0 |
Amortized Monte Carlo Integration |
| 758 |
0 |
LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations |
| 759 |
0 |
PAC Learnability of Node Functions in Networked Dynamical Systems |
| 760 |
0 |
TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning |
| 761 |
0 |
Adversarial camera stickers: A physical camera-based attack on deep learning systems |
| 762 |
0 |
Composing Entropic Policies using Divergence Correction |
| 763 |
0 |
TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning |
| 764 |
0 |
Improving Model Selection by Employing the Test Data |
| 765 |
0 |
Understanding MCMC Dynamics as Flows on the Wasserstein Space |
| 766 |
0 |
On Certifying Non-Uniform Bounds against Adversarial Attacks |
| 767 |
0 |
Moment-Based Variational Inference for Markov Jump Processes |
| 768 |
0 |
Calibrated Approximate Bayesian Inference |
| 769 |
0 |
Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data |
| 770 |
0 |
Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function |
| 771 |
0 |
Refined Complexity of PCA with Outliers |
| 772 |
0 |
Regret Circuits: Composability of Regret Minimizers |