Rank |
Cited by |
Paper name |
0 |
280 |
Self-Attention Generative Adversarial Networks |
1 |
95 |
A Convergence Theory for Deep Learning via Over-Parameterization |
2 |
95 |
Gradient Descent Finds Global Minima of Deep Neural Networks |
3 |
50 |
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks |
4 |
49 |
Learning Latent Dynamics for Planning from Pixels |
5 |
41 |
Adversarial examples from computational constraints |
6 |
38 |
Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations |
7 |
35 |
Quantifying Generalization in Reinforcement Learning |
8 |
33 |
Theoretically Principled Trade-off between Robustness and Accuracy |
9 |
32 |
Sever: A Robust Meta-Algorithm for Stochastic Optimization |
10 |
27 |
Invertible Residual Networks |
11 |
27 |
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks |
12 |
27 |
AdaGrad stepsizes: sharp convergence over nonconvex landscapes |
13 |
26 |
TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing |
14 |
26 |
Certified Adversarial Robustness via Randomized Smoothing |
15 |
25 |
Graphite: Iterative Generative Modeling of Graphs |
16 |
24 |
Do ImageNet Classifiers Generalize to ImageNet? |
17 |
22 |
AReS and MaRS – Adversarial and MMD-Minimizing Regression for SDEs |
18 |
20 |
Adversarial Examples Are a Natural Consequence of Test Error in Noise |
19 |
20 |
Simplifying Graph Convolutional Networks |
20 |
19 |
On the Spectral Bias of Neural Networks |
21 |
18 |
Optimal Auctions through Deep Learning |
22 |
17 |
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects |
23 |
15 |
Adaptive Neural Trees |
24 |
14 |
MASS: Masked Sequence to Sequence Pre-training for Language Generation |
25 |
14 |
Obtaining Fairness using Optimal Transport Theory |
26 |
14 |
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? |
27 |
13 |
NAS-Bench-101: Towards Reproducible Neural Architecture Search |
28 |
13 |
Rademacher Complexity for Adversarially Robust Generalization |
29 |
12 |
Multi-Object Representation Learning with Iterative Variational Inference |
30 |
12 |
Imitating Latent Policies from Observation |
31 |
12 |
The Evolved Transformer |
32 |
12 |
SGD: General Analysis and Improved Rates |
33 |
11 |
Actor-Attention-Critic for Multi-Agent Reinforcement Learning |
34 |
11 |
Noise2Self: Blind Denoising by Self-Supervision |
35 |
11 |
Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design |
36 |
11 |
Stochastic Gradient Push for Distributed Deep Learning |
37 |
11 |
Optimal Transport for structured data with application on graphs |
38 |
11 |
On the Universality of Invariant Networks |
39 |
10 |
Random Shuffling Beats SGD after Finite Epochs |
40 |
10 |
Analyzing Federated Learning through an Adversarial Lens |
41 |
10 |
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning |
42 |
10 |
Online Meta-Learning |
43 |
10 |
On Efficient Optimal Transport: An Analysis of Greedy and Accelerated Mirror Descent Algorithms |
44 |
9 |
Learning to Generalize from Sparse and Underspecified Rewards |
45 |
9 |
Insertion Transformer: Flexible Sequence Generation via Insertion Operations |
46 |
9 |
CoT: Cooperative Training for Generative Modeling of Discrete Data |
47 |
9 |
Variational Implicit Processes |
48 |
9 |
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables |
49 |
9 |
Emerging Convolutions for Generative Normalizing Flows |
50 |
9 |
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously |
51 |
9 |
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition |
52 |
9 |
FloWaveNet : A Generative Flow for Raw Audio |
53 |
9 |
Policy Certificates: Towards Accountable Reinforcement Learning |
54 |
9 |
Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds |
55 |
9 |
Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication |
56 |
9 |
Gauge Equivariant Convolutional Networks and the Icosahedral CNN |
57 |
9 |
High-Fidelity Image Generation With Fewer Labels |
58 |
9 |
Safe Policy Improvement with Baseline Bootstrapping |
59 |
9 |
Off-Policy Deep Reinforcement Learning without Exploration |
60 |
9 |
Using Pre-Training Can Improve Model Robustness and Uncertainty |
61 |
9 |
Manifold Mixup: Better Representations by Interpolating Hidden States |
62 |
8 |
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning |
63 |
8 |
Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret |
64 |
8 |
Defending Against Saddle Point Attack in Byzantine-Robust Distributed Learning |
65 |
8 |
On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization |
66 |
8 |
Open-ended learning in symmetric zero-sum games |
67 |
8 |
Error Feedback Fixes SignSGD and other Gradient Compression Schemes |
68 |
7 |
TarMAC: Targeted Multi-Agent Communication |
69 |
7 |
Latent Normalizing Flows for Discrete Sequences |
70 |
7 |
Provably Efficient Maximum Entropy Exploration |
71 |
7 |
Sorting Out Lipschitz Function Approximation |
72 |
7 |
Understanding Geometry of Encoder-Decoder CNNs |
73 |
7 |
A Theory of Regularized Markov Decision Processes |
74 |
7 |
Graph U-Nets |
75 |
7 |
A Kernel Theory of Modern Data Augmentation |
76 |
7 |
Learning deep kernels for exponential family densities |
77 |
7 |
On Learning Invariant Representations for Domain Adaptation |
78 |
7 |
Towards a Unified Analysis of Random Fourier Features |
79 |
7 |
Deep Counterfactual Regret Minimization |
80 |
7 |
Training Neural Networks with Local Error Signals |
81 |
7 |
HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving |
82 |
7 |
ELF OpenGo: an analysis and open reimplementation of AlphaZero |
83 |
6 |
Geometry and Symmetry in Short-and-Sparse Deconvolution |
84 |
6 |
Agnostic Federated Learning |
85 |
6 |
On the Limitations of Representing Functions on Sets |
86 |
6 |
Parameter-Efficient Transfer Learning for NLP |
87 |
6 |
Escaping Saddle Points with Adaptive Gradient Methods |
88 |
6 |
Batch Policy Learning under Constraints |
89 |
6 |
Understanding the Impact of Entropy on Policy Optimization |
90 |
6 |
An Instability in Variational Inference for Topic Models |
91 |
6 |
Understanding the Origins of Bias in Word Embeddings |
92 |
6 |
Making Convolutional Networks Shift-Invariant Again |
93 |
6 |
Fast Context Adaptation via Meta-Learning |
94 |
6 |
SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning |
95 |
6 |
The Odds are Odd: A Statistical Test for Detecting Adversarial Examples |
96 |
6 |
Complexity of Linear Regions in Deep Networks |
97 |
6 |
Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints |
98 |
6 |
Scalable Fair Clustering |
99 |
6 |
Learning Action Representations for Reinforcement Learning |
100 |
6 |
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density |
101 |
6 |
Natural Analysts in Adaptive Data Analysis |
102 |
6 |
Collaborative Evolutionary Reinforcement Learning |
103 |
6 |
Katalyst: Boosting Convex Katayusha for Non-Convex Problems with a Large Condition Number |
104 |
6 |
Nonconvex Variance Reduced Optimization with Arbitrary Sampling |
105 |
5 |
Loss Landscapes of Regularized Linear Autoencoders |
106 |
5 |
A Theoretical Analysis of Contrastive Unsupervised Representation Learning |
107 |
5 |
Guarantees for Spectral Clustering with Fairness Constraints |
108 |
5 |
Online Control with Adversarial Disturbances |
109 |
5 |
Width Provably Matters in Optimization for Deep Linear Neural Networks |
110 |
5 |
Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions |
111 |
5 |
MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing |
112 |
5 |
Remember and Forget for Experience Replay |
113 |
5 |
The advantages of multiple classes for reducing overfitting from test set reuse |
114 |
5 |
Model-Based Active Exploration |
115 |
5 |
Efficient Dictionary Learning with Gradient Descent |
116 |
5 |
Near optimal finite time identification of arbitrary linear dynamical systems |
117 |
5 |
EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE |
118 |
5 |
On the Impact of the Activation function on Deep Neural Networks Training |
119 |
5 |
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits |
120 |
5 |
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning |
121 |
5 |
Variational Inference for sparse network reconstruction from count data |
122 |
5 |
GEOMetrics: Exploiting Geometric Structure for Graph-Encoded Objects |
123 |
5 |
SAGA with Arbitrary Sampling |
124 |
5 |
Robust Decision Trees Against Adversarial Examples |
125 |
5 |
First-Order Adversarial Vulnerability of Neural Networks and Input Dimension |
126 |
4 |
On Variational Bounds of Mutual Information |
127 |
4 |
Differentially Private Fair Learning |
128 |
4 |
Fair k-Center Clustering for Data Summarization |
129 |
4 |
Mixture Models for Diverse Machine Translation: Tricks of the Trade |
130 |
4 |
Non-Monotonic Sequential Text Generation |
131 |
4 |
Gromov-Wasserstein Learning for Graph Matching and Node Embedding |
132 |
4 |
Counterfactual Visual Explanations |
133 |
4 |
Optimal Mini-Batch and Step Sizes for SAGA |
134 |
4 |
Infinite Mixture Prototypes for Few-shot Learning |
135 |
4 |
A Dynamical Systems Perspective on Nesterov Acceleration |
136 |
4 |
On the Complexity of Approximating Wasserstein Barycenters |
137 |
4 |
SGD without Replacement: Sharper Rates for General Smooth Convex Functions |
138 |
4 |
Learning interpretable continuous-time models of latent stochastic dynamical systems |
139 |
4 |
Bayesian Nonparametric Federated Learning of Neural Networks |
140 |
4 |
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning |
141 |
4 |
Stochastic Optimization for DC Functions and Non-smooth Non-convex Regularizers with Non-asymptotic Convergence |
142 |
4 |
Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations |
143 |
4 |
Provable Guarantees for Gradient-Based Meta-Learning |
144 |
4 |
Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules |
145 |
4 |
Generalized Majorization-Minimization |
146 |
4 |
Simple Black-box Adversarial Attacks |
147 |
4 |
Parameter efficient training of deep convolutional neural networks by dynamic sparse reparameterization |
148 |
4 |
NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks |
149 |
4 |
Are Generative Classifiers More Robust to Adversarial Attacks? |
150 |
4 |
Information-Theoretic Considerations in Batch Reinforcement Learning |
151 |
4 |
Provably efficient RL with Rich Observations via Latent State Decoding |
152 |
4 |
Locally Private Bayesian Inference for Count Models |
153 |
4 |
Bayesian Joint Spike-and-Slab Graphical Lasso |
154 |
4 |
Graph Matching Networks for Learning the Similarity of Graph Structured Objects |
155 |
4 |
Diagnosing Bottlenecks in Deep Q-learning Algorithms |
156 |
4 |
An Investigation of Model-Free Planning |
157 |
4 |
Contextual Memory Trees |
158 |
4 |
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks |
159 |
4 |
Data Shapley: Equitable Valuation of Data for Machine Learning |
160 |
4 |
SelectiveNet: A Deep Neural Network with an Integrated Reject Option |
161 |
3 |
Multi-Frequency Phase Synchronization |
162 |
3 |
Sublinear quantum algorithms for training linear and kernel-based classifiers |
163 |
3 |
Probabilistic Neural Symbolic Models for Interpretable Visual Question Answering |
164 |
3 |
Similarity of Neural Network Representations Revisited |
165 |
3 |
What is the Effect of Importance Weighting in Deep Learning? |
166 |
3 |
Analyzing and Improving Representations with the Soft Nearest Neighbor Loss |
167 |
3 |
Cautious Regret Minimization: Online Optimization with Long-Term Budget Constraints |
168 |
3 |
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks |
169 |
3 |
Geometric Scattering for Graph Data Analysis |
170 |
3 |
Stable and Fair Classification |
171 |
3 |
Analogies Explained: Towards Understanding Word Embeddings |
172 |
3 |
Finding Options that Minimize Planning Time |
173 |
3 |
Hybrid Models with Deep and Invertible Features |
174 |
3 |
Transfer Learning for Related Reinforcement Learning Tasks via Image-to-Image Translation |
175 |
3 |
Distributed Learning over Unreliable Networks |
176 |
3 |
Learning Optimal Fair Policies |
177 |
3 |
Metropolis-Hastings Generative Adversarial Networks |
178 |
3 |
Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization |
179 |
3 |
Multi-Frequency Vector Diffusion Maps |
180 |
3 |
Fairwashing: the risk of rationalization |
181 |
3 |
Finding Mixed Nash Equilibria of Generative Adversarial Networks |
182 |
3 |
Learning Generative Models across Incomparable Spaces |
183 |
3 |
Learning-to-Learn Stochastic Gradient Descent with Biased Regularization |
184 |
3 |
Plug-and-Play Methods Provably Converge with Properly Trained Denoisers |
185 |
3 |
Control Regularization for Reduced Variance Reinforcement Learning |
186 |
3 |
The Natural Language of Actions |
187 |
3 |
Almost surely constrained convex optimization |
188 |
3 |
Traditional and Heavy Tailed Self Regularization in Neural Network Models |
189 |
3 |
Self-Supervised Exploration via Disagreement |
190 |
3 |
Direct Uncertainty Prediction for Medical Second Opinions |
191 |
3 |
Wasserstein Adversarial Examples via Projected Sinkhorn Iterations |
192 |
3 |
Conditioning by adaptive sampling for robust design |
193 |
3 |
Does Data Augmentation Lead to Positive Margin? |
194 |
3 |
Greedy Layerwise Learning Can Scale To ImageNet |
195 |
3 |
DL2: Training and Querying Neural Networks with Logic |
196 |
3 |
The Value Function Polytope in Reinforcement Learning |
197 |
3 |
Action Robust Reinforcement Learning and Applications in Continuous Control |
198 |
3 |
Automatic Posterior Transformation for Likelihood-Free Inference |
199 |
3 |
Rao-Blackwellized Stochastic Gradients for Discrete Distributions |
200 |
3 |
Subspace Robust Wasserstein Distances |
201 |
3 |
Importance Sampling Policy Evaluation with an Estimated Behavior Policy |
202 |
3 |
Lipschitz Generative Adversarial Nets |
203 |
3 |
Homomorphic Sensing |
204 |
3 |
A Conditional-Gradient-Based Augmented Lagrangian Framework |
205 |
3 |
Deep Factors for Forecasting |
206 |
3 |
Learning to bid in revenue-maximizing auctions |
207 |
3 |
Molecular Hypergraph Grammar with Its Application to Molecular Optimization |
208 |
3 |
Topological Data Analysis of Decision Boundaries with Application to Model Selection |
209 |
3 |
Statistical Foundations of Virtual Democracy |
210 |
3 |
Lower Bounds for Smooth Nonconvex Finite-Sum Optimization |
211 |
3 |
Improving Adversarial Robustness via Promoting Ensemble Diversity |
212 |
3 |
Metric-Optimized Example Weights |
213 |
3 |
Nonlinear Distributional Gradient Temporal-Difference Learning |
214 |
2 |
Finite-Time Analysis of Distributed TD(0) with Linear Function Approximation on Multi-Agent Reinforcement Learning |
215 |
2 |
Domain Adaptation with Asymmetrically-Relaxed Distribution Alignment |
216 |
2 |
Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces |
217 |
2 |
Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness |
218 |
2 |
Guided evolutionary strategies: augmenting random search with surrogate gradients |
219 |
2 |
Autoregressive Energy Machines |
220 |
2 |
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback |
221 |
2 |
Online Algorithms for Rent-Or-Buy with Expert Advice |
222 |
2 |
Submodular Maximization beyond Non-negativity: Guarantees, Fast Algorithms, and Applications |
223 |
2 |
Rates of Convergence for Sparse Variational Gaussian Process Regression |
224 |
2 |
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network |
225 |
2 |
MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization |
226 |
2 |
Adaptive Sensor Placement for Continuous Spaces |
227 |
2 |
Global Convergence of Block Coordinate Descent in Deep Learning |
228 |
2 |
Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions |
229 |
2 |
Discovering Context Effects from Raw Choice Data |
230 |
2 |
Fairness without Harm: Decoupled Classifiers with Preference Guarantees |
231 |
2 |
POLITEX: Regret Bounds for Policy Iteration using Expert Prediction |
232 |
2 |
Fair Regression: Quantitative Definitions and Reduction-Based Algorithms |
233 |
2 |
Flexibly Fair Representation Learning by Disentanglement |
234 |
2 |
Proportionally Fair Clustering |
235 |
2 |
Stochastic Beams and Where To Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement |
236 |
2 |
On the Connection Between Adversarial Robustness and Saliency Map Interpretability |
237 |
2 |
Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation |
238 |
2 |
$\texttt{DoubleSqueeze}$: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression |
239 |
2 |
Almost Unsupervised Text to Speech and Automatic Speech Recognition |
240 |
2 |
Target-Based Temporal-Difference Learning |
241 |
2 |
Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets |
242 |
2 |
Toward Controlling Discrimination in Online Ad Auctions |
243 |
2 |
Learning to Infer Program Sketches |
244 |
2 |
Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation |
245 |
2 |
Classification from Positive, Unlabeled and Biased Negative Data |
246 |
2 |
Neural Network Attributions: A Causal Perspective |
247 |
2 |
Learning Discrete Structures for Graph Neural Networks |
248 |
2 |
Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group |
249 |
2 |
CompILE: Compositional Imitation Learning and Execution |
250 |
2 |
Statistics and Samples in Distributional Reinforcement Learning |
251 |
2 |
Exploring the Landscape of Spatial Robustness |
252 |
2 |
EigenDamage: Structured Pruning in the Kronecker-Factored Eigenbasis |
253 |
2 |
Provably Efficient Imitation Learning from Observation Alone |
254 |
2 |
Alternating Minimizations Converge to Second-Order Optimal Solutions |
255 |
2 |
On the statistical rate of nonlinear recovery in generative models with heavy-tailed data |
256 |
2 |
Sensitivity Analysis of Linear Structural Causal Models |
257 |
2 |
Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization |
258 |
2 |
Beyond Adaptive Submodularity: Approximation Guarantees of Greedy Policy with Adaptive Submodularity Ratio |
259 |
2 |
Band-limited Training and Inference for Convolutional Neural Networks |
260 |
2 |
Multivariate Submodular Optimization |
261 |
2 |
Domain Agnostic Learning with Disentangled Representations |
262 |
2 |
Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling |
263 |
2 |
Parsimonious Black-Box Adversarial Attacks via Efficient Combinatorial Optimization |
264 |
2 |
Robust Learning from Untrusted Sources |
265 |
2 |
Trading Redundancy for Communication: Speeding up Distributed SGD for Non-convex Optimization |
266 |
2 |
On Connected Sublevel Sets in Deep Learning |
267 |
2 |
Sum-of-Squares Polynomial Flow |
268 |
2 |
On the Convergence and Robustness of Adversarial Training |
269 |
2 |
Active Learning for Decision-Making from Imbalanced Observational Data |
270 |
2 |
Low Latency Privacy Preserving Inference |
271 |
2 |
Weak Detection of Signal in the Spiked Wigner Model |
272 |
2 |
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study |
273 |
2 |
Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization |
274 |
2 |
Graphical-model based estimation and inference for differential privacy |
275 |
2 |
Differentiable Linearized ADMM |
276 |
2 |
CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration |
277 |
2 |
Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm |
278 |
2 |
Fingerprint Policy Optimisation for Robust Reinforcement Learning |
279 |
2 |
Safe Grid Search with Optimal Complexity |
280 |
2 |
Dynamic Weights in Multi-Objective Deep Reinforcement Learning |
281 |
2 |
DeepMDP: Learning Continuous Latent Space Models for Representation Learning |
282 |
2 |
On Symmetric Losses for Learning from Corrupted Labels |
283 |
2 |
A Kernel Perspective for Regularizing Deep Neural Networks |
284 |
2 |
Random Matrix Improved Covariance Estimation for a Large Class of Metrics |
285 |
2 |
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning |
286 |
2 |
Adversarial Generation of Time-Frequency Features with application in audio synthesis |
287 |
2 |
Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models |
288 |
2 |
Correlated Variational Auto-Encoders |
289 |
2 |
Maximum Likelihood Estimation for Learning Populations of Parameters |
290 |
2 |
Self-Attention Graph Pooling |
291 |
2 |
Fast Rates for a kNN Classifier Robust to Unknown Asymmetric Label Noise |
292 |
2 |
Learning to Prove Theorems via Interacting with Proof Assistants |
293 |
2 |
A Composite Randomized Incremental Gradient Method |
294 |
2 |
GMNN: Graph Markov Neural Networks |
295 |
2 |
Grid-Wise Control for Multi-Agent Reinforcement Learning in Video Game AI |
296 |
2 |
When Samples Are Strategically Selected |
297 |
2 |
Processing Megapixel Images with Deep Attention-Sampling Models |
298 |
2 |
Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models |
299 |
2 |
PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization |
300 |
2 |
A Contrastive Divergence for Combining Variational Inference and MCMC |
301 |
2 |
Adversarial Attacks on Node Embeddings via Graph Poisoning |
302 |
1 |
Understanding Priors in Bayesian Neural Networks at the Unit Level |
303 |
1 |
Semi-Cyclic Stochastic Gradient Descent |
304 |
1 |
Learning Dependency Structures for Weak Supervision Models |
305 |
1 |
Faster Attend-Infer-Repeat with Tractable Probabilistic Models |
306 |
1 |
Hierarchical Importance Weighted Autoencoders |
307 |
1 |
Unsupervised Label Noise Modeling and Loss Correction |
308 |
1 |
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning |
309 |
1 |
The information-theoretic value of unlabeled data in semi-supervised learning |
310 |
1 |
Cross-Domain 3D Equivariant Image Embeddings |
311 |
1 |
Neural Collaborative Subspace Clustering |
312 |
1 |
PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits |
313 |
1 |
Sequential Facility Location: Approximate Submodularity and Greedy Algorithm |
314 |
1 |
Good Initializations of Variational Bayes for Deep Models |
315 |
1 |
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities |
316 |
1 |
Nonparametric Bayesian Deep Networks with Local Competition |
317 |
1 |
Communication-Constrained Inference and the Role of Shared Randomness |
318 |
1 |
Decentralized Exploration in Multi-Armed Bandits |
319 |
1 |
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations |
320 |
1 |
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference |
321 |
1 |
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians |
322 |
1 |
DAG-GNN: DAG Structure Learning with Graph Neural Networks |
323 |
1 |
Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning |
324 |
1 |
Partially Linear Additive Gaussian Graphical Models |
325 |
1 |
Learning Context-dependent Label Permutations for Multi-label Classification |
326 |
1 |
Approximation and non-parametric estimation of ResNet-type convolutional neural networks |
327 |
1 |
Robust Inference via Generative Classifiers for Handling Noisy Labels |
328 |
1 |
Robust Estimation of Tree Structured Gaussian Graphical Models |
329 |
1 |
Graph Resistance and Learning from Pairwise Comparisons |
330 |
1 |
Coresets for Ordered Weighted Clustering |
331 |
1 |
Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations |
332 |
1 |
Zero-Shot Knowledge Distillation in Deep Networks |
333 |
1 |
Breaking the gridlock in Mixture-of-Experts: Consistent and Efficient Algorithms |
334 |
1 |
Spectral Clustering of Signed Graphs via Matrix Power Means |
335 |
1 |
Adaptive Regret of Convex and Smooth Functions |
336 |
1 |
Scaling Up Ordinal Embedding: A Landmark Approach |
337 |
1 |
Understanding and correcting pathologies in the training of learned optimizers |
338 |
1 |
On Scalable and Efficient Computation of Large Scale Optimal Transport |
339 |
1 |
A fully differentiable beam search decoder |
340 |
1 |
Online Variance Reduction with Mixtures |
341 |
1 |
MIWAE: Deep Generative Modelling and Imputation of Incomplete Data Sets |
342 |
1 |
A Polynomial Time MCMC Method for Sampling from Continuous Determinantal Point Processes |
343 |
1 |
Fairness risk measures |
344 |
1 |
Fairness-Aware Learning for Continuous Attributes and Treatments |
345 |
1 |
Neural Separation of Observed and Unobserved Distributions |
346 |
1 |
Reinforcement Learning in Configurable Continuous Environments |
347 |
1 |
Bit-Swap: Recursive Bits-Back Coding for Lossless Compression with Hierarchical Latent Variables |
348 |
1 |
Adaptive Scale-Invariant Online Algorithms for Learning Linear Models |
349 |
1 |
Bridging Theory and Algorithm for Domain Adaptation |
350 |
1 |
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement |
351 |
1 |
Learning Discrete and Continuous Factors of Data via Alternating Disentanglement |
352 |
1 |
CAB: Continuous Adaptive Blending for Policy Evaluation and Learning |
353 |
1 |
Learning Structured Decision Problems with Unawareness |
354 |
1 |
Adaptive Monte Carlo Multiple Testing via Multi-Armed Bandits |
355 |
1 |
Competing Against Nash Equilibria in Adversarially Changing Zero-Sum Games |
356 |
1 |
Complementary-Label Learning for Arbitrary Losses and Models |
357 |
1 |
Neuron birth-death dynamics accelerates gradient descent and converges asymptotically |
358 |
1 |
Unifying Orthogonal Monte Carlo Methods |
359 |
1 |
Differentially Private Empirical Risk Minimization with Non-convex Loss Functions |
360 |
1 |
Towards a Deep and Unified Understanding of Deep Neural Models in NLP |
361 |
1 |
State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations |
362 |
1 |
Geometric Losses for Distributional Learning |
363 |
1 |
Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting |
364 |
1 |
Co-manifold learning with missing data |
365 |
1 |
Compositional Fairness Constraints for Graph Embeddings |
366 |
1 |
Improved Convergence for $\ell_1$ and $\ell_\infty$ Regression via Iteratively Reweighted Least Squares |
367 |
1 |
Transfer of Samples in Policy Search via Multiple Importance Sampling |
368 |
1 |
Sample-Optimal Parametric Q-Learning Using Linearly Additive Features |
369 |
1 |
Bias Also Matters: Bias Attribution for Deep Neural Network Explanation |
370 |
1 |
Combining parametric and nonparametric models for off-policy evaluation |
371 |
1 |
Disentangled Graph Convolutional Networks |
372 |
1 |
Differentiable Dynamic Normalization for Learning Deep Representation |
373 |
1 |
Relational Pooling for Graph Representations |
374 |
1 |
Hessian Aided Policy Gradient |
375 |
1 |
Estimate Sequences for Variance-Reduced Stochastic Composite Optimization |
376 |
1 |
Addressing the Loss-Metric Mismatch with Adaptive Loss Alignment |
377 |
1 |
Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm |
378 |
1 |
Tensor Variable Elimination for Plated Factor Graphs |
379 |
1 |
Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances |
380 |
1 |
Position-aware Graph Neural Networks |
381 |
1 |
How does Disagreement Help Generalization against Label Corruption? |
382 |
1 |
IMEXnet – A Forward Stable Deep Neural Network |
383 |
1 |
Inferring Heterogeneous Causal Effects in Presence of Spatial Confounding |
384 |
1 |
Bayesian Optimization Meets Bayesian Optimal Stopping |
385 |
1 |
Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity |
386 |
1 |
Equivariant Transformer Networks |
387 |
1 |
Submodular Observation Selection and Information Gathering for Quadratic Models |
388 |
1 |
Conditional Independence in Testing Bayesian Networks |
389 |
1 |
MONK — Outlier-Robust Mean Embedding Estimation by Median-of-Means |
390 |
1 |
Improved Parallel Algorithms for Density-Based Network Clustering |
391 |
1 |
Graph Element Networks: adaptive, structured computation and memory |
392 |
1 |
Learning Models from Data with Measurement Error: Tackling Underreporting |
393 |
1 |
Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization |
394 |
1 |
A Deep Reinforcement Learning Perspective on Internet Congestion Control |
395 |
1 |
Orthogonal Random Forest for Causal Inference |
396 |
1 |
Classifying Treatment Responders Under Causal Effect Monotonicity |
397 |
1 |
On the Generalization Gap in Reparameterizable Reinforcement Learning |
398 |
1 |
Approximated Oracle Filter Pruning for Destructive CNN Width Optimization |
399 |
1 |
Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models |
400 |
1 |
Better generalization with less data using robust gradient descent |
401 |
1 |
Monge blunts Bayes: Hardness Results for Adversarial Training |
402 |
1 |
Beyond the Chinese Restaurant and Pitman-Yor processes: Statistical Models with double power-law behavior |
403 |
1 |
Variational Annealing of GANs: A Langevin Perspective |
404 |
1 |
On the Design of Estimators for Bandit Off-Policy Evaluation |
405 |
1 |
A Large-Scale Study on Regularization and Normalization in GANs |
406 |
1 |
Automated Model Selection with Bayesian Quadrature |
407 |
1 |
Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance |
408 |
1 |
Deep Gaussian Processes with Importance-Weighted Variational Inference |
409 |
1 |
Noisy Dual Principal Component Pursuit |
410 |
1 |
Transferable Clean-Label Poisoning Attacks on Deep Neural Nets |
411 |
1 |
Bilinear Bandits with Low-rank Structure |
412 |
1 |
Structured agents for physical construction |
413 |
1 |
Estimating Information Flow in Deep Neural Networks |
414 |
1 |
Understanding and Utilizing Deep Neural Networks Trained with Noisy Labels |
415 |
1 |
GOODE: A Gaussian Off-The-Shelf Ordinary Differential Equation Solver |
416 |
1 |
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning |
417 |
1 |
Optimal Algorithms for Lipschitz Bandits with Heavy-tailed Rewards |
418 |
1 |
Distribution calibration for regression |
419 |
1 |
Distributed Learning with Sublinear Communication |
420 |
1 |
Temporal Gaussian Mixture Layer for Videos |
421 |
1 |
Stochastic Deep Networks |
422 |
1 |
Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA |
423 |
1 |
Efficient optimization of loops and limits with randomized telescoping sums |
424 |
1 |
Robust Influence Maximization for Hyperparametric Models |
425 |
1 |
Communication Complexity in Locally Private Distribution Estimation and Heavy Hitters |
426 |
1 |
Convolutional Poisson Gamma Belief Network |
427 |
1 |
SWALP : Stochastic Weight Averaging in Low Precision Training |
428 |
1 |
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting |
429 |
1 |
Beyond Backprop: Online Alternating Minimization with Auxiliary Variables |
430 |
1 |
Discovering Options for Exploration by Minimizing Cover Time |
431 |
1 |
Static Automatic Batching In TensorFlow |
432 |
1 |
Rotation Invariant Householder Parameterization for Bayesian PCA |
433 |
1 |
Fault Tolerance in Iterative-Convergent Machine Learning |
434 |
1 |
SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver |
435 |
1 |
Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications |
436 |
1 |
Generalized Linear Rule Models |
437 |
1 |
Optimal Minimal Margin Maximization with Boosting |
438 |
1 |
GDPP: Learning Diverse Generations using Determinantal Point Processes |
439 |
1 |
Per-Decision Option Discounting |
440 |
1 |
Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search |
441 |
1 |
BayesNAS: A Bayesian Approach for Neural Architecture Search |
442 |
1 |
Collaborative Channel Pruning for Deep Networks |
443 |
1 |
Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff |
444 |
1 |
Learning from a Learner |
445 |
1 |
Rate Distortion For Model Compression: From Theory To Practice |
446 |
1 |
Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty |
447 |
1 |
Imitation Learning from Imperfect Demonstration |
448 |
1 |
Switching Linear Dynamics for Variational Bayes Filtering |
449 |
1 |
Feature-Critic Networks for Heterogeneous Domain Generalization |
450 |
1 |
Entropic GANs meet VAEs: A Statistical Approach to Compute Sample Likelihoods in GANs |
451 |
1 |
Predictor-Corrector Policy Optimization |
452 |
1 |
EMI: Exploration with Mutual Information |
453 |
1 |
Wasserstein of Wasserstein Loss for Learning Generative Models |
454 |
1 |
Learning Optimal Linear Regularizers |
455 |
1 |
A Statistical Investigation of Long Memory in Language and Music |
456 |
1 |
Characterization of Convex Objective Functions and Optimal Expected Convergence Rates for SGD |
457 |
1 |
Generative Adversarial User Model for Reinforcement Learning Based Recommendation System |
458 |
1 |
Inference and Sampling of $K_{33}$-free Ising Models |
459 |
1 |
CURIOUS: Intrinsically Motivated Modular Multi-Goal Reinforcement Learning |
460 |
1 |
A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation |
461 |
1 |
Learning to Optimize Multigrid PDE Solvers |
462 |
1 |
LGM-Net: Learning to Generate Matching Networks for Few-Shot Learning |
463 |
1 |
Combating Label Noise in Deep Learning using Abstention |
464 |
1 |
On The Power of Curriculum Learning in Training Deep Networks |
465 |
1 |
Learning to Clear the Market |
466 |
1 |
Online learning with kernel losses |
467 |
1 |
Teaching a black-box learner |
468 |
1 |
Learning to Groove with Inverse Sequence Transformations |
469 |
1 |
Stable-Predictive Optimistic Counterfactual Regret Minimization |
470 |
1 |
Faster Stochastic Alternating Direction Method of Multipliers for Nonconvex Optimization |
471 |
1 |
Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization |
472 |
1 |
Making Deep Q-learning methods robust to time discretization |
473 |
1 |
Validating Causal Inference Models via Influence Functions |
474 |
0 |
Lorentzian Distance Learning for Hyperbolic Representations |
475 |
0 |
Pareto Optimal Streaming Unsupervised Classification |
476 |
0 |
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition |
477 |
0 |
Greedy Orthogonal Pivoting Algorithm for Non-Negative Matrix Factorization |
478 |
0 |
Partially Exchangeable Networks and Architectures for Learning Summary Statistics in Approximate Bayesian Computation |
479 |
0 |
Hyperbolic Disk Embeddings for Directed Acyclic Graphs |
480 |
0 |
Faster Algorithms for Binary Matrix Factorization |
481 |
0 |
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model |
482 |
0 |
ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables |
483 |
0 |
Unsupervised Deep Learning by Neighbourhood Discovery |
484 |
0 |
Discovering Conditionally Salient Features with Statistical Guarantees |
485 |
0 |
Dropout as a Structured Shrinkage Prior |
486 |
0 |
Categorical Feature Compression via Submodular Optimization |
487 |
0 |
Exploiting structure of uncertainty for efficient matroid semi-bandits |
488 |
0 |
Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity |
489 |
0 |
Learning and Data Selection in Big Datasets |
490 |
0 |
The Wasserstein Transform |
491 |
0 |
Distributed, Egocentric Representations of Graphs for Detecting Critical Structures |
492 |
0 |
COMIC: Multi-view Clustering Without Parameter Selection |
493 |
0 |
Random Walks on Hypergraphs with Edge-Dependent Vertex Weights |
494 |
0 |
Supervised Hierarchical Clustering with Exponential Linkage |
495 |
0 |
Scale-free adaptive planning for deterministic dynamics & discounted rewards |
496 |
0 |
Learning Distance for Sequences by Learning a Ground Metric |
497 |
0 |
Efficient Training of BERT by Progressively Stacking |
498 |
0 |
Making Decisions that Reduce Discriminatory Impacts |
499 |
0 |
On the Long-term Impact of Algorithmic Decision Policies: Effort Unfairness and Feature Segregation through Social Learning |
500 |
0 |
Kernel Normalized Cut: a Theoretical Revisit |
501 |
0 |
Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops |
502 |
0 |
Trainable Decoding of Sets of Sequences for Neural Sequence Models |
503 |
0 |
Spectral Approximate Inference |
504 |
0 |
Empirical Analysis of Beam Search Performance Degradation in Neural Sequence Models |
505 |
0 |
LIT: Learned Intermediate Representation Training for Model Compression |
506 |
0 |
A Better k-means++ Algorithm via Local Search |
507 |
0 |
Anytime Online-to-Batch, Optimism and Acceleration |
508 |
0 |
Improving Neural Language Modeling via Adversarial Training |
509 |
0 |
Fast Algorithm for Generalized Multinomial Models with Ranking Data |
510 |
0 |
Fast and Stable Maximum Likelihood Estimation for Incomplete Multinomial Models |
511 |
0 |
Unreproducible Research is Reproducible |
512 |
0 |
Deep Residual Output Layers for Neural Language Generation |
513 |
0 |
Online Adaptive Principal Component Analysis and Its extensions |
514 |
0 |
Meta-Learning Neural Bloom Filters |
515 |
0 |
Efficient Full-Matrix Adaptive Regularization |
516 |
0 |
Recursive Sketches for Modular Deep Learning |
517 |
0 |
Efficient On-Device Models using Neural Projections |
518 |
0 |
Ladder Capsule Network |
519 |
0 |
Mallows ranking models: maximum likelihood estimate and regeneration |
520 |
0 |
Learning to select for a predefined ranking |
521 |
0 |
Dimensionality Reduction for Tukey Regression |
522 |
0 |
Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems |
523 |
0 |
Demystifying Dropout |
524 |
0 |
Learning to Exploit Long-term Relational Dependencies in Knowledge Graphs |
525 |
0 |
Concrete Autoencoders: Differentiable Feature Selection and Reconstruction |
526 |
0 |
Why do Larger Models Generalize Better? A Theoretical Perspective via the XOR Problem |
527 |
0 |
Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means Clustering |
528 |
0 |
DBSCAN++: Towards fast and scalable density clustering |
529 |
0 |
Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case |
530 |
0 |
Accelerated Flow for Probability Distributions |
531 |
0 |
Model Function Based Conditional Gradient Method with Armijo-like Line Search |
532 |
0 |
Iterative Linearized Control: Stable Algorithms and Complexity Guarantees |
533 |
0 |
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss |
534 |
0 |
Adaptive Antithetic Sampling for Variance Reduction |
535 |
0 |
State-Regularized Recurrent Neural Networks |
536 |
0 |
Learning What and Where to Transfer |
537 |
0 |
Adversarial Online Learning with noise |
538 |
0 |
Replica Conditional Sequential Monte Carlo |
539 |
0 |
Gaining Free or Low-Cost Interpretability with Interpretable Partial Substitute |
540 |
0 |
Calibrated Model-Based Deep Reinforcement Learning |
541 |
0 |
Power k-Means Clustering |
542 |
0 |
Hierarchically Structured Meta-learning |
543 |
0 |
Incremental Randomized Sketching for Online Kernel Learning |
544 |
0 |
Exploring interpretable LSTM neural networks over multi-variable data |
545 |
0 |
RaFM: Rank-Aware Factorization Machines |
546 |
0 |
Functional Transparency for Structured Data: a Game-Theoretic Approach |
547 |
0 |
Projections for Approximate Policy Iteration Algorithms |
548 |
0 |
Differentially Private Learning of Geometric Concepts |
549 |
0 |
Online Learning with Sleeping Experts and Feedback Graphs |
550 |
0 |
Multi-objective training of Generative Adversarial Networks with multiple discriminators |
551 |
0 |
Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy |
552 |
0 |
Model Comparison for Semantic Grouping |
553 |
0 |
Linear-Complexity Data-Parallel Earth Mover’s Distance Approximations |
554 |
0 |
Variational Laplace Autoencoders |
555 |
0 |
Online Convex Optimization in Adversarial Markov Decision Processes |
556 |
0 |
Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization |
557 |
0 |
Doubly Robust Joint Learning for Recommendation on Data Missing Not at Random |
558 |
0 |
On Sparse Linear Regression in the Local Differential Privacy Model |
559 |
0 |
Matrix-Free Preconditioning in Online Learning |
560 |
0 |
Data Poisoning Attacks on Stochastic Bandits |
561 |
0 |
Learning Neurosymbolic Generative Models via Program Synthesis |
562 |
0 |
Kernel-Based Reinforcement Learning in Robust Markov Decision Processes |
563 |
0 |
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning |
564 |
0 |
Differential Inclusions for Modeling Nonsmooth ADMM Variants: A Continuous Limit Theory |
565 |
0 |
Stochastic Blockmodels meet Graph Neural Networks |
566 |
0 |
Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization |
567 |
0 |
A Recurrent Neural Cascade-based Model for Continuous-Time Diffusion |
568 |
0 |
Exploration Conscious Reinforcement Learning Revisited |
569 |
0 |
The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions |
570 |
0 |
Interpreting Adversarially Trained Convolutional Neural Networks |
571 |
0 |
Deep Generative Learning via Variational Gradient Flow |
572 |
0 |
Breaking Inter-Layer Co-Adaptation by Classifier Anonymization |
573 |
0 |
Bayesian Optimization of Composite Functions |
574 |
0 |
First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems |
575 |
0 |
Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data |
576 |
0 |
Open Vocabulary Learning on Source Code with a Graph-Structured Cache |
577 |
0 |
Toward Understanding the Importance of Noise in Training Neural Networks |
578 |
0 |
Invariant-Equivariant Representation Learning for Multi-Class Data |
579 |
0 |
Active Learning with Disagreement Graphs |
580 |
0 |
Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap |
581 |
0 |
Learning to Route in Similarity Graphs |
582 |
0 |
Active Learning for Probabilistic Structured Prediction of Cuts and Matchings |
583 |
0 |
The Variational Predictive Natural Gradient |
584 |
0 |
Deep Compressed Sensing |
585 |
0 |
Minimal Achievable Sufficient Statistic Learning |
586 |
0 |
Bayesian Generative Active Deep Learning |
587 |
0 |
Hierarchical Decompositional Mixtures of Variational Autoencoders |
588 |
0 |
Efficient learning of smooth probability functions from Bernoulli tests with guarantees |
589 |
0 |
Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments |
590 |
0 |
Discriminative Regularization for Latent Variable Models with Applications to Electrocardiography |
591 |
0 |
Understanding and Accelerating Particle-Based Variational Inference |
592 |
0 |
Connectivity-Optimized Representation Learning via Persistent Homology |
593 |
0 |
Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models |
594 |
0 |
Dead-ends and Secure Exploration in Reinforcement Learning |
595 |
0 |
Predicate Exchange: Inference with Declarative Knowledge |
596 |
0 |
Fast Direct Search in an Optimally Compressed Continuous Target Space for Efficient Multi-Label Active Learning |
597 |
0 |
Adversarially Learned Representations for Information Obfuscation and Inference |
598 |
0 |
Active Embedding Search via Noisy Paired Comparisons |
599 |
0 |
A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes |
600 |
0 |
Hiring Under Uncertainty |
601 |
0 |
On Medians of (Randomized) Pairwise Means |
602 |
0 |
Towards Accurate Model Selection in Deep Unsupervised Domain Adaptation |
603 |
0 |
Overcoming Multi-model Forgetting |
604 |
0 |
Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation |
605 |
0 |
Phase transition in PCA with missing data: Reduced signal-to-noise ratio, not sample size! |
606 |
0 |
More Efficient Off-Policy Evaluation through Regularized Targeted Learning |
607 |
0 |
A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs |
608 |
0 |
Scalable Training of Inference Networks for Gaussian-Process Models |
609 |
0 |
Submodular Cost Submodular Cover with an Approximate Oracle |
610 |
0 |
Riemannian adaptive stochastic gradient algorithms on matrix manifolds |
611 |
0 |
Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN |
612 |
0 |
Training CNNs with Selective Allocation of Channels |
613 |
0 |
Neural Inverse Knitting: From Images to Manufacturing Instructions |
614 |
0 |
Discovering Latent Covariance Structures for Multiple Time Series |
615 |
0 |
Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation |
616 |
0 |
Transferable Adversarial Training: A General Approach to Adapting Deep Classifiers |
617 |
0 |
Adjustment Criteria for Generalizing Experimental Findings |
618 |
0 |
Kernel Mean Matching for Content Addressability of GANs |
619 |
0 |
Incorporating Grouping Information into Bayesian Decision Tree Ensembles |
620 |
0 |
Towards Understanding Knowledge Distillation |
621 |
0 |
New results on information theoretic clustering |
622 |
0 |
Anomaly Detection With Multiple-Hypotheses Predictions |
623 |
0 |
Trajectory-Based Off-Policy Deep Reinforcement Learning |
624 |
0 |
LegoNet: Efficient Convolutional Neural Networks with Lego Filters |
625 |
0 |
Lossless or Quantized Boosting with Integer Arithmetic |
626 |
0 |
Variational Russian Roulette for Deep Bayesian Nonparametrics |
627 |
0 |
Approximating Orthogonal Matrices with Effective Givens Factorization |
628 |
0 |
Random Function Priors for Correlation Modeling |
629 |
0 |
Learning Classifiers for Target Domain with Limited or No Labels |
630 |
0 |
On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization |
631 |
0 |
Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models |
632 |
0 |
Composing Value Functions in Reinforcement Learning |
633 |
0 |
DP-GP-LVM: A Bayesian Non-Parametric Model for Learning Multivariate Dependency Structures |
634 |
0 |
Distributed Weighted Matching via Randomized Composable Coresets |
635 |
0 |
Causal Identification under Markov Equivalence: Completeness Results |
636 |
0 |
Context-Aware Zero-Shot Learning for Object Recognition |
637 |
0 |
Dynamic Learning with Frequent New Product Launches: A Sequential Multinomial Logit Bandit Problem |
638 |
0 |
DeepNose: Using artificial neural networks to represent the space of odorants |
639 |
0 |
Data Poisoning Attacks in Multi-Party Learning |
640 |
0 |
Screening rules for Lasso with non-convex Sparse Regularizers |
641 |
0 |
Concentration Inequalities for Conditional Value at Risk |
642 |
0 |
Characterizing Well-Behaved vs. Pathological Deep Neural Networks |
643 |
0 |
Dynamic Measurement Scheduling for Event Forecasting using Deep RL |
644 |
0 |
Taming MAML: Efficient unbiased meta-reinforcement learning |
645 |
0 |
Online Learning to Rank with Features |
646 |
0 |
A Wrapped Normal Distribution on Hyperbolic Space for Gradient-Based Learning |
647 |
0 |
Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data |
648 |
0 |
SELFIE: Refurbishing Unclean Samples for Robust Deep Learning |
649 |
0 |
Learning Novel Policies For Tasks |
650 |
0 |
End-to-End Probabilistic Inference for Nonstationary Audio Analysis |
651 |
0 |
Trimming the $\ell_1$ Regularizer: Statistical Analysis, Optimization, and Applications to Deep Learning |
652 |
0 |
Disentangling Disentanglement in Variational Autoencoders |
653 |
0 |
Dimension-Wise Importance Sampling Weight Clipping for Sample-Efficient Reinforcement Learning |
654 |
0 |
Cognitive model priors for predicting human decisions |
655 |
0 |
Overcoming Mean-Field Approximations in Recurrent Gaussian Process Models |
656 |
0 |
A Gradual, Semi-Discrete Approach to Generative Network Training via Explicit Wasserstein Minimization |
657 |
0 |
Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging |
658 |
0 |
Fast and Flexible Inference of Joint Distributions from their Marginals |
659 |
0 |
Collective Model Fusion for Multiple Black-Box Experts |
660 |
0 |
Correlated bandits or: How to minimize mean-squared error online |
661 |
0 |
On discriminative learning of prediction uncertainty |
662 |
0 |
A Multitask Multiple Kernel Learning Algorithm for Survival Analysis with Application to Cancer Biology |
663 |
0 |
Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation |
664 |
0 |
ME-Net: Towards Effective Adversarial Robustness with Matrix Estimation |
665 |
0 |
Learning with Bad Training Data via Iterative Trimmed Loss Minimization |
666 |
0 |
Target Tracking for Contextual Bandits: Application to Demand Side Management |
667 |
0 |
Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems |
668 |
0 |
Graph Convolutional Gaussian Processes |
669 |
0 |
Exploiting Worker Correlation for Label Aggregation in Crowdsourcing |
670 |
0 |
Self-similar Epochs: Value in arrangement |
671 |
0 |
HyperGAN: A Generative Model for Diverse, Performant Neural Networks |
672 |
0 |
A Personalized Affective Memory Model for Improving Emotion Recognition |
673 |
0 |
Memory-Optimal Direct Convolutions for Maximizing Classification Accuracy in Embedded Applications |
674 |
0 |
Poission Subsampled Rényi Differential Privacy |
675 |
0 |
Jumpout : Improved Dropout for Deep Neural Networks with ReLUs |
676 |
0 |
Geometry Aware Convolutional Filters for Omnidirectional Images Representation |
677 |
0 |
A Framework for Bayesian Optimization in Embedded Subspaces |
678 |
0 |
Area Attention |
679 |
0 |
The Implicit Fairness Criterion of Unconstrained Learning |
680 |
0 |
Co-Representation Network for Generalized Zero-Shot Learning |
681 |
0 |
Sublinear Space Private Algorithms Under the Sliding Window Model |
682 |
0 |
Optimality Implies Kernel Sum Classifiers are Statistically Efficient |
683 |
0 |
Conditional Gradient Methods via Stochastic Path-Integrated Differential Estimator |
684 |
0 |
Shallow-Deep Networks: Understanding and Mitigating Network Overthinking |
685 |
0 |
Neurally-Guided Structure Inference |
686 |
0 |
An Optimal Private Stochastic-MAB Algorithm based on Optimal Private Stopping Rule |
687 |
0 |
A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent |
688 |
0 |
Active Manifolds: A non-linear analogue to Active Subspaces |
689 |
0 |
Bayesian Counterfactual Risk Minimization |
690 |
0 |
Compressing Gradient Optimizers via Count-Sketches |
691 |
0 |
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks |
692 |
0 |
White-box vs Black-box: Bayes Optimal Strategies for Membership Inference |
693 |
0 |
Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction |
694 |
0 |
Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models |
695 |
0 |
Sublinear Time Nearest Neighbor Search over Generalized Weighted Space |
696 |
0 |
Bayesian leave-one-out cross-validation for large data |
697 |
0 |
Formal Privacy for Functional Data with Gaussian Perturbations |
698 |
0 |
Separable value functions across time-scales |
699 |
0 |
Dirichlet Simplex Nest and Geometric Inference |
700 |
0 |
Scalable Learning in Reproducing Kernel Krein Spaces |
701 |
0 |
Heterogeneous Model Reuse via Optimizing Multiparty Multiclass Margin |
702 |
0 |
HexaGAN: Generative Adversarial Nets for Real World Classification |
703 |
0 |
Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces |
704 |
0 |
On Dropout and Nuclear Norm Regularization |
705 |
0 |
Phaseless PCA: Low-Rank Matrix Recovery from Column-wise Phaseless Measurements |
706 |
0 |
Understanding and Controlling Memory in Recurrent Neural Networks |
707 |
0 |
kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection |
708 |
0 |
Improved Dynamic Graph Learning through Fault-Tolerant Sparsification |
709 |
0 |
Non-Parametric Priors For Generative Adversarial Networks |
710 |
0 |
Regularization in directable environments with application to Tetris |
711 |
0 |
Imputing Missing Events in Continuous-Time Event Streams |
712 |
0 |
Learning to Convolve: A Generalized Weight-Tying Approach |
713 |
0 |
Large-Scale Sparse Kernel Canonical Correlation Analysis |
714 |
0 |
Curvature-Exploiting Acceleration of Elastic Net Computations |
715 |
0 |
Doubly-Competitive Distribution Estimation |
716 |
0 |
AUCµ: A Performance Metric for Multi-Class Machine Learning Models |
717 |
0 |
Neural Joint Source-Channel Coding |
718 |
0 |
Flat Metric Minimization with Applications in Generative Modeling |
719 |
0 |
Weakly-Supervised Temporal Localization via Occurrence Count Learning |
720 |
0 |
Rehashing Kernel Evaluation in High Dimensions |
721 |
0 |
Learning to Collaborate in Markov Decision Processes |
722 |
0 |
Dual Entangled Polynomial Code: Three-Dimensional Coding for Distributed Matrix Multiplication |
723 |
0 |
A Persistent Weisfeiler–Lehman Procedure for Graph Classification |
724 |
0 |
Neural Logic Reinforcement Learning |
725 |
0 |
Revisiting precision recall definition for generative modeling |
726 |
0 |
Acceleration of SVRG and Katyusha X by Inexact Preconditioning |
727 |
0 |
Look Ma, No Latent Variables: Accurate Cutset Networks via Compilation |
728 |
0 |
Bayesian Deconditional Kernel Mean Embeddings |
729 |
0 |
Optimistic Policy Optimization via Multiple Importance Sampling |
730 |
0 |
Multivariate-Information Adversarial Ensemble for Scalable Joint Distribution Matching |
731 |
0 |
Learning Hawkes Processes Under Synchronization Noise |
732 |
0 |
Automatic Classifiers as Scientific Instruments: One Step Further Away from Ground-Truth |
733 |
0 |
Blended Conditonal Gradients |
734 |
0 |
Boosted Density Estimation Remastered |
735 |
0 |
Distributional Reinforcement Learning for Efficient Exploration |
736 |
0 |
Generalized Approximate Survey Propagation for High-Dimensional Estimation |
737 |
0 |
Projection onto Minkowski Sums with Application to Constrained Learning |
738 |
0 |
Revisiting the Softmax Bellman Operator: New Benefits and New Perspective |
739 |
0 |
Voronoi Boundary Classification: A High-Dimensional Geometric Approach via Weighted Monte Carlo Integration |
740 |
0 |
PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach |
741 |
0 |
Uniform Convergence Rate of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension |
742 |
0 |
Circuit-GNN: Graph Neural Networks for Distributed Circuit Design |
743 |
0 |
Particle Flow Bayes’ Rule |
744 |
0 |
Multiplicative Weights Updates as a distributed constrained optimization algorithm: Convergence to second-order stationary points almost always |
745 |
0 |
Generalized No Free Lunch Theorem for Adversarial Robustness |
746 |
0 |
Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation |
747 |
0 |
Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations |
748 |
0 |
Shape Constraints for Set Functions |
749 |
0 |
Optimal Continuous DR-Submodular Maximization and Applications to Provable Mean Field Inference |
750 |
0 |
Sparse Extreme Multi-label Learning with Oracle Property |
751 |
0 |
Stein Point Markov Chain Monte Carlo |
752 |
0 |
Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates |
753 |
0 |
Graph Neural Network for Music Score Data and Modeling Expressive Piano Performance |
754 |
0 |
Policy Consolidation for Continual Reinforcement Learning |
755 |
0 |
POPQORN: Quantifying Robustness of Recurrent Neural Networks |
756 |
0 |
Multi-Agent Adversarial Inverse Reinforcement Learning |
757 |
0 |
Amortized Monte Carlo Integration |
758 |
0 |
LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations |
759 |
0 |
PAC Learnability of Node Functions in Networked Dynamical Systems |
760 |
0 |
TibGM: A Transferable and Information-Based Graphical Model Approach for Reinforcement Learning |
761 |
0 |
Adversarial camera stickers: A physical camera-based attack on deep learning systems |
762 |
0 |
Composing Entropic Policies using Divergence Correction |
763 |
0 |
TapNet: Neural Network Augmented with Task-Adaptive Projection for Few-Shot Learning |
764 |
0 |
Improving Model Selection by Employing the Test Data |
765 |
0 |
Understanding MCMC Dynamics as Flows on the Wasserstein Space |
766 |
0 |
On Certifying Non-Uniform Bounds against Adversarial Attacks |
767 |
0 |
Moment-Based Variational Inference for Markov Jump Processes |
768 |
0 |
Calibrated Approximate Bayesian Inference |
769 |
0 |
Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data |
770 |
0 |
Game Theoretic Optimization via Gradient-based Nikaido-Isoda Function |
771 |
0 |
Refined Complexity of PCA with Outliers |
772 |
0 |
Regret Circuits: Composability of Regret Minimizers |