Search Results for author: Ohad Shamir

Found 89 papers, 8 papers with code

Depth Separation in Norm-Bounded Infinite-Width Neural Networks

no code implementations • 13 Feb 2024 • Suzanna Parkinson, Greg Ongie, Rebecca Willett, Ohad Shamir, Nathan Srebro

We also show that a similar statement in the reverse direction is not possible: any function learnable with polynomial sample complexity by a norm-controlled depth-2 ReLU network with infinite width is also learnable with polynomial sample complexity by a norm-controlled depth-3 ReLU network.


Generalization in Kernel Regression Under Realistic Assumptions

no code implementations • 26 Dec 2023 • Daniel Barzilai, Ohad Shamir

It is by now well-established that modern over-parameterized models seem to elude the bias-variance tradeoff and generalize well despite overfitting noise.

regression


An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization

no code implementations • 10 Jul 2023 • Guy Kornowski, Ohad Shamir

Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $\Omega(d^{3/2})$ where $d$ is the dimension of the problem, which was conjectured to be optimal.

LEMMA Stochastic Optimization


From Tempered to Benign Overfitting in ReLU Neural Networks

no code implementations • NeurIPS 2023 • Guy Kornowski, Gilad Yehudai, Ohad Shamir

Thus, we show that the input dimension has a crucial role on the type of overfitting in this setting, which we also validate empirically for intermediate dimensions.


Deterministic Nonsmooth Nonconvex Optimization

no code implementations • 16 Feb 2023 • Michael I. Jordan, Guy Kornowski, Tianyi Lin, Ohad Shamir, Manolis Zampetakis

In particular, we prove a lower bound of $\Omega(d)$ for any deterministic algorithm.

Open-Ended Question Answering


On the Complexity of Finding Small Subgradients in Nonsmooth Optimization

no code implementations • 21 Sep 2022 • Guy Kornowski, Ohad Shamir

We study the oracle complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz functions, in the sense proposed by Zhang et al. [2020].


Reconstructing Training Data from Trained Neural Networks

1 code implementation • 15 Jun 2022 • Niv Haim, Gal Vardi, Gilad Yehudai, Ohad Shamir, Michal Irani

We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.


The Sample Complexity of One-Hidden-Layer Neural Networks

no code implementations • 13 Feb 2022 • Gal Vardi, Ohad Shamir, Nathan Srebro

We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.


Gradient Methods Provably Converge to Non-Robust Networks

no code implementations • 9 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir

Despite a great deal of research, it is still unclear why neural networks are so susceptible to adversarial examples.


Width is Less Important than Depth in ReLU Neural Networks

no code implementations • 8 Feb 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir

We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network's architecture), whose number of parameters is essentially larger only by a linear factor.

Open-Ended Question Answering


Implicit Regularization Towards Rank Minimization in ReLU Networks

no code implementations • 30 Jan 2022 • Nadav Timor, Gal Vardi, Ohad Shamir

We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices.


The Implicit Bias of Benign Overfitting

no code implementations • 27 Jan 2022 • Ohad Shamir

In this paper, we provide several new results on when one can or cannot expect benign overfitting to occur, for both regression and classification tasks.

regression


Convergence Results For Q-Learning With Experience Replay

no code implementations • 8 Dec 2021 • Liran Szlak, Ohad Shamir

A commonly used heuristic in RL is experience replay (e.g., Lin, 1993; Mnih et al., 2015), in which a learner stores and re-uses past trajectories as if they were sampled online.

Q-Learning


Replay For Safety

no code implementations • 8 Dec 2021 • Liran Szlak, Ohad Shamir

Experience replay (Lin, 1993; Mnih et al., 2015) is a widely used technique to achieve efficient use of data and improved performance in RL algorithms.

Q-Learning


On the Optimal Memorization Power of ReLU Neural Networks

no code implementations • ICLR 2022 • Gal Vardi, Gilad Yehudai, Ohad Shamir

We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.

Memorization


A Stochastic Newton Algorithm for Distributed Convex Optimization

no code implementations • NeurIPS 2021 • Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth

We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations performed between rounds of communication.

regression


On Margin Maximization in Linear and ReLU Networks

no code implementations • 6 Oct 2021 • Gal Vardi, Ohad Shamir, Nathan Srebro

The implicit bias of neural networks has been extensively studied in recent years.


Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems

1 code implementation • NeurIPS 2021 • Itay Safran, Ohad Shamir

Perhaps surprisingly, we prove that when the condition number is taken into account, without-replacement SGD *does not* significantly improve on with-replacement SGD in terms of worst-case bounds, unless the number of epochs (passes over the data) is larger than the condition number.


Learning a Single Neuron with Bias Using Gradient Descent

no code implementations • NeurIPS 2021 • Gal Vardi, Gilad Yehudai, Ohad Shamir

We theoretically study the fundamental problem of learning a single neuron with a bias term ($\mathbf{x} \mapsto \sigma(\langle\mathbf{w},\mathbf{x}\rangle + b)$) in the realizable setting with the ReLU activation, using gradient descent.


Oracle Complexity in Nonsmooth Nonconvex Optimization

no code implementations • NeurIPS 2021 • Guy Kornowski, Ohad Shamir

For this approach, we prove under a mild assumption an inherent trade-off between oracle complexity and smoothness: On the one hand, smoothing a nonsmooth nonconvex function can be done very efficiently (e.g., by randomized smoothing), but with dimension-dependent factors in the smoothness parameter, which can strongly affect iteration complexity when plugging into standard smooth optimization methods.


The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

no code implementations • 2 Feb 2021 • Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro

We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates.


The Connection Between Approximation, Depth Separation and Learnability in Neural Networks

no code implementations • 31 Jan 2021 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

On the other hand, the fact that deep networks can efficiently express a target function does not mean that this target function can be learned efficiently by deep neural networks.


Size and Depth Separation in Approximating Benign Functions with Neural Networks

no code implementations • 30 Jan 2021 • Gal Vardi, Daniel Reichman, Toniann Pitassi, Ohad Shamir

We show a complexity-theoretic barrier to proving such results beyond size $O(d\log^2(d))$, but also show an explicit benign function, that can be approximated with networks of size $O(d)$ and not with networks of size $o(d/\log d)$.


Implicit Regularization in ReLU Networks with the Square Loss

1 code implementation • 9 Dec 2020 • Gal Vardi, Ohad Shamir

For one hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018].


High-Order Oracle Complexity of Smooth and Strongly Convex Optimization

no code implementations • 13 Oct 2020 • Guy Kornowski, Ohad Shamir

In this note, we consider the complexity of optimizing a highly smooth (Lipschitz $k$-th order derivative) and strongly convex function, via calls to a $k$-th order oracle which returns the value and first $k$ derivatives of the function at a given point, and where the dimension is unrestricted.

Vocal Bursts Intensity Prediction


Gradient Methods Never Overfit On Separable Data

no code implementations • 30 Jun 2020 • Ohad Shamir

A line of recent works established that when training linear predictors over separable data, using gradient methods and exponentially-tailed losses, the predictors asymptotically converge in direction to the max-margin predictor.


The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks

1 code implementation • 1 Jun 2020 • Itay Safran, Gilad Yehudai, Ohad Shamir

We prove that while the objective is strongly convex around the global minima when the teacher and student networks possess the same number of neurons, it is not even *locally convex* after any amount of over-parameterization.


Neural Networks with Small Weights and Depth-Separation Barriers

no code implementations • NeurIPS 2020 • Gal Vardi, Ohad Shamir

To show this, we study a seemingly unrelated problem of independent interest: Namely, whether there are polynomially-bounded functions which require super-polynomial weights in order to approximate with constant-depth neural networks.

Open-Ended Question Answering


Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?

no code implementations • 27 Feb 2020 • Ohad Shamir

It is well-known that given a bounded, smooth nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points (where the gradient norm is less than $\epsilon$) in $\mathcal{O}(1/\epsilon^2)$ iterations.


Is Local SGD Better than Minibatch SGD?

no code implementations • ICML 2020 • Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method.

Distributed Optimization


Proving the Lottery Ticket Hypothesis: Pruning is All You Need

no code implementations • ICML 2020 • Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir

The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized network contains a small subnetwork that, when trained in isolation, can compete with the performance of the original network.


Learning a Single Neuron with Gradient Methods

no code implementations • 15 Jan 2020 • Gilad Yehudai, Ohad Shamir

We consider the fundamental problem of learning a single neuron $x \mapsto\sigma(w^\top x)$ using standard gradient methods.


The Complexity of Finding Stationary Points with Stochastic Gradient Descent

no code implementations • ICML 2020 • Yoel Drori, Ohad Shamir

We study the iteration complexity of stochastic gradient descent (SGD) for minimizing the gradient norm of smooth, possibly nonconvex functions.


How Good is SGD with Random Shuffling?

no code implementations • 31 Jul 2019 • Itay Safran, Ohad Shamir

In contrast to the majority of existing theoretical works, which assume that individual functions are sampled with replacement, we focus here on popular but poorly-understood heuristics, which involve going over random permutations of the individual functions.


Depth Separations in Neural Networks: What is Actually Being Separated?

no code implementations • 15 Apr 2019 • Itay Safran, Ronen Eldan, Ohad Shamir

Existing depth separation results for constant-depth networks essentially show that certain radial functions in $\mathbb{R}^d$, which can be easily approximated with depth $3$ networks, cannot be approximated by depth $2$ networks, even up to constant accuracy, unless their size is exponential in $d$.


On the Power and Limitations of Random Features for Understanding Neural Networks

no code implementations • NeurIPS 2019 • Gilad Yehudai, Ohad Shamir

Recently, a spate of papers have provided positive theoretical results for training over-parameterized neural networks (where the network size is larger than what is needed to achieve low error).


The Complexity of Making the Gradient Small in Stochastic Convex Optimization

no code implementations • 13 Feb 2019 • Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

Stochastic Optimization


Space lower bounds for linear prediction in the streaming model

no code implementations • 9 Feb 2019 • Yuval Dagan, Gil Kur, Ohad Shamir

We show that fundamental learning tasks, such as finding an approximate linear separator or linear regression, require memory at least *quadratic* in the dimension, in a natural streaming setting.

regression


Global Non-convex Optimization with Discretized Diffusions

no code implementations • NeurIPS 2018 • Murat A. Erdogdu, Lester Mackey, Ohad Shamir

An Euler discretization of the Langevin diffusion is known to converge to the global minimizers of certain convex and non-convex optimization problems.


Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks

no code implementations • 23 Sep 2018 • Ohad Shamir

We study the dynamics of gradient descent on objective functions of the form $f(\prod_{i=1}^{k} w_i)$ (with respect to scalar parameters $w_1,\ldots, w_k$), which arise in the context of training depth-$k$ linear neural networks.


A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

no code implementations • 26 Jun 2018 • Yossi Arjevani, Ohad Shamir, Nathan Srebro

We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $\tau$ rounds ago.

Distributed Optimization


Are ResNets Provably Better than Linear Predictors?

no code implementations • NeurIPS 2018 • Ohad Shamir

In this paper, we rigorously prove that arbitrarily deep, nonlinear residual units indeed exhibit this behavior, in the sense that the optimization landscape contains no local minima with value above what can be obtained with a linear predictor (namely a 1-layer network).


Detecting Correlations with Little Memory and Communication

no code implementations • 4 Mar 2018 • Yuval Dagan, Ohad Shamir

We study the problem of identifying correlations in multivariate data, under information constraints: Either on the amount of memory that can be used by the algorithm, or the amount of communication when the data is distributed across several machines.


Spurious Local Minima are Common in Two-Layer ReLU Neural Networks

1 code implementation • ICML 2018 • Itay Safran, Ohad Shamir

We consider the optimization problem associated with training simple ReLU neural networks of the form $\mathbf{x}\mapsto \sum_{i=1}^{k}\max\{0,\mathbf{w}_i^\top \mathbf{x}\}$ with respect to the squared loss.

Vocal Bursts Valence Prediction


Size-Independent Sample Complexity of Neural Networks

no code implementations • 18 Dec 2017 • Noah Golowich, Alexander Rakhlin, Ohad Shamir

We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer.


Weight Sharing is Crucial to Succesful Optimization

no code implementations • 2 Jun 2017 • Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

Exploiting the great expressive power of Deep Neural Network architectures relies on the ability to train them.


Bandit Regret Scaling with the Effective Loss Range

no code implementations • 15 May 2017 • Nicolò Cesa-Bianchi, Ohad Shamir

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e.g., the maximal difference between two losses in a given round).

Multi-Armed Bandits


Failures of Gradient-Based Deep Learning

1 code implementation • ICML 2017 • Shai Shalev-Shwartz, Ohad Shamir, Shaked Shammah

In recent years, Deep Learning has become the go-to solution for a broad range of applications, often outperforming state-of-the-art.


Online Learning with Local Permutations and Delayed Feedback

no code implementations • ICML 2017 • Ohad Shamir, Liran Szlak

In this paper, we consider the applicability of this setting to convex online learning with delayed feedback, in which the feedback on the prediction made in round $t$ arrives with some delay $\tau$.


Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis

no code implementations • ICML 2017 • Dan Garber, Ohad Shamir, Nathan Srebro

We study algorithms for estimating the leading principal component of the population covariance matrix that are both communication-efficient and achieve estimation error of the order of the centralized ERM solution that uses all $mn$ samples.


Without-Replacement Sampling for Stochastic Gradient Methods

no code implementations • NeurIPS 2016 • Ohad Shamir

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled *with* replacement.

Distributed Optimization Learning Theory +1


Oracle Complexity of Second-Order Methods for Finite-Sum Problems

no code implementations • ICML 2017 • Yossi Arjevani, Ohad Shamir

Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations.

Second-order methods


Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks

no code implementations • ICML 2017 • Itay Safran, Ohad Shamir

We provide several new depth-based separation results for feed-forward neural networks, proving that various types of simple and natural functions can be better approximated using deeper networks than shallower ones, even if the shallower networks are much larger.


Distribution-Specific Hardness of Learning Neural Networks

no code implementations • 5 Sep 2016 • Ohad Shamir

Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over all data distributions.


Dimension-Free Iteration Complexity of Finite Sum Optimization Problems

no code implementations • NeurIPS 2016 • Yossi Arjevani, Ohad Shamir

Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure.


On the Iteration Complexity of Oblivious First-Order Optimization Algorithms

no code implementations • 11 May 2016 • Yossi Arjevani, Ohad Shamir

We consider a broad class of first-order optimization algorithms which are *oblivious*, in the sense that their step sizes are scheduled regardless of the function under consideration, except for limited side-information such as smoothness or strong convexity parameters.


Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization

no code implementations • NeurIPS 2016 • Ohad Shamir

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled *with* replacement.

Distributed Optimization Learning Theory +1


The Power of Depth for Feedforward Neural Networks

no code implementations • 12 Dec 2015 • Ronen Eldan, Ohad Shamir

We show that there is a simple (approximately radial) function on $\mathbb{R}^d$, expressible by a small 3-layer feedforward neural network, which cannot be approximated by any 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension.


Multi-Player Bandits -- a Musical Chairs Approach

no code implementations • 9 Dec 2015 • Jonathan Rosenski, Ohad Shamir, Liran Szlak

We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward.


On the Quality of the Initial Basin in Overspecified Neural Networks

no code implementations • 13 Nov 2015 • Itay Safran, Ohad Shamir

Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications.


Convergence of Stochastic Gradient Descent for PCA

no code implementations • 30 Sep 2015 • Ohad Shamir

We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points.


An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

no code implementations • 31 Jul 2015 • Ohad Shamir

We consider the closely related problems of bandit convex optimization with two-point feedback, and zero-order stochastic convex optimization with two function evaluations per round.


Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity

no code implementations • 31 Jul 2015 • Ohad Shamir

We study the convergence properties of the VR-PCA algorithm introduced by Shamir (2015) for fast computation of leading singular vectors.


Communication Complexity of Distributed Convex Learning and Optimization

no code implementations • NeurIPS 2015 • Yossi Arjevani, Ohad Shamir

We study the fundamental limits to communication-efficient distributed methods for convex learning and optimization, under different assumptions on the information available to individual machines, and the types of functions considered.


On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems

no code implementations • 23 Mar 2015 • Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir

This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereby new lower and upper bounds are derived.

valid


On the Complexity of Learning with Kernels

no code implementations • 5 Nov 2014 • Nicolò Cesa-Bianchi, Yishay Mansour, Ohad Shamir

In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix.


Attribute Efficient Linear Regression with Data-Dependent Sampling

no code implementations • 23 Oct 2014 • Doron Kukliansky, Ohad Shamir

In this paper we analyze a budgeted learning setting, in which the learner can only choose and observe a small subset of the attributes of each training example.

Attribute regression


On the Computational Efficiency of Training Neural Networks

1 code implementation • NeurIPS 2014 • Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

It is well-known that neural networks are computationally hard to train.

Computational Efficiency


Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

no code implementations • 30 Sep 2014 • Noga Alon, Nicolò Cesa-Bianchi, Claudio Gentile, Shie Mannor, Yishay Mansour, Ohad Shamir

This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions.

Multi-Armed Bandits


A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate

no code implementations • 9 Sep 2014 • Ohad Shamir

We describe and analyze a simple algorithm for principal component analysis and singular value decomposition, VR-PCA, which uses computationally cheap stochastic iterations, yet converges exponentially fast to the optimal solution.


On the Complexity of Bandit Linear Optimization

no code implementations • 11 Aug 2014 • Ohad Shamir

We study the attainable regret for online linear optimization problems with bandit feedback, where unlike the full-information setting, the player can only observe its own loss rather than the full loss vector.


The Sample Complexity of Learning Linear Predictors with the Squared Loss

no code implementations • 19 Jun 2014 • Ohad Shamir

In this short note, we provide a sample complexity lower bound for learning linear predictors with respect to the squared loss.


Graph Approximation and Clustering on a Budget

no code implementations • 10 Jun 2014 • Ethan Fetaya, Ohad Shamir, Shimon Ullman

We consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding), when computing pairwise similarities is costly, and only a limited number of entries can be observed.

Clustering


Communication Efficient Distributed Optimization using an Approximate Newton-type Method

1 code implementation • 30 Dec 2013 • Ohad Shamir, Nathan Srebro, Tong Zhang

We present a novel Newton-type method for distributed optimization, which is particularly well suited for stochastic optimization and learning problems.

Distributed Optimization Vocal Bursts Type Prediction


Online Learning with Costly Features and Labels

no code implementations • NeurIPS 2013 • Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir

In particular, we show that with switching costs, the attainable rate with bandit feedback is $T^{2/3}$.


Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation

no code implementations • NeurIPS 2014 • Ohad Shamir

Many machine learning approaches are characterized by information constraints on how they interact with the training data.

Multi-Armed Bandits Stochastic Optimization


Probabilistic Label Trees for Efficient Large Scale Image Classification

no code implementations • CVPR 2013 • Baoyuan Liu, Fereshteh Sadeghi, Marshall Tappen, Ohad Shamir, Ce Liu

Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows.

Classification General Classification +1


An Algorithm for Training Polynomial Networks

no code implementations • 26 Apr 2013 • Roi Livni, Shai Shalev-Shwartz, Ohad Shamir

The main goal of this paper is the derivation of an efficient layer-by-layer algorithm for training such networks, which we denote as the *Basis Learner*.


Online Learning with Switching Costs and Other Adaptive Adversaries

no code implementations • NeurIPS 2013 • Nicolò Cesa-Bianchi, Ofer Dekel, Ohad Shamir

In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$.


Relax and Randomize: From Value to Algorithms

no code implementations • NeurIPS 2012 • Sasha Rakhlin, Ohad Shamir, Karthik Sridharan

We show a principled way of deriving online learning algorithms from a minimax analysis.

Matrix Completion Transductive Learning


On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization

no code implementations • 11 Sep 2012 • Ohad Shamir

The problem of stochastic convex optimization with bandit feedback (in the learning community) or without knowledge of gradients (in the optimization community) has received much attention in recent years, in the form of algorithms and performance upper bounds.


Learning with the weighted trace-norm under arbitrary sampling distributions

no code implementations • NeurIPS 2011 • Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan R. Salakhutdinov

We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions.


From Bandits to Experts: On the Value of Side-Observations

no code implementations • NeurIPS 2011 • Shie Mannor, Ohad Shamir

We consider an adversarial online learning setting where a decision maker can choose an action in every stage of the game.

Multi-Armed Bandits


Efficient Online Learning via Randomized Rounding

no code implementations • NeurIPS 2011 • Nicolò Cesa-Bianchi, Ohad Shamir

Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader.

BIG-bench Machine Learning Collaborative Filtering +1


Better Mini-Batch Algorithms via Accelerated Gradient Methods

no code implementations • NeurIPS 2011 • Andrew Cotter, Ohad Shamir, Nati Srebro, Karthik Sridharan

Mini-batch algorithms have recently received significant attention as a way to speed up stochastic convex optimization problems.


Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

no code implementations • NeurIPS 2011 • Sham M. Kakade, Varun Kanade, Ohad Shamir, Adam Kalai

In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient.

regression


Efficient Transductive Online Learning via Randomized Rounding

no code implementations • 13 Jun 2011 • Nicolò Cesa-Bianchi, Ohad Shamir

Most traditional online learning algorithms are based on variants of mirror descent or follow-the-leader.

Collaborative Filtering Open-Ended Question Answering


Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity

no code implementations • 31 Oct 2009 • Sham M. Kakade, Ohad Shamir, Karthik Sridharan, Ambuj Tewari

The versatility of exponential families, along with their attendant convexity properties, makes them a popular and effective statistical model.

Vocal Bursts Intensity Prediction


On the Reliability of Clustering Stability in the Large Sample Regime

no code implementations • NeurIPS 2008 • Ohad Shamir, Naftali Tishby

In this paper, we provide a set of general sufficient conditions, which ensure the reliability of clustering stability estimators in the large sample regime.

Clustering Model Selection

