no code implementations • 2 Oct 2023 • Praneeth Kacham, Vahab Mirrokni, Peilin Zhong
For context lengths of 32k and GPT-2 style models, our model achieves a 2.5-4x speedup in training compared to FlashAttention, with no observed degradation in quality across our experiments.
no code implementations • 14 Jul 2023 • Alessandro Epasto, Tamalika Mukherjee, Peilin Zhong
In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$ using $poly(k, d,\log(T))$ space to achieve a constant multiplicative error and a $poly(k, d,\log(T))$ additive error.
3 code implementations • 12 Apr 2023 • CJ Carey, Travis Dick, Alessandro Epasto, Adel Javanmard, Josh Karlin, Shankar Kumar, Andres Munoz Medina, Vahab Mirrokni, Gabriel Henrique Nunes, Sergei Vassilvitskii, Peilin Zhong
In this work, we present a new theoretical framework to measure re-identification risk in such user representations.
no code implementations • 5 Dec 2022 • CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, Peilin Zhong
We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10- to 1000-fold improvements in pairwise similarity comparisons compared to different baselines, and 2- to 10-fold improvements in running time without quality loss.
1 code implementation • 14 Jul 2022 • Alessandro Epasto, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Peilin Zhong
Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding.
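As a point of reference, the textbook definition of PPR can be sketched with plain power iteration. This is a minimal illustration, not the method proposed in the paper: the function name `personalized_pagerank`, the restart probability `alpha`, and the adjacency-list input are all assumptions made for the example, and every node is assumed to have at least one out-neighbor.

```python
def personalized_pagerank(graph, source, alpha=0.15, iters=100):
    """Approximate PPR scores for `source` by power iteration.

    `graph` maps each node to a non-empty list of out-neighbors
    (hypothetical input format chosen for this sketch).
    """
    nodes = list(graph)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0  # start with all probability mass on the source
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart: jump back to the source w.p. alpha
        for u in nodes:
            # otherwise follow a uniformly random out-edge of u
            share = (1.0 - alpha) * scores[u] / len(graph[u])
            for v in graph[u]:
                nxt[v] += share
        scores = nxt
    return scores


# Path graph 0 - 1 - 2, personalized to node 0.
path = {0: [1], 1: [0, 2], 2: [1]}
ppr = personalized_pagerank(path, source=0)
```

The scores form a probability distribution over nodes, and nodes closer to the source receive more mass than they would under a restart at a distant node.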
no code implementations • NeurIPS 2020 • Ruosong Wang, Peilin Zhong, Simon S. Du, Russ R. Salakhutdinov, Lin Yang
Standard sequential decision-making paradigms aim to maximize the cumulative reward when interacting with the unknown environment, i.e., maximize $\sum_{h = 1}^H r_h$ where $H$ is the planning horizon.
no code implementations • 16 Apr 2020 • Zhao Song, David P. Woodruff, Peilin Zhong
entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
1 code implementation • NeurIPS 2019 • Zhao Song, David Woodruff, Peilin Zhong
entries drawn from any distribution $\mu$ for which the $(1+\gamma)$-th moment exists, for an arbitrarily small constant $\gamma > 0$, then it is possible to obtain a $(1+\epsilon)$-approximate column subset selection to the entrywise $\ell_1$-norm in nearly linear time.
no code implementations • NeurIPS 2019 • Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong
When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.
1 code implementation • ICLR 2020 • Chang Xiao, Peilin Zhong, Changxi Zheng
In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks.
no code implementations • NeurIPS 2019 • Peilin Zhong, Yuchen Mo, Chang Xiao, Peng-Yu Chen, Changxi Zheng
The conventional approach to this end is to reduce, through training, a statistical distance (such as an $f$-divergence) between the generated distribution and the provided data distribution.
1 code implementation • NeurIPS 2019 • Zhao Song, David P. Woodruff, Peilin Zhong
Our approximation algorithms handle functions which are not even scale-invariant, such as the Huber loss function, which we show have very different structural properties than $\ell_p$-norms, e.g., one can show the lack of scale-invariance causes any column subset selection algorithm to provably require a $\sqrt{\log n}$ factor larger number of columns than $\ell_p$-norms; nevertheless we design the first efficient column subset selection algorithms for such error measures.
no code implementations • ICML 2018 • Alexandr Andoni, Chengyu Lin, Ying Sheng, Peilin Zhong, Ruiqi Zhong
An Orlicz norm is parameterized by a non-negative convex function $G:\mathbb{R}_+\rightarrow\mathbb{R}_+$ with $G(0)=0$: the Orlicz norm of a vector $x\in\mathbb{R}^n$ is defined as $\|x\|_G=\inf\left\{\alpha>0 \;\middle|\; \sum_{i=1}^n G(|x_i|/\alpha)\leq 1\right\}$.
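The definition above can be evaluated numerically with a simple bisection on $\alpha$, since $\sum_{i} G(|x_i|/\alpha)$ is non-increasing in $\alpha$ for such $G$. The sketch below is only an illustration of the definition, not an algorithm from the paper; the function name `orlicz_norm` and the tolerance parameter are assumptions, and $G$ is assumed not to be identically zero.

```python
def orlicz_norm(x, G, tol=1e-12, max_iters=200):
    """Approximate inf{alpha > 0 : sum_i G(|x_i| / alpha) <= 1} by bisection.

    Assumes G is convex, non-negative, G(0) = 0, and not identically zero.
    """
    abs_x = [abs(v) for v in x]
    if not any(abs_x):
        return 0.0  # the zero vector has norm 0

    def feasible(alpha):
        return sum(G(v / alpha) for v in abs_x) <= 1.0

    # Grow the upper bound until it is feasible.
    lo, hi = 0.0, 1.0
    while not feasible(hi):
        lo, hi = hi, 2.0 * hi

    # Shrink [lo, hi] around the smallest feasible alpha.
    for _ in range(max_iters):
        if hi - lo <= tol * hi:
            break
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# G(t) = t^2 recovers the Euclidean norm; G(t) = t recovers the l1 norm.
l2_value = orlicz_norm([3.0, 4.0], lambda t: t * t)
l1_value = orlicz_norm([1.0, -2.0, 3.0], lambda t: t)
```

The two sanity checks use the fact that $G(t)=t^p$ yields the $\ell_p$ norm, which makes the output easy to compare against a known value.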
1 code implementation • NeurIPS 2018 • Chang Xiao, Peilin Zhong, Changxi Zheng
This paper addresses mode collapse in generative adversarial networks (GANs).
no code implementations • 1 Feb 2018 • Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong
We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted into or deleted from the dataset.
no code implementations • 26 Apr 2017 • Zhao Song, David P. Woodruff, Peilin Zhong
Despite the success in obtaining relative error low rank approximations for matrices, no such results were known for tensors.
no code implementations • 3 Nov 2016 • Zhao Song, David P. Woodruff, Peilin Zhong
We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d) \mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$.
no code implementations • 28 Jan 2016 • David P. Woodruff, Peilin Zhong
For example, each of $s$ servers may have an $n \times d$ matrix $A^t$, and we may be interested in computing a low rank approximation to $A = f(\sum_{t=1}^s A^t)$, where $f$ is a function which is applied entrywise to the matrix $\sum_{t=1}^s A^t$.
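To make the setup concrete, the following sketch forms $A = f(\sum_{t=1}^s A^t)$ from per-server matrices held as nested lists. It only illustrates the problem definition in a centralized way and is not the communication-efficient protocol studied in the paper; the helper name `entrywise_of_sum` and the choice $f(v) = v^2$ are assumptions for the example.

```python
def entrywise_of_sum(local_matrices, f):
    """Return f applied entrywise to the sum of equally sized matrices."""
    rows = len(local_matrices[0])
    cols = len(local_matrices[0][0])
    total = [
        [sum(m[i][j] for m in local_matrices) for j in range(cols)]
        for i in range(rows)
    ]
    return [[f(v) for v in row] for row in total]


def square(v):
    return v * v


# Two hypothetical servers, each holding a 1 x 2 matrix.
server_data = [[[1.0, 2.0]], [[3.0, 4.0]]]
global_matrix = entrywise_of_sum(server_data, square)

# Applying f locally and then summing gives a different matrix when f is nonlinear.
sum_of_local = [
    [sum(square(m[0][j]) for m in server_data) for j in range(2)]
]
```

The comparison between `global_matrix` and `sum_of_local` shows why a nonlinear $f$ is the interesting case: servers cannot simply apply $f$ to their own data and add the results.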