no code implementations • 7 Feb 2024 • Sanjari Srivastava, Piotr Mardziel, Zhikun Zhang, Archana Ahlawat, Anupam Datta, John C Mitchell
Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP.
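Counterfactual Data Augmentation can be sketched in a few lines: pair each training sentence with a copy in which gendered terms are swapped, so the augmented corpus is balanced with respect to gender. The word-pair list below is illustrative only, not the one used in the paper.

```python
# Toy sketch of Counterfactual Data Augmentation (CDA) for binary gender:
# each sentence is paired with a gender-swapped counterfactual copy.
# GENDER_PAIRS is an illustrative stand-in for a full swap dictionary.
GENDER_PAIRS = {"he": "she", "she": "he", "his": "her", "her": "his",
                "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    """Swap each gendered token for its counterpart; other tokens pass through."""
    return " ".join(GENDER_PAIRS.get(tok, tok) for tok in sentence.split())

def augment(corpus: list[str]) -> list[str]:
    """Return the original corpus plus one counterfactual copy per sentence."""
    return corpus + [counterfactual(s) for s in corpus]

if __name__ == "__main__":
    print(augment(["he is a doctor"]))  # original + gender-swapped copy
```

Training (or fine-tuning, as in the DP setting studied here) on the augmented corpus is what mitigates the amplified bias.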
no code implementations • 13 Oct 2023 • Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson
There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?
no code implementations • 28 Aug 2023 • Clark Barrett, Brad Boyd, Elie Bursztein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang
However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks.
no code implementations • 1 Jun 2022 • Kaiji Lu, Anupam Datta
Previous works show that deep NLP models are not always conceptually sound: they do not always learn the correct linguistic concepts.
no code implementations • 24 May 2022 • Zifan Wang, Yuhang Yao, Chaoran Zhang, Han Zhang, Youjie Kang, Carlee Joe-Wong, Matt Fredrikson, Anupam Datta
Second, our analytical and empirical results demonstrate that feature attribution methods cannot capture the nonlinear effect of edge features, while existing subgraph explanation methods are not faithful.
no code implementations • ICLR 2022 • Emily Black, Zifan Wang, Matt Fredrikson, Anupam Datta
Counterfactual examples are one of the most commonly-cited methods for explaining the predictions of machine learning models in key areas such as finance and medical diagnosis.
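For intuition, the counterfactual-example idea has a closed form in the linear case: the closest point (in L2) on the other side of the decision boundary of a scorer f(x) = w·x + b is the projection of x onto the hyperplane, nudged slightly across it. This is only the linear special case, not the paper's method for general models.

```python
import numpy as np

# Minimal counterfactual-example sketch for a linear scorer f(x) = w.x + b:
# project x onto the decision hyperplane and overshoot by a tiny margin so
# the predicted class flips. Real counterfactual methods handle nonlinear
# models; this closed form exists only for the linear case.
def linear_counterfactual(w, b, x, eps=1e-6):
    w, x = np.asarray(w, float), np.asarray(x, float)
    score = w @ x + b
    # Step along w, scaled just far enough to flip the sign of the score.
    return x - (score / (w @ w)) * w * (1 + eps)

w, b, x = np.array([2.0, -1.0]), -1.0, np.array([3.0, 1.0])
x_cf = linear_counterfactual(w, b, x)
assert np.sign(w @ x + b) != np.sign(w @ x_cf + b)  # prediction flips
```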
1 code implementation • 20 Mar 2021 • Zifan Wang, Matt Fredrikson, Anupam Datta
Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image's ground-truth class.
no code implementations • NeurIPS 2021 • Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
While “attention is all you need” may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear.
no code implementations • 28 Sep 2020 • Kaiji Lu, Zifan Wang, Piotr Mardziel, Anupam Datta
While “attention is all you need” may be proving true, we do not yet know why: attention-based transformer models such as BERT are superior, but how they contextualize information even for simple grammatical rules such as subject-verb number agreement (SVA) is uncertain.
no code implementations • 17 Sep 2020 • Xuan Chen, Zifan Wang, Yucai Fan, Bonan Jin, Piotr Mardziel, Carlee Joe-Wong, Anupam Datta
Feature attribution has been a foundational building block for explaining input feature importance in supervised learning with Deep Neural Networks (DNNs), but it faces new challenges when applied to deep Reinforcement Learning (RL). We propose a new approach to explaining deep RL actions by defining a class of “action reconstruction” functions that mimic the behavior of a network in deep RL.
no code implementations • 14 Jun 2020 • Sanghamitra Dutta, Praveen Venkatesh, Piotr Mardziel, Anupam Datta, Pulkit Grover
While quantifying disparity is essential, the needs of an occupation may sometimes require the use of certain critical features, such that any disparity explainable by those features might need to be exempted.
1 code implementation • NeurIPS 2020 • Zifan Wang, Haofan Wang, Shakul Ramkumar, Matt Fredrikson, Piotr Mardziel, Anupam Datta
Feature attributions are a popular tool for explaining the behavior of Deep Neural Networks (DNNs), but have recently been shown to be vulnerable to attacks that produce divergent explanations for nearby inputs.
no code implementations • ACL 2020 • Kaiji Lu, Piotr Mardziel, Klas Leino, Matt Fredrikson, Anupam Datta
LSTM-based recurrent neural networks are the state-of-the-art for many natural language processing (NLP) tasks.
no code implementations • 19 Feb 2020 • Zifan Wang, Piotr Mardziel, Anupam Datta, Matt Fredrikson
In this work we expand the foundations of human-understandable concepts with which attributions can be interpreted beyond "importance" and its visualization; we incorporate the logical concepts of necessity and sufficiency, and the concept of proportionality.
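As a rough intuition aid (not the paper's formal definitions, which also cover proportionality), necessity and sufficiency of a feature can be probed by ablation on a toy linear model: does removing the feature flip the prediction, and does the feature alone reproduce it?

```python
import numpy as np

# Toy ablation-based illustration of "necessity" and "sufficiency" of a
# feature for a thresholded linear model. The weights and baseline here are
# illustrative; the paper's definitions are more general.
def predict(w, x):
    return float(w @ x > 0)

def necessity(w, x, i, baseline=0.0):
    """Does replacing feature i with the baseline change the prediction?"""
    x_masked = x.copy(); x_masked[i] = baseline
    return predict(w, x) != predict(w, x_masked)

def sufficiency(w, x, i, baseline=0.0):
    """Does keeping only feature i (others at baseline) preserve the prediction?"""
    x_only = np.full_like(x, baseline); x_only[i] = x[i]
    return predict(w, x) == predict(w, x_only)

w = np.array([3.0, -0.5])
x = np.array([1.0, 1.0])
# Feature 0 dominates: removing it flips the prediction (necessary),
# and it alone reproduces the prediction (sufficient).
print(necessity(w, x, 0), sufficiency(w, x, 0))  # True True
```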
no code implementations • ICLR 2019 • Klas Leino, Emily Black, Matt Fredrikson, Shayak Sen, Anupam Datta
This overestimation gives rise to feature-wise bias amplification -- a previously unreported form of bias that can be traced back to the features of a trained model.
1 code implementation • NeurIPS 2018 • Samuel Yeom, Anupam Datta, Matt Fredrikson
In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies.
1 code implementation • 31 Jul 2018 • Kaiji Lu, Piotr Mardziel, Fangjing Wu, Preetam Amancharla, Anupam Datta
We define a general benchmark to quantify gender bias in a variety of neural NLP tasks.
no code implementations • 28 Mar 2018 • Shayak Sen, Piotr Mardziel, Anupam Datta, Matthew Fredrikson
Standard methods for training classifiers that minimize empirical risk do not constrain the behavior of the classifier on such datapoints.
2 code implementations • ICLR 2018 • Klas Leino, Shayak Sen, Anupam Datta, Matt Fredrikson, Linyi Li
We study the problem of explaining a rich class of behavioral properties of deep neural networks.
no code implementations • 29 Nov 2017 • Anupam Datta, Sophia Kovaleva, Piotr Mardziel, Shayak Sen
The interpretation of latent factors can then replace the uninterpreted latent factors, resulting in a new model that expresses predictions in terms of interpretable features.
no code implementations • 27 Sep 2017 • Linyi Li, Matt Fredrikson, Shayak Sen, Anupam Datta
In this report, we applied integrated gradients to explaining a neural network for diabetic retinopathy detection.
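Integrated gradients attributes the change f(x) − f(baseline) to input features by averaging gradients along the straight-line path from baseline to input. The sketch below uses a toy quadratic with an analytic gradient rather than the retinopathy network from the report.

```python
import numpy as np

# Integrated gradients sketch: average the gradient along the straight-line
# path from baseline to x (midpoint Riemann sum), then scale by (x - baseline).
# f here is a toy quadratic, not the diabetic retinopathy model.
def integrated_gradients(grad_f, x, baseline, steps=200):
    alphas = (np.arange(steps) + 0.5) / steps              # midpoint path samples
    path = baseline + alphas[:, None] * (x - baseline)     # interpolated inputs
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)  # averaged gradient
    return (x - baseline) * avg_grad                       # per-feature credit

f = lambda x: np.sum(x ** 2)
grad_f = lambda x: 2 * x
x, baseline = np.array([1.0, 2.0]), np.zeros(2)
attr = integrated_gradients(grad_f, x, baseline)
# Completeness: attributions sum to f(x) - f(baseline) = 5.
print(attr, round(attr.sum(), 2))
```

The completeness property (attributions summing to the output difference) is what makes the method attractive for auditing medical-imaging classifiers.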
3 code implementations • 25 Jul 2017 • Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen
Machine-learnt systems inherit biases against protected classes, i.e., historically disparaged groups, from training data.
no code implementations • 22 May 2017 • Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen
For a specific instantiation of this definition, we present a program analysis technique that detects instances of proxy use in a model, and provides a witness that identifies which parts of the corresponding program exhibit the behavior.
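The two ingredients of proxy use can be illustrated on synthetic data: a component is a proxy if it is both associated with the protected attribute and influential on the model's output. The paper detects this via program analysis with formal, decomposition-based definitions; the correlation thresholds below are only illustrative stand-ins.

```python
import numpy as np

# Toy illustration of proxy use: zip_code correlates with the protected
# attribute and drives the output, so it is a proxy; noise is neither
# associated nor influential. Thresholds and data are illustrative only.
rng = np.random.default_rng(0)
protected = rng.integers(0, 2, 1000).astype(float)     # protected attribute
zip_code = 0.9 * protected + rng.normal(0, 0.1, 1000)  # correlated feature
noise = rng.normal(0, 1, 1000)                         # unrelated feature
output = 2.0 * zip_code                                # model uses zip_code only

def is_proxy(feature, protected, output, assoc_thresh=0.5, infl_thresh=0.5):
    assoc = abs(np.corrcoef(feature, protected)[0, 1])  # association
    infl = abs(np.corrcoef(feature, output)[0, 1])      # influence (toy measure)
    return bool(assoc > assoc_thresh and infl > infl_thresh)

print(is_proxy(zip_code, protected, output),  # proxy: associated + influential
      is_proxy(noise, protected, output))     # not a proxy
```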
no code implementations • 4 Oct 2013 • Jeremiah Blocki, Manuel Blum, Anupam Datta
(2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA.
no code implementations • 22 Aug 2012 • Jeremiah Blocki, Avrim Blum, Anupam Datta, Or Sheffet
Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets, including those that do not satisfy H) matches the restricted sensitivity of the query f. Moreover, if the belief of the querier is correct (i.e., D is in H), then f_H(D) = f(D).
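The payoff of restricted sensitivity shows up directly in the noise scale: the Laplace mechanism adds noise proportional to sensitivity/epsilon, so calibrating to the smaller restricted sensitivity yields more accurate answers. The construction of f_H itself (extending f beyond H) is the paper's contribution and is not shown here; the sensitivity values below are illustrative.

```python
import numpy as np

# Laplace mechanism calibrated to a given sensitivity: noise scale is
# sensitivity / epsilon. Using the (smaller) restricted sensitivity under
# hypothesis H gives much lower error than the global worst case.
def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(1)
epsilon = 0.5
global_sens = 100.0    # worst case over all datasets (illustrative)
restricted_sens = 2.0  # worst case over datasets satisfying H (illustrative)

true_val = 42.0
noisy_global = np.array([laplace_mechanism(true_val, global_sens, epsilon, rng)
                         for _ in range(1000)])
noisy_restr = np.array([laplace_mechanism(true_val, restricted_sens, epsilon, rng)
                        for _ in range(1000)])
# Mean absolute error shrinks in proportion to the sensitivity used.
print(np.mean(np.abs(noisy_global - true_val)) >
      np.mean(np.abs(noisy_restr - true_val)))
```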
Cryptography and Security • Social and Information Networks • Physics and Society