Search Results for author: Sunipa Dev

Found 25 papers, 14 papers with code

MisgenderMender: A Community-Informed Approach to Interventions for Misgendering

no code implementations • 23 Apr 2024 • Tamanna Hossain, Sunipa Dev, Sameer Singh

We are the first to address this lack of research into interventions for misgendering by conducting a survey of gender-diverse individuals in the US to understand perspectives about automated interventions for text-based misgendering.

GeniL: A Multilingual Dataset on Generalizing Language

no code implementations • 8 Apr 2024 • Aida Mostafazadeh Davani, Sagar Gubbi, Sunipa Dev, Shachi Dave, Vinodkumar Prabhakaran

We argue that understanding the sentential context is crucial for detecting instances of generalization.

SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes

no code implementations • 8 Mar 2024 • Mukul Bhutani, Kevin Robinson, Vinodkumar Prabhakaran, Shachi Dave, Sunipa Dev

While generative multilingual models are rapidly being deployed, their safety and fairness evaluations are largely limited to resources collected in English.

Fairness

MiTTenS: A Dataset for Evaluating Misgendering in Translation

1 code implementation • 13 Jan 2024 • Kevin Robinson, Sneha Kudugunta, Romina Stella, Sunipa Dev, Jasmijn Bastings

Misgendering is the act of referring to someone in a way that does not reflect their gender identity.

Machine Translation, Translation

ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation

no code implementations • 12 Jan 2024 • Akshita Jha, Vinodkumar Prabhakaran, Remi Denton, Sarah Laszlo, Shachi Dave, Rida Qadri, Chandan K. Reddy, Sunipa Dev

First, we show that stereotypical attributes in ViSAGe are thrice as likely to be present in generated images of corresponding identities as compared to other attributes, and that the offensiveness of these depictions is notably higher for identities from Africa, South America, and South East Asia.

Text-to-Image Generation

SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata

no code implementations • 28 Nov 2023 • Mark Díaz, Sunipa Dev, Emily Reif, Emily Denton, Vinodkumar Prabhakaran

The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions.

MISGENDERED: Limits of Large Language Models in Understanding Pronouns

no code implementations • 6 Jun 2023 • Tamanna Hossain, Sunipa Dev, Sameer Singh

Content Warning: This paper contains examples of misgendering and erasure that could be offensive and potentially triggering.

Sentence

SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models

1 code implementation • 19 May 2023 • Akshita Jha, Aida Davani, Chandan K. Reddy, Shachi Dave, Vinodkumar Prabhakaran, Sunipa Dev

Stereotype benchmark datasets are crucial to detect and mitigate social stereotypes about groups of people in NLP models.

PaLM 2 Technical Report

1 code implementation • 17 May 2023 • Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu

Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.

Ranked #1 on Question Answering on StrategyQA

Code Generation, Common Sense Reasoning

Cultural Re-contextualization of Fairness Research in Language Technologies in India

no code implementations • 21 Nov 2022 • Shaily Bhatt, Sunipa Dev, Partha Talukdar, Shachi Dave, Vinodkumar Prabhakaran

Recent research has revealed undesirable biases in NLP data and models.

Fairness, Position

Auditing Algorithmic Fairness in Machine Learning for Health with Severity-Based LOGAN

no code implementations • 16 Nov 2022 • Anaelia Ovalle, Sunipa Dev, Jieyu Zhao, Majid Sarrafzadeh, Kai-Wei Chang

Therefore, ML auditing tools must be (1) better aligned with ML4H auditing principles and (2) able to illuminate and characterize communities vulnerable to the most harm.

Bias Detection, Clustering

The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks

1 code implementation • 18 Oct 2022 • Nikil Roashan Selvam, Sunipa Dev, Daniel Khashabi, Tushar Khot, Kai-Wei Chang

How reliably can we trust the scores obtained from social bias benchmarks as faithful indicators of problematic social biases in a given language model?

Language Modelling

Re-contextualizing Fairness in NLP: The Case of India

1 code implementation • 25 Sep 2022 • Shaily Bhatt, Sunipa Dev, Partha Talukdar, Shachi Dave, Vinodkumar Prabhakaran

In this paper, we focus on NLP fairness in the context of India.

Fairness

DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation

no code implementations • 25 May 2022 • Jingnong Qu, Liunian Harold Li, Jieyu Zhao, Sunipa Dev, Kai-Wei Chang

Disinformation has become a serious problem on social media.

Multimodal Reasoning, Optical Character Recognition (OCR)

PaLM: Scaling Language Modeling with Pathways

6 code implementations • Google Research 2022 • Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, Noah Fiedel

To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).

Ranked #1 on Coreference Resolution on Winograd Schema Challenge

Auto Debugging, Code Generation

Representation Learning for Resource-Constrained Keyphrase Generation

1 code implementation • 15 Mar 2022 • Di Wu, Wasi Uddin Ahmad, Sunipa Dev, Kai-Wei Chang

State-of-the-art keyphrase generation methods generally depend on large annotated datasets, limiting their performance in domains with limited annotated data.

Denoising, Domain Adaptation

Socially Aware Bias Measurements for Hindi Language Representations

2 code implementations • NAACL 2022 • Vijit Malik, Sunipa Dev, Akihiro Nishi, Nanyun Peng, Kai-Wei Chang

Language representations are efficient tools used across NLP applications, but they are rife with encoded societal biases.

Cultural Vocal Bursts Intensity Prediction

Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

1 code implementation • EMNLP 2021 • Sunipa Dev, Masoud Monajatipoor, Anaelia Ovalle, Arjun Subramonian, Jeff M. Phillips, Kai-Wei Chang

Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models.

On Measures of Biases and Harms in NLP

no code implementations • 7 Aug 2021 • Sunipa Dev, Emily Sheng, Jieyu Zhao, Aubrie Amstutz, Jiao Sun, Yu Hou, Mattie Sanseverino, Jiin Kim, Akihiro Nishi, Nanyun Peng, Kai-Wei Chang

Recent studies show that Natural Language Processing (NLP) technologies propagate societal biases about demographic groups associated with attributes such as gender, race, and nationality.

VERB: Visualizing and Interpreting Bias Mitigation Techniques for Word Representations

1 code implementation • 6 Apr 2021 • Archit Rathore, Sunipa Dev, Jeff M. Phillips, Vivek Srikumar, Yan Zheng, Chin-Chia Michael Yeh, Junpeng Wang, Wei Zhang, Bei Wang

To aid this, we present Visualization of Embedding Representations for deBiasing system ("VERB"), an open-source web-based visualization tool that helps the users gain a technical understanding and visual intuition of the inner workings of debiasing techniques, with a focus on their geometric properties.

Decision Making, Dimensionality Reduction

The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability

1 code implementation • 25 Nov 2020 • Sunipa Dev

High-dimensional representations for words, text, images, knowledge graphs and other structured data are commonly used in different paradigms of machine learning and data mining.

Knowledge Graphs

OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings

1 code implementation • EMNLP 2021 • Sunipa Dev, Tao Li, Jeff M. Phillips, Vivek Srikumar

Language representations are known to carry stereotypical biases and, as a result, lead to biased predictions in downstream tasks.

Word Embeddings

On Measuring and Mitigating Biased Inferences of Word Embeddings

2 code implementations • 25 Aug 2019 • Sunipa Dev, Tao Li, Jeff Phillips, Vivek Srikumar

Word embeddings carry stereotypical connotations from the text they are trained on, which can lead to invalid inferences in downstream models that rely on them.

Natural Language Inference, Word Embeddings

Attenuating Bias in Word Vectors

1 code implementation • 23 Jan 2019 • Sunipa Dev, Jeff Phillips

Word vector representations are well-developed tools for various NLP and Machine Learning tasks and are known to retain significant semantic and syntactic structure of languages.

Closed Form Word Embedding Alignment

no code implementations • 4 Jun 2018 • Sunipa Dev, Safia Hassan, Jeff M. Phillips

We develop a family of techniques to align word embeddings which are derived from different source datasets or created using different mechanisms (e.g., GloVe or word2vec).

Word Embeddings