no code implementations • 7 Mar 2024 • Devjeet Roy, Xuchao Zhang, Rashi Bhave, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, Saravan Rajmohan
Lastly, we conduct a case study with a team at Microsoft to equip the ReAct agent with tools that give it access to external diagnostic services that are used by the team for manual RCA.
no code implementations • 29 Feb 2024 • Pooja Srinivas, Fiza Husain, Anjaly Parayil, Ayush Choure, Chetan Bansal, Saravan Rajmohan
We conduct an extensive empirical study and derive key insights on the major classes of monitors employed by cloud services at Microsoft, their associated dimensions, and the interrelationship between service properties and this ontology.
no code implementations • 15 Feb 2024 • Drishti Goel, Fiza Husain, Aditya Singh, Supriyo Ghosh, Anjaly Parayil, Chetan Bansal, Xuchao Zhang, Saravan Rajmohan
to generate insights for detection, root causing and mitigating of incidents.
no code implementations • 5 Feb 2024 • Supriyo Ghosh, Karish Grover, Jimmy Wong, Chetan Bansal, Rakesh Namineni, Mohit Verma, Saravan Rajmohan
In this paper, we propose the dependency-aware incident linking (DiLink) framework which leverages both textual and service dependency graph information to improve the accuracy and coverage of incident links not only coming from same service, but also from different services and workloads.
no code implementations • 24 Jan 2024 • Xuchao Zhang, Supriyo Ghosh, Chetan Bansal, Rujia Wang, Minghua Ma, Yu Kang, Saravan Rajmohan
The results reveal that our in-context learning approach outperforms the previous fine-tuned large language models such as GPT-3 by an average of 24. 8\% across all metrics, with an impressive 49. 7\% improvement over the zero-shot model.
no code implementations • 13 Jan 2024 • Lu Wang, Mayukh Das, Fangkai Yang, Chao Duo, Bo Qiao, Hang Dong, Si Qin, Chetan Bansal, QIngwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
We address the challenge of learning safe and robust decision policies in presence of uncertainty in context of the real scientific problem of adaptive resource oversubscription to enhance resource efficiency while ensuring safety against resource congestion risk.
no code implementations • 11 Sep 2023 • Dylan Zhang, Xuchao Zhang, Chetan Bansal, Pedro Las-Casas, Rodrigo Fonseca, Saravan Rajmohan
Major cloud providers have employed advanced AI-based solutions like large language models to aid humans in identifying the root causes of cloud incidents.
no code implementations • 10 Jan 2023 • Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann, Xuchao Zhang, Saravan Rajmohan
In this work, we do the first large-scale study to evaluate the effectiveness of these models for helping engineers root cause and mitigate production incidents.
no code implementations • 26 May 2022 • Manish Shetty, Chetan Bansal, Sai Pramod Upadhyayula, Arjun Radhakrishna, Anurag Gupta
To alleviate these gaps, we investigate the automation of TSGs and propose AutoTSG -- a novel framework for automation of TSGs to executable workflows by combining machine learning and program synthesis.
no code implementations • 29 Sep 2021 • Manish Shetty, Chetan Bansal, Suman Nath, Sean Bowles, Henry Wang, Ozgur Arman, Siamak Ahari
We evaluate our model with over a million real-world crashes from four popular Microsoft applications and show that DeepAnalyze, trained with crashes from one set of applications, not only accurately localizes crashes of the same applications, but also bootstraps crash localization for other applications with zero to very little additional training data.
no code implementations • 15 Jan 2021 • Manish Shetty, Chetan Bansal, Sumit Kumar, Nikitha Rao, Nachiappan Nagappan
We have deployed SoftNER at Microsoft, a major cloud service provider and have evaluated it on more than 2 months of cloud incidents.
no code implementations • 25 Nov 2020 • Chandra Maddila, Sai Surya Upadrasta, Chetan Bansal, Nachiappan Nagappan, Georgios Gousios, Arie van Deursen
The key novelty of Nudge is that it succeeds in reducing pull request resolution time, while ensuring that developers perceive the notifications sent as useful, at the scale of thousands of repositories.
1 code implementation • 24 Nov 2020 • Nikitha Rao, Chetan Bansal, Joe Guan
We evaluate the approach against several baselines on a real-world dataset comprised of over 1 million queries mined from Bing web search engine and show that the CNN based model can achieve an accuracy of 77% and 76% for C# and Java respectively.
no code implementations • 10 Jul 2020 • Manish Shetty, Chetan Bansal, Sumit Kumar, Nikitha Rao, Nachiappan Nagappan, Thomas Zimmermann
Prior work on incident management has been heavily focused on the challenges with incident triaging and de-duplication.
no code implementations • 30 May 2020 • Foyzul Hassan, Chetan Bansal, Nachiappan Nagappan, Thomas Zimmermann, Ahmed Hassan Awadallah
Using the machine learning model, we extracted exceptions from raw queries and performed popularity, effort, success, query characteristic and web domain analysis.
no code implementations • 18 May 2020 • Nikitha Rao, Chetan Bansal, Subhabrata Mukherjee, Chandra Maddila
Web search engines are frequently used to access information about products.
no code implementations • 1 May 2020 • Chetan Bansal, Pantazis Deligiannis, Chandra Maddila, Nikitha Rao
Cyber attacks are increasingly becoming prevalent and causing significant damage to individuals, businesses and even countries.
no code implementations • 19 Dec 2019 • Nikitha Rao, Chetan Bansal, Thomas Zimmermann, Ahmed Hassan Awadallah, Nachiappan Nagappan
Subsequently, we propose a taxonomy of intents to identify the various contexts in which web search is used in software engineering.