Preetha Chatterjee

Pronouns: she/her
Office: 3675 Market Street, Office 1160, Philadelphia, PA 19104
Email: preetha[dot]chatterjee[at]drexel[dot]edu


Preetha Chatterjee is an Assistant Professor in the Department of Computer Science at Drexel University, where she leads the SOftware Engineering and Analytics Research (SOAR) Lab. Her research interests are primarily in software engineering, with the goal of improving software engineers’ tools and environments through techniques such as data mining, text analysis, and machine learning. She is especially interested in mining software repositories at a large scale and extending data analytics solutions to transform the plethora of information available in software artifacts into actionable nuggets of knowledge and tools that are useful for both software engineers and researchers. Through her research, she intends to enable advances in areas including building/enhancing recommender systems for developers, information retrieval from unstructured developer communications, and understanding social and human aspects of software engineering.

Her research has been published in top-tier conferences such as the International Conference on Software Engineering (ICSE), the International Conference on the Foundations of Software Engineering (FSE), the International Conference on Automated Software Engineering (ASE), and the International Conference on Mining Software Repositories (MSR), and in journals such as the Transactions on Software Engineering and Methodology (TOSEM) and the Journal of Systems and Software (JSS), among others. [List of publications]

She teaches courses primarily in software analytics and software engineering at the undergraduate and graduate levels. [List of courses]

She holds multiple leadership positions in the software engineering research community. She serves on the program committees of various conferences, including ICSE, ESEC/FSE, and MSR, and is on the editorial board of JSS. [List of roles]

She received her M.S. and Ph.D. in Computer Science from the University of Delaware, advised by Dr. Lori Pollock. Prior to that, she worked in industry as a Software Engineer for 5.5 years.



Latest News!

  • Jan 2024: Paper accepted at MSR 2024, Data Showcase track
  • Dec 2023: Two papers accepted at ICSE 2024, research track
  • Nov 2023: Received Distinguished Reviewer Award at FSE 2023
  • Nov 2023: Paper accepted at ICSE 2024, NIER track
  • Jul 2023: Two papers accepted at ESEC/FSE 2023, Ideas, Visions and Reflections track
  • Mar 2023: I was invited to talk about my research on Emotion Awareness in SE at the "It Will Never Work in Theory (NWiT)" April 2023 series


  • Media/Blog Coverage

  • Recognition at FSE 2024: Drexel CCI news
  • Our research on morality in open source: Compassionate Coding Newsletter
  • Our research on emotion mining in SE texts: Drexel CCI news
  • Our research on mining developer chat communications: ABB news
  • Starting ACM-W Chapter at University of Delaware: UDaily article

  • Research

    Brief Overview:

    Integrated development environments today include sophisticated program modeling and analyses behind the scenes to support developers in navigating, understanding, and modifying their code. While much can be learned from static and dynamic analysis of source code, developers also look to others for advice and learning. As software development teams have become more globally distributed and the open source community has grown, developers increasingly rely on written documents (e.g., chats, Q&A forums) for help they might previously have obtained through in-person conversations. My research activities cover (a) conducting empirical studies to analyze the information in written documents of software archives, and (b) designing techniques to mine useful information from software archives that can support software teams and help build or improve software maintenance and evolution tools. Some of my research projects are:

  • Understanding and Improving the Use of Conversational LLMs in Software Engineering: Conversational LLMs (e.g., GPT, Gemini, Claude) have emerged as a pivotal resource for programming support, providing immediate assistance that enhances productivity and simplifies the learning process for developers. These models are particularly valued for allowing software developers to interact in natural language, supporting an interactive learning experience. Despite their popularity, conversational LLMs often omit crucial details or produce incorrect solutions, which are hard or time-consuming for developers to identify. We found several instances where these conversational LLMs suggest fabricated information (e.g., non-existent APIs) or omit warnings about potential security risks in their code suggestions. Our research aims to improve software quality and developer productivity by providing comprehensive support for developers using conversational LLMs. This involves creating a framework to automatically reformulate queries and assess the correctness and reliability of the generated information.
    [ICSE'24_1] [ICSE NIER'24]
  • Mining Emotions from Software Engineering Communication: Emotions can strongly impact activities that are collaborative in nature and require creativity and problem-solving skills, such as software development. Research has shown that positive emotions (e.g., Joy) are associated with increased productivity and job satisfaction in software engineering teams. On the other hand, negative emotions (e.g., Frustration) can cause developers to lose motivation and participate less, ultimately leading to team attrition. In this project, we aim to mine emotions and affect in software-related text to improve collaboration and productivity in software projects.
    [ICSE'24_2] [MSR'24] [ICSE'23] [FSE'23_1] [FSE'23_2] [ASE'22]
  • Mining Information from Developer Chat Conversations Towards Building Software Maintenance Tools: Popular chat platforms such as Slack host public chat communities that focus on specific software development topics such as Python or Ruby-on-Rails. Many of those chat communications contain valuable information, such as descriptions of code snippets and APIs, opinions on good programming practices, and causes of common errors/exceptions. This project aims to develop analyses for automatically identifying and extracting information in developers’ chat communications in order to improve existing tools and build new ones that support software engineers.
    [MSR'22] [TOSEM'21] [ICSE'21] [MSR'20] [MSR'19]
  • Studying Developer Focus on Question and Answer (Q&A) Forums: Although popular Q&A forums such as Stack Overflow serve as a good knowledge resource, the abundance of information can cause developers to spend considerable time identifying relevant answers and suitable fixes. This project aims to help developers identify informative code and text on Q&A forums once they have narrowed their search down to a post relevant to their task.
    [NLBSE'22] [JSS'19]
  • Learning about Code Snippet Characteristics in Software Artifacts: Large corpora of software-related artifacts (e.g., blogs, bug reports, emails) offer a unique opportunity to learn from developers’ discussions about code snippets. The goal of this project is to gain insight into the potential value and difficulty of mining the natural language text associated with code snippets found in a variety of software-related documents, including blog posts, API documentation, code reviews, and public chats.
    [SANER'17]
  • Mining Source Code Descriptions from Research Articles: Digital libraries of computer science research articles can be a rich source for code examples that are used to motivate or explain particular concepts or issues. In this project, we designed a technique to automatically identify natural language descriptions of code segments embedded within articles. Extracting these natural language descriptions alongside code could enable new advances in areas including code-based search, automatic code comment generation, and documentation generation.
    [MSR'17]

    Selected Talks:

    Emotion Awareness in Software Engineering, Lightning Talk, It Will Never Work in Theory (NWiT) April 2023 series.

    Automatic Extraction of Opinion-based Q&A from Online Developer Chats, 43rd International Conference on Software Engineering (ICSE 21).

    Finding Help with Programming Errors: An Exploratory Study of Novice Software Engineers’ Focus in Stack Overflow Posts, Journal of Systems and Software (JSS 21) Happy Hour.

    Software-related Slack Chats with Disentangled Conversations, 17th International Conference on Mining Software Repositories (MSR 20).


  • Publications

    2024:

  • Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads
    Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, and Preetha Chatterjee
    The 21st International Conference on Mining Software Repositories (MSR), Data Showcase Track, Apr 2024.

    Preprint Dataset Slides

  • Exploring ChatGPT for Toxicity Detection in GitHub
    Shyamal Mishra, and Preetha Chatterjee
    The 46th International Conference on Software Engineering (ICSE), New Ideas and Emerging Results Track, Apr 2024.

    Preprint DOI Slides

  • Shedding Light on Software Engineering-specific Metaphors and Idioms
    Mia Mohammad Imran, Preetha Chatterjee, and Kostadin Damevski
    The 46th International Conference on Software Engineering (ICSE), Research Track, Apr 2024.

    Preprint DOI Slides

  • Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs
    Mia Mohammad Imran, Preetha Chatterjee, and Kostadin Damevski
    The 46th International Conference on Software Engineering (ICSE), Research Track, Apr 2024.

    Preprint DOI Slides Blog Post

    2023:

  • Exploring Moral Principles Exhibited in OSS: A Case Study on GitHub Heated Issues
    Ramtin Ehsani, Rezvaneh Rezapour, and Preetha Chatterjee
    The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Ideas, Visions and Reflections Track, Dec 2023.

    Preprint DOI

  • Towards Understanding Emotions in Informal Developer Interactions: A Gitter Chat Study
    Amirali Sajadi, Kostadin Damevski, and Preetha Chatterjee
    The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Ideas, Visions and Reflections Track, Dec 2023.

    Preprint DOI

  • Interpersonal Trust in OSS: Exploring Dimensions of Trust in GitHub Pull Requests
    Amirali Sajadi, Kostadin Damevski, and Preetha Chatterjee
    The 45th International Conference on Software Engineering (ICSE), New Ideas and Emerging Results Track, May 2023.

    Preprint DOI Slides

  • The Evolution of Substance Use Coverage in the Philadelphia Inquirer
    Layla Bouzoubaa, Ramtin Ehsani, Preetha Chatterjee, and Rezvaneh Rezapour
    The 17th International AAAI Conference on Web and Social Media (ICWSM), Data Challenge, Jun 2023.

    Preprint DOI

    2022:

  • Data Augmentation for Improving Emotion Recognition in Software Engineering Communication
    Mia Mohammad Imran, Yashasvi Jain, Preetha Chatterjee, and Kostadin Damevski
    The 37th IEEE/ACM International Conference on Automated Software Engineering (ASE), Research Track, Oct 2022.

    Preprint DOI Slides

  • DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research
    Keerthana Muthu Subash, Lakshmi Prasanna Kumar, Sri Lakshmi Vadlamani, Preetha Chatterjee, and Olga Baysal
    The 19th International Conference on Mining Software Repositories (MSR), Data Showcase Track, May 2022.

    Preprint DOI Dataset

  • Automatic Identification of Informative Code in Stack Overflow Posts
    Preetha Chatterjee
    The 1st International Workshop on Natural Language-based Software Engineering (NLBSE), co-located with ICSE, May 2022.

    Preprint DOI Slides Talk

  • Empirical Standards for Repository Mining
    Preetha Chatterjee, Tushar Sharma, and Paul Ralph
    The 19th International Conference on Mining Software Repositories (MSR), Tutorial, May 2022.

    Preprint DOI Empirical Standards

    2021:

  • Automatic Extraction of Opinion-based Q&A from Online Developer Chats
    Preetha Chatterjee, Kostadin Damevski, and Lori Pollock
    The 43rd International Conference on Software Engineering (ICSE), Technical Track, May 2021.

    Preprint DOI Slides Talk

  • Automatically Identifying the Quality of Developer Chats for Post Hoc Use
    Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, and Lori Pollock
    Transactions on Software Engineering and Methodology (TOSEM), Feb 2021.

    Preprint DOI Slides Talk

  • Mining Information from Developer Chats Towards Building Software Maintenance Tools (Ph.D. Thesis)
    Preetha Chatterjee
    University of Delaware

    Manuscript

    2020:

  • Software-related Slack Chats with Disentangled Conversations
    Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, and Lori Pollock
    The 17th International Conference on Mining Software Repositories (MSR), Data Showcase Track, Oct 2020. Seoul, South Korea

    Preprint DOI Dataset Slides Talk

  • Extracting Archival-Quality Information from Software-Related Chats
    Preetha Chatterjee
    The 42nd International Conference on Software Engineering (ICSE), Doctoral Symposium Track, Oct 2020. Seoul, South Korea

    Preprint DOI Slides

  • Finding Help with Programming Errors: An Exploratory Study of Novice Software Engineers’ Focus in Stack Overflow Posts
    Preetha Chatterjee, Minji Kong, Lori Pollock
    Journal of Systems and Software (JSS), Research Paper, Jan 2020.

    Preprint DOI Slides Talk

    2019:

  • Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools
    Preetha Chatterjee, Kostadin Damevski, Lori Pollock, Vinay Augustine, and Nicholas A. Kraft
    The 16th International Conference on Mining Software Repositories (MSR), Research Track, May 2019. Montreal, Canada

    Preprint DOI Slides Press Coverage

    2017:

  • Extracting Code Segments and Their Descriptions from Research Articles
    Preetha Chatterjee, Benjamin Gause, Hunter Hedinger, and Lori Pollock
    The 14th International Conference on Mining Software Repositories (MSR), Research Track, May 2017. Buenos Aires, Argentina

    Preprint DOI Slides

  • What Information about Code Snippets Is Available in Different Software-Related Documents? An Exploratory Study
    Preetha Chatterjee, Manziba Akanda Nishi, Kostadin Damevski, Vinay Augustine, Lori Pollock, and Nicholas A. Kraft
    The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), Early Research Achievements Track, Feb 2017. Klagenfurt, Austria

    Preprint DOI

    2015:

  • Exploring the Generality of a Java-based Loop Action Model for the Quorum Programming Language (Ph.D. Preliminary Project)
    Preetha Chatterjee
    University of Delaware

    Manuscript


  • Teaching

  • Fall 2023: Introduction to Software Engineering and Development (SE 181) @ Drexel University [Instructor]
  • Spring 2023: Software Analytics (CS T680) @ Drexel University [Instructor]
  • Fall 2022: Introduction to Software Engineering and Development (SE 181) @ Drexel University [Instructor]
  • Spring 2022: Introduction to Software Engineering and Development (SE 181) @ Drexel University [Instructor]
  • Fall 2021: Introduction to Software Engineering and Development (SE 181) @ Drexel University [Instructor]
  • Summer 2019: Introduction to Computer Science II (CISC 181) @ University of Delaware [Instructor]
  • Fall 2018: Intro to Computer Science Research (CISC 367) @ University of Delaware [Substitute Instructor]
  • Spring 2018: Communication Skills for CS Researchers (CISC 667) @ University of Delaware [Substitute Instructor]
  • Fall 2017: Advanced Software Systems: Text Analysis for Software Engineering (CISC 879) @ University of Delaware [Substitute Instructor]
  • Spring 2016: Advanced Web Technologies (CISC 474) @ University of Delaware [Teaching Assistant]
  • Fall 2015: Web Applications using Computer Science (CISC 103) @ University of Delaware [Teaching Assistant]
  • Spring 2015: General Computer Science for Engineers (CISC 106) @ University of Delaware [Teaching Assistant]
  • Fall 2014: Introduction to Computer Science II (CISC 181) @ University of Delaware [Teaching Assistant]

  • Service

    Academic Service

    A subset of my academic service is available as a snapshot on my conf.researchr.org profile.

  • Organizing Committee:
    • Journal-first Co-Chair, 32nd IEEE/ACM Intl. Conf. on Program Comprehension (ICPC 2024)
    • Mining Challenge Co-Chair, 21st Intl. Conf. on Mining Software Repositories (MSR 2024)
    • NIER PC Co-Chair, 23rd IEEE Intl. Conf. on Source Code Analysis and Manipulation (SCAM 2023)
    • PC Co-Chair, 3rd International Workshop on Software Engineering and AI for Data Quality in Cyber-Physical Systems/Internet of Things (SEA4DQ 2023)
    • Diversity and Inclusion Co-Chair, International Conference on Mining Software Repositories (MSR 2023)
    • PC Co-Chair, 1st International Workshop on Recruiting Participants for Empirical SE (RoPES 2022)
    • Editorial Board, Journal of Systems and Software (JSS), 2021 - Present
    • Social Media Chair, International Conference on Mining Software Repositories (MSR 2020, 2022)
  • Program Committee Member:
    • International Conference on Software Engineering (ICSE 2025 - Technical Track)
    • The Foundations of Software Engineering (FSE 2025 - Technical Track)
    • International Conference on Software Engineering (ICSE 2024 - Technical Track)
    • The Foundations of Software Engineering (ESEC/FSE 2024 - Technical Track)
    • International Conference on Mining Software Repositories (MSR 2024 - Technical Track)
    • The Foundations of Software Engineering (ESEC/FSE 2023 - Technical Track) -- Distinguished Reviewer
    • International Conference on Software Engineering (ICSE 2023 - SEIP Track)
    • International Conference on Mining Software Repositories (MSR 2023 - Technical Track)
    • International Workshop on Natural Language-based Software Engineering (NLBSE 2023)
    • International Conference on Mining Software Repositories (MSR 2022 - Technical Track)
    • IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2022 - ERA Track)
    • International Conference on Software Engineering (ICSE 2022 - SEET Track)
    • International Conference on Software Maintenance and Evolution (ICSME 2021 - Tool Demo Track)
    • International Conference on Mining Software Repositories (MSR 2021 - Mining Challenge Track)
    • IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2021 - ERA Track)
  • Journal Reviewer:
    • Empirical Software Engineering (EMSE)
    • Transactions on Software Engineering and Methodology (TOSEM)
    • Transactions on Software Engineering (TSE)
    • Journal of Systems and Software (JSS)

    Diversity and Outreach Activities

  • Invited Speaker, ACM‐W Alumni Panel, University of Delaware
  • Faculty Volunteer, Women in Tech Reception, Drexel University
  • Founder and Chair, University of Delaware ACM-W Student Chapter (2016-2017)
  • Participant, Grace Hopper Celebration of Women in Computing (2020)
  • Travel Graduate Mentor from University of Delaware, Grace Hopper Celebration of Women in Computing (2015)
  • Participant, Computing Research Association-Women (CRA-W) Grad Cohort Workshop (2015 and 2017)
  • Technical Administrator, Indian Graduate Student Association (IGSA), University of Delaware (2015)

    Professional Memberships and Affiliations

  • Member, Association for Computing Machinery, Special Interest Group on Software Engineering (ACM-SIGSOFT)
  • Member, Association for Computing Machinery, Women (ACM-W)
  • Member, Association for Computing Machinery (ACM)

  • Students

    Current Students

  • Ramtin Ehsani (2023-Present), Ph.D. Student, Drexel University
  • Amirali Sajadi (2022-Present), Ph.D. Student, Drexel University
  • Sakshi Pathak (WI'24-Present), M.S. Student, Drexel University
  • Binh Le (SP'24-Present), Undergraduate Student, Drexel University
  • Anh Nguyen (SP'24-Present), Undergraduate Student, Drexel University

    Former Students

  • Giles Odigwe (SP'24), Undergraduate Student, Drexel University
  • Mustafa Bookwala (WI'24), Undergraduate Student, Drexel University
  • Shyamal Mishra (SU'23), M.S. Student, Drexel University
  • Vanessa Martinez (WI'23), Undergraduate Student, Drexel University
  • Thomas Do (WI'22), Undergraduate Student, Drexel University
  • Yashasvi Jain (2021-2022), Undergraduate Student, Drexel University
  • Brian Phillips (2019-2020), Undergraduate Student, University of Delaware
  • Humpher Owusu (2019-2020), Undergraduate Student, University of Delaware
  • Kevin Mason (2019-2020), Undergraduate Student, University of Delaware
  • Minji Kong (2018), Undergraduate Student, University of Delaware
  • Qilin Ma (2017), Undergraduate Student, University of Delaware
  • Benjamin Gause (2016), Undergraduate Student, University of Delaware
  • Hunter Hedinger (2016), Undergraduate Student, University of Delaware

    Prospective Students

    I am generally on the lookout for highly motivated students (at all levels: undergraduate, M.S., and Ph.D.) with a strong academic background to work at the intersection of software engineering, machine learning, and natural language processing at Drexel University.

    Qualifications: An ideal candidate has strong programming skills, strong communication/writing skills, and a willingness to learn. Experience in software engineering or machine learning research is a plus.

    How to Apply: Please submit the following documents via email to preetha.chatterjee@drexel.edu with the subject “Potential Student Application”.
  • A brief cover letter including your research interests, an outline of your previous research experience, and your preferred start date
  • Your current resume/CV (including major accomplishments e.g., projects, publications, awards, etc.)
  • One or two references that I can contact for a letter of reference (e.g., previous supervisors, instructors)
  • Unofficial transcripts
  • Sample publications (if any)

    I encourage you to include links to any projects/software that you have worked on. Review of applications will begin immediately and continue until the positions are filled. I will carefully go through all applications and contact potentially eligible candidates for a brief interview (via Zoom).

    Resources For Applying To Drexel University: All PhD students in the Computer Science PhD program at Drexel University are fully supported with an assistantship. Assistantships may be in the form of research, teaching, or a combination of the two, and carry a stipend, tuition remission, and subsidized health insurance. PhD admissions are rolling until the department closes its review (there are no hard deadlines). If you are thinking about applying to the Ph.D. program at Drexel University, here are some resources:
  • Drexel University Graduate Program Admissions
  • Drexel University PhD in Computer Science Admissions and Requirements
  • If you are already a student at Drexel University, feel free to email me to discuss potential research opportunities.