Preetha Chatterjee

3675 Market Street, Office 1160, Philadelphia, PA 19104
Email: preetha[dot]chatterjee[at]drexel[dot]edu

Preetha Chatterjee is an Assistant Professor in the Department of Computer Science at Drexel University (since Fall 2021). She received her M.S. in Computer Science from the University of Delaware in 2016 and her Ph.D. in Computer Science from the University of Delaware in 2021, advised by Dr. Lori Pollock. Her research interests are primarily in software engineering, with an emphasis on improving software engineers’ tools and environments through data mining, text analysis and machine learning. She is especially interested in mining software repositories at a large scale, extending data analytics solutions to transform the plethora of information available in software artifacts into actionable nuggets of knowledge and tools, useful for both software engineers and researchers. Through her research, she intends to enable advances in areas including building/enhancing recommender systems for developers, information retrieval tasks from unstructured developer communications, and virtual assistants for software engineers.

I am actively looking for motivated PhD students to work on Software Analytics. If you are enthusiastic about solving real-life Software Engineering problems, please get in touch with me via email. Detailed instructions are given under Prospective Students below.

Latest News!

  • September 2021: Started as an Assistant Professor at Drexel University
  • April 2021: Received Frank A. Pehrson Graduate Student Award for Outstanding Computer Science Research from the Department of Computer and Information Science, University of Delaware, in recognition of outstanding performance and future potential in the field
  • March 2021: Successfully defended my Ph.D. dissertation on “Mining Information from Developer Chats Towards Building Software Maintenance Tools” at University of Delaware
  • March 2021: I will be joining the Department of Computer Science at Drexel University as an Assistant Professor in Fall 2021!
  • March 2021: Journal paper “Finding Help with Programming Errors: An Exploratory Study of Novice Software Engineers’ Focus in Stack Overflow Posts” selected for the Journal of Systems and Software (JSS) Happy Hour
  • February 2021: Journal paper “Automatically Identifying the Quality of Developer Chats for Post Hoc Use” accepted in the Transactions on Software Engineering and Methodology (TOSEM)
  • December 2020: Technical paper “Automatic Extraction of Opinion-based Q&A from Online Developer Chats” accepted in Proceedings of the 43rd International Conference on Software Engineering (ICSE 2021)

    Research

    Brief Overview:

    Integrated development environments today include sophisticated program modeling and analyses behind the scenes to support the developer in navigating, understanding, and modifying their code. While much can be learned from the results of static and dynamic analysis of their source code, developers also look to others for advice and learning. As software development teams are more globally distributed and the open source community has grown, developers rely increasingly on written documents for help they might have previously obtained through in-person conversations. My research activities cover (a) conducting empirical studies to analyze the information in written documents of software archives, and (b) designing techniques to mine useful information from the software archives which could be used in building/improving software maintenance and evolution tools. Some of my past research projects are:

  • Learning about Code Snippet Characteristics in Software Artifacts: Large corpora of software-related artifacts (e.g., blogs, bug reports, emails) offer the unique opportunity to learn from developers’ discussion about code snippets. We conducted an empirical study of 12 types of artifacts to investigate: 1) characteristics of the embedded code snippets, 2) kinds of information available across all artifacts, and their frequency and distribution of availability, and 3) textual cues that indicate code-related information, and how the cues differ across artifacts.
    [SANER'17]
  • Studying Developer Focus on Question and Answer (Q&A) Forums: Although popular Q&A forums such as Stack Overflow serve as a good knowledge resource, the abundance of information can cause developers to spend considerable time in identifying relevant answers and suitable fixes. We conducted an exploratory study to understand how novice software engineers direct their efforts and what kinds of information they focus on within a Stack Overflow post. We qualitatively analyzed the software engineers’ perceptions and annotations from a survey involving 400 Stack Overflow posts related to errors and exceptions in Java and C++.
    [JSS'19]
  • Mining Source Code Descriptions from Research Articles: Digital libraries of computer science research articles can be a rich source for code examples that are used to motivate or explain particular concepts or issues. We designed a technique to automatically identify natural language descriptions of code segments embedded within articles. Extracting these natural language descriptions alongside code will enable new advances in areas including code-based search, automatic code comment generation, and documentation generation.
    [MSR'17]
  • Mining Information from Developer Chat Conversations Towards Building Software Maintenance Tools: Popular chat platforms such as Slack host public chat communities that focus on specific software development topics such as Python or Ruby-on-Rails. Many of those chat communications contain valuable information, such as descriptions of code snippets and APIs, opinions on good programming practices, and causes of common errors/exceptions. This project aims to develop analyses for automatically identifying and extracting information in developers’ chat communications towards improving and building new tools to support software engineers. Specifically, we (a) conducted an exploratory study into the potential usefulness and challenges of mining developer Q&A format chat conversations for supporting software maintenance and evolution tools, (b) created and published an openly available dataset of software-related chat conversations, (c) designed approaches towards automatically analyzing the quality of information in chats using supervised machine learning techniques and natural language analysis, and (d) developed automatic techniques to extract opinion-based questions and answers from chats using deep learning architectures.
    [MSR'19] [MSR'20] [TOSEM'21] [ICSE'21]


    Publications:

    Conference Publications.

  • Automatic Extraction of Opinion-based Q&A from Online Developer Chats
    Preetha Chatterjee, Kostadin Damevski, and Lori Pollock
    The 43rd International Conference on Software Engineering (ICSE), Technical Track, May 2021.

    Preprint DOI Slides

  • Software-related Slack Chats with Disentangled Conversations
    Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, and Lori Pollock
    The 17th International Conference on Mining Software Repositories (MSR), Data Showcase Track, Oct 2020. Seoul, South Korea

    Preprint DOI Dataset Slides

  • Extracting Archival-Quality Information from Software-Related Chats
    Preetha Chatterjee
    The 42nd International Conference on Software Engineering (ICSE), Doctoral Symposium Track, Oct 2020. Seoul, South Korea

    Preprint DOI Slides

  • Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools
    Preetha Chatterjee, Kostadin Damevski, Lori Pollock, Vinay Augustine, and Nicholas A. Kraft
    The 16th International Conference on Mining Software Repositories (MSR), Research Track, May 2019. Montreal, Canada

    Preprint DOI Slides Press Coverage

  • Extracting Code Segments and Their Descriptions from Research Articles
    Preetha Chatterjee, Benjamin Gause, Hunter Hedinger, and Lori Pollock
    The 14th International Conference on Mining Software Repositories (MSR), Research Track, May 2017. Buenos Aires, Argentina

    Preprint DOI Slides

  • What Information about Code Snippets Is Available in Different Software-Related Documents? An Exploratory Study
    Preetha Chatterjee, Manziba Akanda Nishi, Kostadin Damevski, Vinay Augustine, Lori Pollock, and Nicholas A. Kraft
    The 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), Early Research Achievements Track, Feb 2017. Klagenfurt, Austria

    Preprint DOI

    Journal Publications.

  • Automatically Identifying the Quality of Developer Chats for Post Hoc Use
    Preetha Chatterjee, Kostadin Damevski, Nicholas A. Kraft, and Lori Pollock
    Transactions on Software Engineering and Methodology (TOSEM), Feb 2021.

    Preprint DOI

  • Finding Help with Programming Errors: An Exploratory Study of Novice Software Engineers’ Focus in Stack Overflow Posts
    Preetha Chatterjee, Minji Kong, and Lori Pollock
    Journal of Systems and Software (JSS), Research Paper, Jan 2020.

    Preprint DOI Slides

    Other.

  • Mining Information from Developer Chats Towards Building Software Maintenance Tools (Ph.D. Thesis)
    Preetha Chatterjee
    University of Delaware

    Manuscript

  • Exploring the Generality of a Java-based Loop Action Model for the Quorum Programming Language (Ph.D. Preliminary Project)
    Preetha Chatterjee
    University of Delaware

    Manuscript


    Selected Talks:

    Automatic Extraction of Opinion-based Q&A from Online Developer Chats, 43rd International Conference on Software Engineering (ICSE 21).

    Finding Help with Programming Errors: An Exploratory Study of Novice Software Engineers’ Focus in Stack Overflow Posts, Journal of Systems and Software (JSS 21) Happy Hour.

    Software-related Slack Chats with Disentangled Conversations, 17th International Conference on Mining Software Repositories (MSR 20).


    Teaching

  • Fall 2021: Introduction to Software Engineering and Development (SE 181) @ Drexel University [Instructor]
  • Summer 2019: Introduction to Computer Science II (CISC 181) @ University of Delaware [Instructor]
  • Fall 2018: Intro to Computer Science Research (CISC 367) @ University of Delaware [Substitute Instructor]
  • Spring 2018: Communication Skills for CS Researchers (CISC 667) @ University of Delaware [Substitute Instructor]
  • Fall 2017: Advanced Software Systems: Text Analysis for Software Engineering (CISC 879) @ University of Delaware [Substitute Instructor]
  • Spring 2016: Advanced Web Technologies (CISC 474) @ University of Delaware [Teaching Assistant]
  • Fall 2015: Web Applications using Computer Science (CISC 103) @ University of Delaware [Teaching Assistant]
  • Spring 2015: General Computer Science for Engineers (CISC 106) @ University of Delaware [Teaching Assistant]
  • Fall 2014: Introduction to Computer Science II (CISC 181) @ University of Delaware [Teaching Assistant]

    Service

    Academic Service

  • Program Committee Member:
    • International Conference on Mining Software Repositories (MSR 2022 - Technical Track)
    • IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2022 - ERA Track)
    • International Conference on Software Engineering (ICSE 2022 - SEET Track)
    • International Conference on Software Maintenance and Evolution (ICSME 2021 - Tool Demo Track)
    • International Conference on Mining Software Repositories (MSR 2021 - Mining Challenge Track)
    • IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2021 - ERA Track)
  • Organizing Committee:
    • Journal Publicity Chair for the Journal of Systems and Software (JSS), 2021 - Present
    • Conference Social Media Chair for the International Conference on Mining Software Repositories (MSR 2020)
  • External Conference Reviewer:
    • The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE 2017)
  • Journal Reviewer:
    • Empirical Software Engineering (EMSE 2021)
    • Transactions on Software Engineering and Methodology (TOSEM 2021)
  • External Journal Reviewer:
    • Journal of Software: Practice and Experience (2018)

    Diversity and Outreach Activities

  • Founder and Chair, University of Delaware ACM-W Student Chapter (2016-2017)
  • Participant, Grace Hopper Celebration of Women in Computing (2020)
  • Travel Graduate Mentor from University of Delaware, Grace Hopper Celebration of Women in Computing (2015)
  • Participant, Computing Research Association-W Grad Cohort Workshop (2015 and 2017)
  • Technical Administrator, Indian Graduate Student Association (IGSA), University of Delaware (2015)

    Professional Memberships and Affiliations

  • Member, Association for Computing Machinery, Special Interest Group on Software Engineering (ACM-SIGSOFT)
  • Member, Association for Computing Machinery, Women (ACM-W)
  • Member, Association for Computing Machinery (ACM)

    Students

    Prospective Students

    I am currently seeking highly motivated doctoral students with a strong academic background for fully funded positions to work at the intersection of software engineering, machine learning, and natural language processing at Drexel University (starting date Fall 2022). If you are enthusiastic about solving real-world Software Engineering problems, please reach out to me (detailed instructions are below).

    Qualifications: An ideal candidate has strong programming skills, strong communication/writing skills, and a willingness to learn. Experience in software engineering and machine learning is a plus.

    How to Apply: Please submit the following documents via email to preetha.chatterjee@drexel.edu under the subject “Potential Student Application”.

  • Brief cover letter including: your research interests, outline of previous research experience, preferred start date
  • Your current resume/CV (including major accomplishments e.g., projects, publications, awards, etc.)
  • One or two references that I can contact for a letter of reference (e.g., previous supervisors, instructors)
  • Unofficial Transcripts
  • Sample publications (if any)
  • I encourage you to include links to any projects/software that you have worked on.

    The review of applications will begin immediately and will continue until the positions are filled. I will carefully go through all the applications and contact potentially eligible candidates for a brief interview (via Zoom).

    Resources For Applying To Drexel University: All PhD students are fully supported with an assistantship in the Computer Science PhD program at Drexel University. Assistantships may be in the form of research, teaching or a combination of the two. These assistantships carry an appropriate stipend, tuition remission, and subsidized health insurance. PhD admissions are rolling until the department closes review (no hard deadlines). If you are thinking about applying to the Ph.D. program at Drexel University, here are some resources:

  • Drexel University Graduate Program Admissions
  • Drexel University PhD in Computer Science Admissions and Requirements
  • If you are already a student at Drexel University, feel free to email me to discuss potential research opportunities.

    Overview of My Research: My research interests are primarily in software engineering, with an emphasis on improving software engineers’ tools and environments through data mining, text analysis and machine learning. I am especially interested in mining software repositories at a large scale, extending data analytics solutions to transform the plethora of information available in software artifacts into actionable nuggets of knowledge and tools, useful for both software engineers and researchers. I am also broadly interested in empirical software engineering. For more information, please refer to: My Previous Publications and Presentations

    Past Students

  • Brian Phillips (2019-2020) @University of Delaware, Data analysis and case study of developer conversations on chat forums
  • Humpher Owusu (2019-2020) @University of Delaware, Data analysis and case study of developer conversations on chat forums
  • Kevin Mason (2019-2020) @University of Delaware, Data analysis of developer conversations on chat forums
  • Minji Kong (2018) @University of Delaware, Qualitative study on novice programmers’ focus on Stack Overflow
  • Qilin Ma (2017) @University of Delaware, Development of Python-based research tool for mining developer discussions on Stack Overflow
  • Benjamin Gause (2016) @University of Delaware, Development of Python-based research tool for mining code descriptions from research articles
  • Hunter Hedinger (2016) @University of Delaware, Development of Python-based research tool for mining code descriptions from research articles