Suvodeep Majumder

College of Engineering

Works (8)

Updated: March 18th, 2024 08:21

2024 journal article

When less is more: on the value of "co-training" for semi-supervised software defect predictors

EMPIRICAL SOFTWARE ENGINEERING, 29(2).

By: S. Majumder, J. Chakraborty & T. Menzies

author keywords: Semi-supervised learning; SSL; Self-training; Co-training; Boosting methods; Semi-supervised preprocessing; Clustering-based semi-supervised preprocessing; Intrinsically semi-supervised methods; Graph-based methods; Co-forest; Effort aware tri-training
Sources: Web of Science, NC State University Libraries
Added: March 11, 2024

2023 journal article

A deep learning synthetic likelihood approximation of a non-stationary spatial model for extreme streamflow forecasting

SPATIAL STATISTICS, 55.

author keywords: Climate change; Deep learning; Density regression; Gaussian process; Max-stable processes; Vecchia approximation
TL;DR: A non-stationary process mixture model (NPMM) for annual streamflow maxima over the central US which uses downscaled climate model precipitation projections to forecast extremal streamflow and is flexible with desirable tail dependence properties, but yields an intractable likelihood. (via Semantic Scholar)
UN Sustainable Development Goal Categories
13. Climate Action (Web of Science; OpenAlex)
Sources: Web of Science, NC State University Libraries
Added: June 19, 2023

2023 journal article

Fair Enough: Searching for Sufficient Measures of Fairness

ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 32(6).

By: S. Majumder, J. Chakraborty, G. Bai, K. Stolee & T. Menzies

author keywords: Software fairness; fairness metrics; clustering; theoretical analysis; empirical analysis
TL;DR: This article shows that many of those fairness metrics effectively measure the same thing, and it is no longer necessary (or even possible) to satisfy all fairness metrics. (via Semantic Scholar)
Sources: Web of Science, ORCID, NC State University Libraries
Added: October 31, 2023

2022 article

Fair-SSL: Building fair ML Software with less data

2022 IEEE/ACM INTERNATIONAL WORKSHOP ON EQUITABLE DATA & TECHNOLOGY (FAIRWARE 2022), pp. 1–8.

By: J. Chakraborty, S. Majumder & H. Tu

author keywords: Machine Learning with and for SE; Ethics in Software Engineering
TL;DR: This is the first SE work where semi-supervised techniques are used to fight against ethical bias in SE ML models, and the clear advantage of Fair-SSL is that it requires only 10% of the labeled training data. (via Semantic Scholar)
Source: Web of Science
Added: October 3, 2022

2022 article

Methods for Stabilizing Models Across Large Samples of Projects (with case studies on Predicting Defect and Project Health)

2022 MINING SOFTWARE REPOSITORIES CONFERENCE (MSR 2022), pp. 566–578.

By: S. Majumder, T. Xia, R. Krishna & T. Menzies

author keywords: Defect Prediction; Project Health; Bellwether; Hierarchical Clustering; Random Forest; Two Phase Transfer Learning; Transfer Learning
TL;DR: This paper provides a promising result showing such stable models can be generated using a new transfer learning framework called STABILIZER, and these case studies are the largest demonstration of the generalizability of quantitative predictions of project quality yet reported in the SE literature. (via Semantic Scholar)
Sources: Web of Science, NC State University Libraries
Added: September 19, 2022

2022 journal article

Revisiting process versus product metrics: a large scale analysis

EMPIRICAL SOFTWARE ENGINEERING, 27(3).

By: S. Majumder, P. Mody & T. Menzies

author keywords: Software engineering; Software process; Process metrics; Product metrics; Developer metrics; Random forest; Logistic regression; Support vector machine; HPO
TL;DR: Prior small-scale results are rechecked and it is found that process metrics are better predictors for defects than product metrics, but it is unwise to trust metric importance results from analytics in-the-small studies since those change dramatically when moving to analytics in-the-large. (via Semantic Scholar)
Sources: Web of Science, NC State University Libraries
Added: April 4, 2022

2021 article

Bias in Machine Learning Software: Why? How? What to Do?

PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21), pp. 429–440.

By: J. Chakraborty, S. Majumder & T. Menzies

author keywords: Software Fairness; Fairness Metrics; Bias Mitigation
TL;DR: This paper postulates that the root causes of bias are the prior decisions that affect what data was selected and the labels assigned to those examples, and proposes the Fair-SMOTE algorithm, which removes biased labels and rebalances internal distributions such that, based on the sensitive attribute, examples are equal in both positive and negative classes. (via Semantic Scholar)
Sources: Web of Science, ORCID, NC State University Libraries
Added: March 7, 2022

2021 article

Early Life Cycle Software Defect Prediction. Why? How?

2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2021), pp. 448–459.

By: N. Shrikanth, S. Majumder & T. Menzies

author keywords: sampling; early; defect prediction; analytics
TL;DR: It is shown that, at least for learning defect predictors, after the first few months, this may not be true, and researchers should adopt a "simplicity-first" approach to their work. (via Semantic Scholar)
Sources: Web of Science, NC State University Libraries
Added: September 13, 2021
