Works (7)

Updated: July 5th, 2023 15:30

2022 article

DebtFree: minimizing labeling cost in self-admitted technical debt identification using semi-supervised learning

Tu, H., & Menzies, T. (2022, April 6). Empirical Software Engineering.

By: H. Tu & T. Menzies

author keywords: Technical debt; Semi-supervised learning; Unsupervised learning; Labeling effort
topics (OpenAlex): Software Engineering Research; Software Testing and Debugging Techniques; Software Reliability and Analysis Research
TL;DR: The proposed DebtFree, a two-mode framework based on unsupervised learning for identifying SATDs, can reduce the labeling effort by 99% in mode 1 (unlabeled training data) and by up to 63% in mode 2 (labeled training data), while achieving a relative improvement of almost 100% in the current active learner’s F1. (via Semantic Scholar)
Source: Web Of Science
Added: April 25, 2022

2022 article

Fair-SSL

Chakraborty, J., Majumder, S., & Tu, H. (2022, May 19).

By: J. Chakraborty, S. Majumder & H. Tu

author keywords: Machine Learning with and for SE; Ethics in Software Engineering
topics (OpenAlex): Ethics and Social Impacts of AI; Intellectual Property and Patents; Digitalization, Law, and Regulation
TL;DR: This is the first SE work where semi-supervised techniques are used to fight against ethical bias in SE ML models, and the clear advantage of Fair-SSL is that it requires only 10% of the labeled training data. (via Semantic Scholar)
Source: Web Of Science
Added: October 3, 2022

2021 article

FRUGAL: Unlocking Semi-Supervised Learning for Software Analytics

Tu, H., & Menzies, T. (2021, November 1). 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 394–406.

By: H. Tu & T. Menzies

author keywords: Software Analytics; Data Labelling Efforts; Semi-Supervised Learning
topics (OpenAlex): Software Engineering Research; Software Reliability and Analysis Research; Software Testing and Debugging Techniques
TL;DR: It is asserted that FRUGAL can save considerable effort in data labelling especially in validating prior work or researching new problems, and suggested that proponents of complex and expensive methods should always baseline such methods against simpler and cheaper alternatives. (via Semantic Scholar)
Sources: Web Of Science, NC State University Libraries
Added: April 25, 2022

2021 article

Mining Workflows for Anomalous Data Transfers

Tu, H., Papadimitriou, G., Kiran, M., Wang, C., Mandal, A., Deelman, E., & Menzies, T. (2021, May 1). 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021), pp. 1–12.

author keywords: Scientific Workflow; TCP Signatures; Anomaly Detection; Hyper-Parameter Tuning; Sequential Optimization
topics (OpenAlex): Software System Performance and Reliability; Scientific Computing and Data Management; Anomaly Detection Techniques and Applications
TL;DR: X-FLASH is developed, a network anomaly detection tool for faulty TCP workflow transfers that incorporates novel hyperparameter tuning and data mining approaches for improving the performance of the machine learning algorithms to accurately classify the anomalous TCP packets. (via Semantic Scholar)
UN Sustainable Development Goal categories: 9. Industry, Innovation and Infrastructure (OpenAlex)
Sources: Web Of Science, NC State University Libraries
Added: October 4, 2021

2020 article

Better Data Labelling With EMBLEM (and how that Impacts Defect Prediction)

Tu, H., Yu, Z., & Menzies, T. (2020, April 14). IEEE Transactions on Software Engineering, Vol. 48, pp. 278–294.

By: H. Tu, Z. Yu & T. Menzies

author keywords: Human-in-the-loop AI; data labelling; defect prediction; software analytics
topics (OpenAlex): Software Engineering Research; Software Engineering Techniques and Practices; Software Reliability and Analysis Research
TL;DR: In this approach, called EMBLEM, an AI tool first explores the software development process to label the commits that are most problematic, and humans then apply their expertise to check those labels (perhaps resulting in the AI updating the support vectors within its SVM learner). (via Semantic Scholar)
Sources: Web Of Science, ORCID, NC State University Libraries
Added: January 11, 2022

2020 article

Identifying Self-Admitted Technical Debts With Jitterbug: A Two-Step Approach

Yu, Z., Fahid, F. M., Tu, H., & Menzies, T. (2020, October 15). IEEE Transactions on Software Engineering, Vol. 48, pp. 1676–1691.

By: Z. Yu, F. Fahid, H. Tu & T. Menzies

author keywords: Software; Machine learning; Pattern recognition; Training; Computer hacking; Machine learning algorithms; Estimation; Technical debt; software engineering; machine learning; pattern recognition
topics (OpenAlex): Software Engineering Research; Software Reliability and Analysis Research; Advanced Malware Detection Techniques
TL;DR: Jitterbug is proposed, a two-step framework for identifying SATDs: it first identifies the “easy to find” SATDs automatically with close to 100 percent precision using a novel pattern recognition technique, and then applies machine learning techniques to assist human experts in manually identifying the remaining “hard to find” SATDs. (via Semantic Scholar)
Sources: Web Of Science, ORCID, NC State University Libraries
Added: May 17, 2022

2018 article

Is one hyperparameter optimizer enough?

Tu, H., & Nair, V. (2018, November 5).

By: H. Tu & V. Nair

author keywords: Defect Prediction; SBSE; Hyperparameter Tuning
topics (OpenAlex): Software Engineering Research; Machine Learning and Data Classification; Software Reliability and Analysis Research
TL;DR: It is concluded that hyperparameter optimization is more nuanced than previously believed and, while such optimization can certainly lead to large improvements in the performance of classifiers used in software analytics, it remains to be seen which specific optimizers should be applied to a new dataset. (via Semantic Scholar)
Source: Web Of Science
Added: April 2, 2019
