TY - JOUR TI - Using Docker to Assist Q&A Forum Users AU - Melo, Luis AU - Wiese, Igor AU - d’Amorim, Marcelo T2 - IEEE Transactions on Software Engineering AB - Q&A forums are today a valuable tool to assist developers in programming tasks. Unfortunately, contributions to these forums are often unclear and incomplete. Docker is a container solution that enables software developers to encapsulate an operating environment and could help address reproducibility issues. This article reports on a feasibility study to evaluate if Docker can help improve reproducibility in Stack Overflow. We started by surveying Stack Overflow users to understand their perceptions on the proposal of using Docker to reproduce Stack Overflow posts. Participants were critical and mentioned two important aspects: cost and need. To validate their criticism, we conducted an exploratory study focused on understanding how costly the task of creating containers for posts is for developers. Overall, results indicate that the cost of creating containers is not high, especially due to the fact that dockerfiles are highly similar and small. Based on these findings we developed a tool, dubbed Frisk, to assist developers in creating containers for those posts. We then conducted a user study to evaluate the interest of Stack Overflow developers in the tool. We found that, on average, users spent nearly ten minutes interacting with Frisk and that 45.3% of the 563 Frisk sessions we created for existing posts resulted in a successful access to the corresponding web service by the owners of the post. Overall, this article provides early evidence that the use of Docker in Q&A forums should be encouraged for configuration-related posts. 
DA - 2021/11/1/ PY - 2021/11/1/ DO - 10.1109/TSE.2019.2956919 UR - https://doi.org/10.1109/TSE.2019.2956919 ER - TY - JOUR TI - Exposing bugs in JavaScript engines through test transplantation and differential testing AU - Lima, Igor AU - Silva, Jefferson AU - Miranda, Breno AU - Pinto, Gustavo AU - d’Amorim, Marcelo T2 - Software Quality Journal AB - JavaScript is a popular programming language today with several implementations competing for market dominance. Although a specification document and a conformance test suite exist to guide engine development, bugs occur and have important practical consequences. Implementing correct engines is challenging because the spec is intentionally incomplete and evolves frequently. This paper investigates the use of test transplantation and differential testing for revealing functional bugs in JavaScript engines. The former technique runs the regression test suite of a given engine on another engine. The latter technique fuzzes existing inputs and then compares the output produced by different engines with a differential oracle. We conducted experiments with engines from five major players—Apple, Facebook, Google, Microsoft, and Mozilla—to assess the effectiveness of test transplantation and differential testing. Our results indicate that both techniques revealed several bugs, many of which are confirmed by developers. We reported 35 bugs with test transplantation (23 of these bugs confirmed and 19 fixed) and reported 24 bugs with differential testing (17 of these confirmed and 10 fixed). Results indicate that most of these bugs affected two engines—Apple’s JSC and Microsoft’s ChakraCore (24 and 26 bugs, respectively). To summarize, our results show that test transplantation and differential testing are easy to apply and very effective in finding bugs in complex software, such as JavaScript engines. 
DA - 2021/3// PY - 2021/3// DO - 10.1007/s11219-020-09537-8 UR - https://doi.org/10.1007/s11219-020-09537-8 ER - TY - JOUR TI - Old but Gold: Reconsidering the value of feedforward learners for software analytics AU - Yedida, R. AU - Yang, X. AU - Menzies, T. T2 - arXiv AB - There has been an increased interest in the use of deep learning approaches for software analytics tasks. State-of-the-art techniques leverage modern deep learning techniques such as LSTMs, yielding competitive performance, albeit at the price of longer training times. Recently, Galke and Scherp [18] showed that at least for image recognition, a decades-old feedforward neural network can match the performance of modern deep learning techniques. This motivated us to try the same in the SE literature. Specifically, in this paper, we apply feedforward networks with some preprocessing to two analytics tasks: issue close time prediction, and vulnerability detection. We test the hypothesis laid by Galke and Scherp [18], that feedforward networks suffice for many analytics tasks (which we call, the "Old but Gold" hypothesis) for these two tasks. For three out of five datasets from these tasks, we achieve new high-water mark results (that out-perform the prior state-of-the-art results) and for a fourth data set, Old but Gold performed as well as the recent state of the art. Furthermore, the old but gold results were obtained orders of magnitude faster than prior work. For example, for issue close time, old but gold found good predictors in 90 seconds (as opposed to the newer methods, which took 6 hours to run). Our results support the "Old but Gold" hypothesis and lead to the following recommendation: try simpler alternatives before more complex methods. At the very least, this will produce a baseline result against which researchers can compare some other, supposedly more sophisticated, approach. 
And in the best case, they will obtain useful results that are as good as anything else, in a small fraction of the effort. To support open science, all our scripts and data are available on-line at https://github.com/fastidiouschipmunk/simple. DA - 2021/// PY - 2021/// DO - 10.48550/arxiv.2101.06319 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85171057311&partnerID=MN8TOARS ER - TY - CONF TI - Lessons learned from hyper-parameter tuning for microservice candidate identification AU - Yedida, R. AU - Krishna, R. AU - Kalia, A. AU - Menzies, T. AU - Xiao, J. AU - Vukovic, M. AB - When optimizing software for the cloud, monolithic applications need to be partitioned into many smaller microservices. While many tools have been proposed for this task, we warn that the evaluation of those approaches has been incomplete; e.g. minimal prior exploration of hyperparameter optimization. Using a set of open source Java EE applications, we show here that (a) such optimization can significantly improve microservice partitioning; and that (b) an open issue for future work is how to find which optimizer works best for different problems. To facilitate that future work, see https://github.com/yrahul3910/ase-tuned-mono2micro for a reproduction package for this research. C2 - 2021/// C3 - Proceedings - 2021 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021 DA - 2021/// DO - 10.1109/ASE51524.2021.9678704 SP - 1141-1145 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85125444531&partnerID=MN8TOARS ER - TY - JOUR TI - Lessons learned from hyper-parameter tuning for microservice candidate identification AU - Yedida, R. AU - Krishna, R. AU - Kalia, A. AU - Menzies, T. AU - Xiao, J. AU - Vukovic, M. T2 - arXiv AB - When optimizing software for the cloud, monolithic applications need to be partitioned into many smaller *microservices*. 
While many tools have been proposed for this task, we warn that the evaluation of those approaches has been incomplete; e.g. minimal prior exploration of hyperparameter optimization. Using a set of open source Java EE applications, we show here that (a) such optimization can significantly improve microservice partitioning; and that (b) an open issue for future work is how to find which optimizer works best for different problems. To facilitate that future work, see https://github.com/yrahul3910/ase-tuned-mono2micro for a reproduction package for this research. DA - 2021/// PY - 2021/// DO - 10.48550/arxiv.2106.06652 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85170462197&partnerID=MN8TOARS ER - TY - JOUR TI - Crowdsourcing the state of the art(ifacts) AU - Baldassarre, M.T. AU - Ernst, N. AU - Hermann, B. AU - Menzies, T. AU - Yedida, R. T2 - arXiv DA - 2021/// PY - 2021/// DO - 10.48550/arxiv.2108.06821 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85170893555&partnerID=MN8TOARS ER - TY - JOUR TI - An Expert System for Redesigning Software for Cloud Applications AU - Yedida, R. AU - Krishna, R. AU - Kalia, A. AU - Menzies, T. AU - Xiao, J. AU - Vukovic, M. T2 - arXiv AB - Cloud-based software has many advantages. When services are divided into many independent components, they are easier to update. Also, during peak demand, it is easier to scale cloud services (just hire more CPUs). Hence, many organizations are partitioning their monolithic enterprise applications into cloud-based microservices. Recently there has been much work using machine learning to simplify this partitioning task. Despite much research, no single partitioning method can be recommended as generally useful. More specifically, those prior solutions are "brittle"; i.e. if they work well for one kind of goal in one dataset, then they can be sub-optimal if applied to many datasets and multiple goals. 
In order to find a generally useful partitioning method, we propose DEEPLY. This new algorithm extends the CO-GCN deep learning partition generator with (a) a novel loss function and (b) some hyper-parameter optimization. As shown by our experiments, DEEPLY generally outperforms prior work (including CO-GCN, and others) across multiple datasets and goals. To the best of our knowledge, this is the first report in SE of such stable hyper-parameter optimization. To aid reuse of this work, DEEPLY is available on-line at https://bit.ly/2WhfFlB. DA - 2021/// PY - 2021/// DO - 10.48550/arXiv.2109.14569 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85134109536&partnerID=MN8TOARS ER - TY - CONF TI - SCIFFS: Enabling Secure Third-Party Security Analytics using Serverless Computing AU - Polinsky, Isaac AU - Datta, Pubali AU - Bates, Adam AU - Enck, William C2 - 2021/// C3 - Proceedings of the 26th ACM Symposium on Access Control Models and Technologies DA - 2021/// SP - 175-186 ER - TY - CONF TI - Role-Based Deception in Enterprise Networks AU - Anjum, Iffat AU - Zhu, Mu AU - Polinsky, Isaac AU - Enck, William AU - Reiter, Michael K AU - Singh, Munindar P C2 - 2021/// C3 - Proceedings of the Eleventh ACM Conference on Data and Application Security and Privacy DA - 2021/// SP - 65-76 ER - TY - CONF TI - PolyScope: Multi-Policy Access Control Analysis to Compute Authorized Attack Operations in Android Systems AU - Lee, Yu-Tsung AU - Enck, William AU - Chen, Haining AU - Vijayakumar, Hayawardh AU - Li, Ninghui AU - Qian, Zhiyun AU - Wang, Daimeng AU - Petracca, Giuseppe AU - Jaeger, Trent C2 - 2021/// C3 - 30th USENIX Security Symposium (USENIX Security 21) DA - 2021/// ER - TY - CONF TI - TYPOS: A Computer Science Exercise Platform AU - Gaweda, Adam M. AU - Lynch, Collin F. 
T2 - Symposium on Computer Science Education C2 - 2021/// C3 - Symposium on Computer Science Education DA - 2021/// PY - 2021/// ER - TY - CONF TI - On the Limited Impact of Visualizing Encryption: Perceptions of E2E Messaging Security AU - Stransky, Christian AU - Wermke, Dominik AU - Schrader, Johanna AU - Huaman, Nicolas AU - Acar, Yasemin AU - Fehlhaber, Anna Lena AU - Wei, Miranda AU - Ur, Blase AU - Fahl, Sascha C2 - 2021/// C3 - Seventeenth Symposium on Usable Privacy and Security (SOUPS 2021) DA - 2021/// SP - 437-454 ER - TY - CONF TI - Never ever or no matter what: Investigating Adoption Intentions and Misconceptions about the Corona-Warn-App in Germany AU - Häring, Maximilian AU - Gerlitz, Eva AU - Tiefenau, Christian AU - Smith, Matthew AU - Wermke, Dominik AU - Fahl, Sascha AU - Acar, Yasemin C2 - 2021/// C3 - Seventeenth Symposium on Usable Privacy and Security (SOUPS 2021) DA - 2021/// ER - TY - CONF TI - A Large-Scale Interview Study on Information Security in and Attacks against Small and Medium-sized Enterprises AU - Huaman, Nicolas AU - Skarczinski, Bennet AU - Stransky, Christian AU - Wermke, Dominik AU - Acar, Yasemin AU - Dreißigacker, Arne AU - Fahl, Sascha C2 - 2021/// C3 - 30th USENIX Security Symposium (USENIX Security 21) DA - 2021/// ER - TY - JOUR TI - Student Practice Sessions Modeled as ICAP Activity Silos. 
AU - Gaweda, Adam M AU - Lynch, Collin F T2 - International Educational Data Mining Society DA - 2021/// PY - 2021/// ER - TY - CONF TI - SQLRepair: Identifying and Repairing Mistakes in Student-Authored SQL Queries AU - Presler-Marshall, Kai AU - Heckman, Sarah AU - Stolee, Kathryn T T2 - IEEE C2 - 2021/// C3 - 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET) DA - 2021/// SP - 199-210 ER - TY - CONF TI - PEDI-Piazza Explorer Dashboard for Intervention AU - Akintunde, Ruth Okoilu AU - Limke, Ally AU - Barnes, Tiffany AU - Heckman, Sarah AU - Lynch, Collin T2 - IEEE Computer Society C2 - 2021/// C3 - 2021 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) DA - 2021/// SP - 1-4 ER - TY - CONF TI - Online Vs Face-to-face Web-development Course: Course Strategies, Learning, and Engagement AU - Basu, Debarati AU - Heckman, Sarah AU - Maher, Mary Lou C2 - 2021/// C3 - Proceedings of the 52nd ACM Technical Symposium on Computer Science Education DA - 2021/// SP - 1191-1197 ER - TY - JOUR TI - Automatically Classifying Student Help Requests: A Multi-Year Analysis. 
AU - Gao, Zhikai AU - Lynch, Collin AU - Heckman, Sarah AU - Barnes, Tiffany T2 - International Educational Data Mining Society DA - 2021/// PY - 2021/// ER - TY - JOUR TI - A Systematic Literature Review of Empiricism and Norms of Reporting in Computing Education Research Literature AU - Heckman, Sarah AU - Carver, Jeffrey C AU - Sherriff, Mark AU - Al-Zubidy, Ahmed T2 - ACM Transactions on Computing Education (TOCE) DA - 2021/// PY - 2021/// VL - 22 IS - 1 SP - 1-46 ER - TY - CONF TI - Toward Real-Time Guaranteed Scheduling for Autonomous Driving Systems AU - Sun, Jinghao AU - Wang, Tianyi AU - Duan, Kailu AU - Lu, Bin AU - Ren, Jiankang AU - Guo, Zhishan AU - Tan, Guozhen T2 - 42nd IEEE Real-Time Systems Symposium (RTSS 2021), Industry Challenge C2 - 2021/12// C3 - Proceedings of the 42nd IEEE Real-Time Systems Symposium (RTSS 2021), Industry Challenge CY - Dortmund, DE DA - 2021/12// PY - 2021/12/7/ ER - TY - JOUR TI - Mixed-criticality real-time scheduling of gang task systems AU - Bhuiyan, Ashikahmed AU - Yang, Kecheng AU - Arefin, Samsil AU - Saifullah, Abusayeed AU - Guan, Nan AU - Guo, Zhishan T2 - Real-Time Systems DA - 2021/5/23/ PY - 2021/5/23/ DO - 10.1007/s11241-021-09368-1 VL - 57 IS - 3 SP - 268-301 J2 - Real-Time Syst LA - en OP - SN - 0922-6443 1573-1383 UR - http://dx.doi.org/10.1007/s11241-021-09368-1 DB - Crossref ER - TY - JOUR TI - A network analysis of cross-occupational skill transferability for the hospitality industry AU - Huang, Arthur Yan AU - Fisher, Tyler AU - Ding, Huiling AU - Guo, Zhishan T2 - International Journal of Contemporary Hospitality Management AB - Purpose This paper aims to examine transferable skills and viable career transition pathways for hospitality and tourism workers. Future career prospects are discussed, along with the importance of reskilling for low-wage hospitality workers. 
Design/methodology/approach A network analysis is conducted to model skill relationships between the hospitality industry and other industries such as health-care and information technology. Multiple data are used in the analysis, including data from the US Department of Labor Occupational Information Network (O*NET), wage data from the Bureau of Labor Statistics and job computerization data (Frey and Osborne, 2017). Findings Although hospitality workers have lower than average skills scores when compared to workers from other career clusters included in the analysis, they possess essential soft skills that are valuable in other industries. Therefore, improving hospitality workers’ existing soft skills may help them enhance their cross-sector mobility, which may allow them to obtain jobs with a lower likelihood of computerization. Practical implications The findings shed light on workforce development theories and practice in the hospitality industry by quantitatively analyzing cross-sector skill correlations. Sharpening transferable soft skills will be essential to enhancing hospitality workers’ career development opportunities. Originality/value To the best of the authors’ knowledge, this is the first study that specifically examines the skill taxonomy for the hospitality industry and identifies its connection with other in-demand career clusters. 
DA - 2021/9/29/ PY - 2021/9/29/ DO - 10.1108/ijchm-01-2021-0073 VL - 33 IS - 12 SP - 4215-4236 J2 - IJCHM LA - en OP - SN - 0959-6119 0959-6119 UR - http://dx.doi.org/10.1108/ijchm-01-2021-0073 DB - Crossref ER - TY - JOUR TI - Sampling Sparse Representations with Randomized Measurement Langevin Dynamics AU - Wang, Kafeng AU - Xiong, Haoyi AU - Bian, Jiang AU - Zhu, Zhanxing AU - Gao, Qian AU - Guo, Zhishan AU - Xu, Cheng-Zhong AU - Huan, Jun AU - Dou, Dejing T2 - ACM Transactions on Knowledge Discovery from Data AB - Stochastic Gradient Langevin Dynamics (SGLD) has been widely used for Bayesian sampling from certain probability distributions, incorporating derivatives of the log-posterior. With the derivative evaluation of the log-posterior distribution, SGLD methods generate samples from the distribution through performing as a thermostats dynamics that traverses over gradient flows of the log-posterior with certainly controllable perturbation. Even when the density is not known, existing solutions can still first learn the kernel density models from the given datasets, then produce new samples using the SGLD over the kernel density derivatives. In this work, instead of exploring new samples from kernel spaces, a novel SGLD sampler, namely, Randomized Measurement Langevin Dynamics (RMLD) is proposed to sample the high-dimensional sparse representations from the spectral domain of a given dataset. Specifically, given a random measurement matrix for sparse coding, RMLD first derives a novel likelihood evaluator of the probability distribution from the loss function of LASSO, then samples from the high-dimensional distribution using stochastic Langevin dynamics with derivatives of the logarithm likelihood and Metropolis–Hastings sampling. In addition, new samples in low-dimensional measuring spaces can be regenerated using the sampled high-dimensional vectors and the measurement matrix. 
The algorithm analysis shows that RMLD indeed projects a given dataset into a high-dimensional Gaussian distribution with Laplacian prior, then draws new sparse representations from the dataset by performing SGLD over the distribution. Extensive experiments have been conducted to evaluate the proposed algorithm using real-world datasets. The performance comparisons on three real-world applications demonstrate the superior performance of RMLD beyond baseline methods. DA - 2021/2/10/ PY - 2021/2/10/ DO - 10.1145/3427585 VL - 15 IS - 2 SP - 1-21 J2 - ACM Trans. Knowl. Discov. Data LA - en OP - SN - 1556-4681 1556-472X UR - http://dx.doi.org/10.1145/3427585 DB - Crossref ER - TY - JOUR TI - CRLEDD: Regularized Causalities Learning for Early Detection of Diseases Using Electronic Health Record (EHR) Data AU - Bian, Jiang AU - Yang, Sijia AU - Xiong, Haoyi AU - Wang, Licheng AU - Fu, Yanjie AU - Sun, Zeyi AU - Guo, Zhishan T2 - IEEE Transactions on Emerging Topics in Computational Intelligence AB - The availability of Electronic Health Records (EHR) in health care settings has provided tremendous opportunities for early disease detection. While many supervised learning models have been adopted for EHR-based disease early detection, the ill-posed inverse problem in the parameter learning has imposed a significant challenge on improving the accuracy of these algorithms. In this paper, we propose CRLEDD - Causality-Regularized Learning for Early Detection of Disease, an algorithm to improve the performance of Linear Discriminant Analysis (LDA) on top of diagnosis-frequency vector data representation. While most existing regularization methods exploit sparsity regularization to improve detection performance, CRLEDD provides a unique perspective by ensuring positive semi-definiteness of the sparsified precision matrix used in LDA which is different from the regular regularization method (e.g., L2 regularization). 
To achieve this goal, CRLEDD employs Graphical Lasso to estimate the precision matrix in the ill-posed settings for enhanced accuracy of LDA classifiers. We perform extensive evaluation of CRLEDD using a large-scale real-world EHR dataset to predict mental health disorders (e.g., depression and anxiety) of college students from 10 universities in the U.S. We compare CRLEDD with other regularized LDA and downstream classifiers. The result shows that CRLEDD outperforms all baselines in terms of accuracy and F1 scores. DA - 2021/8// PY - 2021/8// DO - 10.1109/TETCI.2020.3010017 UR - https://doi.org/10.1109/TETCI.2020.3010017 ER - TY - JOUR TI - COMO: Efficient Deep Neural Networks Expansion With COnvolutional MaxOut AU - Zhao, Baoxin AU - Xiong, Haoyi AU - Bian, Jiang AU - Guo, Zhishan AU - Xu, Cheng-Zhong AU - Dou, Dejing T2 - IEEE Transactions on Multimedia AB - In this paper, we extend the classic MaxOut strategy, originally designed for Multilayer Perceptrons (MLPs), into COnvolutional MaxOut (COMO), a new strategy making deep convolutional neural networks wider with parameter efficiency. Compared to the existing solutions, such as ResNeXt for ResNet or Inception for VGG-alikes, COMO works well on both linear architectures and the ones with skipped connections and residual blocks. More specifically, COMO adopts a novel split-transform-merge paradigm that extends the layers with spatial resolution reduction into multiple parallel splits. For the layer with COMO, each split passes the input feature maps through a 4D convolution operator with independent batch normalization operators for transformation, then merges into the aggregated output of the original sizes through max-pooling. Such a strategy is expected to tackle the potential classification accuracy degradation due to the spatial resolution reduction, by incorporating the multiple splits and max-pooling-based feature selection. 
Our experiment using a wide range of deep architectures shows that COMO can significantly improve the classification accuracy of ResNet/VGG-alike networks based on a large number of benchmark datasets. COMO further outperforms the existing solutions, e.g., Inceptions, ResNeXts, SE-ResNet, and Xception, that make networks wider, and it dominates in the comparison of accuracy versus parameter sizes. DA - 2021/// PY - 2021/// DO - 10.1109/TMM.2020.3002614 VL - 23 SP - 1722-1730 UR - https://doi.org/10.1109/TMM.2020.3002614 ER - TY - JOUR TI - Partitioning-Based Scheduling of OpenMP Task Systems With Tied Tasks AU - Wang, Yang AU - Jiang, Xu AU - Guan, Nan AU - Guo, Zhishan AU - Liu, Xue AU - Yi, Wang T2 - IEEE Transactions on Parallel and Distributed Systems AB - OpenMP is a popular programming framework in both general and high-performance computing and has recently drawn much interest in embedded and real-time computing. Although the execution semantics of OpenMP are similar to the DAG task model, the constraints posed by the OpenMP specification make them significantly more challenging to analyze. A tied task is an important feature in OpenMP that must execute on the same thread throughout its entire life cycle. A previous work [1] succeeded in analyzing the real-time scheduling of tied tasks by modifying the Task Scheduling Constraints (TSCs) in OpenMP specification. In this article, we also study the real-time scheduling of OpenMP task systems with tied tasks but without changing the original TSCs. In particular, we propose a partitioning-based algorithm, P-EDF-omp, by which the tied constraint can be automatically guaranteed as long as an OpenMP task system can be successfully partitioned to a multiprocessor platform. Furthermore, we conduct comprehensive experiments with both synthetic workloads and established OpenMP benchmarks to show that our approach consistently outperforms the work in [1] —even without modifying the TSCs. 
DA - 2021/6/1/ PY - 2021/6/1/ DO - 10.1109/TPDS.2020.3048373 VL - 32 IS - 6 SP - 1322-1339 UR - https://doi.org/10.1109/TPDS.2020.3048373 ER - TY - JOUR TI - Narrowing the speedup factor gap of partitioned EDF AU - Liu, Xingwu AU - Han, Xin AU - Zhao, Liang AU - Guo, Zhishan T2 - Information and Computation AB - Schedulability is a fundamental problem in analyzing real-time systems, but it often has to be approximated because of the intrinsic computational hardness. Partitioned earliest deadline first (EDF) is one of the most popular polynomial-time and practical schedulers on multiprocessor platforms, and it was shown to have a speedup factor of at most 2.6322 − 1/m. This paper further improves the factor to 2.5556 − 1/m for both the constrained-deadline case and the arbitrary-deadline case, and it is very close to the known (non-tight) lower bound of 2.5 − 1/m. The key ideas are that we develop a novel method to discretize and regularize sporadic task sets that are schedulable on uniprocessors, and we find that the ratio (ρ) of the approximate demand bound value to the machine capacity is upper-bounded by 1.5556 for the arbitrary-deadline case, which plays an important role in estimating the speedup factor of partitioned EDF. DA - 2021/12// PY - 2021/12// DO - 10.1016/j.ic.2021.104743 VL - 281 SP - 104743 UR - https://doi.org/10.1016/j.ic.2021.104743 ER - TY - JOUR TI - How end-user programmers forage in online repositories? An information foraging perspective AU - Kuttal, Sandeep Kaur AU - Kim, Se Yeon AU - Martos, Carlos AU - Bejarano, Alexandra T2 - Journal of Computer Languages AB - End-user (non-professional) programmers often opportunistically create programs: they evaluate various alternatives and reuse existing code by merging components from it or modifying it to suit the context or problems of their programs. 
Finding and evaluating which program variants to reuse code from is challenging because the searching mechanisms within online repositories are not optimal. To understand the reuse behavior of end-user programmers and to provide implications on how to further support them, we conducted an empirical study in which eight end-user programmers foraged in online repositories, specifically App Inventor Gallery and File Exchange. Using Information Foraging Theory, we qualitatively analyzed the end-user programmers’ behavior and focused on not only program variants from a single source, but also on similar variants from various sources developed over time and by different authors. This analysis revealed new cue types and strategies specific to novice and experienced end-user programmers as they foraged between- and within-variants. DA - 2021/2// PY - 2021/2// DO - 10.1016/j.cola.2020.101010 VL - 62 SP - 101010 J2 - Journal of Computer Languages LA - en OP - SN - 2590-1184 UR - http://dx.doi.org/10.1016/j.cola.2020.101010 DB - Crossref ER - TY - JOUR TI - Visual Resume: Exploring developers’ online contributions for hiring AU - Kuttal, Sandeep Kaur AU - Chen, Xiaofan AU - Wang, Zhendong AU - Balali, Sogol AU - Sarma, Anita T2 - Information and Software Technology AB - Recruiters and practitioners are increasingly relying on online activities of developers to find a suitable candidate. Past empirical studies have identified technical and soft skills that managers use in online peer production sites when making hiring decisions. However, finding candidates with relevant skills is a labor-intensive task for managers, due to the sheer amount of information online peer production sites contain. We designed a profile aggregation tool—Visual Resume—that aggregates contribution information across two types of peer production sites: a code hosting site (GitHub) and a technical Q&A forum (Stack Overflow). 
Visual Resume displays summaries of developers’ contributions and allows easy access to their contribution details. It also facilitates pairwise comparisons of candidates through a card-based design. We present the motivation for such a design and design guidelines for creating such a recruitment tool. We performed a scenario-based evaluation to identify how participants use developers’ online contributions in peer production sites as well as how they used Visual Resume when making hiring decisions. Our analysis helped in identifying the technical and soft skill cues that were most useful to our participants when making hiring decisions in online production sites. We also identified the information features that participants used and the ways the participants accessed that information to select a candidate. Our results suggest that Visual Resume helps participants evaluate cues for technical and soft skills more efficiently as it presents an aggregated view of a candidate’s contributions, allows drill-down to details about contributions, and allows easy comparison of candidates via movable cards that can be arranged to match participants’ needs. 
DA - 2021/10// PY - 2021/10// DO - 10.1016/j.infsof.2021.106633 VL - 138 SP - 106633 J2 - Information and Software Technology LA - en OP - SN - 0950-5849 UR - http://dx.doi.org/10.1016/j.infsof.2021.106633 DB - Crossref ER - TY - JOUR TI - FASE: Fine-Grained Accountable and Space-Efficient Access Control for Multimedia Content With In-Network Caching AU - He, Peixuan AU - Xue, Kaiping AU - Yang, Jiayu AU - Xia, Qiudong AU - Liu, Jianqing AU - Wei, David SL T2 - IEEE Transactions on Network and Service Management DA - 2021/// PY - 2021/// VL - 18 IS - 4 SP - 4462-4475 ER - TY - JOUR TI - Transparent Multipath: Using Double MPTCP Proxies to Enhance Transport Performance for Traditional TCP AU - Han, Jiangping AU - Xue, Kaiping AU - Wei, Wenjia AU - Xing, Yitao AU - Liu, Jianqing AU - Hong, Peilin T2 - IEEE Network DA - 2021/// PY - 2021/// VL - 35 IS - 5 SP - 181-187 ER - TY - CONF TI - MP-VR: An MPTCP-Based Adaptive Streaming Framework for 360-degree Virtual Reality Videos AU - Wei, Wenjia AU - Han, Jiangping AU - Xing, Yitao AU - Xue, Kaiping AU - Liu, Jianqing AU - Zhuang, Rui T2 - IEEE C2 - 2021/// C3 - ICC 2021-IEEE International Conference on Communications DA - 2021/// SP - 1-6 ER - TY - JOUR TI - Building a large-scale and wide-area quantum Internet based on an OSI-alike model AU - Li, Zhonghui AU - Xue, Kaiping AU - Li, Jian AU - Yu, Nenghai AU - Liu, Jianqing AU - Wei, David SL AU - Sun, Qibin AU - Lu, Jun T2 - China Communications DA - 2021/// PY - 2021/// VL - 18 IS - 10 SP - 1-14 ER - TY - JOUR TI - FVC-Dedup: A Secure Report Deduplication Scheme in a Fog-assisted Vehicular Crowdsensing System AU - Jiang, Shunrong AU - Liu, Jianqing AU - Zhou, Yong AU - Fang, Yuguang T2 - IEEE Transactions on Dependable and Secure Computing DA - 2021/// PY - 2021/// ER - TY - JOUR TI - Enabling Cross-chain Transactions: A Decentralized Cryptocurrency Exchange Protocol AU - Tian, Hangyu AU - Xue, Kaiping AU - Luo, Xinyi AU - Li, Shaohua AU - Xu, Jie AU - Liu, Jianqing AU 
- Zhao, Jun AU - Wei, David SL T2 - IEEE Transactions on Information Forensics and Security DA - 2021/// PY - 2021/// VL - 16 SP - 3928-3941 ER - TY - JOUR TI - An Intelligent Resource Allocation Scheme in Energy Harvesting Cognitive Wireless Sensor Networks AU - Deng, Xiaoheng AU - Guan, Peiyuan AU - Hei, Cong AU - Li, Feng AU - Liu, Jianqing AU - Xiong, Naixue T2 - IEEE Transactions on Network Science and Engineering AB - The energy harvesting cognitive wireless sensor network (EHCWSN) introduces energy harvesting technology and cognitive radio technology into the traditional wireless sensor network (WSN), which significantly prolongs the working life of the sensor node and effectively alleviates the congestion problem of the unlicensed spectrum. Due to the uncertainty of the energy harvesting process and the behavior of the primary user (PU), how to allocate and manage limited network resources is a crucial problem in the EHCWSN. In this work, a new Q-learning-based channel selection method is proposed for the energy harvesting process and the randomness of the PU's behavior in the sensor network. By continuously interacting and learning with the environment, the method guides the secondary user (SU) to select the channel with better channel quality. Moreover, we also propose a resource management and allocation mechanism with guaranteed QoS requirements for node traffic based on the framework of Lyapunov optimization theory. We design a low-complex online algorithm based on the optimization framework, which is then validated through extensive simulations. The results demonstrate that our design achieves higher accuracy with the QoS guarantee. 
DA - 2021/4/1/ PY - 2021/4/1/ DO - 10.1109/tnse.2021.3076485 VL - 8 IS - 2 SP - 1900-1912 ER - TY - JOUR TI - Empirical Optimization on Post-Disaster Communication Restoration for Social Equality AU - Liu, Jianqing AU - Dong, Shangjia AU - Morris, Thomas T2 - arXiv preprint arXiv:2103.10582 DA - 2021/// PY - 2021/// ER - TY - JOUR TI - A Low-Latency MPTCP Scheduler for Live Video Streaming in Mobile Networks AU - Xing, Yitao AU - Xue, Kaiping AU - Zhang, Yuan AU - Han, Jiangping AU - Li, Jian AU - Liu, Jianqing AU - Li, Ruidong T2 - IEEE Transactions on Wireless Communications AB - It is a known issue that low-latency communication is hard to achieve when using multiple network interfaces with asymmetric capacity and delay (e.g., LTE and WLAN) simultaneously. A main underlying cause of this issue is that the packets with lower sequence numbers are stalled on a high-latency path, so the early-arriving packets with higher sequence numbers become “out-of-order (OFO)” packets. These OFO packets may excessively consume the receiver’s buffer, causing long reordering delay and unnecessary packet retransmission. In this paper, we present a novel design of packet scheduling for Multipath TCP (MPTCP), called OverLapped Scheduler (OLS), able to tackle the OFO-packet problem more effectively. OLS can guarantee sufficient throughput on demand of upper layer applications, and utilizes the remaining bandwidth to reduce OFO-packets. To do so, OLS schedules packets according to their arrival time and sends a controlled number of redundant packets to avoid the impact of inaccurate arrival-time estimations due to network jitter. We implement OLS in a Linux kernel, and the experiments show that in asymmetric networks with or without jitter, OLS can effectively reduce OFO-packets and transmission latency while maintaining a sufficient throughput, which makes it fully capable of meeting the requirements of applications such as live video streaming. 
DA - 2021/// PY - 2021/// DO - 10.1109/twc.2021.3081498 SP - 1-1 UR - http://dx.doi.org/10.1109/twc.2021.3081498 ER - TY - CONF TI - Investigate Effectiveness of Code Features in Knowledge Tracing Task on Novice Programming Course AU - Penmetsa, P. AU - Shi, Y. AU - Price, T. C2 - 2021/// C3 - CEUR Workshop Proceedings DA - 2021/// VL - 3051 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85122896934&partnerID=MN8TOARS ER - TY - JOUR TI - Detecting Disruptive Talk in Student Chat-Based Discussion within Collaborative Game-Based Learning Environments AU - Park, Kyungjin AU - Sohn, Hyunwoo AU - Mott, Bradford W. AU - Min, Wookhee AU - Saleh, Asmalina AU - Glazewski, Krista D. AU - Hmelo-Silver, Cindy E. AU - Lester, James C. T2 - LAK21 CONFERENCE PROCEEDINGS: THE ELEVENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE AB - Collaborative game-based learning environments offer significant promise for creating engaging group learning experiences. Online chat plays a pivotal role in these environments by providing students with a means to freely communicate during problem solving. These chat-based discussions and negotiations support the coordination of students’ in-game learning activities. However, this freedom of expression comes with the possibility that some students might engage in undesirable communicative behavior. A key challenge posed by collaborative game-based learning environments is how to reliably detect disruptive talk that purposefully disrupts team dynamics and problem-solving interactions. Detecting disruptive talk during collaborative game-based learning is particularly important because if it is allowed to persist, it can generate frustration and significantly impede the learning process for students. 
This paper analyzes disruptive talk in a collaborative game-based learning environment for middle school science education to investigate how such behaviors influence students’ learning outcomes and vary across gender and students’ prior knowledge. We present a disruptive talk detection framework that automatically detects disruptive talk in chat-based group conversations. We further investigate both classic machine learning and deep learning models for the framework utilizing a range of dialogue representations as well as supplementary information such as student gender. Findings show that long short-term memory network (LSTM)-based disruptive talk detection models outperform competitive baseline models, indicating that the LSTM-based disruptive talk detection framework offers significant potential for supporting effective collaborative game-based learning through the identification of disruptive talk. DA - 2021/// PY - 2021/// DO - 10.1145/3448139.3448178 SP - 405-415 KW - Collaborative Game-Based Learning KW - Disruptive Talk Detection KW - Text Analytics ER - TY - JOUR TI - Investigating Student Reflection during Game-Based Learning in Middle Grades Science AU - Carpenter, Dan AU - Cloude, Elizabeth AU - Rowe, Jonathan AU - Azevedo, Roger AU - Lester, James T2 - LAK21 CONFERENCE PROCEEDINGS: THE ELEVENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE AB - Reflection plays a critical role in learning by encouraging students to contemplate their knowledge and previous learning experiences to inform their future actions and higher-order thinking, such as reasoning and problem solving. Reflection is particularly important in inquiry-driven learning scenarios where students have the freedom to set goals and regulate their own learning. However, despite the importance of reflection in learning, there are significant theoretical, methodological, and analytical challenges posed by measuring, modeling, and supporting reflection. 
This paper presents results from a classroom study to investigate middle-school students’ reflection during inquiry-driven learning with Crystal Island, a game-based learning environment for middle-school microbiology. To collect evidence of reflection during game-based learning, we used embedded reflection prompts to elicit written reflections during students’ interactions with Crystal Island. Results from analysis of data from 105 students highlight relationships between features of students’ reflections and learning outcomes related to both science content knowledge and problem solving. We consider implications for building adaptive support in game-based learning environments to foster deep reflection and enhance learning, and we identify key features in students’ problem-solving actions and reflections that are predictive of reflection depth. These findings present a foundation for providing adaptive support for reflection during game-based learning. DA - 2021/// PY - 2021/// DO - 10.1145/3448139.3448166 SP - 280-291 KW - Self-Regulated Learning KW - Game-Based Learning KW - Reflection ER - TY - JOUR TI - Adaptively Scaffolding Cognitive Engagement with Batch Constrained Deep Q-Networks AU - Fahid, Fahmid Morshed AU - Rowe, Jonathan P. AU - Spain, Randall D. AU - Goldberg, Benjamin S. AU - Pokorny, Robert AU - Lester, James T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I AB - Scaffolding student engagement is a central challenge in adaptive learning environments. The ICAP framework defines levels of cognitive engagement with a learning activity in terms of four different engagement modes—Interactive, Constructive, Active, and Passive—and it predicts that increased cognitive engagement will yield improved learning. However, a key open question is how best to translate the ICAP theory into the design of adaptive scaffolding in adaptive learning environments. 
Specifically, should scaffolds be designed to require the highest levels of cognitive engagement (i.e., Interactive and Constructive modes) with every instance of feedback or knowledge component? To answer this question, in this paper we investigate a data-driven pedagogical modeling framework based on batch-constrained deep Q-networks, a type of deep reinforcement learning (RL) method, to induce policies for delivering ICAP-inspired scaffolding in adaptive learning environments. The policies are trained with log data from 487 learners as they interacted with an adaptive learning environment that provided ICAP-inspired feedback and remediation. Results suggest that adaptive scaffolding policies induced with batch-constrained deep Q-networks outperform heuristic policies that strictly follow the ICAP model without RL-based tailoring. The findings demonstrate the utility of deep RL for tailoring scaffolding for learner cognitive engagement. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78292-4_10 VL - 12748 SP - 113-124 SN - 1611-3349 KW - Deep reinforcement learning KW - Cognitive engagement KW - ICAP KW - Adaptive learning environments ER - TY - JOUR TI - The Challenge of Noisy Classrooms: Speaker Detection During Elementary Students' Collaborative Dialogue AU - Ma, Yingbo AU - Wiggins, Joseph B. AU - Celepkolu, Mehmet AU - Boyer, Kristy Elizabeth AU - Lynch, Collin AU - Wiebe, Eric T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I AB - Adaptive and intelligent collaborative learning support systems are effective for supporting learning and building strong collaborative skills. This potential has not yet been realized within noisy classroom environments, where automated speech recognition (ASR) is very difficult. A key challenge is to differentiate each learner’s speech from the background noise, which includes the teachers’ speech as well as other groups’ speech. 
In this paper, we explore a multimodal method to identify speakers by using visual and acoustic features from ten video recordings of children pairs collaborating in an elementary school classroom. The results indicate that the visual modality was better for identifying the speaker when in-group speech was detected, while the acoustic modality was better for differentiating in-group speech from background speech. Our analysis also revealed that recurrent neural network (RNN)-based models outperformed convolutional neural network (CNN)-based models with higher speaker detection F-1 scores. This work represents a critical step toward the classroom deployment of intelligent systems that support collaborative learning. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78292-4_22 VL - 12748 SP - 268-281 SN - 1611-3349 KW - Adaptive and intelligent collaborative learning support KW - Classroom environment KW - Speaker detection KW - Multimodal learning ER - TY - JOUR TI - Modeling Frustration Trajectories and Problem-Solving Behaviors in Adaptive Learning Environments for Introductory Computer Science AU - Tian, Xiaoyi AU - Wiggins, Joseph B. AU - Fahid, Fahmid Morshed AU - Emerson, Andrew AU - Bounajim, Dolly AU - Smith, Andy AU - Boyer, Kristy Elizabeth AU - Wiebe, Eric AU - Mott, Bradford AU - Lester, James T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II AB - Modeling a learner’s frustration in adaptive environments can inform scaffolding. While much work has explored momentary frustration, there is limited research investigating the dynamics of frustration over time and its relationship with problem-solving behaviors. In this paper, we clustered 86 undergraduate students into four frustration trajectories as they worked with an adaptive learning environment for introductory computer science. 
The results indicate that students who initially reported high levels of frustration but then reported lower levels later in their problem solving were more likely to have sought help. These findings provide insight into how frustration trajectory models can guide adaptivity during extended problem-solving episodes. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78270-2_63 VL - 12749 SP - 355-360 SN - 1611-3349 KW - Frustration trajectory KW - Adaptive learning environments KW - Problem-solving behavior KW - Computer science education KW - Block-based programming ER - TY - JOUR TI - Multidimensional Team Communication Modeling for Adaptive Team Training: A Hybrid Deep Learning and Graphical Modeling Framework AU - Min, Wookhee AU - Spain, Randall AU - Saville, Jason D. AU - Mott, Bradford AU - Brawner, Keith AU - Johnston, Joan AU - Lester, James T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I AB - Team communication modeling offers great potential for adaptive learning environments for team training. However, the complex dynamics of team communication pose significant challenges for team communication modeling. To address these challenges, we present a hybrid framework integrating deep learning and probabilistic graphical models that analyzes team communication utterances with respect to the intent of the utterance and the directional flow of communication within the team. The hybrid framework utilizes conditional random fields (CRFs) that use deep learning-based contextual, distributed language representations extracted from team members’ utterances. 
An evaluation with communication data collected from six teams during a live training exercise indicates that linear-chain CRFs utilizing ELMo utterance embeddings (1) outperform both multi-task and single-task variants of stacked bidirectional long short-term memory networks using the same distributed representations of the utterances, (2) outperform a hybrid approach that uses non-contextual utterance representations for the dialogue classification tasks, and (3) demonstrate promising domain-transfer capabilities. The findings suggest that the hybrid multidimensional team communication analysis framework can accurately recognize speaker intent and model the directional flow of team communication to guide adaptivity in team training environments. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78292-4_24 VL - 12748 SP - 293-305 SN - 1611-3349 KW - Team communication analytics KW - Probabilistic graphical models KW - Deep learning KW - Distributed language representations KW - Natural language processing ER - TY - JOUR TI - "Can You Clarify What You Said?": Studying the Impact of Tutee Agents' Follow-Up Questions on Tutors' Learning AU - Shahriar, Tasmia AU - Matsuda, Noboru T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I AB - Students learn by teaching others as tutors. Advancement in the theory of learning by teaching has given rise to many pedagogical agents. In this paper, we exploit a known cognitive theory that states if a tutee asks deep questions in a peer tutoring environment, a tutor benefits from it. Little is known about a computational model of such deep questions. This paper aims to formalize the deep tutee questions and proposes a generalized model of inquiry-based dialogue, called the constructive tutee inquiry, to ask follow-up questions to have tutors reflect their current knowledge (aka knowledge-building activity). We conducted a Wizard of Oz study to evaluate the proposed constructive tutee inquiry. 
The results showed that the constructive tutee inquiry was particularly effective for the low prior knowledge students to learn conceptual knowledge. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78292-4_32 VL - 12748 SP - 395-407 SN - 1611-3349 UR - https://doi.org/10.1007/978-3-030-78292-4_32 KW - Learning by teaching KW - Deep questions KW - Teachable agents KW - Tutor learning KW - Knowledge-building KW - Wizard of Oz ER - TY - JOUR TI - Multimodal Trajectory Analysis of Visitor Engagement with Interactive Science Museum Exhibits AU - Emerson, Andrew AU - Henderson, Nathan AU - Min, Wookhee AU - Rowe, Jonathan AU - Minogue, James AU - Lester, James T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II AB - Recent years have seen a growing interest in investigating visitor engagement in science museums with multimodal learning analytics. Visitor engagement is a multidimensional process that unfolds temporally over the course of a museum visit. In this paper, we introduce a multimodal trajectory analysis framework for modeling visitor engagement with an interactive science exhibit for environmental sustainability. We investigate trajectories of multimodal data captured during visitor interactions with the exhibit through slope-based time series analysis. Utilizing the slopes of the time series representations for each multimodal data channel, we conduct an ablation study to investigate how additional modalities lead to improved accuracy while modeling visitor engagement. We are able to enhance visitor engagement models by accounting for varying levels of visitors’ science fascination, a construct integrating science interest, curiosity, and mastery goals. The results suggest that trajectory-based representations of the multimodal visitor data can serve as the foundation for visitor engagement modeling to enhance museum learning experiences. 
DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78270-2_27 VL - 12749 SP - 151-155 SN - 1611-3349 KW - Museum learning KW - Visitor engagement KW - Multimodal trajectory analytics ER - TY - JOUR TI - Learning Association Between Learning Objectives and Key Concepts to Generate Pedagogically Valuable Questions AU - Shimmei, Machi AU - Matsuda, Noboru T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II AB - It has been shown that answering questions contributes to students learning effectively. However, generating questions is an expensive task and requires a lot of effort. Although there has been research reported on the automation of question generation in the literature of Natural Language Processing, these technologies do not necessarily generate questions that are useful for educational purposes. To fill this gap, we propose QUADL, a method for generating questions that are aligned with a given learning objective. The learning objective reflects the skill or concept that students need to learn. The QUADL method first identifies a key concept, if any, in a given sentence that has a strong connection with the given learning objective. It then converts the given sentence into a question for which the predicted key concept becomes the answer. The results from the survey using Amazon Mechanical Turk suggest that the QUADL method can be a step towards generating questions that effectively contribute to students’ learning. 
DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78270-2_57 VL - 12749 SP - 320-324 SN - 1611-3349 UR - https://doi.org/10.1007/978-3-030-78270-2_57 KW - Question generation KW - MOOCS KW - Learning engineering ER - TY - JOUR TI - Evaluating Critical Reinforcement Learning Framework in the Field AU - Ju, Song AU - Zhou, Guojing AU - Abdelshiheed, Mark AU - Barnes, Tiffany AU - Chi, Min T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I AB - Reinforcement Learning (RL) is learning what action to take next by mapping situations to actions so as to maximize cumulative rewards. In recent years RL has achieved great success in inducing effective pedagogical policies for various interactive e-learning environments. However, it is often prohibitive to identify the critical pedagogical decisions that actually contribute to desirable learning outcomes. In this work, by utilizing the RL framework we defined critical decisions to be those states in which the agent has to take the optimal actions, and subsequently, the Critical policy as carrying out optimal actions in the critical states while acting randomly in others. We proposed a general Critical-RL framework for identifying critical decisions and inducing a Critical policy. The effectiveness of our Critical-RL framework is empirically evaluated from two perspectives: whether optimal actions must be carried out in critical states (the necessary hypothesis) and whether only carrying out optimal actions in critical states is as effective as a fully-executed RL policy (the sufficient hypothesis). Our results confirmed both hypotheses. 
DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78292-4_18 VL - 12748 SP - 215-227 SN - 1611-3349 KW - Critical decisions KW - Reinforcement learning KW - ITS ER - TY - JOUR TI - Infusing Computing: Moving a Service Oriented Internship Program Online AU - Isvik, Amy AU - Catete, Veronica AU - Bell, Dave AU - Gransbury, Isabella AU - Barnes, Tiffany T2 - IEEE STCBP RESPECT CONFERENCE: 2021 RESEARCH ON EQUITY AND SUSTAINED PARTICIPATION IN ENGINEERING, COMPUTING, AND TECHNOLOGY (RESPECT) AB - As virtual conferencing technology becomes more common and situations make in-person experiences difficult or unsafe to host, the need for online internships to support sustained participation in computing increases. We investigate the problem of how to provide a meaningful experiential education program in a virtual environment and serve geographically dispersed participants through our experience with moving a service oriented internship program online. Our computer science internship program leverages high school interns' programming skills and classroom experience to assist teachers in developing computing-infused lessons for their classrooms. Using a combination of synchronous and asynchronous activities, we trained our interns in how to make these lessons and helped interns build community amongst themselves. Our interns created over 90 lessons during the summer and helped over 50 teachers create their own lessons at an infusing computing professional development. 
DA - 2021/// PY - 2021/// DO - 10.1109/RESPECT51740.2021.9620644 SP - 199-203 KW - computing education KW - virtual internship KW - service-learning ER - TY - JOUR TI - Examining Equity in Computing-Infused Lessons Made by Novices AU - Isvik, Amy AU - Catete, Veronica AU - Elmore, Erynn AU - Barnes, Tiffany T2 - IEEE STCBP RESPECT CONFERENCE: 2021 RESEARCH ON EQUITY AND SUSTAINED PARTICIPATION IN ENGINEERING, COMPUTING, AND TECHNOLOGY (RESPECT) AB - In this study, we examine 10 computing-infused lessons with high equity scores created by high school interns. These projects were part of a larger corpus of 90+ projects made in summer 2020 for middle school and high school classrooms and the projects were evaluated using the Teacher Accessibility, Equity, and Content (TEC) rubric. This article examines the observed extensive evidence for equity in these 10 projects to determine how meaningful these equity scores are, what themes are present across projects, and to provide curriculum developers with strategies for ensuring their activities utilize equitable practices to be intentionally inclusive of all students. DA - 2021/// PY - 2021/// DO - 10.1109/RESPECT51740.2021.9620700 SP - 157-161 KW - equity KW - curriculum design KW - equity analysis ER - TY - JOUR TI - Hybrid Blockchain Architecture for Cloud Manufacturing-as-a-service (CMaaS) Platforms with Improved Data Storage and Transaction Efficiency AU - Hasan, Mahmud AU - Ogan, Kemafor AU - Starly, Binil T2 - 49TH SME NORTH AMERICAN MANUFACTURING RESEARCH CONFERENCE (NAMRC 49, 2021) AB - Blockchain based decentralized Cloud Manufacturing-as-a-Service (CMaaS) platforms enable customers to gain access to a large capacity of manufacturing nodes over cryptographically secure networks. In recent times, the Ethereum network has emerged as a popular blockchain framework for providing provenance and traceability of proprietary manufacturing data in decentralized CMaaS. 
However, the Ethereum ecosystem was only designed to store and transmit low volume financial transaction data and little has been done to make it an efficient repository of large manufacturing data streams in CMaaS systems. In this paper, the authors build on their previous work and report the design, implementation, and validation of middleware software architectures that allow Ethereum based distributed CMaaS platforms to harness the benefits of the secure asset models of the Ethereum ecosystem and the immutable big data storage capabilities of the decentralized BigchainDB database platform. A novel hybrid blockchain architecture enabled by efficient communication protocols and blockchain oracles is proposed. This architecture allows the transfer and immutable storage of large manufacturing data streams onto global BigchainDB nodes allowing data rich manufacturing transactions to bypass the transaction fees of the Ethereum ecosystem. Additionally, a machine learning based time series inference model is proposed which enables the forecast of Ethereum gas price into the future. This allows the CMaaS platform smart contracts to judiciously assign gas price limits and hence save on transactions ensuing from transfer or creation of assets. The outcomes of this research show that the designed hybrid architecture can lead to the reduction of a significant number of computational steps and hence transaction fees on Ethereum by offloading large volume data onto BigchainDB nodes. A Random Forest regressor based time series inference model has been shown to exhibit superior performance in the prediction of Ethereum gas price, which allows the CMaaS to avoid executing transactions in periods of high gas prices within the Ethereum ecosystem. 
DA - 2021/// PY - 2021/// DO - 10.1016/j.promfg.2021.06.060 VL - 53 SP - 594-605 SN - 2351-9789 KW - blockchain KW - smart contracts KW - distributed database KW - time series KW - regression KW - lstm KW - gas price prediction ER - TY - JOUR TI - Developing Sustainable, Mutually Collaborative, Global Partnerships AU - Bottomley, Laura AU - Catete, Veronica AU - Mbaneme, Veronica AU - Daniel, Angelitha AU - Pender, Kimberly AU - Reynolds, Kanton AU - Marshall, Lisa T2 - 2021 WORLD ENGINEERING EDUCATION FORUM/GLOBAL ENGINEERING DEANS COUNCIL (WEEF/GEDC) AB - We examine partnerships between a United States university and K-12 schools in Rwanda. Our program uses an engineering-outreach model to qualitatively explore global student experiences and through collaborative efforts, how integration and dissemination of knowledge has occurred. The developed educational model emphasizes problem-solving and critical-thinking over sophisticated materials. The national curriculum aligned activities are designed to be accessible to classrooms with limited resources. Through this multi-year partnership, our team derived a series of lessons learned regarding contextualized diversity, culturally situated learning, and pathways for sustained mentorships. DA - 2021/// PY - 2021/// DO - 10.1109/WEEF/GEDC53299.2021.9657357 SP - 82-87 KW - Engineering KW - Global Partnerships KW - Service Learning ER - TY - JOUR TI - T-Pack: Timed Network Security for Real Time Systems AU - Mittal, Swastik AU - Mueller, Frank T2 - 2021 IEEE 24TH INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING (ISORC 2021) AB - Network communication between real-time control systems raises system vulnerability to malware attacks over the network. Such attacks not only result in alteration of system behavior but also incur timing dilation due to executing injected code or, in case of network attacks, to dropped, added, rerouted, or modified packets. 
This work proposes to detect intrusion based on time dilation induced by time delays within the network potentially resulting in system malfunctioning due to missed deadlines. A new method of timed packet protection, T-Pack, analyzes end-to-end transmission times of packets and detects a compromised system or network based on deviation of observed time from the expected time on end nodes, well in advance of a task's deadline. First, the Linux network stack is extended with timing information maintained within the kernel and further embedded within packets for TCP and UDP communication. Second, real-time application scenarios are analyzed in terms of their susceptibility to malware attacks. Results are evaluated on a distributed system of embedded platforms running a Preempt RT Linux kernel to demonstrate its real-time capabilities. DA - 2021/// PY - 2021/// DO - 10.1109/ISORC52013.2021.00014 SP - 20-28 SN - 2375-5261 ER - TY - JOUR TI - Dissecting Cloud Gaming Performance with DECAF AU - Iqbal, Hassan AU - Khalid, Ayesha AU - Shahzad, Muhammad T2 - PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS AB - Cloud gaming platforms have witnessed tremendous growth over the past two years with a number of large Internet companies including Amazon, Facebook, Google, Microsoft, and Nvidia publicly launching their own platforms. While cloud gaming platforms continue to grow, the visibility into their performance and relative comparison is lacking. This is largely due to the absence of systematic measurement methodologies which can generally be applied. As such, in this paper, we implement DECAF, a methodology to systematically analyze and dissect the performance of cloud gaming platforms across different game genres and game platforms. DECAF is highly automated and requires minimal manual intervention. 
By applying DECAF, we measure the performance of three commercial cloud gaming platforms including Google Stadia, Amazon Luna, and Nvidia GeForceNow, and uncover a number of important findings. First, we find that processing delays in the cloud comprise the majority of the total round trip delay experienced by users, accounting for as much as 73.54% of total user-perceived delay. Second, we find that video streams delivered by cloud gaming platforms are characterized by high variability of bitrate, frame rate, and resolution. Platforms struggle to consistently serve 1080p/60 frames per second streams across different game genres even when the available bandwidth is 8-20× that of the platform's recommended settings. Finally, we show that game platforms exhibit performance cliffs by reacting poorly to packet losses, in some cases dramatically reducing the delivered bitrate by up to 6.6× when loss rates increase from 0.1% to 1%. Our work has important implications for cloud gaming platforms and opens the door for further research on comprehensive measurement methodologies for cloud gaming. DA - 2021/12// PY - 2021/12// DO - 10.1145/3491043 VL - 5 IS - 3 SP - SN - 2476-1249 KW - Cloud gaming KW - measurement KW - performance evaluation KW - deep learning KW - game bot KW - latency KW - streaming bitrate KW - network utilization ER - TY - CONF TI - Parameterized algorithms for identifying gene co-expression modules via weighted clique decomposition AU - Cooley, Madison AU - Greene, Casey S. AU - Issac, Davis AU - Pividori, Milton AU - Sullivan, Blair D. AB - We present a new combinatorial model for identifying regulatory modules in gene co-expression data using a decomposition into weighted cliques. To capture complex interaction effects, we generalize the previously-studied weighted edge clique partition problem. 
As a first step, we restrict ourselves to the noise-free setting, and show that the problem is fixed parameter tractable when parameterized by the number of modules (cliques). We present two new algorithms for finding these decompositions, using linear programming and integer partitioning to determine the clique weights. Further, we implement these algorithms in Python and test them on a biologically-inspired synthetic corpus generated using real-world data from transcription factors and a latent variable analysis of co-expression in varying cell types. C2 - 2021/1// C3 - SIAM Conference on Applied and Computational Discrete Algorithms (ACDA21) DA - 2021/1// DO - 10.1137/1.9781611976830.11 SP - 111-122 PB - Society for Industrial and Applied Mathematics UR - http://dx.doi.org/10.1137/1.9781611976830.11 ER - TY - JOUR TI - Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms AU - Pividori, M. AU - Lu, S. AU - Li, B. AU - Su, C. AU - Johnson, M.E. AU - Wei, W.-Q. AU - Feng, Q. AU - Namjou, B. AU - Kiryluk, K. AU - Kullo, I. AU - Luo, Y. AU - Sullivan, B.D. AU - Voight, B.F. AU - Skarke, C. AU - Ritchie, M.D. AU - Grant, S.F.A. AU - Greene, C.S. T2 - bioRxiv AB - Genes act in concert with each other in specific contexts to perform their functions. Determining how these genes influence complex traits requires a mechanistic understanding of expression regulation across different conditions. It has been shown that this insight is critical for developing new therapies. In this regard, the role of individual genes in disease-relevant mechanisms can be hypothesized with transcriptome-wide association studies (TWAS), which have represented a significant step forward in testing the mediating role of gene expression in GWAS associations. However, modern models of the architecture of complex traits predict that gene-gene interactions play a crucial role in disease origin and progression. 
Here we introduce PhenoPLIER, a computational approach that maps gene-trait associations and pharmacological perturbation data into a common latent representation for a joint analysis. This representation is based on modules of genes with similar expression patterns across the same conditions. We observed that diseases were significantly associated with gene modules expressed in relevant cell types, and our approach was accurate in predicting known drug-disease pairs and inferring mechanisms of action. Furthermore, using a CRISPR screen to analyze lipid regulation, we found that functionally important players lacked TWAS associations but were prioritized in trait-associated modules by PhenoPLIER. By incorporating groups of co-expressed genes, PhenoPLIER can contextualize genetic associations and reveal potential targets missed by single-gene strategies. DA - 2021/// PY - 2021/// DO - 10.1101/2021.07.05.450786 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85115761345&partnerID=MN8TOARS ER - TY - JOUR TI - An Updated Experimental Evaluation of Graph Bipartization Methods AU - Goodrich, T.D. AU - Horton, E. AU - Sullivan, B.D. T2 - ACM Journal of Experimental Algorithmics AB - We experimentally evaluate the practical state-of-the-art in graph bipartization (Odd Cycle Transversal (OCT)), motivated by the need for good algorithms for embedding problems into near-term quantum computing hardware. We assemble a preprocessing suite of fast input reduction routines from the OCT and Vertex Cover (VC) literature and compare algorithm implementations using Quadratic Unconstrained Binary Optimization problems from the quantum literature. We also generate a corpus of frustrated cluster loop graphs, which have previously been used to benchmark quantum annealing hardware. The diversity of these graphs leads to harder OCT instances than in existing benchmarks. 
In addition to combinatorial branching algorithms for solving OCT directly, we study various reformulations into other NP-hard problems such as VC and Integer Linear Programming (ILP), enabling the use of solvers such as CPLEX. We find that for heuristic solutions with time constraints under a second, iterative compression routines jump-started with a heuristic solution perform best, after which point using a highly tuned solver like CPLEX is worthwhile. Results on exact solvers are split between using ILP formulations on CPLEX and solving VC formulations with a branch-and-reduce solver. We extend our results with a large corpus of synthetic graphs, establishing robustness and potential to generalize to other domain data. In total, over 8,000 graph instances are evaluated, compared to the previous canonical corpus of 100 graphs. Finally, we provide all code and data in an open source suite, including a Python API for accessing reduction routines and branching algorithms, along with scripts for fully replicating our results. DA - 2021/// PY - 2021/// DO - 10.1145/3467968 VL - 26 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85119170953&partnerID=MN8TOARS ER - TY - JOUR TI - Sparse dominating sets and balanced neighborhood partitioning AU - Mizutani, Y. AU - Staker, A. AU - Sullivan, B.D. T2 - arXiv DA - 2021/// PY - 2021/// UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85121833484&partnerID=MN8TOARS ER - TY - CHAP TI - Secondary Structure Ensemble Analysis via Community Detection AU - Du, H. AU - Ferrari, M.M. AU - Heitsch, C. AU - Hurley, F. AU - Mennicke, C.V. AU - Sullivan, B.D. AU - Xu, B. T2 - Association for Women in Mathematics Series AB - We explored the extent to which graph algorithms for community detection can improve the mining of structural information from the predicted Boltzmann/Gibbs ensemble for the biological objects known as RNA secondary structures. 
As described, a new computational pipeline was developed, implemented, and tested against the prior method RNAStructProfiling. Since the new approach was judged to provide more structural information in 75% of the test cases, this proof-of-principle analysis supports efforts to improve the data mining of RNA secondary structure ensembles. PY - 2021/// DO - 10.1007/978-3-030-57129-0_4 VL - 22 SP - 55-81 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85101125344&partnerID=MN8TOARS ER - TY - JOUR TI - Parameterized algorithms for identifying gene co-expression modules via weighted clique decomposition AU - Cooley, M. AU - Greene, C.S. AU - Issac, D. AU - Pividori, M. AU - Sullivan, B.D. T2 - arXiv DA - 2021/// PY - 2021/// UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85108789587&partnerID=MN8TOARS ER - TY - JOUR TI - Hardness of the Generalized Coloring Numbers AU - Breen-McKay, M. AU - Lavallee, B. AU - Sullivan, B.D. T2 - arXiv DA - 2021/// PY - 2021/// UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85121786987&partnerID=MN8TOARS ER - TY - CONF TI - Peer Assessment Rubric Analyzer: An NLP approach to analyzing rubric items for better peer review AU - Rashid, M.Parvez AU - Gehringer, Edward F. AU - Young, Mitchell G. AU - Doshi, Divyang AU - Jia, Qinjin AU - Xiao, Yunkai T2 - 2021 19th International Conference on Information Technology Based Higher Education and Training (ITHET) AB - Rubrics have long been used to provide a grading process that is fair and adherent to standards. Just as rubrics can help instructors assess a piece of work, they can also help students to do a more effective job of peer assessment. In a peer-review environment, reviewers provide formative feedback following the rubric criteria. High-quality feedback can greatly enhance the learning process. Rubric criteria need to be worded carefully to provide clear instruction and effective guidance. 
Heretofore, little research has been performed on how rubric text affects rubric feedback. This study focuses on analyzing rubric text to identify whether rubric criteria will induce peer-reviewers to write quality reviews. We have analyzed 408,104 formative feedback comments based on 3,164 rubric criteria using natural language processing techniques with advanced neural network methods. To our knowledge, this is the first attempt to analyze rubric text to improve review comments for the peer-review environment. C2 - 2021/// C3 - 12th International Workshop on Interactive Environments and Emerging Technologies for eLearning, 2021 19th International Conference on Information Technology Based Higher Education and Training (ITHET) CY - Sydney, Australia DA - 2021/// PY - 2021/11/4/ DO - 10.1109/ITHET50392.2021.9759679 PB - IEEE UR - https://doi.org/10.1109/ITHET50392.2021.9759679 ER - TY - CONF TI - Tools for Detecting Plagiarism in Online Exams AU - Gehringer, Edward F AU - Menon, Ashwini AU - Wang, Guoyi C2 - 2021/7// C3 - American Society for Engineering Virtual Annual Conference DA - 2021/7// UR - https://peer.asee.org/37915 ER - TY - CHAP TI - Narraport: Narrative-Based Interactions and Report Generation with Large Datasets AU - Potts, Colin M. AU - Jhala, Arnav T2 - Interactive Storytelling AB - There is an increasing demand for rapid content filtering in relation to topics like digital forensics for legal cases, cybersecurity, and social media conduct monitoring. While there have been significant advances in algorithms and frameworks for media processing, this task requires an ensemble of tools and algorithms that are not well-understood by human analysts, thereby reducing their trustworthiness. In this paper, we present a novel perspective on this problem through the development of an intelligent system that generates reports from large email datasets in the form of short stories. 
The stories generated by the system are based on identifiable plot structures in popular media. These structures are used as semantic sensemaking templates to organize data for further filtering and triage. The end-to-end system, accessible through an interactive dashboard, incorporates unsupervised annotation modules (such as speech acts and sentiment), topic discovery, communication network analysis, character personality profiles, and automated text and visualization generators. This emerging application prototype is developed and internally deployed in collaboration with analysts and researchers actively working in this area. PY - 2021/// DO - 10.1007/978-3-030-92300-6_11 SP - 118-127 PB - Springer International Publishing UR - https://doi.org/10.1007/978-3-030-92300-6_11 ER - TY - CONF TI - ALL-IN-ONE: Multi-Task Learning BERT models for Evaluating Peer Assessments AU - Jia, Qinjin AU - Cui, Jialin AU - Xiao, Yunkai AU - Liu, Chengyuan AU - Rashid, Parvez AU - Gehringer, Edward T2 - EDM 2021: 14th International Conference on Educational Data Mining C2 - 2021/7// C3 - EDM 2021: 14th International Conference on Educational Data Mining DA - 2021/7// PY - 2021/6/29/ ER - TY - CONF TI - Can Students Produce Effective Training Data to Improve Formative Feedback? AU - Zhang, Yulin AU - Gehringer, Edward F. T2 - 2021 IEEE Frontiers in Education Conference (FIE) AB - This full research paper shows how machine learning can improve peer assessment by giving students advice on how to write better quality reviews. We trained a model that gives automated feedback by using labeled data produced by students over a period of several semesters. To improve the accuracy of the model, we are working to incorporate active learning (in the machine-learning sense) to direct students to produce training data for situations where the model has the most difficulty making predictions. 
With the active-learning approach, we expect students to have to do less labeling, so that they can be more attentive and produce more accurate labels. Our results revealed that we are able to cut the amount of labeling effort by half, without loss of reliable training data. C2 - 2021/// C3 - 2021 IEEE Frontiers in Education Conference (FIE) CY - Lincoln, NE, USA DA - 2021/// PY - 2021/10/13/ DO - 10.1109/FIE49875.2021.9637414 SP - 1-7 PB - IEEE SN - 9781665438513 9781665438520 KW - Machine learning KW - active learning KW - peer assessment KW - peer review KW - automated feedback ER - TY - JOUR TI - Nudging Students Toward Better Software Engineering Behaviors AU - Brown, Chris AU - Parnin, Chris T2 - 2021 IEEE/ACM THIRD INTERNATIONAL WORKSHOP ON BOTS IN SOFTWARE ENGINEERING (BOTSE 2021) AB - Student experiences in large undergraduate Computer Science courses are increasingly impacted by automated systems. Bots, or agents of software automation, are useful for efficiently grading and generating feedback. Current efforts at automation in CS education focus on supporting instructional tasks, but do not address student struggles due to poor behaviors, such as procrastination. In this paper, we explore using bots to improve the software engineering behaviors of students using developer recommendation choice architectures, a framework incorporating behavioral science concepts in recommendations to improve the actions of programmers. We implemented this framework in class-bot, a novel system designed to nudge students to make better choices while working on programming assignments. This work presents a preliminary evaluation integrating this tool in an introductory programming course. Our results show that class-bot is beneficial for improving student development behaviors, increasing code quality and productivity. 
DA - 2021/// PY - 2021/// DO - 10.1109/BotSE52550.2021.00010 SP - 11-15 ER - TY - JOUR TI - Explaining Drug-Discovery Hypotheses Using Knowledge-Graph Patterns AU - Schatz, Kara AU - Melo-Filho, Cleber AU - Tropsha, Alexander AU - Chirkova, Rada T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) AB - Drug discovery is an important process used by biomedical experts to identify potential treatments for diseases. In its traditional form, the process requires significant expert time and manual effort. By encoding a wealth of information about relationships between drugs and diseases, modern large-scale biomedical knowledge graphs provide excellent opportunities to accelerate drug discovery, by automating aspects of the process. One opportunity is to use explainable fact-checking tools to generate explanations for hypothesized drug-disease treatment relationships in a given knowledge graph, with a reliability score assigned to each explanation. The explanations and their scores can then be used by experts to determine which drug-disease pairs to consider for clinical trials. In our collaboration with a biomedical team, we have found that existing explainable fact-checking tools are not necessarily helpful in drug discovery, as their explanation formats and evaluation metrics do not match well the requirements of scientific discovery in the biomedical domain. To address these challenges in using fact-checking tools in drug discovery, we introduce a scalable automated approach for generating explanations that are modeled after existing biomedical concepts and supplemented with data-supported evaluation metrics. Our explanations are based on knowledge-graph patterns, which are readily understood by biomedical experts. Our experimental results suggest that our proposed metrics are accurate and useful on large-scale biomedical knowledge graphs, and our explanations are understandable and reasonable to experts doing drug discovery. 
DA - 2021/// PY - 2021/// DO - 10.1109/BigData52589.2021.9672006 SP - 3709-3716 SN - 2639-1589 KW - Drug discovery KW - knowledge discovery KW - explainable fact checking KW - link prediction KW - knowledge graph mining ER - TY - JOUR TI - Trustworthy Knowledge Graph Population From Texts for Domain Query Answering AU - Ao, Jing AU - Dinakaran, Swathi AU - Yang, Hungjian AU - Wright, David AU - Chirkova, Rada T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) AB - Obtaining answers to domain-specific questions over large-scale unstructured (text) data is an important component of data analytics in many application domains. As manual question answering does not scale to large text corpora, it is common to use information extraction (IE) to preprocess the texts of interest prior to posing the questions. This is often done by transforming text corpora into the knowledge-graph (KG) triple format that is suitable for efficient processing of the user questions in graph-oriented data-intensive systems. In a number of real-life scenarios, trustworthiness of the answers obtained from domain-specific texts is vital for downstream decision making. In this paper we focus on one critical aspect of trustworthiness, which concerns aligning with the given domain vocabularies (ontologies) those KG triples that are obtained from the source texts via IE solutions. To address this problem, we introduce a scalable domain-independent text-to-KG approach that adapts to specific domains by using domain ontologies, without having to consult external triple repositories. Our IE solution builds on the power of neural-based learning models and leverages feature engineering to distinguish ontology-aligned data from generic data in the source texts. Our experimental results indicate that the proposed approach could be more dependable than a state-of-the-art IE baseline in constructing KGs that are suitable for trustworthy domain question answering on text data. 
DA - 2021/// PY - 2021/// DO - 10.1109/BigData52589.2021.9671514 SP - 4590-4599 SN - 2639-1589 KW - Text data KW - populating knowledge graphs KW - ontology-based information extraction KW - feature engineering ER - TY - JOUR TI - To Reduce Healthcare Workload: Identify Critical Sepsis Progression Moments through Deep Reinforcement Learning AU - Ju, Song AU - Kim, Yeo Jin AU - Ausin, Markel Sanz AU - Mayorga, Maria E. AU - Chi, Min T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) AB - Healthcare systems are struggling with increasing workloads that adversely affect quality of care and patient outcomes. When clinical practitioners have to make countless medical decisions, they may not always be able to make them consistently or spend time on them. In this work, we formulate clinical decision making as a reinforcement learning (RL) problem and propose a human-controlled machine-assisted (HC-MA) decision making framework whereby we can simultaneously give clinical practitioners (the humans) control over the decision-making process while supporting effective decision-making. In our HC-MA framework, the role of the RL agent is to nudge clinicians only if they make suboptimal decisions at critical moments. This framework is supported by a general Critical Deep RL (Critical-DRL) approach, which uses Long-Short Term Rewards (LSTRs) and Critical Deep Q-learning Networks (CriQNs). Critical-DRL’s effectiveness has been evaluated in both a GridWorld game and real-world datasets from two medical systems for septic patient treatment: a large health system in the northeast of the USA, referred to as NEMed, and Mayo Clinic in Rochester, Minnesota, USA. 
We found that our Critical-DRL approach, by which decisions are made at critical junctures, is as effective as a fully executed DRL policy and moreover, it enables us to identify the critical moments in the septic treatment process, thus greatly reducing burden on medical decision-makers by allowing them to make critical clinical decisions without negatively impacting outcomes. DA - 2021/// PY - 2021/// DO - 10.1109/BigData52589.2021.9671407 SP - 1640-1646 SN - 2639-1589 KW - Reinforcement Learning KW - Sepsis KW - Critical Decision ER - TY - JOUR TI - InferNet for Delayed Reinforcement Tasks: Addressing the Temporal Credit Assignment Problem AU - Ausin, Markel Sanz AU - Azizsoltani, Hamoon AU - Ju, Song AU - Kim, Yeo Jin AU - Chi, Min T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) AB - Rewards are the critical signals for Reinforcement Learning (RL) algorithms to learn the desired behavior in a sequential multi-step learning task. However, when these rewards are delayed and noisy in nature, the learning process becomes more challenging. The temporal Credit Assignment Problem (CAP) is a well-known and challenging task in AI. RL, especially Deep RL, often works well with immediate rewards but may fail when rewards are delayed or noisy, or both. In this work, we propose delegating the CAP to a Neural Network-based algorithm named InferNet that explicitly learns to infer the immediate rewards from the delayed and noisy rewards. The effectiveness of InferNet was evaluated on three online RL tasks: a GridWorld, a CartPole, and 40 Atari games; and two offline RL tasks: GridWorld and a real-life Sepsis treatment task. The effectiveness of InferNet rewards is compared to that of immediate and delayed rewards in two settings: with and without noise. For the offline RL tasks, it is also compared to a strong baseline, InferGP [7]. 
Overall, our results show that InferNet is robust to delayed or noisy reward functions, and it could be used effectively for solving the temporal CAP in a wide range of RL tasks, when immediate rewards are not available or they are noisy. DA - 2021/// PY - 2021/// DO - 10.1109/BigData52589.2021.9671827 SP - 1337-1348 SN - 2639-1589 KW - Credit Assignment Problem KW - Deep Reinforcement Learning ER - TY - JOUR TI - Multi-Temporal Abstraction with Time-Aware Deep Q-Learning for Septic Shock Prevention AU - Kim, Yeo Jin AU - Ausin, Markel Sanz AU - Chi, Min T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) AB - Sepsis is a life-threatening organ dysfunction and a disease of astronomical burden. Septic shock, the most severe complication of sepsis, leads to a mortality rate as high as 50%. However, septic shock prevention is extremely challenging because individual patients often have very different disease progression, and thus the timings of medical interventions can play a key role in their effectiveness. Recently, reinforcement learning (RL) methods like deep Q-learning networks (DQN) have shown great promise in developing effective treatments for preventing septic shock. In this work, we propose MTA-TQN, a Multi-view Temporal Abstraction mechanism within a Time-aware deep Q-learning Network framework for this task. More specifically, 1) MTA-TQN leverages irregular time intervals to discount expected return, which would prevent systemic overestimations caused by temporal discount errors; 2) it learns both short and long-range dependencies with multi-view temporal abstractions, which would reduce bias to a specific series of observations for a single state. The effectiveness of MTA-TQN is validated on two hard exploration Atari games and the septic shock prevention task using real-world EHRs. 
Our results demonstrate that both time-awareness and multi-view temporal abstraction are essential to induce effective policies, particularly with irregular time-series data. In the septic shock prevention task, while the top 10% of patients whose treatments agreed with the DQN-induced policy experienced a 17% septic shock rate, our MTA-TQN policies achieved a 5.7% septic shock rate. DA - 2021/// PY - 2021/// DO - 10.1109/BigData52589.2021.9671662 SP - 1657-1663 SN - 2639-1589 KW - deep reinforcement learning KW - time-aware KW - temporal abstraction KW - sepsis ER - TY - JOUR TI - A Scalable System for Searching Large-scale Multi-sensor Remote Sensing Image Collections AU - Zhao, Yifan AU - Yang, Xian AU - Vatsavai, Ranga Raju T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) AB - Huge amounts of remote sensing data collected from hundreds of operational satellites in conjunction with on-demand UAV-based imaging products are offering unprecedented capabilities towards monitoring dynamic earth resources. However, searching for the right combination of imagery products that satisfy an application requirement is a daunting task. Earlier efforts at streamlining remote sensing data discovery include NASA’s Earth Observing System (EOS) Data and Information System (EOSDIS), USGS Global Visualization Viewer (GloVis), and several other research systems like Minnesota MapServer. These systems were built on top of metadata harvesting, indexing, and keyword searching modules, which were not scalable and interoperable. To address these challenges, recently the SpatioTemporal Asset Catalog (STAC) specification was developed to provide a common language to describe a range of geospatial information, so that data products can be more easily indexed and discovered. In this paper we present a highly scalable STAC API-based system with spatiotemporal indexing support. 
Experimental evaluation shows that our spatiotemporal indexing based queries are 1000x faster than a standard STAC API server. DA - 2021/// PY - 2021/// DO - 10.1109/BigData52589.2021.9671679 SP - 3780-3783 SN - 2639-1589 KW - SpatioTemporal Asset Catalog (STAC) KW - Spatiotemporal Indexing KW - Remote Sensing Data Discovery ER - TY - JOUR TI - Edge-Assisted Collaborative Perception in Autonomous Driving: A Reflection on Communication Design AU - Yu, Ruozhou AU - Yang, Dejun AU - Zhang, Hao T2 - 2021 ACM/IEEE 6TH SYMPOSIUM ON EDGE COMPUTING (SEC 2021) DA - 2021/// PY - 2021/// DO - 10.1145/3453142.3491413 SP - 371-375 KW - Autonomous Driving KW - Collaborative Perception KW - Edge Computing KW - Cellular-V2X KW - Sensing-based SPS KW - NS-3 ER - TY - JOUR TI - Seeds of SEED: New Security Challenges for Persistent Memory AU - Ul Mustafa, Naveed AU - Xu, Yuanchao AU - Shen, Xipeng AU - Solihin, Yan T2 - 2021 INTERNATIONAL SYMPOSIUM ON SECURE AND PRIVATE EXECUTION ENVIRONMENT DESIGN (SEED 2021) AB - Persistent Memory Object (PMO) is a general system abstraction for holding persistent data in persistent main memory, managed by an operating system. The PMO programming model breaks inter-process isolation as it results in sharing of persistent data between two processes as they alternately access the same PMO. The uncoordinated data access opens a new avenue for cross-run and cross-process security attacks. In this paper, we discuss threat vulnerabilities that are either new or increased in intensity under the PMO programming model. We also discuss security implications of using the PMO, highlighting sample PMO-based attacks and potential strategies to defend against them. 
DA - 2021/// PY - 2021/// DO - 10.1109/SEED51797.2021.00020 SP - 83-88 KW - Persistent memory objects KW - Security attacks KW - PMO vulnerability ER - TY - JOUR TI - Enhancing Multimodal Affect Recognition with Multi-Task Affective Dynamics Modeling AU - Henderson, Nathan AU - Min, Wookhee AU - Rowe, Jonathan AU - Lester, James T2 - 2021 9TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII) AB - Accurately recognizing students’ affective states is critical for enabling adaptive learning environments to promote engagement and enhance learning outcomes. Multimodal approaches to student affect recognition capture multi-dimensional patterns of student behavior through the use of multiple data channels. An important factor in multimodal affect recognition is the context in which affect is experienced and exhibited. In this paper, we present a multimodal, multitask affect recognition framework that predicts students’ future affective states as auxiliary training tasks and uses prior affective states as input features to capture bi-directional affective dynamics and enhance the training of affect recognition models. Additionally, we investigate cross-stitch networks to maintain parameterized separation between shared and task-specific representations and task-specific uncertainty-weighted loss functions for contextual modeling of student affective states. We evaluate our approach using interaction and posture data captured from students engaged with a game-based learning environment for emergency medical training. Results indicate that the affective dynamics-based approach yields significant improvements in multimodal affect recognition across four different affective states. 
DA - 2021/// PY - 2021/// DO - 10.1109/ACII52823.2021.9597432 SP - SN - 2156-8103 KW - multitask learning KW - affect recognition KW - multimodal interaction KW - game-based learning environments ER - TY - JOUR TI - Removing the Walls Around Visual Educational Programming Environments AU - Broll, Brian AU - Ledeczi, Akos AU - Stein, Gordon AU - Jean, Devin AU - Brady, Corey AU - Grover, Shuchi AU - Catete, Veronica AU - Barnes, Tiffany T2 - 2021 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC 2021) AB - Many block-based programming environments have proven to be effective at engaging novices in learning programming. However, most restrict access to the outside world, limiting learners to commands and computing resources built in to the environment. Some allow learners to drag and drop files, connect to sensors and robots locally or issue HTTP requests. But in a world where most of the applications in our daily lives are distributed (i.e., their functionality depends on communicating with other programs or accessing resources and data on the internet), the lack of support for beginners to envision and create such distributed programs is a lost opportunity. This paper argues that it is not only feasible, but crucial, to create environments with simple yet powerful abstractions that open up distributed computing and other widely used but advanced computing concepts including networking, the Internet of Things, and cybersecurity to novices. By thus removing the walls around our environments, we can expand opportunities for learning considerably: programs can access a wealth of online data and web services, and communicate with other projects. Moreover, these changes can enable young learners to collaborate with each other during program construction whether they share their physical location or study remotely. 
Importantly, providing access to the wider world will also help counter widespread student perceptions that block-based environments are mere toys, and show that they are capable of creating compelling applications. The paper presents NetsBlox, a programming environment that supports these ideas and shows that tools can be designed to democratize access to powerful ideas in computing. DA - 2021/// PY - 2021/// DO - 10.1109/VL/HCC51201.2021.9576399 SP - SN - 1943-6092 ER - TY - JOUR TI - Designing a Visual Interface for Elementary Students to Formulate AI Planning Tasks AU - Park, Kyungjin AU - Mott, Bradford AU - Lee, Seung AU - Glazewski, Krista AU - Scribner, J. Adam AU - Ottenbreit-Leftwich, Anne AU - Hmelo-Silver, Cindy E. AU - Lester, James T2 - 2021 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC 2021) AB - Recent years have seen the rapid adoption of artificial intelligence (AI) in every facet of society. The ubiquity of AI has led to an increasing demand to integrate AI learning experiences into K-12 education. Early learning experiences incorporating AI concepts and practices are critical for students to better understand, evaluate, and utilize AI technologies. AI planning is an important class of AI technologies in which an AI-driven agent utilizes the structure of a problem to construct plans of actions to perform a task. Although a growing number of efforts have explored promoting AI education for K-12 learners, limited work has investigated effective and engaging approaches for delivering AI learning experiences to elementary students. In this paper, we propose a visual interface to enable upper elementary students (grades 3–5, ages 8–11) to formulate AI planning tasks within a game-based learning environment. We present our approach to designing the visual interface as well as how the AI planning tasks are embedded within narrative-centered gameplay structured around a Use-Modify-Create scaffolding progression. 
Further, we present results from a qualitative study of upper elementary students using the visual interface. We discuss how the Use-Modify-Create approach supported student learning as well as discuss the misconceptions and usability issues students encountered while using the visual interface to formulate AI planning tasks. DA - 2021/// PY - 2021/// DO - 10.1109/VL/HCC51201.2021.9576163 SP - SN - 1943-6092 KW - Artificial intelligence education for K-12 KW - Visual interface KW - Game-based learning ER - TY - JOUR TI - PEDI - Piazza Explorer Dashboard for Intervention AU - Akintunde, Ruth Okoilu AU - Limke, Ally AU - Barnes, Tiffany AU - Heckman, Sarah AU - Lynch, Collin T2 - 2021 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC 2021) AB - Analytics about how students navigate online learning tools throughout the duration of an assignment are scarce. Knowledge about how students use online tools before a course's end could positively impact students' learning outcomes. We introduce PEDI (Piazza Explorer Dashboard for Intervention), a tool which analyzes and presents visualizations of forum activity on Piazza, a question and answer forum, to instructors. We outline the design principles and data-informed recommendations used to design PEDI. Our prior research revealed two critical periods in students' forum engagement over the duration of an assignment. Early engagement in the first half of an assignment's duration positively correlates with class average performance, whereas extremely high engagement toward the deadline predicted lower class average performance. PEDI uses these findings to detect and flag troubling engagement levels and informs instructors through clear visualizations to promote data-informed interventions. By providing insights to instructors, PEDI may improve class performance and pave the way for a new generation of online tools. 
DA - 2021/// PY - 2021/// DO - 10.1109/VL/HCC51201.2021.9576443 SP - SN - 1943-6092 KW - learning analytics dashboards KW - forum activity KW - real time visualizations ER - TY - JOUR TI - Scaffolding Game Design: Towards Tool Support for Planning Open-Ended Projects in an Introductory Game Design Class AU - Card, Alexander AU - Wang, Wengran AU - Martens, Chris AU - Price, Thomas T2 - 2021 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC 2021) AB - One approach to teaching game design to students with a wide variety of disciplinary backgrounds is through team game projects that span multiple weeks, up to an entire term. However, open-ended, creative projects introduce a gamut of challenges to novice programmers. Our goal is to assist game design students with the planning stage of their projects. This paper describes our data collection process through three course interventions and student interviews, and subsequent analysis in which we learned students had difficulty expressing their creative vision and connecting the game mechanics to the intended player experience. We present these results as a step towards the goal of scaffolding the planning process for student game projects, supporting more creative ideas, clearer communication among team members, and a stronger understanding of human-centered design in software development. 
DA - 2021/// PY - 2021/// DO - 10.1109/VL/HCC51201.2021.9576209 SP - SN - 1943-6092 KW - game design KW - game development KW - design documents KW - planning support tools KW - education KW - open-ended programming projects ER - TY - JOUR TI - Interactive Fiction Creation in Villanelle: Understanding and Supporting the Author Experience AU - Bacher, John Thomas AU - Martens, Chris T2 - 2021 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING (VL/HCC 2021) AB - Villanelle is an interactive fiction authoring tool designed to support autonomous non-player characters, or “character AI.” Character AI is notoriously challenging for interactive fiction authors to develop, especially for authors approaching interactive fiction from a writing rather than programming background. This paper describes a participatory design process in which we assess the author experience with Villanelle and build a new tool iteration to support their needs. The results of our first user study demonstrate the strong potential of Villanelle's incorporation of behavior trees as an easy-to-learn computational model for character AI, but they also indicate syntax challenges for inexperienced programmers. Consequently, we developed a block-based programming interface for Villanelle and recruited a new set of study participants to evaluate this iteration using the same study instruments. The results indicate improvements in Villanelle's usability and creativity support for inexperienced programmers.
DA - 2021/// PY - 2021/// DO - 10.1109/VL/HCC51201.2021.9576417 SP - SN - 1943-6092 KW - interactive fiction KW - block-based programming KW - behavior trees KW - game development KW - developer experience ER - TY - JOUR TI - A Characteristic Study of Deadlocks in Database-Backed Web Applications AU - Qiu, Zhengyi AU - Shao, Shudi AU - Zhao, Qi AU - Jin, Guoliang T2 - 2021 IEEE 32ND INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2021) AB - Deadlocks in database-backed web applications could involve different numbers of HTTP requests, and they could be caused by locks explicitly requested in application code or implicitly requested by databases during query execution. To help developers understand these deadlocks and guide the design of tools for combating these deadlocks, we conduct a characteristic study with 49 deadlocks collected from real-world web applications developed following different programming paradigms. We provide categorization results based on HTTP request numbers and resource types, with a special focus on categorizing deadlocks on database locks. We expect our results to be useful for application developers to understand web-application deadlocks and for tool researchers to design comprehensive support for combating web-application deadlocks. DA - 2021/// PY - 2021/// DO - 10.1109/ISSRE52982.2021.00059 SP - 510-521 SN - 1071-9458 ER - TY - JOUR TI - Data-Driven Edge Resource Provisioning for Inter-Dependent Microservices with Dynamic Load AU - Yu, Ruozhou AU - Lo, Szu-Yu AU - Zhou, Fangtong AU - Xue, Guoliang T2 - 2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM) AB - This paper studies how to provision edge computing and network resources for complex microservice-based applications (MSAs) in face of uncertain and dynamic geo-distributed demands.
The complex inter-dependencies between distributed microservice components make load balancing for MSAs extremely challenging, and the dynamic geo-distributed demands exacerbate load imbalance and consequently congestion and performance loss. In this paper, we develop an edge resource provisioning model that accurately captures the inter-dependencies between microservices and their impact on load balancing across both computation and communication resources. We also propose a robust formulation that employs explicit risk estimation and optimization to hedge against potential worst-case load fluctuations, with controlled robustness-resource trade-off. Utilizing a data-driven approach, we provide a solution that provides risk estimation with measurement data of past load geo-distributions. Simulations with real-world datasets have validated that our solution provides the important robustness crucially needed in MSAs, and performs superiorly compared to baselines that neglect either network or inter-dependency constraints. DA - 2021/// PY - 2021/// DO - 10.1109/GLOBECOM46510.2021.9685155 SP - SN - 2576-6813 KW - Edge computing KW - microservice KW - load balancing KW - resource provisioning KW - robustness KW - data-driven ER - TY - JOUR TI - Parameterized Exhaustive Routing with First Fit for RSA Problem Variants AU - Rouskas, George N. AU - Bandikatla, Chaitanya T2 - 2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM) AB - We present a new single-step solution approach for the routing and spectrum allocation (RSA) problem that integrates the first-fit (FF) heuristic with a new routing strategy that we refer to as “parameterized exhaustive routing.” Our approach is to explore the whole routing space for a subset of the traffic requests, e.g., those with the largest demands or those of higher priority or importance. For each of the remaining requests we employ a greedy heuristic to select one of the candidate paths jointly with spectrum allocation. 
Our solution represents a two-parameter family of algorithms that bridges the gap between an exhaustive search of the routing space and current two-step methodologies for the RSA problem that select paths for each traffic request in isolation. The parameter values may be used to trade off the quality of the final solution and the computational requirements. Our results indicate that exploring the joint routing space of even a few large requests leads to better solutions than purely greedy approaches. DA - 2021/// PY - 2021/// DO - 10.1109/GLOBECOM46510.2021.9685126 SP - SN - 2576-6813 ER - TY - JOUR TI - Advanced Secure DNS Name Autoconfiguration with Authentication for Enterprise IoT Network AU - Kim, Tae Hyun AU - Reeves, Douglas AU - Dutta, Rudra T2 - 2021 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM) AB - Internet of Things (IoT) is an intelligent infrastructure and service technology that connects objects to people for monitoring and control. The number of IoT devices is rapidly increasing in various environments. Although the DNS protocol is being applied to IoT networks to create unique identifiers, it is burdensome for users to manually create and configure a globally unique name for each device. DNS Name Autoconfiguration (DNSNA) was proposed to register the DNS name of IoT devices automatically and utilize IoT devices globally. However, DNSNA without secure authentication and authorization leads to potential threats, such as the registration of malicious IoT devices, and other IoT security attacks. In this paper, we propose an Advanced Secure DNS name autoconfiguration with Authentication and Authorization for enterprise IoT network (ASDAI). Especially, we provide the first model using the convergence of extended OAuth 2.0 and Kerberos v5. 
The proposed protocol supports (1) reliable device / administrator registration, (2) secure DNS name autoconfiguration, and (3) user / service authentication and authorization procedure for the heterogeneity and scalability of enterprise IoT networks. DA - 2021/// PY - 2021/// DO - 10.1109/GLOBECOM46510.2021.9685237 SP - SN - 2576-6813 KW - Internet of Things KW - security KW - DNS KW - authentication KW - authorization KW - enterprise IoT network ER - TY - JOUR TI - Maintenance of Social Commitments in Multiagent Systems AU - Telang, Pankaj R. AU - Singh, Munindar P. AU - Yorke-Smith, Neil T2 - Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI) DA - 2021/2// PY - 2021/2// VL - 35 IS - 13 SP - 11369-11377 UR - https://ojs.aaai.org/index.php/AAAI/article/view/17355 ER - TY - CONF TI - Tango: Declarative Semantics for Multiagent Communication Protocols AU - Singh, Munindar P. AU - V., Samuel H. Christie AB - A flexible communication protocol is necessary to build a decentralized multiagent system whose member agents are not coupled to each other's decision making. Information-based protocol languages capture a protocol in terms of causality and integrity constraints based on the information exchanged by the agents. Thus, they enable highly flexible enactments in which the agents proceed asynchronously and messages may be arbitrarily reordered. However, the existing semantics for such languages can produce a large number of protocol enactments, which makes verification of a protocol property intractable. This paper formulates a protocol semantics declaratively via inference rules that determine when a message emission or reception becomes enabled during an enactment, and its effect on the local state of an agent. The semantics enables heuristics for determining when alternative extensions of a current enactment would be equivalent, thereby helping produce parsimonious models and yielding improved protocol verification methods. 
C2 - 2021/8// C3 - Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence DA - 2021/8// DO - 10.24963/ijcai.2021/55 PB - International Joint Conferences on Artificial Intelligence Organization UR - http://dx.doi.org/10.24963/ijcai.2021/55 ER - TY - JOUR TI - Characterizing the Security of Endogenous and Exogenous Desktop Application Network Flows AU - McNiece, Matthew R. AU - Li, Ruidan AU - Reaves, Bradley T2 - PASSIVE AND ACTIVE MEASUREMENT, PAM 2021 AB - Most desktop applications use the network, and insecure communications can have a significant impact on the application, the system, the user, and the enterprise. Understanding at scale whether desktop applications use the network securely is a challenge because the application provenance of a given network packet is rarely available at centralized collection points. In this paper, we collect flow data from 39,758 MacOS devices on an enterprise network to study the network behaviors of individual applications. We collect flows locally on-device and can definitively identify the application responsible for every flow. We also develop techniques to distinguish “endogenous” flows common to most executions of a program from “exogenous” flows likely caused by unique inputs. We find that popular MacOS applications are in fact using the network securely, with 95.62% of the applications we study using HTTPS. Notably, we observe security-sensitive services (including certificate management and mobile device management) do not use ports associated with secure communications. Our study provides important insights for users, device and network administrators, and researchers interested in secure communication.
DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-72582-2_31 VL - 12671 SP - 531-546 SN - 1611-3349 ER - TY - JOUR TI - Recurrent Neural Networks Meet Context-Free Grammar: Two Birds with One Stone AU - Guan, Hui AU - Chaudhary, Umang AU - Xu, Yuanchao AU - Ning, Lin AU - Zhang, Lijun AU - Shen, Xipeng T2 - 2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021) AB - Recurrent Neural Networks (RNN) are widely used for various prediction tasks on sequences such as text, speed signals, program traces, and system logs. Due to RNNs’ inherently sequential behavior, one key challenge for the effective adoption of RNNs is to reduce the time spent on RNN inference and to increase the scope of a prediction. This work introduces CFG-guided compressed learning, an approach that creatively integrates Context-Free Grammar (CFG) and online tokenization into RNN learning and inference for streaming inputs. Through a hierarchical compression algorithm, it compresses an input sequence to a CFG and makes predictions based on the compressed sequence. Its algorithm design employs a set of techniques to overcome the issues from the myopic nature of online tokenization, the tension between inference accuracy and compression rate, and other complexities. Experiments on 16 real-world sequences of various types validate that the proposed compressed learning can successfully recognize and leverage repetitive patterns in input sequences, and effectively translate them into dramatic (1-1762×) inference speedups as well as much (1-7830×) expanded prediction scope, while keeping the inference accuracy satisfactory. DA - 2021/// PY - 2021/// DO - 10.1109/ICDM51629.2021.00125 SP - 1078-1083 SN - 1550-4786 KW - recurrent neural networks KW - data compression KW - context free grammar KW - tokenization ER - TY - JOUR TI - Algorithms that Empower? Platformization in US Intelligence Analysis AU - Schmidt, Matthew AU - Vogel, Kathleen M. 
T2 - PROCEEDINGS OF THE 2020 IEEE INTERNATIONAL SYMPOSIUM ON TECHNOLOGY AND SOCIETY (ISTAS) AB - This paper discusses a computational architecture called the Analytic Component System (ACS), which aims to provide intelligence analysts with a service-oriented computational platform. This platform is designed to empower intelligence analysts by improving the integration of people, algorithms, software, tools, and manual work in the production of time-pressured intelligence assessments. Combining the perspectives of the ACS computer science design team and an embedded social scientist, this paper will use ACS to discuss the “platformization” of intelligence analysis and what this means for how we might think about and plan for reflexive design in future computational intelligence analytic systems. DA - 2021/// PY - 2021/// DO - 10.1109/ISTAS50296.2020.9555838 SP - SN - 2158-3404 KW - platforms KW - intelligence analysis KW - socio-technical systems ER - TY - JOUR TI - FRUGAL: Unlocking Semi-Supervised Learning for Software Analytics AU - Tu, Huy AU - Menzies, Tim T2 - 2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING ASE 2021 AB - Standard software analytics often involves having a large amount of data with labels in order to commission models with acceptable performance. However, prior work has shown that such requirements can be expensive, taking several weeks to label thousands of commits, and not always available when traversing new research problems and domains. Unsupervised Learning is a promising direction to learn hidden patterns within unlabelled data, which has only been extensively studied in defect prediction. 
Nevertheless, unsupervised learning can be ineffective by itself and has not been explored in other domains (e.g., static analysis and issue close time). Motivated by this literature gap and technical limitations, we present FRUGAL, a tuned semi-supervised method that builds on a simple optimization scheme that does not require sophisticated (e.g., deep learners) and expensive (e.g., 100% manually labelled data) methods. FRUGAL optimizes the unsupervised learner’s configurations (via a simple grid search) while validating our design decision of labelling just 2.5% of the data before prediction. As shown by the experiments of this paper, FRUGAL outperforms the state-of-the-art adoptable static code warning recognizer and issue closed time predictor, while reducing the cost of labelling by a factor of 40 (from 100% to 2.5%). Hence we assert that FRUGAL can save considerable effort in data labelling especially in validating prior work or researching new problems. Based on this work, we suggest that proponents of complex and expensive methods should always baseline such methods against simpler and cheaper alternatives. For instance, a semi-supervised learner like FRUGAL can serve as a baseline to the state-of-the-art software analytics. DA - 2021/// PY - 2021/// DO - 10.1109/ASE51524.2021.9678617 SP - 394-406 KW - Software Analytics KW - Data Labelling Efforts KW - Semi-Supervised Learning ER - TY - JOUR TI - One-Time Traceable Ring Signatures AU - Scafuro, Alessandra AU - Zhang, Bihan T2 - COMPUTER SECURITY - ESORICS 2021, PT II AB - A ring signature allows a party to sign messages anonymously on behalf of a group, which is called a ring. Traceable ring signatures are a variant of ring signatures that limits the anonymity guarantees, enforcing that a member can sign anonymously at most one message per tag. Namely, if a party signs two different messages for the same tag, it will be de-anonymized.
This property is very useful in decentralized platforms to allow members to anonymously endorse statements in a controlled manner. In this work we introduce one-time traceable ring signatures, where a member can sign anonymously only one message. This natural variant suffices in many applications for which traceable ring signatures are useful, and enables us to design a scheme that only requires a few hash evaluations and outperforms existing (non one-time) schemes. Our one-time traceable ring signature scheme presents many advantages: it is fast, with a signing time of less than 1 s for a ring of \(2^{10}\) signers (and much less for smaller rings); it is post-quantum resistant, as it only requires hash evaluations; it is extremely simple, as it requires only a black-box access to a generic hash function (modeled as a random oracle) and no other cryptographic operation is involved. From a theoretical standpoint our scheme is also the first anonymous signature scheme based on a black-box access to a symmetric-key primitive. All existing anonymous signatures are either based on specific hardness assumptions (e.g., LWE, SIS, etc.) or use the underlying symmetric-key primitive in a non-black-box way, i.e., they leverage the circuit representation of the primitive. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-88428-4_24 VL - 12973 SP - 481-500 SN - 1611-3349 ER - TY - JOUR TI - Towards Better Support for Undergraduate Software Engineering Teams AU - Presler-Marshall, Kai T2 - ICER 2021: PROCEEDINGS OF THE 17TH ACM CONFERENCE ON INTERNATIONAL COMPUTING EDUCATION RESEARCH AB - Team-based projects are increasingly used within software engineering education because they can teach valuable communication and collaboration skills to help prepare students for professional software engineering positions. However, team-based projects are not without their downsides: in particular, poor communication or a lack of participation can endanger the success of the project.
We propose identifying metrics and building a predictive model to help instructors detect when teams are facing harmful dynamics, and evaluations to assess the metrics and their impact on teams in undergraduate software engineering courses. DA - 2021/// PY - 2021/// DO - 10.1145/3446871.3469773 SP - 405-406 ER - TY - JOUR TI - You Really Need Help: Exploring Expert Reasons for Intervention During Block-based Programming Assignments AU - Dong, Yihuan AU - Shabrina, Preya AU - Marwan, Samiha AU - Barnes, Tiffany T2 - ICER 2021: PROCEEDINGS OF THE 17TH ACM CONFERENCE ON INTERNATIONAL COMPUTING EDUCATION RESEARCH AB - In recent years, research has increasingly focused on developing intelligent tutoring systems that provide data-driven support for students in need of assistance during programming assignments. One goal of such intelligent tutors is to provide students with quality interventions comparable to those human tutors would give. While most studies focused on generating different forms of on-demand support, such as next-step hints and worked examples, at any given moment during the programming assignment, there is a lack of research on why human tutors would provide different forms of proactive interventions to students in different situations. This information is critical to know to allow the intelligent programming environments to select the appropriate type of student support at the right moment. 
DA - 2021/// PY - 2021/// DO - 10.1145/3446871.3469764 SP - 334-346 KW - novice programming KW - proactive intervention KW - block-based environments KW - programming assignments KW - expert intervention ER - TY - JOUR TI - Exploring and Influencing Teacher Grading for Block-based Programs through Rubrics and the GradeSnap Tool AU - Milliken, Alexandra AU - Catete, Veronica AU - Limke, Ally AU - Gransbury, Isabella AU - Chipman, Hannah AU - Dong, Yihuan AU - Barnes, Tiffany T2 - ICER 2021: PROCEEDINGS OF THE 17TH ACM CONFERENCE ON INTERNATIONAL COMPUTING EDUCATION RESEARCH AB - This article examines the grading process and profiles of secondary computer science teachers as they assess block-based student programming submissions. Through an iterative design process, we have created a new tool, GradeSnap, which streamlines how teachers can open, review, and evaluate student submissions within the same interface. Our study compares teachers’ grading processes using the different assessment formats, so that we can understand how their grading processes can be augmented or supported to reduce “pain points” and to enable teachers to provide more constructive and formative feedback for students. We use a case study approach to examine the experiences and outcomes of four secondary computer science teachers with varied teaching and assessment experience, when grading as usual, grading with a rubric, and grading with GradeSnap. Our study shows that when participants use GradeSnap, they are able to give supportive comments to lower performing and borderline students who need critical feedback to better understand misconceptions. We also discovered that the different grading processes provided a vehicle for reflection for some teachers in understanding their grading goals and how they enact them.
This research is the first to examine teacher grading processes for computer science, and highlights the need for teacher preparation and support for providing programming feedback and assessment. DA - 2021/// PY - 2021/// DO - 10.1145/3446871.3469762 SP - 101-114 KW - block-based languages KW - grading and assessment tools KW - secondary teacher tools ER - TY - JOUR TI - Browserprint: An Analysis of the Impact of Browser Features on Fingerprintability and Web Privacy AU - Akhavani, Seyed Ali AU - Jueckstock, Jordan AU - Su, Junhua AU - Kapravelos, Alexandros AU - Kirda, Engin AU - Lu, Long T2 - INFORMATION SECURITY (ISC 2021) AB - Web browsers are indispensable applications in our daily lives. Millions of users use web browsers for a wide range of activities such as social media, online shopping, emails, or surfing the web. The evolution of increasingly more complicated web applications relies on browsers constantly adding and removing features. At the same time, some of these web services use browser fingerprinting to track and profile their users with clear disregard for their web privacy. In this paper, we perform an empirical analysis of browser features evolution and aim to evaluate browser fingerprintability. By analyzing 33 Google Chrome, 31 Mozilla Firefox, and 33 Opera major browser versions released through 2016 to 2020, we discover that all of these browsers have unique feature sets which makes them different from each other. By comparing these features to the fingerprinting APIs presented in literature that have appeared in this field, we conclude that all of these browser versions are uniquely fingerprintable. Our results show an alarming trend that browsers are becoming more fingerprintable over time because newer versions contain more fingerprintable APIs compared to older ones. 
DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-91356-4_9 VL - 13118 SP - 161-176 SN - 1611-3349 KW - Browser security KW - Fingerprinting KW - Privacy KW - Web security ER - TY - JOUR TI - Esports and High Performance HCI AU - Watson, Benjamin AU - Spjut, Josef AU - Kim, Joohwan AU - Listman, Jennifer AU - Kim, Sunjun AU - Wimmer, Raphael AU - Putrino, David AU - Lee, Byungjoo T2 - EXTENDED ABSTRACTS OF THE 2021 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI'21) AB - Competitive esports is a growing worldwide phenomenon now rivaling traditional sports, with over 450 million views and 1 billion US dollars in revenue each year. For comparison, Major League Baseball has 500 million views and $10 billion in revenue, FIFA Soccer 900 million and $1.6 billion. Despite this significant popularity, much of the world remains unaware of esports — and in particular, research on and for esports is still extremely scarce compared to esports’ impact and potential. DA - 2021/// PY - 2021/// DO - 10.1145/3411763.3441313 SP - KW - esports KW - expert users KW - expert interaction techniques ER - TY - JOUR TI - Data to Donations: Towards In-Kind Food Donation Prediction across Two Coasts AU - Sharma, Esha AU - Davis, Lauren AU - Ivy, Julie AU - Chi, Min T2 - 2021 IEEE GLOBAL HUMANITARIAN TECHNOLOGY CONFERENCE (GHTC) AB - Our goal in this work is to build effective yet robust models to predict unreliable and inconsistent in-kind donations at both weekly and monthly levels for two food banks across coasts: the Food Bank of Central Eastern North Carolina in North Carolina and Los Angeles Regional Food Bank in California. We explore three factors: model, data length, and window type. 
For the model, we evaluate a series of classic time-series forecasting models against the state-of-the-art approaches such as Bayesian Structural Time Series modeling (BSTS) and deep learning models; for the data length, we vary training data from 2 weeks to 13 years; for the window type, we compare sliding vs. expanding. Our results show the effectiveness of different models heavily depends on the data length and the window type as well as characteristics of the food bank. Motivated by these findings, we investigate the effectiveness of employing an average of all predictions formed by considering all three factors at both monthly and weekly levels for both food banks. Our results show that this average of predictions significantly and consistently outperforms all classical models, deep learning, and BSTS for the donation prediction at both monthly and weekly levels for both food banks. DA - 2021/// PY - 2021/// DO - 10.1109/GHTC53159.2021.9612484 SP - 281-288 SN - 2377-6919 KW - Food Insecurity KW - Humanitarian Supply Chain KW - Bayesian Structural Time Series KW - Long Short Term Memory KW - Training Length KW - Expanding and Sliding Window ER - TY - JOUR TI - Understanding People's Attitude and Concerns towards Adopting IoT Devices AU - Lafontaine, Evan AU - Sabir, Aafaq AU - Das, Anupam T2 - EXTENDED ABSTRACTS OF THE 2021 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI'21) AB - The proliferation of the Internet of Things (IoT) has started transforming our lifestyle through automation of home appliances. However, there are users who are hesitant to adopt IoT devices due to various privacy and security concerns. In this paper, we elicit peoples’ attitude and concerns towards adopting IoT devices. We conduct an online survey and collect responses from 232 participants from three different geographic regions (United States, Europe, and India); the participants consist of both adopters and non-adopters of IoT devices. 
Through data analysis, we determine that there are both similarities and differences in perceptions and concerns between adopters and non-adopters. For example, even though IoT and non-IoT users share similar security and privacy concerns, IoT users are more comfortable using IoT devices in private settings compared to non-IoT users. Furthermore, when comparing users’ attitude and concerns across different geographic regions, we found similarities between participants from the US and Europe, yet participants from India showcased contrasting behavior. For instance, we found that participants from India were more trusting in their government to properly protect consumer data and were more comfortable using IoT devices in a variety of public settings, compared to participants from the US and Europe. Based on our findings, we provide recommendations to reduce users’ concerns in adopting IoT devices, and thereby enhance user trust towards adopting IoT devices. DA - 2021/// PY - 2021/// DO - 10.1145/3411763.3451633 SP - KW - Internet of Things (IoT) KW - user attitude KW - cross-societal concerns ER - TY - JOUR TI - HPCFAIR: Enabling FAIR AI for HPC Applications AU - Verma, Gaurav AU - Emani, Murali AU - Liao, Chunhua AU - Lin, Pei-Hung AU - Vanderbruggen, Tristan AU - Shen, Xipeng AU - Chapman, Barbara T2 - PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021) AB - Artificial Intelligence (AI) is being adopted in different domains at an unprecedented scale. A significant interest in the scientific community also involves leveraging machine learning (ML) to effectively run high performance computing applications at scale. Given multiple efforts in this arena, there are often duplicated efforts when existing rich data sets and ML models could be leveraged instead. The primary challenge is a lack of an ecosystem to reuse and reproduce the models and datasets. 
In this work, we propose HPCFAIR, a modular, extensible framework to enable AI models to be Findable, Accessible, Interoperable and Reproducible (FAIR). It enables users with a structured approach to search, load, save and reuse the models in their codes. We present the design, implementation of our framework and highlight how it can be seamlessly integrated to ML-driven applications for high performance computing applications and scientific machine learning workloads. DA - 2021/// PY - 2021/// DO - 10.1109/MLHPC54614.2021.00011 SP - 58-68 KW - HPC KW - FAIR KW - AI models KW - datasets KW - neural networks ER - TY - JOUR TI - HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing AU - Liao, Chunhua AU - Lin, Pei-Hung AU - Verma, Gaurav AU - Vanderbruggen, Tristan AU - Emani, Murali AU - Nan, Zifan AU - Shen, Xipeng T2 - PROCEEDINGS OF THE WORKSHOP ON MACHINE LEARNING IN HIGH PERFORMANCE COMPUTING ENVIRONMENTS (MLHPC 2021) AB - Machine learning (ML) techniques have been widely studied to address various challenges of productively and efficiently running large-scale scientific applications on heterogeneous supercomputers. However, it is extremely difficult to generate, access, and maintain training datasets and AI models to accelerate ML-based research. The Future of Research Communications and e-Scholarship has proposed the FAIR data principles describing Findability, Accessibility, Interoperability, and Reusability. In this paper, we present our ongoing work of designing an ontology for high-performance computing (named HPC ontology) in order to make training datasets and AI models FAIR. Our ontology provides controlled vocabularies, explicit semantics, and formal knowledge representations. Our design uses an extensible two-level pattern, capturing both high-level meta information and low-level data content for software, hardware, experiments, workflows, training datasets, AI models, and so on. 
Preliminary evaluation shows that HPC ontology is effective to annotate selected data and support a set of SPARQL queries. DA - 2021/// PY - 2021/// DO - 10.1109/MLHPC54614.2021.00012 SP - 69-80 KW - Ontology KW - HPC KW - FAIR KW - datasets KW - AI models ER - TY - JOUR TI - Cookie Swap Party: Abusing First-Party Cookies for Web Tracking AU - Chen, Quan AU - Ilia, Panagiotis AU - Polychronakis, Michalis AU - Kapravelos, Alexandros T2 - PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) AB - As a step towards protecting user privacy, most web browsers perform some form of third-party HTTP cookie blocking or periodic deletion by default, while users typically have the option to select even stricter blocking policies. As a result, web trackers have shifted their efforts to work around these restrictions and retain or even improve the extent of their tracking capability. DA - 2021/// PY - 2021/// DO - 10.1145/3442381.3449837 SP - 2117-2129 ER - TY - JOUR TI - Toward Efficient Interactions between Python and Native Libraries AU - Tan, Jialiang AU - Chen, Yu AU - Liu, Zhenming AU - Ren, Bin AU - Song, Shuaiwen Leon AU - Shen, Xipeng AU - Liu, Xu T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - Python has become a popular programming language because of its excellent programmability. Many modern software packages utilize Python for high-level algorithm design and depend on native libraries written in C/C++/Fortran for efficient computation kernels. Interaction between Python code and native libraries introduces performance losses because of the abstraction lying on the boundary of Python and native libraries. On the one side, Python code, typically run with interpretation, is disjoint from its execution behavior. On the other side, native libraries do not include program semantics to understand algorithm defects.
To understand the interaction inefficiencies, we extensively study a large collection of Python software packages and categorize them according to the root causes of inefficiencies. We extract two inefficiency patterns that are common in interaction inefficiencies. Based on these patterns, we develop PieProf, a lightweight profiler, to pinpoint interaction inefficiencies in Python applications. The principle of PieProf is to measure the inefficiencies in the native execution and associate inefficiencies with high-level Python code to provide a holistic view. Guided by PieProf, we optimize 17 real-world applications, yielding speedups up to 6.3$\times$ on application level. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3468541 SP - 1117-1128 KW - Python KW - profiling KW - PMU KW - debug register ER - TY - JOUR TI - Infiltrating Security into Development: Exploring the World's Largest Software Security Study AU - Weir, Charles AU - Migues, Sammy AU - Ware, Mike AU - Williams, Laurie T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3473926 SP - 1326-1336 KW - Software engineering KW - Software security KW - Developer centered security KW - Software security group KW - Secure software development lifecycle KW - SDLC KW - DevSecOps ER - TY - JOUR TI - Documenting Evidence of a Replication of 'Populating a Release History Database from Version Control and Bug Tracking Systems' AU - Yang, Xueqi AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - We report here the use of a keyword-based and regular expression-based approach to identify bug-fixing commits by linking commit messages and issue tracker data in a recent FSE '20 paper by Penta et al. 
in their paper "On the Relationship between Refactoring Actions and Bugs: A Differentiated Replication". The approach replicated is a keyword-based and regular expression-based approach as studied by Fischer et al. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477218 SP - 1601-1601 UR - https://doi.org/10.1145/3468264.3477218 KW - reuse KW - replication KW - bug fixing KW - text tagging ER - TY - JOUR TI - Understanding and Detecting Server-Side Request Races in Web Applications AU - Qiu, Zhengyi AU - Zhao, Qi AU - Shao, Shudi AU - Jin, Guoliang T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - Modern web sites often run web applications on the server to handle HTTP requests from users and generate dynamic responses. Due to their concurrent nature, web applications are vulnerable to server-side request races. The problem becomes more severe with the ever-increasing popularity of web applications. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3468594 SP - 842-854 KW - web-application request races KW - characteristic study KW - race detection KW - happens-before relationships ER - TY - JOUR TI - Documenting Evidence of a Reuse of 'RefactoringMiner 2.0' AU - Lustosa, Andre AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - This submission is a report on the reuse of Tsantalis et al.'s Refactoring Miner (RMiner) package by Penta et al. 
DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477215 SP - 1598-1598 UR - https://doi.org/10.1145/3468264.3477215 KW - reuse KW - refactoring KW - bug introduction KW - mining software repositories ER - TY - JOUR TI - Documenting Evidence of a Reuse of 'A Systematic Literature Review of Techniques and Metrics to Reduce the Cost of Mutation Testing' AU - Lustosa, Andre AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - This submission is a report on the reuse of Pizzoleto et al.'s Systematic Literature Review by Guizzo et al. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477214 SP - 1597-1597 UR - https://doi.org/10.1145/3468264.3477214 KW - reuse KW - reproduction KW - mutation testing KW - systematic literature review ER - TY - JOUR TI - Cross-Language Code Search using Static and Dynamic Analyses AU - Mathew, George AU - Stolee, Kathryn T. T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches such as the comparison of tokens and abstract syntax trees (AST) to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do, rely on machine learning models that require labeled training data. 
We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets of 43,146 Java and Python files and 55,499 Java files and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective compared to single or weighted measures; and 2) COSAL has better precision and recall compared to state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3468538 SP - 205-217 KW - code-to-code search KW - cross-language code search KW - non-dominated sorting KW - static analysis KW - dynamic analysis ER - TY - JOUR TI - Documenting Evidence of a Reuse of "'Why Should I Trust You?": Explaining the Predictions of Any Classifier' AU - Peng, Kewen AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - We report here the following example of reuse. LIME is a local instance-based explanation generation framework that was originally proposed by Ribeiro et al. in their paper "'Why Should I Trust You?': Explaining the Predictions of Any Classifier". The framework was reused by Peng et al. in their paper "Defect Reduction Planning (using TimeLIME)". The paper used the original implementation of LIME as one of the core components in the proposed framework. 
DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477217 SP - 1600-1600 UR - https://doi.org/10.1145/3468264.3477217 KW - Software analytics KW - Actionable analysis ER - TY - JOUR TI - Bias in Machine Learning Software: Why? How? What to Do? AU - Chakraborty, Joymallya AU - Majumder, Suvodeep AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - Increasingly, software is making autonomous decisions in case of criminal sentencing, approving credit cards, hiring employees, and so on. Some of these decisions show bias and adversely affect certain social groups (e.g. those defined by sex, race, age, marital status). Many prior works on bias mitigation take the following form: change the data or learners in multiple ways, then see if any of that improves fairness. Perhaps a better approach is to postulate root causes of bias and then apply some resolution strategy. This paper postulates that the root causes of bias are the prior decisions that affect (a) what data was selected and (b) the labels assigned to those examples. Our Fair-SMOTE algorithm removes biased labels and rebalances internal distributions such that, based on the sensitive attribute, examples are equal in both positive and negative classes. On testing, it was seen that this method was just as effective at reducing bias as prior approaches. Further, models generated via Fair-SMOTE achieve higher performance (measured in terms of recall and F1) than other state-of-the-art fairness improvement algorithms. To the best of our knowledge, measured in terms of number of analyzed learners and datasets, this study is one of the largest studies on bias mitigation yet presented in the literature. 
DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3468537 SP - 429-440 UR - https://doi.org/10.1145/3468264.3468537 KW - Software Fairness KW - Fairness Metrics KW - Bias Mitigation ER - TY - JOUR TI - Documenting Evidence of a Reproduction of 'Is There A "Golden" Feature Set for Static Warning Identification? - An Experimental Evaluation' AU - Yang, Xueqi AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - We report here the use of the static analysis dataset generated by FindBugs in a recent EMSE '21 paper by Yang et al. The artifact reproduced is supervised models to perform static analysis based on a golden feature set as studied by Wang et al. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477220 SP - 1603-1603 UR - https://doi.org/10.1145/3468264.3477220 KW - reuse KW - reproduction KW - static analysis KW - deep learning ER - TY - JOUR TI - Documenting Evidence of a Replication of 'Analyze This! 145 Questions for Data Scientists in Software Engineering' AU - Yang, Xueqi AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - We report here the use of the 145 software engineering questions for data scientists presented in the Microsoft study in a recent FSE '20 paper by Huijgens et al. The study by Begel et al. was replicated by Huijgens et al. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477219 SP - 1602-1602 UR - https://doi.org/10.1145/3468264.3477219 KW - reuse KW - replication KW - data science KW - software analysis ER - TY - JOUR TI - Documenting Evidence of a Reuse of 'What is a Feature? 
A Qualitative Study of Features in Industrial Software Product Lines' AU - Peng, Kewen AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - We report here the following example of reuse. The original paper is a prior work about features in product lines by Berger et al. The paper "Dimensions of software configuration: on the configuration context in modern software development" by Siegmund et al. reused definitions and theories about configuration features in the original paper. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477216 SP - 1599-1599 UR - https://doi.org/10.1145/3468264.3477216 KW - Software analytics KW - Software product lines KW - Software configuration ER - TY - JOUR TI - Mapping Constraint Problems onto Quantum Gate and Annealing Devices AU - Wilson, Ellis AU - Mueller, Frank AU - Pakin, Scott T2 - PROCEEDINGS OF SECOND INTERNATIONAL WORKSHOP ON QUANTUM COMPUTING SOFTWARE (QCS 2021) AB - This work presents NchooseK, a unified programming model for constraint satisfaction problems that can be mapped to both quantum circuit and annealing devices through Quadratic Unconstrained Binary Operators (QUBOs). Our mapping provides an approachable and effective way to program both types of quantum computers. We provide examples of NchooseK being used. DA - 2021/// PY - 2021/// DO - 10.1109/QCS54837.2021.00016 SP - 110-117 KW - circuit-model quantum computing KW - quantum annealing KW - programming models ER - TY - JOUR TI - COVID-19 Knowledge Extractor (COKE): A Curated Repository of Drug-Target Associations Extracted from the CORD-19 Corpus of Scientific Publications on COVID-19 AU - Korn, Daniel AU - Pervitsky, Vera AU - Bobrowski, Tesia AU - Alves, Vinicius M. 
AU - Schmitt, Charles AU - Bizon, Chris AU - Baker, Nancy AU - Chirkova, Rada AU - Cherkasov, Artem AU - Muratov, Eugene AU - Tropsha, Alexander T2 - JOURNAL OF CHEMICAL INFORMATION AND MODELING AB - The COVID-19 pandemic has catalyzed a widespread effort to identify drug candidates and biological targets of relevance to SARS-CoV-2 infection, which resulted in large numbers of publications on this subject. We have built the COVID-19 Knowledge Extractor (COKE), a web application to extract, curate, and annotate essential drug–target relationships from the research literature on COVID-19. SciBiteAI ontological tagging of the COVID Open Research Data set (CORD-19), a repository of COVID-19 scientific publications, was employed to identify drug–target relationships. Entity identifiers were resolved through lookup routines using UniProt and DrugBank. A custom algorithm was used to identify co-occurrences of the target protein and drug terms, and confidence scores were calculated for each entity pair. COKE processing of the current CORD-19 database identified about 3000 drug–protein pairs, including 29 unique proteins and 500 investigational, experimental, and approved drugs. Some of these drugs are presently undergoing clinical trials for COVID-19. The COKE repository and web application can serve as a useful resource for drug repurposing against SARS-CoV-2. COKE is freely available at https://coke.mml.unc.edu/, and the code is available at https://github.com/DnlRKorn/CoKE. DA - 2021/12/27/ PY - 2021/12/27/ DO - 10.1021/acs.jcim.1c01285 VL - 61 IS - 12 SP - 5734-5741 SN - 1549-960X ER - TY - JOUR TI - Ensuring Data Readiness for Quality Requirements with Help from Procedure Reuse AU - Chirkova, Rada AU - Doyle, Jon AU - Reutter, Juan T2 - ACM JOURNAL OF DATA AND INFORMATION QUALITY AB - Assessing and improving the quality of data are fundamental challenges in Big-Data applications. 
These challenges have given rise to numerous solutions targeting transformation, integration, and cleaning of data. However, while schema design, data cleaning, and data migration are nowadays reasonably well understood in isolation, not much attention has been given to the interplay between standalone tools in these areas. In this article, we focus on the problem of determining whether the available data-transforming procedures can be used together to bring about the desired quality characteristics of the data in business or analytics processes. For example, to help an organization avoid building a data-quality solution from scratch when facing a new analytics task, we ask whether the data quality can be improved by reusing the tools that are already available, and if so, which tools to apply, and in which order, all without presuming knowledge of the internals of the tools, which may be external or proprietary. Toward addressing this problem, we conduct a formal study in which individual data cleaning, data migration, or other data-transforming tools are abstracted as black-box procedures with only some of the properties exposed, such as their applicability requirements, the parts of the data that the procedure modifies, and the conditions that the data satisfy once the procedure has been applied. As a proof of concept, we provide foundational results on sequential applications of procedures abstracted in this way, to achieve prespecified data-quality objectives, for the use case of relational data and for procedures described by standard relational constraints. We show that, while reasoning in this framework may be computationally infeasible in general, there exist well-behaved cases in which these foundational results can be applied in practice for achieving desired data-quality results on Big Data. 
DA - 2021/9// PY - 2021/9// DO - 10.1145/3428154 VL - 13 IS - 3 SP - SN - 1936-1955 KW - Data and information quality KW - data integration in Big Data KW - data cleaning in Big Data KW - Big Data quality and analytics KW - Big Data quality in business process KW - Big Data quality management processes, frameworks and models ER - TY - JOUR TI - Quantum Annealing Stencils with Applications to Fuel Loading of a Nuclear Reactor AU - Fustero, Joseph AU - Palmtag, Scott AU - Mueller, Frank T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON QUANTUM COMPUTING AND ENGINEERING (QCE 2021) / QUANTUM WEEK 2021 AB - A method for mapping quadratic unconstrained binary optimizations expressed as nearest neighbor stencils onto contemporary quantum annealing machines is developed. The method is shown to be scalable in providing higher utilization of annealing hardware resources than prior work. Applying the technique to the problem of determining an effective fuel loading pattern for nuclear reactors shows that densely mapped quantum stencils result in higher fidelity solutions of optimization problems than the sparser default solutions. These results are likely to generalize to quadratic unconstrained binary optimizations that can be expressed as dense quantum stencils, thereby improving optimization results obtained from noisy quantum devices. DA - 2021/// PY - 2021/// DO - 10.1109/QCE52317.2021.00044 SP - 265-275 KW - quantum annealing KW - noisy intermediate-scale quantum computing KW - topology graph embeddings ER - TY - JOUR TI - Interaction-Oriented Programming: An Application Semantics Approach for Engineering Decentralized Applications AU - Chopra, Amit K. AU - Christie, Samuel H. AU - Singh, Munindar P. 
T2 - PROCEEDINGS OF THE 2021 ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING (PODC '21) AB - Interaction-Oriented Programming (IOP) refers to multiagent concepts, languages, and programming models for engineering applications that are characterized by interactions between autonomous parties. Such applications arise in domains such as e-commerce, health care, and finance. Owing to the autonomy of the principals involved, such applications are conceptually decentralized. We demonstrate how to specify a decentralized application flexibly and how to engineer correct, fault-tolerant endpoints (agents) for the principals in a straightforward manner. Notably, the entire application is realized as agents communicating over an unordered, unreliable messaging infrastructure (our implementations in fact use UDP). IOP departs from traditional distributed systems approaches that rely on guarantees in the application's communication infrastructure, e.g., for ordering and fault tolerance. Notably, IOP shows how to address application semantics, the holy grail of distributed systems. DA - 2021/// PY - 2021/// DO - 10.1145/3465084.3467486 SP - 575-576 KW - Commitments KW - information protocol KW - programming model ER - TY - BOOK TI - An Introduction to IoT Analytics AU - Perros, Harry G. AB - This book covers techniques that can be used to analyze data from IoT sensors and addresses questions regarding the performance of an IoT system. It strikes a balance between practice and theory so one can learn how to apply these tools in practice with a good understanding of their inner workings. This is an introductory book for readers who have no familiarity with these techniques. The techniques presented in An Introduction to IoT Analytics come from the areas of machine learning, statistics, and operations research. Machine learning techniques are described that can be used to analyze IoT data generated from sensors for clustering, classification, and regression. 
The statistical techniques described can be used to carry out regression and forecasting of IoT sensor data and dimensionality reduction of data sets. Operations research is concerned with the performance of an IoT system by constructing a model of the system under study and then carrying out a what-if analysis. The book also describes simulation techniques. Key Features IoT analytics is not just machine learning but also involves other tools, such as forecasting and simulation techniques. Many diagrams and examples are given throughout the book to fully explain the material presented. Each chapter concludes with a project designed to help readers better understand the techniques described. The material in this book has been class tested over several semesters. Practice exercises are included with solutions provided online at www.routledge.com/9780367686314 Harry G. Perros is a Professor of Computer Science at North Carolina State University, an Alumni Distinguished Graduate Professor, and an IEEE Fellow. He has published extensively in the area of performance modeling of computer and communication systems. DA - 2021/3/31/ PY - 2021/3/31/ DO - 10.1201/9781003139041 OP - PB - Chapman and Hall/CRC SN - 9781003139041 UR - http://dx.doi.org/10.1201/9781003139041 DB - Crossref ER - TY - CONF TI - Lessons learned from hyper-parameter tuning for microservice candidate identification AU - Yedida, R. AU - Krishna, R. AU - Kalia, A. AU - Menzies, T. AU - Xiao, J. AU - Vukovic, M. T2 - 36th IEEE/ACM International Conference on Automated Software Engineering C2 - 2021/// C3 - Proceedings of the thirty-sixth IEEE/ACM International Conference on Automated Software Engineering (ASE) CY - (Virtual) DA - 2021/// PY - 2021/11/14/ PB - Association for Computing Machinery ER - TY - RPRT TI - Crowdsourcing the State of the Art(ifacts) AU - Baldassarre, M.T. AU - Ernst, N. AU - Hermann, B. AU - Menzies, T. AU - Yedida, R. 
DA - 2021/// PY - 2021/// M1 - 2108.06821 M3 - arXiv preprint SN - 2108.06821 ER - TY - JOUR TI - Towards Realistic and Reproducible Web Crawl Measurements AU - Jueckstock, Jordan AU - Sarker, Shaown AU - Snyder, Peter AU - Beggs, Aidan AU - Papadopoulos, Panagiotis AU - Varvello, Matteo AU - Livshits, Benjamin AU - Kapravelos, Alexandros T2 - PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021) AB - Accurate web measurement is critical for understanding and improving security and privacy online. Such measurements implicitly assume that automated crawls generalize to typical web user experience. But anecdotal evidence suggests the web behaves differently when seen via well-known measurement endpoints or measurement automation frameworks, for various reasons. Our work improves the state of web privacy and security by investigating how key measurements differ when using naive crawling tool defaults vs. careful attempts to match “real” users across the Tranco top 25k web domains. We find web privacy and security measurements significantly affected by vantage point and browser configuration. We conclude that unless researchers ensure their web measurement tools match real world user experience, the research community is likely missing important signals systematically. For example, we find browser configuration alone causing shifts in 19% of known ad and tracking domains encountered and altering the loading frequency of up to 10% of distinct JavaScript code units executed. We find network vantage point having similar, though less dramatic, effects on the same web metrics. To ensure reproducibility, we carefully document our methodology and publish both our code and collected data. 
DA - 2021/// PY - 2021/// DO - 10.1145/3442381.3450050 SP - 80-91 ER - TY - JOUR TI - Revisit the Scalability of Deep Auto-Regressive Models for Graph Generation AU - Yang, Shuai AU - Shen, Xipeng AU - Lim, Seung-Hwan T2 - 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) AB - As a new promising approach to graph generations, deep auto-regressive graph generation has drawn increasing attention. It however has been commonly deemed as hard to scale up to work with large graphs. In existing studies, it is perceived that the consideration of the full non-local graph dependences is indispensable for this approach to work, which entails the needs for keeping the entire graph's info in memory and hence the perceived “inherent” scalability limitation of the approach. This paper revisits the common perception. It proposes three ways to relax the dependences and conducts a series of empirical measurements. It concludes that the perceived “inherent” scalability limitation is a misperception; with the right design and implementation, deep auto-regressive graph generation can be applied to graphs much larger than the device memory. The rectified perception removes a fundamental barrier for this approach to meet practical needs. 
DA - 2021/// PY - 2021/// DO - 10.1109/IJCNN52387.2021.9534206 SP - SN - 2161-4393 ER - TY - CONF TI - Knowing both when and where: Temporal-ASTNN for Early Prediction of Student Success in Novice Programming Tasks C2 - 2021/// C3 - Educational Data Mining 2021 DA - 2021/// UR - https://eric.ed.gov/?id=ED615543 ER - TY - CONF TI - Just a Few Expert Constraints Can Help: Humanizing Data-Driven Subgoal Detection for Novice Programming C2 - 2021/// C3 - Educational Data Mining 2021 DA - 2021/// UR - https://eric.ed.gov/?id=ED615599 ER - TY - CONF TI - More With Less: Exploring How to Use Deep Learning Effectively through Semi-supervised Learning for Automatic Bug Detection in Student Code C2 - 2021/// C3 - Educational Data Mining 2021 DA - 2021/// UR - https://eric.ed.gov/?id=ED615586 ER - TY - CONF TI - Increasing Women's Persistence in Computer Science by Decreasing Gendered Self-Assessments of Computing Ability AU - Fisk, Susan R. AU - Wingate, Tiah AU - Battestilli, Lina AU - Stolee, Kathryn T. AB - Gender stereotypes about women's computing ability contribute to the dearth of women in computing by causing women to experience gender bias. These gender stereotypes are doubly disadvantaging to women because they create gender differences in self-assessments of computing ability, decreasing the likelihood that women will persist in Computer Science (CS). This is because students need to believe they have sufficient ability in a field in order to pursue it as a career. C2 - 2021/6/26/ C3 - Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 
1 DA - 2021/6/26/ DO - 10.1145/3430665.3456374 PB - ACM UR - http://dx.doi.org/10.1145/3430665.3456374 ER - TY - JOUR TI - Documenting Evidence of a Reuse of 'A Systematic Study of the Class Imbalance Problem in Convolutional Neural Networks' AU - Yedida, Rahul AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - We report here the reuse of oversampling, and modifications to the basic approach, used in a recent TSE ’21 paper by Yedida & Menzies. The method reused is the oversampling technique studied by Buda et al. These methods were studied in the SE domain (specifically, for defect prediction), and extended by Yedida & Menzies. DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477212 SP - 1595-1595 UR - https://doi.org/10.1145/3468264.3477212 KW - reuse KW - replication KW - oversampling KW - defect prediction ER - TY - JOUR TI - Documenting Evidence of a Reuse of 'On the Number of Linear Regions of Deep Neural Networks' AU - Yedida, Rahul AU - Menzies, Tim T2 - PROCEEDINGS OF THE 29TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE '21) AB - We report here the reuse of theoretical insights from deep learning literature, used in a recent TSE '21 paper by Yedida & Menzies. The artifact replicated is the lower bound on the number of piecewise linear regions in the decision boundary of a feedforward neural network with ReLU activations, as studied by Montufar et al. We document the reuse of Theorem 4 from Montufar et al. by Yedida & Menzies. 
DA - 2021/// PY - 2021/// DO - 10.1145/3468264.3477213 SP - 1596-1596 UR - https://doi.org/10.1145/3468264.3477213 KW - reuse KW - replication KW - deep learning KW - defect prediction ER - TY - JOUR TI - CrawlPhish: Large-Scale Analysis of Client-Side Cloaking Techniques in Phishing AU - Zhang, Penghui AU - Oest, Adam AU - Cho, Haehyun AU - Sun, Zhibo AU - Johnson, R. C. AU - Wardman, Brad AU - Sarker, Shaown AU - Kapravelos, Alexandros AU - Bao, Tiffany AU - Wang, Ruoyu AU - Shoshitaishvili, Yan AU - Doupe, Adam AU - Ahn, Gail-Joon T2 - IEEE SECURITY & PRIVACY AB - Phishing websites with advanced evasion techniques are a critical threat to Internet users because they delay detection by current antiphishing systems. We present CrawlPhish, a framework for automatically detecting and categorizing the client-side (e.g., JavaScript) evasion used by phishing websites. DA - 2021/12/10/ PY - 2021/12/10/ DO - 10.1109/MSEC.2021.3129992 SP - SN - 1558-4046 KW - Phishing KW - Codes KW - Browsers KW - Security KW - Crawlers KW - Visualization KW - Internet ER - TY - JOUR TI - Operationalizing Intentionality to Play Hanabi With Human Players AU - Eger, Markus AU - Martens, Chris AU - Sauma Chacon, Pablo AU - Alfaro Cordoba, Marcela AU - Hidalgo-Cespedes, Jeisson T2 - IEEE TRANSACTIONS ON GAMES AB - The cooperative card game Hanabi has become of increasing interest in the community, since it combines partially hidden information with information exchange using restricted communication channels. In this article, we describe artificial intelligence agents that are designed to play the game with human players. Our agents make use of the fact that human players expect other players to act intentionally by formulating goals of their own and planning how to achieve them. They then use the available actions to communicate their plan to the human player. 
On the flip side, our agents also interpret the actions performed by the human player as containing information about their plans. We present two different variants of our agent that perform this interpretation in different ways. Additionally, since part of human communication happens in subtle indirect ways, we also demonstrate that our agent can use the timing of the human player’s actions as additional information. In order to validate our agents, we have performed two separate experiments: one was done to validate the intentional component of the agents, while the other focused on the interpretation of received information. In this article, we also present the results obtained from these two experiments. DA - 2021/12// PY - 2021/12// DO - 10.1109/TG.2020.3009359 VL - 13 IS - 4 SP - 388-397 SN - 2475-1510 KW - Games KW - Artificial intelligence KW - Timing KW - Color KW - Special issues and sections KW - Intelligent agents KW - Communication channels KW - Game artificial intelligence KW - human computer interaction KW - intelligent agents KW - logic ER - TY - JOUR TI - Whence to Learn? Transferring Knowledge in Configurable Systems Using BEETLE AU - Krishna, Rahul AU - Nair, Vivek AU - Jamshidi, Pooyan AU - Menzies, Tim T2 - IEEE TRANSACTIONS ON SOFTWARE ENGINEERING AB - As software systems grow in complexity and the space of possible configurations increases exponentially, finding the near-optimal configuration of a software system becomes challenging. Recent approaches address this challenge by learning performance models based on a sample set of configurations. However, collecting enough sample configurations can be very expensive since each such sample requires configuring, compiling, and executing the entire system using a complex test suite. When learning on new data is too expensive, it is possible to use Transfer Learning to “transfer” old lessons to the new context. 
Traditional transfer learning has a number of challenges, specifically, (a) learning from excessive data takes excessive time, and (b) the performance of the models built via transfer can deteriorate as a result of learning from a poor source. To resolve these problems, we propose a novel transfer learning framework called BEETLE, which is a “bellwether”-based transfer learner that focuses on identifying and learning from the most relevant source from amongst the old data. This paper evaluates BEETLE with 57 different software configuration problems based on five software systems (a video encoder, an SAT solver, a SQL database, a high-performance C-compiler, and a streaming data analytics tool). In each of these cases, BEETLE found configurations that are as good as or better than those found by other state-of-the-art transfer learners while requiring only a fraction ( $\frac{1}{7}$ th) of the measurements needed by those other methods. Based on these results, we say that BEETLE is a new high-water mark in optimally configuring software. DA - 2021/12/1/ PY - 2021/12/1/ DO - 10.1109/TSE.2020.2983927 VL - 47 IS - 12 SP - 2956-2972 SN - 1939-3520 UR - https://doi.org/10.1109/TSE.2020.2983927 KW - Performance optimization KW - SBSE KW - transfer learning KW - bellwether ER - TY - JOUR TI - Automated Debugging: Past, Present, and Future (ISSTA Impact Paper Award) AU - Parnin, Chris AU - Orso, Alessandro T2 - ISSTA '21: PROCEEDINGS OF THE 30TH ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS AB - The paper titled “Are Automated Debugging Techniques Actually Helping Programmers?” was published in the proceedings of the International Symposium on Software Testing and Analysis (ISSTA) in 2011, and has been selected to receive the ISSTA 2021 Impact Paper Award. The paper investigated, through two user studies, how developers used and benefited from popular automated debugging techniques. 
The results of the studies provided (1) evidence that several assumptions made by automated debugging techniques did not hold in practice and (2) insights on limitations of existing approaches and how these limitations could be addressed. In this talk, we revisit the original paper and the work that led to it. We then assess the impact of that research by reviewing how the area of automated debugging has evolved since the paper was published. Finally, we conclude the talk by reflecting on the current state of the art in this area and discussing open issues and potential directions for future work. DA - 2021/// PY - 2021/// DO - 10.1145/3460319.3472397 SP - 1-1 KW - Statistical Fault Localization KW - Automated Debugging KW - User Studies ER - TY - JOUR TI - UDF to SQL Translation through Compositional Lazy Inductive Synthesis AU - Zhang, Guoqiang AU - Xu, Yuanchao AU - Shen, Xipeng AU - Dillig, Isil T2 - PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL AB - Many data processing systems allow SQL queries that call user-defined functions (UDFs) written in conventional programming languages. While such SQL extensions provide convenience and flexibility to users, queries involving UDFs are not as efficient as their pure SQL counterparts that invoke SQL’s highly-optimized built-in functions. Motivated by this problem, we propose a new technique for translating SQL queries with UDFs to pure SQL expressions. Unlike prior work in this space, our method is not based on syntactic rewrite rules and can handle a much more general class of UDFs. At a high level, our method is based on counterexample-guided inductive synthesis (CEGIS) but employs a novel compositional strategy that decomposes the synthesis task into simpler sub-problems. 
However, because there is no universal decomposition strategy that works for all UDFs, we propose a novel lazy inductive synthesis approach that generates a sequence of decompositions that correspond to increasingly harder inductive synthesis problems. Because most realistic UDF-to-SQL translation tasks are amenable to a fine-grained decomposition strategy, our lazy inductive synthesis method scales significantly better than traditional CEGIS. We have implemented our proposed technique in a tool called CLIS for optimizing Spark SQL programs containing Scala UDFs. To evaluate CLIS, we manually study 100 randomly selected UDFs and find that 63 of them can be expressed in pure SQL. Our evaluation on these 63 UDFs shows that CLIS can automatically synthesize equivalent SQL expressions in 92% of the cases and that it can solve 2.4× more benchmarks compared to a baseline that does not use our compositional approach. We also show that CLIS yields an average speed-up of 3.5× for individual UDFs and 1.3× to 3.1× in terms of end-to-end application performance. DA - 2021/10// PY - 2021/10// DO - 10.1145/3485489 VL - 5 IS - OOPSLA SP - SN - 2475-1421 UR - https://doi.org/10.1145/3485489 KW - program synthesis KW - source-to-source compiler KW - query optimization ER - TY - JOUR TI - Coarsening Optimization for Differentiable Programming AU - Shen, Xipeng AU - Zhang, Guoqiang AU - Dea, Irene AU - Andow, Samantha AU - Arroyo-Fang, Emilio AU - Gafter, Neal AU - George, Johann AU - Grueter, Melissa AU - Meijer, Erik AU - Shivers, Olin Grigsby AU - Stumpos, Steffi AU - Tempest, Alanna AU - Warden, Christy AU - Yang, Shannon T2 - PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL AB - This paper presents a novel optimization for differentiable programming named coarsening optimization. It offers a systematic way to synergize symbolic differentiation and algorithmic differentiation (AD). 
Through it, the granularity of the computations differentiated by each step in AD can become much larger than a single operation, and hence lead to much reduced runtime computations and data allocations in AD. To circumvent the difficulties that control flow creates for symbolic differentiation in coarsening, this work introduces phi-calculus, a novel method to allow symbolic reasoning and differentiation of computations that involve branches and loops. It further avoids "expression swell" in symbolic differentiation and balances reuse and coarsening through the design of reuse-centric segment of interest identification. Experiments on a collection of real-world applications show that coarsening optimization is effective in speeding up AD, producing several times to two orders of magnitude speedups. DA - 2021/10// PY - 2021/10// DO - 10.1145/3485507 VL - 5 IS - OOPSLA SP - SN - 2475-1421 UR - https://doi.org/10.1145/3485507 KW - differentiable programming KW - compiler KW - program optimizations KW - SSA KW - Calculus ER - TY - JOUR TI - Spatially Explicit Fuzzy Cognitive Mapping for Participatory Modeling of Stormwater Management AU - White, Corey T. AU - Mitasova, Helena AU - BenDor, Todd K. AU - Foy, Kevin AU - Pala, Okan AU - Vukomanovic, Jelena AU - Meentemeyer, Ross K. T2 - LAND AB - Addressing “wicked” problems like urban stormwater management necessitates building shared understanding among diverse stakeholders with the influence to enact solutions cooperatively. Fuzzy cognitive maps (FCMs) are participatory modeling tools that enable diverse stakeholders to articulate the components of a socio-environmental system (SES) and describe their interactions. However, the spatial scale of an FCM is rarely explicitly considered, despite the influence of spatial scale on SES. 
We developed a technique to couple FCMs with spatially explicit survey data to connect stakeholder conceptualization of urban stormwater management at a regional scale with specific stormwater problems they identified. We used geospatial data and flooding simulation models to quantitatively evaluate stakeholders’ descriptions of location-specific problems. We found that stakeholders used a wide variety of language to describe variables in their FCMs and that government and academic stakeholders used significantly different suites of variables. We also found that regional FCM did not downscale well to concerns at finer spatial scales; variables and causal relationships important at location-specific scales were often different or missing from the regional FCM. This study demonstrates the spatial framing of stormwater problems influences the perceived range of possible problems, barriers, and solutions through spatial cognitive filtering of the system’s boundaries. DA - 2021/11// PY - 2021/11// DO - 10.3390/land10111114 VL - 10 IS - 11 SP - SN - 2073-445X UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85118258273&partnerID=MN8TOARS KW - flooding KW - geospatial analytics KW - GRASS GIS KW - knowledge elicitation KW - spatial scale dependency KW - socio-environmental systems KW - climate change KW - urban growth KW - socio-hydrology ER - TY - JOUR TI - Regularity properties of Haar Frames AU - Jaffard, Stephane AU - Krim, Hamid T2 - COMPTES RENDUS MATHEMATIQUE AB - We prove that pointwise and global Hölder regularity can be characterized using the coefficients on the Haar tight frame obtained by using a finite union of shifted Haar bases, despite the fact that the elements composing the frame are discontinuous. DA - 2021/// PY - 2021/// DO - 10.5802/crmath.228 VL - 359 IS - 9 SP - 1107-1117 SN - 1778-3569 ER - TY - JOUR TI - Chronic Pain Patient "Advocates" and Their Focus on Opiophilia: Barking Up the Wrong Tree? AU - Schatman, Michael E. 
AU - Shapiro, Hannah T2 - JOURNAL OF PAIN RESEARCH DA - 2021/// PY - 2021/// DO - 10.2147/JPR.S349631 VL - 14 SP - 3627-3630 SN - 1178-7090 ER - TY - JOUR TI - Sociotechnical Perspectives on AI Ethics and Accountability AU - Kokciyan, Nadin AU - Srivastava, Biplav AU - Huhns, Michael AU - Singh, Munindar T2 - IEEE INTERNET COMPUTING AB - The articles in this special section focus on sociotechnical perspectives on artificial intelligence (AI) ethics and accountability. Suppose we were to develop a loan-processing tool based on artificial intelligence (AI) to process applications by people for financial loan products. The tool would consider application data and recommend whether to give a loan and for how much. It would even seek out prospective borrowers online for new business and offer loans. Or, suppose we were to develop a career coach that recommends career tracks and training based on a user’s career goal, biosketch, and time and money available to invest in training. Applications of AI in decision support are not hypothetical, and applications such as loan processing and career coaching are becoming mainstream. However, although like other algorithms, their inputs and outputs are data; these AI applications are embedded in society, their decisions and recommendations have direct effects on people’s lives. Denial of a loan reduces financial options and may harm a borrower’s wellbeing, while giving a loan but at usurious interest rates might expose a borrower to financial ruin. Likewise, whereas career advice can be valuable to someone who does not have strong mentors, narrow or biased career advice can impede their future and, through them, their family’s prospects. DA - 2021/11// PY - 2021/11// DO - 10.1109/MIC.2021.3117611 VL - 25 IS - 6 SP - 5-6 SN - 1941-0131 UR - https://doi.org/10.1109/MIC.2021.3117611 ER - TY - JOUR TI - Accountability as a Foundation for Requirements in Sociotechnical Systems AU - Chopra, Amit K. AU - Singh, Munindar P. 
T2 - IEEE INTERNET COMPUTING AB - We understand sociotechnical systems (STSs) as uniting social and technical tiers to provide abstractions for capturing how autonomous principals interact with each other. Accountability is a foundational concept in STSs and an essential component of achieving ethical outcomes. In simple terms, accountability involves identifying who can call whom to account and who must provide an accounting of what and when. Although accountability is essential in any application involving autonomous parties, established methods do not support it. We formulate an accountability requirement as one where one principal is accountable to another regarding some conditional expectation. Our metamodel for STSs captures accountability requirements as relational constructs inspired from legal concepts, such as commitments, authorization, and prohibition. We apply our metamodel to a healthcare process and show how it helps address the problems of ineffective interaction identified in the original case study. DA - 2021/11// PY - 2021/11// DO - 10.1109/MIC.2021.3106835 VL - 25 IS - 6 SP - 33-41 SN - 1941-0131 UR - https://doi.org/10.1109/MIC.2021.3106835 KW - Authorization KW - Hospitals KW - Contracts KW - Sociotechnical systems KW - Law KW - Internet KW - Delays ER - TY - BOOK TI - Continuous Human Learning Optimization with Enhanced Exploitation AU - Wang, L. AU - Huang, B. AU - Wu, X. AU - Yang, R. AB - Human Learning Optimization (HLO) is an emerging meta-heuristic with promising potential. Although HLO can be directly applied to real-coded problems as a binary algorithm, the search efficiency may be significantly spoiled due to “the curse of dimensionality”. To extend HLO, Continuous HLO (CHLO) is developed to solve real-valued problems. However, the research on CHLO is still in its initial stages, and further efforts are needed to exploit the effectiveness of the CHLO. 
Therefore, this paper proposes a novel continuous human learning optimization with enhanced exploitation (CHLOEE), in which the social learning operator is redesigned to perform global search more efficiently so that the individual learning operator is relieved to focus on performing local search for enhancing the exploitation ability. Finally, the CHLOEE is evaluated on the benchmark problem and compared with CHLO as well as recent state-of-the-art meta-heuristics. The experimental results show that the proposed CHLOEE has better optimization performance. DA - 2021/// PY - 2021/// DO - 10.1007/978-981-16-7213-2_46 VL - 1469 CCIS SE - 472-487 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85118999376&partnerID=MN8TOARS ER - TY - JOUR TI - Brief Industry Paper: Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search AU - Zhao, Pu AU - Niu, Wei AU - Yuan, Geng AU - Cai, Yuxuan AU - Sung, Hsin-Hsuan AU - Liu, Shaoshan AU - Liu, Sijia AU - Shen, Xipeng AU - Ren, Bin AU - Wang, Yanzhi AU - Lin, Xue T2 - 2021 IEEE 27TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM (RTAS 2021) AB - In autonomous driving, 3D object detection is essential as it provides basic knowledge about the environment. However, as deep learning based 3D detection methods are usually computation intensive, it is challenging to support real-time 3D object detection on edge-computing devices in self-driving cars with limited computation and memory resources. To facilitate this, we propose a compiler-aware pruning search framework, to achieve real-time inference of 3D object detection on the resource-limited mobile devices. Specifically, a generator is applied to sample better pruning proposals in the search space based on current proposals with their performance, and an evaluator is adopted to evaluate the sampled pruning proposal performance. To accelerate the search, the evaluator employs Bayesian optimization with an ensemble of neural predictors. 
We demonstrate in experiments that for the first time, the pruning search framework can achieve real-time 3D object detection on mobile (Samsung Galaxy S20 phone) with state-of-the-art detection performance. DA - 2021/// PY - 2021/// DO - 10.1109/RTAS52030.2021.00043 SP - 425-428 SN - 1545-3421 KW - 3D object detection KW - real-time KW - point cloud ER - TY - JOUR TI - DYNAMIC GRAPH LEARNING BASED ON GRAPH LAPLACIAN AU - Jiang, Bo AU - Yu, Yiyi AU - Krim, Hamid AU - Smith, Spencer L. T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) AB - The purpose of this paper is to infer a global (collective) model of time-varying responses of a set of nodes as a dynamic graph, where the individual time series are respectively observed at each of the nodes. The motivation of this work lies in the search for a connectome model which properly captures brain functionality upon observing activities in different regions of the brain and possibly of individual neurons. We formulate the problem as a quadratic objective functional of observed node signals over short time intervals, subjected to the proper regularization reflecting the graph smoothness and other dynamics involving the underlying graph’s Laplacian, as well as the time evolution smoothness of the underlying graph. The resulting joint optimization is solved by a continuous relaxation and an introduced novel gradient-projection scheme. We apply our algorithm to a real-world dataset comprising recorded activities of individual brain cells. The resulting model is shown to not only be viable but also efficiently computable. 
DA - 2021/// PY - 2021/// DO - 10.1109/ICASSP39728.2021.9413744 SP - 1090-1094 KW - Dynamic Graph Learning KW - Graph Signal Processing KW - Sparse Signal KW - Convex Optimization ER - TY - JOUR TI - GENERATIVE INFORMATION FUSION AU - Tran, Kenneth AU - Sakla, Wesam AU - Krim, Hamid T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) AB - In this work, we demonstrate the ability to exploit sensing modalities for mitigating an unrepresented modality or for potentially re-targeting resources. This is tantamount to developing proxy sensing capabilities for multi-modal learning. In classical fusion, multiple sensors are required to capture different information about the same target. Maintaining and collecting samples from multiple sensors can be financially demanding. Additionally, the effort necessary to ensure a logical mapping between the modalities may be prohibitively limiting. We examine the scenario where we have access to all modalities during training, but only a single modality at testing. In our approach, we initialize the parameters of our single modality inference network with weights learned from the fusion of multiple modalities through both classification and GANs losses. Our experiments show that emulating a multi-modal system by perturbing a single modality with noise can help us achieve competitive results compared to using multiple modalities. DA - 2021/// PY - 2021/// DO - 10.1109/ICASSP39728.2021.9414284 SP - 3990-3994 KW - multimodal fusion KW - remote sensing KW - gans ER - TY - JOUR TI - DEEP TRANSFORM AND METRIC LEARNING NETWORKS AU - Tang, Wen AU - Chouzenoux, Emilie AU - Pesquet, Jean-Christophe AU - Krim, Hamid T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) AB - Based on its great successes in inference and denoising tasks, Dictionary Learning (DL) and its related sparse optimization formulations have garnered a lot of research interest. 
While most solutions have focused on single layer dictionaries, the recently improved Deep DL methods have also fallen short on a number of issues. We hence propose a novel Deep DL approach where each DL layer can be formulated and solved as a combination of one linear layer and a Recurrent Neural Network, where the RNN is flexibly regarded as a layer-associated learned metric. Our proposed work unveils new insights between the Neural Networks and Deep DL, and provides a novel, efficient and competitive approach to jointly learn the deep transforms and metrics. Extensive experiments are carried out to demonstrate that the proposed method can not only outperform existing Deep DL, but also state-of-the-art generic Convolutional Neural Networks. DA - 2021/// PY - 2021/// DO - 10.1109/ICASSP39728.2021.9414990 SP - 2735-2739 KW - Deep Dictionary Learning KW - Deep Neural Network KW - Metric Learning KW - Transform Learning KW - Proximal Operator KW - Differentiable Programming ER - TY - JOUR TI - Improving Vulnerability Inspection Efficiency Using Active Learning AU - Yu, Zhe AU - Theisen, Christopher AU - Williams, Laurie AU - Menzies, Tim T2 - IEEE TRANSACTIONS ON SOFTWARE ENGINEERING AB - Software engineers can find vulnerabilities with less effort if they are directed towards code that might contain more vulnerabilities. HARMLESS is an incremental support vector machine tool that builds a vulnerability prediction model from the source code inspected to date, then suggests what source code files should be inspected next. In this way, HARMLESS can reduce the time and effort required to achieve some desired level of recall for finding vulnerabilities. The tool also provides feedback on when to stop (at that desired level of recall) while at the same time, correcting human errors by double-checking suspicious files. This paper evaluates HARMLESS on Mozilla Firefox vulnerability data. 
HARMLESS found 80, 90, 95, 99 percent of the vulnerabilities by inspecting 10, 16, 20, 34 percent of the source code files. When targeting 90, 95, 99 percent recall, HARMLESS could stop after inspecting 23, 30, 47 percent of the source code files. Even when human reviewers fail to identify half of the vulnerabilities (50 percent false negative rate), HARMLESS could detect 96 percent of the missing vulnerabilities by double-checking half of the inspected files. Our results serve to highlight the very steep cost of protecting software from vulnerabilities (in our case study that cost is, for example, the human effort of inspecting 28,750 × 20% = 5,750 source code files to identify 95 percent of the vulnerabilities). While this result could benefit the mission-critical projects where human resources are available for inspecting thousands of source code files, the research challenge for future work is how to further reduce that cost. The conclusion of this paper discusses various ways that goal might be achieved. DA - 2021/11/1/ PY - 2021/11/1/ DO - 10.1109/TSE.2019.2949275 VL - 47 IS - 11 SP - 2401-2420 SN - 1939-3520 UR - https://doi.org/10.1109/TSE.2019.2949275 KW - Inspection KW - Software KW - Tools KW - Security KW - Predictive models KW - Error correction KW - NIST KW - Active learning KW - security KW - vulnerabilities KW - software engineering KW - error correction ER - TY - JOUR TI - Counter-Collusion Smart Contracts for Watchtowers in Payment Channel Networks AU - Zhang, Yuhui AU - Yang, Dejun AU - Xue, Guoliang AU - Yu, Ruozhou T2 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021) AB - Payment channel networks (PCNs) are proposed to improve the cryptocurrency scalability by settling off-chain transactions. However, PCN introduces an undesirable assumption that a channel participant must stay online and be synchronized with the blockchain to defend against frauds. 
To alleviate this issue, watchtowers have been introduced, such that a hiring party can employ a watchtower to monitor the channel for fraud. However, a watchtower might profit from colluding with a cheating counterparty and fail to perform this job. Existing solutions either focus on heavy cryptographic techniques or require a large collateral. In this work, we leverage smart contracts through economic approaches to counter collusions for watchtowers in PCNs. This brings distrust between the watchtower and the counterparty, so that rational parties do not collude or cheat. We provide detailed analyses on the contracts and rigorously prove that the contracts are effective to counter collusions with minimal on-chain operations. In particular, a watchtower only needs to lock a small collateral, which incentivizes participation of watchtowers and users. We also provide an implementation of the contracts in Solidity and execute them on Ethereum to demonstrate the scalability and efficiency of the contracts. DA - 2021/// PY - 2021/// DO - 10.1109/INFOCOM42981.2021.9488831 SP - SN - 0743-166X ER - TY - JOUR TI - A Survey of Defensive Deception: Approaches Using Game Theory and Machine Learning AU - Zhu, Mu AU - Anwar, Ahmed H. AU - Wan, Zelin AU - Cho, Jin-Hee AU - Kamhoua, Charles A. AU - Singh, Munindar P. T2 - IEEE COMMUNICATIONS SURVEYS AND TUTORIALS AB - Defensive deception is a promising approach for cyber defense. Via defensive deception, a defender can anticipate and prevent attacks by misleading or luring an attacker, or hiding some of its resources. Although defensive deception is garnering increasing research attention, there has not been a systematic investigation of its key components, the underlying principles, and its tradeoffs in various problem settings. 
This survey focuses on defensive deception research centered on game theory and machine learning, since these are prominent families of artificial intelligence approaches that are widely employed in defensive deception. This paper brings forth insights, lessons, and limitations from prior work. It closes with an outline of some research directions to tackle major gaps in current defensive deception research. DA - 2021/// PY - 2021/// DO - 10.1109/COMST.2021.3102874 VL - 23 IS - 4 SP - 2460-2493 SN - 1553-877X UR - https://doi.org/10.1109/COMST.2021.3102874 KW - Games KW - Tutorials KW - Taxonomy KW - Computer security KW - Planning KW - Monitoring KW - Measurement KW - Defensive deception KW - cybersecurity KW - game theory KW - machine learning ER - TY - JOUR TI - Unifying Domain Adaptation and Domain Generalization for Robust Prediction Across Minority Racial Groups AU - Khoshnevisan, Farzaneh AU - Chi, Min T2 - MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES AB - In clinical deployment, the performance of a model trained from one or more medical systems often deteriorates on another system and such deterioration is especially evident among minority patients who often have limited data. In this work, we present a multi-source adversarial domain separation (MS-ADS) framework which unifies domain adaptation and domain generalization. MS-ADS is designed to address two types of discrepancies: covariate shift stemming from differences in patient populations, and systematic bias on account of differences in data collection procedures across medical systems. We evaluate MS-ADS for early prediction of septic shock on three tasks. 
On a task of domain adaptation across three medical systems, we show that by leveraging data from multiple systems while accounting for both types of discrepancies, MS-ADS improves the prediction performance across all three systems; on a task of domain generalization to an unseen medical system, we show that MS-ADS can perform better than or close to the gold standard supervised models built for the system; last but not least, on a task that involves both domain adaptation and domain generalization: generalization to unseen racial groups across medical systems, MS-ADS shows robust out-performance by addressing covariate shift across different racial groups and systematic bias across medical systems simultaneously. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-86486-6_32 VL - 12975 SP - 521-537 SN - 1611-3349 KW - Domain adaptation KW - Domain generalization KW - Cross-racial transfer KW - Septic shock ER - TY - JOUR TI - Deserv: Decentralized Serverless Computing AU - Christie, Samuel H. AU - Chopra, Amit K. AU - Singh, Munindar P. T2 - 2021 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, ICWS 2021 AB - A decentralized application involves multiple autonomous principals, e.g., humans and organizations. Autonomy motivates (i) specifying a decentralized application via a protocol that captures the interactions between the principals, and (ii) a programming model that enables each principal to independently (from other principals) construct its own protocol-compliant agent. An agent encodes its principal's decision making and represents it in the application. We contribute Deserv, the first protocol-based programming model for decentralized applications that is suited to the cloud. Specifically, Deserv demonstrates how to leverage function-as-a-service (FaaS), a popular serverless programming model, to implement agents. A notable feature of Deserv is the use of declarative protocols to specify interactions. 
Declarative protocols support implementing stateful agents in a manner that naturally exploits the concurrency and autoscaling benefits offered by serverless computing. DA - 2021/// PY - 2021/// DO - 10.1109/ICWS53863.2021.00020 SP - 51-60 UR - https://doi.org/10.1109/ICWS53863.2021.00020 KW - multiagent systems KW - protocols KW - programming model ER - TY - JOUR TI - SQLRepair: Identifying and Repairing Mistakes in Student-Authored SQL Queries AU - Presler-Marshall, Kai AU - Heckman, Sarah AU - Stolee, Kathryn T. T2 - 2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: JOINT TRACK ON SOFTWARE ENGINEERING EDUCATION AND TRAINING (ICSE-JSEET 2021) AB - Computer science educators seek to understand the types of mistakes that students make when learning a new (programming) language so that they can help students avoid those mistakes in the future. While educators know what mistakes students regularly make in languages such as C and Python, students struggle with SQL and regularly make mistakes when working with it. We present an analysis of mistakes that students made when first working with SQL, classify the types of errors introduced, and provide suggestions on how to avoid them going forward. In addition, we present an automated tool, SQLRepair, that is capable of repairing errors introduced by undergraduate programmers when writing SQL queries. Our results show that students find repairs produced by our tool comparable in understandability to queries written by themselves or by other students, suggesting that SQL repair tools may be useful in an educational context. We also provide to the community a benchmark of SQL queries written by the students in our study that we used for evaluation of SQLRepair. DA - 2021/// PY - 2021/// DO - 10.1109/ICSE-SEET52601.2021.00030 SP - 199-210 ER - TY - JOUR TI - SOSRepair: Expressive Semantic Search for Real-World Program Repair AU - Afzal, Afsoon AU - Motwani, Manish AU - Stolee, Kathryn T. 
AU - Brun, Yuriy AU - Le Goues, Claire T2 - IEEE TRANSACTIONS ON SOFTWARE ENGINEERING AB - Automated program repair holds the potential to significantly reduce software maintenance effort and cost. However, recent studies have shown that it often produces low-quality patches that repair some but break other functionality. We hypothesize that producing patches by replacing likely faulty regions of code with semantically-similar code fragments, and doing so at a higher level of granularity than prior approaches can better capture abstraction and the intended specification, and can improve repair quality. We create SOSRepair, an automated program repair technique that uses semantic code search to replace candidate buggy code regions with behaviorally-similar (but not identical) code written by humans. SOSRepair is the first such technique to scale to real-world defects in real-world systems. On a subset of the ManyBugs benchmark of such defects, SOSRepair produces patches for 22 (34%) of the 65 defects, including 3, 5, and 6 defects for which previous state-of-the-art techniques Angelix, Prophet, and GenProg do not, respectively. On these 22 defects, SOSRepair produces more patches (9, 41%) that pass all independent tests than the prior techniques. We demonstrate a relationship between patch granularity and the ability to produce patches that pass all independent tests. We then show that fault localization precision is a key factor in SOSRepair's success. Manually improving fault localization allows SOSRepair to patch 23 (35%) defects, of which 16 (70%) pass all independent tests. We conclude that (1) higher-granularity, semantic-based patches can improve patch quality, (2) semantic search is promising for producing high-quality real-world defect repairs, (3) research in fault localization can significantly improve the quality of program repair techniques, and (4) semi-automated approaches in which developers suggest fix locations may produce high-quality patches. 
DA - 2021/10/1/ PY - 2021/10/1/ DO - 10.1109/TSE.2019.2944914 VL - 47 IS - 10 SP - 2162-2181 SN - 1939-3520 KW - Maintenance engineering KW - Semantic search KW - Encoding KW - Benchmark testing KW - Computer bugs KW - Software KW - Automated program repair KW - semantic code search KW - patch quality KW - program repair quality KW - SOSRepair ER - TY - JOUR TI - How to "DODGE" Complex Software Analytics AU - Agrawal, Amritanshu AU - Fu, Wei AU - Chen, Di AU - Shen, Xipeng AU - Menzies, Tim T2 - IEEE TRANSACTIONS ON SOFTWARE ENGINEERING AB - Machine learning techniques applied to software engineering tasks can be improved by hyperparameter optimization, i.e., automatic tools that find good settings for a learner's control parameters. We show that such hyperparameter optimization can be unnecessarily slow, particularly when the optimizers waste time exploring "redundant tunings", i.e., pairs of tunings which lead to indistinguishable results. By ignoring redundant tunings, DODGE, a tuning tool, runs orders of magnitude faster, while also generating learners with more accurate predictions than seen in prior state-of-the-art approaches. DA - 2021/10/1/ PY - 2021/10/1/ DO - 10.1109/TSE.2019.2945020 VL - 47 IS - 10 SP - 2182-2194 SN - 1939-3520 UR - https://doi.org/10.1109/TSE.2019.2945020 KW - Tuning KW - Text mining KW - Software KW - Task analysis KW - Optimization KW - Software engineering KW - Tools KW - Software analytics KW - hyperparameter optimization KW - defect prediction KW - text mining ER - TY - JOUR TI - Program Comprehension and Code Complexity Metrics: A Replication Package of an fMRI Study AU - Peitek, Norman AU - Apel, Sven AU - Parnin, Chris AU - Brechmann, Andre AU - Siegmund, Janet T2 - 2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2021) AB - In this artifact, we document our publicly shared data set of our functional magnetic resonance imaging (fMRI) study on programmers. 
We have conducted an fMRI study with 19 participants observing program comprehension of short code snippets at varying complexity levels. We dissected four classes of code complexity metrics and their relationship to neuronal, behavioral, and subjective correlates of program comprehension. Our data corroborate that complexity metrics can, to a limited degree, explain programmers' cognition in program comprehension. In the paper on the fMRI study, we outline several follow-up experiments investigating fine-grained effects of code complexity and describe possible refinements to code complexity metrics. We view our conducted experiment as a starting point to link code complexity metrics to neural and behavioral correlates. To enable future research to continue this line of work, we aim to provide as much support as possible to conduct similar studies with this artifact. DA - 2021/// PY - 2021/// DO - 10.1109/ICSE-Companion52605.2021.00071 SP - 168-169 SN - 2574-1926 ER - TY - JOUR TI - Vulnerability Detection is Just the Beginning AU - Elder, Sarah T2 - 2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2021) AB - Vulnerability detection plays a key role in secure software development. There are many different vulnerability detection tools and techniques to choose from, and insufficient information on which vulnerability detection techniques to use and when. The goal of this research is to assist managers and other decision-makers on software projects in making informed choices about the use of different software vulnerability detection techniques through empirical analysis of the efficiency and effectiveness of each technique. 
We will examine the relationships between the vulnerability detection technique used to find a vulnerability, the type of vulnerability found, the exploitability of the vulnerability, and the effort needed to fix a vulnerability on two projects where we ensure all vulnerabilities found have been fixed. We will then examine how these relationships are seen in Open Source Software more broadly where practitioners may use different vulnerability detection techniques, or may not fix all vulnerabilities found due to resource constraints. DA - 2021/// PY - 2021/// DO - 10.1109/ICSE-Companion52605.2021.00133 SP - 304-308 SN - 2574-1926 KW - Security Management KW - Computer Security KW - Software Testing ER - TY - JOUR TI - Structuring a Comprehensive Software Security Course Around the OWASP Application Security Verification Standard AU - Elder, Sarah E. AU - Zahan, Nusrat AU - Kozarev, Val AU - Shu, Rui AU - Menzies, Tim AU - Williams, Laurie T2 - 2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: JOINT TRACK ON SOFTWARE ENGINEERING EDUCATION AND TRAINING (ICSE-JSEET 2021) AB - Lack of security expertise among software practitioners is a problem with many implications. First, there is a deficit of security professionals to meet current needs. Additionally, even practitioners who do not plan to work in security may benefit from increased understanding of security. The goal of this paper is to aid software engineering educators in designing a comprehensive software security course by sharing an experience running a software security course for the eleventh time. Through all the eleven years of running the software security course, the course objectives have been comprehensive - ranging from security testing, to secure design and coding, to security requirements to security risk management. 
For the first time in this eleventh year, a theme of the course assignments was to map vulnerability discovery to the security controls of the Open Web Application Security Project (OWASP) Application Security Verification Standard (ASVS). Based upon student performance on a final exploratory penetration testing project, this mapping may have increased students' depth of understanding of a wider range of security topics. The students efficiently detected 191 unique and verified vulnerabilities of 28 different Common Weakness Enumeration (CWE) types during a three-hour period in the OpenMRS project, an electronic health record application in active use. DA - 2021/// PY - 2021/// DO - 10.1109/ICSE-SEET52601.2021.00019 SP - 95-104 UR - http://dx.doi.org/10.1109/icse-seet52601.2021.00019 KW - Security and Protection KW - Computer and Information Science Education KW - Industry-Standards ER - TY - JOUR TI - Visual Fatigue Alleviating in Stereo Imaging of Anaglyphs by Reducing Retinal Rivalry and Color Distortion Based on Mobile Virtual Reality Technology AU - Qi, Min AU - Cui, Shanshan AU - Du, Qianmin AU - Xu, Yuelei AU - McAllister, David F. T2 - WIRELESS COMMUNICATIONS & MOBILE COMPUTING AB - Stereoscopic display is the means of showing scenes in Virtual Reality (VR). As a type of stereo images, anaglyphs can be displayed not only on the screen, but are currently the only solution of stereo images that can be displayed on paper. However, its deficiencies, like retinal rivalry and color distortion, could cause visual fatigue. To address this issue, an algorithm is proposed for anaglyph generation. Unlike previous studies only considering one aspect, it considers both retinal rivalry and color distortion at the same time. The algorithm works in the CIE L*a*b* color space and focuses on matching the perceptual color attributes, especially the hue, rather than directly minimizing the sum of the distances between the perceived anaglyph color and the stereo image pair. 
In addition, the paper builds a relatively complete framework to generate anaglyphs so that it is more controllable to adjust the parameters and choose the appropriate process. The subjective tests are conducted to compare the results with several techniques which generate anaglyphs including empirical methods and computing methods. Results show that the proposed algorithm has a good performance. DA - 2021/9/16/ PY - 2021/9/16/ DO - 10.1155/2021/1285712 VL - 2021 SP - SN - 1530-8677 ER - TY - JOUR TI - Supporting Legacy Libraries on Non-Volatile Memory: A User-Transparent Approach AU - Ye, Chencheng AU - Xu, Yuanchao AU - Shen, Xipeng AU - Liao, Xiaofei AU - Jin, Hai AU - Solihin, Yan T2 - 2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) AB - As mainstream computing is poised to embrace the advent of byte-addressable non-volatile memory (NVM), an important roadblock has remained largely unnoticed, support of legacy libraries on NVM. Libraries underpin modern software everywhere. As current NVM programming interfaces all designate special types and constructs for NVM objects and references, legacy libraries, being incompatible with these data types, will face major obstacles for working with future applications written for NVM. This paper introduces a simple approach to mitigating the issue. The novel approach centers around user-transparent persistent reference, a new concept that allows programmers to reference a persistent object in the same way as reference a normal (volatile) object. The paper presents the implementation of the concept, carefully examines its soundness, and describes compiler and simple architecture support for keeping performance overheads very low. 
DA - 2021/// PY - 2021/// DO - 10.1109/ISCA52012.2021.00042 SP - 443-455 SN - 1063-6897 ER - TY - JOUR TI - MASS Communication for Constrained Devices AU - Huang, Cheng AU - Tay, Zeng Huy AU - Harfoush, Khaled T2 - 30TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN 2021) AB - In this paper, we introduce MASS, a Multiple channel Access solution for constrained devices relying on M-Ary direct sequence Spread Spectrum. MASS is uncoordinated in the sense that it does not require a per-device pre-shared key with the access point. It does not require per-device signal power adaptation to counter the popular near-far problem, and does not assume fixed size messages. As such, MASS does not need expensive coordination or complex hardware for the constrained sensors, and its multi-access solution leads to higher throughput and longer (and more predictable) lifetimes for constrained IoT devices compared to typical contention-based media access protocols. These benefits are achieved by trading-off the processing capacity at powerful access points for more efficient communication and more power savings for resource-constrained devices. Experimental results highlight the efficacy of MASS communication. DA - 2021/// PY - 2021/// DO - 10.1109/ICCCN52240.2021.9522167 SP - SN - 1095-2055 KW - Direct-Sequence Spread Spectrum KW - M-ary Spreading Code KW - Multiple Channel Access ER - TY - JOUR TI - Accurately Decoding MIMO Streams in VLC AU - Venkatnarayan, Raghav H. AU - Shahzad, Muhammad T2 - 30TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN 2021) AB - Among the efforts to overcome the problem of rapidly saturating RF bands, visible light communication (VLC) is garnering a renewed interest as it can be enabled using commodity LED lamps. As indoor spaces are typically illuminated by multiple lamps comprising multiple LEDs each, a natural approach to efficiently utilize the bandwidth of all the LEDs is to use MIMO communication. 
The state of the art approach to decode MIMO streams is to use channel matrix. Although channel matrix based decoding method (CMDM) works very well in conventional RF technologies, when used in VLC, it suffers from several limitations, such as high sensitivity to environmental conditions, and need for sophisticated receivers. To overcome these limitations, we propose PCDM, a novel parallelogram-clustering based decoding method, which is fundamentally different from CMDM and achieves an order of magnitude lower bit error rate compared to CMDM. We implement and extensively evaluate these two methods using a real VLC MIMO testbed. Our results show that PCDM outperformed CMDM in all scenarios. DA - 2021/// PY - 2021/// DO - 10.1109/ICCCN52240.2021.9522352 SP - SN - 1095-2055 ER - TY - JOUR TI - Characterizing the Performance of QUIC on Android and Wear OS Devices AU - Ganji, Anirudh AU - Shahzad, Muhammad T2 - 30TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN 2021) AB - Google’s QUIC protocol has become popular over the past few years and is being rapidly adopted as the transport protocol of choice by popular Internet services in their mobile applications. Considering this, it is crucial to understand the performance and implementation issues of integrating QUIC with mobile and wearable applications. In this paper, we conduct a comprehensive measurement analysis and comparison of QUIC with TCP on mobile and wearable platforms. Our experiments cover a wide range of environments, including different request sizes, traffic directions, and connectivity types. From our experiments, we found that the benefits of using QUIC instead of TCP to service HTTP requests are not uniform across different scenarios. We also found a bug in the current implementation of QUIC in Android’s Cronet library that prevents the applications from reverting back to using WiFi after a connection migration from LTE happens. 
Our experiences from this measurement study have led us to propose a probabilistic framework, which we call Dynamic Transport Selection, that adaptively chooses the appropriate transport protocol for a given network environment. We implemented and evaluated this framework in Android and Wear OS devices and found that it improves the overall request completion performance of the application by as much as 41.76% when compared to using either QUIC or TCP alone. DA - 2021/// PY - 2021/// DO - 10.1109/ICCCN52240.2021.9522258 SP - SN - 1095-2055 ER - TY - JOUR TI - Toward Semi-Automatic Misconception Discovery Using Code Embeddings AU - Shi, Yang AU - Shah, Krupal AU - Wang, Wengran AU - Marwan, Samiha AU - Penmetsa, Poorvaja AU - Price, Thomas W. T2 - LAK21 CONFERENCE PROCEEDINGS: THE ELEVENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE AB - Understanding students' misconceptions is important for effective teaching and assessment. However, discovering such misconceptions manually can be time-consuming and laborious. Automated misconception discovery can address these challenges by highlighting patterns in student data, which domain experts can then inspect to identify misconceptions. In this work, we present a novel method for the semi-automated discovery of problem-specific misconceptions from students' program code in computing courses, using a state-of-the-art code classification model. We trained the model on a block-based programming dataset and used the learned embedding to cluster incorrect student submissions. We found these clusters correspond to specific misconceptions about the problem and would not have been easily discovered with existing approaches. We also discuss potential applications of our approach and how these misconceptions inform domain-specific insights into students' learning processes. 
DA - 2021/// PY - 2021/// DO - 10.1145/3448139.3448205 SP - 606-612 UR - http://dx.doi.org/10.1145/3448139.3448205 KW - Neural Network KW - Code Analysis KW - Automatic Assessment KW - Learning Representation ER - TY - JOUR TI - Real-time quantum calculations of phase shifts using wave packet time delays AU - Gustafson, Erik AU - Zhu, Yingyue AU - Dreher, Patrick AU - Linke, Norbert M. AU - Meurice, Yannick T2 - PHYSICAL REVIEW D AB - We present a method to extract the phase shift of a scattering process using the real-time evolution in the early and intermediate stages of the collision in order to estimate the time delay of a wave packet. This procedure is convenient when using noisy quantum computers for which the asymptotic out-state behavior is unreachable. We demonstrate that the challenging Fourier transforms involved in the state preparation and measurements can be implemented in $1+1$ dimensions with current trapped ion devices and IBM quantum computers. We compare quantum computation of the time delays obtained in the one-particle quantum mechanics limit and the scalable quantum field theory formulation with accurate numerical results. We discuss the finite volume effects in the Wigner formula connecting time delays to phase shifts. The results reported involve two- and four-qubit calculations, and we discuss the possibility of larger scale computations in the near future. DA - 2021/9/16/ PY - 2021/9/16/ DO - 10.1103/PhysRevD.104.054507 VL - 104 IS - 5 SP - SN - 2470-0029 ER - TY - CONF TI - Progression Trajectory-Based Student Modeling for Novice Block-Based Programming AU - Fahid, Fahmid Morshed AU - Tian, Xiaoyi AU - Emerson, Andrew AU - Wiggins, Joseph B. AU - Bounajim, Dolly AU - Smith, Andy AU - Wiebe, Eric AU - Mott, Bradford AU - Boyer, Kristy Elizabeth AU - Lester, James AB - Block-based programming environments are widely used in computer science education. However, these environments pose significant challenges for student modeling. 
Given a series of problem-solving actions taken by students in block-based programming environments, student models need to accurately infer problem-solving students’ programming abilities in real time to enable adaptive feedback and hints that are tailored to students’ abilities. While student models for block-based programming offer the potential to support student-adaptivity, creating student models for these environments is challenging because students can develop a broad range of solutions to a given programming activity. To address these challenges, we introduce a progression trajectory-based student modeling framework for modeling novice student block-based programming across multiple learning activities. Student trajectories utilize a time series representation that employs code analysis to incrementally compare student programs to expert solutions as students undertake block-based programming activities. This paper reports on a study in which progression trajectories were collected from more than 100 undergraduate students engaging in a series of block-based programming activities in an introductory computer science course. Using progression trajectory-based student modeling, we identified three distinct trajectory classes: Early Quitting, High Persistence, and Efficient Completion. Analysis of these trajectories revealed that they exhibit significantly different characteristics with respect to students’ actions and can be used to accurately predict students’ programming behaviors on future programming activities compared to competing baseline models. The findings suggest that progression trajectory-based student models can accurately model students’ block-based programming problem solving and hold potential for informing adaptive support in block-based programming environments. 
C2 - 2021/6/21/ C3 - Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization DA - 2021/6/21/ DO - 10.1145/3450613.3456833 PB - ACM UR - http://dx.doi.org/10.1145/3450613.3456833 ER - TY - JOUR TI - Mining Workflows for Anomalous Data Transfers AU - Tu, Huy AU - Papadimitriou, George AU - Kiran, Mariam AU - Wang, Cong AU - Mandal, Anirban AU - Deelman, Ewa AU - Menzies, Tim T2 - 2021 IEEE/ACM 18TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR 2021) AB - Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in the workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies are immensely important for reliable and performant scientific workflows. Since these workflows rely heavily on high-performance network transfers that require strict QoS constraints, accurately detecting anomalous network performance is crucial to ensure reliable and efficient workflow execution. To address this challenge, we have developed X-FLASH, a network anomaly detection tool for faulty TCP workflow transfers. X-FLASH incorporates novel hyperparameter tuning and data mining approaches for improving the performance of the machine learning algorithms to accurately classify the anomalous TCP packets. X-FLASH leverages XGBoost as an ensemble model and couples XGBoost with a sequential optimizer, FLASH, borrowed from search-based Software Engineering to learn the optimal model parameters. X-FLASH found configurations that outperformed the existing approach up to 28%, 29%, and 40% relatively for F-measure, G-score, and recall in less than 30 evaluations. 
From (1) large improvement and (2) simple tuning, we recommend future research to have additional tuning study as a new standard, at least in the area of scientific workflow anomaly detection. DA - 2021/// PY - 2021/// DO - 10.1109/MSR52588.2021.00013 SP - 1-12 SN - 2160-1852 KW - Scientific Workflow KW - TCP Signatures KW - Anomaly Detection KW - Hyper-Parameter Tuning KW - Sequential Optimization ER - TY - JOUR TI - Hummingbird: efficient performance prediction for executing genomic applications in the cloud AU - Bahmani, Amir AU - Xing, Ziye AU - Krishnan, Vandhana AU - Ray, Utsab AU - Mueller, Frank AU - Alavi, Amir AU - Tsao, Philip S. AU - Snyder, Michael P. AU - Pan, Cuiping T2 - BIOINFORMATICS AB - A major drawback of executing genomic applications on cloud computing facilities is the lack of tools to predict which instance type is the most appropriate, often resulting in an over- or under-matching of resources. Determining the right configuration before actually running the applications will save money and time. Here, we introduce Hummingbird, a tool for predicting performance of computing instances with varying memory and CPU on multiple cloud platforms. Our experiments on three major genomic data pipelines, including GATK HaplotypeCaller, GATK Mutect2 and ENCODE ATAC-seq, showed that Hummingbird was able to address applications in command line specified in JSON format or workflow description language (WDL) format, and accurately predicted the fastest, the cheapest and the most cost-efficient compute instances in an economic manner. Hummingbird is available as an open source tool at: https://github.com/StanfordBioinformatics/Hummingbird. Supplementary data are available at Bioinformatics online. 
DA - 2021/9/1/ PY - 2021/9/1/ DO - 10.1093/bioinformatics/btab161 VL - 37 IS - 17 SP - 2537-2543 SN - 1460-2059 ER - TY - JOUR TI - Systemic Assessment of Node Failures in HPC Production Platforms AU - Das, Anwesha AU - Mueller, Frank AU - Rountree, Barry T2 - 2021 IEEE 35TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) AB - Production HPC clusters endure failures reducing computational capability and resource availability. Despite the presence of various failure prediction schemes for large-scale computing systems, a comprehensive understanding of how nodes fail considering various components and layers of the system is required for sustained resilience. This work performs a holistic diagnosis of node failures using a measurement-driven approach on contemporary system logs that can help vendors and system administrators support exascale resilience. Our work shows that external environmental influence is not strongly correlated with node failures in terms of the root cause. Though hardware and software faults trigger failures, the underlying root cause often lies in the application malfunctioning causing the system to fail. Furthermore, lead time enhancements are feasible for nodes showing fail slow characteristics. This study excavates such helpful empirical observations, which could facilitate better failure handling in production systems. DA - 2021/// PY - 2021/// DO - 10.1109/IPDPS49936.2021.00035 SP - 267-276 SN - 1530-2075 KW - Root Cause KW - Node Failures KW - Holistic Analysis ER - TY - JOUR TI - Survey on test case generation, selection and prioritization for cyber-physical systems AU - Sadri-Moshkenani, Zahra AU - Bradley, Justin AU - Rothermel, Gregg T2 - SOFTWARE TESTING VERIFICATION & RELIABILITY AB - A cyber‐physical system (CPS) is a collection of computing devices that communicate with each other, operate in the target environment via actuators and interact with the physical world through sensors in a feedback loop. 
CPSs need to be safe and reliable and function in accordance with their requirements. Testing, focusing on a CPS model and/or its code, is the primary approach used by engineers to achieve this. Generating, selecting and prioritizing test cases that can reveal faults in CPSs, from the wide range of possible input values and stimuli that affect their operation, are of central importance in this process. To date, however, in our search of the literature, we have found no comprehensive survey of research on test case generation, selection and prioritization for CPSs. In this article, therefore, we report the results of a survey of approaches for generating, selecting and prioritizing test cases for CPSs; the results illustrate the progress that has been made on these approaches to date, the properties that characterize the approaches and the challenges that remain open in these areas of research. DA - 2021/9/15/ PY - 2021/9/15/ DO - 10.1002/stvr.1794 SP - SN - 1099-1689 KW - cyber-physical system KW - embedded-control systems KW - test case generation KW - test case selection KW - test case prioritization KW - testing ER - TY - JOUR TI - Indexed improvements for real-time trotter evolution of a (1 AU - Gustafson, Erik AU - Dreher, Patrick AU - Hang, Zheyue AU - Meurice, Yannick T2 - QUANTUM SCIENCE AND TECHNOLOGY AB - Today's quantum computers offer the possibility of performing real-time calculations for quantum field theory scattering processes motivated by high energy physics. In order to follow the successful roadmap which has been established for the calculation of static properties at Euclidean time, it is crucial to develop new algorithmic methods to deal with the limitations of current noisy intermediate-scale quantum (NISQ) devices and to establish quantitative measures of the progress made with different devices. In this paper, we report recent progress in these directions. 
We show that nonlinear aspects of the trotter errors allow us to take much larger steps than suggested by low-order analysis. This is crucial to reach physically relevant time scales with today's NISQ technology. We propose to use an index averaging absolute values of the difference between the accurately calculated trotter evolution of site occupations and their actual measurements on NISQ machines (G index) as a measure to compare results that have been obtained from different hardware platforms. Using the transverse Ising model in one spatial dimension with four sites we apply this metric across several hardware platforms. We study the results including readout mitigation and Richardson extrapolations and show that the mitigated measurements are very effective based on the analysis of the trotter step size modifications. We discuss how this advance in the trotter step size procedures can improve quantum computing physics scattering results and how this technical advance can be applied to other machines and noise mitigation methods. DA - 2021/10// PY - 2021/10// DO - 10.1088/2058-9565/ac1dff VL - 6 IS - 4 SP - SN - 2058-9565 KW - quantum computing KW - Ising model KW - trotterization KW - error mitigation KW - benchmarking ER - TY - JOUR TI - An interpretable framework for investigating the neighborhood effect in POI recommendation AU - Yuan, Guangchao AU - Singh, Munindar P. AU - Murukannaiah, Pradeep K. T2 - PLOS ONE AB - Geographical characteristics have been proven to be effective in improving the quality of point-of-interest (POI) recommendation. However, existing works on POI recommendation focus on cost (time or money) of travel for a user. An important geographical aspect that has not been studied adequately is the neighborhood effect, which captures a user’s POI visiting behavior based on the user’s preference not only to a POI, but also to the POI’s neighborhood. 
To provide an interpretable framework to fully study the neighborhood effect, first, we develop different sets of insightful features, representing different aspects of neighborhood effect. We employ a Yelp data set to evaluate how different aspects of the neighborhood effect affect a user’s POI visiting behavior. Second, we propose a deep learning–based recommendation framework that exploits the neighborhood effect. Experimental results show that our approach is more effective than two state-of-the-art matrix factorization–based POI recommendation techniques. DA - 2021/8/5/ PY - 2021/8/5/ DO - 10.1371/journal.pone.0255685 VL - 16 IS - 8 SP - SN - 1932-6203 ER - TY - JOUR TI - Game-Based Learning Analytics for Supporting Adolescents' Reflection AU - Cloude, Elizabeth B. AU - Carpenter, Dan AU - Dever, Daryn A. AU - Azevedo, Roger AU - Lester, James T2 - JOURNAL OF LEARNING ANALYTICS AB - Reflection is critical for adolescents’ problem solving and learning in game-based learning environments (GBLEs). Yet challenges exist in the literature because most studies lack a theoretical perspective and clear operational definition to inform how and when reflection should be scaffolded during game-based learning. In this paper, we address these issues by studying the quantity and quality of 120 adolescents’ written reflections and their relation to their learning and problem solving with Crystal Island, a GBLE. Specifically, we (1) define reflection and how it relates to skill and knowledge acquisition; (2) review studies examining reflection and its relation to problem solving and learning with emerging technologies; and (3) provide direction for building reflection prompts into GBLEs that are aligned with the learning goals built into the learning session (e.g., learn about microbiology versus successfully solve a problem) to maximize adolescents’ reflection, learning, and performance. 
Overall, our findings emphasize how important it is to examine not only the quantity of reflection but also the depth of written reflection as it relates to specific learning goals. We discuss the implications of using game-learning analytics to guide instructional decision making in the classroom. DA - 2021/// PY - 2021/// DO - 10.18608/jla.2021.7371 VL - 8 IS - 2 SP - 51-72 SN - 1929-7750 KW - reflection KW - game-learning analytics KW - adolescents KW - problem solving KW - knowledge acquisition ER - TY - JOUR TI - G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression AU - Zhang, Feng AU - Pan, Zaifeng AU - Zhou, Yanliang AU - Zhai, Jidong AU - Shen, Xipeng AU - Mutlu, Onur AU - Du, Xiaoyong T2 - 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021) AB - Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems. Unfortunately, no work so far shows how to utilize GPUs to accelerate TADOC. We describe G-TADOC, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data. G-TADOC solves three major challenges. First, TADOC involves a large amount of dependencies, which makes it difficult to exploit massive parallelism on a GPU. We develop a novel fine-grained thread-level workload scheduling strategy for GPU threads, which partitions heavily-dependent loads adaptively in a fine-grained manner. Second, in developing G-TADOC, thousands of GPU threads writing to the same result buffer leads to inconsistency while directly using locks and atomic operations lead to large synchronization overheads. We develop a memory pool with thread-safe data structures on GPUs to handle such difficulties. Third, maintaining the sequence information among words is essential for lossless compression. 
We design a sequence-support strategy, which maintains high GPU parallelism while ensuring sequence information. Our experimental evaluations show that G-TADOC provides 31.1× average speedup compared to state-of-the-art TADOC. DA - 2021/// PY - 2021/// DO - 10.1109/ICDE51399.2021.00148 SP - 1679-1690 SN - 1084-4627 KW - TADOC KW - GPU KW - parallelism KW - analytics on compressed data ER - TY - JOUR TI - Visual Analytics of Text Conversation Sentiment and Semantics AU - Healey, Christopher G. AU - Dinakaran, Gowtham AU - Padia, Kalpesh AU - Nie, Shaoliang AU - Benson, J. Riley AU - Caira, Dave AU - Shaw, Dean AU - Catalfu, Gary AU - Devarajan, Ravi T2 - Computer Graphics Forum AB - This paper describes the design and implementation of a web‐based system to visualize large collections of text conversations integrated into a hierarchical four‐level‐of‐detail design. Viewers can visualize conversations: (1) in a streamgraph topic overview for a user‐specified time period; (2) as emotion patterns for a topic chosen from the streamgraph; (3) as semantic sequences for a user‐selected emotion pattern, and (4) as an emotion‐driven conversation graph for a single conversation. We collaborated with the Live Chat customer service group at SAS Institute to design and evaluate our system's strengths and limitations. DA - 2021/8/13/ PY - 2021/8/13/ DO - 10.1111/cgf.14391 VL - n/a IS - n/a SP - UR - https://onlinelibrary.wiley.com/doi/10.1111/cgf.14391 ER - TY - JOUR TI - A Taxonomy and Survey on Experimentation Scenarios for Aerial Advanced Wireless Testbed Platforms AU - Chowdhury, Md Moin Uddin AU - Anjinappa, Chethan K.
AU - Guvenc, Ismail AU - Sichitiu, Mihail AU - Ozdemir, Ozgur AU - Bhattacherjee, Udita AU - Dutta, Rudra AU - Marojevic, Vuk AU - Floyd, Brian T2 - 2021 IEEE AEROSPACE CONFERENCE (AEROCONF 2021) AB - There are various works in the recent literature on fundamental research and experimentation on unmanned aerial vehicle (UAV) communications. On the other hand, to our best knowledge, there is no taxonomy and survey on experimentation possibilities with a software-defined aerial wireless platform. The goal of this paper is first to have a brief overview of large-scale advanced wireless experimentation platforms broadly available to the wireless research community, including also the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW). We then provide a detailed taxonomy and a comprehensive survey of experimentation possibilities that can be carried out in a platform such as AERPAW. In particular, we conceptualize and present eleven different classes of advanced and aerial wireless experiments, provide several example experiments for each class, and discuss some of the existing related works in the literature. The paper will help to develop a better understanding of the equipment and software resources that can be available for experimentation in mid-scale wireless platforms, as well as the capabilities and limitations of such platforms. DA - 2021/// PY - 2021/// DO - 10.1109/AERO50100.2021.9438449 SP - SN - 1095-323X ER - TY - JOUR TI - Program Comprehension and Code Complexity Metrics: An fMRI Study AU - Peitek, Norman AU - Apel, Sven AU - Parnin, Chris AU - Brechmann, Andre AU - Siegmund, Janet T2 - 2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2021) AB - Background: Researchers and practitioners have been using code complexity metrics for decades to predict how developers comprehend a program. 
While it is plausible and tempting to use code metrics for this purpose, their validity is debated, since they rely on simple code properties and rarely consider particularities of human cognition. Aims: We investigate whether and how code complexity metrics reflect difficulty of program comprehension. Method: We have conducted a functional magnetic resonance imaging (fMRI) study with 19 participants observing program comprehension of short code snippets at varying complexity levels. We dissected four classes of code complexity metrics and their relationship to neuronal, behavioral, and subjective correlates of program comprehension, overall analyzing more than 41 metrics. Results: While our data corroborate that complexity metrics can, to a limited degree, explain programmers' cognition in program comprehension, fMRI allowed us to gain insights into why some code properties are difficult to process. In particular, a code's textual size drives programmers' attention, and vocabulary size burdens programmers' working memory. Conclusion: Our results provide neuro-scientific evidence supporting warnings of prior research questioning the validity of code complexity metrics and pin down factors relevant to program comprehension. Future Work: We outline several follow-up experiments investigating fine-grained effects of code complexity and describe possible refinements to code complexity metrics. DA - 2021/// PY - 2021/// DO - 10.1109/ICSE43902.2021.00056 SP - 524-536 SN - 0270-5257 ER - TY - JOUR TI - Early Life Cycle Software Defect Prediction. Why? How? AU - Shrikanth, N. C. AU - Majumder, Suvodeep AU - Menzies, Tim T2 - 2021 IEEE/ACM 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2021) AB - Many researchers assume that, for software analytics, "more data is better." We write to show that, at least for learning defect predictors, this may not be true. To demonstrate this, we analyzed hundreds of popular GitHub projects.
These projects ran for 84 months and contained 3,728 commits (median values). Across these projects, most of the defects occur very early in their life cycle. Hence, defect predictors learned from the first 150 commits and four months perform just as well as anything else. This means that, at least for the projects studied here, after the first few months, we need not continually update our defect prediction models. We hope these results inspire other researchers to adopt a "simplicity-first" approach to their work. Some domains require a complex and data-hungry analysis. But before assuming complexity, it is prudent to check the raw data looking for "short cuts" that can simplify the analysis. DA - 2021/// PY - 2021/// DO - 10.1109/ICSE43902.2021.00050 SP - 448-459 SN - 0270-5257 KW - sampling KW - early KW - defect prediction KW - analytics ER - TY - JOUR TI - NUMA-aware memory coloring for multicore real-time systems AU - Pan, Xing AU - Mueller, Frank T2 - JOURNAL OF SYSTEMS ARCHITECTURE AB - Non-uniform memory access (NUMA) systems are characterized by varying memory latencies so that execution times may become unpredictable in a multicore real-time system. This results in overly conservative scheduling with low utilization due to loose bounds on a task’s worst-case execution time (WCET). This work contributes a controller/node-aware memory coloring (CAMC) allocator inside the Linux kernel for the entire address space to reduce access conflicts and latencies by isolating tasks from one another. CAMC improves timing predictability and performance over Linux’ buddy allocator and prior coloring methods. It provides core isolation with respect to banks and memory controllers for real-time systems. This work is the first to consider multiple memory controllers in real-time systems, combine them with bank coloring, and assess its performance on a NUMA architecture, to the best of our knowledge. 
DA - 2021/9// PY - 2021/9// DO - 10.1016/j.sysarc.2021.102188 VL - 118 SP - SN - 1873-6165 KW - Memory access KW - NUMA KW - Real-time predictability ER - TY - JOUR TI - Predictive models with end user preference AU - Zhao, Yifan AU - Yang, Xian AU - Bolnykh, Carolina AU - Harenberg, Steve AU - Korchiev, Nodirbek AU - Yerramsetty, Saavan Raj AU - Vellanki, Bhanu Prasad AU - Kodumagulla, Ramakanth AU - Samatova, Nagiza F. T2 - STATISTICAL ANALYSIS AND DATA MINING AB - Classical machine learning models typically try to optimize the model based on the most discriminatory features of the data; however, they do not usually account for end user preferences. In certain applications, this can be a serious issue as models not aware of user preferences could become costly, untrustworthy, or privacy‐intrusive to use, thus becoming irrelevant and/or uninterpretable. Ideally, end users with domain knowledge could propose preferable features that the predictive model could then take into account. In this paper, we propose a generic modeling method that respects end user preferences via a relative ranking system to express multi‐criteria preferences and a regularization term in the model's objective function to incorporate the ranked preferences. In a more generic perspective, this method is able to plug user preferences into existing predictive models without creating completely new ones. We implement this method in the context of decision trees and are able to achieve a comparable classification accuracy while reducing the use of undesirable features. DA - 2021/8/26/ PY - 2021/8/26/ DO - 10.1002/sam.11545 VL - 8 SP - SN - 1932-1872 KW - child support KW - decision tree KW - predictive model KW - regularization KW - relative ranking KW - user preference ER - TY - JOUR TI - Emotions and the Comprehension of Single versus Multiple Texts during Game-based Learning AU - Dever, Daryn A. AU - Wiedbusch, Megan D. AU - Cloude, Elizabeth B.
AU - Lester, James AU - Azevedo, Roger T2 - DISCOURSE PROCESSES AB - This study examined 57 learners’ emotions (i.e., joy, anger, confusion, frustration) as they engaged with scientific content while learning about microbiology with Crystal Island, a game-based learning environment (GBLE). Measures of learners’ prior knowledge, in-game text comprehension, facial expressions of emotion, and posttest reading comprehension were collected to examine the relationship between emotions and single- and multiple-text comprehension. Analyses found that both discrete and non-discrete emotions were expressed during reading and answering in-game assessments of single-text comprehension. Learners expressed greater joy during reading and greater expressions of anger, confusion, and frustration during in-game assessments. Further results found that learners who expressed a high number of different emotions throughout reading and completing in-game assessments tended to have lower in-game comprehension scores whereas a higher number of different expressed emotions while completing in-game assessments was associated with greater posttest comprehension. Finally, while increased prior knowledge was associated with higher single- and multiple-text comprehension, there was no interaction between prior knowledge and emotions on multiple-text comprehension. Overall, this study found that (1) learners often express more than one emotion during GBLE activities, (2) emotions expressed while learning with a GBLE shift across different activities, and (3) emotions are related to demonstrated comprehension, but the type of activity influences this relationship. Results from this study provide implications for how emotions can be examined as learners engage in GBLE activities as well as the design of GBLEs to support learners’ emotions accounting for different activity demands to increase comprehension of single and multiple texts. 
DA - 2021/8/27/ PY - 2021/8/27/ DO - 10.1080/0163853X.2021.1950450 SP - SN - 1532-6950 ER - TY - JOUR TI - Am I Playing Better Now? The Effects of G-SYNC in 60Hz Gameplay AU - Riahi, Maryam AU - Watson, Benjamin Allen T2 - PROCEEDINGS OF THE ACM ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES AB - G-SYNC technology matches formerly regular display refreshes to irregular frame updates, improving frame rates and interactive latency. In a previous study of gaming at the 30Hz frame rates common on consoles, players of Battlefield 4 were unable to discern when G-SYNC was in use, but scored higher with G-SYNC and were affected emotionally. We build on that study with the first examination of G-SYNC's effects at the 60Hz frame rate more common in PC gaming and on emerging consoles. Though G-SYNC's effects are less at 60Hz than they were at 30Hz, G-SYNC can still improve the performance of veteran players, particularly when games are challenging. G-SYNC's effects on emotion and experience were limited. DA - 2021/5// PY - 2021/5// DO - 10.1145/3451269 VL - 4 IS - 1 SP - SN - 2577-6193 KW - refresh rate KW - frame rate KW - latency KW - computer games KW - user experience ER - TY - JOUR TI - Local Clustering with Mean Teacher for Semi-supervised learning AU - Chen, Zexi AU - Dutton, Benjamin AU - Ramachandra, Bharathkumar AU - Wu, Tianfu AU - Vatsavai, Ranga Raju T2 - 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) AB - The Mean Teacher (MT) model of Tarvainen and Valpola has shown good performance on several semi-supervised benchmark datasets. MT maintains a teacher model's weights as the exponential moving average of a student model's weights and minimizes the divergence between their probability predictions under diverse perturbations of the inputs. However, MT is known to suffer from confirmation bias, that is, reinforcing incorrect teacher model predictions. 
In this work, we propose a simple yet effective method called Local Clustering (LC) to mitigate the effect of confirmation bias. In MT, each data point is considered independent of other points during training; however, data points are likely to be close to each other in feature space if they share similar features. Motivated by this, we cluster data points locally by minimizing the pairwise distance between neighboring data points in feature space. Combined with a standard classification cross-entropy objective on labeled data points, the misclassified unlabeled data points are pulled towards high-density regions of their correct class with the help of their neighbors, thus improving model performance. We demonstrate on semi-supervised benchmark datasets SVHN and CIFAR-10 that adding our LC loss to MT yields significant improvements compared to MT and performance comparable to the state of the art in semi-supervised learning. The code is available at: https://github.com/jay1204/local_clustering_with_mt_for_ssl. DA - 2021/// PY - 2021/// DO - 10.1109/ICPR48806.2021.9412469 SP - 6243-6250 SN - 1051-4651 ER - TY - JOUR TI - Toward a rational and ethical sociotechnical system of autonomous vehicles: A novel application of multi-criteria decision analysis AU - Dubljevic, Veljko AU - List, George AU - Milojevich, Jovan AU - Ajmeri, Nirav AU - Bauer, William A. AU - Singh, Munindar P. AU - Bardaka, Eleni AU - Birkland, Thomas A. AU - Edwards, Charles H. W. AU - Mayer, Roger C. AU - Muntean, Ioan AU - Powers, Thomas M. AU - Rakha, Hesham A. AU - Ricks, Vance A. AU - Samandar, M. Shoaib T2 - PLOS ONE AB - The impacts of autonomous vehicles (AV) are widely anticipated to be socially, economically, and ethically significant. A reliable assessment of the harms and benefits of their large-scale deployment requires a multi-disciplinary approach. To that end, we employed Multi-Criteria Decision Analysis to make such an assessment.
We obtained opinions from 19 disciplinary experts to assess the significance of 13 potential harms and eight potential benefits that might arise under four deployment schemes. Specifically, we considered: (1) the status quo, i.e., no AVs are deployed; (2) unfettered assimilation, i.e., no regulatory control would be exercised and commercial entities would "push" the development and deployment; (3) regulated introduction, i.e., regulatory control would be applied and either private individuals or commercial fleet operators could own the AVs; and (4) fleets only, i.e., regulatory control would be applied and only commercial fleet operators could own the AVs. Our results suggest that two of these scenarios, (3) and (4), namely regulated privately-owned introduction or fleet ownership of autonomous vehicles, would be less likely to cause harm than either the status quo or the unfettered option. DA - 2021/// PY - 2021/// DO - 10.1371/journal.pone.0256224 VL - 16 IS - 8 SP - SN - 1932-6203 ER - TY - JOUR TI - Significance of multi-hazard risk in design of buildings under earthquake and wind loads AU - Kwag, Shinyoung AU - Gupta, Abhinav AU - Baugh, John AU - Kim, Hyun-Su T2 - ENGINEERING STRUCTURES AB - • Development of a performance-based framework to consider multiple hazards. • Significance of multi-hazard design is shown through retrofit solutions in buildings. • Cost-effective damper design is explored under two different hazards. Traditionally, external hazards are considered in the design of a building through the various combinations of loads prescribed in relevant design codes and standards. It is often the case that the design is governed by a single dominant hazard at a given geographic location. This is particularly true for earthquake and wind hazards, both of which impart time-dependent dynamic loads on the structure. Engineers may nevertheless wonder if a building designed for one of the two dominant hazards will satisfactorily withstand the other.
Prior studies have indicated that in some cases, when a building is designed for a single dominant hazard, it does not necessarily provide satisfactory performance against the other hazard. In this paper, we propose a novel framework that builds upon performance-based design requirements and determines whether the design of a building is governed primarily by a single hazard or multiple hazards. It integrates site-dependent hazard characteristics with the performance criteria for a given building type and building geometry. The framework is consistent with the burgeoning area of probabilistic risk assessment, and yet can easily be extended to traditional, deterministically characterized design requirements as illustrated herein. DA - 2021/9/15/ PY - 2021/9/15/ DO - 10.1016/j.engstruct.2021.112623 VL - 243 SP - SN - 1873-7323 KW - Earthquake and wind hazards KW - Performance-based design KW - Risk-based multi-hazard approach KW - Multi-hazard risk map KW - Multi-hazard scenario KW - Magneto-rheological damper KW - Adjacent buildings ER - TY - JOUR TI - Favocado: Fuzzing the Binding Code of JavaScript Engines Using Semantically Correct Test Cases AU - Dinh, Sung Ta AU - Cho, Haehyun AU - Martin, Kyle AU - Oest, Adam AU - Zeng, Kyle AU - Kapravelos, Alexandros AU - Ahn, Gail-Joon AU - Bao, Tiffany AU - Wang, Ruoyu AU - Doupe, Adam AU - Shoshitaishvili, Yan T2 - 28TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2021) AB - JavaScript runtime systems include some specialized programming interfaces, called binding layers. Binding layers translate data representations between JavaScript and unsafe low-level languages, such as C and C++, by converting data between different types. Due to the wide adoption of JavaScript (and JavaScript engines) in the entire computing ecosystem, discovering bugs in JavaScript binding layers is critical. Nonetheless, existing JavaScript fuzzers cannot adequately fuzz binding layers due to two major challenges: Generating
syntactically and semantically correct test cases and reducing the size of the input space for fuzzing. In this paper, we propose Favocado, a novel fuzzing approach that focuses on fuzzing binding layers of JavaScript runtime systems. Favocado can generate syntactically and semantically correct JavaScript test cases through the use of extracted semantic information and careful maintaining of execution states. This way, test cases that Favocado generates do not raise unintended runtime exceptions, which substantially increases the chance of triggering binding code. Additionally, exploiting a unique feature (relative isolation) of binding layers, Favocado significantly reduces the size of the fuzzing input space by splitting DOM objects into equivalence classes and focusing fuzzing within each equivalence class. We demonstrate the effectiveness of Favocado in our experiments and show that Favocado outperforms a state-of-the-art DOM fuzzer. Finally, during the evaluation, we find 61 previously unknown bugs in four JavaScript runtime systems (Adobe Acrobat Reader, Foxit PDF Reader, Chromium, and WebKit). 33 of these bugs are security vulnerabilities. 
DA - 2021/// PY - 2021/// DO - 10.14722/ndss.2021.24224 SP - ER - TY - JOUR TI - Hey Alexa, is this Skill Safe?: Taking a Closer Look at the Alexa Skill Ecosystem AU - Lentzsch, Christopher AU - Shah, Sheel Jayesh AU - Andow, Benjamin AU - Degeling, Martin AU - Das, Anupam AU - Enck, William T2 - 28TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2021) AB - Amazon's voice-based assistant, Alexa, enables users to directly interact with various web services through natural language dialogues. It provides developers with the option to create third-party applications (known as Skills) to run on top of Alexa. While such applications ease users' interaction with smart devices and bolster a number of additional services, they also raise security and privacy concerns due to the personal setting they operate in. This paper aims to perform a systematic analysis of the Alexa skill ecosystem. We perform the first large-scale analysis of Alexa skills, obtained from seven different skill stores totaling 90,194 unique skills. Our analysis reveals several limitations that exist in the current skill vetting process. We show that not only can a malicious user publish a skill under any arbitrary developer/company name, but she can also make backend code changes after approval to coax users into revealing unwanted information. Next, we formalize the different skill-squatting techniques and evaluate the efficacy of such techniques. We find that while certain approaches are more favorable than others, there is no substantial abuse of skill squatting in the real world. Lastly, we study the prevalence of privacy policies across different categories of skill, and more importantly the policy content of skills that use the Alexa permission model to access sensitive user data. We find that around 23.3% of such skills do not fully disclose the data types associated with the permissions requested. We conclude by providing some suggestions for strengthening the overall ecosystem, and 
thereby enhance transparency for end-users. DA - 2021/// PY - 2021/// DO - 10.14722/ndss.2021.23111 SP - ER - TY - JOUR TI - Hercule: Representing and Reasoning About Norms as a Foundation for Declarative Contracts Over Blockchain AU - Christie, Samuel H. AU - Singh, Munindar P. AU - Chopra, Amit K. T2 - IEEE INTERNET COMPUTING AB - Current blockchain approaches for business contracts are based on smart contracts, namely, software programs placed on a blockchain that are automatically executed to realize a contract. However, smart contracts lack flexibility and interfere with the autonomy of the parties concerned. We propose Hercule, an approach for declaratively specifying blockchain applications in a manner that reflects business contracts. Hercule represents a contract via regulatory norms that capture the involved parties’ expectations of one another. It computes the states of norms (hence, of contracts) from events in the blockchain. Hercule’s novelty and significance lie in that it operationalizes declarative contracts over semistructured databases, the underlying representation for practical blockchain such as Hyperledger Fabric and Ethereum. Specifically, it exploits the map–reduce capabilities of such stores to compute norm states. We demonstrate that our implementation over Hyperledger Fabric can process thousands of events per second, sufficient for many applications. DA - 2021/// PY - 2021/// DO - 10.1109/MIC.2021.3080982 VL - 25 IS - 4 SP - 67-75 SN - 1941-0131 UR - https://doi.org/10.1109/MIC.2021.3080982 KW - Blockchain KW - History KW - Smart contracts KW - Distributed ledger KW - Law KW - Authorization KW - Contract KW - Regulatory norm KW - Document store ER - TY - JOUR TI - A Gray Box Conceptual Model for Accountability and Ethics in Business Contracts AU - Singh, Munindar P. 
AU - Gao, Xibin T2 - IEEE INTERNET COMPUTING AB - Current computational models are inadequate for the purposes of modeling interactions between autonomous parties in a way that highlights and supports their accountability. We propose a new conceptual model for business contracts based on norms motivated by a review of real-life business contracts. Our conception is of a gray box, reflecting the idea that a contract makes the participants accountable to one another and to outside entities, and therefore calls for the exposure of sufficient implementation details. The model consists of a recursively applicable taxonomy of clause types. In a preliminary study, we found that computer scientists are able to effectively identify the concepts introduced in this model, thereby indicating its potential for building Internet applications that support accountability. DA - 2021/// PY - 2021/// DO - 10.1109/MIC.2021.3083295 VL - 25 IS - 4 SP - 13-19 SN - 1941-0131 UR - https://doi.org/10.1109/MIC.2021.3083295 KW - Ethics KW - Computational modeling KW - Taxonomy KW - Business KW - Internet KW - Contracts ER - TY - JOUR TI - A Theoretical and Evidence-Based Conceptual Design of MetaDash: An Intelligent Teacher Dashboard to Support Teachers' Decision Making and Students' Self-Regulated Learning AU - Wiedbusch, Megan D. AU - Kite, Vance AU - Yang, Xi AU - Park, Soonhye AU - Chi, Min AU - Taub, Michelle AU - Azevedo, Roger T2 - FRONTIERS IN EDUCATION AB - Teachers’ ability to self-regulate their own learning is closely related to their competency to enhance self-regulated learning (SRL) in their students. Accordingly, there is emerging research for the design of teacher dashboards that empower instructors by providing access to quantifiable evidence of student performance and SRL processes. Typically, they capture evidence of student learning and performance to be visualized through activity traces (e.g., bar charts showing correct and incorrect response rates, etc.) 
and SRL data (e.g., eye-tracking on content, log files capturing feature selection, etc.) in order to provide teachers with monitoring and instructional tools. Critics of the current research on dashboards used in conjunction with advanced learning technologies (ALTs) such as simulations, intelligent tutoring systems, and serious games, argue that the state of the field is immature and has 1) focused only on exploratory or proof-of-concept projects, 2) investigated data visualizations of performance metrics or simplistic learning behaviors, and 3) neglected most theoretical aspects of SRL including teachers’ general lack of understanding of their students’ SRL. Additionally, the work is mostly anecdotal, lacks methodological rigor, and does not collect critical process data (e.g., frequency, duration, timing, or fluctuations of cognitive, affective, metacognitive, and motivational (CAMM) SRL processes) during learning with ALTs used in the classroom. No known research in the areas of learning analytics, teacher dashboards, or teachers’ perceptions of students’ SRL and CAMM engagement has systematically and simultaneously examined the deployment, temporal unfolding, regulation, and impact of all these key processes during complex learning. In this manuscript, we 1) review the current state of ALTs designed using SRL theoretical frameworks and the current state of teacher dashboard design and research, 2) report the important design features and elements within intelligent dashboards that provide teachers with real-time data visualizations of their students’ SRL processes and engagement while using ALTs in classrooms, as revealed from the analysis of surveys and focus groups with teachers, and 3) propose a conceptual system design for integrating reinforcement learning into a teacher dashboard to help guide the utilization of multimodal data collected on students’ and teachers’ CAMM SRL processes during complex learning. 
DA - 2021/2/19/ PY - 2021/2/19/ DO - 10.3389/feduc.2021.570229 VL - 6 SP - SN - 2504-284X KW - self-regulated learning (SRL) KW - teacher decision making KW - learning KW - multimodal data KW - teacher dashboards ER - TY - JOUR TI - Event driven sensor fusion AU - Roheda, Siddharth AU - Krim, Hamid AU - Luo, Zhi-Quan AU - Wu, Tianfu T2 - SIGNAL PROCESSING AB - Multi-sensor fusion has long been of interest in target detection and tracking. Different sensors are capable of observing different characteristics about a target, hence providing additional information toward determining a target’s identity. If used constructively, any additional information should have a positive impact on the performance of the system. In this paper, we consider such a scenario and present a principled approach toward ensuring constructive combination of the various sensors. We look at Decision Level Sensor Fusion under a different light wherein each sensor is said to make a decision on occurrence of certain events that it is capable of observing rather than making a decision on whether a certain target is present. These events are formalized to each sensor according to its potentially extracted attributes to define targets. The proposed technique also explores the extent of dependence between features/events being observed by the sensors, and hence generates more informed probability distributions over the events. In our case, we will study two different datasets. The first combines a radar sensor with an optical sensor for detection of space debris, while the second combines a seismic sensor with an acoustic sensor in order to detect human and vehicular targets in a field of interest. Provided some additional information about the features of the object, this fusion technique can outperform other existing decision level fusion approaches that may not take into account the relationship between different features. 
Furthermore, this paper also addresses the issue of coping with damaged sensors when using the model, by learning a hidden space between sensor modalities which can be exploited to safeguard detection performance. DA - 2021/11// PY - 2021/11// DO - 10.1016/j.sigpro.2021.108241 VL - 188 SP - SN - 1872-7557 KW - Sensor fusion KW - Multi-modal fusion KW - Event driven classification ER - TY - JOUR TI - The persistent threat of emerging plant disease pandemics to global food security AU - Ristaino, Jean B. AU - Anderson, Pamela K. AU - Bebber, Daniel P. AU - Brauman, Kate A. AU - Cunniffe, Nik J. AU - Fedoroff, Nina V AU - Finegold, Cambria AU - Garrett, Karen A. AU - Gilligan, Christopher A. AU - Jones, Christopher M. AU - Martin, Michael D. AU - MacDonald, Graham K. AU - Neenan, Patricia AU - Records, Angela AU - Schmale, David G. AU - Tateosian, Laura AU - Wei, Qingshan T2 - PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA AB - Plant disease outbreaks are increasing and threaten food security for the vulnerable in many areas of the world. Now a global human pandemic is threatening the health of millions on our planet. A stable, nutritious food supply will be needed to lift people out of poverty and improve health outcomes. Plant diseases, both endemic and recently emerging, are spreading and exacerbated by climate change, transmission with global food trade networks, pathogen spillover, and evolution of new pathogen lineages. In order to tackle these grand challenges, a new set of tools that include disease surveillance and improved detection technologies including pathogen sensors and predictive modeling and data analytics are needed to prevent future outbreaks. Herein, we describe an integrated research agenda that could help mitigate future plant disease pandemics. 
DA - 2021/6/8/ PY - 2021/6/8/ DO - 10.1073/pnas.2022239118 VL - 118 IS - 23 SP - SN - 0027-8424 KW - emerging plant disease KW - plant pathology KW - food security ER - TY - JOUR TI - Multiscale Sensor Fusion for Display-Centered Head Tracking AU - Wu, Tianyu AU - Watson, Benjamin T2 - 2021 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS (VRW 2021) AB - Emerging display usage scenarios require head tracking both at short (< 1m) and modest (< 3m) ranges. Yet it is difficult to find low-cost, unobtrusive tracking solutions that remain accurate across this range. By combining multiple head tracking solutions, we can mitigate the weaknesses of one solution with the strengths of another and improve head tracking overall. We built such a combination of two widely available and low-cost trackers, a Tobii Eye Tracker and a Kinect. The resulting system is more effective than Kinect at short range, and than the Tobii at a more distant range. In this paper, we discuss how we accomplish this sensor fusion and compare our combined system to an existing mechanical tracker to evaluate its accuracy across its combined range. DA - 2021/// PY - 2021/// DO - 10.1109/VRW52623.2021.00143 SP - 522-523 KW - Computing methodologies KW - Computer graphics KW - Graphics systems and interfaces KW - Virtual reality KW - Computing methodologies KW - Artificial intelligence KW - Computer vision problems KW - Tracking ER - TY - JOUR TI - Leveraging Granularity: Hierarchical Reinforcement Learning for Pedagogical Policy Induction AU - Zhou, Guojing AU - Azizsoltani, Hamoon AU - Ausin, Markel Sanz AU - Barnes, Tiffany AU - Chi, Min T2 - INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION DA - 2021/8/16/ PY - 2021/8/16/ DO - 10.1007/s40593-021-00269-9 VL - 8 SP - SN - 1560-4306 KW - Hierarchical reinforcement learning KW - Decision granularity KW - Pedagogical policy ER - TY - JOUR TI - Automated tracking of S. 
pombe spindle elongation dynamics AU - Uzsoy, Ana Sofia M. AU - Zareiesfandabadi, Parsa AU - Jennings, Jamie AU - Kemper, Alexander F. AU - Elting, Mary Williard T2 - JOURNAL OF MICROSCOPY AB - The mitotic spindle is a microtubule-based machine that pulls the two identical sets of chromosomes to opposite ends of the cell during cell division. The fission yeast Schizosaccharomyces pombe is an important model organism for studying mitosis due to its simple, stereotyped spindle structure and well-established genetic toolset. S. pombe spindle length is a useful metric for mitotic progression, but manually tracking spindle ends in each frame to measure spindle length over time is laborious and can limit experimental throughput. We have developed an ImageJ plugin that can automatically track S. pombe spindle length over time and replace manual or semi-automated tracking of spindle elongation dynamics. Using an algorithm that detects the principal axis of the spindle and then finds its ends, we reliably track the length of the spindle as the cell divides. The plugin integrates with existing ImageJ features, exports its data for further analysis outside of ImageJ and does not require any programming by the user. Thus, the plugin provides an accessible tool for quantification of S. pombe spindle length that will allow automatic analysis of large microscopy data sets and facilitate screening for effects of cell biological perturbations on mitotic progression. The mitotic spindle is a biological machine that pulls the two identical sets of DNA to opposite ends of the cell during cell division. Incorrect cell division can result in serious issues like cancer and miscarriages. Schizosaccharomyces pombe (S. pombe), a kind of yeast, is commonly used to study cell division because its mitotic spindle is essentially linear in shape and its DNA sequence is well known, allowing for more complex experiments. 
To measure how well a cell divides, we measure the length of the spindle over time, but this can be tedious to do by hand for many cell images. We have developed software that interfaces with ImageJ (a common image analysis tool) that automatically tracks the length of S. pombe spindles over time and can replace manual tracking. Our software calculates the spindle's lines of symmetry, which allows us to accurately measure the length and track the ends over time. It integrates with existing ImageJ features, exports its data for further analysis outside of ImageJ, and does not require any programming by the user. Thus, the plugin provides an accessible tool for measuring S. pombe spindle length that will allow automatic analysis of large microscopy data sets and facilitate screening for effects of defects in cell division. This will facilitate the study of the fundamental process of how cells divide, and could have significant long-term medical impacts. DA - 2021/7/8/ PY - 2021/7/8/ DO - 10.1111/jmi.13044 SP - SN - 1365-2818 KW - automation KW - cell division KW - fission yeast KW - fluorescence imaging KW - image analysis KW - live-cell imaging KW - mitotic spindle ER - TY - JOUR TI - Prompting collaborative and exploratory discourse: An epistemic network analysis study AU - Vandenberg, Jessica AU - Zakaria, Zarifa AU - Tsan, Jennifer AU - Iwanski, Anna AU - Lynch, Collin AU - Boyer, Kristy Elizabeth AU - Wiebe, Eric T2 - INTERNATIONAL JOURNAL OF COMPUTER-SUPPORTED COLLABORATIVE LEARNING DA - 2021/8/7/ PY - 2021/8/7/ DO - 10.1007/s11412-021-09349-3 SP - SN - 1556-1615 KW - Epistemic network analysis KW - Primary grades KW - Discourse KW - Pair programming KW - Collaboration ER - TY - JOUR TI - Automated Object Manipulation Using Vision-Based Mobile Robotic System for Construction Applications AU - Asadi, Khashayar AU - Haritsa, Varun R. 
AU - Han, Kevin AU - Ore, John-Paul T2 - JOURNAL OF COMPUTING IN CIVIL ENGINEERING AB - In the last decade, automated object manipulation for construction applications has received much attention. However, the majority of existing systems are situated in a fixed location. They are mostly static systems surrounded by necessary tools to manipulate objects within their workspace. Mobility is an essential and key challenge for different construction applications, such as material handling and site cleaning. To fill this gap, this paper presents a mobile robotic system capable of vision-based object manipulation for construction applications. This system integrates scene understanding and autonomous navigation with object grasping. To achieve this, two stereo cameras and a robotic arm are mounted on a mobile platform. This integrated system uses a global-to-local control planning strategy to reach the objects of interest (in this study, bricks, wood sticks, and pipes). Then, the scene perception, together with grasp and control planning, enables the system to detect the objects of interest, pick, and place them in a predetermined location depending on the application. The system is implemented and validated in a construction-like environment for pick-and-place activities. The results demonstrate the effectiveness of this fully autonomous system using solely onboard sensing for real-time applications with end-effector positioning accuracy of less than a centimeter. DA - 2021/1/1/ PY - 2021/1/1/ DO - 10.1061/(ASCE)CP.1943-5487.0000946 VL - 35 IS - 1 SP - SN - 1943-5487 ER - TY - JOUR TI - Learning Drug-Disease-Target Embedding (DDTE) from knowledge graphs to inform drug repurposing hypotheses AU - Moon, Changsung AU - Jin, Chunming AU - Dong, Xialan AU - Abrar, Saad AU - Zheng, Weifan AU - Chirkova, Rada Y. 
AU - Tropsha, Alexander T2 - JOURNAL OF BIOMEDICAL INFORMATICS AB - We aimed to develop and validate a new graph embedding algorithm for embedding drug-disease-target networks to generate novel drug repurposing hypotheses. Our model denotes drugs, diseases and targets as subjects, predicates and objects, respectively. Each entity is represented by a multidimensional vector and the predicate is regarded as a translation vector from the subject vector to the object vector. These vectors are optimized so that when a subject-predicate-object triple represents a known drug-disease-target relationship, the sum of the subject and predicate vectors is close to the object vector; otherwise, the summed vector is distant from the object vector. The DTINet dataset was utilized to test this algorithm and discover unknown links between drugs and diseases. In cross-validation experiments, this new algorithm outperformed the original DTINet model. The MRR (Mean Reciprocal Rank) values of our models were around 0.80 while those of the original model were about 0.70. In addition, we have identified and verified several pairs of new therapeutic relations as well as adverse effect relations that were not recorded in the original DTINet dataset. This approach showed excellent performance, and the predicted drug-disease and drug-side-effect relationships were found to be consistent with literature reports. This novel method can be used to analyze diverse types of emerging biomedical and healthcare-related knowledge graphs (KG). 
DA - 2021/7// PY - 2021/7// DO - 10.1016/j.jbi.2021.103838 VL - 119 SP - SN - 1532-0480 KW - Data mining KW - Graph embedding KW - Knowledge graph KW - Drug repurposing ER - TY - JOUR TI - Beginning with machine learning: a comprehensive primer AU - Yedida, Rahul AU - Saha, Snehanshu T2 - EUROPEAN PHYSICAL JOURNAL-SPECIAL TOPICS DA - 2021/7/21/ PY - 2021/7/21/ DO - 10.1140/epjs/s11734-021-00209-7 VL - 7 IS - 10 SP - SN - 1951-6401 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85110638036&partnerID=MN8TOARS ER - TY - JOUR TI - A reinforcement learning approach to adaptive remediation in online training AU - Spain, Randall AU - Rowe, Jonathan AU - Smith, Andy AU - Goldberg, Benjamin AU - Pokorny, Robert AU - Mott, Bradford AU - Lester, James T2 - JOURNAL OF DEFENSE MODELING AND SIMULATION-APPLICATIONS METHODOLOGY TECHNOLOGY-JDMS AB - Advances in artificial intelligence (AI) and machine learning can be leveraged to tailor training based on the goals, learning needs, and preferences of learners. A key component of adaptive training systems is tutorial planning, which controls how scaffolding is structured and delivered to learners to create dynamically personalized learning experiences. The goal of this study was to induce data-driven policies for tutorial planning using reinforcement learning (RL) to provide adaptive scaffolding based on the Interactive, Constructive, Active, Passive framework for cognitive engagement. We describe a dataset that was collected to induce RL-based scaffolding policies, and we present the results of our policy analyses. Results showed that the best performing policies optimized learning gains by inducing an adaptive fading approach in which learners received less cognitively engaging forms of remediation as they advanced through the training course. This policy was consistent with preliminary analyses that showed constructive remediation became less effective as learners progressed through the training session. 
Results also showed that learners’ prior knowledge impacted the type of scaffold that was recommended, thus showing evidence of an aptitude–treatment interaction. We conclude with a discussion of how AI-based training can be leveraged to enhance training effectiveness as well as directions for future research. DA - 2021/7/23/ PY - 2021/7/23/ DO - 10.1177/15485129211028317 VL - 7 SP - SN - 1557-380X KW - Tutorial planning KW - adaptive remediation KW - reinforcement learning KW - adaptive instructional systems ER - TY - JOUR TI - Nova: Value-based Negotiation of Norms AU - Aydogan, Reyhan AU - Kafali, Ozgur AU - Arslan, Furkan AU - Jonker, Catholijn M. AU - Singh, Munindar P. T2 - ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY AB - Specifying a normative multiagent system (nMAS) is challenging, because different agents often have conflicting requirements. Whereas existing approaches can resolve clear-cut conflicts, tradeoffs might occur in practice among alternative nMAS specifications with no apparent resolution. To produce an nMAS specification that is acceptable to each agent, we model the specification process as a negotiation over a set of norms. We propose an agent-based negotiation framework, where agents’ requirements are represented as values (e.g., patient safety, privacy, and national security), and an agent revises the nMAS specification to promote its values by executing a set of norm revision rules that incorporate ontology-based reasoning. To demonstrate that our framework supports creating a transparent and accountable nMAS specification, we conduct an experiment with human participants who negotiate against our agent. Our findings show that our negotiation agent reaches better agreements (with small p-value and large effect size) faster than a baseline strategy. 
Moreover, participants perceive that our agent enables more collaborative and transparent negotiations than the baseline (with small p-value and large effect size in particular settings) toward reaching an agreement. DA - 2021/8// PY - 2021/8// DO - 10.1145/3465054 VL - 12 IS - 4 SP - SN - 2157-6912 UR - https://doi.org/10.1145/3465054 KW - Sociotechnical systems KW - conflicting requirements KW - human-agent negotiation ER - TY - JOUR TI - Traffic Analysis in Support of Hybrid SDN Campus Architectures for Enhanced Cybersecurity AU - Brockelsby, William AU - Dutta, Rudra T2 - 2021 24TH CONFERENCE ON INNOVATION IN CLOUDS, INTERNET AND NETWORKS AND WORKSHOPS (ICIN) AB - The scale and complexity of campus networks continue to accelerate due to recent paradigms such as the Internet of Things (IoT), resulting in a heightened awareness of the need for enhanced cybersecurity. Traditional cybersecurity approaches such as the placement of firewalls and other policy enforcement mechanisms at strategic choke points effectively divide the network into zones and are unable to regulate intrazone host-to-host communication. This traditional approach introduces significant risk as there is little in place to prevent the horizontal propagation of malware or other unwanted traffic within a given zone. In this paper we explore approaches for improving cybersecurity in campus networks by analyzing contemporary campus traffic patterns. In light of these patterns, we propose several architectural enhancements that introduce strategically placed hardware or hardware-accelerated software data planes, which are evaluated from performance and effectiveness perspectives. 
DA - 2021/// PY - 2021/// DO - 10.1109/ICIN51074.2021.9385530 SP - SN - 2472-8144 KW - cybersecurity KW - campus network architecture ER - TY - JOUR TI - Teachable Agent as an Interactive Tool for Cognitive Task Analysis: A Case Study for Authoring an Expert Model AU - Matsuda, Noboru T2 - INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION DA - 2021/7/12/ PY - 2021/7/12/ DO - 10.1007/s40593-021-00265-z VL - 7 SP - SN - 1560-4306 KW - Authoring tools and methods KW - Cognitive task analysis KW - Human-computer interaction KW - Intelligent tutoring systems KW - Interactive learning environment ER - TY - JOUR TI - Hardware-Based Address-Centric Acceleration of Key-Value Store AU - Ye, Chencheng AU - Xu, Yuanchao AU - Shen, Xipeng AU - Liao, Xiaofei AU - Jin, Hai AU - Solihin, Yan T2 - 2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021) AB - Efficiently retrieving data is essential for key-value store applications. A major part of the retrieving time is on data addressing, that is, finding the location of the value in memory that corresponds to a key. This paper introduces an address-centric approach to speed up the addressing by creating a shortcut for the translation of a key to the physical address of the value. The new technique is materialized with a novel in-memory table, STLT, a virtual-physical address buffer, and two new instructions. It creates a fast path for data addressing and meanwhile opens up opportunities for the use of simpler and faster hash tables to strike a better tradeoff between hashing conflicts and hashing overhead. Together, the new technique brings up to 1.4× speedups on key-value store application Redis and up to 13× speedups on some widely used indexing data structures, consistently outperforming prior solutions significantly. 
DA - 2021/// PY - 2021/// DO - 10.1109/HPCA51647.2021.00067 SP - 736-748 SN - 1530-0897 ER - TY - JOUR TI - Characterizing Crowds to Better Optimize Worker Recommendation in Crowdsourced Testing AU - Wang, Junjie AU - Wang, Song AU - Chen, Jianfeng AU - Menzies, Tim AU - Cui, Qiang AU - Xie, Miao AU - Wang, Qing T2 - IEEE TRANSACTIONS ON SOFTWARE ENGINEERING AB - Crowdsourced testing is an emerging trend, in which test tasks are entrusted to the online crowd workers. Typically, a crowdsourced test task aims to detect as many bugs as possible within a limited budget. However, not all crowd workers are equally skilled at finding bugs; inappropriate workers may miss bugs, or report duplicate bugs, while hiring them requires nontrivial budget. Therefore, it is of great value to recommend a set of appropriate crowd workers for a test task so that more software bugs can be detected with fewer workers. This paper first presents a new characterization of crowd workers and characterizes them with testing context, capability, and domain knowledge. Based on the characterization, we then propose the Multi-Objective Crowd wOrker recoMmendation approach (MOCOM), which aims at recommending a minimum number of crowd workers who could detect the maximum number of bugs for a crowdsourced testing task. Specifically, MOCOM recommends crowd workers by maximizing the bug detection probability of workers, the relevance with the test task, the diversity of workers, and minimizing the test cost. We experimentally evaluate MOCOM on 532 test tasks, and results show that MOCOM significantly outperforms five commonly-used and state-of-the-art baselines. Furthermore, MOCOM can reduce duplicate reports and recommend workers with high relevance and larger bug detection probability; because of this it can find more bugs with fewer workers. 
DA - 2021/6/1/ PY - 2021/6/1/ DO - 10.1109/TSE.2019.2918520 VL - 47 IS - 6 SP - 1259-1276 SN - 1939-3520 UR - https://doi.org/10.1109/TSE.2019.2918520 KW - Crowdsourced testing KW - crowd worker recommendation KW - multi-objective optimization ER - TY - JOUR TI - CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design AU - Guan, Hui AU - Liu, Shaoshan AU - Ma, Xiaolong AU - Niu, Wei AU - Ren, Bin AU - Shen, Xipeng AU - Wang, Yanzhi AU - Zhao, Pu T2 - COMMUNICATIONS OF THE ACM AB - A new framework allows intelligence on mainstream end devices without special hardware. DA - 2021/6// PY - 2021/6// DO - 10.1145/3418297 VL - 64 IS - 6 SP - 62-68 SN - 1557-7317 ER - TY - JOUR TI - Tackling the Credit Assignment Problem in Reinforcement Learning-Induced Pedagogical Policies with Neural Networks AU - Ausin, Markel Sanz AU - Maniktala, Mehak AU - Barnes, Tiffany AU - Chi, Min T2 - ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT I AB - Intelligent Tutoring Systems (ITS) provide a powerful tool for students to learn in an adaptive, personalized, and goal-oriented manner. In recent years, Reinforcement Learning (RL) has shown to be capable of leveraging previous student data to induce effective pedagogical policies for future students. One of the most desirable goals of these policies is to maximize student learning gains while minimizing the training time. However, this metric is often not available until a student has completed the entire tutor. For this reason, the reinforcement signal of the effectiveness of the tutor is delayed. Assigning credit for each intermediate action based on a delayed reward is a challenging problem denoted the temporal Credit Assignment Problem (CAP). The CAP makes it difficult for most RL algorithms to assign credit to each action. In this work, we develop a general Neural Network-based algorithm that tackles the CAP by inferring immediate rewards from delayed rewards. 
We perform two empirical classroom studies, and the results show that this algorithm, in combination with a Deep RL agent, can improve student learning performance while reducing training time. DA - 2021/// PY - 2021/// DO - 10.1007/978-3-030-78292-4_29 VL - 12748 SP - 356-368 SN - 1611-3349 UR - https://doi.org/10.1007/978-3-030-78292-4_29 KW - Pedagogical agent KW - Credit assignment problem KW - Deep reinforcement learning ER - TY - JOUR TI - An improved text classification modelling approach to identify security messages in heterogeneous projects AU - Oyetoyan, Tosin Daniel AU - Morrison, Patrick T2 - SOFTWARE QUALITY JOURNAL AB - Security remains under-addressed in many organisations, illustrated by the number of large-scale software security breaches. Preventing breaches can begin during software development if attention is paid to security during the software’s design and implementation. One approach to security assurance during software development is to examine communications between developers as a means of studying the security concerns of the project. Prior research has investigated models for classifying project communication messages (e.g., issues or commits) as security related or not. A known problem is that these models are project-specific, limiting their use by other projects or organisations. We investigate whether we can build a generic classification model that can generalise across projects. We define a set of security keywords by extracting them from relevant security sources, dividing them into four categories: asset, attack/threat, control/mitigation, and implicit. Using different combinations of these categories and including them in the training dataset, we built a classification model and evaluated it on industrial, open-source, and research-based datasets containing over 45 different products. 
Our model based on harvested security keywords as a feature set shows average recall from 55 to 86%, minimum recall from 43 to 71%, maximum recall from 60 to 100%, an average f-score between 3.4 and 88%, an average g-measure of at least 66% across all the datasets, and an average AUC of ROC from 69 to 89%. In addition, models that use externally sourced features outperformed models that use project-specific features on average by a margin of 26–44% in recall, 22–50% in g-measure, 0.4–28% in f-score, and 15–19% in AUC of ROC. Further, our results outperform a state-of-the-art prediction model for security bug reports in all cases. We find using sound statistical and effect size tests that (1) using harvested security keywords as features to train a text classification model improves classification models and generalises to other projects significantly; (2) including features in the training dataset before model construction improves classification models significantly; and (3) different security categories represent predictors for different projects. Finally, we introduce new and promising approaches to construct models that can generalise across different independent projects. 
DA - 2021/5/27/ PY - 2021/5/27/ DO - 10.1007/s11219-020-09546-7 SP - SN - 1573-1367 KW - Security KW - Classification model KW - Text classification KW - Software repository KW - Machine learning ER - TY - JOUR TI - COVID-KOP: integrating emerging COVID-19 data with the ROBOKOP database AU - Korn, Daniel AU - Bobrowski, Tesia AU - Li, Michael AU - Kebede, Yaphet AU - Wang, Patrick AU - Owen, Phillips AU - Vaidya, Gaurav AU - Muratov, Eugene AU - Chirkova, Rada AU - Bizon, Chris AU - Tropsha, Alexander T2 - BIOINFORMATICS AB - Summary: In response to the COVID-19 pandemic, we established COVID-KOP, a new knowledgebase integrating the existing Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways (ROBOKOP) biomedical knowledge graph with information from recent biomedical literature on COVID-19 annotated in the CORD-19 collection. COVID-KOP can be used effectively to generate new hypotheses concerning repurposing of known drugs and clinical drug candidates against COVID-19 by establishing respective confirmatory pathways of drug action. Availability and implementation: COVID-KOP is freely accessible at https://covidkop.renci.org/. For code and instructions for the original ROBOKOP, see: https://github.com/NCATS-Gamma/robokop. DA - 2021/2/15/ PY - 2021/2/15/ DO - 10.1093/bioinformatics/btaa718 VL - 37 IS - 4 SP - 586-587 SN - 1460-2059 ER - TY - JOUR TI - Development of a Dissemination Platform for Spatiotemporal and Phylogenetic Analysis of Avian Infectious Bronchitis Virus AU - Jara, Manuel AU - Crespo, Rocio AU - Roberts, David L. AU - Chapman, Ashlyn AU - Banda, Alejandro AU - Machado, Gustavo T2 - FRONTIERS IN VETERINARY SCIENCE AB - Infecting large portions of the global poultry populations, the avian infectious bronchitis virus (IBV) remains a major economic burden in North America. 
With more than 30 serotypes globally distributed, Arkansas, Connecticut, Delaware, Georgia, and Massachusetts are among the most predominant serotypes in the United States. Even though vaccination is widely used, the high mutation rate exhibited by IBV is continuously triggering the emergence of new viral strains and hindering control and prevention measures. For that reason, targeted strategies based on constantly updated information on the IBV circulation are necessary. Here, we sampled IBV-infected farms from one US state and collected and analyzed 65 genetic sequences coming from three different lineages along with the immunization information of each sampled farm. Phylodynamic analyses showed that IBV dispersal velocity was 12.3 km/year. The majority of IBV infections appeared to have derived from the introduction of the Arkansas DPI serotype, and the Arkansas DPI and Georgia 13 were the predominant serotypes. When analyzed against IBV sequences collected across the United States and deposited in the GenBank database, the most likely viral origin of our sequences was from the states of Alabama, Georgia, and Delaware. Information about vaccination showed that the MILDVAC-MASS+ARK vaccine was applied on 26% of the farms. Using a publicly accessible open-source tool for real-time interactive tracking of pathogen spread and evolution, we analyzed the spatiotemporal spread of IBV and developed an online reporting dashboard. Overall, our work demonstrates how the combination of genetic and spatial information could be used to track the spread and evolution of poultry diseases, providing timely information to the industry. Our results could allow producers and veterinarians to monitor in near-real time the current IBV strain circulating, making it more informative, for example, in vaccination-related decisions. 
DA - 2021/5/4/ PY - 2021/5/4/ DO - 10.3389/fvets.2021.624233 VL - 8 SP - SN - 2297-1769 KW - infectious bronchitis KW - virus evolution KW - outbreak analytics KW - avian disease KW - evolutionary epidemiology ER - TY - RPRT TI - When SIMPLE is better than complex: A case study on deep learning for predicting Bugzilla issue close time AU - Yedida, R. AU - Yang, X. AU - Menzies, T. DA - 2021/// PY - 2021/// M1 - 2101.06319 M3 - arXiv preprint SN - 2101.06319 ER - TY - JOUR TI - On the Value of Oversampling for Deep Learning in Software Defect Prediction AU - Yedida, Rahul AU - Menzies, Tim T2 - IEEE Transactions on Software Engineering AB - One truism of deep learning is that the automatic feature engineering (seen in the first layers of those networks) excuses data scientists from performing tedious manual feature engineering prior to running DL. For the specific case of deep learning for defect prediction, we show that that truism is false. Specifically, when we pre-process data with a novel oversampling technique called fuzzy sampling, as part of a larger pipeline called GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), then we can do significantly better than the prior DL state of the art in 14/20 defect data sets. Our approach yields state-of-the-art results significantly faster deep learners. These results present a cogent case for the use of oversampling prior to applying deep learning on software defect prediction datasets. DA - 2021/// PY - 2021/// DO - 10.1109/TSE.2021.3079841 SP - 1-1 J2 - IEEE Trans. Software Eng. 
OP - SN - 0098-5589 1939-3520 2326-3881 UR - http://dx.doi.org/10.1109/TSE.2021.3079841 DB - Crossref KW - Deep learning KW - Tuning KW - Predictive models KW - Standards KW - Prediction algorithms KW - Training KW - Tools KW - Defect prediction KW - oversampling KW - class imbalance KW - neural networks ER - TY - JOUR TI - Simpler Hyperparameter Optimization for Software Analytics: Why, How, When AU - Agrawal, Amritanshu AU - Yang, Xueqi AU - Agrawal, Rishabh AU - Yedida, Rahul AU - Shen, Xipeng AU - Menzies, Tim T2 - IEEE Transactions on Software Engineering AB - How can we make software analytics simpler and faster? One method is to match the complexity of analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by "DODGE-ing"; i.e. simply steering away from settings that lead to similar conclusions. But when is it wise to use that simple approach and when must we use more complex (and much slower) optimizers? To answer this, we applied hyperparameter optimization to 120 SE data sets that explored bad smell detection, predicting Github issue close time, bug report analysis, defect prediction, and dozens of other non-SE problems. We find that the simple DODGE works best for data sets with low "intrinsic dimensionality" (u ~ 3) and very poorly for higher-dimensional data (u > 8). Nearly all the SE data seen here was intrinsically low-dimensional, indicating that DODGE is applicable for many SE analytics tasks. DA - 2021/// PY - 2021/// DO - 10.1109/TSE.2021.3073242 SP - 1-1 J2 - IEEE Trans. Software Eng. 
OP - SN - 0098-5589 1939-3520 2326-3881 UR - http://dx.doi.org/10.1109/TSE.2021.3073242 DB - Crossref KW - Software analytics KW - hyperparameter optimization KW - defect prediction KW - bad smell detection KW - issue close time KW - bug reports ER - TY - JOUR TI - Reuse-centric k-means configuration AU - Zhang, Lijun AU - Guan, Hui AU - Ding, Yufei AU - Shen, Xipeng AU - Krim, Hamid T2 - INFORMATION SYSTEMS AB - K-means configuration is to find a configuration of k-means (e.g., the number of clusters, feature sets) that maximize some objectives. It is a time-consuming process due to the iterative nature of k-means. This paper proposes reuse-centric k-means configuration to accelerate k-means configuration. It is based on the observation that the explorations of different configurations share lots of common or similar computations. Effectively reusing the computations from prior trials of different configurations could largely shorten the configuration time. To materialize the idea, the paper presents a set of novel techniques, including reuse-based filtering, center reuse, and a two-phase design to capitalize on the reuse opportunities on three levels: validation, number of clusters, and feature sets. Experiments on k-means–based data classification tasks show that reuse-centric k-means configuration can speed up a heuristic search-based configuration process by a factor of 5.8, and a uniform search-based attainment of classification error surfaces by a factor of 9.1. The paper meanwhile provides some important insights on how to effectively apply the acceleration techniques to tap into a full potential. 
DA - 2021/9// PY - 2021/9// DO - 10.1016/j.is.2021.101787 VL - 100 SP - SN - 1873-6076 UR - https://doi.org/10.1016/j.is.2021.101787 KW - K-means KW - Algorithm configuration KW - Computation reuse ER - TY - JOUR TI - Crosstalk-Aware Shared Backup Path Protection in Multi-Core Fiber Elastic Optical Networks AU - Tang, Fengxian AU - Shen, Gangxiang AU - Rouskas, George N. T2 - JOURNAL OF LIGHTWAVE TECHNOLOGY AB - Elastic optical networks employing multi-core fibers (MCF-EON) have the potential to expand significantly the transmission capacity of optical transport. However, wide deployment of such networks depends on addressing effectively two critical challenges: inter-core crosstalk, which may cause serious signal performance degradation in an MCF link, and survivability against network failures that may cause enormous data loss. In this article, we consider the design of MCF-EONs with shared-backup path protection (SBPP), one of the most efficient techniques for protecting network traffic. Specifically, we tackle the crosstalk-aware routing, core, and spectrum assignment (CA-RCSA) problem with the objective of jointly minimizing the network spectrum resources used and the total inter-core crosstalk. We formulate the problem as an integer linear programming (ILP) model subject to strict inter-core crosstalk limits for each provisioned lightpath, and we also propose an auxiliary graph (AG) based heuristic algorithm for lightpath provisioning. Simulation studies show that our algorithm is effective in terms of the objectives, and it is efficient to perform close to the ILP model in small networks, for which solving the ILP is feasible. 
DA - 2021/5/15/ PY - 2021/5/15/ DO - 10.1109/JLT.2021.3064935 VL - 39 IS - 10 SP - 3025-3036 SN - 1558-2213 UR - https://doi.org/10.1109/JLT.2021.3064935 KW - Crosstalk KW - Optical fiber networks KW - Optical crosstalk KW - Resource management KW - Routing KW - Optical fibers KW - Heuristic algorithms KW - Inter-core crosstalk KW - MCF-EON KW - RCSA KW - SBPP KW - survivability ER - TY - JOUR TI - Co-teaching with an immersive digital game: supporting teacher-game instructional partnerships AU - Mutch-Jones, Karen AU - Boulden, Danielle C. AU - Gasca, Santiago AU - Lord, Trudi AU - Wiebe, Eric AU - Reichsman, Frieda T2 - ETR&D-EDUCATIONAL TECHNOLOGY RESEARCH AND DEVELOPMENT DA - 2021/5/24/ PY - 2021/5/24/ DO - 10.1007/s11423-021-10000-z VL - 5 SP - SN - 1556-6501 KW - Digital games KW - Teachers KW - Instruction KW - Biology KW - Classrooms ER - TY - JOUR TI - Different Kind of Smells: Security Smells in Infrastructure as Code Scripts AU - Rahman, Akond AU - Williams, Laurie T2 - IEEE SECURITY & PRIVACY AB - In this article, we summarize our recent research findings related to infrastructure as code (IaC) scripts, where we have identified 67,801 occurrences of security smells that include 9,175 hard-coded passwords. We hope our work will facilitate awareness among practitioners who use IaC. DA - 2021/// PY - 2021/// DO - 10.1109/MSEC.2021.3065190 VL - 19 IS - 3 SP - 33-41 SN - 1558-4046 ER - TY - JOUR TI - Assessing practitioner beliefs about software engineering AU - Shrikanth, N. C. AU - Nichols, William AU - Fahid, Fahmid Morshed AU - Menzies, Tim T2 - EMPIRICAL SOFTWARE ENGINEERING AB - Software engineering is a highly dynamic discipline. Hence, as times change, so too might our beliefs about core processes in this field. This paper checks some five beliefs that originated in the past decades that comment on the relationships between (i) developer productivity; (ii) software quality and (iii) years of developer experience. 
Using data collected from 1,356 developers in the period 1995 to 2006, we found support for only one of the five beliefs titled “Quality entails productivity.” We found no clear support for four other beliefs based on programming languages and software developers. However, from the sporadic evidence of the four other beliefs, we learned that a narrow scope could delude practitioners into misinterpreting certain effects as holding in their day-to-day work. Lastly, through an aggregated view of assessing the five beliefs, we find programming languages act as a confounding factor for developer productivity and software quality. Thus the overall message of this work is that it is both important and possible to revisit old beliefs in software engineering. Researchers and practitioners should routinely retest old beliefs. DA - 2021/7// PY - 2021/7// DO - 10.1007/s10664-021-09957-5 VL - 26 IS - 4 SP - SN - 1573-7616 KW - Software analytics KW - Beliefs KW - Productivity KW - Quality KW - Experience ER - TY - JOUR TI - The Impact of Looking Further Ahead: A Comparison of Two Data-driven Unsolicited Hint Types on Performance in an Intelligent Data-driven Logic Tutor AU - Cody, Christa AU - Maniktala, Mehak AU - Lytle, Nicholas AU - Chi, Min AU - Barnes, Tiffany T2 - INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION AB - Research has shown assistance can provide many benefits to novices lacking the mental models needed for problem solving in a new domain. However, varying approaches to assistance, such as subgoals and next-step hints, have been implemented with mixed results. Next-Step hints are common in data-driven tutors due to their straightforward generation from historical student data, as well as research showing positive impacts on student learning. However, there is a lack of research exploring the possibility of extending data-driven methods to provide higher-level assistance. 
Therefore, we modified our data-driven Next-Step hint generator to provide Waypoints, hints that are a few steps ahead, representing problem-solving subgoals. We hypothesized that Waypoints would benefit students with high prior knowledge, and that Next-Step hints would most benefit students with lower prior knowledge. In this study, we investigated the influence of data-driven hint type, Waypoints versus Next-Step hints, on student learning in a logic proof tutoring system, Deep Thought, in a discrete mathematics course. We found that Next-Step hints were more beneficial for the majority of students in terms of time, efficiency, and accuracy on the posttest. However, higher totals of successfully used Waypoints were correlated with improvements in efficiency and time in the posttest. These results suggest that Waypoint hints could be beneficial, but more scaffolding may be needed to help students follow them. DA - 2021/5/21/ PY - 2021/5/21/ DO - 10.1007/s40593-021-00237-3 SP - SN - 1560-4306 KW - Tutoring system KW - Hints KW - Assistance KW - Data-driven methods ER - TY - CONF TI - The Virtual Pivot: Transitioning Computational Thinking PD for Middle and High School Content Area Teachers AU - Jocius, R. AU - Joshi, D. AU - Albert, J. AU - Barnes, T. AU - Robinson, R. AU - Cateté, V. AU - Dong, Y. AU - Blanton, M. AU - O’Byrne, I. AU - Andrews, A. AB - In 2018 and 2019, Infusing Computing offered face-to-face summer PD workshops to support middle and high school teachers in integrating computational thinking into their classrooms through week-long summer PD workshops and academic-year support. Due to COVID-19, 151 teachers attended the Summer 2020 PD workshops in a week-long virtual conference format. In this paper, we describe Virtual Pivot: Infusing Computing, which employed emerging technology tools, pre-PD training, synchronous and asynchronous sessions, Snap! pair programming, live support, and live networking. 
Drawing on findings from participant interviews and post-PD surveys, we argue that three categories of changes (digital tools, formats, and supports for teacher engagement and collaboration) were effective in increasing participants' self-efficacy in teaching CT, supporting collaboration, and enabling participants to design CT-infused content-area lessons. We conclude by discussing how elements of this virtual PD can be replicated to increase teacher and student access to CT practices in middle and high school classrooms. C2 - 2021/// C3 - SIGCSE 2021 - Proceedings of the 52nd ACM Technical Symposium on Computer Science Education DA - 2021/// DO - 10.1145/3408877.3432558 SP - 1198-1204 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85103319028&partnerID=MN8TOARS ER - TY - CONF TI - The Design and Implementation of a Method for Evaluating and Building Research Practice Partnerships AU - Rorrer, A. AU - Pugalee, D. AU - Edwards, C. AU - Boulden, D. AU - Maher, M.L. AU - Cao, L. AU - Dorodchi, M. AU - Catete, V. AU - Frye, D. AU - Barnes, T. AU - Wiebe, E. AB - We have established a research-practice partnership (RPP) to build a computer science (CS) and computational thinking (CT)-focused STEM ecosystem at two middle schools. Creating such an ecosystem to broaden student participation in computing through an RPP approach involves all stakeholders in the research process. Borrowing upon visual participatory research methods, we developed a graphic research instrument to engage teachers in the research process and elicit their perspectives on strategies for building the ecosystem. This experience report describes our research methodology across two distinct cases to demonstrate the utility of this drawing activity as an investigative and partnership development tool. 
The contribution is in offering a flexible approach to other university-based RPP teams that enables a synergistic partnership development tool and data collection instrument that can be tailored to a variety of RPP contexts, facilitating more productive and equitable ways of engaging stakeholders in the research process. We describe our project contexts and share results from the pilot study with practitioner-members of our RPP teams. We discuss two cases to highlight the contribution this approach made to the development of our partnerships. C2 - 2021/// C3 - SIGCSE 2021 - Proceedings of the 52nd ACM Technical Symposium on Computer Science Education DA - 2021/// DO - 10.1145/3408877.3432532 SP - 753-759 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85103327758&partnerID=MN8TOARS ER - TY - CONF TI - PlanIT! A New Integrated Tool to Help Novices Design for Open-ended Projects AU - Milliken, Alexandra AU - Wang, Wengran AU - Cateté, Veronica AU - Martin, Sarah AU - Gomes, Neeloy AU - Dong, Yihuan AU - Harred, Rachel AU - Isvik, Amy AU - Barnes, Tiffany AU - Price, Thomas AU - Martens, Chris AB - Project-based learning can encourage and motivate students to learn through exploring their own interests, but introduces special challenges for novice programmers. Recent research has shown that novice students perceive themselves to be "bad at programming," especially when they do not know how to start writing a program, or need to create a plan before getting started. In this paper, we present PlanIT, a guided planning tool integrated with the Snap! programming environment designed to help novices plan and program their open-ended projects. Within PlanIT, students can add a description for their project, use a to-do list to help break down the steps of implementation, plan important elements of their program including actors, variables, and events, and view related example projects. 
We report findings from a pilot study of high school students using PlanIT, showing that students who used the tool learned to make more specific and actionable plans. Results from student interviews show they appreciate the guidance that PlanIT provides, as well as the affordances it offers to more quickly create program elements. C2 - 2021/3/3/ C3 - Proceedings of the 52nd ACM Technical Symposium on Computer Science Education DA - 2021/3/3/ DO - 10.1145/3408877.3432552 SP - 232-238 PB - ACM UR - http://dx.doi.org/10.1145/3408877.3432552 ER - TY - JOUR TI - Understanding static code warnings: An incremental AI approach AU - Yang, Xueqi AU - Yu, Zhe AU - Wang, Junjie AU - Menzies, Tim T2 - EXPERT SYSTEMS WITH APPLICATIONS AB - Knowledge-based systems reason over some knowledge base. Hence, an important issue for such systems is how to acquire the knowledge needed for their inference. This paper assesses active learning methods for acquiring knowledge for “static code warnings”. Static code analysis is a widely-used method for detecting bugs and security vulnerabilities in software systems. As software becomes more complex, analysis tools also report lists of increasingly complex warnings that developers need to address on a daily basis. Such static code analysis tools are usually over-cautious; i.e. they often offer many warnings about spurious issues. Previous research work shows that about 35% to 91% of warnings reported as bugs by SA tools are actually unactionable (i.e., warnings that would not be acted on by developers because they are falsely suggested as bugs). Experienced developers know which errors are important and which can be safely ignored. How can we capture that experience? This paper reports on an incremental AI tool that watches humans reading false alarm reports. Using an incremental support vector machine mechanism, this AI tool can quickly learn to distinguish spurious false alarms from more serious matters that deserve further attention. 
In this work, nine open-source projects are employed to evaluate our proposed model on the features extracted by previous researchers and identify the actionable warnings in a priority order given by our algorithm. We observe that our model can identify over 90% of actionable warnings when our methods tell humans to ignore 70 to 80% of the warnings. DA - 2021/4/1/ PY - 2021/4/1/ DO - 10.1016/j.eswa.2020.114134 VL - 167 SP - SN - 1873-6793 KW - Actionable warning identification KW - Active learning KW - Static analysis KW - Selection process ER - TY - JOUR TI - A Machine Learning Based Ensemble Forecasting Optimization Algorithm for Preseason Prediction of Atlantic Hurricane Activity AU - Sun, Xia AU - Xie, Lian AU - Shah, Shahil Umeshkumar AU - Shen, Xipeng T2 - ATMOSPHERE AB - In this study, nine different statistical models are constructed using different combinations of predictors, including models with and without projected predictors. Multiple machine learning (ML) techniques are employed to optimize the ensemble predictions by selecting the top performing ensemble members and determining the weights for each ensemble member. The ML-Optimized Ensemble (ML-OE) forecasts are evaluated against the Simple-Averaging Ensemble (SAE) forecasts. The results show that for the response variables that are predicted with significant skill by individual ensemble members and SAE, such as Atlantic tropical cyclone counts, the performance of SAE is comparable to the best ML-OE results. However, for response variables that are poorly modeled by individual ensemble members, such as Atlantic and Gulf of Mexico major hurricane counts, ML-OE predictions often show higher skill score than individual model forecasts and the SAE predictions. However, neither SAE nor ML-OE was able to improve the forecasts of the response variables when all models show consistent bias. 
The results also show that increasing the number of ensemble members does not necessarily lead to better ensemble forecasts. The best ensemble forecasts are from the optimally combined subset of models. DA - 2021/4// PY - 2021/4// DO - 10.3390/atmos12040522 VL - 12 IS - 4 SP - SN - 2073-4433 KW - hurricane prediction KW - machine learning KW - ensemble model ER - TY - JOUR TI - Learning to recognize actionable static code warnings (is intrinsically easy) AU - Yang, Xueqi AU - Chen, Jianfeng AU - Yedida, Rahul AU - Yu, Zhe AU - Menzies, Tim T2 - EMPIRICAL SOFTWARE ENGINEERING AB - Static code warning tools often generate warnings that programmers ignore. Such tools can be made more useful via data mining algorithms that select the “actionable” warnings; i.e. the warnings that are usually not ignored. In this paper, we look for actionable warnings within a sample of 5,675 actionable warnings seen in 31,058 static code warnings from FindBugs. We find that data mining algorithms can find actionable warnings with remarkable ease. Specifically, a range of data mining methods (deep learners, random forests, decision tree learners, and support vector machines) all achieved very good results (recalls and AUC(TRN, TPR) measures usually over 95% and false alarms usually under 5%). Given that all these learners succeeded so easily, it is appropriate to ask if there is something about this task that is inherently easy. We report that while our data sets have up to 58 raw features, those features can be approximated by less than two underlying dimensions. For such intrinsically simple data, many different kinds of learners can generate useful models with similar performance. Based on the above, we conclude that learning to recognize actionable static code warnings is easy, using a wide range of learning algorithms, since the underlying data is intrinsically simple. 
If we had to pick one particular learner for this task, we would suggest linear SVMs (since, at least in our sample, that learner ran relatively quickly and achieved the best median performance) and we would not recommend deep learning (since this data is intrinsically very simple). DA - 2021/5// PY - 2021/5// DO - 10.1007/s10664-021-09948-6 VL - 26 IS - 3 SP - SN - 1573-7616 UR - https://doi.org/10.1007/s10664-021-09948-6 KW - Static code analysis KW - Actionable warnings KW - Deep learning KW - Linear SVM KW - Intrinsic dimensionality ER - TY - JOUR TI - Bungie: Improving Fault Tolerance via Extensible Application-Level Protocols AU - Christie, Samuel H. AU - Chopra, Amit Khushwant AU - Singh, Munindar P. T2 - COMPUTER AB - We present Bungie, an approach based on application-level protocols that precisely capture the causality inherent to the interactions among agents. We show through patterns and examples how Bungie provides abstractions for achieving fault tolerance. DA - 2021/5// PY - 2021/5// DO - 10.1109/MC.2021.3052147 VL - 54 IS - 5 SP - 44-53 SN - 1558-0814 UR - https://doi.org/10.1109/MC.2021.3052147 KW - Fault tolerance KW - Protocols KW - Fault tolerant systems ER - TY - JOUR TI - REDEVELOPING A DIGITAL SEXUAL HEALTH INTERVENTION FOR ADOLESCENTS TO ALLOW FOR BROADER DISSEMINATION: IMPLICATIONS FOR HIV AND STD PREVENTION AU - Javidi, Hannah AU - Widman, Laura AU - Lipsey, Nikolette AU - Brasileiro, Julia AU - Javidi, Farhad AU - Jhala, Arnav T2 - AIDS EDUCATION AND PREVENTION AB - HIV/STDs and unintended pregnancy persist among adolescents in the United States; thus, effective sexual health interventions that can be broadly disseminated are necessary. Digital health interventions are highly promising because they allow for customization and widespread reach. 
The current project involved redeveloping and expanding HEART (Health Education and Relationship Training)—a brief, digital sexual health intervention efficacious at improving safer sex knowledge, self-efficacy, and behavior—onto an open-source platform to allow for greater interactivity and accessibility while reducing long-term program costs. The authors describe the process of adapting, reprogramming, and evaluating the new program, which may serve as a guide for investigators seeking to adapt behavioral interventions onto digital platforms. The final product is an open-source intervention that can be easily adapted for new populations. Among 233 adolescents (M age = 15.06; 64% girls), HEART was highly acceptable and generally feasible to administer, with no differences in acceptability by gender or sexual identity. DA - 2021/4// PY - 2021/4// DO - 10.1521/aeap.2021.33.2.89 VL - 33 IS - 2 SP - 89-102 SN - 1943-2755 KW - adolescent sexual health KW - digital health intervention KW - development KW - program adaptation KW - implementation science KW - evaluation ER - TY - JOUR TI - Atlantic Hurricane Activity Prediction: A Machine Learning Approach AU - Asthana, Tanmay AU - Krim, Hamid AU - Sun, Xia AU - Roheda, Siddharth AU - Xie, Lian T2 - ATMOSPHERE AB - Long-term hurricane predictions have been of acute interest in order to protect the community from the loss of lives, and environmental damage. Such predictions help by providing an early warning guidance for any proper precaution and planning. In this paper, we present a machine learning model capable of making good preseason-prediction of Atlantic hurricane activity. The development of this model entails a judicious and non-linear fusion of various data modalities such as sea-level pressure (SLP), sea surface temperature (SST), and wind. A Convolutional Neural Network (CNN) was utilized as a feature extractor for each data modality. This is followed by a feature level fusion to achieve a proper inference. 
This highly non-linear model was further shown to have the potential to make skillful predictions up to 18 months in advance. DA - 2021/4// PY - 2021/4// DO - 10.3390/atmos12040455 VL - 12 IS - 4 SP - SN - 2073-4433 KW - hurricanes KW - tropical cyclones KW - fusion networks KW - weather forecast ER - TY - JOUR TI - How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization) AU - Shu, Rui AU - Xia, Tianpei AU - Chen, Jianfeng AU - Williams, Laurie AU - Menzies, Tim T2 - EMPIRICAL SOFTWARE ENGINEERING DA - 2021/5// PY - 2021/5// DO - 10.1007/s10664-020-09906-8 VL - 26 IS - 3 SP - SN - 1573-7616 UR - https://doi.org/10.1007/s10664-020-09906-8 KW - Hyperparameter Optimization KW - Data pre-processing KW - Security bug report ER - TY - JOUR TI - AERPAW emulation overview and preliminary performance evaluation AU - Panicker, Ashwin AU - Ozdemir, Ozgur AU - Sichitiu, Mihail L. AU - Guvenc, Ismail AU - Dutta, Rudra AU - Marojevic, Vuk AU - Floyd, Brian T2 - COMPUTER NETWORKS AB - The Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) has been recently funded by the National Science Foundation (NSF)’s Platforms for Advanced Wireless Research (PAWR) program. The AERPAW platform will enable experiments with programmable radios and programmable unmanned aerial vehicles (UAVs), conducted in a safe and repeatable manner. Several architectural components are crucial for enabling the envisioned capabilities of the testbed. In this paper, after providing a high level overview of AERPAW, we first present the emulation design of AERPAW vehicles. Subsequently, we describe various different options for wireless channel emulation in AERPAW. We start with a generalized model for wireless emulation, and expand that model to packet-level emulation, I-Q level emulation, and radio-frequency (RF)-level emulation. A discussion on the trade-offs among these various different emulation possibilities is also provided. 
DA - 2021/7/20/ PY - 2021/7/20/ DO - 10.1016/j.comnet.2021.108083 VL - 194 SP - SN - 1872-7069 UR - https://doi.org/10.1016/j.comnet.2021.108083 KW - AERPAW KW - Emulation KW - UAV ER - TY - JOUR TI - An Empirical Study on Type Annotations: Accuracy, Speed, and Suggestion Effectiveness AU - Ore, John-Paul AU - Detweiler, Carrick AU - Elbaum, Sebastian T2 - ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY AB - Type annotations connect variables to domain-specific types. They enable the power of type checking and can detect faults early. In practice, type annotations have a reputation of being burdensome to developers. We lack, however, an empirical understanding of how and why they are burdensome. Hence, we seek to measure the baseline accuracy and speed for developers making type annotations to previously unseen code. We also study the impact of one or more type suggestions. We conduct an empirical study of 97 developers using 20 randomly selected code artifacts from the robotics domain containing physical unit types. We find that subjects select the correct physical type with just 51% accuracy, and a single correct annotation takes about 2 minutes on average. Showing subjects a single suggestion has a strong and significant impact on accuracy both when correct and incorrect, while showing three suggestions retains the significant benefits without the negative effects. We also find that suggestions do not come with a time penalty. We require subjects to explain their annotation choices, and we qualitatively analyze their explanations. We find that identifier names and reasoning about code operations are the primary clues for selecting a type. We also examine two state-of-the-art automated type annotation systems and find opportunities for their improvement. 
DA - 2021/3// PY - 2021/3// DO - 10.1145/3439775 VL - 30 IS - 2 SP - SN - 1557-7392 KW - Type checking KW - automated static analysis KW - software reliability KW - annotations KW - program analysis KW - dimensional analysis KW - physical units KW - robotic systems ER - TY - JOUR TI - Measuring in-service teacher self-efficacy for teaching computational thinking: development and validation of the T-STEM CT AU - Boulden, Danielle Cadieux AU - Rachmatullah, Arif AU - Oliver, Kevin M. AU - Wiebe, Eric T2 - EDUCATION AND INFORMATION TECHNOLOGIES DA - 2021/7// PY - 2021/7// DO - 10.1007/s10639-021-10487-2 VL - 26 IS - 4 SP - 4663-4689 SN - 1573-7608 KW - Computational thinking KW - K-12 teaching KW - Self-efficacy KW - Construct validation KW - Rasch analysis KW - Reliability ER - TY - JOUR TI - Shockingly Simple: "Keys" for Better AI for SE AU - Menzies, Tim T2 - IEEE SOFTWARE AB - As 2020 drew to a close, I was thinking about what lessons we have learned about software engineering (SE) for artificial intelligence (AI)-things that we can believe now but, in the last century, would have seemed somewhat shocking. One very surprising lesson, at least for me, is the success of the very complex and very simple. At the complex end, there is now much evidence for the value of deep learners for high-dimensional software engineering problems. For example, consider signal processing for autonomous cars. When reasoning over (say) 10,000 wavelets collected from a vision system, then deep learning can automate much of the engineering required to cover all those data. 
DA - 2021/// PY - 2021/// DO - 10.1109/MS.2020.3043014 VL - 38 IS - 2 SP - 114-118 SN - 1937-4194 UR - https://doi.org/10.1109/MS.2020.3043014 ER - TY - JOUR TI - Perceptual metric learning for video anomaly detection AU - Ramachandra, Bharathkumar AU - Jones, Michael AU - Vatsavai, Ranga Raju T2 - MACHINE VISION AND APPLICATIONS DA - 2021/5// PY - 2021/5// DO - 10.1007/s00138-021-01187-5 VL - 32 IS - 3 SP - SN - 1432-1769 KW - Video anomaly detection KW - Metric learning KW - Video surveillance KW - Siamese neural networks ER - TY - CONF TI - Combining Theory and Practice in Data Structures & Algorithms Course Projects: An Experience Report AU - King, Jason AB - CS2 course projects can often be too prescriptive by telling students which algorithms or data structures are necessary to efficiently solve a given problem. Students may not fully understand why these algorithms or data structures were chosen, and they may not have an opportunity to empirically observe the impact of such design decisions. In 2019, we redesigned our CS2 course projects at North Carolina State University to help demonstrate the importance of critical-thinking and analysis activities when developing software. The objective of the redesigned course project is to connect computer science theory with software development practice by incorporating algorithm design and analysis, data structure selection, and experimental analysis as part of the software development lifecycle. For the project, students first create a design proposal that incorporates algorithm design and analysis, data structure selection, and other software design tasks. Next, students implement and test their software, while teaching staff test cases impose grade penalties for incorrect and/or inefficient implementations. Finally, students perform experimental analysis to empirically observe performance differences when using different data structures. 
Between January 2019 and May 2020, we collected end-of-course survey responses from 202 out of 536 students (37.7% response rate). Overall, 90.6% of respondents indicated that the project helped with understanding the importance of analysis and design when developing software. In this paper, we discuss our project activities in detail, along with lessons learned and suggestions for adopting similar projects in other CS2 courses. C2 - 2021/// C3 - Proceedings of the 52nd ACM Technical Symposium on Computer Science Education CY - New York, NY, USA DA - 2021/// DO - 10.1145/3408877.3432476 SP - 959–965 PB - Association for Computing Machinery UR - https://doi.org/10.1145/3408877.3432476 ER - TY - CONF TI - STARS Ignite: A Program for Supporting Professors in Organizing Student Cohorts for Conferences AU - Isvik, A. AU - Barnes, T. AU - Payton, J. AU - Catete, V. AU - Battestilli, L. T2 - 52nd ACM Technical Symposium on Computer Science Education AB - Academic computing departments are seeking ways to broaden participation, and many are encouraging individual faculty, staff, and students to attend diversity-oriented conferences, like the Tapia and STARS Celebrations of Diversity in Computing and Grace Hopper Celebration of Women in Computing. Such conferences present opportunities to meet a broader community of people for professional development and networking, to be inspired by leaders in computing, and to celebrate diversity. However, while many institutions sponsor these conferences and support individual student attendance, students may not know how to leverage these opportunities effectively. We argue that leading a cohort of faculty/staff and students who attend a conference with the shared goal of broadening participation can provide lasting benefits for computing departments. This workshop will prepare faculty and staff to recruit and lead a team of students and leverage conference attendance to ignite broadening participation efforts. 
Through a hands-on collaborative process, the workshop provides the knowledge and tools needed to successfully lead a cohort, and helps attendees tailor the provided tools to their local strengths and needs to broaden participation in computing. C2 - 2021/// C3 - SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education DA - 2021/// PY - 2021/3// DO - 10.1145/3408877.3432502 SP - 1349 PB - ACM SN - 9781450380621 ER - TY - CONF TI - Strategies for Authentic Assessments of Mastery in CS Courses AU - Lin, K. AU - Battestilli, L. AU - Ball, M. T2 - 52nd ACM Technical Symposium on Computer Science Education AB - Assessing student mastery is an increasingly important aspect of a computer science (CS) course. Recent discussions in the SIGCSE community have questioned traditional assessment and grading practices, such as the use of high-stakes exams and standardized programming assignments. As an alternative, authentic assessments of mastery have been proposed with the goal of creating more equitable and inclusive classrooms that support a diversity in student discourses and epistemologies. This Birds-of-a-Feather session will provide a forum for conversations around assessment of student mastery. Although conversations will likely draw on experiences from teaching remote courses, the discussions can also inspire assessment ideas and methods that work for in-person instruction as well. The discussion leaders will begin by sharing their experiences using formative, low-stakes quizzes; two-stage individual and group assessments; student-generated video problem solutions; written research papers; and creative projects. For each assessment, the discussion leaders expect to address questions such as: What were the goals? What classes was it used in? How did we grade it? How does it scale? 
This session is a space for participants to expand our collective understanding of how authentic assessments can be used in CS courses and share ideas to inform research and practice toward grading for equity. Afterward, discussion notes will be compiled and publicly archived at https://kevinl.info/authentic-assessments C2 - 2021/// C3 - SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education DA - 2021/// PY - 2021/3// DO - 10.1145/3408877.3439504 SP - 1361 PB - ACM SN - 9781450380621 ER - TY - CONF TI - Finding Video-watching Behavior Patterns in a Flipped CS1 Course AU - Moore, C. AU - Battestilli, L. AU - Dominguez, I. T2 - 52nd ACM Technical Symposium on Computer Science Education AB - Flipped courses often rely on pre-recorded videos that students are expected to watch before in-class time with the instructor. In this study, we investigated the video-watching behavior of students in a flipped CS1 programming course (n=490). We computed three behavioral metrics related to video watching: percentage of the videos watched, the number of times a video was opened to be watched, and when a video is watched with respect to the due date. We used k-medoids clustering on these metrics finding two distinct groups: 1) Low Video Engagement Group (53% of the students) watched 12% of the videos and 2) High Video Engagement Group (47% of the students) watched 75% of the videos. Analysis of these two different groups of engagement showed that students with prior programming experience watch fewer videos. We also found that students that watch more videos perform slightly better on summative assessments in the course. We discuss how regular video watching can be a key learning strategy for some but not all students in a flipped CS1 course, where some students can achieve good learning outcomes with minimal watching of the course videos. 
C2 - 2021/// C3 - SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education DA - 2021/// PY - 2021/3// DO - 10.1145/3408877.3432359 SP - 768-774 PB - ACM SN - 9781450380621 ER - TY - CONF TI - Increasing Women’s Persistence in Computer Science by Decreasing Gendered Self-Assessments of Computing Ability AU - Fisk, S. AU - Stolee, K. AU - Battestilli, L. T2 - 2021 ACM Conference on Innovation and Technology in Computer Science Education. C2 - 2021/// C3 - Proceedings of the 2021 ACM Conference on Innovation and Technology in Computer Science Education. DA - 2021/// PY - 2021/// PB - ACM ER - TY - JOUR TI - Detecting Framing Changes in Topical News AU - Sheshadri, Karthik AU - Shivade, Chaitanya AU - Singh, Munindar P. T2 - IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS AB - Changes in the framing of topical news are known to foreshadow significant public, legislative, and commercial events. Automated detection of framing changes is, therefore, an important problem, which existing research has not considered. Previous approaches are manual surveys that rely on human effort and are consequently limited in scope. This article systematizes the discovery of framing changes through a fully unsupervised computational method that seeks to isolate framing change trends over several years. We demonstrate our approach by isolating framing change periods that correlate with previously known framing changes. We have prepared a new data set, consisting of over 12 000 articles from seven news topics or domains, in which earlier surveys have found framing changes. Finally, our work highlights the predictive utility of framing change detection, by identifying two domains in which framing changes foreshadowed substantial legislative activity, or preceded judicial interest. 
DA - 2021/6// PY - 2021/6// DO - 10.1109/TCSS.2021.3063108 VL - 8 IS - 3 SP - 780-791 SN - 2329-924X UR - https://doi.org/10.1109/TCSS.2021.3063108 KW - Obesity KW - Market research KW - Benchmark testing KW - Media KW - Current measurement KW - Standards KW - Sociology KW - Framing KW - news media ER - TY - JOUR TI - Development and validation of the teachers' digital learning identity survey AU - Zimmer, Wendi K. AU - McTigue, Erin M. AU - Matsuda, Noboru T2 - INTERNATIONAL JOURNAL OF EDUCATIONAL RESEARCH AB - Research emphasizes teachers’ attitudes and methods towards classroom digital literacy (DL) integration with minimal studies accenting teachers’ attitudes towards personal DL use. Specifically, recognizing how teachers use DL to learn (i.e., their digital learning identity (DLI) — the identity developed from perceived DL competence). The Digital Learning Identity Survey (DLIS) was created to assist teachers in self-identification and recognition of their learning identity related to DL. This study investigates the reliability and validity of the DLIS with pre-service teachers using exploratory and confirmatory factor analyses. A correlation analysis was conducted to determine if survey items correlated logically and aligned with existing theory, including elements needed for digital identity development – DL, self-regulated learning, and motivation. Results found aspects of the DLIS validly measure DLI. DA - 2021/// PY - 2021/// DO - 10.1016/j.ijer.2020.101717 VL - 105 SP - SN - 1873-538X KW - Improving classroom teaching KW - 21st century abilities KW - Teacher professional development KW - Lifelong learning ER - TY - JOUR TI - Dynamic Graph Learning: A Structure-Driven Approach AU - Jiang, Bo AU - Huang, Yuming AU - Panahi, Ashkan AU - Yu, Yiyi AU - Krim, Hamid AU - Smith, Spencer L. T2 - MATHEMATICS AB - The purpose of this paper is to infer a dynamic graph as a global (collective) model of time-varying measurements at a set of network nodes. 
This model captures both pairwise as well as higher order interactions (i.e., more than two nodes) among the nodes. The motivation of this work lies in the search for a connectome model which properly captures brain functionality across all regions of the brain, and possibly at individual neurons. We formulate it as an optimization problem, a quadratic objective functional and tensor information of observed node signals over short time intervals. The proper regularization constraints reflect the graph smoothness and other dynamics involving the underlying graph’s Laplacian, as well as the time evolution smoothness of the underlying graph. The resulting joint optimization is solved by a continuous relaxation of the weight parameters and an introduced novel gradient-projection scheme. While the work may be applicable to any time-evolving data set (e.g., fMRI), we apply our algorithm to a real-world dataset comprising recorded activities of individual brain cells. The resulting model is shown to be not only viable but also efficiently computable. DA - 2021/1// PY - 2021/1// DO - 10.3390/math9020168 VL - 9 IS - 2 SP - SN - 2227-7390 KW - dynamic graph learning KW - graph signal processing KW - sparse signal KW - convex optimization ER - TY - JOUR TI - Two-Computer Pair Programming: Exploring a Feedback Intervention to improve Collaborative Talk in Elementary Students. AU - Zakaria, Zarifa AU - Vandenberg, Jessica AU - Tsan, Jennifer AU - Boulden, Danielle AU - Lynch, Collin F. AU - Boyer, Kristy Elizabeth AU - Wiebe, Eric T2 - COMPUTER SCIENCE EDUCATION AB - Background and Context: Researchers and practitioners have begun to incorporate collaboration in programming because of its reported instructional and professional benefits. However, younger students need guidance on how to collaborate in environments that require substantial interpersonal interaction and negotiation. 
Previous research indicates that feedback fosters students’ productive collaboration. Objective: This study employs an intervention to explore the role instructor-directed feedback plays on elementary students’ dyadic collaboration during 2-computer pair programming. Method: We used a multi-study design, collecting video data on students’ dyadic collaboration. Study 1 qualitatively explored dyadic collaboration by coding video transcripts of four dyads which guided the design of Study 2 that examined conversation of six dyads using MANOVA and non-parametric tests. Findings: Results from Study 2 showed that students receiving feedback used productive conversation categories significantly higher than the control condition in the sample group considered. Results are discussed in terms of group differences in specific conversation categories. Implications: Our study highlights ways to support students in pair programming contexts so that they can maximize the benefits afforded through these experiences. DA - 2021/// PY - 2021/// DO - 10.1080/08993408.2021.1877987 KW - Pair programming KW - collaboration KW - elementary school KW - feedback KW - intervention ER - TY - JOUR TI - Security Smells in Ansible and Chef Scripts: A Replication Study AU - Rahman, Akond AU - Rahman, Md Rayhanur AU - Parnin, Chris AU - Williams, Laurie T2 - ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY AB - Context: Security smells are recurring coding patterns that are indicative of security weakness and require further inspection. As infrastructure as code (IaC) scripts, such as Ansible and Chef scripts, are used to provision cloud-based servers and systems at scale, security smells in IaC scripts could be used to enable malicious users to exploit vulnerabilities in the provisioned systems. 
Goal: The goal of this article is to help practitioners avoid insecure coding practices while developing infrastructure as code scripts through an empirical study of security smells in Ansible and Chef scripts. Methodology: We conduct a replication study where we apply qualitative analysis with 1,956 IaC scripts to identify security smells for IaC scripts written in two languages: Ansible and Chef. We construct a static analysis tool called Security Linter for Ansible and Chef scripts (SLAC) to automatically identify security smells in 50,323 scripts collected from 813 open source software repositories. We also submit bug reports for 1,000 randomly selected smell occurrences. Results: We identify two security smells not reported in prior work: missing default in case statement and no integrity check. By applying SLAC we identify 46,600 occurrences of security smells that include 7,849 hard-coded passwords. We observe agreement for 65 of the responded 94 bug reports, which suggests the relevance of security smells for Ansible and Chef scripts amongst practitioners. Conclusion: We observe security smells to be prevalent in Ansible and Chef scripts, similarly to that of the Puppet scripts. We recommend practitioners to rigorously inspect the presence of the identified security smells in Ansible and Chef scripts using (i) code review, and (ii) static analysis tools. DA - 2021/1// PY - 2021/1// DO - 10.1145/3408897 VL - 30 IS - 1 ER - TY - JOUR TI - WiFi based Multi-User Gesture Recognition AU - Venkatnarayan, Raghav H. AU - Mahmood, Shakir AU - Shahzad, Muhammad T2 - IEEE TRANSACTIONS ON MOBILE COMPUTING AB - WiFi based gesture recognition has received significant attention over the past few years. However, the key limitation of prior WiFi based gesture recognition systems is that they cannot recognize the gestures of multiple users performing them simultaneously. 
In this article, we address this limitation and propose WiMU, a WiFi based Multi-User gesture recognition system. The key idea behind WiMU is that when it detects that some users have performed some gestures simultaneously, it first automatically determines the number of simultaneously performed gestures (Na) and then, using the training samples collected from a single user, generates virtual samples for various plausible combinations of Na gestures. The key property of these virtual samples is that the virtual samples for any given combination of gestures are identical to the real samples that would result from real users performing that combination of gestures. WiMU compares the detected sample against these virtual samples and recognizes the simultaneously performed gestures. We implemented and extensively evaluated WiMU using commodity WiFi devices. Our results show that WiMU recognizes 2, 3, 4, 5, 6, 7, and 8 simultaneously performed gestures with accuracies of 95.6, 94.9, 93.9, 92.7, 91.6, 91.0, and 90.1 percent, respectively. DA - 2021/3/1/ PY - 2021/3/1/ DO - 10.1109/TMC.2019.2954891 VL - 20 IS - 3 SP - 1242-1256 SN - 1558-0660 KW - Wireless fidelity KW - Gesture recognition KW - Training KW - Performance evaluation KW - Wireless communication KW - Wireless sensor networks ER - TY - JOUR TI - Foreword AU - Kaltofen, Erich L. T2 - JOURNAL OF SYMBOLIC COMPUTATION AB - We consider the problem of computing the nearest matrix polynomial with a non-trivial Smith Normal Form. We show that computing the Smith form of a matrix polynomial is amenable to numeric computation as an optimization problem. Furthermore, we describe an effective optimization technique to find a nearby matrix polynomial with a non-trivial Smith form. 
The results are then generalized to include the computation of a matrix polynomial having a maximum specified number of ones in the Smith Form (i.e., with a maximum specified McCoy rank). We discuss the geometry and existence of solutions and how our results can be used for an error analysis. We develop an optimization-based approach and demonstrate an iterative numerical method for computing a nearby matrix polynomial with the desired spectral properties. We also describe an implementation of our algorithms and demonstrate the robustness with examples in Maple. DA - 2021/// PY - 2021/// DO - 10.1016/j.jsc.2020.04.006 VL - 105 SP - 1-3 SN - 1095-855X ER - TY - JOUR TI - Modeling Secondary Students' Genetics Learning in a Game-Based Environment: Integrating the Expectancy-Value Theory of Achievement Motivation and Flow Theory AU - Rachmatullah, Arif AU - Reichsman, Frieda AU - Lord, Trudi AU - Dorsey, Chad AU - Mott, Bradford AU - Lester, James AU - Wiebe, Eric T2 - JOURNAL OF SCIENCE EDUCATION AND TECHNOLOGY DA - 2021/8// PY - 2021/8// DO - 10.1007/s10956-020-09896-8 VL - 30 IS - 4 SP - 511-528 SN - 1573-1839 KW - Outcome-expectancy KW - Flow theory KW - Game-based learning KW - Genetics KW - Self-efficacy ER - TY - JOUR TI - Software development with feature toggles: practices used by practitioners AU - Mahdavi-Hezaveh, Rezvan AU - Dremann, Jacob AU - Williams, Laurie T2 - EMPIRICAL SOFTWARE ENGINEERING AB - Background: Using feature toggles is a technique that allows developers to either turn a feature on or off with a variable in a conditional statement. Feature toggles are increasingly used by software companies to facilitate continuous integration and continuous delivery. However, using feature toggles inappropriately may cause problems which can have a severe impact, such as code complexity, dead code, and system failure. 
For example, the erroneous repurposing of an old feature toggle caused Knight Capital Group, an American global financial services firm, to go bankrupt due to the implications of the resultant incorrect system behavior. Aim: The goal of this research project is to aid software practitioners in the use of practices to support software development with feature toggles through an empirical study of feature toggle practice usage by practitioners. Method: We conducted a qualitative analysis of 99 artifacts from the grey literature and 10 peer-reviewed papers about feature toggles. We conducted a survey of practitioners from 38 companies. Results: We identified 17 practices in 4 categories: Management practices, Initialization practices, Implementation practices, and Clean-up practices. We observed that all of the survey respondents use a dedicated tool to create and manage feature toggles in their code. Documenting feature toggle's metadata, setting up the default value for feature toggles, and logging the changes made on feature toggles are also frequently-observed practices. Conclusions: The feature toggle development practices discovered and enumerated in this work can help practitioners more effectively use feature toggles. This work can enable future mining of code repositories to automatically identify feature toggle practices. DA - 2021/1/8/ PY - 2021/1/8/ DO - 10.1007/s10664-020-09901-z VL - 26 IS - 1 SP - SN - 1573-7616 KW - Continuous integration KW - Continuous delivery KW - Feature toggle KW - Practice ER - TY - JOUR TI - Robust Multi-Modal Sensor Fusion: An Adversarial Approach AU - Roheda, Siddharth AU - Krim, Hamid AU - Riggan, Benjamin S. T2 - IEEE SENSORS JOURNAL AB - In recent years, multi-modal fusion has attracted a lot of research interest, both in academia, and in industry. Multimodal fusion entails the combination of information from a set of different types of sensors. 
Exploiting complementary information from different sensors, we show that target detection and classification problems can greatly benefit from this fusion approach and result in a performance increase. To achieve this gain, the information fusion from various sensors is shown to require some principled strategy to ensure that additional information is constructively used, and has a positive impact on performance. We subsequently demonstrate the viability of the proposed fusion approach by weakening the strong dependence on the functionality of all sensors, hence introducing additional flexibility in our solution and lifting the severe limitation in unconstrained surveillance settings with potential environmental impact. Our proposed data driven approach to multimodal fusion, exploits selected optimal features from an estimated latent space of data across all modalities. This hidden space is learned via a generative network conditioned on individual sensor modalities. The hidden space, as an intrinsic structure, is then exploited in detecting damaged sensors, and in subsequently safeguarding the performance of the fused sensor system. Experimental results show that such an approach can achieve automatic system robustness against noisy/damaged sensors. DA - 2021/1/15/ PY - 2021/1/15/ DO - 10.1109/JSEN.2020.3018698 VL - 21 IS - 2 SP - 1885-1896 SN - 1558-1748 KW - Sensor fusion KW - Sensor phenomena and characterization KW - Generators KW - Sensor systems KW - Generative adversarial networks KW - Feature extraction KW - Multi-modal sensors KW - target detection KW - Generative Adversarial Networks (GAN) KW - Event Driven Fusion ER - TY - JOUR TI - On computing the degree of a Chebyshev Polynomial from its value AU - Imamoglu, Erdal AU - Kaltofen, Erich L. 
T2 - JOURNAL OF SYMBOLIC COMPUTATION AB - Algorithms for interpolating a polynomial f from its evaluation points whose running time depends on the sparsity t of the polynomial when it is represented as a linear combination of t Chebyshev Polynomials of the First Kind with non-zero scalar coefficients are given by Lakshman and Saunders (1995), Kaltofen and Lee (2003) and Arnold and Kaltofen (2015). The term degrees are computed from values of Chebyshev Polynomials of those degrees. We give an algorithm that computes those degrees in the manner of the Pohlig and Hellman algorithm (1978) for computing discrete logarithms modulo a prime number p when the factorization of p−1 (or p+1) has small prime factors, that is, when p−1 (or p+1) is smooth. Our algorithm can determine the Chebyshev degrees modulo such primes in bit complexity log(p)^{O(1)} times the square root of the largest prime factor of p−1 (or p+1). DA - 2021/// PY - 2021/// DO - 10.1016/j.jsc.2020.04.011 VL - 104 SP - 159-167 SN - 1095-855X KW - Algorithms KW - Discrete logarithms KW - Chebyshev Polynomials KW - Interpolation in terms of the Chebyshev KW - Polynomials of the First Kind ER - TY - JOUR TI - Sparse Interpolation With Errors in Chebyshev Basis Beyond Redundant-Block Decoding AU - Kaltofen, Erich L. AU - Yang, Zhi-Hong T2 - IEEE TRANSACTIONS ON INFORMATION THEORY AB - We present sparse interpolation algorithms for recovering a polynomial with ≤ B terms from N evaluations at distinct values for the variable when ≤ E of the evaluations can be erroneous. Our algorithms perform exact arithmetic in the field of scalars K and the terms can be standard powers of the variable or Chebyshev polynomials, in which case the characteristic of K is ≠ 2. Our algorithms return a list of valid sparse interpolants for the N support points and run in polynomial-time. 
For standard power basis our algorithms sample at N = ⌊4/3 E + 2⌋B points, which are fewer points than N = 2(E + 1)B - 1 given by Kaltofen and Pernet in 2014. For Chebyshev basis our algorithms sample at N = ⌊3/2E + 2⌋B points, which are also fewer than the number of points required by the algorithm given by Arnold and Kaltofen in 2015, which has N = 74⌊E/13 +1⌋ for B = 3 and E ≥ 222. Our method shows how to correct 2 errors in a block of 4B points for standard basis and how to correct 1 error in a block of 3B points for Chebyshev Basis. DA - 2021/1// PY - 2021/1// DO - 10.1109/TIT.2020.3027036 VL - 67 IS - 1 SP - 232-243 SN - 1557-9654 KW - Sparse polynomial interpolation KW - error correction KW - black box polynomial KW - list-decoding ER - TY - JOUR TI - Avoiding Help Avoidance: Using Interface Design Changes to Promote Unsolicited Hint Usage in an Intelligent Tutor (September, 10.1007/s40593-020-00213-3, 2020) AU - Maniktala, Mehak AU - Cody, Christa AU - Barnes, Tiffany AU - Chi, Min T2 - INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION AB - A Correction to this paper has been published: https://doi.org/10.1007/s40593-020-00232-0 DA - 2021/3// PY - 2021/3// DO - 10.1007/s40593-020-00232-0 VL - 31 IS - 1 SP - 154-155 SN - 1560-4306 ER - TY - JOUR TI - Predictive Student Modeling in Game-Based Learning Environments with Word Embedding Representations of Reflection AU - Geden, Michael AU - Emerson, Andrew AU - Carpenter, Dan AU - Rowe, Jonathan AU - Azevedo, Roger AU - Lester, James T2 - INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION DA - 2021/3// PY - 2021/3// DO - 10.1007/s40593-020-00220-4 VL - 31 IS - 1 SP - 1-23 SN - 1560-4306 KW - Student modeling KW - Early prediction KW - Game-based learning environments KW - Self-regulated learning KW - Reflection ER - TY - JOUR TI - LipschitzLR: Using theoretically computed adaptive learning rates for fast convergence AU - Yedida, Rahul AU - Saha, Snehanshu AU - Prashanth, Tejas T2 - APPLIED 
INTELLIGENCE AB - We present a novel theoretical framework for computing large, adaptive learning rates. Our framework makes minimal assumptions on the activations used and exploits the functional properties of the loss function. Specifically, we show that the inverse of the Lipschitz constant of the loss function is an ideal learning rate. We analytically compute formulas for the Lipschitz constant of several loss functions, and through extensive experimentation, demonstrate the strength of our approach using several architectures and datasets. In addition, we detail the computation of learning rates when other optimizers, namely, SGD with momentum, RMSprop, and Adam, are used. Compared to standard choices of learning rates, our approach converges faster, and yields better results. DA - 2021/3// PY - 2021/3// DO - 10.1007/s10489-020-01892-0 VL - 51 IS - 3 SP - 1460-1478 SN - 1573-7497 UR - https://doi.org/10.1007/s10489-020-01892-0 KW - Lipschitz constant KW - Adaptive learning KW - Machine learning KW - Deep learning ER - TY - JOUR TI - TADOC: Text analytics directly on compression AU - Zhang, Feng AU - Zhai, Jidong AU - Shen, Xipeng AU - Wang, Dalin AU - Chen, Zheng AU - Mutlu, Onur AU - Chen, Wenguang AU - Du, Xiaoyong T2 - VLDB JOURNAL AB - This article provides a comprehensive description of text analytics directly on compression (TADOC), which enables direct document analytics on compressed textual data. The article explains the concept of TADOC and the challenges to its effective realizations. Additionally, a series of guidelines and technical solutions that effectively address those challenges, including the adoption of a hierarchical compression method and a set of novel algorithms and data structure designs, are presented. Experiments on six data analytics tasks of various complexities show that TADOC can save 90.8% storage space and 87.9% memory usage, while halving data processing times. 
DA - 2021/3// PY - 2021/3// DO - 10.1007/s00778-020-00636-3 VL - 30 IS - 2 SP - 163-188 SN - 0949-877X KW - Text analytics KW - Document analytics KW - Compression KW - Sequitur ER - TY - JOUR TI - Efficient algorithms for finding 2-medians of a tree AU - Oudjit, Aissa AU - Stallmann, Matthias F. T2 - NETWORKS AB - The p‐median problem for networks is NP‐hard, but polynomial time algorithms exist for trees (n is the number of nodes): O(pn^2) by Tamir, and O(n lg^{p+2} n) by Benkoczi and Bhattacharya. Goldman gave an O(n) algorithm for the 1‐median problem on trees. Mirchandani and Oudjit proved localization properties for 2‐medians on trees; these were later used to obtain an O(n lg n) bound, and, in special cases, O(n). We present a framework that unifies all efficient algorithms for the 2‐median problem on trees. Our framework isolates the nonlinear part of the computation so that future time‐bound improvements are easily incorporated. We also introduce a method for reducing the search space, improving all known runtimes in many instances. Finally, we give a new algorithm for the case where edge lengths are positive integers. The associated time bound is O(n + D), where D is the sum of the logarithms of edge lengths. This is O(n) if edge lengths are bounded by a constant and O(n lg lg n) if they are O(lg n). The algorithm is flexible enough to extend to noninteger edge lengths, preserving the time bound in some circumstances. 
DA - 2021/4// PY - 2021/4// DO - 10.1002/net.21978 VL - 77 IS - 3 SP - 383-402 SN - 1097-0037 UR - https://doi.org/10.1002/net.21978 KW - 2-median KW - binary search KW - linear time KW - priority queue KW - sorting KW - trees ER - TY - JOUR TI - An Automatic Synthesizer of Advising Tools for High Performance Computing AU - Guan, Hui AU - Shen, Xipeng AU - Krim, Hamid T2 - IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS AB - This article presents Egeria, the first automatic synthesizer of advising tools for High-Performance Computing (HPC). When one provides it with some HPC programming guides as inputs, Egeria automatically constructs a text retrieval tool that can advise on what to do to improve the performance of a given program. The advising tool provides a concise list of essential rules automatically extracted from the documents and can retrieve relevant optimization knowledge for optimization questions. Egeria is built based on a distinctive multi-layered design that leverages natural language processing (NLP) techniques and extends them with HPC-specific knowledge and considerations. This article presents the design, implementation, and both quantitative and qualitative evaluation results of Egeria. DA - 2021/2/1/ PY - 2021/2/1/ DO - 10.1109/TPDS.2020.3018636 VL - 32 IS - 2 SP - 330-341 SN - 1558-2183 KW - Tools KW - Optimization KW - Programming KW - Syntactics KW - Semantics KW - Guidelines KW - Natural language processing KW - Performance tools KW - natural language processing KW - code optimization ER - TY - JOUR TI - Polynomial Treedepth Bounds in Linear Colorings AU - Kun, Jeremy AU - O'Brien, Michael P. AU - Pilipczuk, Marcin AU - Sullivan, Blair D. T2 - ALGORITHMICA AB - Low-treedepth colorings are an important tool for algorithms that exploit structure in classes of bounded expansion; they guarantee subgraphs that use few colors have bounded treedepth. 
These colorings have an implicit tradeoff between the total number of colors used and the treedepth bound, and prior empirical work suggests that the former dominates the run time of existing algorithms in practice. We introduce p-linear colorings as an alternative to the commonly used p-centered colorings. They can be efficiently computed in bounded expansion classes and use at most as many colors as p-centered colorings. Although a set of k < p colors from a p-centered coloring induces a subgraph of treedepth at most k, the same number of colors from a p-linear coloring may induce subgraphs of larger treedepth. We establish a polynomial upper bound on the treedepth in general graphs, and give tighter bounds in trees and interval graphs via constructive coloring algorithms. We also give a co-NP-completeness reduction for recognizing p-linear colorings and discuss ways to overcome this limitation in practice. DA - 2021/1// PY - 2021/1// DO - 10.1007/s00453-020-00760-0 VL - 83 IS - 1 SP - 361-386 SN - 1432-0541 UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85090215196&partnerID=MN8TOARS KW - Linear colorings KW - p-centered colorings KW - Bounded expansion KW - Treedepth ER -