TY - RPRT
TI - Towards Ontology-Based Program Analysis
AU - Zhao, Y.
AU - Liao, C.
AU - Shen, X.
A3 - North Carolina State University
DA - 2016/4/28/
PY - 2016/4/28/
M1 - TR-2016-5
M3 - Technical Report
PB - North Carolina State University
SN - TR-2016-5
ER -

TY - RPRT
TI - LCD: A Fast Contrastive Divergence Based Training Algorithm for Restricted Boltzmann Machine
AU - Ning, Lin
AU - Shen, Xipeng
A3 - North Carolina State University
DA - 2016/4/8/
PY - 2016/4/8/
M1 - TR-2016-3
M3 - Technical Report
PB - North Carolina State University
SN - TR-2016-3
ER -

TY - CONF
TI - Towards Ontology-Based Program Analysis
AU - Zhao, Yue
AU - Chen, Guoyang
AU - Liao, Chunhua
AU - Shen, Xipeng
A2 - Krishnamurthi, Shriram
A2 - Lerner, Benjamin S.
T3 - Leibniz International Proceedings in Informatics (LIPIcs)
C2 - 2016///
C3 - 30th European Conference on Object-Oriented Programming (ECOOP 2016)
DA - 2016///
SP - 26:1--26:25
PB - Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik
ER -

TY - RPRT
TI - A Software Framework for Efficient Preemptive Scheduling on GPU
AU - Chen, Guoyang
AU - Shen, Xipeng
AU - Zhou, Huiyang
A3 - North Carolina State University
DA - 2016/1//
PY - 2016/1//
M1 - TR-2016-1
M3 - Technical Report
PB - North Carolina State University
SN - TR-2016-1
ER -

TY - BOOK
TI - Languages and Compilers for Parallel Computing
A3 - Shen, Xipeng
A3 - Mueller, Frank
A3 - Tuck, James
AB - This book constitutes the thoroughly refereed post-conference proceedings of the 28th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2015, held in Raleigh, NC, USA, in September 2015.
DA - 2016///
PY - 2016///
DO - 10.1007/978-3-319-29778-1
PB - Springer International Publishing
SN - 9783319297774 9783319297781
UR - http://dx.doi.org/10.1007/978-3-319-29778-1
DB - Crossref
ER -

TY - CONF
TI - Coherence-Free Multiview
AU - Chen, Guoyang
AU - Shen, Xipeng
T2 - the 2016 International Conference
AB - A Graphics Processing Unit (GPU) system is typically equipped with many types of memory (e.g., global, constant, texture, shared, cache). Data placement determines what data are placed on which type of memory, essential for GPU memory performance. Prior optimizations of data placement always require a single view of a data object on memory, which limits the optimization effectiveness. In this work, we propose coherence-free multiview, an approach that allows multiple views of a single data object to co-exist on GPU memory during a GPU kernel execution. We demonstrate that under certain conditions, the multiple views can remain incoherent while facilitating enhanced data placement. We present a theorem and some compiler support to ensure the soundness of the usage of coherence-free multiview. We further develop reference-discerning data placement, a new way to enhance data placements on GPU. It enables more flexible data placements by using coherence-free multiview to leverage the slack in coherence requirement of some GPU programs. Experiments on three types of GPU systems show that, with less than 200KB space cost, the new data placement technique can provide a 1.6X average (up to 4.27X) speedup.
C2 - 2016///
C3 - Proceedings of the 2016 International Conference on Supercomputing - ICS '16
DA - 2016///
DO - 10.1145/2925426.2926277
PB - ACM Press
SN - 9781450343619
UR - http://dx.doi.org/10.1145/2925426.2926277
DB - Crossref
ER -

TY - JOUR
TI - Bayesian Reliability: Combining Information
AU - Wilson, Alyson G.
AU - Fronczyk, Kassandra M.
T2 - Quality Engineering
AB - One of the most powerful features of Bayesian analyses is the ability to combine multiple sources of information in a principled way to perform inference. This feature can be particularly valuable in assessing the reliability of systems where testing is limited. At their most basic, Bayesian methods for reliability develop informative prior distributions using expert judgment or similar systems. Appropriate models allow the incorporation of many other sources of information, including historical data, information from similar systems, and computer models. We introduce the Bayesian approach to reliability using several examples and point to open problems and areas for future work.
DA - 2016/8/26/
PY - 2016/8/26/
DO - 10.1080/08982112.2016.1211889
SP - 0-0
J2 - Quality Engineering
LA - en
SN - 0898-2112 1532-4222
UR - http://dx.doi.org/10.1080/08982112.2016.1211889
DB - Crossref
KW - assurance testing
KW - hierarchical model
KW - Markov chain Monte Carlo
KW - posterior distribution
KW - prior distribution
ER -

TY - JOUR
TI - A multi-level anomaly detection algorithm for time-varying graph data with interactive visualization
AU - Bridges, Robert A.
AU - Collins, John
AU - Ferragut, Erik M.
AU - Laska, Jason
AU - Sullivan, Blair D.
T2 - Social Network Analysis and Mining
AB - This work presents a modeling and analysis framework for graph sequences which addresses the challenge of detecting and contextualizing anomalies in streaming graph data. Our goal is to detect changes at multiple levels of granularity, thereby identifying specific nodes and subgraphs causing a graph to appear anomalously. In particular, the framework detects changes in community membership, density, and node degree in a sequence of graphs where these are relatively stable. En route to this end, we introduce a new graph model, a generalization of the BTER model of Seshadhri et al., by adding flexibility to community structure, and use this model to perform multi-scale graph anomaly detection. This technique provides insight into a graph’s structure and internal context that may shed light on a detected event. Additionally, this multi-scale analysis facilitates intuitive visualizations by allowing users to narrow focus from an anomalous graph to particular subgraphs or nodes causing the anomaly. For evaluation, two hierarchical anomaly detectors are tested against a baseline Gaussian method on a series of sampled graphs. We demonstrate that our graph statistics-based approach outperforms both a distribution-based detector and the baseline in a labeled setting with community structure, and it accurately detects anomalies in synthetic and real-world datasets at the node, subgraph, and graph levels. To illustrate the accessibility of information made possible via this technique, the anomaly detector and an associated interactive visualization tool are tested on NCAA football data, where teams and conferences that moved within the league are identified with perfect recall, and precision >0.786.
DA - 2016/10/20/
PY - 2016/10/20/
DO - 10.1007/S13278-016-0409-Y
VL - 6
IS - 1
J2 - Soc. Netw. Anal. Min.
LA - en
SN - 1869-5450 1869-5469
UR - http://dx.doi.org/10.1007/S13278-016-0409-Y
DB - Crossref
KW - Anomaly detection
KW - Graph sequence
KW - Visualization
ER -

TY - JOUR
TI - Incorporating concomitant medications into genome-wide analyses for the study of complex disease and drug response
AU - Graham, H. T.
AU - Rotroff, D. M.
AU - Marvel, S. W.
AU - Buse, J. B.
AU - Havener, T. M.
AU - Wilson, A. G.
AU - Wagner, M. J.
AU - Motsinger-Reif, A. A.
T2 - Frontiers in Genetics
DA - 2016///
PY - 2016///
VL - 7
ER -

TY - CONF
TI - Asymptotic analysis of equivalences and core-structures in Kronecker-style graph models
AU - Chin, A. J.
AU - Goodrich, T. D.
AU - O'Brien, M. P.
AU - Reidl, F.
AU - Sullivan, Blair D.
AU - Poel, A.
AB - Growing interest in modeling large, complex networks has spurred significant research into generative graph models. Kronecker-style models (e.g., SKG and R-MAT) are often used due to their scalability and ability to mimic key properties of real-world networks. Although a few papers theoretically establish these models' behavior for specific parameters, many claims used to justify their use are supported only empirically. In this work, we prove several results using asymptotic analysis which illustrate that empirical studies may not fully capture the true behavior of the models. Paramount to the widespread adoption of Kronecker-style models was the introduction of a linear-time edge-sampling variant (R-MAT), which existing literature typically treats as interchangeable with SKG. We prove that although several R-MAT formulations are asymptotically equivalent, their behavior diverges from that of SKG. Further, we show these results are observable even at relatively small graph sizes. Second, we consider a case where asymptotic analysis reveals unexpected behavior within a given model.
C2 - 2016///
C3 - 2016 IEEE 16th International Conference on Data Mining (ICDM)
DA - 2016///
DO - 10.1109/icdm.2016.0098
SP - 829–834
ER -

TY - JOUR
TI - OpenCL-based erasure coding on heterogeneous architectures
AU - Chen, Guoyang
AU - Zhou, Huiyang
AU - Shen, Xipeng
AU - Gahm, Josh
AU - Venkat, Narayan
AU - Booth, Skip
AU - Marshall, John
T2 - 2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
AB - Erasure coding, Reed-Solomon coding in particular, is a key technique to deal with failures in scale-out storage systems. However, due to the algorithmic complexity, the performance overhead of erasure coding can become a significant bottleneck in storage systems attempting to meet service level agreements (SLAs). Previous work has mainly leveraged SIMD (single-instruction multiple-data) instruction extensions in general purpose processors to improve the processing throughput. In this work, we exploit state-of-the-art heterogeneous architectures, including GPUs, APUs, and FPGAs, to accelerate erasure coding. We leverage the OpenCL framework for our target heterogeneous architectures and propose code optimizations for each target architecture. Given their different hardware characteristics, we highlight the different optimization strategies for each of the target architectures. Using the throughput metric as the ratio of the input file size over the processing latency, we achieve 2.84 GB/s on a 28-core Xeon CPU, 3.90 GB/s on an NVIDIA K40m GPU, 0.56 GB/s on an AMD Carrizo APU, and 1.19 GB/s (5.35 GB/s if only considering the kernel execution latency) on an Altera Stratix V FPGA, when processing an 836.9MB zipped file with a 30x33 encoding matrix. In comparison, the single-thread code using Intel's ISA-L library running on the Xeon CPU has a throughput of 0.13 GB/s.
DA - 2016/7//
PY - 2016/7//
DO - 10.1109/asap.2016.7760770
VL - 7
SP - 33–40
ER -

TY - CONF
TI - Cross-institutional research cyberinfrastructure for data intensive science
AU - Lenhardt, W. C.
AU - Conway, M.
AU - Scott, E.
AU - Blanton, B.
AU - Krishnamurthy, A.
AU - Hadzikadic, M.
AU - Vouk, M.
AU - Wilson, Alyson
AB - This paper describes a multi-institution effort to develop a “data science as a service” platform. This platform integrates advanced federated data management for small to large datasets, access to high performance computing, distributed computing and advanced networking. The goal is to develop a platform that is flexible and extensible while still supporting domain research and avoiding the walled garden problem. Some preliminary lessons learned and next steps will also be outlined.
C2 - 2016///
C3 - 2016 IEEE High Performance Extreme Computing Conference (HPEC)
DA - 2016///
DO - 10.1109/hpec.2016.7761597
ER -

TY - JOUR
TI - Tree decompositions and social graphs
AU - Adcock, Aaron B.
AU - Sullivan, Blair D.
AU - Mahoney, Michael W.
T2 - Internet Mathematics
AB - Recent work has established that large informatics graphs such as social and information networks have non-trivial tree-like structure when viewed at moderate size scales. Here, we present results from the first detailed empirical evaluation of the use of tree decomposition (TD) heuristics for structure identification and extraction in social graphs. Although TDs have historically been used in structural graph theory and scientific computing, we show that—even with existing TD heuristics developed for those very different areas—TD methods can identify interesting structure in a wide range of realistic informatics graphs. Our main contributions are the following: we show that TD methods can identify structures that correlate strongly with the core-periphery structure of realistic networks, even when using simple greedy heuristics; we show that the peripheral bags of these TDs correlate well with low-conductance communities (when they exist) found using local spectral computations; and we show that several types of large-scale “ground-truth” communities, defined by demographic metadata on the nodes of the network, are well-localized in the large-scale and/or peripheral structures of the TDs. Our other main contributions are the following: we provide detailed empirical results for TD heuristics on toy and synthetic networks to establish a baseline to understand better the behavior of the heuristics on more complex real-world networks; and we prove a theorem providing formal justification for the intuition that the only two impediments to low-distortion hyperbolic embedding are high tree-width and long geodesic cycles. Our results suggest future directions for improved TD heuristics that are more appropriate for realistic social graphs.
DA - 2016///
PY - 2016///
DO - 10.1080/15427951.2016.1182952
VL - 12
IS - 5
SP - 315-361
SN - 1944-9488
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84978531731&partnerID=MN8TOARS
ER -

TY - JOUR
TI - Examining and Reducing the Influence of Sampling Errors on Feedback-Driven Optimizations
AU - Zhou, Mingzhou
AU - Wu, Bo
AU - Shen, Xipeng
AU - Gao, Yaoqing
AU - Yiu, Graham
T2 - ACM Transactions on Architecture and Code Optimization
AB - Feedback-driven optimization (FDO) is an important component in mainstream compilers. By allowing the compiler to reoptimize the program based on some profiles of the program's dynamic behaviors, it often enhances the quality of the generated code substantially. A barrier for using FDO is that it often requires many training runs to collect enough profiles to amortize the sensitivity of program optimizations to program input changes.
Various sampling techniques have been explored to alleviate this time-consuming process. However, the lowered profile accuracy caused by sampling often hurts the benefits of FDO. This article gives the first systematic study of how sampling rates affect the accuracy of collected profiles and how the accuracy correlates with the usefulness of the profile for modern FDO. Studying basic block and edge profiles for FDO in two mature compilers reveals several counterintuitive observations, one of which is that profiling accuracy does not strongly correlate with the benefits of FDO. A detailed analysis identifies three types of sampling-caused errors that critically impair the quality of the profiles for FDO. It then introduces a simple way to rectify profiles based on the findings. Experiments demonstrate that the simple rectification fixes most of those critical errors in sampled profiles and significantly enhances the effectiveness of FDO.
DA - 2016/4//
PY - 2016/4//
DO - 10.1145/2851502
VL - 13
IS - 1
SN - 1544-3973
KW - Compiler
KW - Profiling
KW - Feedback-Driven Optimization (FDO)
KW - Performance
KW - Input Sensitivity
KW - influence of sampling errors
KW - feedback-driven optimization
ER -

TY - JOUR
TI - Tuning for software analytics: Is it really necessary?
AU - Fu, Wei
AU - Menzies, Tim
AU - Shen, Xipeng
T2 - Information and Software Technology
AB - Context: Data miners have been widely used in software engineering to, say, generate defect predictors from static code measures. Such static code defect predictors perform well compared to manual methods, and they are easy to use and useful to use. But one of the “black arts” of data mining is setting the tunings that control the miner. Objective: We seek a simple, automatic, and very effective method for finding those tunings. Method: For each experiment with different data sets (from open source Java systems), we ran differential evolution as an optimizer to explore the tuning space (as a first step) then tested the tunings using hold-out data. Results: Contrary to our prior expectations, we found these tunings were remarkably simple: it only required tens, not thousands, of attempts to obtain very good results. For example, when learning software defect predictors, this method can quickly find tunings that alter detection precision from 0% to 60%. Conclusion: Since (1) the improvements are so large, and (2) the tuning is so simple, we need to change standard methods in software analytics. At least for defect prediction, it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study. The implication for other kinds of analytics is now an open and pressing issue.
DA - 2016/8//
PY - 2016/8//
DO - 10.1016/j.infsof.2016.04.017
VL - 76
SP - 135-146
J2 - Information and Software Technology
LA - en
SN - 0950-5849
UR - http://dx.doi.org/10.1016/j.infsof.2016.04.017
DB - Crossref
KW - Defect prediction
KW - CART
KW - Random forest
KW - Differential evolution
KW - Search-based software engineering
ER -

TY - JOUR
TI - Use of Bayesian inference in crystallographic structure refinement via full diffraction profile analysis
AU - Fancher, C.M.
AU - Han, Z.
AU - Levin, I.
AU - Page, K.
AU - Reich, B.J.
AU - Smith, R.C.
AU - Wilson, Alyson
AU - Jones, J.L.
T2 - Scientific Reports
AB - A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo.
Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method.
DA - 2016///
PY - 2016///
DO - 10.1038/srep31625
VL - 6
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84983527047&partnerID=MN8TOARS
ER -