TY - JOUR
TI - Zig-Zag Numberlink is NP-Complete
T2 - Journal of Information Processing
AB - When can t terminal pairs in an m × n grid be connected by t vertex-disjoint paths that cover all vertices of the grid? We prove that this problem is NP-complete. Our hardness result can be compared to two previous NP-hardness proofs: Lynch's 1975 proof without the “cover all vertices” constraint, and Kotsuma and Takenaga's 2010 proof when the paths are restricted to have the fewest possible corners within their homotopy class. The latter restriction is a common form of the famous Nikoli puzzle Numberlink. Our problem is another common form of Numberlink, sometimes called Zig-Zag Numberlink and popularized by the smartphone app Flow Free.
DA - 2015///
PY - 2015///
DO - 10.2197/ipsjjip.23.239
VL - 23
IS - 3
SP - 239-245
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84929412116&partnerID=MN8TOARS
ER -
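
The abstract above states the decision problem precisely: t terminal pairs in an m × n grid, t vertex-disjoint paths, and every grid vertex covered. As an illustration only (the function and the example instance below are ours, not the paper's), this Python sketch verifies a candidate solution against exactly those constraints, highlighting the "cover all vertices" requirement that distinguishes Zig-Zag Numberlink from Lynch's 1975 variant.

```python
# Hypothetical verifier for the Zig-Zag Numberlink decision problem described above:
# t terminal pairs in an m x n grid must be joined by t vertex-disjoint paths
# that together cover every cell. Names and structure are illustrative only.

def verify_zigzag_numberlink(m, n, pairs, paths):
    """pairs: list of ((r1, c1), (r2, c2)); paths: one list of cells per pair."""
    if len(pairs) != len(paths):
        return False
    covered = set()
    for (start, end), path in zip(pairs, paths):
        # Each path must begin and end at its terminal pair.
        if not path or path[0] != start or path[-1] != end:
            return False
        for cell in path:
            r, c = cell
            # Cells must lie inside the grid and must not be reused (vertex-disjointness).
            if not (0 <= r < m and 0 <= c < n) or cell in covered:
                return False
            covered.add(cell)
        # Consecutive cells must be orthogonal neighbors.
        for (r1, c1), (r2, c2) in zip(path, path[1:]):
            if abs(r1 - r2) + abs(c1 - c2) != 1:
                return False
    # The "cover all vertices" constraint: every grid cell is used.
    return len(covered) == m * n


if __name__ == "__main__":
    # A 2 x 2 instance with one terminal pair, solved by a single snaking path.
    print(verify_zigzag_numberlink(
        2, 2,
        pairs=[((0, 0), (1, 0))],
        paths=[[(0, 0), (0, 1), (1, 1), (1, 0)]],
    ))  # True
```
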
TY - RPRT
TI - Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
AU - Ding, Yufei
AU - Shen, Xipeng
AU - Musuvathi, Madanlal
AU - Mytkowicz, Todd
A3 - North Carolina State University
DA - 2015///
PY - 2015///
M1 - TR-2015-2
M3 - Technical Report
PB - North Carolina State University
SN - TR-2015-2
ER -

TY - RPRT
TI - TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems
AU - Ding, Yufei
AU - Shen, Xipeng
AU - Musuvathi, Madanlal
AU - Mytkowicz, Todd
A3 - North Carolina State University
DA - 2015///
PY - 2015///
M1 - TR-2015-3
M3 - Technical Report
PB - North Carolina State University
SN - TR-2015-3
ER -

TY - CONF
TI - TOP: A Framework for Enabling Algorithmic Optimizations for Distance-Related Problems
AU - Ding, Yufei
AU - Shen, Xipeng
AU - Musuvathi, Madan
AU - Mytkowicz, Todd
T2 - 41st International Conference on Very Large Data Bases
A2 - Li, Chen
A2 - Markl, Volker
C2 - 2015/8//
C3 - 41st International Conference on Very Large Data Bases (VLDB 2015): proceedings of the VLDB Endowment, volume 8, number 1-13, Kohala Coast, Hawaii, USA, 31 August-4 September 2015
CY - Kohala Coast, Hawaii
DA - 2015/8//
PY - 2015/8//
PB - VLDB Endowment
ER -

TY - CONF
TI - Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup
AU - Ding, Yufei
AU - Zhao, Yue
AU - Shen, Xipeng
AU - Musuvathi, Madan
AU - Mytkowicz, Todd
T2 - The 32nd International Conference on Machine Learning
C2 - 2015/7/6/
C3 - Proceedings of the 32nd International Conference on Machine Learning
CY - Lille, France
DA - 2015/7/6/
PY - 2015/7/6/
VL - 37
SP - 579-587
ER -

TY - CONF
TI - Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations
AU - Wu, Bo
AU - Chen, Guoyang
AU - Li, Dong
AU - Shen, Xipeng
AU - Vetter, Jeffrey
T2 - the 29th ACM
AB - A GPU's computing power lies in its abundant memory bandwidth and massive parallelism. However, its hardware thread schedulers, despite being able to quickly distribute computation to processors, often fail to capitalize on program characteristics effectively, achieving only a fraction of the GPU's full potential. Moreover, current GPUs do not allow programmers or compilers to control this thread scheduling, forfeiting important optimization opportunities at the program level. This paper presents a transformation centered on Streaming Multiprocessors (SM); this software approach to circumventing the limitations of the hardware scheduler allows flexible program-level control of scheduling. By permitting precise control of job locality on SMs, the transformation overcomes inherent limitations in prior methods.
C2 - 2015///
C3 - Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15
DA - 2015///
DO - 10.1145/2751205.2751213
PB - ACM Press
SN - 9781450335591
UR - http://dx.doi.org/10.1145/2751205.2751213
DB - Crossref
KW - GPGPU
KW - Scheduling
KW - Compiler Transformation
KW - Data Affinity
KW - Program Co-Run
ER -

TY - CONF
TI - Free launch
AU - Chen, Guoyang
AU - Shen, Xipeng
T2 - the 48th International Symposium
AB - Supporting dynamic parallelism is important for GPU to benefit a broad range of applications. There are currently two fundamental ways for programs to exploit dynamic parallelism on GPU: a software-based approach with software-managed worklists, and a hardware-based approach through dynamic subkernel launches. Neither is satisfactory. The former is complicated to program and is often subject to some load imbalance; the latter suffers large runtime overhead.
C2 - 2015///
C3 - Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48
DA - 2015///
DO - 10.1145/2830772.2830818
PB - ACM Press
SN - 9781450340342
UR - http://dx.doi.org/10.1145/2830772.2830818
DB - Crossref
KW - GPU
KW - Dynamic Parallelism
KW - Optimization
KW - Thread Reuse Compiler
KW - Runtime Adaptation
ER -

TY - CHAP
TI - Understanding Co-run Degradations on Integrated Heterogeneous Processors
AU - Zhu, Qi
AU - Wu, Bo
AU - Shen, Xipeng
AU - Shen, Li
AU - Wang, Zhiying
T2 - Languages and Compilers for Parallel Computing
PY - 2015///
DO - 10.1007/978-3-319-17473-0_6
SP - 82-97
OP -
PB - Springer International Publishing
SN - 9783319174723 9783319174730
UR - http://dx.doi.org/10.1007/978-3-319-17473-0_6
DB - Crossref
KW - Heterogeneous architecture
KW - Performance analysis
KW - CPU and memory contention
KW - Optimization
KW - GPGPU
ER -

TY - JOUR
TI - Hyperbolicity, Degeneracy, and Expansion of Random Intersection Graphs
AU - Farrell, Matthew
AU - Goodrich, Timothy D.
AU - Lemons, Nathan
AU - Reidl, Felix
AU - Villaamil, Fernando Sanchez
AU - Sullivan, Blair D.
T2 - ALGORITHMS AND MODELS FOR THE WEB GRAPH, (WAW 2015)
AB - We establish the conditions under which several algorithmically exploitable structural features hold for random intersection graphs, a natural model for many real-world networks where edges correspond to shared attributes. Specifically, we fully characterize the degeneracy of random intersection graphs, and prove that the model asymptotically almost surely produces graphs with hyperbolicity at least $\log n$. Further, we prove that when degenerate, the graphs generated by this model belong to a bounded-expansion graph class with high probability, a property particularly suitable for the design of linear time algorithms.
DA - 2015///
PY - 2015///
DO - 10.1007/978-3-319-26784-5_3
VL - 9479
SP - 29-41
SN - 1611-3349
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84951869780&partnerID=MN8TOARS
ER -

TY - JOUR
TI - On the Threshold of Intractability
AU - Drange, Pal Gronas
AU - Dregi, Markus Sortland
AU - Lokshtanov, Daniel
AU - Sullivan, Blair D.
T2 - ALGORITHMS - ESA 2015
AB - We study the computational complexity of the graph modification problems Threshold Editing and Chain Editing, adding and deleting as few edges as possible to transform the input into a threshold (or chain) graph. In this article, we show that both problems are NP-hard, resolving a conjecture by Natanzon, Shamir, and Sharan (2001). On the positive side, we show that these problems admit quadratic vertex kernels. Furthermore, we give a subexponential time parameterized algorithm solving Threshold Editing, making it one of relatively few natural problems in this complexity class on general graphs. These results are of broader interest to the field of social network analysis, where recent work of Brandes (2014) posits that the minimum edit distance to a threshold graph gives a good measure of consistency for node centralities. Finally, we show that all our positive results extend to Chain Editing, as well as the completion and deletion variants of both problems.
DA - 2015///
PY - 2015///
DO - 10.1007/978-3-662-48350-3_35
VL - 9294
SP - 411-423
SN - 1611-3349
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84945584240&partnerID=MN8TOARS
ER -

TY - JOUR
TI - On-the-Fly Principled Speculation for FSM Parallelization
AU - Zhao, Zhijia
AU - Shen, Xipeng
T2 - ACM SIGPLAN NOTICES
AB - Finite State Machine (FSM) is the backbone of an important class of applications in many domains. Its parallelization has been extremely difficult due to inherent strong dependences in the computation. Recently, principled speculation shows good promise to solve the problem. However, the reliance on offline training makes the approach inconvenient to adopt and hard to apply to many practical FSM applications, which often deal with a large variety of inputs different from training inputs. This work presents an assembly of techniques that completely remove the need for offline training. The techniques include a set of theoretical results on inherent properties of FSMs, and two newly designed dynamic optimizations for efficient FSM characterization. The new techniques, for the first time, make principled speculation applicable on the fly, and enable swift, automatic configuration of speculative parallelizations to best suit a given FSM and its current input. They eliminate the fundamental barrier to practical adoption of principled speculation for FSM parallelization. Experiments show that the new techniques give significantly higher speedups for some difficult FSM applications in the presence of input changes.
DA - 2015/4//
PY - 2015/4//
DO - 10.1145/2775054.2694369
VL - 50
IS - 4
SP - 619-630
SN - 1558-1160
KW - Languages
KW - Performance
KW - Finite State Machine
KW - FSM
KW - DFA
KW - Speculative Parallelization
KW - Multicore
KW - Online Profiling
ER -
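
The abstract above refers to speculative parallelization of FSM computations without spelling it out. As a minimal sketch under our own assumptions (it shows only the generic chunk, guess, and validate scheme, not the paper's principled, on-the-fly configuration, and all names are hypothetical), the Python below illustrates the idea the work builds on: cut the input into chunks, guess a start state for each later chunk, process the chunks independently, then validate the guesses and re-run any chunk whose guess was wrong.

```python
# Minimal sketch of speculative FSM parallelization, the general technique the
# abstract above builds on; this is NOT the paper's principled, on-the-fly
# configuration scheme, and all names here are ours.

def run_fsm(transition, state, chunk):
    """Run the FSM over one input chunk from the given state; return the end state."""
    for symbol in chunk:
        state = transition[state][symbol]
    return state

def speculative_fsm(transition, start_state, chunks, guess):
    """guess(i) proposes a start state for chunk i (i >= 1); chunk 0 uses start_state."""
    # Speculation phase: in a real implementation these calls run in parallel.
    guessed = [start_state] + [guess(i) for i in range(1, len(chunks))]
    ends = [run_fsm(transition, guessed[i], chunks[i]) for i in range(len(chunks))]

    # Validation phase: a wrong guess forces sequential re-execution of that chunk.
    state = start_state
    for i, chunk in enumerate(chunks):
        if guessed[i] != state:
            ends[i] = run_fsm(transition, state, chunk)  # misspeculation penalty
        state = ends[i]
    return state

if __name__ == "__main__":
    # Two-state FSM over {0, 1}: the state tracks the parity of 1s seen so far.
    t = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
    chunks = [[1, 0, 1], [1, 1], [0, 1]]
    print(speculative_fsm(t, 0, chunks, guess=lambda i: 0))  # 1, same as a sequential run
```
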
TY - JOUR
TI - Multi-Level Anomaly Detection on Time-Varying Graph Data
AU - Bridges, Robert A.
AU - Collins, John P.
AU - Ferragut, Erik M.
AU - Laska, Jason A.
AU - Sullivan, Blair D.
T2 - PROCEEDINGS OF THE 2015 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2015)
AB - This work presents a novel modeling and analysis framework for graph sequences which addresses the challenge of detecting and contextualizing anomalies in labeled, streaming graph data. We introduce a generalization of the BTER model of Seshadhri et al. by adding flexibility to community structure, and use this model to perform multi-scale graph anomaly detection. Specifically, probability models describing coarse subgraphs are built by aggregating node probabilities, and these related hierarchical models simultaneously detect deviations from expectation. This technique provides insight into the graphs' structure and helps contextualize detected events. For evaluation, two new hierarchical detectors are tested against a baseline Gaussian method on a synthetic graph sequence with seeded anomalies. We demonstrate that in a labeled setting with community structure, our graph statistics-based approach outperforms both a distribution-based detector and the baseline, accurately detecting anomalies at the node, subgraph, and graph levels.
DA - 2015///
PY - 2015///
DO - 10.1145/2808797.2809406
SP - 579-583
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84962580244&partnerID=MN8TOARS
ER -

TY - JOUR
TI - Statistical methods for combining information: Stryker family of vehicles reliability case study
AU - Dickinson, R.M.
AU - Freeman, L.J.
AU - Simpson, B.A.
AU - Wilson, A.G.
T2 - Journal of Quality Technology
AB - Problem: Reliability is an essential element in assessing the operational suitability of Department of Defense weapon systems. Reliability takes a prominent role in both the design and analysis of operational tests. In the current era of reduced budgets and increased reliability requirements, it is challenging to verify reliability requirements in a single test. Furthermore, all available data should be considered in order to ensure evaluations provide the most appropriate analysis of the system's reliability. Approach: This paper describes the benefits of using parametric statistical models to combine information across multiple testing events. Both frequentist and Bayesian inference techniques are employed, and they are compared and contrasted to illustrate different statistical methods for combining information. We apply these methods to data collected during the developmental and operational test phases for the Stryker family of vehicles. Results: We show that, when we combine the available information across two test phases for the Stryker family of vehicles, reliability estimates are more accurate and precise than those reported previously using traditional methods that use only operational test data in their reliability assessments.
DA - 2015///
PY - 2015///
DO - 10.1080/00224065.2015.11918142
VL - 47
IS - 4
SP - 400-415
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84976273973&partnerID=MN8TOARS
ER -

TY - JOUR
TI - Computing quality scores and uncertainty for approximate pattern matching in geospatial semantic graphs
AU - Stracuzzi, David J.
AU - Brost, Randy C.
AU - Phillips, Cynthia A.
AU - Robinson, David G.
AU - Wilson, Alyson G.
AU - Woodbridge, Diane M. -K.
T2 - STATISTICAL ANALYSIS AND DATA MINING
AB - Geospatial semantic graphs provide a robust foundation for representing and analyzing remote sensor data. In particular, they support a variety of pattern search operations that capture the spatial and temporal relationships among the objects and events in the data. However, in the presence of large data corpora, even a carefully constructed search query may return a large number of unintended matches. This work considers the problem of calculating a quality score for each match to the query, given that the underlying data are uncertain. We present a preliminary evaluation of three methods for determining both match quality scores and associated uncertainty bounds, illustrated in the context of an example based on overhead imagery data.
DA - 2015///
PY - 2015///
DO - 10.1002/sam.11294
VL - 8
IS - 5-6
SP - 340-352
SN - 1932-1872
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84944877405&partnerID=MN8TOARS
KW - uncertainty
KW - confidence intervals
KW - statistical models
KW - graphical models
KW - distance metric
KW - image interpretation
KW - graph search
ER -

TY - JOUR
TI - Why Do Simple Algorithms for Triangle Enumeration Work in the Real World?
AU - Berry, Jonathan W.
AU - Fostvedt, Luke A.
AU - Nordman, Daniel J.
AU - Phillips, Cynthia A.
AU - Seshadhri, C.
AU - Wilson, Alyson G.
T2 - INTERNET MATHEMATICS
AB - Listing all triangles is a fundamental graph operation. Triangles can have important interpretations in real-world graphs, especially social and other interaction networks. Despite the lack of provably efficient (linear, or slightly superlinear) worst-case algorithms for this problem, practitioners run simple, efficient heuristics to find all triangles in graphs with millions of vertices. How are these heuristics exploiting the structure of these special graphs to provide major speedups in running time? We study one of the most prevalent algorithms used by practitioners. A trivial algorithm enumerates all paths of length 2, and checks if each such path is incident to a triangle. A good heuristic is to enumerate only those paths of length 2 in which the middle vertex has the lowest degree. It is easily implemented and is empirically known to give remarkable speedups over the trivial algorithm. We study the behavior of this algorithm over graphs with heavy-tailed degree distributions, a defining feature of real-world graphs. The erased configuration model (ECM) efficiently generates a graph with asymptotically (almost) any desired degree sequence. We show that the expected running time of this algorithm over the distribution of graphs created by the ECM is controlled by the ℓ4/3-norm of the degree sequence. Norms of the degree sequence are a measure of the heaviness of the tail, and it is precisely this feature that allows nontrivial speedups of simple triangle enumeration algorithms. As a corollary of our main theorem, we prove expected linear-time performance for degree sequences following a power law with exponent α ≥ 7/3, and nontrivial speedup whenever α ∈ (2, 3).
DA - 2015/11/2/
PY - 2015/11/2/
DO - 10.1080/15427951.2015.1037030
VL - 11
IS - 6
SP - 555-571
SN - 1944-9488
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84953870563&partnerID=MN8TOARS
ER -
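
The abstract above describes the practitioners' heuristic concretely: enumerate only the length-2 paths whose middle vertex has the lowest degree, then test whether the endpoints are adjacent. The Python sketch below is our illustrative rendering of that heuristic (the names and the degree-then-label tie-break are our assumptions, not the paper's code); each triangle is reported exactly once, from its lowest-ranked vertex.

```python
# Sketch of the degree-based heuristic described in the abstract above: enumerate only
# length-2 paths (wedges) whose middle vertex has the lowest degree, then check
# whether the endpoints are adjacent. Names are illustrative, not from the paper.
from itertools import combinations

def triangles_low_degree_middle(adj):
    """adj: dict mapping each vertex to a set of its neighbors (undirected graph)."""
    # Rank vertices by degree, breaking ties by label, so every triangle is
    # charged to exactly one wedge: the one centered at its lowest-ranked vertex.
    rank = {v: (len(adj[v]), v) for v in adj}
    triangles = []
    for v in adj:
        # Keep only neighbors ranked above the middle vertex v.
        higher = [u for u in adj[v] if rank[u] > rank[v]]
        for u, w in combinations(higher, 2):
            if w in adj[u]:                      # endpoints adjacent -> triangle
                triangles.append((v, u, w))
    return triangles

if __name__ == "__main__":
    # Two triangles sharing the edge (1, 2): {1, 2, 3} and {1, 2, 4}.
    adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}
    print(triangles_low_degree_middle(adj))
```
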
TY - JOUR
TI - Enabling Portable Optimizations of Data Placement on GPU
AU - Chen, Guoyang
AU - Wu, Bo
AU - Li, Dong
AU - Shen, Xipeng
T2 - IEEE Micro
AB - Modern GPU memory systems exhibit growing variety, increasing complexity, and rapid change. Different placements of data on memory systems often cause significant differences in program performance. Most current GPU programming systems rely on programmers to indicate the appropriate placements, but finding the appropriate placements is difficult for programmers in practice owing to the complexity and fast changes of memory systems, as well as the input sensitivity of appropriate data placements--that is, the best placements often differ when a program runs on a different input data set. This article introduces a software framework called Porple. It offers a solution that, for the first time, makes it possible to automatically enhance data placement across a GPU. Through Porple, a GPU program's data gets placed appropriately in memory on the fly, customized to the current input dataset. Moreover, when new memory systems arrive, it can easily adapt the placements accordingly. Experiments on three types of GPU systems show that Porple consistently finds optimal or near-optimal placement, yielding up to 2.93 times (1.75 times average on three generations of GPU) speedups compared to programmers' decisions.
DA - 2015/7//
PY - 2015/7//
DO - 10.1109/mm.2015.53
VL - 35
IS - 4
SP - 16-24
J2 - IEEE Micro
OP -
SN - 0272-1732 1937-4143
UR - http://dx.doi.org/10.1109/MM.2015.53
DB - Crossref
ER -

TY - JOUR
TI - Autotuning Algorithmic Choice for Input Sensitivity
AU - Ding, Yufei
AU - Ansel, Jason
AU - Veeramachaneni, Kalyan
AU - Shen, Xipeng
AU - O'Reilly, Una-May
AU - Amarasinghe, Saman
T2 - ACM SIGPLAN NOTICES
AB - A daunting challenge faced by program performance autotuning is input sensitivity, where the best autotuned configuration may vary with different input sets. This paper presents a novel two-level input learning algorithm to tackle the challenge for an important class of autotuning problems, algorithmic autotuning. The new approach uses a two-level input clustering method to automatically refine input grouping, feature selection, and classifier construction. Its design solves a series of open issues that are particularly essential to algorithmic autotuning, including the enormous optimization space, complex influence by deep input features, high cost in feature extraction, and variable accuracy of algorithmic choices. Experimental results show that the new solution yields up to a 3x speedup over using a single configuration for all inputs, and a 34x speedup over a traditional one-level method for addressing input sensitivity in program optimizations.
DA - 2015/6//
PY - 2015/6//
DO - 10.1145/2813885.2737969
VL - 50
IS - 6
SP - 379-390
SN - 1558-1160
KW - Algorithms
KW - Languages
KW - Performance
KW - Petabricks
KW - Autotuning
KW - Algorithmic Optimization
KW - Input Adaptive
KW - Input Sensitivity
KW - Two-level Input Learning
ER -

TY - CONF
TI - Understanding co-run degradations on integrated heterogeneous processors
AU - Zhu, Q.
AU - Wu, B.
AU - Shen, X. P.
AU - Shen, L.
AU - Wang, Z. Y.
C2 - 2015///
C3 - Languages and Compilers for Parallel Computing (LCPC 2014)
DA - 2015///
VL - 8967
SP - 82-97
ER -