Ping Xiang

Xiang, P., Yang, Y., & Zhou, H. (2014). Warp-level divergence in GPUs: Characterization, impact, and mitigation. International Symposium on High-Performance Computer Architecture (HPCA), 284–295. https://doi.org/10.1109/hpca.2014.6835939

Gupta, S., Xiang, P., Yang, Y., & Zhou, H. (2013). Locality principle revisited: A probability-based quantitative approach. Journal of Parallel and Distributed Computing, 73(7), 1011–1027. https://doi.org/10.1016/j.jpdc.2013.01.010

Yang, Y., Xiang, P., Mantor, M., & Zhou, H. Y. (2012). CPU-assisted GPGPU on fused CPU-GPU architectures. International Symposium on High-Performance Computer Architecture (HPCA), 103–114.

Gupta, S., Xiang, P., Yang, Y., & Zhou, H. (2012). Locality principle revisited: A probability-based quantitative approach. 2012 IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS), 995–1009. https://doi.org/10.1109/ipdps.2012.93

Yang, Y., Xiang, P., Kong, J., & Zhou, H. (2010). An optimizing compiler for GPGPU programs with input-data sharing. PPoPP 2010: Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 343–344. https://doi.org/10.1145/1693453.1693505