TY - CONF
TI - Characterization of Sweetpotato Inheritance Using Ultradense Multilocus Genetic Map
AU - Mollinari, M.
AU - Bode, A.O.
AU - Pereira, G.S.
AU - Gemenet, D.C.
AU - Khan, A.
AU - Yencho, G.C.
AU - Zeng, Z.-B.
T2 - International Plant & Animal Genome XXVIII Conference
C2 - 2020///
C3 - International Plant & Animal Genome XXVIII Conference
DA - 2020///
ER -
TY - JOUR
TI - fullsibQTL: An R package for QTL mapping in biparental populations of outcrossing species
AU - Gazaffi, R.
AU - Amadeu, R.R.
AU - Mollinari, M.
AU - Rosa, J.R.B.F.
AU - Taniguti, C.H.
AU - Margarido, G.R.A.
AU - Garcia, A.A.F.
T2 - bioRxiv
AB - Accurate QTL mapping in outcrossing species requires software that considers the genetic features of these populations, such as markers with different segregation patterns and different levels of information. Although the mapping procedures available to date allow inferring QTL position and effects, most are not based on multilocus genetic maps. Basing QTL analysis on such maps is crucial because they allow informative markers to propagate their information to less informative intervals of the map. We developed fullsibQTL, a novel and freely available R package to perform composite interval QTL mapping considering outcrossing populations and markers with different segregation patterns. It allows estimation of QTL position, effects, segregation patterns, and linkage phase with flanking markers. Additionally, several statistical and graphical tools are implemented for straightforward analysis and interpretation. fullsibQTL is an open-source R package with C and R source code (GPLv3). It is multiplatform and can be installed from https://github.com/augusto-garcia/fullsibQTL.
DA - 2020///
PY - 2020///
DO - 10.1101/2020.12.04.412262
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85099897031&partnerID=MN8TOARS
ER -
TY - JOUR
TI - Quantitative trait locus mapping for common scab resistance in a tetraploid potato full-sib population
AU - Silva Pereira, G.
AU - Mollinari, M.
AU - Qu, X.
AU - Thill, C.
AU - Zeng, Z.-B.
AU - Haynes, K.
AU - Yencho, G.C.
T2 - bioRxiv
AB - Despite the negative impact of common scab (Streptomyces spp.) on the potato industry, little is known about the genetic architecture of resistance to this bacterial disease in the crop. We evaluated a mapping population (~150 full-sibs) derived from a cross between two tetraploid potatoes (‘Atlantic’ × B1829-5) in three environments (MN11, PA11, ME12) under natural common scab pressure. Three measures of common scab reaction were assessed, namely percentage of scabby tubers and disease area and lesion indices, which were highly correlated (>0.76). Due to a large environmental effect, heritability values were zero for all three traits in MN11, but moderate to high in PA11 and ME12 (0.44 to 0.79). We identified a single quantitative trait locus (QTL) for lesion index in PA11, ME12 and joint analyses on linkage group 3, explaining 22 to 30% of the total variation. The identification of QTL haplotypes and candidate genes contributing to disease resistance can support genomics-assisted breeding approaches.
DA - 2020///
PY - 2020///
DO - 10.1101/2020.10.24.353557
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85098810824&partnerID=MN8TOARS
ER -
TY - JOUR
TI - Mercury exposure in relation to sleep duration, timing, and fragmentation among adolescents in Mexico City
AU - Jansen, E.C.
AU - Hector, Emily C.
AU - Goodrich, J.M.
AU - Cantoral, A.
AU - Téllez-Rojo, M.M.
AU - Basu, N.
AU - Song, P.X.-K.
AU - Torres-Olascoaga, L.
AU - Peterson, K.E.
T2 - Environmental Research
DA - 2020///
PY - 2020///
VL - 191
SP - 110216
ER -
TY - JOUR
TI - Doubly distributed supervised learning and inference with high-dimensional correlated outcomes
AU - Hector, Emily C.
AU - Song, Peter X.-K.
T2 - Journal of Machine Learning Research
DA - 2020///
PY - 2020///
VL - 21
SP - 1-35
ER -
TY - JOUR
TI - The value of summary statistics for anomaly detection in temporally evolving networks: A performance evaluation study
AU - Kodali, Lata
AU - Sengupta, Srijan
AU - House, Leanna
AU - Woodall, William H
T2 - Applied Stochastic Models in Business and Industry
DA - 2020///
PY - 2020///
VL - 36
IS - 6
SP - 980-1013
ER -
TY - JOUR
TI - Scalable estimation of epidemic thresholds via node sampling
AU - Dasgupta, Anirban
AU - Sengupta, Srijan
T2 - arXiv preprint arXiv:2007.14820
DA - 2020///
PY - 2020///
ER -
TY - JOUR
TI - Online Social Deception and Its Countermeasures: A Survey
AU - Guo, Zhen
AU - Cho, Jin-Hee
AU - Chen, Ray
AU - Sengupta, Srijan
AU - Hong, Michin
AU - Mitra, Tanushree
T2 - IEEE Access
DA - 2020///
PY - 2020///
ER -
TY - JOUR
TI - Improved understanding and prediction of freshwater fish communities through the use of joint species distribution models
AU - Wagner, Tyler
AU - Hansen, Gretchen J.A.
AU - Schliep, Erin M.
AU - Bethke, Bethany J.
AU - Honsey, Andrew E.
AU - Jacobson, Peter C.
AU - Kline, Benjamen C.
AU - White, Shannon L.
T2 - Canadian Journal of Fisheries and Aquatic Sciences
AB - Two primary goals in fisheries research are to (i) understand how habitat and environmental conditions influence the distribution of fishes across the landscape and (ii) make predictions about how fish communities will respond to environmental and anthropogenic change. In inland, freshwater ecosystems, quantitative approaches traditionally used to accomplish these goals largely ignore the effects of species interactions (competition, predation, mutualism) on shaping community structure, potentially leading to erroneous conclusions regarding habitat associations and unrealistic predictions about species distributions. Using two contrasting case studies, we highlight how joint species distribution models (JSDMs) can address the aforementioned deficiencies by simultaneously quantifying the effects of abiotic habitat variables and species dependencies. In particular, we show that conditional predictions of species occurrence from JSDMs can better predict species presence or absence compared with predictions that ignore species dependencies. JSDMs also allow for the estimation of site-specific probabilities of species co-occurrence, which can be informative for generating hypotheses about species interactions. JSDMs provide a flexible framework that can be used to address a variety of questions in fisheries science and management.
DA - 2020/9//
PY - 2020/9//
DO - 10.1139/cjfas-2019-0348
VL - 77
IS - 9
SP - 1540-1551
J2 - Can. J. Fish. Aquat. Sci.
LA - en
SN - 0706-652X 1205-7533
UR - http://dx.doi.org/10.1139/cjfas-2019-0348
DB - Crossref
ER -
TY - JOUR
TI - Ecological prediction at macroscales using big data: Does sampling design matter?
AU - Soranno, Patricia A.
AU - Cheruvelil, Kendra Spence
AU - Liu, Boyang
AU - Wang, Qi
AU - Tan, Pang‐Ning
AU - Zhou, Jiayu
AU - King, Katelyn B. S.
AU - McCullough, Ian M.
AU - Stachelek, Jemma
AU - Bartley, Meridith
AU - Filstrup, Christopher T.
AU - Hanks, Ephraim M.
AU - Lapierre, Jean‐François
AU - Lottig, Noah R.
AU - Schliep, Erin M.
AU - Wagner, Tyler
AU - Webster, Katherine E.
T2 - Ecological Applications
AB - Although ecosystems respond to global change at regional to continental scales (i.e., macroscales), model predictions of ecosystem responses often rely on data from targeted monitoring of a small proportion of sampled ecosystems within a particular geographic area. In this study, we examined how the sampling strategy used to collect data for such models influences predictive performance. We subsampled a large and spatially extensive data set to investigate how macroscale sampling strategy affects prediction of ecosystem characteristics in 6,784 lakes across a 1.8‐million‐km² area. We estimated model predictive performance for different subsets of the data set to mimic three common sampling strategies for collecting observations of ecosystem characteristics: random sampling design, stratified random sampling design, and targeted sampling. We found that sampling strategy influenced model predictive performance such that (1) stratified random sampling designs did not improve predictive performance compared to simple random sampling designs and (2) although one of the scenarios that mimicked targeted (non‐random) sampling had the poorest performing predictive models, the other targeted sampling scenarios resulted in models with similar predictive performance to that of the random sampling scenarios. Our results suggest that although potential biases in data sets from some forms of targeted sampling may limit predictive performance, compiling existing spatially extensive data sets can result in models with good predictive performance that may inform a wide range of science questions and policy goals related to global change.
DA - 2020/4/27/
PY - 2020/4/27/
DO - 10.1002/eap.2123
VL - 30
IS - 6
J2 - Ecol Appl
LA - en
SN - 1051-0761 1939-5582
UR - http://dx.doi.org/10.1002/eap.2123
DB - Crossref
KW - data-intensive ecology
KW - ecological context
KW - extrapolation
KW - interpolation
KW - lakes
KW - macroscale
KW - monitoring
KW - prediction
KW - sampling
KW - sampling design
ER -
TY - JOUR
TI - On the spatial and temporal shift in the archetypal seasonal temperature cycle as driven by annual and semi‐annual harmonics
AU - North, Joshua S.
AU - Schliep, Erin M.
AU - Wikle, Christopher K.
T2 - Environmetrics
AB - Statistical methods are required to evaluate and quantify the uncertainty in environmental processes, such as land and sea surface temperature, in a changing climate. Typically, annual harmonics are used to characterize the variation in the seasonal temperature cycle. However, an often overlooked feature of the climate seasonal cycle is the semi‐annual harmonic, which can account for a significant portion of the variance of the seasonal cycle and varies in amplitude and phase across space. Together, the spatial variation in the annual and semi‐annual harmonics can play an important role in driving processes that are tied to seasonality (e.g., ecological and agricultural processes). We propose a multivariate spatiotemporal model to quantify the spatial and temporal change in minimum and maximum temperature seasonal cycles as a function of the annual and semi‐annual harmonics. Our approach captures spatial dependence, temporal dynamics, and multivariate dependence of these harmonics through spatially and temporally varying coefficients. We apply the model to minimum and maximum temperature over North America for the years 1979–2018. Formal model inference within the Bayesian paradigm enables the identification of regions experiencing significant changes in minimum and maximum temperature seasonal cycles due to the relative effects of changes in the two harmonics.
DA - 2020/12/28/
PY - 2020/12/28/
DO - 10.1002/env.2665
VL - 32
IS - 6
J2 - Environmetrics
LA - en
SN - 1180-4009 1099-095X
UR - http://dx.doi.org/10.1002/env.2665
DB - Crossref
KW - dynamic system modeling
KW - North American temperature cycle
KW - predictive process
KW - spatial synchrony
KW - spatiotemporal statistics
ER -
TY - JOUR
TI - Data fusion model for speciated nitrogen to identify environmental drivers and improve estimation of nitrogen in lakes
AU - Schliep, Erin M.
AU - Collins, Sarah M.
AU - Rojas-Salazar, Shirley
AU - Lottig, Noah R.
AU - Stanley, Emily H.
T2 - The Annals of Applied Statistics
AB - Concentrations of nitrogen provide a critical metric for understanding ecosystem function and water quality in lakes. However, varying approaches for quantifying nitrogen concentrations may bias the comparison of water quality across lakes and regions. Different measurements of total nitrogen exist based on its composition (e.g., organic versus inorganic, dissolved versus particulate), which we refer to as nitrogen species. Fortunately, measurements of multiple nitrogen species are often collected and can, therefore, be leveraged together to inform our understanding of the controls on total nitrogen in lakes. We develop a multivariate hierarchical statistical model that fuses speciated nitrogen measurements, obtained across multiple methods of reporting, in order to improve our estimates of total nitrogen. The model accounts for lower detection limits and measurement error that vary across lake, species and observation. By modeling speciated nitrogen, as opposed to previous efforts that mostly consider only total nitrogen, we obtain more resolved inference with regard to differences in sources of nitrogen and their relationship with complex environmental drivers. We illustrate the inferential benefits of our model using speciated nitrogen data from the LAke GeOSpatial and temporal database (LAGOS).
DA - 2020/12/1/
PY - 2020/12/1/
DO - 10.1214/20-aoas1371
VL - 14
IS - 4
J2 - Ann. Appl. Stat.
SN - 1932-6157
UR - http://dx.doi.org/10.1214/20-aoas1371
DB - Crossref
KW - Bayesian hierarchical model
KW - detection limits
KW - LAGOS
KW - multivariate
KW - Markov chain Monte Carlo
ER -
TY - JOUR
TI - Statistical data integration in survey sampling: a review
AU - Yang, Shu
AU - Kim, Jae Kwang
T2 - Japanese Journal of Statistics and Data Science
AB - Finite population inference is a central goal in survey sampling. Probability sampling is the main statistical approach to finite population inference. Challenges arise due to high cost and increasing non-response rates. Data integration provides a timely solution by leveraging multiple data sources to provide more robust and efficient inference than using any single data source alone. The technique for data integration varies depending on types of samples and available information to be combined. This article provides a systematic review of data integration techniques for combining probability samples, probability and non-probability samples, and probability and big data samples. We discuss a wide range of integration methods such as generalized least squares, calibration weighting, inverse probability weighting, mass imputation, and doubly robust methods. Finally, we highlight important questions for future research.
DA - 2020/12//
PY - 2020/12//
DO - 10.1007/s42081-020-00093-w
VL - 3
IS - 2
SP - 625-650
SN - 2520-8764
KW - Generalizability
KW - Meta-analysis
KW - Missing at random
KW - Transportability
ER -
TY - JOUR
TI - Water quality performance of a permeable pavement and stormwater harvesting treatment train stormwater control measure
AU - Winston, Ryan J.
AU - Arend, Kristi
AU - Dorsey, Jay D.
AU - Hunt, William F.
T2 - Blue-Green Systems
AB - Stormwater runoff from urban development causes undesired impacts to surface waters, including discharge of pollutants, erosion, and loss of habitat. A treatment train consisting of permeable interlocking concrete pavement and underground stormwater harvesting was monitored to quantify water quality improvements. The permeable pavement provided primary treatment and the cistern contributed to final polishing of total suspended solids (TSS) and turbidity concentrations (>96%) and loads (99.5% for TSS). Because of this, >40% reduction of sediment-bound nutrient forms and total nitrogen was observed. Nitrate reduction (>70%) appeared to be related to an anaerobic zone in water stored in the scarified soil beneath the permeable pavement, allowing denitrification to occur. Sequestration of copper, lead, and zinc occurred during the first 5 months of monitoring, with leaching observed during the second half of the monitoring period. This was potentially caused by a decrease in pH within the cistern or residual chloride from deicing salt causing de-sorption of metals from accumulated sediment. Pollutant loading followed the same trends as pollutant concentrations, with load reduction improved vis-à-vis concentrations because of the 27% runoff reduction provided by the treatment train. This study has shown that permeable pavement can serve as an effective pretreatment for stormwater harvesting schemes.
DA - 2020/1/1/
PY - 2020/1/1/
DO - 10.2166/bgs.2020.914
VL - 2
IS - 1
SP - 91-111
SN - 2617-4782
KW - green infrastructure
KW - pervious pavement
KW - porous pavement
KW - rainwater harvesting
KW - series
KW - WSUD
ER -
TY - CONF
TI - A New Framework for Online Testing of Heterogeneous Treatment Effect
AU - Yu, M.
AU - Lu, W.
AU - Song, R.
T2 - Thirty-Fourth AAAI Conference on Artificial Intelligence
AB - We propose a new framework for online testing of heterogeneous treatment effects. The proposed test, named sequential score test (SST), is able to control type I error under continuous monitoring and detect multi-dimensional heterogeneous treatment effects. We provide an online p-value calculation for SST, making it convenient for continuous monitoring, and extend our tests to online multiple testing settings by controlling the false discovery rate. We examine the empirical performance of the proposed tests and compare them with a state-of-the-art online test, named mSPRT, using simulations and real data. The results show that our proposed test controls type I error at any time, has higher detection power and allows quick inference on online A/B testing.
C2 - 2020///
C3 - Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence
CY - New York Hilton Midtown, New York, New York, USA
DA - 2020///
PY - 2020/2/7/
DO - 10.1609/aaai.v34i06.6594
VL - 34
SP - 10310-10317
M1 - 6
PB - AAAI Press
ER -
TY - JOUR
TI - Differences in proteome response to cold acclimation in Zoysia japonica cultivars with different levels of freeze tolerance
AU - Brown, Jessica M.
AU - Yu, Xingwang
AU - Holloway, H. McCamy P.
AU - DaCosta, Michelle
AU - Bernstein, Rachael P.
AU - Lu, Jefferson
AU - Tuong, Tan D.
AU - Patton, Aaron J.
AU - Dunne, Jeffrey C.
AU - Arellano, Consuelo
AU - Livingston, David P.
AU - Milla-Lewis, Susana R.
T2 - Crop Science
AB - Zoysiagrasses (Zoysia spp.) are warm‐season turfgrasses primarily grown in the southern and transition zones of the United States. An understanding of the physiological and proteomic changes that zoysiagrasses undergo during cold acclimation may shed light on phenotypic traits and proteins useful in selection of freeze‐tolerant genotypes. We investigated the relationship between cold acclimation, protein expression, and freeze tolerance in cold acclimated (CA) and nonacclimated (NA) plants of Zoysia japonica Steud. cultivars Meyer (freeze‐tolerant) and Victoria (freeze‐susceptible). Meristematic tissues from the grass crowns were harvested for proteomic analysis. Freeze testing indicated that cold acclimation accounted for a 1.9‐fold increase in plant survival compared with the nonacclimation treatment. Overall, proteomic analysis identified 62 protein spots differentially accumulated in abundance under cold acclimation. Nine and 22 unique protein spots were identified for Meyer and Victoria, respectively, with increased abundance or decreased abundance. In addition, 23 shared protein spots were found among the two cultivars in response to cold acclimation. Functional classification revealed that these proteins were involved primarily in transcription, signal transduction and stress defense, carbohydrate and energy metabolism, and protein and amino acid metabolism. Several proteins of interest for their association with cold acclimation were identified. Further investigation of these proteins and their functional categories may improve our understanding of the differences in freezing tolerance among zoysiagrass germplasm.
DA - 2020///
PY - 2020///
DO - 10.1002/csc2.20225
VL - 60
IS - 5
SP - 2744-2756
SN - 1435-0653
ER -
TY - JOUR
TI - Provable Convex Co-clustering of Tensors
AU - Chi, Eric C.
AU - Gaines, Brian J.
AU - Sun, Will Wei
AU - Zhou, Hua
AU - Yang, Jian
T2 - Journal of Machine Learning Research
DA - 2020///
PY - 2020///
VL - 21
IS - 214
SP - 1-58
UR - http://jmlr.org/papers/v21/18-155.html
ER -
TY - JOUR
TI - Estimating Average Treatment Effects Utilizing Fractional Imputation when Confounders are Subject to Missingness
AU - Corder, Nathan
AU - Yang, Shu
T2 - Journal of Causal Inference
AB - The problem of missingness in observational data is ubiquitous. When the confounders are missing at random, multiple imputation is commonly used; however, the method requires congeniality conditions for valid inferences, which may not be satisfied when estimating average causal treatment effects. Alternatively, fractional imputation, proposed by Kim (2011), has been implemented to handle missing values in the regression context. In this article, we develop fractional imputation methods for estimating the average treatment effects with confounders missing at random. We show that the fractional imputation estimator of the average treatment effect is asymptotically normal, which permits a consistent variance estimate. Via simulation study, we compare fractional imputation’s accuracy and precision with that of multiple imputation.
DA - 2020/1//
PY - 2020/1//
DO - 10.1515/jci-2019-0024
VL - 8
IS - 1
SP - 249-271
SN - 2193-3685
KW - Missing Data
KW - Fractional Imputation
KW - Multiple Imputation
ER -
TY - JOUR
TI - Novel Imaging Modalities Shedding Light on Plant Biology: Start Small and Grow Big
AU - Clark, Natalie M.
AU - Broeck, Lisa
AU - Guichard, Marjorie
AU - Stager, Adam
AU - Tanner, Herbert G.
AU - Blilou, Ikram
AU - Grossmann, Guido
AU - Iyer-Pascuzzi, Anjali S.
AU - Maizel, Alexis
AU - Sparks, Erin E.
AU - Sozzani, Rosangela
T2 - Annual Review of Plant Biology
AB - The acquisition of quantitative information on plant development across a range of temporal and spatial scales is essential to understand the mechanisms of plant growth. Recent years have shown the emergence of imaging methodologies that enable the capture and analysis of plant growth, from the dynamics of molecules within cells to the measurement of morphometric and physiological traits in field-grown plants. In some instances, these imaging methods can be parallelized across multiple samples to increase throughput. When high throughput is combined with high temporal and spatial resolution, the resulting image-derived data sets could be combined with molecular large-scale data sets to enable unprecedented systems-level computational modeling. Such image-driven functional genomics studies may be expected to appear at an accelerating rate in the near future given the early success of the foundational efforts reviewed here. We present new imaging modalities and review how they have enabled a better understanding of plant growth from the microscopic to the macroscopic scale.
DA - 2020///
PY - 2020///
DO - 10.1146/annurev-arplant-050718-100038
VL - 71
SP - 789-816
SN - 1545-2123
KW - Forster resonance energy transfer
KW - scanning fluorescent correlation spectroscopy
KW - microfluid devices
KW - light sheet microscopy
KW - imaging of macroscopic traits
KW - multiscale imaging techniques
ER -
TY - JOUR
TI - Integrative statistical methods for exposure mixtures and health
AU - Reich, Brian J.
AU - Guan, Yawen
AU - Fourches, Denis
AU - Warren, Joshua L.
AU - Sarnat, Stefanie E.
AU - Chang, Howard H.
T2 - The Annals of Applied Statistics
AB - Humans are concurrently exposed to chemically, structurally and toxicologically diverse chemicals. A critical challenge for environmental epidemiology is to quantify the risk of adverse health outcomes resulting from exposures to such chemical mixtures and to identify which mixture constituents may be driving etiologic associations. A variety of statistical methods have been proposed to address these critical research questions. However, they generally rely solely on measured exposure and health data available within a specific study. Advancements in understanding of the role of mixtures on human health impacts may be better achieved through the utilization of external data and knowledge from multiple disciplines with innovative statistical tools. In this paper we develop new methods for health analyses that incorporate auxiliary information about the chemicals in a mixture, such as physicochemical, structural and/or toxicological data. We expect that the constituents identified using auxiliary information will be more biologically meaningful than those identified by methods that solely utilize observed correlations between measured exposure. We develop flexible Bayesian models by specifying prior distributions for the exposures and their effects that include auxiliary information and examine this idea over a spectrum of analyses from regression to factor analysis. The methods are applied to study the effects of volatile organic compounds on emergency room visits in Atlanta. We find that including cheminformatic information about the exposure variables improves prediction and provides a more interpretable model for emergency room visits for respiratory diseases.
DA - 2020/12//
PY - 2020/12//
DO - 10.1214/20-AOAS1364
VL - 14
IS - 4
SP - 1945-1963
SN - 1941-7330
KW - Cheminformatics
KW - collinearity
KW - factor analysis
KW - principal components
KW - stochastic search
KW - variable selection
ER -
TY - JOUR
TI - Uniform convergence of penalized splines
AU - Xiao, Luo
AU - Nan, Zhe
T2 - Stat
AB - Penalized splines are popular for nonparametric regression. We establish the minimax rate optimality of penalized splines for uniform convergence, thus improving the existing rate in the literature. The result is applicable to several types of penalized splines that are commonly used and holds under mild conditions on the design points.
DA - 2020///
PY - 2020///
DO - 10.1002/sta4.297
VL - 9
IS - 1
SN - 2049-1573
KW - nonparametric regression
KW - penalized splines
KW - rate optimality
KW - uniform convergence
ER -
TY - JOUR
TI - Fast covariance estimation for multivariate sparse functional data
AU - Li, Cai
AU - Xiao, Luo
AU - Luo, Sheng
T2 - Stat
AB - Covariance estimation is essential yet underdeveloped for analyzing multivariate functional data. We propose a fast covariance estimation method for multivariate sparse functional data using bivariate penalized splines. The tensor-product B-spline formulation of the proposed method enables a simple spectral decomposition of the associated covariance operator and explicit expressions of the resulting eigenfunctions as linear combinations of B-spline bases, thereby dramatically facilitating subsequent principal component analysis. We derive a fast algorithm for selecting the smoothing parameters in covariance smoothing using leave-one-subject-out cross-validation. The method is evaluated with extensive numerical studies and applied to an Alzheimer's disease study with multiple longitudinal outcomes.
DA - 2020///
PY - 2020///
DO - 10.1002/sta4.245
VL - 9
IS - 1
SN - 2049-1573
KW - bivariate smoothing
KW - covariance function
KW - functional principal component analysis
KW - longitudinal data
KW - multivariate functional data
KW - prediction
ER -
TY - JOUR
TI - Field Assessment of the Hydrologic Mitigation Performance of Three Aging Bioretention Cells
AU - Johnson, Jeffrey P.
AU - Hunt, William F.
T2 - Journal of Sustainable Water in the Built Environment
AB - Increasing imperviousness has driven regulation and design philosophies to offset consequent increases in runoff volumes and peak flows. Previous research has shown bioretention to reduce runoff volumes and peak flows. Since most research has focused on newly constructed systems, the long-term performance of bioretention has been questioned. Because bioretention is a biologically based practice, changes over time could impact hydrologic performance. This research examined and compared the hydrologic mitigation performance of three bioretention cells (BRCs) in central North Carolina with postconstruction ages ranging from 8 to 17 years old. Observed runoff volumes were significantly reduced at each of the three cells by 90%, 81%, and 64%. The volume discharge ratio for each cell was at or below low impact development (LID) target thresholds (0.33) for 63%, 67%, and 48% of observed storm events. Similar to volume reduction, all three BRCs significantly reduced peak flows. Peak discharge ratios at each site were less than the LID target threshold (0.33) for over 75% of observed storm events, and the interquartile range of peak discharge ratios was less than the LID target threshold for all observed storm events <25.4 mm. All three BRCs struggled to mitigate volumes and peak flows for large storm events (>50 mm). As the frequency and magnitude of larger events increase, guidance recommending additional surface storage should be considered. When compared to the hydrologic performance of “young” BRCs (less than 3 years old), “old” BRCs (at least 3 years old) perform at least as well with respect to peak flow mitigation while appearing to reduce runoff volumes better than newly constructed BRCs. That the three BRCs presented herein ranged from 8 to 17 years old during their respective monitoring periods while significantly reducing peak flows and runoff volumes (while meeting LID target thresholds) supports the prediction of long-term hydrologic mitigation of bioretention.
DA - 2020/11//
PY - 2020/11//
DO - 10.1061/JSWBAY.0000925
VL - 6
IS - 4
SN - 2379-6111
ER -
TY - SOUND
TI - Monte Carlo Methods in Practice
AU - Ghosh, Sujit
DA - 2020/7/17/
PY - 2020/7/17/
ER -
TY - SOUND
TI - A Glimpse of Monte Carlo Methods
AU - Ghosh, Sujit
DA - 2020/9/29/
PY - 2020/9/29/
UR - https://youtu.be/9Rvb3X3V8bc
ER -
TY - SOUND
TI - A Gambler's Journey through Monte Carlo
AU - Ghosh, Sujit
DA - 2020/11/5/
PY - 2020/11/5/
ER -
TY - CONF
TI - On Empirical Estimation of Mode Based on Weakly Dependent Samples
AU - Ghosh, Sujit
T2 - International Conference on Statistics for Twenty-First Century
C2 - 2020/12/18/
DA - 2020/12/18/
PY - 2020/12/18/
PB - University of Kerala
ER -
TY - JOUR
TI - Rapid Hazard Characterization of Environmental Chemicals Using a Compendium of Human Cell Lines from Different Organs
AU - Chen, Zunwei
AU - Liu, Yizhong
AU - Wright, Fred A.
AU - Chiu, Weihsueh A.
AU - Rusyn, Ivan
T2 - ALTEX-Alternatives to Animal Experimentation
AB - The lack of adequate toxicity data for the vast majority of chemicals in the environment has spurred the development of new approach methodologies (NAMs). This study aimed to develop a practical high-throughput in vitro model for rapidly evaluating potential hazards of chemicals using a small number of human cells. Forty-two compounds were tested using human induced pluripotent stem cell (iPSC)-derived cells (hepatocytes, neurons, cardiomyocytes and endothelial cells), and a primary endothelial cell line. Both functional and cytotoxicity endpoints were evaluated using high-content imaging. Concentration-response was used to derive points-of-departure (POD). PODs were integrated with ToxPi and used as surrogate NAM-based PODs for risk characterization. We found chemical class-specific similarity among the chemicals tested; metal salts exhibited the highest overall bioactivity. We also observed cell type-specific patterns among classes of chemicals, indicating the ability of the proposed in vitro model to recognize effects on different cell types. Compared to available NAM datasets, such as ToxCast/Tox21 and chemical structure-based descriptors, we found that the data from the five-cell-type model was as good as or even better at assigning compounds to chemical classes. Additionally, the PODs from this model performed well as a conservative surrogate for regulatory in vivo PODs and were less likely to underestimate in vivo potency and potential risk compared to other NAM-based PODs. In summary, we demonstrate the potential of this in vitro screening model to inform rapid risk-based decision-making through ranking, clustering, and assessment of both hazard and risks of diverse environmental chemicals.
DA - 2020///
PY - 2020///
DO - 10.14573/altex.2002291
VL - 37
IS - 4
SP - 623-638
SN - 1868-8551
ER -
TY - CONF
TI - A Non-Iterative Quantile Change Detection Method in Mixture Model with Heavy-Tailed Components
AU - Li, Yuantong
AU - Ma, Qi
AU - Ghosh, Sujit K.
T2 - KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
AB - Estimating the parameters of mixture models has wide applications, ranging from classification problems to the estimation of complex distributions. Most of the current literature on estimating the parameters of mixture densities is based on iterative Expectation Maximization (EM) type algorithms, which require either taking expectations over the latent label variables or generating samples from the conditional distribution of such latent labels using Bayes' rule. Moreover, when the number of components is unknown, the problem becomes computationally more demanding due to well-known label switching issues [28]. In this paper, we propose a robust and quick approach based on change-point methods to determine the number of mixture components that works for almost any location-scale family, even when the components are heavy tailed (e.g., Cauchy). We present several numerical illustrations comparing our method with some popular methods available in the literature, using simulated data and real case studies. The proposed method is shown to be as much as 500 times faster than some of the competing methods and is also shown to be more accurate in estimating the mixture distributions by goodness-of-fit tests.
C2 - 2020/7/6/
C3 - Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
DA - 2020/7/6/
DO - 10.1145/3394486.3403240
PB - ACM
SN - 9781450379984
UR - http://dx.doi.org/10.1145/3394486.3403240
DB - Crossref
KW - mixture model
KW - heavy-tailed distribution
KW - Cauchy distribution
KW - stock data
ER -
TY - JOUR
TI - Joint modeling of longitudinal continuous, longitudinal ordinal, and time-to-event outcomes
AU - Alam, Khurshid
AU - Maity, Arnab
AU - Sinha, Sanjoy K.
AU - Rizopoulos, Dimitris
AU - Sattar, Abdus
T2 - LIFETIME DATA ANALYSIS
DA - 2020///
PY - 2020///
DO - 10.1007/s10985-020-09511-3
KW - Joint models
KW - Association parameters
KW - Frailty model
KW - Linear mixed model
KW - Proportional odds model
ER -
TY - JOUR
TI - Bayesian Regression Using a Prior on the Model Fit: The R2-D2 Shrinkage Prior
AU - Zhang, Yan Dora
AU - Naughton, Brian P.
AU - Bondell, Howard D.
AU - Reich, Brian
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - Prior distributions for high-dimensional linear regression require specifying a joint distribution for the unobserved regression coefficients, which is inherently difficult. We instead propose a new class of shrinkage priors for linear regression via specifying a prior first on the model fit, in particular, the coefficient of determination, and then distributing through to the coefficients in a novel way. The proposed method compares favorably to previous approaches in terms of both concentration around the origin and tail behavior, which leads to improved performance both in posterior contraction and in empirical performance. The limiting behavior of the proposed prior is 1/x, both around the origin and in the tails. This behavior is optimal in the sense that it simultaneously lies on the boundary of being an improper prior both in the tails and around the origin. None of the existing shrinkage priors obtain this behavior in both regions simultaneously. We also demonstrate that our proposed prior leads to the same near-minimax posterior contraction rate as the spike-and-slab prior. Supplementary materials for this article are available online.
DA - 2020///
PY - 2020///
DO - 10.1080/01621459.2020.1825449
KW - Beta-prime distribution
KW - Coefficient of determination
KW - Global-local shrinkage
KW - High-dimensional regression
ER -
TY - JOUR
TI - Independent increments in group sequential tests: a review
AU - Kim, Kyung Mann
AU - Tsiatis, Anastasios A.
T2 - SORT-STATISTICS AND OPERATIONS RESEARCH TRANSACTIONS
DA - 2020///
PY - 2020///
DO - 10.2436/20.8080.02.101
VL - 44
IS - 2
SP - 223-264
SN - 2013-8830
KW - Failure time data
KW - interim analysis
KW - longitudinal data
KW - clinical trials
KW - repeated significance tests
KW - sequential methods
ER -
TY - JOUR
TI - Parameter Estimation for Multi-state Coherent Series and Parallel Systems with Positively Quadrant Dependent Models
AU - Kulkarni, Leena
AU - Sabnis, Sanjeev
AU - Ghosh, Sujit K.
T2 - SANKHYA-SERIES A-MATHEMATICAL STATISTICS AND PROBABILITY
DA - 2020///
PY - 2020///
DO - 10.1007/s13171-020-00217-0
KW - Multi-state series system
KW - Generalized method of moments
KW - Maximum likelihood estimation
KW - Positively quadrant dependent
KW - Farlie-Gumbel-Morgenstern distribution
ER -
TY - JOUR
TI - Statistical Downscaling with Spatial Misalignment: Application to Wildland Fire PM2.5 Concentration Forecasting
AU - Majumder, Suman
AU - Guan, Yawen
AU - Reich, Brian
AU - O'Neill, Susan
AU - Rappold, Ana G.
T2 - JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS
AB - Fine particulate matter, PM2.5, has been documented to have adverse health effects and wildland fires are a major contributor to PM2.5 air pollution in the US. Forecasters use numerical models to predict PM2.5 concentrations to warn the public of impending health risk. Statistical methods are needed to calibrate the numerical model forecast using monitor data to reduce bias and quantify uncertainty. Typical model calibration techniques do not allow for errors due to misalignment of geographic locations. We propose a spatiotemporal downscaling methodology that uses image registration techniques to identify the spatial misalignment and accounts for and corrects the bias produced by such warping. Our model is fitted in a Bayesian framework to provide uncertainty quantification of the misalignment and other sources of error. We apply this method to different simulated data sets and show enhanced performance of the method in presence of spatial misalignment. Finally, we apply the method to a large fire in Washington state and show that the proposed method provides more realistic uncertainty quantification than standard methods.
DA - 2020///
PY - 2020///
DO - 10.1007/s13253-020-00420-4
KW - Image registration
KW - Public health
KW - Smoothing
KW - Warping
ER -
TY - JOUR
TI - BAM1/2 receptor kinase signaling drives CLE peptide-mediated formative cell divisions in Arabidopsis roots
AU - Crook, Ashley D.
AU - Willoughby, Andrew C.
AU - Hazak, Ora
AU - Okuda, Satohiro
AU - VanDerMolen, Kylie R.
AU - Soyars, Cara L.
AU - Cattaneo, Pietro
AU - Clark, Natalie M.
AU - Sozzani, Rosangela
AU - Hothorn, Michael
AU - Hardtke, Christian S.
AU - Nimchuk, Zachary L.
T2 - PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
AB - Cell division is often regulated by extracellular signaling networks to ensure correct patterning during development. In Arabidopsis, the SHORT-ROOT (SHR)/SCARECROW (SCR) transcription factor dimer activates CYCLIND6;1 (CYCD6;1) to drive formative divisions during root ground tissue development. Here, we show plasma-membrane-localized BARELY ANY MERISTEM1/2 (BAM1/2) family receptor kinases are required for SHR-dependent formative divisions and CYCD6;1 expression, but not SHR-dependent ground tissue specification. Root-enriched CLE ligands bind the BAM1 extracellular domain and are necessary and sufficient to activate SHR-mediated divisions and CYCD6;1 expression. Correspondingly, BAM-CLE signaling contributes to the restriction of formative divisions to the distal root region. Additionally, genetic analysis reveals that BAM-CLE and SHR converge to regulate additional cell divisions outside of the ground tissues. Our work identifies an extracellular signaling pathway regulating formative root divisions and provides a framework to explore this pathway in patterning and evolution.
DA - 2020/12/22/
PY - 2020/12/22/
DO - 10.1073/pnas.2018565117
VL - 117
IS - 51
SP - 32750-32756
SN - 0027-8424
KW - Arabidopsis
KW - receptor kinase
KW - cell cycle
KW - SHORT-ROOT
KW - CLE peptide
ER -
TY - JOUR
TI - Systematic Comparisons for Composition Profiles, Taxonomic Levels, and Machine Learning Methods for Microbiome-Based Disease Prediction
AU - Song, Kuncheng
AU - Wright, Fred A.
AU - Zhou, Yi-Hui
T2 - FRONTIERS IN MOLECULAR BIOSCIENCES
AB - Microbiome composition profiles generated from 16S rRNA sequencing have been extensively studied for their usefulness in phenotype trait prediction, including for complex diseases such as diabetes and obesity. These microbiome compositions have typically been quantified in the form of Operational Taxonomic Unit (OTU) count matrices. However, alternate approaches such as Amplicon Sequence Variants (ASV) have been used, as well as the direct use of k-mer sequence counts. The overall effect of these different types of predictors when used in concert with various machine learning methods has been difficult to assess, due to varied combinations described in the literature. Here we provide an in-depth investigation of more than 1,000 combinations of these three clustering/counting methods, in combination with varied choices for normalization and filtering, grouping at various taxonomic levels, and the use of more than ten commonly used machine learning methods for phenotype prediction. The use of short k-mers, which have computational advantages and conceptual simplicity, is shown to be effective as a source for microbiome-based prediction. Among machine-learning approaches, tree-based methods show consistent, though modest, advantages in prediction accuracy. We describe the various advantages and disadvantages of combinations in analysis approaches, and provide general observations to serve as a useful guide for future trait-prediction explorations using microbiome data.
DA - 2020/12/16/
PY - 2020/12/16/
DO - 10.3389/fmolb.2020.610845
VL - 7
SP -
SN - 2296-889X
KW - phenotype prediction
KW - machine learning method
KW - k-mers
KW - operational taxonomic unit (OTU)
KW - amplicon sequence variant (ASV)
KW - phylogenetic analysis
ER -
TY - JOUR
TI - EMPIRICAL BAYES ORACLE UNCERTAINTY QUANTIFICATION FOR REGRESSION
AU - Belitser, Eduard
AU - Ghosal, Subhashis
T2 - ANNALS OF STATISTICS
AB - We propose an empirical Bayes method for high-dimensional linear regression models. Following an oracle approach that quantifies the error locally for each possible value of the parameter, we show that an empirical Bayes posterior contracts at the optimal rate at all parameters and leads to uniform size-optimal credible balls with guaranteed coverage under an “excessive bias restriction” condition. This condition gives rise to a new slicing of the entire space that is suitable for ensuring uniformity in uncertainty quantification. The obtained results immediately lead to optimal contraction and coverage properties for many conceivable classes simultaneously. The results are also extended to high-dimensional additive nonparametric regression models.
DA - 2020/12//
PY - 2020/12//
DO - 10.1214/19-AOS1845
VL - 48
IS - 6
SP - 3113-3137
SN - 0090-5364
KW - Credible ball
KW - coverage
KW - empirical Bayes
KW - excessive bias restriction
KW - oracle rate
ER -
TY - JOUR
TI - Quantitative Trait Loci Associated with Gray Leaf Spot Resistance in St. Augustinegrass
AU - Yu, Xingwang
AU - Mulkey, Steve E.
AU - Zuleta, Maria C.
AU - Arellano, Consuelo
AU - Ma, Bangya
AU - Milla-Lewis, Susana R.
T2 - PLANT DISEASE
AB - Gray leaf spot (GLS), caused by Magnaporthe grisea, is a major fungal disease of St. Augustinegrass (Stenotaphrum secundatum), causing widespread blighting of the foliage under warm, humid conditions. To identify quantitative trait loci (QTL) controlling GLS resistance, an F1 mapping population consisting of 153 hybrids was developed from crosses between cultivar Raleigh (susceptible parent) and plant introduction PI 410353 (resistant parent). Single-nucleotide polymorphism (SNP) markers generated from genotyping-by-sequencing constituted nine linkage groups for each parental linkage map. The Raleigh map consisted of 2,257 SNP markers and spanned 916.63 centimorgans (cM), while the PI 410353 map comprised 511 SNP markers and covered 804.27 cM. GLS resistance was evaluated under controlled environmental conditions with measurements of final disease incidence and lesion length. Additionally, two derived traits, area under the disease progress curve and area under the lesion expansion curve, were calculated for QTL analysis. Twenty QTL were identified as being associated with these GLS resistance traits, which explained 7.6 to 37.2% of the total phenotypic variation. Three potential GLS QTL “hotspots” were identified on two linkage groups: P2 (106.26 to 110.36 cM and 113.15 to 116.67 cM) and P5 (17.74 to 19.28 cM). The two major effect QTL glsp2.3 and glsp5.2 together reduced disease incidence by 20.2% in this study. Sequence analysis showed that two candidate genes encoding β-1,3-glucanases were found in the intervals of two QTL, which might function in GLS resistance response. These QTL and linked markers can be potentially used to assist the transfer of GLS resistance genes to elite St. Augustinegrass breeding lines.
DA - 2020/11//
PY - 2020/11//
DO - 10.1094/PDIS-04-20-0905-RE
VL - 104
IS - 11
SP - 2799-2806
SN - 1943-7692
KW - gray leaf spot
KW - Magnaporthe grisea
KW - quantitative trait loci
KW - St. Augustinegrass
ER -
TY - JOUR
TI - Goodness-of-fit test for skew normality based on energy statistics
AU - Opperman, Logan
AU - Ning, Wei
T2 - RANDOM OPERATORS AND STOCHASTIC EQUATIONS
AB - In this paper, we propose a goodness-of-fit test based on the energy statistic for skew normality. Simulations indicate that the Type-I error of the proposed test can be controlled reasonably well for given nominal levels. Power comparisons to other existing methods under different settings show the advantage of the proposed test. Such a test is applied to two real data sets to illustrate the testing procedure.
DA - 2020/9//
PY - 2020/9//
DO - 10.1515/rose-2020-2042
VL - 28
IS - 3
SP - 227-236
SN - 1569-397X
KW - Goodness-of-fit test
KW - energy statistic
KW - skew normal distribution
KW - skew normality
ER -
TY - PCOMM
TI - Questioning Existing Cancer Hazard Evaluation Standards in the Name of Statistics
AU - Rusyn, Ivan
AU - Chiu, Weihsueh A.
AU - Wright, Fred A.
DA - 2020/10//
PY - 2020/10//
DO - 10.1093/toxsci/kfaa077
SP - 521-522
ER -
TY - JOUR
TI - Integrative Analysis of Gene-Specific DNA Methylation and Untargeted Metabolomics Data from the ELEMENT Cohort
AU - Goodrich, Jaclyn M
AU - Hector, Emily C
AU - Tang, Lu
AU - LaBarre, Jennifer L
AU - Dolinoy, Dana C
AU - Mercado-Garcia, Adriana
AU - Cantoral, Alejandra
AU - Song, Peter XK
AU - Téllez-Rojo, Martha Maria
AU - Peterson, Karen E
T2 - Epigenetics Insights
AB - Epigenetic modifications, such as DNA methylation, influence gene expression and cardiometabolic phenotypes that are manifest in developmental periods in later life, including adolescence. Untargeted metabolomics analyses provide a comprehensive snapshot of physiological processes and metabolism and have been related to DNA methylation in adults, offering insights into the regulatory networks that influence cellular processes. We analyzed the cross-sectional correlation of blood leukocyte DNA methylation with 3758 serum metabolite features (574 of which are identifiable) in 238 children (ages 8-14 years) from the Early Life Exposures in Mexico to Environmental Toxicants (ELEMENT) study. Associations between these features and percent DNA methylation in adolescent blood leukocytes at LINE-1 repetitive elements and genes that regulate early life growth (IGF2, H19, HSD11B2) were assessed by mixed effects models, adjusting for sex, age, and puberty status. After false discovery rate correction (FDR q < 0.05), 76 metabolites were significantly associated with LINE-1 DNA methylation, 27 with HSD11B2, 103 with H19, and 4 with IGF2. The ten identifiable metabolites included dicarboxylic fatty acids (five associated with LINE-1 or H19 methylation at q < 0.05) and 1-octadecanoyl-rac-glycerol (q < 0.0001 for association with H19 and q = 0.04 for association with LINE-1). We then assessed the association between these ten known metabolites and adiposity 3 years later. Two metabolites, dicarboxylic fatty acid 17:3 and 5-oxo-7-octenoic acid, were inversely associated with measures of adiposity (P < .05) assessed approximately 3 years later in adolescence. In stratified analyses, sex-specific and puberty-stage specific (Tanner stage = 2 to 5 vs Tanner stage = 1) associations were observed. Most notably, hundreds of statistically significant associations were observed between H19 and LINE-1 DNA methylation and metabolites among children who had initiated puberty. Understanding relationships between subclinical molecular biomarkers (DNA methylation and metabolites) may increase our understanding of genes and biological pathways contributing to metabolic changes that underlie the development of adiposity during adolescence.
DA - 2020/1//
PY - 2020/1//
DO - 10.1177/2516865720977888
UR - https://doi.org/10.1177/2516865720977888
KW - Metabolic programming
KW - epigenetics
KW - DNA methylation
KW - IGF2
KW - H19
KW - HSD11B2
KW - LINE-1
KW - adolescence
KW - biomarkers
KW - adiposity
KW - children's health
ER -
TY - JOUR
TI - Multiway Graph Signal Processing on Tensors: Integrative Analysis of Irregular Geometries
AU - Stanley, Jay S., III
AU - Chi, Eric C.
AU - Mishne, Gal
T2 - IEEE SIGNAL PROCESSING MAGAZINE
AB - Graph signal processing (GSP) is an important methodology for studying data residing on irregular structures. As acquired data is increasingly taking the form of multi-way tensors, new signal processing tools are needed to maximally utilize the multi-way structure within the data. In this paper, we review modern signal processing frameworks generalizing GSP to multi-way data, starting from graph signals coupled to familiar regular axes such as time in sensor networks, and then extending to general graphs across all tensor modes. This widely applicable paradigm motivates reformulating and improving upon classical problems and approaches to creatively address the challenges in tensor-based data. We synthesize common themes arising from current efforts to combine GSP with tensor analysis and highlight future directions in extending GSP to the multi-way paradigm.
DA - 2020/11//
PY - 2020/11//
DO - 10.1109/MSP.2020.3013555
VL - 37
IS - 6
SP - 160-173
SN - 1558-0792
KW - Tensors
KW - Signal processing
KW - Two dimensional displays
KW - Geometry
KW - Discrete Fourier transforms
KW - Graphical models
KW - Laplace equations
ER -
TY - JOUR
TI - High-Dimensional Precision Medicine From Patient-Derived Xenografts
AU - Rashid, Naim U.
AU - Luckett, Daniel J.
AU - Chen, Jingxiang
AU - Lawson, Michael T.
AU - Wang, Longshaokan
AU - Zhang, Yunshu
AU - Laber, Eric B.
AU - Liu, Yufeng
AU - Yeh, Jen Jen
AU - Zeng, Donglin
AU - Kosorok, Michael R.
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - The complexity of human cancer often results in significant heterogeneity in response to treatment. Precision medicine offers the potential to improve patient outcomes by leveraging this heterogeneity. Individualized treatment rules (ITRs) formalize precision medicine as maps from the patient covariate space into the space of allowable treatments. The optimal ITR is that which maximizes the mean of a clinical outcome in a population of interest. Patient-derived xenograft (PDX) studies permit the evaluation of multiple treatments within a single tumor, and thus are ideally suited for estimating optimal ITRs. PDX data are characterized by correlated outcomes, a high-dimensional feature space, and a large number of treatments. Here we explore machine learning methods for estimating optimal ITRs from PDX data. We analyze data from a large PDX study to identify biomarkers that are informative for developing personalized treatment recommendations in multiple cancers. We estimate optimal ITRs using regression-based (Q-learning) and direct-search methods (outcome weighted learning). Finally, we implement a superlearner approach to combine multiple estimated ITRs and show that the resulting ITR performs better than any of the input ITRs, mitigating uncertainty regarding user choice. Our results indicate that PDX data are a valuable resource for developing individualized treatment strategies in oncology. Supplementary materials for this article are available online.
DA - 2020/11/5/
PY - 2020/11/5/
DO - 10.1080/01621459.2020.1828091
VL - 116
IS - 535
SP - 1140-1154
SN - 1537-274X
KW - Biomarkers
KW - Deep learning autoencoders
KW - Machine learning
KW - Outcome weighted learning
KW - Precision medicine
KW - Q-learning
ER -
TY - JOUR
TI - Assembly of whole-chromosome pseudomolecules for polyploid plant genomes using outbred mapping populations
AU - Zhou, Chenxi
AU - Olukolu, Bode
AU - Gemenet, Dorcus C.
AU - Wu, Shan
AU - Gruneberg, Wolfgang
AU - Cao, Minh Duc
AU - Fei, Zhangjun
AU - Zeng, Zhao-Bang
AU - George, Andrew W.
AU - Khan, Awais
AU - Yencho, G. Craig
AU - Coin, Lachlan J. M.
T2 - NATURE GENETICS
DA - 2020/11//
PY - 2020/11//
DO - 10.1038/s41588-020-00717-7
VL - 52
IS - 11
SP - 1256-+
SN - 1546-1718
ER -
TY - JOUR
TI - LEPTOSPIRA, PARVOVIRUS, AND TOXOPLASMA IN THE NORTH AMERICAN RIVER OTTER (LONTRA CANADENSIS) IN NORTH CAROLINA, USA
AU - Sanders, Charles W., II
AU - Olfenbuttel, Colleen
AU - Pacifici, Krishna
AU - Hess, George R.
AU - Livingston, Robert S.
AU - DePerno, Christopher S.
T2 - JOURNAL OF WILDLIFE DISEASES
AB - The North American river otter (Lontra canadensis) is the largest mustelid in North Carolina, US, and was once extirpated from the central and western portions of the state. Over time and after a successful reintroduction project, otters are now abundant and occur throughout North Carolina. However, there is a concern that diseases may have an impact on the otter population, as well as on other aquatic mammals, either through exposure to emerging diseases, contact with domestic animals such as domestic cats (Felis catus), or less robust condition of individuals through declines in water quality. We tested brain and kidney tissue from harvested otters for the pathogens that cause leptospirosis, parvovirus, and toxoplasmosis. Leptospirosis and toxoplasmosis are priority zoonoses and are maintained by domestic and wild mammals. Although parvovirus is not zoonotic, it does affect pets, causing mild to fatal symptoms. Across the 2014–15 and 2015–16 trapping seasons, we tested 220 otters (76 females, 144 males) using real-time PCR for Leptospira interrogans, parvovirus, and Toxoplasma gondii. Of the otters tested, 1% (3/220) were positive for L. interrogans, 19% (41/220) were positive for parvovirus, and 24% (53/220) were positive for T. gondii. Although the pathogens for parvovirus and toxoplasmosis are relatively common in North Carolina otters, the otter harvest has remained steady and the population appears to be abundant and self-sustaining. Therefore, parvovirus and toxoplasmosis do not currently appear to be negatively impacting the population. However, subsequent research should examine transmission parameters between domestic and wild species and the sublethal effects of infection.
DA - 2020/10//
PY - 2020/10//
DO - 10.7589/2019-05-129
VL - 56
IS - 4
SP - 791-802
SN - 1943-3700
KW - Disease
KW - leptospirosis
KW - Lontra canadensis
KW - North Carolina
KW - otter
KW - parvovirus
KW - toxoplasmosis
ER -
TY - JOUR
TI - Exploring the Limits of Combined Image/'omics Analysis for Non-cancer Histological Phenotypes
AU - Gallins, Paul
AU - Saghapour, Ehsan
AU - Zhou, Yi-Hui
T2 - FRONTIERS IN GENETICS
AB - The last several years have witnessed an explosion of methods and applications for combining image data with 'omics data, and for prediction of clinical phenotypes. Much of this research has focused on cancer histology, for which genetic perturbations are large, and the signal to noise ratio is high. Related research on chronic, complex diseases is limited by tissue sample availability, lower genomic signal strength, and the less extreme and tissue-specific nature of intermediate histological phenotypes. Data from the GTEx Consortium provides a unique opportunity to investigate the connection among phenotypic histological variation, imaging data, and 'omics profiling, from multiple tissue-specific phenotypes at the sub-clinical level. Investigating histological designations in multiple tissues, we survey the evidence for genomic association and prediction of histology, and use the results to test the limits of prediction accuracy using machine learning methods applied to the imaging data, genomics data, and their combination. We find that expression data has similar or superior accuracy for pathology prediction compared with our use of imaging data. A variety of machine learning methods have similar performance, while network embedding methods offer at best limited improvements. These observations hold across a range of tissues and predictor types. The results are supportive of the use of genomic measurements in the same target tissue in which pathological phenotyping has been performed, which to our knowledge is a novel finding. Even while prediction accuracy remains a challenge, the results show clear evidence of pathway and tissue-specific biology.
DA - 2020/10/23/
PY - 2020/10/23/
DO - 10.3389/fgene.2020.555886
VL - 11
SP -
SN - 1664-8021
KW - imaging
KW - genomics
KW - pathology
KW - prediction
KW - integration
KW - histology
KW - machine learning
KW - embedding
ER -
TY - JOUR
TI - The GTEx Consortium atlas of genetic regulatory effects across human tissues
AU - Aguet, Francois
AU - Barbeira, Alvaro N.
AU - Bonazzola, Rodrigo
AU - Brown, Andrew
AU - Castel, Stephane E.
AU - Jo, Brian
AU - Kasela, Silva
AU - Kim-Hellmuth, Sarah
AU - Liang, Yanyu
AU - Parsana, Princy
AU - Flynn, Elise
AU - Fresard, Laure
AU - Gamazon, Eric R.
AU - Hamel, Andrew R.
AU - He, Yuan
AU - Hormozdiari, Farhad
AU - Mohammadi, Pejman
AU - Munoz-Aguirre, Manuel
AU - Ardlie, Kristin G.
AU - Battle, Alexis
AU - Brown, Christopher D.
AU - Cox, Nancy
AU - Dermitzakis, Emmanouil T.
AU - Engelhardt, Barbara E.
AU - Garrido-Martin, Diego
AU - Gay, Nicole R.
AU - Getz, Gad
AU - Guigo, Roderic
AU - Handsaker, Robert E.
AU - Hoffman, Paul J.
AU - Im, Hae Kyung
AU - Kashin, Seva
AU - Kwong, Alan
AU - Lappalainen, Tuuli
AU - Li, Xiao
AU - MacArthur, Daniel G.
AU - Montgomery, Stephen B.
AU - Rouhana, John M.
AU - Anand, Shankara
AU - Gabriel, Stacey
AU - Graubert, Aaron
AU - Hadley, Kane
AU - Huang, Katherine H.
AU - Meier, Samuel R.
AU - Nedzel, Jared L.
AU - Balliu, Brunilda
AU - Conrad, Don
AU - Cotter, Daniel J.
AU - Das, Sayantan
AU - Goede, Olivia M.
AU - Eskin, Eleazar
AU - Eulalio, Tiffany Y.
AU - Ferraro, Nicole M.
AU - Hou, Lei
AU - Kellis, Manolis
AU - Li, Xin
AU - Mangul, Serghei
AU - Nachun, Daniel C.
AU - Nguyen, Duyen Y.
AU - Nobel, Andrew B.
AU - Park, YoSon
AU - Reverter, Ferran
AU - Sabatti, Chiara
AU - Saha, Ashis
AU - Segre, Ayellet V
AU - Stephens, Matthew
AU - Strober, Benjamin J.
AU - Teran, Nicole A.
AU - Todres, Ellen
AU - Vinuela, Ana
AU - Wang, Gao
AU - Wen, Xiaoquan
AU - Wright, Fred
AU - Wucher, Valentin
AU - Zou, Yuxin
AU - Ferreira, Pedro G.
AU - Li, Gen
AU - Mele, Marta
AU - Yeger-Lotem, Esti
AU - Barcus, Mary E.
AU - Bradbury, Debra
AU - Krubit, Tanya
AU - McLean, Jeffrey A.
AU - Qi, Liqun
AU - Robinson, Karna
AU - Roche, Nancy V
AU - Smith, Anna M.
AU - Tabor, David E.
AU - Undale, Anita
AU - Bridge, Jason
AU - Brigham, Lori E.
AU - Foster, Barbara A.
AU - Gillard, Bryan M.
AU - Hasz, Richard
AU - Hunter, Marcus
AU - Johns, Christopher
AU - Johnson, Mark
AU - Karasik, Ellen
AU - Kopen, Gene
AU - Leinweber, William F.
AU - McDonald, Alisa
AU - Moser, Michael T.
AU - Myer, Kevin
AU - Ramsey, Kimberley D.
AU - Roe, Brian
AU - Shad, Saboor
AU - Thomas, Jeffrey A.
AU - Walters, Gary
AU - Washington, Michael
AU - Wheeler, Joseph
AU - Jewell, Scott D.
AU - Rohrer, Daniel C.
AU - Valley, Dana R.
AU - Davis, David A.
AU - Mash, Deborah C.
AU - Branton, Philip A.
AU - Sobin, Leslie
AU - Barker, Laura K.
AU - Gardiner, Heather M.
AU - Mosavel, Maghboeba
AU - Siminoff, Laura A.
AU - Flicek, Paul
AU - Haeussler, Maximilian
AU - Juettemann, Thomas
AU - Kent, W. James
AU - Lee, Christopher M.
AU - Powell, Conner C.
AU - Rosenbloom, Kate R.
AU - Ruffier, Magali
AU - Sheppard, Dan
AU - Taylor, Kieron
AU - Trevanion, Stephen J.
AU - Zerbino, Daniel R.
AU - Abell, Nathan S.
AU - Akey, Joshua
AU - Chen, Lin
AU - Demanelis, Kathryn
AU - Doherty, Jennifer A.
AU - Feinberg, Andrew P.
AU - Hansen, Kasper D.
AU - Hickey, Peter F.
AU - Jasmine, Farzana
AU - Jiang, Lihua
AU - Kaul, Rajinder
AU - Kibriya, Muhammad G.
AU - Li, Jin Billy
AU - Li, Qin
AU - Lin, Shin
AU - Linder, Sandra E.
AU - Oliva, Meritxell
AU - Park, Yongjin
AU - Pierce, Brandon L.
AU - Rizzardi, Lindsay F.
AU - Skol, Andrew D.
AU - Smith, Kevin S.
AU - Snyder, Michael
AU - Stamatoyannopoulos, John
AU - Tang, Hua
AU - Wang, Meng
AU - Carithers, Latarsha J.
AU - Guan, Ping
AU - Koester, Susan E.
AU - Little, A. Roger
AU - Moore, Helen M.
AU - Nierras, Concepcion R.
AU - Rao, Abhi K.
AU - Vaught, Jimmie B.
AU - Volpi, Simona
T2 - SCIENCE
AB - The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
DA - 2020/9/11/
PY - 2020/9/11/
DO - 10.1126/science.aaz1776
VL - 369
IS - 6509
SP - 1318-1330
SN - 1095-9203
ER -
TY - JOUR
TI - Classification of estrogenic compounds by coupling high content analysis and machine learning algorithms
AU - Mukherjee, Rajib
AU - Beykal, Burcu
AU - Szafran, Adam T.
AU - Onel, Melis
AU - Stossi, Fabio
AU - Mancini, Maureen G.
AU - Lloyd, Dillon
AU - Wright, Fred A.
AU - Zhou, Lan
AU - Mancini, Michael A.
AU - Pistikopoulos, Efstratios N.
T2 - PLOS COMPUTATIONAL BIOLOGY
AB - Environmental toxicants affect human health in various ways. Of the thousands of chemicals present in the environment, those with adverse effects on the endocrine system are referred to as endocrine-disrupting chemicals (EDCs). Here, we focused on a subclass of EDCs that impacts the estrogen receptor (ER), a pivotal transcriptional regulator in health and disease. Estrogenic activity of compounds can be measured by many in vitro or cell-based high throughput assays that record various endpoints from large pools of cells, and increasingly at the single-cell level. To simultaneously capture multiple mechanistic ER endpoints in individual cells that are affected by EDCs, we previously developed a sensitive high throughput/high content imaging assay that is based upon a stable cell line harboring a visible multicopy ER responsive transcription unit and expressing a green fluorescent protein (GFP) fusion of ER. High content analysis generates voluminous multiplex data comprised of minable features that describe numerous mechanistic endpoints. In this study, we present a machine learning pipeline for rapid, accurate, and sensitive assessment of the endocrine-disrupting potential of benchmark chemicals based on data generated from high content analysis. The multidimensional imaging data was used to train a classification model to ultimately predict the impact of unknown compounds on the ER, either as agonists or antagonists. To this end, both linear logistic regression and nonlinear Random Forest classifiers were benchmarked and evaluated for predicting the estrogenic activity of unknown compounds. Furthermore, through feature selection, data visualization, and model discrimination, the most informative features were identified for the classification of ER agonists/antagonists. The results of this data-driven study showed that highly accurate and generalized classification models with a minimum number of features can be constructed without loss of generality, where these machine learning models serve as a means for rapid mechanistic/phenotypic evaluation of the estrogenic potential of many chemicals.
DA - 2020/9//
PY - 2020/9//
DO - 10.1371/journal.pcbi.1008191
VL - 16
IS - 9
SP -
SN - 1553-7358
ER -
TY - JOUR
TI - Modeling and Estimation of Contagion-Based Social Network Dependence with Time-to-Event Data
AU - Yu, Lin
AU - Lu, Wenbin
AU - Huang, Danyang
T2 - STATISTICA SINICA
DA - 2020/10//
PY - 2020/10//
DO - 10.5705/ss.202018.0222
VL - 30
IS - 4
SP - 2051-2074
SN - 1996-8507
KW - Contagion-based social correlation
KW - generalized linear transformation model
KW - nonparametric maximum likelihood estimation
KW - social network
KW - time-to-event data
ER -
TY - JOUR
TI - Sparse Bayesian Additive Nonparametric Regression with Application to Health Effects of Pesticides Mixtures
AU - Wei, Ran
AU - Reich, Brian J.
AU - Hoppin, Jane A.
AU - Ghosal, Subhashis
T2 - STATISTICA SINICA
DA - 2020/1//
PY - 2020/1//
DO - 10.5705/ss.202017.0315
VL - 30
IS - 1
SP - 55-79
SN - 1996-8507
KW - Additive nonparametric regression
KW - Bayesian variable selection
KW - continuous shrinkage prior
KW - environmental epidemiology
KW - posterior consistency
ER -
TY - JOUR
TI - Optimal EMG Placement for a Robotic Prosthesis Controller with Sequential, Adaptive Functional Estimation (SAFE)
AU - Stallrich, Jonathan
AU - Islam, Md Nazmul
AU - Staicu, Ana-Maria
AU - Crouch, Dustin
AU - Pan, Lizhi
AU - Huang, He
T2 - ANNALS OF APPLIED STATISTICS
AB - Robotic hand prostheses require a controller to decode muscle contraction information, such as electromyogram (EMG) signals, into the user’s desired hand movement. State-of-the-art decoders demand extensive training, require data from a large number of EMG sensors and are prone to poor predictions. Biomechanical models of a single movement degree-of-freedom tell us that relatively few muscles, and, hence, fewer EMG sensors are needed to predict movement. We propose a novel decoder based on a dynamic, functional linear model with velocity or acceleration as its response and the recent past EMG signals as functional covariates. The effect of each EMG signal varies with the recent position to account for biomechanical features of hand movement, increasing the predictive capability of a single EMG signal compared to existing decoders. The effects are estimated with a multistage, adaptive estimation procedure that we call Sequential Adaptive Functional Estimation (SAFE). Starting with 16 potential EMG sensors, our method correctly identifies the few EMG signals that are known to be important for an able-bodied subject. Furthermore, the estimated effects are interpretable and can significantly improve understanding and development of robotic hand prostheses.
DA - 2020/9//
PY - 2020/9//
DO - 10.1214/20-AOAS1324
VL - 14
IS - 3
SP - 1164-1181
SN - 1932-6157
KW - Electromyography signal
KW - varying functional regression
KW - functional variable selection
KW - adaptive group LASSO
KW - correlated functional predictors
KW - sequential adaptive functional estimation
ER -
TY - JOUR
TI - Synonymous Site-to-Site Substitution Rate Variation Dramatically Inflates False Positive Rates of Selection Analyses: Ignore at Your Own Peril
AU - Wisotsky, Sadie R.
AU - Pond, Sergei L. Kosakovsky
AU - Shank, Stephen D.
AU - Muse, Spencer V.
T2 - MOLECULAR BIOLOGY AND EVOLUTION
AB - Most molecular evolutionary studies of natural selection maintain the decades-old assumption that synonymous substitution rate variation (SRV) across sites within genes occurs at levels that are either nonexistent or negligible. However, numerous studies challenge this assumption from a biological perspective and show that SRV is comparable in magnitude to that of nonsynonymous substitution rate variation. We evaluated the impact of this assumption on methods for inferring selection at the molecular level by incorporating SRV into an existing method (BUSTED) for detecting signatures of episodic diversifying selection in genes. Using simulated data we found that failing to account for even moderate levels of SRV in selection testing is likely to produce intolerably high false positive rates. To evaluate the effect of the SRV assumption on actual inferences we compared results of tests with and without the assumption in an empirical analysis of over 13,000 Euteleostomi (bony vertebrate) gene alignments from the Selectome database. This exercise reveals that close to 50% of positive results (i.e., evidence for selection) in empirical analyses disappear when SRV is modeled as part of the statistical analysis and are thus candidates for being false positives. The results from this work add to a growing literature establishing that tests of selection are much more sensitive to certain model assumptions than previously believed.
DA - 2020/8//
PY - 2020/8//
DO - 10.1093/molbev/msaa037
VL - 37
IS - 8
SP - 2430-2439
SN - 1537-1719
KW - evolutionary model
KW - synonymous rate variation
KW - codon model
KW - episodic selection
ER -
TY - JOUR
TI - Transitioning Machine Learning from Theory to Practice in Natural Resources Management
AU - Saia, Sheila M.
AU - Nelson, Natalie
AU - Huseth, Anders S.
AU - Grieger, Khara
AU - Reich, Brian J.
T2 - ECOLOGICAL MODELLING
DA - 2020/11/1/
PY - 2020/11/1/
DO - 10.1016/j.ecolmodel.2020.109257
VL - 435
SP -
SN - 1872-7026
KW - Machine learning
KW - Natural resources management
KW - Stakeholders
KW - Decision-support tools
KW - Decision-making
KW - Process-based modeling
ER -
TY - JOUR
TI - On empirical estimation of mode based on weakly dependent samples
AU - Liu, Bowen
AU - Ghosh, Sujit K.
T2 - COMPUTATIONAL STATISTICS & DATA ANALYSIS
AB - Given a large sample of observations from an unknown univariate continuous distribution, it is often of interest to empirically estimate the global mode of the underlying density. Applications include samples obtained by Monte Carlo methods with independent observations, or Markov Chain Monte Carlo methods with weakly dependent samples from the underlying stationary density. In either case, often the generating density is not available in closed form and only empirical determination of the mode is possible. Assuming that the generating density has a unique global mode, a non-parametric estimate of the density is proposed based on a sequence of mixtures of Beta densities which allows for the estimation of the mode even when the mode is possibly located on the boundary of the support of the density. Furthermore, the estimated mode is shown to be strongly universally consistent under a set of mild regularity conditions. The proposed method is compared with other empirical estimates of the mode based on popular kernel density estimates. Numerical results based on extensive simulation studies show benefits of the proposed methods in terms of empirical bias, standard errors and computation time. An R package implementing the method is also made available online.
DA - 2020/12//
PY - 2020/12//
DO - 10.1016/j.csda.2020.107046
VL - 152
SP -
SN - 1872-7352
UR - https://doi.org/10.1016/j.csda.2020.107046
KW - Bernstein polynomials
KW - Empirical mode estimator
KW - Strong consistency
KW - Anderson-Darling test
ER -
TY - JOUR
TI - Stickiness of rental rate and housing vacancy rate
AU - Wang, Haoyu
T2 - ECONOMICS LETTERS
AB - We study the stickiness in house rent by examining rent under two settings: rent is determined solely by landlord and rent is set by Nash bargaining between landlord and tenant. Our results show that, under Nash bargaining, the vacancy rate is able to restrain the growth of rents and hence generates the stickiness feature in rents.
DA - 2020/10//
PY - 2020/10//
DO - 10.1016/j.econlet.2020.109487
VL - 195
SP -
SN - 1873-7374
KW - Sticky rent
KW - Nash bargaining
KW - Business cycle
ER -
TY - JOUR
TI - Evaluation of a Stepped-Care eHealth HIV Prevention Program for Diverse Adolescent Men Who Have Sex With Men: Protocol for a Hybrid Type 1 Effectiveness Implementation Trial of SMART
AU - Mustanski, Brian
AU - Moskowitz, David A.
AU - Moran, Kevin O.
AU - Newcomb, Michael E.
AU - Macapagal, Kathryn
AU - Rodriguez-Diaz, Carlos
AU - Rendina, H. Jonathon
AU - Laber, Eric B.
AU - Li, Dennis H.
AU - Matson, Margaret
AU - Talan, Ali J.
AU - Cabral, Cynthia
T2 - JMIR RESEARCH PROTOCOLS
AB - Background Adolescent men who have sex with men (AMSM), aged 13 to 18 years, account for more than 80% of teen HIV occurrences. Despite this disproportionate burden, there is a conspicuous lack of evidence-based HIV prevention programs. Implementation issues are critical as traditional HIV prevention delivery channels (eg, community-based organizations, schools) have significant access limitations for AMSM. As such, eHealth interventions, such as our proposed SMART program, represent an excellent modality for delivering AMSM-specific intervention material where youth are. Objective This randomized trial aimed to test the effectiveness of the SMART program in reducing condom-less anal sex and increasing condom self-efficacy, condom use intentions, and HIV testing for AMSM. We also plan to test whether SMART has differential effectiveness across important subgroups of AMSM based on race and ethnicity, urban versus rural residence, age, socioeconomic status, and participation in an English versus a Spanish version of SMART. Methods Using a sequential multiple assignment randomized trial design, we will evaluate the impact of a stepped-care package of increasingly intensive eHealth interventions (ie, the universal, information-based SMART Sex Ed; the more intensive, selective SMART Squad; and a higher cost, indicated SMART Sessions). All intervention content is available in English and Spanish. Participants are recruited primarily from social media sources using paid and unpaid advertisements. Results The trial has enrolled 1285 AMSM aged 13 to 18 years, with a target enrollment of 1878. Recruitment concluded in June 2020. Participants were recruited from 49 US states as well as Puerto Rico and the District of Columbia. Assessments of intervention outcomes at 3, 6, 9, and 12 months are ongoing. Conclusions SMART is the first web-based program for AMSM to take a stepped-care approach to sexual education and HIV prevention. This design indicates that SMART delivers resources to all adolescents, but more costly treatments (eg, video chat counseling in SMART Sessions) are conserved for individuals who need them the most. SMART has the potential to reach AMSM to provide them with a sex-positive curriculum that empowers them with the information, motivation, and skills to make better health choices. Trial Registration ClinicalTrials.gov Identifier NCT03511131; https://clinicaltrials.gov/ct2/show/NCT03511131 International Registered Report Identifier (IRRID) DERR1-10.2196/19701
DA - 2020/8//
PY - 2020/8//
DO - 10.2196/19701
VL - 9
IS - 8
SP -
SN - 1929-0748
KW - HIV prevention
KW - eHealth
KW - adolescents
KW - men who have sex with men
KW - implementation science
KW - mobile phone
ER -
TY - RPRT
TI - Statistical data integration in survey sampling: a review
AU - Yang, S.
AU - Kim, J.K.
DA - 2020/1/9/
PY - 2020/1/9/
UR - https://arxiv.org/abs/2001.03259
ER -
TY - RPRT
TI - Double score matching estimators of average and quantile treatment effects
AU - Yang, S.
AU - Zhang, Y.
DA - 2020///
PY - 2020///
UR - https://arxiv.org/abs/2001.06049
ER -
TY - RPRT
TI - Estimating Average Treatment Effects Utilizing Fractional Imputation when Confounders are Subject to Missingness
AU - Corder, N.
AU - Yang, S.
DA - 2020///
PY - 2020///
UR - https://arxiv.org/pdf/1905.11497
ER -
TY - RPRT
TI - Integrative analysis of randomized clinical trials with real world evidence studies
AU - Dong, L.
AU - Yang, S.
AU - Wang, X.
AU - Zeng, D.
AU - Cai, J.W.
DA - 2020///
PY - 2020///
UR - https://arxiv.org/pdf/2003.01242
ER -
TY - CHAP
TI - Hierarchical continuous time hidden Markov model, with application in zero-inflated accelerometer data
AU - Xu, Z.
AU - Laber, E.B.
AU - Staicu, A.
T2 - Statistical Modeling for Biomedical Research: Contemporary Topics and Voices in the Field
A2 - Zhao, Y.
A2 - Chen, D.G.
T3 - Emerging Topics of Statistics and Biostatistics Book Series
AB - Wearable devices including accelerometers are increasingly being used to collect high-frequency human activity data in situ. There is tremendous potential to use such data to inform medical decision making and public health policies. However, modeling such data is challenging as they are high-dimensional, heterogeneous, and subject to informative missingness, e.g., zero readings when the device is removed by the participant. We propose a flexible and extensible continuous-time hidden Markov model to extract meaningful activity patterns from human accelerometer data. To facilitate estimation with massive data we derive an efficient learning algorithm that exploits the hierarchical structure of the parameters indexing the proposed model. We also propose a bootstrap procedure for interval estimation. The proposed methods are illustrated using data from the 2003–2004 and 2005–2006 National Health and Nutrition Examination Survey.
PY - 2020///
DO - 10.1007/978-3-030-33416-1_7
SP - 125-142
PB - Springer
SN - 978-3-030-33416-1
ER -
TY - JOUR
TI - Statistical Inference for Online Decision Making: In a Contextual Bandit Setting
AU - Chen, Haoyu
AU - Lu, Wenbin
AU - Song, Rui
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - Online decision-making problem requires us to make a sequence of decisions based on incremental information. Common solutions often need to learn a reward model of different actions given the contextual information and then maximize the long-term reward. It is meaningful to know if the posited model is reasonable and how the model performs in the asymptotic sense. We study this problem under the setup of the contextual bandit framework with a linear reward model. The ε-greedy policy is adopted to address the classic exploration-and-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of model parameters is asymptotically normal. When the linear model is misspecified, we propose the online weighted least squares estimator using the inverse propensity score weighting and also establish its asymptotic normality. Based on the properties of the parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!.
DA - 2020///
PY - 2020///
DO - 10.1080/01621459.2020.1770098
VL - 7
SP - 1-16
UR - http://dx.doi.org/10.1080/01621459.2020.1770098
KW - Epsilon-greedy
KW - Inverse propensity weighted estimator
KW - Model misspecification
KW - Online decision making
KW - Statistical inference
ER -
TY - JOUR
TI - Sequencing depth and genotype quality: accuracy and breeding operation considerations for genomic selection applications in autopolyploid crops
AU - Gemenet, Dorcus C.
AU - Lindqvist-Kreuze, Hannele
AU - De Boeck, Bert
AU - da Silva Pereira, Guilherme
AU - Mollinari, Marcelo
AU - Zeng, Zhao-Bang
AU - Craig Yencho, G.
AU - Campos, Hugo
T2 - Theoretical and Applied Genetics
AB - Polyploid crop breeders can balance resources between density and sequencing depth, dosage information and fewer highly informative SNPs recommended, non-additive models and QTL advantages on prediction dependent on trait architecture. The autopolyploid nature of potato and sweetpotato ensures a wide range of meiotic configurations and linkage phases leading to complex gene-action and poses problems in genotype data quality and genomic selection analyses. We used a 315-progeny biparental F1 population of hexaploid sweetpotato and a diversity panel of 380 tetraploid potato, genotyped using different platforms to answer the following questions: (i) do polyploid crop breeders need to invest more for additional sequencing depth? (ii) how many markers are required to make selection decisions? (iii) does considering non-additive genetic effects improve predictive ability (PA)? (iv) does considering dosage or quantitative trait loci (QTL) offer significant improvement to PA? Our results show that only a small number of highly informative single nucleotide polymorphisms (SNPs; ≤ 1000) are adequate for prediction in the type of populations we analyzed. We also show that considering dosage information and models considering only additive effects had the best PA for most traits, while the comparative advantage of considering non-additive genetic effects and including known QTL in the predictive model depended on trait architecture. We conclude that genomic selection can help accelerate the rate of genetic gains in potato and sweetpotato. However, application of genomic selection should be considered as part of optimizing the entire breeding program. Additionally, since the predictions in the current study are based on single populations, further studies on the effects of haplotype structure and inheritance on PA should be studied in actual multi-generation breeding populations.
DA - 2020/9/2/
PY - 2020/9/2/
DO - 10.1007/s00122-020-03673-2
VL - 133
IS - 12
SP - 3345-3363
J2 - Theor Appl Genet
LA - en
OP -
SN - 0040-5752 1432-2242
UR - http://dx.doi.org/10.1007/s00122-020-03673-2
DB - Crossref
ER -
TY - JOUR
TI - Effect of bicyclopyrone herbicide on sweetpotato and Palmer amaranth (Amaranthus palmeri)
AU - Lindley, Jennifer J.
AU - Jennings, Katherine M.
AU - Monks, David W.
AU - Chaudhari, Sushila
AU - Schultheis, Jonathan R.
AU - Waldschmidt, Matthew
AU - Brownie, Cavell
T2 - WEED TECHNOLOGY
AB - Management options are needed to limit sweetpotato yield loss due to weeds. Greenhouse studies were conducted in 2018 in Greensboro, NC, and in the field from 2016 to 2018 in Clinton, NC, to evaluate the effect of bicyclopyrone on sweetpotato and Palmer amaranth (field only). In greenhouse studies, Covington and NC04-531 clones were treated with bicyclopyrone (0, 25, 50, 100, or 150 g ai ha⁻¹) either preplant (PP; i.e., immediately before transplanting) or post-transplant (PT; i.e., on the same day after transplanting). Sweetpotato plant injury and stunting increased, and vine length and shoot dry weight decreased with increasing rate of bicyclopyrone regardless of clone or application timing. In field studies, Beauregard (2016) or Covington (2017 and 2018) sweetpotato clones were treated with bicyclopyrone at 50 g ha⁻¹ PP, flumioxazin at 107 g ai ha⁻¹ PP, bicyclopyrone at 50 or 100 g ha⁻¹ PP followed by (fb) S-metolachlor at 800 g ai ha⁻¹ PT, flumioxazin at 107 g ha⁻¹ PP fb S-metolachlor at 800 g ha⁻¹ PT, flumioxazin at 107 g ha⁻¹ PP fb S-metolachlor at 800 g ha⁻¹ PT fb bicyclopyrone at 50 g ha⁻¹ PT-directed, and clomazone at 420 g ai ha⁻¹ PP fb S-metolachlor at 800 g ha⁻¹ PT. Bicyclopyrone PP at 100 g ha⁻¹ fb S-metolachlor PT caused 33% or greater crop stunting and 44% or greater marketable yield reduction compared with the weed-free check in 2016 (Beauregard) and 2017 (Covington). Bicyclopyrone PP at 50 g ha⁻¹ alone or fb S-metolachlor PT resulted in 12% or less injury and similar no. 1 and jumbo yields as the weed-free check in 2 of 3 yr. Injury to Covington from bicyclopyrone PT-directed was 4% or less at 4 or 5 wk after transplanting and marketable yield was similar to that of the weed-free check in 2017 and 2018.
DA - 2020/8//
PY - 2020/8//
DO - 10.1017/wet.2020.13
VL - 34
IS - 4
SP - 552-559
SN - 1550-2740
KW - Greenhouse
KW - weed control
KW - crop injury
KW - interference
ER -
TY - JOUR
TI - A general frequency domain method for assessing spatial covariance structures
AU - Van Hala, Matthew
AU - Bandyopadhyay, Soutir
AU - Lahiri, Soumendra N.
AU - Nordman, Daniel J.
T2 - BERNOULLI
AB - When examining dependence in spatial data, it can be helpful to formally assess spatial covariance structures that may not be parametrically specified or fully model-based. That is, one may wish to test for general features regarding spatial covariance without presupposing any particular, or potentially restrictive, assumptions about the joint data distribution. Current methods for testing spatial covariance are often intended for specialized inference scenarios, usually with spatial lattice data. We propose instead a general method for estimation and testing of spatial covariance structure, which is valid for a variety of inference problems (including nonparametric hypotheses) and applies to a large class of spatial sampling designs with irregular data locations. In this setting, spatial statistics have limiting distributions with complex standard errors depending on the intensity of spatial sampling, the distribution of sampling locations, and the process dependence. The proposed method has the advantage of providing valid inference in the frequency domain without estimation of such standard errors, which are often intractable, and without particular distributional assumptions about the data (e.g., Gaussianity). To illustrate, we develop the method for formally testing isotropy and separability in spatial covariance and consider confidence regions for spatial parameters in variogram model fitting. A broad result is also presented to justify the method for application to other potential problems and general scenarios with testing spatial covariance. The approach uses spatial test statistics, based on an extended version of empirical likelihood, having simple chi-square limits for calibrating tests. We demonstrate the proposed method through several numerical studies.
DA - 2020/11//
PY - 2020/11//
DO - 10.3150/19-BEJ1160
VL - 26
IS - 4
SP - 2463-2487
SN - 1573-9759
KW - confidence sets
KW - spatial periodogram
KW - spatial testing
KW - spectral moment conditions
KW - stochastic sampling
ER -
TY - JOUR
TI - Asymptotic properties of penalized splines for functional data
AU - Xiao, Luo
T2 - BERNOULLI
AB - Penalized spline methods are popular for functional data analysis but their asymptotic properties have not been established. We present a theoretic study of the $L_{2}$ and uniform convergence of penalized splines for estimating the mean and covariance functions of functional data under general settings. The established convergence rates for the mean function estimation are mini-max rate optimal and the rates for the covariance function estimation are comparable to those using other smoothing methods.
DA - 2020/11//
PY - 2020/11//
DO - 10.3150/20-BEJ1209
VL - 26
IS - 4
SP - 2847-2875
SN - 1573-9759
KW - L-2 convergence
KW - functional data analysis
KW - nonparametric regression
KW - penalized splines
KW - uniform convergence
ER -
TY - JOUR
TI - Mechanistic model of hormonal contraception
AU - Wright, A. Armean
AU - Fayad, Ghassan N.
AU - Selgrade, James F.
AU - Olufsen, Mette S.
T2 - PLOS COMPUTATIONAL BIOLOGY
AB - Contraceptive drugs intended for family planning are used by the majority of married or in-union women in almost all regions of the world. The two most prevalent types of hormones associated with contraception are synthetic estrogens and progestins. Hormonal based contraceptives contain a dose of a synthetic progesterone (progestin) or a combination of a progestin and a synthetic estrogen. In this study we use mathematical modeling to understand better how these contraceptive paradigms prevent ovulation, with a special focus on understanding how changes in dose impact hormonal cycling. To explain this phenomenon, we added two autocrine mechanisms essential to achieve contraception within our previous menstrual cycle models. This new model predicts mean daily blood concentrations of key hormones during a contraceptive state achieved by administering progestins, synthetic estrogens, or a combined treatment. Model outputs are compared with data from two clinical trials: one for a progestin only treatment and one for a combined hormonal treatment. Results show that contraception can be achieved with synthetic estrogen, with progestin, and by combining the two hormones. An advantage of the combined treatment is that a contraceptive state can be obtained at a lower dose of each hormone. The model studied here is qualitative in nature, but can be coupled with a pharmacokinetic/pharmacodynamic (PKPD) model providing the ability to fit exogenous inputs to specific bioavailability and affinity. A model of this type may allow insight into a specific drug's effects, which has potential to be useful in the pre-clinical trial stage identifying the lowest dose required to achieve contraception.
DA - 2020/6//
PY - 2020/6//
DO - 10.1371/journal.pcbi.1007848
VL - 16
IS - 6
SP -
SN - 1553-7358
ER -
TY - JOUR
TI - Comparison of the Effectiveness of Online Homework With Handwritten Homework in Electrical and Computer Engineering Classes
AU - Trussell, H. Joel
AU - Gumpertz, Marcia L.
T2 - IEEE TRANSACTIONS ON EDUCATION
AB - Contribution: This article compares the predictive performance of the scores on WeBWorK homework (online) with those of standard handwritten homework. The comparison is done across six undergraduate electrical engineering classes where each of the nine instructors have used both homework modalities. Background: Online homework systems have been used for many years, but analysis of their effectiveness is mixed. Previous work has been limited to a small number of classes in a wide variety of disciplines. This article has a larger number of classes and instructors than previous studies. The classes cover many basic topic areas in electrical and computer engineering, so is directly applicable to the audience of these transactions. Research Question: What is the effect of online homework compared to traditional handwritten homework on the performance of the students on the final exams in selected ECE classes? Methodology: Mixed-effects analysis of variance models are used to determine the predictive ability of performance on homework of the two modalities on the performance on the final exams. The data are limited to classes where the instructors have taught the class using both modalities. These models incorporate the effect of modalities for each instructor and the effect of the modalities across all classes. Findings: The result is that there is no significant statistical difference in the two modalities to predict final exam scores. This indicates that the advantages of using the automated online system can be obtained with no detrimental effect on the students' learning.
DA - 2020/8//
PY - 2020/8//
DO - 10.1109/TE.2020.2971198
VL - 63
IS - 3
SP - 209-215
SN - 1557-9638
KW - Electronic mail
KW - Software
KW - Education
KW - Standards
KW - Electrical engineering
KW - Testing
KW - Programming
KW - Effectiveness
KW - handwritten homework
KW - online homework
KW - statistical analysis
KW - traditional homework
KW - WeBWorK
ER -
TY - JOUR
TI - Sequential Optimization in Locally Important Dimensions
AU - Winkel, Munir A.
AU - Stallrich, Jonathan W.
AU - Storlie, Curtis B.
AU - Reich, Brian
T2 - TECHNOMETRICS
AB - Optimizing an expensive, black-box function f(·) is challenging when its input space is high-dimensional. Sequential design frameworks first model f(·) with a surrogate function and then optimize an acquisition function to determine input settings to evaluate next. Optimization of both f(·) and the acquisition function benefit from effective dimension reduction. Global variable selection detects and removes input variables that do not affect f(·) across the input space. Further dimension reduction may be possible if we consider local variable selection around the current optimum estimate. We develop a sequential design algorithm called sequential optimization in locally important dimensions (SOLID) that incorporates global and local variable selection to optimize a continuous, differentiable function. SOLID performs local variable selection by comparing the surrogate’s predictions in a localized region around the estimated optimum with the p alternative predictions made by removing each input variable. The search space of the acquisition function is further restricted to focus only on the variables that are deemed locally active, leading to greater emphasis on refining the surrogate model in locally active dimensions. A simulation study across multiple test functions and an application to the Sarcos robot dataset show that SOLID outperforms conventional approaches. Supplementary materials for this article are available online.
DA - 2020///
PY - 2020///
DO - 10.1080/00401706.2020.1714738
KW - Augmented expected improvement
KW - Bayesian analysis
KW - Computer experiments
KW - Gaussian process
KW - Local importance
KW - Sequential design
ER -
TY - JOUR
TI - Ascertaining properties of weighting in the estimation of optimal treatment regimes under monotone missingness
AU - Dong, Lin
AU - Laber, Eric
AU - Goldberg, Yair
AU - Song, Rui
AU - Yang, Shu
T2 - STATISTICS IN MEDICINE
AB - Dynamic treatment regimes operationalize precision medicine as a sequence of decision rules, one per stage of clinical intervention, that map up‐to‐date patient information to a recommended intervention. An optimal treatment regime maximizes the mean utility when applied to the population of interest. Methods for estimating an optimal treatment regime assume the data to be fully observed, which rarely occurs in practice. A common approach is to first use multiple imputation and then pool the estimators across imputed datasets. However, this approach requires estimating the joint distribution of patient trajectories, which can be high‐dimensional, especially when there are multiple stages of intervention. We examine the application of inverse probability weighted estimating equations as an alternative to multiple imputation in the context of monotonic missingness. This approach applies to a broad class of estimators of an optimal treatment regime including both Q‐learning and a generalization of outcome weighted learning. We establish consistency under mild regularity conditions and demonstrate its advantages in finite samples using a series of simulation experiments and an application to a schizophrenia study.
DA - 2020/11/10/
PY - 2020/11/10/
DO - 10.1002/sim.8678
VL - 39
IS - 25
SP - 3503-3520
SN - 1097-0258
KW - augmented inverse probability weighting
KW - dynamic treatment regimes
KW - monotonic coarseness
KW - outcome weighted learning
KW - Q-learning
ER -
TY - JOUR
TI - A deep learning approach to identify smoke plumes in satellite imagery in near-real time for health risk communication
AU - Larsen, Alexandra
AU - Hanigan, Ivan
AU - Reich, Brian J.
AU - Qin, Yi
AU - Cope, Martin
AU - Morgan, Geoffrey
AU - Rappold, Ana G.
T2 - JOURNAL OF EXPOSURE SCIENCE AND ENVIRONMENTAL EPIDEMIOLOGY
AB - Wildland fire (wildfire; bushfire) pollution contributes to poor air quality, a risk factor for premature death. The frequency and intensity of wildfires are expected to increase; improved tools for estimating exposure to fire smoke are vital. New-generation satellite-based sensors produce high-resolution spectral images, providing real-time information of surface features during wildfire episodes. Because of the vast size of such data, new automated methods for processing information are required. We present a deep fully convolutional neural network (FCN) for predicting fire smoke in satellite imagery in near-real time (NRT). The FCN identifies fire smoke using output from operational smoke identification methods as training data, leveraging validated smoke products in a framework that can be operationalized in NRT. We demonstrate this for a fire episode in Australia; the algorithm is applicable to any geographic region. The algorithm has high classification accuracy (99.5% of pixels correctly classified on average) and precision (average intersection over union = 57.6%). The FCN algorithm has high potential as an exposure-assessment tool, capable of providing critical information to fire managers, health and environmental agencies, and the general public to prevent the health risks associated with exposure to hazardous smoke from wildland fires in NRT.
DA - 2020///
PY - 2020///
DO - 10.1038/s41370-020-0246-y
ER -
TY - JOUR
TI - Semiparametric estimation of the cure fraction in population-based cancer survival analysis
AU - Gu, Ennan
AU - Zhang, Jiajia
AU - Lu, Wenbin
AU - Wang, Lianming
AU - Felizzi, Federico
T2 - STATISTICS IN MEDICINE
AB - With rapid development in medical research, the treatment of diseases including cancer has progressed dramatically and those survivors may die from causes other than the one under study, especially among elderly patients. Motivated by the Surveillance, Epidemiology, and End Results (SEER) female breast cancer study, background mortality is incorporated into the mixture cure proportional hazards (MCPH) model to improve the cure fraction estimation in population‐based cancer studies. Here, that patients are “cured” is defined as when the mortality rate of the individuals in diseased group returns to the same level as that expected in the general population, where the population level mortality is presented by the mortality table of the United States. The semiparametric estimation method based on the EM algorithm for the MCPH model with background mortality (MCPH+BM) is further developed and validated via comprehensive simulation studies. Real data analysis shows that the proposed semiparametric MCPH+BM model may provide more accurate estimation in population‐level cancer study.
DA - 2020/11/20/
PY - 2020/11/20/
DO - 10.1002/sim.8693
VL - 39
IS - 26
SP - 3787-3805
SN - 1097-0258
KW - Breslow estimator
KW - EM algorithm
KW - mixture cure model
KW - perturbation
KW - population-based study
KW - semiparametric regression
ER -
TY - JOUR
TI - Estimating the drivers of species distributions with opportunistic data using mediation analysis
AU - Huberman, David B.
AU - Reich, Brian J.
AU - Pacifici, Krishna
AU - Collazo, Jaime A.
T2 - ECOSPHERE
AB - Ecological occupancy modeling has historically relied on high‐quality, low‐quantity designed‐survey data for estimation and prediction. In recent years, there has been a large increase in the amount of high‐quantity, unknown‐quality opportunistic data. This has motivated research on how best to combine these two data sources in order to optimize inference. Existing methods can be infeasible for large datasets or require opportunistic data to be located where designed‐survey data exist. These methods map species occupancies, motivating a need to properly evaluate covariate effects (e.g., land cover proportion) on their distributions. We describe a spatial estimation method for supplementarily including additional opportunistic data using mediation analysis concepts. The opportunistic data mediate the effect of the covariate on the designed‐survey data response, decomposing it into a direct and indirect effect. A component of the indirect effect can then be quickly estimated via regressing the mediator on the covariate, while the other components are estimated through a spatial occupancy model. The regression step allows for use of large quantities of opportunistic data that can be collected in locations with no designed‐survey data available. Simulation results suggest that the mediated method produces an improvement in relative MSE when the data are of reasonable quality. However, when the simulated opportunistic data are poorly correlated with the true spatial process, the standard, unmediated method is still preferable. A spatiotemporal extension of the method is also developed for analyzing the effect of deciduous forest land cover on red‐eyed vireo distribution in the southeastern United States; we find that including the opportunistic data does not lead to a substantial improvement. Opportunistic data quality remains an important consideration when employing this method, as with other data integration methods.
DA - 2020/6//
PY - 2020/6//
DO - 10.1002/ecs2.3165
VL - 11
IS - 6
SP -
SN - 2150-8925
KW - mediation analysis
KW - occupancy modeling
KW - opportunistic data
KW - spatial statistics
ER -
TY - JOUR
TI - GRID: A VARIABLE SELECTION AND STRUCTURE DISCOVERY METHOD FOR HIGH DIMENSIONAL NONPARAMETRIC REGRESSION
AU - Giordano, Francesco
AU - Lahiri, Soumendra Nath
AU - Parrella, Maria Lucia
T2 - ANNALS OF STATISTICS
AB - We consider nonparametric regression in high dimensions where only a relatively small subset of a large number of variables are relevant and may have nonlinear effects on the response. We develop methods for variable selection, structure discovery and estimation of the true low-dimensional regression function, allowing any degree of interactions among the relevant variables that need not be specified a-priori. The proposed method, called the GRID, combines empirical likelihood based marginal testing with the local linear estimation machinery in a novel way to select the relevant variables. Further, it provides a simple graphical tool for identifying the low dimensional nonlinear structure of the regression function. Theoretical results establish consistency of variable selection and structure discovery, and also Oracle risk property of the GRID estimator of the regression function, allowing the dimension $d$ of the covariates to grow with the sample size $n$ at the rate $d=O(n^{a})$ for any $a\in(0,\infty)$ and the number of relevant covariates $r$ to grow at a rate $r=O(n^{\gamma})$ for some $\gamma\in(0,1)$ under some regularity conditions that, in particular, require finiteness of certain absolute moments of the error variables depending on $a$. Finite sample properties of the GRID are investigated in a moderately large simulation study.
DA - 2020/6//
PY - 2020/6//
DO - 10.1214/19-AOS1846
VL - 48
IS - 3
SP - 1848-1874
SN - 0090-5364
KW - Empirical likelihood
KW - marginal testing
KW - variable selection consistency
ER -
TY - JOUR
TI - ROBUST AND RATE-OPTIMAL GIBBS POSTERIOR INFERENCE ON THE BOUNDARY OF A NOISY IMAGE
AU - Syring, Nicholas
AU - Martin, Ryan
T2 - ANNALS OF STATISTICS
AB - Detection of an image boundary when the pixel intensities are measured with noise is an important problem in image segmentation, with numerous applications in medical imaging and engineering. From a statistical point of view, the challenge is that likelihood-based methods require modeling the pixel intensities inside and outside the image boundary, even though these are typically of no practical interest. Since misspecification of the pixel intensity models can negatively affect inference on the image boundary, it would be desirable to avoid this modeling step altogether. Towards this, we develop a robust Gibbs approach that constructs a posterior distribution for the image boundary directly, without modeling the pixel intensities. We prove that, for a suitable prior on the image boundary, the Gibbs posterior concentrates asymptotically at the minimax optimal rate, adaptive to the boundary smoothness. Monte Carlo computation of the Gibbs posterior is straightforward, and simulation experiments show that the corresponding inference is more accurate than that based on existing Bayesian methodology.
DA - 2020/6//
PY - 2020/6//
DO - 10.1214/19-AOS1856
VL - 48
IS - 3
SP - 1498-1513
SN - 0090-5364
KW - Adaptation
KW - boundary detection
KW - likelihood-free inference
KW - model misspecification
KW - posterior concentration rate
ER -
TY - JOUR
TI - Genetic and environmental risk for lymphoma in boxer dogs
AU - Craun, Kaitlyn
AU - Ekena, Joanne
AU - Sacco, James
AU - Jiang, Tao
AU - Motsinger-Reif, Alison
AU - Trepanier, Lauren A.
T2 - JOURNAL OF VETERINARY INTERNAL MEDICINE
AB - Non-Hodgkin lymphoma in humans is associated with environmental chemical exposures, and risk is enhanced by genetic variants in glutathione S-transferases (GST) enzymes. We hypothesized that boxer dogs, a breed at risk for lymphoma, would have a higher prevalence of GST variants with predicted low activity, and greater accumulated DNA damage, compared to other breeds. We also hypothesized that lymphoma in boxers would be associated with specific environmental exposures and a higher prevalence of canine GST variants. Fifty-four healthy boxers and 56 age-matched nonboxer controls; 63 boxers with lymphoma and 89 unaffected boxers ≥10 years old. We resequenced variant loci in canine GSTT1, GSTT5, GSTM1, and GSTP1 and compared endogenous DNA damage in peripheral leukocytes of boxers and nonboxers using the comet assay. We also compared GST variants and questionnaire-based environmental exposures in boxers with and without lymphoma. Endogenous DNA damage did not differ between boxers and nonboxers. Boxers with lymphoma were more likely to live within 10 miles of a nuclear power plant and within 2 miles of a chemical supplier or crematorium. Lymphoma risk was not modulated by known canine GST variants. Proximity to nuclear power plants, chemical suppliers, and crematoria were significant risk factors for lymphoma in this population of boxers. These results support the hypothesis that aggregate exposures to environmental chemicals and industrial waste may contribute to lymphoma risk in dogs.
DA - 2020/9//
PY - 2020/9//
DO - 10.1111/jvim.15849
VL - 34
IS - 5
SP - 2068-2077
SN - 1939-1676
KW - canine
KW - detoxification
KW - exposure
KW - lymphosarcoma
ER -
TY - JOUR
TI - Posterior contraction and credible sets for filaments of regression functions
AU - Li, Wei
AU - Ghosal, Subhashis
T2 - ELECTRONIC JOURNAL OF STATISTICS
AB - A filament consists of local maximizers of a smooth function $f$ when moving in a certain direction. A filamentary structure is an important feature of the shape of an object and is also considered as an important lower dimensional characterization of multivariate data. There have been some recent theoretical studies of filaments in the nonparametric kernel density estimation context. This paper supplements the current literature in two ways. First, we provide a Bayesian approach to the filament estimation in regression context and study the posterior contraction rates using a finite random series of B-splines basis. Compared with the kernel-estimation method, this has a theoretical advantage as the bias can be better controlled when the function is smoother, which allows obtaining better rates. Assuming that $f:\mathbb{R}^{2}\mapsto \mathbb{R}$ belongs to an isotropic Hölder class of order $\alpha \geq 4$, with the optimal choice of smoothing parameters, the posterior contraction rates for the filament points on some appropriately defined integral curves and for the Hausdorff distance of the filament are both $(n/\log n)^{(2-\alpha )/(2(1+\alpha ))}$. Secondly, we provide a way to construct a credible set with sufficient frequentist coverage for the filaments. We demonstrate the success of our proposed method in simulations and one application to earthquake data.
DA - 2020///
PY - 2020///
DO - 10.1214/20-EJS1705
VL - 14
IS - 1
SP - 1707-1743
SN - 1935-7524
KW - Filament
KW - nonparametric regression
KW - posterior contraction
KW - credibility
KW - coverage
KW - B-splines
ER -
TY - JOUR
TI - Central limit theorems for classical multidimensional scaling
AU - Li, Gongkai
AU - Tang, Minh
AU - Charon, Nichlas
AU - Priebe, Carey
T2 - ELECTRONIC JOURNAL OF STATISTICS
AB - Classical multidimensional scaling is a widely used method in dimensionality reduction and manifold learning. The method takes in a dissimilarity matrix and outputs a low-dimensional configuration matrix based on a spectral decomposition. In this paper, we present three noise models and analyze the resulting configuration matrices, or embeddings. In particular, we show that under each of the three noise models the resulting embedding gives rise to a central limit theorem. We also provide compelling simulations and real data illustrations of these central limit theorems. This perturbation analysis represents a significant advancement over previous results regarding classical multidimensional scaling behavior under randomness.
DA - 2020///
PY - 2020///
DO - 10.1214/20-EJS1720
VL - 14
IS - 1
SP - 2362-2394
SN - 1935-7524
KW - Classical multidimensional scaling
KW - dissimilarity matrix
KW - perturbation analysis
KW - central limit theorem
ER -
TY - JOUR
TI - Semiparametric regression of the illness-death model with interval censored disease incidence time: An application to the ACLS data
AU - Zhou, Jie
AU - Zhang, Jiajia
AU - McLain, Alexander C.
AU - Lu, Wenbin
AU - Sui, Xuemei
AU - Hardin, James W.
T2 - STATISTICAL METHODS IN MEDICAL RESEARCH
AB - To investigate the effect of fitness on cardiovascular disease and all-cause mortality using the Aerobics Center Longitudinal Study, we develop a semiparametric illness-death model to account for intermittent observations of the cardiovascular disease incidence time and the right censored data of all-cause mortality. The main challenge in estimation is to handle the intermittent observations (interval censoring) of cardiovascular disease incidence time and we develop a semiparametric estimation method based on the expectation-maximization algorithm for a Markov illness-death regression model. The variance of the parameters is estimated using profile likelihood methods. The proposed method is evaluated using extensive simulation studies and illustrated with an application to the Aerobics Center Longitudinal Study data.
DA - 2020/12//
PY - 2020/12//
DO - 10.1177/0962280220939123
VL - 29
IS - 12
SP - 3707-3720
SN - 1477-0334
KW - Semi-competing model
KW - illlness-death model
KW - semi-parametric regression
KW - interval censoring
KW - Markov models
ER -
TY - JOUR
TI - Gastric artery embolization: studying the effects of catheter type and injection method on microsphere distributions within a benchtop arterial model
AU - Jernigan, Shaphan R.
AU - Osborne, Jason A.
AU - Buckner, Gregory D.
T2 - BIOMEDICAL ENGINEERING ONLINE
AB - Aims: The objective of the study is to investigate the effect of catheter type and injection method on microsphere distributions, specifically vessel targeting accuracy. Materials and methods: The study utilized three catheter types (a standard end-hole micro-catheter, a Surefire anti-reflux catheter, and an Endobar occlusion balloon catheter) and both manual and computer-controlled injection schemes. A closed-loop, dynamically pressurized surrogate arterial system was assembled to replicate arterial flow for bariatric embolization procedures. Four vessel branches immediately distal to the injection site were targeted for embolization. Embolic microspheres were injected into the model using these three catheter types and both manual and computer-controlled injections. Results: Across all injection methods, the catheter effect on the proportion of microspheres to target vessels (vs. non-target vessels) was significant (p = 0.005). The catheter effect on the number of non-target vessels embolized was nearly significant (p = 0.059). Across all catheter types, the injection method effect was not statistically significant for either of two outcome measures (percent microspheres to target vessels: p = 0.265, number of non-target vessels embolized: p = 0.148). Conclusion: Catheter type had a significant effect on targeting accuracy across all injection methods. The Endobar catheter exhibited a higher targeting accuracy in pairwise comparisons with the other two injection catheters across all injection schemes and when considering the Endobar catheter with the manifold injection method vs. each of the catheters with the manual injection method; the differences were significant in three of four analyses. The injection method effect was not statistically significant across all catheter types and when considering the Endobar catheter/Endobar manifold combination vs. Endobar catheter injections with manual and pressure-replicated methods.
DA - 2020/6/26/
PY - 2020/6/26/
DO - 10.1186/s12938-020-00794-z
VL - 19
IS - 1
SP -
SN - 1475-925X
KW - Gastric artery
KW - Embolization
KW - Vessel targeting
KW - Reflux
ER -
TY - JOUR
TI - Ultrasoft Liquid Metal Elastomer Foams with Positive and Negative Piezopermittivity for Tactile Sensing
AU - Yang, Jiayi
AU - Tang, David
AU - Ao, Jinping
AU - Ghosh, Tushar
AU - Neumann, Taylor V.
AU - Zhang, Dongguang
AU - Piskarev, Yegor
AU - Yu, Tingting
AU - Truong, Vi Khanh
AU - Xie, Kai
AU - Lai, Ying-Chih
AU - Li, Yang
AU - Dickey, Michael D.
T2 - ADVANCED FUNCTIONAL MATERIALS
AB - Soft, capacitive tactile (pressure) sensors are important for applications including human–machine interfaces, soft robots, and electronic skins. Such capacitors consist of two electrodes separated by a soft dielectric. Pressing the capacitor brings the electrodes closer together and thereby increases capacitance. Thus, sensitivity to a given force is maximized by using dielectric materials that are soft and have a high dielectric constant, yet such properties are often in conflict with each other. Here, a liquid metal elastomer foam (LMEF) is introduced that is extremely soft (elastic modulus 7.8 kPa), highly compressible (70% strain), and has a high permittivity. Compressing the LMEF displaces the air in the foam structure, increasing the permittivity over a large range (5.6–11.7). This is called “positive piezopermittivity.” Interestingly, it is discovered that the permittivity of such materials decreases (“negative piezopermittivity”) when compressed to large strain due to the geometric deformation of the liquid metal droplets. This mechanism is theoretically confirmed via electromagnetic theory, and finite element simulation. Using these materials, a soft tactile sensor with high sensitivity, high initial capacitance, and large capacitance change is demonstrated. In addition, a tactile sensor powered wirelessly (from 3 m away) with high power conversion efficiency (84%) is demonstrated.
DA - 2020/9//
PY - 2020/9//
DO - 10.1002/adfm.202002611
VL - 30
IS - 36
SP -
SN - 1616-3028
KW - foams
KW - liquid metals
KW - pressure sensing
KW - stretchable electronics
KW - tactile sensors
ER -
TY - JOUR
TI - Microstructural classification of unirradiated LiAlO2 pellets by deep learning methods
AU - Pazdernik, Karl
AU - LaHaye, Nicole L.
AU - Artman, Conor M.
AU - Zhu, Yuanyuan
T2 - COMPUTATIONAL MATERIALS SCIENCE
AB - Microstructural features and defects can greatly impact material properties and performance in a wide range of application areas. Recognition and characterization of microstructural features is essential to the understanding and prediction of material performance under various operational conditions, including irradiation. In this work, we tested a collection of Deep Convolutional Neural Network (DCNN) architectures that have been optimized for image segmentation and selected the best performer to obtain pixel-level classification of the main microstructural features in unirradiated LiAlO2 pellets, including grains, grain boundaries, voids, precipitates, and zirconia impurities. LiAlO2 is an important material that is used as a tritium producer for the Tritium Sustainment Program. While LiAlO2 pellets have been employed in tritium-producing burnable absorber rods (TPBARs) for years, comprehensive microstructural analysis of unirradiated LiAlO2, and therefore time-dependent tritium release from the material during irradiation, has not been established. A full understanding of unirradiated LiAlO2 microstructure and how it evolves as a result of neutron irradiation is necessary to produce an integrated performance model to predict in-reactor behavior as well as to target strategic experiments. This work aims at developing a fast and quantitative analysis method to classify various microstructural features in unirradiated LiAlO2 pellets that are visualized by scanning electron microscopy (SEM). Given classification results obtained, statistical analysis was then carried out to evaluate the performance of the DCNN classification and to describe the properties of the microstructural features as a whole, based on standard aggregation and spatial point-process methodology. Our results show improved performance over a baseline heuristic approach. Also, the computational efficiency of the computer-aided analytical method allows for quantitative characterization of a larger volume of SEM images than was previously possible using manual segmentation.
DA - 2020/8//
PY - 2020/8//
DO - 10.1016/j.commatsci.2020.109728
VL - 181
SP -
SN - 1879-0801
KW - Deep convolutional neural network
KW - Scanning electron microscopy
KW - Spatial point process
KW - Image segmentation
ER -
TY - JOUR
TI - BASELINE DRIFT ESTIMATION FOR AIR QUALITY DATA USING QUANTILE TREND FILTERING
AU - Brantley, Halley L.
AU - Guinness, Joseph
AU - Chi, Eric C.
T2 - ANNALS OF APPLIED STATISTICS
AB - We address the problem of estimating smoothly varying baseline trends in time series data. This problem arises in a wide range of fields, including chemistry, macroeconomics and medicine; however, our study is motivated by the analysis of data from low cost air quality sensors. Our methods extend the quantile trend filtering framework to enable the estimation of multiple quantile trends simultaneously while ensuring that the quantiles do not cross. To handle the computational challenge posed by very long time series, we propose a parallelizable alternating direction method of multipliers (ADMM) algorithm. The ADMM algorithm enables the estimation of trends in a piecewise manner, both reducing the computation time and extending the limits of the method to larger data sizes. We also address smoothing parameter selection and propose a modified criterion based on the extended Bayesian information criterion. Through simulation studies and our motivating application to low cost air quality sensor data, we demonstrate that our model provides better quantile trend estimates than existing methods and improves signal classification of low-cost air quality sensor output.
DA - 2020/6//
PY - 2020/6//
DO - 10.1214/19-AOAS1318
VL - 14
IS - 2
SP - 585-604
SN - 1932-6157
KW - Air quality
KW - nonparametric quantile regression
KW - trend estimation
ER -
TY - JOUR
TI - Distributions of pattern statistics in sparse Markov models
AU - Martin, Donald E. K.
T2 - Annals of the Institute of Statistical Mathematics
DA - 2020/8//
PY - 2020/8//
DO - 10.1007/s10463-019-00714-6
VL - 72
IS - 4
SP - 895-913
SN - 0020-3157 1572-9052
UR - http://dx.doi.org/10.1007/S10463-019-00714-6
KW - Auxiliary Markov chain
KW - Pattern distribution
KW - Sparse Markov model
KW - Variable length Markov chain
ER -
TY - JOUR
TI - In-Plane Thermoelectric Properties of Flexible and Room-Temperature-Doped Carbon Nanotube Films
AU - Chatterjee, Kony
AU - Negi, Ankit
AU - Kim, Kyunghoon
AU - Liu, Jun
AU - Ghosh, Tushar K.
T2 - ACS Applied Energy Materials
AB - Soft materials with high power factors (PFs) and low thermal conductivity (κ) are critically important for integration of thermoelectric (TE) modules into flexible form factors for energy harvesting or cooling applications. Here, air stable p- and n-type multiwalled carbon nanotube films with high PFs (up to 521 μW/m K²) are reported, with n-type doping carried out in a facile two-step process. The maximum figures of merit (ZTs) of p-type and n-type CNTs are obtained as 0.019 and 0.015 at 300 K, respectively, with all three transport properties—Seebeck coefficient, electrical conductivity, and κ—measured in-plane, providing a more accurate ZT. Using time-domain thermoreflectance, we report a fast and non-contact measurement of κ without complex microfabrication or material processing. Moreover, there is no material mismatch between the p- and n-type legs of the TE module. Such materials have the potential for widespread applications in inexpensive and scalable wearable energy harvesting and localized heating/cooling.
DA - 2020/7/27/
PY - 2020/7/27/
DO - 10.1021/acsaem.0c00995
VL - 3
IS - 7
SP - 6929-6936
UR - https://doi.org/10.1021/acsaem.0c00995
KW - thermoelectrics
KW - carbon nanotubes
KW - flexible film
KW - in-plane thermal conductivity
KW - air stable
ER -
TY - JOUR
TI - Comparative Exposure Assessment Using Silicone Passive Samplers Indicates That Domestic Dogs Are Sentinels To Support Human Health Research
AU - Wise, Catherine F.
AU - Hammel, Stephanie C.
AU - Herkert, Nicholas
AU - Ma, Jun
AU - Motsinger-Reif, Alison
AU - Stapleton, Heather M.
AU - Breen, Matthew
T2 - ENVIRONMENTAL SCIENCE & TECHNOLOGY
AB - Silicone wristbands are promising passive samplers to support epidemiological studies in characterizing exposure to organic contaminants; however, investigating associated health risks remains challenging because of the latency period for many chronic diseases that take years to manifest. Dogs provide valuable insights as sentinels for exposure-related human disease because they share similar exposures in the home, have shorter life spans, share many clinical/biological features, and have closely related genomes. Here, we evaluated exposures among pet dogs and their owners using silicone dog tags and wristbands to determine if contaminant levels were correlated with validated exposure biomarkers. Significant correlations between measures on dog tags and wristbands were observed (rs = 0.38–0.90; p < 0.05). Correlations with their respective urinary biomarkers were stronger in dog tags compared to that in human wristbands (rs = 0.50–0.71; p < 0.01) for several organophosphate esters. This supports the value of using silicone bands with dogs to investigate health impacts on humans from shared exposures.
DA - 2020/6/16/
PY - 2020/6/16/
DO - 10.1021/acs.est.9b06605
VL - 54
IS - 12
SP - 7409-7419
SN - 1520-5851
ER -
TY - JOUR
TI - Vecchia Approximations of Gaussian-Process Predictions
AU - Katzfuss, Matthias
AU - Guinness, Joseph
AU - Gong, Wenlong
AU - Zilber, Daniel
T2 - JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS
AB - Gaussian processes are popular and flexible models for spatial, temporal, and functional data, but they are computationally infeasible for large datasets. We discuss Gaussian-process approximations that use basis functions at multiple resolutions to achieve fast inference and that can (approximately) represent any spatial covariance structure. We consider two special cases of this multi-resolution-approximation framework, a taper version and a domain-partitioning (block) version. We describe theoretical properties and inference procedures, and study the computational complexity of the methods. Numerical comparisons and an application to satellite data are also provided.
DA - 2020/6/23/
PY - 2020/6/23/
DO - 10.1007/s13253-020-00401-7
SP -
SN - 1537-2693
KW - Computational complexity
KW - Kriging
KW - Large datasets
KW - Sparsity
KW - Spatial statistics
ER -
TY - JOUR
TI - Tenure and Promotion Outcomes at Four Large Land Grant Universities: Examining the Role of Gender, Race, and Academic Discipline
AU - Durodoye, Raifu, Jr.
AU - Gumpertz, Marcia
AU - Wilson, Alyson
AU - Griffith, Emily
AU - Ahmad, Seher
T2 - RESEARCH IN HIGHER EDUCATION
DA - 2020/8//
PY - 2020/8//
DO - 10.1007/s11162-019-09573-9
VL - 61
IS - 5
SP - 628-651
SN - 1573-188X
KW - Tenure
KW - Faculty
KW - Race
KW - Gender
KW - Discipline
ER -
TY - JOUR
TI - The influence of packed cell volume versus plasma proteins on thromboelastographic variables in canine blood
AU - Lynch, Alex M.
AU - Ruterbories, Laura
AU - Jack, John
AU - Motsinger-Reif, Alison A.
AU - Hanel, Rita
T2 - JOURNAL OF VETERINARY EMERGENCY AND CRITICAL CARE
AB - Objective: Determine the correlation between kaolin‐activated thromboelastography (TEG) variables (R, K, angle, and maximum amplitude [MA]) and PCV, fibrinogen concentration (FC), and total fibrinogen (TF) in an ex vivo model. Animals: Two healthy adult mixed‐breed dogs. Procedures: Citrated whole blood was obtained and separated into packed red cells, platelet rich plasma, and platelet poor plasma (PPP). An aliquot of PPP was heated to denature heat labile proteins (fibrinogen, factor V, factor VIII). Blood components were recombined for analyses of 6 physiological scenarios: anemia with low fibrinogen; anemia with moderate fibrinogen; anemia with normal fibrinogen; anemia with normal saline; normal PCV and normal fibrinogen; and normal PCV and low fibrinogen. A Kruskal–Wallis test, along with linear regressions on pairwise combinations of TEG variables, was used to determine the correlation between TEG variables and PCV, FC, and TF. Results: Maximum amplitude correlated with FC (R² = 0.60, P < 0.001) and TF (R² = 0.57, P < 0.001) but not PCV (R² = 0.003, P = 0.7). Angle and K time were moderately correlated with FC (angle: R² = 0.53, P < 0.001; K: R² = 0.55, P < 0.001) and TF (alpha angle: R² = 0.52, P < 0.001; K: R² = 0.51, P < 0.001) but not PCV. The R time was weakly correlated with PCV (R² = 0.15, P < 0.009) but not FC or TF. Conclusions and clinical relevance: In an ex vivo model, plasma proteins but not PCV impacted TEG variables. This suggests that TEG changes noted with anemia are imparted by changes in available fibrinogen in a fixed microenvironment rather than artifact of anemia.
DA - 2020/7//
PY - 2020/7//
DO - 10.1111/vec.12979
VL - 30
IS - 4
SP - 418-425
SN - 1476-4431
ER -
TY - JOUR
TI - Peptide variability and signatures associated with disease progression in CSF collected longitudinally from ALS patients
AU - Mellinger, Allyson L.
AU - Griffith, Emily H.
AU - Bereman, Michael S.
T2 - ANALYTICAL AND BIOANALYTICAL CHEMISTRY
DA - 2020/9//
PY - 2020/9//
DO - 10.1007/s00216-020-02765-8
VL - 412
IS - 22
SP - 5465-5475
SN - 1618-2650
KW - Amyotrophic lateral sclerosis
KW - Cerebrospinal fluid
KW - Longitudinal modeling
KW - Proteomics
KW - Biomarker
ER -
TY - JOUR
TI - Global forensic geolocation with deep neural networks
AU - Grantham, Neal S.
AU - Reich, Brian J.
AU - Laber, Eric B.
AU - Pacifici, Krishna
AU - Dunn, Robert R.
AU - Fierer, Noah
AU - Gebert, Matthew
AU - Allwood, Julia S.
AU - Faith, Seth A.
T2 - JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS
AB - An important problem in modern forensic analyses is identifying the provenance of materials at a crime scene, such as biological material on a piece of clothing. This procedure, which is known as geolocation, is conventionally guided by expert knowledge of the biological evidence and therefore tends to be application specific, labour intensive and often subjective. Purely data-driven methods have yet to be fully realized in this domain, in part because of the lack of a sufficiently rich source of data. However, high throughput sequencing technologies can identify tens of thousands of fungi and bacteria taxa by using DNA recovered from a single swab collected from nearly any object or surface. This microbial community, or microbiome, may be highly informative of the provenance of the sample, but data on the spatial variation of microbiomes are sparse and high dimensional and have a complex dependence structure that render them difficult to model with standard statistical tools. Deep learning algorithms have generated a tremendous amount of interest within the machine learning community for their predictive performance in high dimensional problems. We present DeepSpace: a new algorithm for geolocation that aggregates over an ensemble of deep neural network classifiers trained on randomly generated Voronoi partitions of a spatial domain. The DeepSpace algorithm makes remarkably good point predictions; for example, when applied to the microbiomes of over 1300 dust samples collected across continental USA, more than half of geolocation predictions produced by this model fall less than 100 km from their true origin, which is a 60% reduction in error from competing geolocation methods. Moreover, we apply DeepSpace to a novel data set of global dust samples collected from nearly 30 countries, finding that dust-associated fungi alone predict a sample's country of origin with nearly 90% accuracy.
DA - 2020/8//
PY - 2020/8//
DO - 10.1111/rssc.12427
VL - 69
IS - 4
SP - 909-929
SN - 1467-9876
KW - Citizen science
KW - Machine learning
KW - Microbiome
KW - Non-homogeneous Poisson process
KW - Spatial point pattern
ER -
TY - JOUR
TI - Association test using Copy Number Profile Curves (CONCUR) enhances power in rare copy number variant analysis
AU - Brucker, Amanda
AU - Lu, Wenbin
AU - West, Rachel Marceau
AU - Yu, Qi-You
AU - Hsiao, Chuhsing Kate
AU - Hsiao, Tzu-Hung
AU - Lin, Ching-Heng
AU - Magnusson, Patrik K. E.
AU - Sullivan, Patrick F.
AU - Szatkiewicz, Jin P.
AU - Lu, Tzu-Pin
AU - Tzeng, Jung-Ying
T2 - PLOS COMPUTATIONAL BIOLOGY
AB - Copy number variants (CNVs) are the gain or loss of DNA segments in the genome that can vary in dosage and length. CNVs comprise a large proportion of variation in human genomes and impact health conditions. To detect rare CNV associations, kernel-based methods have been shown to be a powerful tool due to their flexibility in modeling the aggregate CNV effects, their ability to capture effects from different CNV features, and their accommodation of effect heterogeneity. To perform a kernel association test, a CNV locus needs to be defined so that locus-specific effects can be retained during aggregation. However, CNV loci are arbitrarily defined and different locus definitions can lead to different performance depending on the underlying effect patterns. In this work, we develop a new kernel-based test called CONCUR (i.e., copy number profile curve-based association test) that is free from a definition of locus and evaluates CNV-phenotype associations by comparing individuals' copy number profiles across the genomic regions. CONCUR is built on the proposed concepts of "copy number profile curves" to describe the CNV profile of an individual, and the "common area under the curve (cAUC) kernel" to model the multi-feature CNV effects. The proposed method captures the effects of CNV dosage and length, accounts for the numerical nature of copy numbers, and accommodates between- and within-locus etiological heterogeneity without the need to define artificial CNV loci as required in current kernel methods. In a variety of simulation settings, CONCUR shows comparable or improved power over existing approaches. Real data analyses suggest that CONCUR is well powered to detect CNV effects in the Swedish Schizophrenia Study and the Taiwan Biobank.
DA - 2020/5//
PY - 2020/5//
DO - 10.1371/journal.pcbi.1007797
VL - 16
IS - 5
SP -
SN - 1553-7358
ER -
TY - JOUR
TI - The association between neuraxial anesthesia and the development of childhood asthma - a secondary analysis of the newborn epigenetics study cohort
AU - Huang, Yueyang
AU - Tzeng, Jung-Ying
AU - Maguire, Rachel
AU - Hoyo, Cathrine
AU - Allen, Terrence
T2 - CURRENT MEDICAL RESEARCH AND OPINION
AB - Objectives: Childhood asthma is a common chronic illness that has been associated with mode of delivery. However, the effect of cesarean delivery alone does not fully account for the increased prevalence of childhood asthma. We tested the hypothesis that neuraxial anesthesia used for labor analgesia and cesarean delivery alters the risk of developing childhood asthma. Methods: Within the Newborn Epigenetics Study birth cohort, 196 mother and child pairs with entries in the electronic anesthesia records were included. From these records, data on maternal anesthesia type, duration of exposure, and drugs administered peripartum were abstracted and combined with questionnaire-derived prenatal risk factors and medical records and questionnaire-derived asthma diagnosis data in children. Logistic regression models were used to evaluate associations between type of anesthesia, duration of anesthesia, and the development of asthma in males and females. Results: We found that longer duration of epidural anesthesia was associated with a lower risk of asthma in male children (OR = 0.80; 95% CI = 0.66–0.95) for each hour of epidural exposure. Additionally, a unit increase in the composite dose of local anesthetics and opioid analgesics administered via the spinal route was associated with a lower risk of asthma in both male (OR = 0.59, 95% CI = 0.36–0.96) and female children (OR = 0.26, 95% CI = 0.09–0.82). Conclusion: Our data suggest that peripartum exposure to neuraxial anesthesia may reduce the risk of childhood asthma primarily in males. Larger human studies and model systems with longer follow-up are required to elucidate these findings.
DA - 2020/6/2/
PY - 2020/6/2/
DO - 10.1080/03007995.2020.1747417
VL - 36
IS - 6
SP - 1025-1032
SN - 1473-4877
KW - Anesthesia
KW - opioid analgesics
KW - asthma
KW - children
KW - sex-specific
ER -
TY - JOUR
TI - Growth performance, oxidative stress, and antioxidant capacity of newly weaned piglets fed dietary peroxidized lipids with vitamin E or phytogenic compounds in drinking water
AU - Silva-Guillen, Ysenia
AU - Arellano, Consuelo
AU - Martinez, Gabriela
AU - Heugten, Eric
T2 - APPLIED ANIMAL SCIENCE
AB - This study evaluated the use of vitamin E and phytogenic compounds in drinking water on growth performance, oxidative stress, and immune status of piglets fed peroxidized lipids. In a 35-d study, 21-d-old weaned piglets (n = 96; 6.10 ± 0.64 kg of BW) were assigned within sex and BW blocks to 1 of 4 treatments, using 24 pens (4 pigs per pen; 6 replications per treatment). Diets contained either 6% soybean oil or 6% peroxidized soybean oil. Pigs fed peroxidized soybean oil received drinking water without (control) or with supplemental vitamin E (100 IU/L of RRR-α-tocopherol) or phytogenic compounds (60 μL/L for wk 1 and 30 μL/L for wk 2 to 5). Peroxidized soybean oil decreased (P < 0.001) final BW (18.2 vs. 21.6 kg) and ADG (346 vs. 441 g/d) and tended to decrease ADFI (P = 0.14; 542 vs. 617 g/d) and G:F (P = 0.07; 645 vs. 715 g/kg). Peroxidation decreased serum vitamin E concentrations (P = 0.03), which could be restored (P = 0.01) by vitamin E in the water, but not phytogenic compounds. Peroxidized soybean oil decreased serum 8-hydroxydeoxyguanosine, increased serum protein carbonyl, and had no effects on serum malondialdehyde or cytokines. Peroxidized soybean oil reduced growth performance of weaned nursery pigs, which did not appear to be related to oxidative stress or immune status. The negative effects of peroxidized soybean oil on animal performance could not be improved by supplementation of vitamin E or phytogenic compounds in the drinking water.
DA - 2020/6//
PY - 2020/6//
DO - 10.15232/aas.2019-01976
VL - 36
IS - 3
SP - 341-351
SN - 2590-2865
KW - health
KW - oxidation
KW - plant extracts
KW - tocopherol
ER -
TY - JOUR
TI - Multiple QTL Mapping in Autopolyploids: A Random-Effect Model Approach with Application in a Hexaploid Sweetpotato Full-Sib Population
AU - Da Silva Pereira, G.
AU - Gemenet, D.C.
AU - Mollinari, M.
AU - Olukolu, B.A.
AU - Wood, J.C.
AU - Diaz, F.
AU - Mosquera, V.
AU - Gruneberg, W.J.
AU - Khan, A.
AU - Buell, C.R.
AU - Yencho, G.C.
AU - Zeng, Z.-B.
T2 - Genetics
AB - Genetic analysis in autopolyploids is a very complicated subject due to the enormous number of genotypes at a locus that needs to be considered. For instance, the number of... In developing countries, the sweetpotato, Ipomoea batatas (L.) Lam. (2n=6x=90), is an important autopolyploid species, both socially and economically. However, quantitative trait loci (QTL) mapping has remained limited due to its genetic complexity. Current fixed-effect models can fit only a single QTL and are generally hard to interpret. Here, we report the use of a random-effect model approach to map multiple QTL based on score statistics in a sweetpotato biparental population (‘Beauregard’ × ‘Tanzania’) with 315 full-sibs. Phenotypic data were collected for eight yield component traits in six environments in Peru, and jointly adjusted means were obtained using mixed-effect models. An integrated linkage map consisting of 30,684 markers distributed along 15 linkage groups (LGs) was used to obtain the genotype conditional probabilities of putative QTL at every centiMorgan position. Multiple interval mapping was performed using our R package QTLpoly and detected a total of 13 QTL, ranging from none to four QTL per trait, which explained up to 55% of the total variance. Some regions, such as those on LGs 3 and 15, were consistently detected among root number and yield traits, and provided a basis for candidate gene search. In addition, some QTL were found to affect commercial and noncommercial root traits distinctly. Further best linear unbiased predictions were decomposed into additive allele effects and were used to compute multiple QTL-based breeding values for selection. Together with quantitative genotyping and its appropriate usage in linkage analyses, this QTL mapping methodology will facilitate the use of genomic tools in sweetpotato breeding as well as in other autopolyploids.
DA - 2020/5/5/
PY - 2020/5/5/
DO - 10.1534/genetics.120.303080
VL - 215
IS - 3
SP - 579-595
UR - http://dx.doi.org/10.1534/genetics.120.303080
KW - multiple interval mapping
KW - polyploid QTL model
KW - restricted maximum likelihood
KW - variance components
KW - yield components
KW - heritability
ER -
TY - JOUR
TI - Disturbances drive changes in coral community assemblages and coral calcification capacity
AU - Courtney, Travis A.
AU - Barnes, Brian B.
AU - Chollett, Iliana
AU - Elahi, Robin
AU - Gross, Kevin
AU - Guest, James R.
AU - Kuffner, Ilsa B.
AU - Lenz, Elizabeth A.
AU - Nelson, Hannah R.
AU - Rogers, Caroline S.
AU - Toth, Lauren T.
AU - Andersson, Andreas J.
T2 - ECOSPHERE
AB - Anthropogenic environmental change has increased coral reef disturbance regimes in recent decades, altering the structure and function of many coral reefs globally. In this study, we used coral community survey data collected from 1996 to 2015 to evaluate reef‐scale coral calcification capacity (CCC) dynamics with respect to recorded pulse disturbances for 121 reef sites in the Main Hawaiian Islands and Mo'orea (French Polynesia) in the Pacific and the Florida Keys Reef Tract and St. John (U.S. Virgin Islands) in the western Atlantic. CCC remained relatively high in the Main Hawaiian Islands in the absence of recorded widespread disturbances; declined and subsequently recovered in Mo'orea following a crown‐of‐thorns sea star outbreak, coral bleaching, and major cyclone; decreased and remained low following coral bleaching in the Florida Keys Reef Tract; and decreased following coral bleaching and disease in St. John. Individual coral taxa have variable calcification rates and susceptibility to disturbances because of their differing life‐history strategies. As a result, temporal changes in CCC in this study were driven by shifts in both overall coral cover and coral community composition. Analysis of our results considering coral life‐history strategies showed that weedy corals generally increased their contributions to CCC over time while the contribution of competitive corals decreased. Shifts in contributions by stress‐tolerant and generalist corals to CCC were more variable across regions. The increasing frequency and intensity of disturbances under 21st century global change therefore has the potential to drive lower and more variable CCC because of the increasing dominance of weedy and some stress‐tolerant corals.
DA - 2020/4//
PY - 2020/4//
DO - 10.1002/ecs2.3066
VL - 11
IS - 4
SP -
SN - 2150-8925
KW - carbonate budgets
KW - climate change
KW - coral bleaching
KW - coral disease
KW - ecological traits
KW - environmental monitoring
KW - resilience
KW - scleractinians
ER -
TY - JOUR
TI - DHPA: Dynamic Human Preference Analytics Framework— A Case Study on Taxi Drivers' Learning Curve Analysis
AU - Pan, M.
AU - Li, Y.
AU - Zhou, X.
AU - Liu, Z.
AU - Song, R.
AU - Liu, H.
AU - Luo, J.
AU - Huang, Weixiao
AU - Tian, Zhihong
T2 - ACM Transactions on Intelligent Systems and Technology
AB - Many real-world human behaviors can be modeled and characterized as sequential decision-making processes, such as a taxi driver’s choices of working regions and times. Each driver possesses unique preferences on the sequential choices over time and improves the driver’s working efficiency. Understanding the dynamics of such preferences helps accelerate the learning process of taxi drivers. Prior works on taxi operation management mostly focus on finding optimal driving strategies or routes, lacking in-depth analysis on what the drivers learned during the process and how they affect the performance of the driver. In this work, we make the first attempt to establish Dynamic Human Preference Analytics. We inversely learn the taxi drivers’ preferences from data and characterize the dynamics of such preferences over time. We extract two types of features (i.e., profile features and habit features) to model the decision space of drivers. Then through inverse reinforcement learning, we learn the preferences of drivers with respect to these features. The results illustrate that self-improving drivers tend to keep adjusting their preferences to habit features to increase their earning efficiency while keeping the preferences to profile features invariant. However, experienced drivers have stable preferences over time. The exploring drivers tend to randomly adjust the preferences over time.
DA - 2020/1//
PY - 2020/1//
DO - 10.1145/3360312
VL - 11
IS - 1
SP -
SN - 2157-6912
KW - Urban computing
KW - inverse reinforcement learning
KW - preference dynamics
ER -
TY - JOUR
TI - Nonlinear Dose-Response Modeling of High-Throughput Screening Data Using an Evolutionary Algorithm
AU - Ma, Jun
AU - Bair, Eric
AU - Motsinger-Reif, Alison
T2 - DOSE-RESPONSE
AB - Nonlinear dose-response relationships exist extensively in the cellular, biochemical, and physiologic processes that are affected by varying levels of biological, chemical, or radiation stress. Modeling such responses is a crucial component of toxicity testing and chemical screening. Traditional model fitting methods such as nonlinear least squares (NLS) are very sensitive to initial parameter values and often fail to converge. The use of evolutionary algorithms (EAs) has been proposed to address many of the limitations of traditional approaches, but previous methods have been limited in the types of models they can fit. Therefore, we propose the use of an EA for dose-response modeling for a range of potential response model functional forms. This new method can not only fit the most commonly used nonlinear dose-response models (eg, exponential models and 3-, 4-, and 5-parameter logistic models) but also select the best model if no model assumption is made, which is especially useful in the case of high-throughput curve fitting. Compared with NLS, the new method provides stable and robust solutions without sensitivity to initial values.
DA - 2020/4//
PY - 2020/4//
DO - 10.1177/1559325820926734
VL - 18
IS - 2
SP -
SN - 1559-3258
KW - evolutionary algorithm
KW - hillslope model
KW - parameter estimation
KW - nonlinear regression
KW - model selection
ER -
TY - JOUR
TI - Equiprobable discrete models of site-specific substitution rates underestimate the extent of rate variability
AU - Mannino, Frank
AU - Wisotsky, Sadie
AU - Pond, Sergei L. Kosakovsky
AU - Muse, Spencer V
T2 - PLOS ONE
AB - It is standard practice to model site-to-site variability of substitution rates by discretizing a continuous distribution into a small number, K, of equiprobable rate categories. We demonstrate that the variance of this discretized distribution has an upper bound determined solely by the choice of K and the mean of the distribution. This bound can introduce biases into statistical inference, especially when estimating parameters governing site-to-site variability of substitution rates. Applications to two large collections of sequence alignments demonstrate that this upper bound is often reached in analyses of real data. When parameter estimation is of primary interest, additional rate categories or more flexible modeling methods should be considered.
DA - 2020/3/2/
PY - 2020/3/2/
DO - 10.1371/journal.pone.0229493
VL - 15
IS - 3
SP -
SN - 1932-6203
ER -
TY - JOUR
TI - Regional and field-specific differences in Fusarium species and mycotoxins associated with blighted North Carolina wheat
AU - Cowger, Christina
AU - Ward, Todd J.
AU - Nilsson, Kathryn
AU - Arellano, Consuelo
AU - McCormick, Susan P.
AU - Busman, Mark
T2 - INTERNATIONAL JOURNAL OF FOOD MICROBIOLOGY
AB - Worldwide, while Fusarium graminearum is the main causal species of Fusarium head blight (FHB) in small-grain cereals, a diversity of FHB-causing species belonging to different species complexes has been found in most countries. In the U.S., FHB surveys have focused on the Fusarium graminearum species complex (FGSC) and the frequencies of 3-ADON, 15-ADON, and nivalenol (NIV) chemotypes. A large-scale survey was undertaken across the state of North Carolina in 2014 to explore the frequency and distribution of F. graminearum capable of producing NIV, which is not monitored at grain intake points. Symptomatic wheat spikes were sampled from 59 wheat fields in 24 counties located in three agronomic zones typical of several states east of the Appalachian Mountains: Piedmont, Coastal Plain, and Tidewater. Altogether, 2197 isolates were identified to species using DNA sequence-based methods. Surprisingly, although F. graminearum was the majority species detected, species in the Fusarium tricinctum species complex (FTSC) that produce “emerging mycotoxins” were frequent, and even dominant in some fields. The FTSC percentage was 50–100% in four fields, 30–49% in five fields, 20–29% in five fields, and < 20% in the remaining 45 fields. FTSC species were at significantly higher frequency in the Coastal Plain than in the Piedmont or Tidewater (P < .05). Moniliformin concentrations in samples ranged from 0.0 to 38.7 μg g−1. NIV producing isolates were rare statewide (2.2%), and never >12% in a single field, indicating that routine testing for NIV is probably unnecessary. The patchy distribution of FTSC species in wheat crops demonstrated the need to investigate the potential importance of their mycotoxins and the factors that allow them to sometimes outcompete trichothecene producers. An increased sampling intensity of wheat fields led to the unexpected discovery of a minority FHB-causing population.
DA - 2020/6/16/
PY - 2020/6/16/
DO - 10.1016/j.ijfoodmicro.2020.108594
VL - 323
SP -
SN - 1879-3460
KW - Fusarium graminearum
KW - Fusarium head blight
KW - Fusarium tricinctum species complex
KW - Scab
KW - Deoxynivalenol
KW - Moniliformin
KW - Gibberella ear rot
KW - Small grains
KW - Chemotype
ER -
TY - JOUR
TI - Research note: Shout-out survey for quantifying reasons for trail use
AU - Hess, George R.
AU - Loflin, Alexandria M.
AU - Selm, Kathryn R.
T2 - JOURNAL OF OUTDOOR RECREATION AND TOURISM-RESEARCH PLANNING AND MANAGEMENT
AB - Gathering data about why people use greenway trails (e.g., health, recreation, transportation) requires interaction with trail users who typically do not want to stop for a survey; runners and bicyclists are particularly challenging. We placed a series of signs along trails asking users to shout out their answer to a simple question as they passed a surveyor, who also recorded observational data. In a feasibility study along greenway trails in Raleigh, NC, USA, we counted 541 users, 66% of whom shouted out whether they were using the trail for recreation or transportation. Of all users who passed, 45% were on bicycles and 55% on foot. Of those who responded, 11% were using the trail for transportation and 89% for recreation; 86% of transportation users were bicyclists. This method is generalizable and offers a way to collect additional information as individuals pass surveyors who might otherwise collect only observational data.
DA - 2020/3//
PY - 2020/3//
DO - 10.1016/j.jort.2019.100234
VL - 29
SP -
SN - 2213-0799
KW - Bicyclist
KW - Greenway trail use
KW - Pedestrian
KW - Poll
KW - Recreation
KW - Survey
KW - Transportation
ER -
TY - JOUR
TI - Fully‐Textile Seam‐Line Sensors for Facile Textile Integration and Tunable Multi‐Modal Sensing of Pressure, Humidity, and Wetness
AU - Agcayazi, Talha
AU - Tabor, Jordan
AU - McKnight, Michael
AU - Martin, Isaac
AU - Ghosh, Tushar K.
AU - Bozkurt, Alper
T2 - Advanced Materials Technologies
AB - The unique potential of e‐textiles for unobtrusive and ubiquitous monitoring and their innovative interfacing with electronic devices has garnered great attention. Sensors are one of the few essential devices or components necessary for most functional e‐textile applications. Ideally, any e‐textile based sensor should be soft, easily integrated in textile manufacturing processes, and tunable for the desired applications. Here, an easy‐to‐manufacture, tunable, fully‐textile sensor system with capability of detecting pressure, humidity, or wetness is presented. Capacitive pressure sensors are formed via a traditional sewing process with two commercially available conductive sewing yarns (silver‐plated polyamide (silver) and stainless steel (SS)) with cotton knit, polyethylene‐terephthalate (PET) knit and elastomeric meltblown textile dielectrics. The relationship between the sensor's physical, mechanical, and electromechanical properties including hysteresis, sensitivity, response, and relaxation time is evaluated. In addition, the same sensor configuration is assessed for its humidity and wetness sensing performance. Results indicate that pressure, relative humidity (RH), and wetness sensing performance are easily tunable using different combinations of the conductive and dielectric textile materials. Finally, proof of concept deployment demonstrations as human‐machine interfaces within a pressure sensing mat and a smart glove capable of remotely controlling a drone are provided.
DA - 2020/8//
PY - 2020/8//
DO - 10.1002/admt.202000155
UR - https://doi.org/10.1002/admt.202000155
KW - e-textiles
KW - flexible sensors
KW - humidity sensing
KW - pressure sensing
KW - wetness sensing
ER -
TY - JOUR
TI - Bayesian Inference in Nonparanormal Graphical Models
AU - Mulgrave, Jami J.
AU - Ghosal, Subhashis
T2 - BAYESIAN ANALYSIS
AB - Gaussian graphical models have been used to study intrinsic dependence among several variables, but the Gaussianity assumption may be restrictive in many applications. A nonparanormal graphical model is a semiparametric generalization for continuous variables where it is assumed that the variables follow a Gaussian graphical model only after some unknown smooth monotone transformations on each of them. We consider a Bayesian approach in the nonparanormal graphical model by putting priors on the unknown transformations through a random series based on B-splines where the coefficients are ordered to induce monotonicity. A truncated normal prior leads to partial conjugacy in the model and is useful for posterior simulation using Gibbs sampling. On the underlying precision matrix of the transformed variables, we consider a spike-and-slab prior and use an efficient posterior Gibbs sampling scheme. We use the Bayesian Information Criterion to choose the hyperparameters for the spike-and-slab prior. We present a posterior consistency result on the underlying transformation and the precision matrix. We study the numerical performance of the proposed method through an extensive simulation study and finally apply the proposed method on a real data set.
DA - 2020/6//
PY - 2020/6//
DO - 10.1214/19-BA1159
VL - 15
IS - 2
SP - 449-475
SN - 1936-0975
KW - Bayesian inference
KW - nonparanormal
KW - Gaussian graphical models
KW - sparsity
KW - continuous shrinkage prior
ER -
TY - JOUR
TI - Comparison of decay rates between native and non-native wood species in invaded forests of the southeastern US: a rapid assessment
AU - Ulyshen, Michael D.
AU - Horn, Scott
AU - Brownie, Cavell
AU - Strickland, Michael S.
AU - Wurzburger, Nina
AU - Zanne, Amy
T2 - BIOLOGICAL INVASIONS
DA - 2020/8//
PY - 2020/8//
DO - 10.1007/s10530-020-02276-8
VL - 22
IS - 8
SP - 2619-2632
SN - 1573-1464
KW - Chinese privet
KW - Exotic species
KW - Japanese stiltgrass
KW - Novel ecosystems
KW - Plant traits
ER -
TY - JOUR
TI - Use of standardized bioinformatics for the analysis of fungal DNA signatures applied to sample provenance
AU - Allwood, Julia S.
AU - Fierer, Noah
AU - Dunn, Robert R.
AU - Breen, Matthew
AU - Reich, Brian J.
AU - Laber, Eric B.
AU - Clifton, Jesse
AU - Grantham, Neal S.
AU - Faith, Seth A.
T2 - FORENSIC SCIENCE INTERNATIONAL
AB - The use of environmental trace material to aid criminal investigations is an ongoing field of research within forensic science. The application of environmental material thus far has focused upon a variety of different objectives relevant to forensic biology, including sample provenance (also referred to as sample attribution). The capability to predict the provenance or origin of an environmental DNA sample would be an advantageous addition to the suite of investigative tools currently available. A metabarcoding approach is often used to predict sample provenance, through the extraction and comparison of the DNA signatures found within different environmental materials, such as the bacteria within soil or fungi within dust. Such approaches are combined with bioinformatics workflows and statistical modelling, often as part of a large-scale study, with less emphasis on the investigation of the adaptation of these methods to a smaller scale method for forensic use. The present work investigated a small-scale approach as an adaptation of a larger metabarcoding study to develop a model for global sample provenance using fungal DNA signatures collected from dust swabs. This adaptation was to facilitate a standardized method for consistent, reproducible sample treatment, including bioinformatics processing and final application of resulting data to the available prediction model. To investigate this small-scale method, 76 DNA samples were treated as anonymous test samples and analyzed using the standardized process to demonstrate and evaluate processing and customized sequence data analysis. This testing included samples originating from countries previously used to train the model, samples artificially mixed to represent multiple or mixed countries, as well as outgroup samples. Positive controls were also developed to monitor laboratory processing and bioinformatics analysis. Through this evaluation we were able to demonstrate that the samples could be processed and analyzed in a consistent manner, facilitated by a relatively user-friendly bioinformatic pipeline for sequence data analysis. Such investigation into standardized analyses and application of metabarcoding data is of key importance for the future use of applied microbiology in forensic science.
DA - 2020/5//
PY - 2020/5//
DO - 10.1016/j.forsciint.2020.110250
VL - 310
SP -
SN - 1872-6283
KW - Forensic microbiology
KW - Bioinformatics
KW - Metabarcoding
KW - Sample provenance
ER -
TY - JOUR
TI - Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake
AU - Lee, Dasom
AU - Lee, Eunji
AU - Jo, Seogil
AU - Choi, Taeryeon
T2 - KOREAN JOURNAL OF APPLIED STATISTICS
DA - 2020/2//
PY - 2020/2//
DO - 10.5351/KJAS.2020.33.1.025
VL - 33
IS - 1
SP - 25-46
SN - 2383-5818
KW - BSAR
KW - Gaussian process
KW - KNHANES data
KW - Markov chain Monte Carlo
KW - Ordinal probit
KW - Semiparametric regression
ER -
TY - JOUR
TI - Mechanistic models of PLC/PKC signaling implicate phosphatidic acid as a key amplifier of chemotactic gradient sensing
AU - Nosbisch, Jamie L.
AU - Rahman, Anisur
AU - Mohan, Krithika
AU - Elston, Timothy C.
AU - Bear, James E.
AU - Haugh, Jason M.
T2 - PLOS COMPUTATIONAL BIOLOGY
AB - Chemotaxis of fibroblasts and other mesenchymal cells is critical for embryonic development and wound healing. Fibroblast chemotaxis directed by a gradient of platelet-derived growth factor (PDGF) requires signaling through the phospholipase C (PLC)/protein kinase C (PKC) pathway. Diacylglycerol (DAG), the lipid product of PLC that activates conventional PKCs, is focally enriched at the up-gradient leading edge of fibroblasts responding to a shallow gradient of PDGF, signifying polarization. To explain the underlying mechanisms, we formulated reaction-diffusion models including as many as three putative feedback loops based on known biochemistry. These include the previously analyzed mechanism of substrate-buffering by myristoylated alanine-rich C kinase substrate (MARCKS) and two newly considered feedback loops involving the lipid, phosphatidic acid (PA). DAG kinases and phospholipase D, the enzymes that produce PA, are identified as key regulators in the models. Paradoxically, increasing DAG kinase activity can enhance the robustness of DAG/active PKC polarization with respect to chemoattractant concentration while decreasing their whole-cell levels. Finally, in simulations of wound invasion, efficient collective migration is achieved with thresholds for chemotaxis matching those of polarization in the reaction-diffusion models. This multi-scale modeling framework offers testable predictions to guide further study of signal transduction and cell behavior that affect mesenchymal chemotaxis.
DA - 2020/4//
PY - 2020/4//
DO - 10.1371/journal.pcbi.1007708
VL - 16
IS - 4
SP -
SN - 1553-7358
ER -
TY - JOUR
TI - Model-free posterior inference on the area under the receiver operating characteristic curve
AU - Wang, Zhe
AU - Martin, Ryan
T2 - JOURNAL OF STATISTICAL PLANNING AND INFERENCE
AB - The area under the receiver operating characteristic curve (AUC) serves as a summary of a binary classifier’s performance. For inference on the AUC, a common modeling assumption is binormality, which restricts the distribution of the score produced by the classifier. However, this assumption introduces an infinite-dimensional nuisance parameter and may be restrictive in certain machine learning settings. To avoid making distributional assumptions, and to avoid the computational challenges of a fully nonparametric analysis, we develop a direct and model-free Gibbs posterior distribution for inference on the AUC. We present the asymptotic Gibbs posterior concentration rate, and a strategy for tuning the learning rate so that the corresponding credible intervals achieve the nominal frequentist coverage probability. Simulation experiments and a real data analysis demonstrate the Gibbs posterior’s strong performance compared to existing Bayesian methods.
DA - 2020/12//
PY - 2020/12//
DO - 10.1016/j.jspi.2020.03.008
VL - 209
SP - 174-186
SN - 1873-1171
KW - Credible interval
KW - Gibbs posterior
KW - Generalized Bayesian inference
KW - Model misspecification
KW - Robustness
ER -
TY - JOUR
TI - Spine and dine: A key defensive trait promotes ecological success in spiny ants
AU - Blanchard, Benjamin D.
AU - Nakamura, Akihiro
AU - Cao, Min
AU - Chen, Stephanie T.
AU - Moreau, Corrie S.
T2 - ECOLOGY AND EVOLUTION
AB - A key focus of ecologists is explaining the origin and maintenance of morphological diversity and its association with ecological success. We investigate potential benefits and costs of a common and varied morphological trait, cuticular spines, for foraging behavior, interspecific competition, and predator–prey interactions in naturally co‐occurring spiny ants (Hymenoptera: Formicidae: Polyrhachis) in an experimental setting. We expect that a defensive trait like spines might be associated with more conspicuous foraging, a greater number of workers sent out to forage, and potentially increased competitive ability. Alternatively, consistent with the ecological trade‐off hypothesis, we expect that investment in spines for antipredator defense might be negatively correlated with these other ecological traits. We find little evidence for any costs to ecological traits, instead finding that species with longer spines either outperform or do not differ from species with shorter spines for all tested metrics, including resource discovery rate and foraging effort as well as competitive ability and antipredator defense. Spines appear to confer broad antipredator benefits and serve as a form of defense with undetectable costs to key ecological abilities like resource foraging and competitive ability, providing an explanation for both the ecological success of the study genus and the large number of evolutionary origins of this trait across all ants. This study also provides a rare quantitative empirical test of ecological effects related to a morphological trait in ants.
DA - 2020/6//
PY - 2020/6//
DO - 10.1002/ece3.6322
VL - 10
IS - 12
SP - 5852-5863
SN - 2045-7758
KW - competition
KW - defense
KW - morphological trait
KW - predator-prey interactions
KW - spines
ER -
TY - JOUR
TI - Bayesian linear regression for multivariate responses under group sparsity
AU - Ning, Bo
AU - Jeong, Seonghyun
AU - Ghosal, Subhashis
T2 - BERNOULLI
AB - We study frequentist properties of a Bayesian high-dimensional multivariate linear regression model with correlated responses. The predictors are separated into many groups and the group structure is pre-determined. Two features of the model are unique: (i) group sparsity is imposed on the predictors; (ii) the covariance matrix is unknown and its dimensions can also be high. We choose a product of independent spike-and-slab priors on the regression coefficients and a new prior on the covariance matrix based on its eigendecomposition. Each spike-and-slab prior is a mixture of a point mass at zero and a multivariate density involving the $\ell_{2,1}$-norm. We first obtain the posterior contraction rate, the bounds on the effective dimension of the model with high posterior probabilities. We then show that the multivariate regression coefficients can be recovered under certain compatibility conditions. Finally, we quantify the uncertainty for the regression coefficients with frequentist validity through a Bernstein–von Mises type theorem. The result leads to selection consistency for the Bayesian method. We derive the posterior contraction rate using the general theory by constructing a suitable test from the first principle using moment bounds for certain likelihood ratios. This leads to posterior concentration around the truth with respect to the average Rényi divergence of order $1/2$. This technique of obtaining the required tests for posterior contraction rate could be useful in many other problems.
DA - 2020/8//
PY - 2020/8//
DO - 10.3150/20-BEJ1198
VL - 26
IS - 3
SP - 2353-2382
SN - 1573-9759
KW - Bayesian variable selection
KW - covariance matrix
KW - group sparsity
KW - multivariate linear regression
KW - posterior contraction rate
KW - Renyi divergence
KW - spike-and-slab prior
ER -
TY - JOUR
TI - Rating exotic price coverage in crop revenue insurance
AU - Ramsey, A. Ford
AU - Ghosh, Sujit K.
AU - Goodwin, Barry K.
T2 - AGRICULTURAL FINANCE REVIEW
AB - Purpose: Revenue insurance is the most popular form of insurance available in the US federal crop insurance program. The majority of crop revenue policies are sold with a harvest price replacement feature that pays out on lost crop yields at the maximum of a realized or projected harvest price. The authors introduce a novel actuarial and statistical approach to rate revenue insurance policies with exotic price coverage: the payout depends on an order statistic or average of prices. The authors examine the price implications of different dependence models and demonstrate the feasibility of policies of this type. Design/methodology/approach: Hierarchical Archimedean copulas and vine copulas are used to model dependence between prices and yields and serial dependence of prices. The authors construct several synthetic exotic price coverage insurance policies and evaluate the impact of copula models on policies covering different types of risk. Findings: The authors’ findings show that the price of exotic price coverage policies is sensitive to the choice of dependence model. Serial dependence varies across the growing season. It is possible to accurately price exotic coverage policies and we suggest these add-ons as a possible avenue for developing private crop insurance markets. Originality/value: The authors apply hierarchical Archimedean copulas and vine copulas that allow for flexibility in the modeling of multivariate dependence. Unlike previous research, which has primarily considered dependence across space, the form of exotic price coverage requires modeling serial dependence in relative prices. Results are important for this segment of the agricultural insurance market: one of the main areas that insurers can develop private products around the federal program.
DA - 2020///
PY - 2020///
DO - 10.1108/AFR-10-2019-0107
VL - 80
IS - 5
SP - 609-631
SN - 2041-6326
KW - Crop revenue insurance
KW - Nested copulas
KW - Domestic credit
KW - Exotic price coverage
ER -
TY - JOUR
TI - Exploring the Usefulness of Meteorological Data for Predicting Malaria Cases in Visakhapatnam, Andhra Pradesh
AU - Sehgal, Meena
AU - Ghosh, Sujit
T2 - WEATHER CLIMATE AND SOCIETY
AB - Malaria and dengue fever are among the most important vectorborne diseases in the tropics and subtropics. Average weekly meteorological parameters—specifically, minimum temperature, maximum temperature, humidity, and rainfall—were collected using data from 100 automated weather stations from the Indian Space Research Organization. We obtained district-level weekly reported malaria cases from the Integrated Disease Surveillance Program (IDSP), Department of Health and Family Welfare, Andhra Pradesh, India, for three years, 2014–16. We used a generalized linear model with Poisson distribution and default logarithm-link to estimate model parameters, and we used a quasi-Poisson method with a generalized additive model that uses nonparametric regression with smoothing splines. It appears that higher minimum temperatures (e.g., >24°C) tend to lead to higher malaria counts but lower values do not seem to have an impact on the malaria counts. On the other hand, higher values of maximum temperature (e.g., >32°C) seem to negatively affect the malaria counts. The relationships with rainfall and humidity appear to be not as strong once we account for smooth (weekly) trends and temperatures; both smooth curves seem to hover around zero across all of their values. We note that a rainfall amount between 40 and 50 mm seems to have a positive impact on malaria counts. Our analyses show that the incremental increase in meteorological parameters does not lead to an increase in reported malaria cases in the same manner for all of the districts within the same state. This suggests that other factors such as vegetation, elevation, and water index in the environment also influence disease occurrence.
DA - 2020/4//
PY - 2020/4//
DO - 10.1175/WCAS-D-19-0029.1
VL - 12
IS - 2
SP - 323-330
SN - 1948-8335
KW - Atmosphere
KW - Asia
KW - Air quality
KW - Climate change
ER -
TY - JOUR
TI - Variable selection in functional linear concurrent regression
AU - Ghosal, Rahul
AU - Maity, Arnab
AU - Clark, Timothy
AU - Longo, Stefano B.
T2 - JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS
AB - We propose a novel method for variable selection in functional linear concurrent regression. Our research is motivated by a fisheries footprint study where the goal is to identify important time-varying sociostructural drivers influencing patterns of seafood consumption, and hence the fisheries footprint, over time, as well as estimating their dynamic effects. We develop a variable-selection method in functional linear concurrent regression extending the classically used scalar-on-scalar variable-selection methods like the lasso, smoothly clipped absolute deviation (SCAD) and minimax concave penalty (MCP). We show that in functional linear concurrent regression the variable-selection problem can be addressed as a group lasso, and their natural extension: the group SCAD or a group MCP problem. Through simulations, we illustrate that our method, particularly with the group SCAD or group MCP, can pick out the relevant variables with high accuracy and has minuscule false positive and false negative rate even when data are observed sparsely, are contaminated with noise and the error process is highly non-stationary. We also demonstrate two real data applications of our method in studies of dietary calcium absorption and fisheries footprint in the selection of influential time-varying covariates.
DA - 2020/6//
PY - 2020/6//
DO - 10.1111/rssc.12408
VL - 69
IS - 3
SP - 565-587
SN - 1467-9876
KW - Fisheries footprint
KW - Functional linear concurrent regression
KW - Variable selection
ER -
TY - JOUR
TI - Form-stable phase-change elastomer gels derived from thermoplastic elastomer copolyesters swollen with fatty acids
AU - Armstrong, Daniel P.
AU - Chatterjee, Kony
AU - Ghosh, Tushar K.
AU - Spontak, Richard J.
T2 - THERMOCHIMICA ACTA
AB - Phase-change materials (PCMs) are of considerable scientific and technological interest in applications related to energy management and storage, especially as they pertain to residential or commercial construction and packaging. Most PCMs developed for these purposes consist of a crystallizable species encapsulated within an impermeable polymeric shell. Such encapsulants can then be strategically embedded throughout a construct to promote thermal stability in close proximity to the normal melting point of the encapsulated species. In this study, we introduce form-stable PCMs, which avoid the need for costly and inconvenient encapsulation and consist of commercial thermoplastic elastomer copolyesters selectively swollen with crystallizable fatty acids. Since the copolyester matrices endow the PCMs with solid-like characteristics even when swollen with liquid, we refer to this particular class of materials as phase-change elastomer gels (PCEGs). In this study, we explore the thermal characteristics of PCEG films wherein the copolyester grade, gel composition and fatty acid are all varied. Our results indicate that these PCEGs exhibit non-hysteretic thermal cycling, unaffected transition temperatures, and competitive latent transition heats. Relative to model and commercially available encapsulated PCMs, the form-stable PCEGs examined here afford an alternative capable of superior thermal performance and versatility.
DA - 2020/4//
PY - 2020/4//
DO - 10.1016/j.tca.2020.178566
VL - 686
SP -
SN - 1872-762X
KW - Thermoplastic elastomer
KW - Physical crosslinking
KW - Thermal storage
KW - Phase-change material
KW - Energy conservation
ER -
TY - JOUR
TI - A STATISTICAL ANALYSIS OF NOISY CROWDSOURCED WEATHER DATA
AU - Chakraborty, Arnab
AU - Lahiri, Soumendra Nath
AU - Wilson, Alyson
T2 - ANNALS OF APPLIED STATISTICS
AB - Spatial prediction of weather elements like temperature, precipitation, and barometric pressure is generally based on satellite imagery or data collected at ground stations. None of these data provide information at a more granular or “hyperlocal” resolution. On the other hand, crowdsourced weather data, which are captured by sensors installed on mobile devices and gathered by weather-related mobile apps like WeatherSignal and AccuWeather, can serve as potential data sources for analyzing environmental processes at a hyperlocal resolution. However, due to the low quality of the sensors and the nonlaboratory environment, the quality of the observations in crowdsourced data is compromised. This paper describes methods to improve hyperlocal spatial prediction using this varying-quality, noisy crowdsourced information. We introduce a reliability metric, namely Veracity Score (VS), to assess the quality of the crowdsourced observations using coarser, but high-quality, reference data. A VS-based methodology to analyze noisy spatial data is proposed and evaluated through extensive simulations. The merits of the proposed approach are illustrated through case studies analyzing crowdsourced daily average ambient temperature readings for one day in the contiguous United States.
DA - 2020/3//
PY - 2020/3//
DO - 10.1214/19-AOAS1290
VL - 14
IS - 1
SP - 116-142
SN - 1932-6157
KW - Veracity score
KW - geostatistics
KW - robust kriging
KW - hyperlocal spatial prediction
ER -
TY - JOUR
TI - FastLORS: Joint modelling for expression quantitative trait loci mapping in R
AU - Rhyne, Jacob
AU - Jeng, X. Jessie
AU - Chi, Eric C.
AU - Tzeng, Jung-Ying
T2 - STAT
AB - FastLORS is a software package that implements a new algorithm to solve sparse multivariate regression for expression quantitative trait loci (eQTLs) mapping. FastLORS solves the same optimization problem as LORS, an existing popular algorithm. The optimization problem is solved through inexact block coordinate descent with updates by proximal gradient steps, which reduces the computational cost compared with LORS. We apply LORS and FastLORS to a real dataset for eQTL mapping and demonstrate that FastLORS delivers comparable results with LORS in much less computing time.
DA - 2020///
PY - 2020///
DO - 10.1002/sta4.265
VL - 9
IS - 1
SP -
SN - 2049-1573
UR - https://doi.org/10.1002/sta4.265
KW - block coordinate descent
KW - eQTL mapping
KW - low-rank approximation
KW - proximal gradient descent
KW - sparse regression
ER -
TY - JOUR
TI - Untargeted metabolomic profiling identifies disease-specific signatures in food allergy and asthma
AU - Crestani, Elena
AU - Harb, Hani
AU - Charbonnier, Louis-Marie
AU - Leirer, Jonathan
AU - Motsinger-Reif, Alison
AU - Rachid, Rima
AU - Phipatanakul, Wanda
AU - Kaddurah-Daouk, Rima
AU - Chatila, Talal A.
T2 - JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY
AB - Background: Food allergy (FA) affects an increasing proportion of children for reasons that remain obscure. Novel disease biomarkers and curative treatment options are strongly needed. Objective: We sought to apply untargeted metabolomic profiling to identify pathogenic mechanisms and candidate disease biomarkers in patients with FA. Methods: Mass spectrometry–based untargeted metabolomic profiling was performed on serum samples of children with either FA alone, asthma alone, or both FA and asthma, as well as healthy pediatric control subjects. Results: In this pilot study patients with FA exhibited a disease-specific metabolomic signature compared with both control subjects and asthmatic patients. In particular, FA was uniquely associated with a marked decrease in sphingolipid levels, as well as levels of a number of other lipid metabolites, in the face of normal frequencies of circulating natural killer T cells. Specific comparison of patients with FA and asthmatic patients revealed differences in the microbiota-sensitive aromatic amino acid and secondary bile acid metabolism. Children with both FA and asthma exhibited a metabolomic profile that aligned with that of FA alone but not asthma. Among children with FA, the history of severe systemic reactions and the presence of multiple FAs were associated with changes in levels of tryptophan metabolites, eicosanoids, plasmalogens, and fatty acids. Conclusions: Children with FA have a disease-specific metabolomic profile that is informative of disease mechanisms and severity and that dominates in the presence of asthma. Lower levels of sphingolipids and ceramides and other metabolomic alterations observed in children with FA might reflect the interplay between an altered microbiota and immune cell subsets in the gut.
DA - 2020/3//
PY - 2020/3//
DO - 10.1016/j.jaci.2019.10.014
VL - 145
IS - 3
SP - 897-906
SN - 1097-6825
KW - Asthma
KW - food allergy
KW - invariant natural killer T cells
KW - metabolomics
KW - metabolites
KW - secondary bile acids
KW - sphingolipids
KW - tryptophan
ER -
TY - JOUR
TI - Aridity Trends in Central America: A Spatial Correlation Analysis
AU - Córdoba, Marcela Alfaro
AU - Hidalgo, Hugo
AU - Alfaro, Eric
T2 - Atmosphere
AB - Trend analyses are common in several types of climate change studies. In many cases, finding evidence that the trends are different from zero in hydroclimate variables is of particular interest. However, when estimating the confidence interval of a set of hydroclimate stations or gridded data, the spatial correlation between them can affect the significance assessment using, for example, traditional non-parametric and parametric methods. For this reason, Monte Carlo simulations are needed in order to generate maps of corrected trend significance. In this article, we determined the significance of trends in aridity, modeled runoff using the Variable Infiltration Capacity Macroscale Hydrological model, Hargreaves potential evapotranspiration (PET) and near-surface temperature in Central America. Linear-regression models were fitted considering that the predictor variable is the time variable (years from 1970 to 1999) and the predictand variable corresponds to each of the previously mentioned hydroclimate variables. In order to establish if the temporal trends were significantly different from zero, a Mann–Kendall and a Monte Carlo test were used. The spatial correlation was calculated first to correct the variance of each trend. It was assumed in this case that the trends form a spatial stochastic process that can be modeled as such. Results show that the analysis considering the spatial correlation proposed here can be used for identifying those extreme trends. However, a set of variables with strong spatial correlation such as temperature can have robust and widespread significant trends assuming independence, but the vast majority of the stations can still fail the Monte Carlo test. We must be vigilant of the statistically robust changes in key primary parameters such as temperature and precipitation, which are the driving sources of hydrological alterations that may affect social and environmental systems in the future.
DA - 2020/4/23/
PY - 2020/4/23/
DO - 10.3390/atmos11040427
UR - http://dx.doi.org/10.3390/atmos11040427
KW - aridity
KW - Central American climate
KW - spatial correlation
KW - trend analysis
KW - variability
ER -
TY - JOUR
TI - Tuning parameter selection for penalised empirical likelihood with a diverging number of parameters
AU - Zheng, Chaowen
AU - Wu, Yichao
T2 - JOURNAL OF NONPARAMETRIC STATISTICS
AB - Penalised likelihood methods have been a success in analysing high dimensional data. Tang and Leng [(2010), ‘Penalized High-Dimensional Empirical Likelihood’, Biometrika, 97(4), 905–920] extended the penalisation approach to the empirical likelihood scenario and showed that the penalised empirical likelihood estimator could identify the true predictors consistently in the linear regression models. However, this desired selection consistency property of the penalised empirical likelihood method relies heavily on the choice of the tuning parameter. In this work, we propose a tuning parameter selection procedure for penalised empirical likelihood to guarantee that this selection consistency can be achieved. Specifically, we propose a generalised information criterion (GIC) for the penalised empirical likelihood in the linear regression case. We show that the tuning parameter selected by the GIC yields the true model consistently even when the number of predictors diverges to infinity with the sample size. We demonstrate the performance of our procedure by numerical simulations and a real data analysis.
DA - 2020/1/2/
PY - 2020/1/2/
DO - 10.1080/10485252.2020.1717491
VL - 32
IS - 1
SP - 246-261
SN - 1029-0311
KW - Tuning parameter selection
KW - variable selection
KW - generalised information criterion
KW - empirical likelihood
ER -
TY - JOUR
TI - Incorporating Nearest-Neighbor Site Dependence into Protein Evolution Models
AU - Larson, Gary
AU - Thorne, Jeffrey L.
AU - Schmidler, Scott
T2 - JOURNAL OF COMPUTATIONAL BIOLOGY
AB - Evolutionary models of proteins are widely used for statistical sequence alignment and inference of homology and phylogeny. However, the vast majority of these models rely on an unrealistic assumption of independent evolution between sites. Here we focus on the related problem of protein structure alignment, a classic tool of computational biology that is widely used to identify structural and functional similarity and to infer homology among proteins. A site-independent statistical model for protein structural evolution has previously been introduced and shown to significantly improve alignments and phylogenetic inferences compared with approaches that utilize only amino acid sequence information. Here we extend this model to account for correlated evolutionary drift among neighboring amino acid positions. The result is a spatiotemporal model of protein structure evolution, described by a multivariate diffusion process convolved with a spatial birth–death process. This extended site-dependent model (SDM) comes with little additional computational cost or analytical complexity compared with the site-independent model (SIM). We demonstrate that this SDM yields a significant reduction of bias in estimated evolutionary distances and helps further improve phylogenetic tree reconstruction. We also develop a simple model of site-dependent sequence evolution, which we use to demonstrate the bias resulting from the application of standard site-independent sequence evolution models.
DA - 2020/3/1/
PY - 2020/3/1/
DO - 10.1089/cmb.2019.0500
VL - 27
IS - 3
SP - 361-375
SN - 1557-8666
KW - diffusion process
KW - dynamic programming
KW - evolution
KW - phylogeny
KW - protein structure
ER -
TY - JOUR
TI - Evidence for temperature-dependent shifts in spawning times of anadromous alewife (Alosa pseudoharengus) and blueback herring (Alosa aestivalis)
AU - Lombardo, Steven M.
AU - Buckel, Jeffrey A.
AU - Hain, Ernie F.
AU - Griffith, Emily H.
AU - White, Holly
T2 - CANADIAN JOURNAL OF FISHERIES AND AQUATIC SCIENCES
AB - We analyzed four decades of presence–absence data from a fishery-independent survey to characterize the long-term phenology of river herring (alewife, Alosa pseudoharengus; and blueback herring, Alosa aestivalis) spawning migrations in their southern distribution. We used logistic generalized additive models to characterize the average ingress, peak, and egress timing of spawning. In the 2010s, alewife arrived to spawning habitat 16 days earlier and egressed 27 days earlier (peak 12 days earlier) relative to the 1970s. Blueback herring arrived 5 days earlier and egressed 23 days earlier (peak 13 days earlier) in the 2010s relative to the 1980s. The changes in ingress and egress timing have shortened the occurrence in spawning systems by 11 days for alewife over four decades and 18 days for blueback herring over three decades. We found that the rate of vernal warming was faster during 2001–2016 relative to 1973–1988 and is the most parsimonious explanation for changes in spawning phenology. The influence of a shortened spawning season on river herring population dynamics warrants further investigation.
DA - 2020/4//
PY - 2020/4//
DO - 10.1139/cjfas-2019-0140
VL - 77
IS - 4
SP - 741-751
SN - 1205-7533
ER -
TY - JOUR
TI - Probabilistic Detection and Estimation of Conic Sections From Noisy Data
AU - Guha, Subharup
AU - Ghosh, Sujit K.
T2 - JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
AB - Subharup Guha (Department of Biostatistics, University of Florida, Gainesville, FL) and Sujit K. Ghosh (Department of Statistics, North Carolina State University, Raleigh, NC)
DA - 2020/7/2/
PY - 2020/7/2/
DO - 10.1080/10618600.2020.1737084
VL - 29
IS - 3
SP - 513-522
SN - 1537-2715
UR - https://doi.org/10.1080/10618600.2020.1737084
KW - Bayesian hierarchical model
KW - Bernstein basis polynomials
KW - Focus-directrix approach
KW - Markov chain Monte Carlo
KW - Metropolis-Hastings algorithm
KW - Partial conics
ER -
TY - JOUR
TI - Correlation models for monitoring fetal growth
AU - Feng, Yuan
AU - Xiao, Luo
AU - Li, Cai
AU - Chen, Stephanie T.
AU - Ohuma, Eric O.
T2 - STATISTICAL METHODS IN MEDICAL RESEARCH
AB - Ultrasound growth measurements are monitored to evaluate if a fetus is growing normally compared with a defined standard chart at a specified gestational age. Using data from the Fetal Growth Longitudinal Study of the INTERGROWTH-21st project, we have modelled the longitudinal dependence of fetal head circumference, biparietal diameter, occipito-frontal diameter, abdominal circumference, and femur length using a two-stage approach. The first stage involved finding a suitable transformation of the raw fetal measurements (as the marginal distributions of ultrasound measurements were non-normal) to standardized deviations (Z-scores). In the second stage, a correlation model for a Gaussian process is fitted, yielding a correlation for any pair of observations made between 14 and 40 weeks. The correlation structure of the fetal Z-score can be used to assess whether the growth, for example, between successive measurements is satisfactory. The paper is accompanied by a Shiny application, see https://lxiao5.shinyapps.io/shinycalculator/.
DA - 2020/10//
PY - 2020/10//
DO - 10.1177/0962280220905623
VL - 29
IS - 10
SP - 2795-2813
SN - 1477-0334
KW - Fetal health
KW - longitudinal study
KW - correlation
KW - reference chart
ER -
TY - JOUR
TI - Using Irrigation to Increase Stormwater Mitigation Potential of Rainwater Harvesting Systems
AU - Gee, K. D.
AU - Hunt, W. F.
AU - Peacock, C. H.
AU - Woodward, M. D.
AU - Arellano, C.
T2 - JOURNAL OF SUSTAINABLE WATER IN THE BUILT ENVIRONMENT
AB - Rainwater harvesting (RWH) systems used for irrigation often provide fewer stormwater management benefits than systems used for year-round, nondiscretionary purposes because there is diminished demand for harvested rainwater during the nongrowing season or rainy periods. Thus, identifying demands during these periods would improve the stormwater mitigation potential of RWH systems. This study evaluated how irrigating bermudagrass year-round at rates exceeding those for minimum water conservation affected the stormwater benefits provided by an RWH system. Results indicated significant increases in runoff volume retention when turf was irrigated at 25 and 50 mm/week, compared to an evapotranspiration/effective precipitation (or agronomic)–based regime. While overall soil moisture content increased with irrigation rate, there were no concomitant increases in pest occurrences or runoff generation. Turf quality did not differ from the control irrigation regime for either application rate, and there were no indications of soil nitrate leaching. Irrigating at rates up to 50 mm/week resulted in stormwater volume reductions up to 65% without causing a decline in turf quality.
DA - 2020/5/1/
PY - 2020/5/1/
DO - 10.1061/JSWBAY.0000913
VL - 6
IS - 2
SP -
SN - 2379-6111
ER -
TY - JOUR
TI - Water Quality and Hydrologic Performance of Two Dry Detention Basins Receiving Highway Stormwater Runoff in the Piedmont Region of North Carolina
AU - Wissler, Austin D.
AU - Hunt, William F.
AU - McLaughlin, Richard A.
T2 - JOURNAL OF SUSTAINABLE WATER IN THE BUILT ENVIRONMENT
AB - Dry detention basins (DDBs) are a stormwater control measure (SCM) designed to provide flood storage, peak discharge abatement, and some water quality improvement through sedimentation; however, little data characterize DDB water quality performance in the highway environment. In this study, two DDBs [Hughes Farm Road and Poole Road basin (HFR and PRB henceforth)], constructed in 2010, mowed twice a year, receiving highway runoff, and located in the Piedmont of North Carolina (NC), USA, were monitored for up to 11 months. Flow-weighted composite samples were collected during storm events and analyzed for total phosphorus (TP); ortho-phosphorus (OP); ammonia (NH3); nitrate-nitrite (NOX); total Kjeldahl nitrogen (TKN); total suspended solids (TSS); and total Cd, Cu, Pb, and Zn. Influent runoff concentrations were similar to other studies in NC, and the monitoring revealed significant concentration reductions for most constituents in HFR. PRB significantly reduced concentrations for all pollutants except TSS, particulate phosphorus, and NH3, while significantly exporting Zn. HFR exhibited soil infiltration that led to significant pollutant load reductions (LRs) for all analytes except Cu. PRB exhibited little infiltration but had significant LRs for dissolved nutrients. This study provides evidence that DDB inlet and outlet configuration and the presence of standing water may impact DDB water quality improvement.
DA - 2020/5/1/
PY - 2020/5/1/
DO - 10.1061/JSWBAY.0000915
VL - 6
IS - 2
SP -
SN - 2379-6111
ER -
TY - JOUR
TI - A Functional Metric Approach to Assess Biosimilarity With Application to Rheumatoid Arthritis Trials
AU - Ghosh, Sujit K.
AU - Dong, Lin
T2 - STATISTICS IN BIOPHARMACEUTICAL RESEARCH
AB - In recent years there has been considerable interest in testing for similarity between biological drug products, commonly known as biologics. Biologics are large and complex molecule drugs that are produced by living cells and hence are sensitive to environmental changes. In addition, biologics usually induce antibodies, which raises safety and efficacy issues. The manufacturing process is also much more complicated and often costlier than that of small-molecule generic drugs. Because of these complexities and the inherent variability of biologics, the testing paradigm of traditional generic drugs cannot be directly used to test for biosimilarity. Taking into account some of these concerns, we propose a functional distance based methodology that takes into consideration the entire time course of the study and is based on a class of flexible semiparametric models. The empirical results show that the proposed approach is more sensitive than the classical equivalence testing approach, which is usually based on an arbitrarily chosen time point. Bootstrap based methodologies are also presented for statistical inference.
DA - 2020/4/2/
PY - 2020/4/2/
DO - 10.1080/19466315.2020.1733071
VL - 12
IS - 2
SP - 234-243
SN - 1946-6315
UR - https://doi.org/10.1080/19466315.2020.1733071
KW - Bernstein polynomials
KW - Binary responses
KW - Rheumatic arthritis
KW - Semiparametric models
ER -
TY - JOUR
TI - Low-Dose Silver Nanoparticle Surface Chemistry and Temporal Effects on Gene Expression in Human Liver Cells
AU - House, John S.
AU - Bouzos, Evangelia
AU - Fahy, Kira M.
AU - Francisco, Victorino Miguel
AU - Lloyd, Dillon T.
AU - Wright, Fred A.
AU - Motsinger-Reif, Alison A.
AU - Asuri, Prashanth
AU - Wheeler, Korin E.
T2 - SMALL
AB - Silver nanoparticles (AgNPs) are widely incorporated into consumer and biomedical products for their antimicrobial and plasmonic properties with limited risk assessment of low-dose cumulative exposure in humans. To evaluate cellular responses to low-dose AgNP exposures across time, human liver cells (HepG2) are exposed to AgNPs with three different surface charges (1.2 µg mL⁻¹) and complete gene expression is monitored across a 24 h period. Time and AgNP surface chemistry mediate gene expression. In addition, since cells are fed, time has marked effects on gene expression that should be considered. Surface chemistry of AgNPs alters gene transcription in a time-dependent manner, with the most dramatic effects in cationic AgNPs. Universal to all surface coatings, AgNP-treated cells responded by inactivating proliferation and enabling cell cycle checkpoints. Further analysis of these universal features of AgNP cellular response, as well as more detailed analysis of specific AgNP treatments, time points, or specific genes, is facilitated with an accompanying application. Taken together, these results provide a foundation for understanding hepatic response to low-dose AgNPs for future risk assessment.
DA - 2020/5//
PY - 2020/5//
DO - 10.1002/smll.202000299
VL - 16
IS - 21
SP -
SN - 1613-6829
KW - nanotoxicity
KW - silver nanoparticles
KW - transcriptomics
ER -
TY - JOUR
TI - Effective SNP ranking improves the performance of eQTL mapping
AU - Jeng, X. Jessie
AU - Rhyne, Jacob
AU - Zhang, Teng
AU - Tzeng, Jung-Ying
T2 - GENETIC EPIDEMIOLOGY
AB - Genome‐wide expression quantitative trait loci (eQTLs) mapping explores the relationship between gene expression and DNA variants, such as single‐nucleotide polymorphisms (SNPs), to understand the genetic basis of human diseases. Due to the large number of genes and SNPs that need to be assessed, current methods for eQTL mapping often suffer from low detection power, especially for identifying trans‐eQTLs. In this paper, we propose the idea of performing SNP ranking based on the higher criticism statistic, a summary statistic developed in large‐scale signal detection. We illustrate how the HC‐based SNP ranking can effectively prioritize eQTL signals over noise, greatly reduce the burden of joint modeling, and improve the power for eQTL mapping. Numerical results in simulation studies demonstrate the superior performance of our method compared to existing methods. The proposed method is also evaluated in HapMap eQTL data analysis and the results are compared to a database of known eQTLs.
DA - 2020/9//
PY - 2020/9//
DO - 10.1002/gepi.22293
VL - 44
IS - 6
SP - 611-619
SN - 1098-2272
KW - HC ranking
KW - hotspot
KW - multivariate response
KW - penalized regression
KW - trans-eQTL
ER -
TY - JOUR
TI - Modeling buffer capacity and pH in acid and acidified foods
AU - Price, Robert E.
AU - Longtin, Madyson
AU - Conley-Payton, Summer
AU - Osborne, Jason A.
AU - Johanningsmeier, Suzanne D.
AU - Bitzer, Donald
AU - Breidt, Fred
T2 - JOURNAL OF FOOD SCIENCE
AB - Standard ionic equilibria equations may be used for calculating pH of weak acid and base solutions. These calculations are difficult or impossible to solve analytically for foods that include many unknown buffering components, making pH prediction in these systems impractical. We combined buffer capacity (BC) models with a pH prediction algorithm to allow pH prediction in complex food matrices from BC data. Numerical models were developed using Matlab software to estimate the pH and buffering components for mixtures of weak acid and base solutions. The pH model was validated with laboratory solutions of acetic or citric acids with ammonia, in combinations with varying salts using Latin hypercube designs. Linear regressions of observed versus predicted pH values based on the concentration and pK values of the solution components resulted in estimated slopes between 0.96 and 1.01 with and without added salts. BC models were generated from titration curves for 0.6 M acetic acid or 12.4 mM citric acid resulting in acid concentration and pK estimates. Predicted pH values from these estimates were within 0.11 pH units of the measured pH. Acetic acid concentration measurements based on the model were within 6% accuracy compared to high-performance liquid chromatography measurements for concentrations less than 400 mM, although they were underestimated above that. The models may have application for use in determining the BC of food ingredients with unknown buffering components. Predicting pH changes for food ingredients using these models may be useful for regulatory purposes with acid or acidified foods and for product development. PRACTICAL APPLICATION: Buffer capacity models may benefit regulatory agencies and manufacturers of acid and acidified foods to determine pH stability (below pH 4.6) and how low-acid food ingredients may affect the safety of these foods. Predicting pH for solutions with known or unknown buffering components was based on titration data and models that use only monoprotic weak acids and bases. These models may be useful for product development and food safety by estimating pH and buffering capacity.
DA - 2020/4//
PY - 2020/4//
DO - 10.1111/1750-3841.15091
VL - 85
IS - 4
SP - 918-925
SN - 1750-3841
KW - acid
KW - base
KW - acid foods
KW - acidified foods
KW - buffer capacity
KW - buffer model
KW - pH
ER -
TY - JOUR
TI - Managing a Destructive, Episodic Crop Disease: A National Survey of Wheat and Barley Growers' Experience With Fusarium Head Blight
AU - Cowger, Christina
AU - Smith, Joy
AU - Boos, Dennis
AU - Bradley, Carl A.
AU - Ransom, Joel
AU - Bergstrom, Gary C.
T2 - PLANT DISEASE
AB - The main techniques for minimizing Fusarium head blight (FHB, or scab) and deoxynivalenol in wheat and barley are well established and generally available: planting of moderately FHB-resistant cultivars, risk monitoring, and timely use of the most effective fungicides. Yet the adoption of these techniques remains uneven across the FHB-prone portions of the U.S. cereal production area. A national survey was undertaken by the U.S. Wheat and Barley Scab Initiative in 17 states where six market classes of wheat and barley are grown. In 2014, 5,107 usable responses were obtained. The highest percentages reporting losses attributable to FHB in the previous 5 years were in North Dakota, Maryland, Kentucky, and states bordering the Great Lakes but across all states, ≥75% of respondents reported no FHB-related losses in the previous 5 years. Adoption of cultivar resistance was uneven by state and market class and was low except among hard red spring wheat growers. In 13 states, a majority of respondents had not applied an FHB-targeted fungicide in the previous 5 years. Although the primary FHB information source varied by state, crop consultants were considered to be an important source or their primary source of information on risk or management of FHB by the largest percentage of respondents. Use of an FHB risk forecasting website was about twice as high in North Dakota as the 17-state average of 6%. The most frequently cited barriers to adopting FHB management practices were weather or logistics preventing timely fungicide application, difficulty in determining flowering timing for fungicide applications, and the impracticality of FHB-reducing rotations. The results highlight the challenges of managing an episodically damaging crop disease and point to specific areas for improvement.
DA - 2020/3//
PY - 2020/3//
DO - 10.1094/PDIS-10-18-1803-SR
VL - 104
IS - 3
SP - 634-648
SN - 1943-7692
KW - cereals and grains
KW - chemical cultivar/resistance
KW - disease management
KW - disease warning systems
KW - epidemiology
KW - field crops
KW - fungi
ER -
TY - JOUR
TI - Growth performance, oxidative stress and immune status of newly weaned pigs fed peroxidized lipids with or without supplemental vitamin E or polyphenols
AU - Silva-Guillen, Y. V.
AU - Arellano, C.
AU - Boyd, R. D.
AU - Martinez, G.
AU - Heugten, E.
T2 - JOURNAL OF ANIMAL SCIENCE AND BIOTECHNOLOGY
AB - This study evaluated the use of dietary vitamin E and polyphenols on growth, immune and oxidative status of weaned pigs fed peroxidized lipids. A total of 192 piglets (21 days of age and body weight of 6.62 ± 1.04 kg) were assigned within sex and weight blocks to a 2 × 3 factorial arrangement using 48 pens with 4 pigs per pen. Dietary treatments consisted of lipid peroxidation (6% edible soybean oil or 6% peroxidized soybean oil), and antioxidant supplementation (control diet containing 33 IU/kg DL-α-tocopheryl-acetate; control with 200 IU/kg additional DL-α-tocopheryl-acetate; or control with 400 mg/kg polyphenols). Pigs were fed in 2 phases for 14 and 21 days, respectively. Peroxidation of oil for 12 days at 80 °C with exposure to 50 L/min of air substantially increased peroxide values, anisidine value, hexanal, and 2,4-decadienal concentrations. Feeding peroxidized lipids decreased (P < 0.001) body weight (23.16 vs. 18.74 kg), daily gain (473 vs. 346 g/d), daily feed intake (658 vs. 535 g/d) and gain:feed ratio (719 vs. 647 g/kg). Lipid peroxidation decreased serum vitamin E (P < 0.001) and this decrease was larger on day 35 (1.82 vs. 0.81 mg/kg) than day 14 (1.95 vs. 1.38 mg/kg). Supplemental vitamin E, but not polyphenols, increased (P ≤ 0.002) serum vitamin E by 84% and 22% for control and peroxidized diets, respectively (interaction, P = 0.001). Serum malondialdehyde decreased (P < 0.001) with peroxidation on day 14, but not day 35 and protein carbonyl increased (P < 0.001) with peroxidation on day 35, but not day 14. Serum 8-hydroxydeoxyguanosine was not affected (P > 0.05). Total antioxidant capacity decreased with peroxidation (P < 0.001) and increased with vitamin E (P = 0.065) and polyphenols (P = 0.046) for the control oil diet only. Serum cytokine concentrations increased with feeding peroxidized lipids on day 35, but were not affected by antioxidant supplementation (P > 0.05). Feeding peroxidized lipids negatively impacted growth performance and antioxidant capacity of nursery pigs. Supplementation of vitamin E and polyphenols improved total antioxidant capacity, especially in pigs fed control diets, but did not restore growth performance.
DA - 2020/3/5/
PY - 2020/3/5/
DO - 10.1186/s40104-020-0431-9
VL - 11
IS - 1
SP -
SN - 2049-1891
KW - Antioxidants
KW - Immune status
KW - Lipid peroxidation
KW - Oxidative stress
KW - Piglets
KW - Polyphenols
KW - Vitamin E
ER -
TY - JOUR
TI - Heatwave duration: Characterizations using probabilistic inference
AU - Raha, Sohini
AU - Ghosh, Sujit K.
T2 - ENVIRONMETRICS
AB - Characterization of heatwave duration is becoming increasingly important in environmental research, as heatwaves pose a significant threat to many human lives worldwide. Although several quantifications of the extremities of a heatwave have been proposed in the literature, they are mostly improvised and there does not exist a universally accepted definition of heatwave. In this article, we devise a probabilistic inferential framework to characterize heatwave and come up with a definition that can capture the essence of all existing ad hoc definitions. We derive an exact distribution on the frequency of such durations for a stationary Markov process and also an approximate distribution of durations for a stationary non‐Markov time series. For a given site, using a daily time series (of ambient temperature or heat‐index), we define a heatwave as the number of sustained days above a given threshold using the probability distribution of the durations. We illustrate the proposed methodology using daily time series of ambient temperature for a fixed site (of Atlanta) and also using the USCRN consisting of 126 sites across the United States. Furthermore, we also derive an empirical quadratic curve based relationship between expected durations and extreme thresholds. The proofs of the theorems, datasets, algorithms, and computer codes are provided in the supplementary materials.
DA - 2020/8//
PY - 2020/8//
DO - 10.1002/env.2626
VL - 31
IS - 5
SP -
SN - 1099-095X
UR - https://doi.org/10.1002/env.2626
KW - Bayesian
KW - hierarchical model
KW - Poisson approximation
KW - sum of dependent Bernoulli sequence
ER -
TY - JOUR
TI - Hydrologic and water quality performance of two aging and unmaintained dry detention basins receiving highway stormwater runoff
AU - Wissler, Austin D.
AU - Hunt, William F.
AU - McLaughlin, Richard A.
T2 - JOURNAL OF ENVIRONMENTAL MANAGEMENT
AB - Dry detention basins (DDBs) are a type of stormwater control measure (SCM) designed to provide flood storage, peak discharge reduction, and some water quality improvement through sedimentation. DDBs are ubiquitous in the urban environment, but are expensive to maintain. In this study, two overgrown DDBs near Raleigh, NC, receiving highway runoff were monitored for up to one year to quantify their water quality and hydrologic performance. Both basins, B1 and B2, have not received vegetation maintenance since construction in 2007. Flow-weighted composite samples were collected during storm events and analyzed for nutrients (Total Phosphorus (TP), Ortho-phosphorus (OP), Ammonia-N (NH3), NO2-3-N (NOX), and Total Kjeldahl Nitrogen (TKN)), total suspended solids (TSS), and total Cd, Cu, Pb, and Zn. An annual water balance was also conducted to quantify runoff volume reduction. Despite low influent concentrations from the highway, significant removal efficiencies were found for all constituents except NH3 in B1. TP, OP, NOX, TSS, and Zn were reduced in B2. Both basins achieved greater than 41% volume reduction through soil infiltration and evapotranspiration, resulting in significant pollutant load reductions for all detected constituents, between 59% and 79% in B1 and 35% and 81% in B2. This study provides evidence that overgrown and unmaintained DDBs can reduce pollutant concentrations comparable to those reported for maintained DDBs, while reducing more volume than standard DDBs. Moreover, carbon sequestration likely increases while maintenance costs decrease.
DA - 2020/2/1/
PY - 2020/2/1/
DO - 10.1016/j.jenvman.2019.109853
VL - 255
SP -
SN - 1095-8630
KW - Highway
KW - Stormwater
KW - Dry detention basin
KW - Maintenance
KW - Non-point source pollution
KW - Carbon sequestration
ER -
TY - JOUR
TI - Robust estimation for moment condition models with data missing not at random
AU - Li, W.
AU - Yang, S.
AU - Han, P.
T2 - Journal of Statistical Planning and Inference
AB - We consider estimation for parameters defined through moment conditions when data are missing not at random. The missingness mechanism cannot be determined from the data alone, and inference under missingness not at random may be sensitive to unverifiable assumptions about the missingness mechanism. To add protection against model misspecification, we posit multiple models for the response probability and propose a weighting estimator with calibrated weights. Assuming the conditional distribution of the outcome given covariates is correctly modeled, we show that if any one of the multiple models for the response probability is correctly specified, the proposed estimator is consistent for the true value. A simulation study confirms that our estimator has multiple robustness when the outcome data are missing not at random. The method is also illustrated with an application.
DA - 2020/7//
PY - 2020/7//
DO - 10.1016/j.jspi.2020.01.001
VL - 207
SP - 246-254
SN - 1873-1171
KW - Identification
KW - Empirical likelihood
KW - Missing not at random
KW - Multiple robustness
KW - Semiparametric maximum likelihood estimator
ER -
TY - JOUR
TI - Robust kernel association testing (RobKAT)
AU - Martinez, Kara
AU - Maity, Arnab
AU - Yolken, Robert H.
AU - Sullivan, Patrick F.
AU - Tzeng, Jung-Ying
T2 - GENETIC EPIDEMIOLOGY
AB - Testing the association between single‐nucleotide polymorphism (SNP) effects and a response is often carried out through kernel machine methods based on least squares, such as the sequence kernel association test (SKAT). However, these least‐squares procedures are designed for a normally distributed conditional response, which may not apply. Other robust procedures such as the quantile regression kernel machine (QRKM) restrict the choice of the loss function and only allow inference on conditional quantiles. We propose a general and robust kernel association test that allows a flexible choice of the loss function, makes no distributional assumptions, and has SKAT and QRKM as special cases. We evaluate our proposed robust association test (RobKAT) across various data distributions through a simulation study. When errors are normally distributed, RobKAT controls type I error and shows comparable power with SKAT. In all other distributional settings investigated, our robust test has similar or greater power than SKAT. Finally, we apply our robust testing method to data from the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) clinical trial to detect associations between selected genes including the major histocompatibility complex (MHC) region on chromosome six and neurotropic herpesvirus antibody levels in schizophrenia patients. RobKAT detected significant association with four SNP sets (HST1H2BJ, MHC, POM12L2, and SLC17A1), three of which were undetected by SKAT.
DA - 2020/4//
PY - 2020/4//
DO - 10.1002/gepi.22280
VL - 44
IS - 3
SP - 272-282
SN - 1098-2272
UR - https://doi.org/10.1002/gepi.22280
KW - kernel association test
KW - multimarker hypothesis test
KW - robust regression
KW - schizophrenia
KW - semiparametric
ER -
TY - JOUR
TI - Q-Learning: Theory and Applications
AU - Clifton, Jesse
AU - Laber, Eric
T2 - ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 7, 2020
AB - Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in an infinite-horizon decision problem, now refers to a general class of reinforcement learning methods widely used in statistics and artificial intelligence. In the context of personalized medicine, finite-horizon Q-learning is the workhorse for estimating optimal treatment strategies, known as treatment regimes. Infinite-horizon Q-learning is also increasingly relevant in the growing field of mobile health. In computer science, Q-learning methods have achieved remarkable performance in domains such as game-playing and robotics. In this article, we (a) review the history of Q-learning in computer science and statistics, (b) formalize finite-horizon Q-learning within the potential outcomes framework and discuss the inferential difficulties for which it is infamous, and (c) review variants of infinite-horizon Q-learning and the exploration-exploitation problem, which arises in decision problems with a long time horizon. We close by discussing issues arising with the use of Q-learning in practice, including arguments for combining Q-learning with direct-search methods; sample size considerations for sequential, multiple assignment randomized trials; and possibilities for combining Q-learning with model-based methods.
DA - 2020///
PY - 2020///
DO - 10.1146/annurev-statistics-031219-041220
VL - 7
SP - 279-301
SN - 2326-831X
KW - reinforcement learning
KW - dynamic treatment regimes
KW - model-free
KW - causal inference
KW - policy search
ER -
TY - JOUR
TI - Contraction properties of shrinkage priors in logistic regression
AU - Wei, Ran
AU - Ghosal, Subhashis
T2 - JOURNAL OF STATISTICAL PLANNING AND INFERENCE
AB - Bayesian shrinkage priors have received a lot of attention recently because of their efficiency in computation and accuracy in estimation and variable selection. In this paper, we study the contraction properties of shrinkage priors in a logistic regression model where the number of covariates is high. For a shrinkage prior distribution that is heavy-tailed and concentrated around zero with high probability such as the horseshoe prior, the Dirichlet–Laplace prior, and the normal-gamma prior with appropriate choices of hyper-parameters, estimates of the logistic regression coefficient are shown to asymptotically concentrate around the true sparse vector in the L2-sense. It is shown that the proposed contraction rate is comparable with the point mass prior that is studied in Atchadé (2017). The simulation study under the logistic regression model verifies the theoretical results by showing that the horseshoe prior and the Dirichlet–Laplace prior perform like the point mass prior for the estimation, variable selection and prediction, and yield much better results than Bayesian lasso and the non-informative normal prior.
DA - 2020/7//
PY - 2020/7//
DO - 10.1016/j.jspi.2019.12.004
VL - 207
SP - 215-229
SN - 1873-1171
KW - Bayesian variable selection
KW - Continuous shrinkage
KW - Contraction rate
KW - Logistic regression
KW - Point mass prior
ER -
TY - JOUR
TI - Preface of the Special Issue in Honor of Professor Jayanta Kumar Ghosh
AU - Ghosal, Subhashis
T2 - SANKHYA-SERIES A-MATHEMATICAL STATISTICS AND PROBABILITY
DA - 2020/3/3/
PY - 2020/3/3/
DO - 10.1007/s13171-020-00199-z
SP -
SN - 0976-8378
ER -
TY - JOUR
TI - Smart Textile‐Based Personal Thermal Comfort Systems: Current Status and Potential Solutions
AU - Tabor, Jordan
AU - Chatterjee, Kony
AU - Ghosh, Tushar K.
T2 - Advanced Materials Technologies
AB - Thermophysiological comfort in humans is sought universally but seldom achieved due to biological and physiological variances. Most people in developed parts of the world rely on highly energy‐intensive and inefficient central heating/cooling systems to achieve thermophysiological comfort, which is rarely satisfactory. A potential solution to this issue is a wearable personal thermal comfort system (PTCS) consisting of textile‐based temperature and moisture sensors, thermal and moisture responsive actuators, and/or heating/cooling devices, that can sense the environment and physiology of the wearer, and accordingly provide an individualized thermal environment. Moving thermal regulation away from the built environment to the microclimate surrounding the human body using textiles has the potential to provide personalized thermal comfort and energy savings. Such a system may employ thermal comfort models and leverage the Internet of Things (IoT) and machine learning (ML) to understand individuals' comfort requirements. Herein, the current state of textile‐based active and passive comfort systems/technologies is summarized, including their environmental impact, major thermal comfort models, and factors influencing comfort. Also, active and passive textile‐based devices (sensors, actuators, and flexible heating/cooling devices) that may be incorporated into a textile‐based wearable PTCS are comprehensively discussed with an emphasis on their advantages, limitations, and prospects.
DA - 2020/5//
PY - 2020/5//
DO - 10.1002/admt.201901155
UR - https://doi.org/10.1002/admt.201901155
KW - actuators
KW - e-textiles
KW - flexible sensors
KW - thermal comfort
KW - thermoelectric fabrics
ER -
TY - JOUR
TI - Lethal and sublethal effects of toxicants on bumble bee populations: a modelling approach
AU - Banks, J. E.
AU - Banks, H. T.
AU - Myers, N.
AU - Laubmeier, A. N.
AU - Bommarco, R.
T2 - ECOTOXICOLOGY
AB - Pollinator decline worldwide is well-documented; globally, chemical pesticides (especially the class of pesticides known as neonicotinoids) have been implicated in hymenopteran decline, but the mechanics and drivers of population trends and dynamics of wild bees are poorly understood. Declines and shifts in community composition of bumble bees (Bombus spp.) have been documented in North America and Europe, with a suite of lethal and sub-lethal effects of pesticides on bumble bee populations documented. We employ a mathematical model parameterized with values taken from the literature that uses differential equations to track bumble bee populations through time in order to attain a better understanding of toxicant effects on a developing colony of bumble bees. We use a delay differential equation (DDE) model, which requires fewer parameter estimations than agent-based models while affording us the ability to explicitly describe the effect of larval incubation and colony history on population outcomes. We explore how both lethal and sublethal effects such as reduced foraging ability may combine to affect population outcomes, and discuss the implications for the protection and conservation of ecosystem services.
DA - 2020/4//
PY - 2020/4//
DO - 10.1007/s10646-020-02162-y
VL - 29
IS - 3
SP - 237-245
SN - 1573-3017
KW - Hymenoptera
KW - Neonicotinoid
KW - Delay differential equation
ER -
TY - JOUR
TI - Spatiotemporal signal detection using continuous shrinkage priors
AU - Jhuang, An-Ting
AU - Fuentes, Montserrat
AU - Bandyopadhyay, Dipankar
AU - Reich, Brian J.
T2 - STATISTICS IN MEDICINE
AB - Periodontal disease (PD) is a chronic inflammatory disease that affects the gum tissue and bone supporting the teeth. Although tooth‐site level PD progression is believed to be spatio‐temporally referenced, the whole‐mouth average periodontal pocket depth (PPD) has been commonly used as an indicator of the current/active status of PD. This leads to imminent loss of information, and imprecise parameter estimates. Despite the availability of statistical methods that accommodate spatiotemporal information for responses collected at the tooth‐site level, the enormity of longitudinal databases derived from oral health practice‐based settings renders them unscalable for application. To mitigate this, we introduce a Bayesian spatiotemporal model to detect problematic/diseased tooth‐sites dynamically inside the mouth for any subject obtained from large databases. This is achieved via a spatial continuous sparsity‐inducing shrinkage prior on spatially varying linear‐trend regression coefficients. A low‐rank representation captures the nonstationary covariance structure of the PPD outcomes, and facilitates the relevant Markov chain Monte Carlo computing steps applicable to thousands of study subjects. Application of our method to both simulated data and to a rich database of electronic dental records from the HealthPartners Institute reveals improved prediction performances, compared with alternative models with usual Gaussian priors for regression parameters and conditionally autoregressive specification of the covariance structure.
DA - 2020/6/15/
PY - 2020/6/15/
DO - 10.1002/sim.8514
VL - 39
IS - 13
SP - 1817-1832
SN - 1097-0258
KW - nonstationary covariance
KW - periodontal disease
KW - shrinkage priors
KW - space-time disease surveillance
ER -
TY - JOUR
TI - Metal contamination of river otters in North Carolina
AU - Sanders, Charles W., II
AU - Pacifici, Krishna
AU - Hess, George R.
AU - Olfenbuttel, Colleen
AU - DePerno, Christopher S.
T2 - ENVIRONMENTAL MONITORING AND ASSESSMENT
DA - 2020///
PY - 2020///
DO - 10.1007/s10661-020-8106-8
VL - 192
IS - 2
ER -
TY - JOUR
TI - HyPhy 2.5-A Customizable Platform for Evolutionary Hypothesis Testing Using Phylogenies
AU - Pond, Sergei L. Kosakovsky
AU - Poon, Art F. Y.
AU - Velazquez, Ryan
AU - Weaver, Steven
AU - Hepler, N. Lance
AU - Murrell, Ben
AU - Shank, Stephen D.
AU - Magalis, Brittany Rife
AU - Bouvier, Dave
AU - Nekrutenko, Anton
AU - Wisotsky, Sadie
AU - Spielman, Stephanie J.
AU - Frost, Simon D. W.
AU - Muse, Spencer V
T2 - MOLECULAR BIOLOGY AND EVOLUTION
AB - HYpothesis testing using PHYlogenies (HyPhy) is a scriptable, open-source package for fitting a broad range of evolutionary models to multiple sequence alignments, and for conducting subsequent parameter estimation and hypothesis testing, primarily in the maximum likelihood statistical framework. It has become a popular choice for characterizing various aspects of the evolutionary process: natural selection, evolutionary rates, recombination, and coevolution. The 2.5 release (available from www.hyphy.org) includes a completely re-engineered computational core and analysis library that introduces new classes of evolutionary models and statistical tests, delivers substantial performance and stability enhancements, improves usability, streamlines end-to-end analysis workflows, makes it easier to develop custom analyses, and is mostly backward compatible with previous HyPhy releases.
DA - 2020/1//
PY - 2020/1//
DO - 10.1093/molbev/msz197
VL - 37
IS - 1
SP - 295-299
SN - 1537-1719
KW - evolutionary analysis
KW - natural selection
KW - hypothesis testing
KW - statistical inference
KW - software engineering
ER -
TY - JOUR
TI - Improving Cancer Drug Discovery by Studying Cancer across the Tree of Life
AU - Somarelli, Jason A.
AU - Boddy, Amy M.
AU - Gardner, Heather L.
AU - DeWitt, Suzanne Bartholf
AU - Tuohy, Joanne
AU - Megquier, Kate
AU - Sheth, Maya U.
AU - Hsu, Shiaowen David
AU - Thorne, Jeffrey L.
AU - London, Cheryl A.
AU - Eward, William C.
T2 - MOLECULAR BIOLOGY AND EVOLUTION
AB - Despite a considerable expenditure of time and resources and significant advances in experimental models of disease, cancer research continues to suffer from extremely low success rates in translating preclinical discoveries into clinical practice. The continued failure of cancer drug development, particularly late in the course of human testing, not only impacts patient outcomes, but also drives up the cost for those therapies that do succeed. It is clear that a paradigm shift is necessary if improvements in this process are to occur. One promising direction for increasing translational success is comparative oncology—the study of cancer across species, often involving veterinary patients that develop naturally-occurring cancers. Comparative oncology leverages the power of cross-species analyses to understand the fundamental drivers of cancer protective mechanisms, as well as factors contributing to cancer initiation and progression. Clinical trials in veterinary patients with cancer provide an opportunity to evaluate novel therapeutics in a setting that recapitulates many of the key features of human cancers, including genomic aberrations that underlie tumor development, response and resistance to treatment, and the presence of comorbidities that can affect outcomes. With a concerted effort from basic scientists, human physicians and veterinarians, comparative oncology has the potential to enhance the cost-effectiveness and efficiency of pipelines for cancer drug discovery and other cancer treatments.
DA - 2020/1//
PY - 2020/1//
DO - 10.1093/molbev/msz254
VL - 37
IS - 1
SP - 11-17
SN - 1537-1719
KW - veterinary oncology
KW - cross-species studies
KW - cancer drug discovery
KW - evolutionary biology
ER -
TY - JOUR
TI - Data transforming augmentation for heteroscedastic models
AU - Tak, Hyungsuk
AU - You, Kisung
AU - Ghosh, Sujit K.
AU - Su, Bingyue
AU - Kelly, Joseph
T2 - JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
AB - Data augmentation (DA) turns seemingly intractable computational problems into simple ones by augmenting latent missing data. In addition to computational simplicity, it is now well-established that DA equipped with a deterministic transformation can improve the convergence speed of iterative algorithms such as an EM algorithm or Gibbs sampler. In this article, we outline a framework for the transformation-based DA, which we call data transforming augmentation (DTA), allowing augmented data to be a deterministic function of latent and observed data, and unknown parameters. Under this framework, we investigate a novel DTA scheme that turns heteroscedastic models into homoscedastic ones to take advantage of simpler computations typically available in homoscedastic cases. Applying this DTA scheme to fitting linear mixed models, we demonstrate simpler computations and faster convergence rates of resulting iterative algorithms, compared with those under a non-transformation-based DA scheme. We also fit a Beta-Binomial model using the proposed DTA scheme, which enables sampling approximate marginal posterior distributions that are available only under homoscedasticity. An R package, Rdta, is publicly available at CRAN.
DA - 2020/7/2/
PY - 2020/7/2/
DO - 10.1080/10618600.2019.1704295
VL - 29
IS - 3
SP - 659-667
SN - 1537-2715
UR - https://doi.org/10.1080/10618600.2019.1704295
KW - Beta-Binomial
KW - EM algorithm
KW - Gibbs sampler
KW - hierarchical model
KW - linear mixed model
KW - missing data
ER -
TY - JOUR
TI - Use of unconventional mixed Acetone-Butanol-Ethanol solvents for anthocyanin extraction from Purple-Fleshed sweetpotatoes
AU - Zuleta-Correa, Ana
AU - Chinn, Mari Sum
AU - Alfaro-Córdoba, Marcela
AU - Truong, Van-Den
AU - Yencho, George Craig
AU - Bruno-Bárcena, José Manuel
T2 - Food Chemistry
AB - Anthocyanins from purple-fleshed sweetpotatoes constitute highly valued natural colorants and functional ingredients. In the past, anthocyanin extraction conditions and efficiencies using a single acidified solvent have been assessed. However, the potential of solvent mixes that can be generated by fermentation of biomass-derived sugars has not been explored. In this study, the effects of single and mixed solvent, time, temperature, sweetpotato genotype and preparation, on anthocyanin and phenolic extraction were evaluated. Results indicated that unconventional diluted solvent mixes containing acetone, butanol, and ethanol were superior or equally efficient for extracting anthocyanins when compared to commonly used concentrated extractants. In addition, analysis of anthocyanidin concentrations including cyanidin (cy), peonidin (pn), and pelargonidin (pl), indicated that different ratios of pn/cy were obtained depending on the solvent used. These results could be useful when selecting processing conditions that better suit particular end-use applications and more environmentally friendly process development for purple sweetpotatoes.
DA - 2020///
PY - 2020///
DO - 10.1016/j.foodchem.2019.125959
VL - 314
SP - 125959
UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-85078103152&partnerID=MN8TOARS
KW - Ipomoea batatas
KW - Anthocyanidins
KW - Phenolics
KW - Cyanidin
KW - Peonidin
KW - Temperature
KW - Flour
ER -
TY - JOUR
TI - Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping
AU - Mollinari, M.
AU - Olukolu, B.A.
AU - Da Pereira, G.S.
AU - Khan, A.
AU - Gemenet, D.
AU - Yencho, G.C.
AU - Zeng, Z.-B.
T2 - G3: Genes|Genomes|Genetics
AB - The hexaploid sweetpotato (Ipomoea batatas (L.) Lam., 2n = 6x = 90) is an important staple food crop worldwide and plays a vital role in alleviating famine in developing countries. Due to its high ploidy level, genetic studies in sweetpotato lag behind major diploid crops significantly. We built an ultra-dense multilocus integrated genetic map and characterized the inheritance system in a sweetpotato full-sib family using our newly developed software, MAPpoly. The resulting genetic map revealed 96.5% collinearity between I. batatas and its diploid relative I. trifida. We computed the genotypic probabilities across the whole genome for all individuals in the mapping population and inferred their complete hexaploid haplotypes. We provide evidence that most of the meiotic configurations (73.3%) were resolved in bivalents, although a small portion of multivalent signatures (15.7%), among other inconclusive configurations (11.0%), were also observed. Except for low levels of preferential pairing in linkage group 2, we observed a hexasomic inheritance mechanism in all linkage groups. We propose that the hexasomic-bivalent inheritance promotes stability to the allelic transmission in sweetpotato.
DA - 2020/1//
PY - 2020/1//
DO - 10.1534/g3.119.400620
VL - 10
IS - 1
SP - 281-292
UR - http://dx.doi.org/10.1534/g3.119.400620
KW - Polyploidy
KW - Genetic Linkage
KW - Hexasomic Inheritance
KW - Haplotyping
KW - Preferential Pairing
KW - Multivalent
ER -
TY - JOUR
TI - Quantitative trait loci and differential gene expression analyses reveal the genetic basis for negatively associated β-carotene and starch content in hexaploid sweetpotato [Ipomoea batatas (L.) Lam.]
AU - Gemenet, D.C.
AU - Silva Pereira, G.
AU - De Boeck, B.
AU - Wood, J.C.
AU - Mollinari, M.
AU - Olukolu, B.A.
AU - Diaz, F.
AU - Mosquera, V.
AU - Ssali, R.T.
AU - David, M.
AU - Kitavi, M.N.
AU - Burgos, G.
AU - Felde, T.Z.
AU - Ghislain, M.
AU - Carey, E.
AU - Swanckaert, J.
AU - Coin, L.J.M.
AU - Fei, Z.
AU - Hamilton, J.P.
AU - Yada, B.
AU - Yencho, G.C.
AU - Zeng, Z.-B.
AU - Mwanga, R.O.M.
AU - Khan, A.
AU - Gruneberg, W.J.
AU - Buell, C.R.
T2 - Theoretical and Applied Genetics
AB - β-Carotene content in sweetpotato is associated with the Orange and phytoene synthase genes; due to physical linkage of phytoene synthase with sucrose synthase, β-carotene and starch content are negatively correlated. In populations depending on sweetpotato for food security, starch is an important source of calories, while β-carotene is an important source of provitamin A. The negative association between the two traits contributes to the low nutritional quality of sweetpotato consumed, especially in sub-Saharan Africa. Using a biparental mapping population of 315 F1 progeny generated from a cross between an orange-fleshed and a non-orange-fleshed sweetpotato variety, we identified two major quantitative trait loci (QTL) on linkage group (LG) three (LG3) and twelve (LG12) affecting starch, β-carotene, and their correlated traits, dry matter and flesh color. Analysis of parental haplotypes indicated that these two regions acted pleiotropically to reduce starch content and increase β-carotene in genotypes carrying the orange-fleshed parental haplotype at the LG3 locus. Phytoene synthase and sucrose synthase, the rate-limiting and linked genes located within the QTL on LG3 involved in the carotenoid and starch biosynthesis, respectively, were differentially expressed in Beauregard versus Tanzania storage roots. The Orange gene, the molecular switch for chromoplast biogenesis, located within the QTL on LG12 while not differentially expressed was expressed in developing roots of the parental genotypes. We conclude that these two QTL regions act together in a cis and trans manner to inhibit starch biosynthesis in amyloplasts and enhance chromoplast biogenesis, carotenoid biosynthesis, and accumulation in orange-fleshed sweetpotato. Understanding the genetic basis of this negative association between starch and β-carotene will inform future sweetpotato breeding strategies targeting sweetpotato for food and nutritional security.
DA - 2020/1//
PY - 2020/1//
DO - 10.1007/s00122-019-03437-7
VL - 133
IS - 1
SP - 23-36
UR - http://dx.doi.org/10.1007/s00122-019-03437-7
ER -
TY - JOUR
TI - Distribution of fiber intersections in two-dimensional random fiber web cases with a mixture of two fiber lengths
AU - Chun, Heuiju
AU - Suh, Moon W.
T2 - TEXTILE RESEARCH JOURNAL
AB - The statistical distribution of the number of fiber intersections in a unit area is of great importance in determining the physical and mechanical properties of random fiber webs and the products produced. The distribution of the number of fiber intersections determines the non-uniformity of the basis weight and can be used in designing optimal control strategies relating to such physical properties as strength, elongation, air/water permeability, acoustics and filtering efficiencies of fiber webs and nonwoven fabrics. This paper developed a geometrical and probabilistic model for the number of fiber intersections in two-dimensional random fiber webs, where two distinct fiber lengths are mixed at varying ratios. This work is an extension of a previously derived paper where the model assumed that all fiber lengths are equal. Here, we present a geometrical probabilistic model, theories for deriving expectations and variances of the number of intersections in random fiber webs. The model and statistical parameters are validated through an extensive computer simulation study.
DA - 2020/8//
PY - 2020/8//
DO - 10.1177/0040517519898158
VL - 90
IS - 15-16
SP - 1851-1859
SN - 1746-7748
KW - fiber intersections
KW - mixed fiber length webs
KW - mean
KW - variance
KW - nonwovens
ER -
TY - JOUR
TI - Statistical Inference for High-Dimensional Models via Recursive Online-Score Estimation
AU - Shi, Chengchun
AU - Song, Rui
AU - Lu, Wenbin
AU - Li, Runze
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - In this paper, we develop a new estimation and valid inference method for single or low-dimensional regression coefficients in high-dimensional generalized linear models. The number of the predictors is allowed to grow exponentially fast with respect to the sample size. The proposed estimator is computed by solving a score function. We recursively conduct model selection to reduce the dimensionality from high to a moderate scale and construct the score equation based on the selected variables. The proposed confidence interval (CI) achieves valid coverage without assuming consistency of the model selection procedure. When the selection consistency is achieved, we show the length of the proposed CI is asymptotically the same as the CI of the "oracle" method which works as well as if the support of the control variables were known. In addition, we prove the proposed CI is asymptotically narrower than the CIs constructed based on the de-sparsified Lasso estimator (van de Geer et al., 2014) and the decorrelated score statistic (Ning and Liu, 2017). Simulation studies and real data applications are presented to back up our theoretical findings.
DA - 2020///
PY - 2020///
DO - 10.1080/01621459.2019.1710154
KW - Confidence interval
KW - Generalized linear models
KW - Online estimation
KW - Ultrahigh dimensions
ER -
TY - JOUR
TI - A New Liver Expression Quantitative Trait Locus Map From 1,183 Individuals Provides Evidence for Novel Expression Quantitative Trait Loci of Drug Response, Metabolic, and Sex-Biased Phenotypes
AU - Etheridge, Amy S.
AU - Gallins, Paul J.
AU - Jima, Dereje
AU - Broadaway, K. Alaine
AU - Ratain, Mark J.
AU - Schuetz, Erin
AU - Schadt, Eric
AU - Schroder, Adrian
AU - Molony, Cliona
AU - Zhou, Yihui
AU - Mohlke, Karen L.
AU - Wright, Fred A.
AU - Innocenti, Federico
T2 - CLINICAL PHARMACOLOGY & THERAPEUTICS
AB - Expression quantitative trait locus (eQTL) studies in human liver are crucial for elucidating how genetic variation influences variability in disease risk and therapeutic outcomes and may help guide strategies to obtain maximal efficacy and safety of clinical interventions. Associations between expression microarray and genome-wide genotype data from four human liver eQTL studies (n = 1,183) were analyzed. More than 2.3 million cis-eQTLs for 15,668 genes were identified. When eQTLs were filtered against a list of 1,496 drug response genes, 187,829 cis-eQTLs for 1,191 genes were identified. Additionally, 1,683 sex-biased cis-eQTLs were identified, as well as 49 and 73 cis-eQTLs that colocalized with genome-wide association study signals for blood metabolite or lipid levels, respectively. Translational relevance of these results is evidenced by linking DPYD eQTLs to differences in safety of chemotherapy, linking the sex-biased regulation of PCSK9 expression to anti-lipid therapy, and identifying the G-protein coupled receptor GPR180 as a novel drug target for hypertriglyceridemia.
DA - 2020/6//
PY - 2020/6//
DO - 10.1002/cpt.1751
VL - 107
IS - 6
SP - 1383-1393
SN - 1532-6535
ER -
TY - JOUR
TI - Designing Dry Swales for Stormwater Quality Improvement Using the Aberdeen Equation
AU - Hunt, W. F.
AU - Fassman-Beck, E. A.
AU - Ekka, S. A.
AU - Shaneyfelt, K. C.
AU - Deletic, A.
T2 - JOURNAL OF SUSTAINABLE WATER IN THE BUILT ENVIRONMENT
AB - This case study presents a semiempirical method for designing water quality swales to treat stormwater runoff that is an alternative to current mostly anecdotal design approaches. Water quality swales are intended to reduce pollutant concentrations; they are not just flow conveyance systems. The design presented herein is a two-part process: (1) hydraulic design, and (2) treatment design. A hydraulic design feature unique to water quality swales includes maximum flow depths typically lower than grass height. Frequency analysis is used to estimate the water quality design storm intensity, and the design peak flow rate is estimated using the Rational method. Subsequently, Manning’s equation is used to determine the swale cross-section and slope. A relatively high roughness coefficient (n=∼0.35) is applied because the water is not intended to overtop the vegetation. This case study used the Aberdeen equation to calculate pollutant removal efficiencies if particle-size information was available. The method was applied to field-monitored swales in Auckland, New Zealand and Knightdale, North Carolina, US, and was found to accurately predict sediment capture. The conceptual approach presented here can be used to estimate reductions in total suspended solids by swales. However, the method needs to be validated with appropriate monitoring data in estimating removal of metals and other particulate-bound pollutants, but it is not applicable to the dissolved fraction of pollutants.
DA - 2020/2//
PY - 2020/2//
DO - 10.1061/JSWBAY.0000886
VL - 6
IS - 1
SP -
SN - 2379-6111
ER -
TY - JOUR
TI - Platelet aggregometry testing during aspirin or clopidogrel treatment and measurement of clopidogrel metabolite concentrations in dogs with protein-losing nephropathy
AU - Shropshire, Sarah
AU - Johnson, Tyler
AU - Olver, Christine
T2 - JOURNAL OF VETERINARY INTERNAL MEDICINE
AB - Background: Dogs with protein‐losing nephropathy (PLN) are treated with antiplatelet drugs for thromboprophylaxis but no standardized method exists to measure drug response. It is also unknown if clopidogrel metabolite concentrations [CM] differ between healthy and PLN dogs. Objectives: Assess response to aspirin or clopidogrel in PLN dogs using platelet aggregometry (PA) and compare [CM] between healthy and PLN dogs. Animals: Six healthy and 14 PLN dogs. Methods: Platelet aggregometry using adenosine diphosphate (ADP), arachidonic acid (AA), and saline was performed in healthy dogs at baseline and 1‐week postclopidogrel administration to identify responders or nonresponders. A decrease of ≥60% for ADP or ≥30% for AA at 1 or 3 hours postpill was used to define a responder. At 1 and 3 hours postclopidogrel, [CM] and PA were measured in healthy and PLN dogs. Platelet aggregometry was performed in PLN dogs at baseline, 1, 6, and 12 weeks after clopidogrel or aspirin administration. Results: In PLN dogs receiving clopidogrel, PA differed from baseline at all time points for ADP but not for AA at any time point. Most dogs responded at 1 or both time points except for 1 dog that showed no response. For PLN dogs receiving aspirin, no differences from baseline were observed at any time point for either ADP or AA. No differences in [CM] were found at either time point between healthy and PLN dogs. Conclusions and Clinical Importance: Platelet aggregometry may represent an objective method to evaluate response to clopidogrel or aspirin treatment and PLN dogs appear to metabolize clopidogrel similarly to healthy dogs.
DA - 2020///
PY - 2020///
DO - 10.1111/jvim.15694
ER -
TY - JOUR
TI - How Urban Identity, Affect, and Knowledge Predict Perceptions About Coyotes and Their Management
AU - Drake, Michael D.
AU - Peterson, M. Nils
AU - Griffith, Emily H.
AU - Olfenbuttel, Colleen
AU - DePerno, Cristopher S.
AU - Moorman, Christopher E.
T2 - ANTHROZOOS
AB - Globally, the number of humans and wildlife species sharing urban spaces continues to grow. As these populations grow, so too does the frequency of human–wildlife interactions in urban areas. Carnivores in particular pose urban wildlife conservation challenges owing to the strong emotions they elicit and the potential threats they can present to humans. These challenges can be better addressed with an understanding of the different factors that influence public perceptions of carnivores and their management. We conducted mail surveys in four cities in North Carolina (n = 721) to explore how (a) city of residence, (b) affectual connections to coyotes (Canis latrans), and (c) biological knowledge predicted perceptions of the danger posed by coyotes, the support for wild coyotes living nearby, and the support for lethal coyote removal methods. Our results provide the first assessment of how public perceptions of carnivores and their management vary between cities of different types. Residents from a tourism-driven city were more supportive of coyotes than residents from an industrial city and less concerned about risk than residents from a commercial city. We found affectual connection to coyotes and city of residence were consistent predictors of coyote perceptions. Respondents’ knowledge of coyote biology was not a significant predictor of any perceptions of coyotes despite the relatively high statistical power of the tests. Affectual connection to coyotes had the greatest effect on predicting coyote perceptions, suggesting efforts to promote positive emotional connections to wildlife may be a better way to increase acceptance of carnivores in urban areas than focusing on biological knowledge.
DA - 2020/1/2/
PY - 2020/1/2/
DO - 10.1080/08927936.2020.1694302
VL - 33
IS - 1
SP - 5-19
SN - 1753-0377
KW - affect
KW - Canis latrans
KW - coyotes
KW - urban identity
KW - wildlife knowledge
ER -
TY - JOUR
TI - Improving Safety, Efficiency, and Productivity: Evaluation of Fall Protection Systems for Bridge Work Using Wearable Technology and Utility Analysis
AU - Zuluaga, Carlos M.
AU - Albert, Alex
AU - Winkel, Munir A.
T2 - JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT
AB - The construction industry is experiencing a number of challenges. For example, construction workplaces report poor safety performance, widespread inefficiencies, and stagnant productivity rates. These challenges often translate into higher-order issues including cost overruns, schedule growths, and project failure. Accordingly, much of construction research has focused on identifying best practices to improve safety, efficiency, and productivity. However, the majority of these efforts focus on resolving one of these challenges (e.g., safety) rather than holistically addressing safety, efficiency, and productivity in unison. Unfortunately, such an approach can yield unintended consequences in certain circumstances. For example, a narrow focus on productivity may adversely affect safety performance, and vice versa. One nationwide safety issue that has received much recent attention is the protection of highway and bridge workers from falls to lower levels when working on bridge decks. In these circumstances, highway and bridge workers largely rely on existing bridge guardrails for their protection against falls. However, most bridge guardrails do not offer a barrier height of 107±8 cm (42±3 in.) for sufficient protection as per regulatory requirements. To protect these workers, a few transportation agencies are beginning to adopt passive fall protection systems that can be attached to the guardrails to temporarily increase the barrier height. The purpose of the current research was to support these efforts by evaluating four fall protection systems that are actively being considered for adoption based on the expected safety, efficiency, and productivity benefits they offer. The study objectives were accomplished through 96 field trials where physiological responses, postural demands, activity rates, and the associated utility were gathered from participating workers using wearable technology and a questionnaire survey. The research effort identified fall protection systems that offer the most advantages in terms of safety, efficiency, and productivity. The adoption of the recommended systems can yield substantial benefits in terms of safety, efficiency, and productivity, apart from reducing the risk of falls.
DA - 2020/2/1/
PY - 2020/2/1/
DO - 10.1061/(ASCE)CO.1943-7862.0001764
VL - 146
IS - 2
SP -
SN - 1943-7862
KW - Fall protection
KW - Construction safety
KW - Wearable technology
KW - Productivity
KW - Efficiency
ER -
TY - JOUR
TI - Model misspecification, Bayesian versus credibility estimation, and Gibbs posteriors
AU - Hong, Liang
AU - Martin, Ryan
T2 - SCANDINAVIAN ACTUARIAL JOURNAL
AB - In the context of predicting future claims, a fully Bayesian analysis – one that specifies a statistical model, prior distribution, and updates using Bayes's formula – is often viewed as the gold-standard, while Bühlmann's credibility estimator serves as a simple approximation. But those desirable properties that give the Bayesian solution its elevated status depend critically on the posited model being correctly specified. Here we investigate the asymptotic behavior of Bayesian posterior distributions under a misspecified model, and our conclusion is that misspecification bias generally has damaging effects that can lead to inaccurate inference and prediction. The credibility estimator, on the other hand, is not sensitive at all to model misspecification, giving it an advantage over the Bayesian solution in those practically relevant cases where the model is uncertain. This begs the question: does robustness to model misspecification require that we abandon uncertainty quantification based on a posterior distribution? Our answer to this question is No, and we offer an alternative Gibbs posterior construction. Furthermore, we argue that this Gibbs perspective provides a new characterization of Bühlmann's credibility estimator.
DA - 2020/8/8/
PY - 2020/8/8/
DO - 10.1080/03461238.2019.1711154
VL - 2020
IS - 7
SP - 634-649
SN - 1651-2030
KW - Asymptotics
KW - Bernstein-von Mises phenomenon
KW - exponential family
KW - robustness
KW - uncertainty quantification
ER -
TY - BOOK
TI - Dynamic Treatment Regimes: Statistical Methods for Precision Medicine
AU - Tsiatis, A.A.
AU - Davidian, M.
AU - Laber, E.B.
AU - Holloway, S.T.
DA - 2020///
PY - 2020///
DO - 10.1201/9780429192692
PB - Chapman & Hall/CRC Press
SN - 9781498769778
UR - https://www.taylorfrancis.com/books/mono/10.1201/9780429192692/dynamic-treatment-regimes-anastasios-tsiatis-marie-davidian-shannon-holloway-eric-labe
ER -
TY - JOUR
TI - A test of homogeneity of distributions when observations are subject to measurement errors
AU - Lee, DongHyuk
AU - Lahiri, Soumendra N.
AU - Sinha, Samiran
T2 - BIOMETRICS
AB - When the observed data are contaminated with errors, the standard two-sample testing approaches that ignore measurement errors may produce misleading results, including a higher type-I error rate than the nominal level. To tackle this inconsistency, a nonparametric test is proposed for testing equality of two distributions when the observed contaminated data follow the classical additive measurement error model. The proposed test takes into account the presence of errors in the observed data, and the test statistic is defined in terms of the (deconvoluted) characteristic functions of the latent variables. The proposed method is applicable to a wide range of scenarios as no parametric restrictions are imposed either on the distribution of the underlying latent variables or on the distribution of the measurement errors. The asymptotic null distribution of the test statistic is derived, which is given by an integral of a squared Gaussian process with a complicated covariance structure. For data-based calibration of the test, a new nonparametric Bootstrap method is developed under the two-sample measurement error framework and its validity is established. Finite sample performance of the proposed test is investigated through simulation studies, and the results show superior performance of the proposed method over the standard tests, which exhibit inconsistent behavior. Finally, the proposed method was applied to real data sets from the National Health and Nutrition Examination Survey. An R package MEtest is available through CRAN.
DA - 2020/9//
PY - 2020/9//
DO - 10.1111/biom.13207
VL - 76
IS - 3
SP - 821-833
SN - 1541-0420
KW - Bootstrap
KW - characteristic function
KW - chi-square
KW - Gaussian process
KW - power
KW - two-sample test
ER -
TY - JOUR
TI - Writing Assignments to Assess Statistical Thinking
AU - Woodard, Victoria
AU - Lee, Hollylynne
AU - Woodard, Roger
T2 - JOURNAL OF STATISTICS EDUCATION
AB - One of the main goals of statistics is to use data to provide evidence in support of an argument. This article will discuss some popular forms of writing assessments currently in use, to demonstrate the differences between the methods for structuring the students’ learning to support their arguments with evidence. We share a model, which was originally created to assess students in introductory statistics and has been adapted for the second course in statistics, which takes a unique approach toward assessing the students’ understanding of statistical concepts through writing. In this model, students are expected to answer prompts that require them to (1) take a stance on an argument, (2) defend their position with facts given in the prompt, (3) discern the implications of those facts, and (4) give a proper conclusion to their argument. We provide examples of a few of the writing assignment prompts used in the course, their intended assessment purpose, and common answers that students gave to these assignments. Supplementary materials for this article are available online.
DA - 2020/1/2/
PY - 2020/1/2/
DO - 10.1080/10691898.2019.1696257
VL - 28
IS - 1
SP - 32-44
SN - 1069-1898
KW - Argumentation
KW - Statistical thinking
KW - Second statistics course
KW - Written assessment
ER -
TY - JOUR
TI - Correctly modeling plant-insect-herbivore-pesticide interactions as aggregate data
AU - Banks, H. T.
AU - Banks, John E.
AU - Catenacci, Jared
AU - Joyner, Michele
AU - Stark, John
T2 - MATHEMATICAL BIOSCIENCES AND ENGINEERING
AB - We consider a population dynamics model in investigating data from controlled experiments with aphids in broccoli patches surrounded by different margin types (bare or weedy ground) and three levels of insecticide spray (no, light, or heavy spray). The experimental data is clearly aggregate in nature. In previous efforts [1], the aggregate nature of the data was ignored. In this paper, we embrace this aspect of the experiment and correctly model the data as aggregate data, comparing the results to the previous approach. We discuss cases in which the approach may provide similar results as well as cases in which there is a clear difference in the resulting fit to the data.
DA - 2020///
PY - 2020///
DO - 10.3934/mbe.2020091
VL - 17
IS - 2
SP - 1743-1756
SN - 1551-0018
KW - plant-insect interactions
KW - inverse problems
KW - hypothesis testing and standard errors in dynamical models
KW - aggregate data
KW - Prohorov metric
ER -
TY - JOUR
TI - Effects of Proportional Hazard Assumption on Variable Selection Methods for Censored Data
AU - Sheng, Alvin
AU - Ghosh, Sujit K.
T2 - STATISTICS IN BIOPHARMACEUTICAL RESEARCH
AB - The Cox proportional hazard (PH) model is widely used to determine the effects of risk factors and treatments (covariates) on survival time of subjects that might be right censored. The selection of covariates depends crucially on the specific form of the conditional hazard model, which is often assumed to be PH, accelerated failure time (AFT), or proportional odds (PO). However, we show that none of these semiparametric models allow for the crossing of the survival functions and hence such strong assumptions may adversely affect the selection of variables. Moreover, the most commonly used PH assumption may also be violated when there is a delayed effect of the risk factors. Taking into account all of these modeling assumptions, this study examines the effect of the PH assumption on covariate selection when the data generating model may have non-PH. In particular, variable selection under two alternative models are explored: (i) the penalized PH model (using the elastic-net penalty) and (ii) the linear spline based hazard regression model. We apply the aforementioned models to the ACTG-175 dataset and simulated datasets with survival times generated from the Weibull and log-normal distributions. We also examine the effect on covariate selection of stratifying the analysis on the off-treatment indicator.
DA - 2020/4/2/
PY - 2020/4/2/
DO - 10.1080/19466315.2019.1694578
VL - 12
IS - 2
SP - 199-209
SN - 1946-6315
UR - https://doi.org/10.1080/19466315.2019.1694578
KW - AIDS trials
KW - Crossing survival curves
KW - Hazard regression
KW - Penalized regression
ER -
TY - JOUR
TI - Doubly robust inference when combining probability and non-probability samples with high dimensional data
AU - Yang, Shu
AU - Kim, Jae Kwang
AU - Song, Rui
T2 - Journal of the Royal Statistical Society: Series B (Statistical Methodology)
AB - We consider integrating a non-probability sample with a probability sample which provides high dimensional representative covariate information of the target population. We propose a two-step approach for variable selection and finite population inference. In the first step, we use penalized estimating equations with folded concave penalties to select important variables and show selection consistency for general samples. In the second step, we focus on a doubly robust estimator of the finite population mean and re-estimate the nuisance model parameters by minimizing the asymptotic squared bias of the doubly robust estimator. This estimating strategy mitigates the possible first-step selection error and renders the doubly robust estimator root-n consistent if either the sampling probability or the outcome model is correctly specified.
DA - 2020/1/7/
PY - 2020/1/7/
DO - 10.1111/rssb.12354
VL - 1
J2 - J. R. Stat. Soc. B
LA - en
OP -
SN - 1369-7412
UR - http://dx.doi.org/10.1111/rssb.12354
DB - Crossref
KW - Data integration
KW - Double robustness
KW - Generalizability
KW - Penalized estimating equation
KW - Variable selection
ER -
TY - JOUR
TI - Inference in partially identified models with many moment inequalities using Lasso
AU - Bugni, Federico A.
AU - Caner, Mehmet
AU - Kock, Anders Bredahl
AU - Lahiri, Soumendra
T2 - JOURNAL OF STATISTICAL PLANNING AND INFERENCE
AB - This paper considers inference in a partially identified moment (in)equality model with many moment inequalities. We propose a novel two-step inference procedure that combines the methods proposed by Chernozhukov et al. (2018a) with a first-step moment inequality selection based on the Lasso. Our method controls asymptotic size uniformly, both in the underlying parameter and the data distribution. Also, the power of our method compares favorably with that of the corresponding two-step method in Chernozhukov et al. (2018a) for large parts of the parameter space, both in theory and in simulations. Finally, we show that our Lasso-based first step can be implemented by thresholding standardized sample averages, and so it is straightforward to implement.
DA - 2020/5//
PY - 2020/5//
DO - 10.1016/j.jspi.2019.09.013
VL - 206
SP - 211-248
SN - 1873-1171
KW - Many moment inequalities
KW - Self-normalizing sum
KW - Multiplier bootstrap
KW - Empirical bootstrap
KW - Lasso
KW - Inequality selection
ER -
TY - JOUR
TI - Asymptotic theory and inference of predictive mean matching imputation using a superpopulation model framework
AU - Yang, Shu
AU - Kim, Jae Kwang
T2 - SCANDINAVIAN JOURNAL OF STATISTICS
AB - Predictive mean matching imputation is popular for handling item nonresponse in survey sampling. In this article, we study the asymptotic properties of the predictive mean matching estimator for finite‐population inference using a superpopulation model framework. We also clarify conditions for its robustness. For variance estimation, the conventional bootstrap inference is invalid for matching estimators with a fixed number of matches due to the nonsmooth nature of the matching estimator. We propose a new replication variance estimator, which is asymptotically valid. The key strategy is to construct replicates directly based on the linear terms of the martingale representation for the matching estimator, instead of individual records of variables. Simulation studies confirm that the proposed method provides valid inference.
DA - 2020/9//
PY - 2020/9//
DO - 10.1111/sjos.12429
VL - 47
IS - 3
SP - 839-861
SN - 1467-9469
KW - hot deck imputation
KW - Jackknife variance estimation
KW - martingale central limit theorem
KW - missing at random
ER -
TY - JOUR
TI - Empirical Priors and Coverage of Posterior Credible Sets in a Sparse Normal Mean Model
AU - Martin, Ryan
AU - Ning, Bo
T2 - SANKHYA-SERIES A-MATHEMATICAL STATISTICS AND PROBABILITY
AB - Bayesian methods provide a natural means for uncertainty quantification, that is, credible sets can be easily obtained from the posterior distribution. But is this uncertainty quantification valid in the sense that the posterior credible sets attain the nominal frequentist coverage probability? This paper investigates the frequentist validity of posterior uncertainty quantification based on a class of empirical priors in the sparse normal mean model. In particular, we show that our marginal posterior credible intervals achieve the nominal frequentist coverage probability under conditions slightly weaker than needed for selection consistency and a Bernstein–von Mises theorem for the full posterior, and numerical investigations suggest that our empirical Bayes method has superior frequentist coverage probability properties compared to other fully Bayes methods.
DA - 2020/8//
PY - 2020/8//
DO - 10.1007/s13171-019-00189-w
VL - 82
IS - 2
SP - 477-498
SN - 0976-8378
KW - Bayesian inference
KW - Bernstein-von Mises theorem
KW - Concentration rate
KW - High-dimensional model
KW - Uncertainty quantification
ER -
TY - JOUR
TI - tuxnet: a simple interface to process RNA sequencing data and infer gene regulatory networks
AU - Spurney, Ryan J.
AU - Broeck, Lisa
AU - Clark, Natalie M.
AU - Fisher, Adam P.
AU - Balaguer, Maria A. de Luis
AU - Sozzani, Rosangela
T2 - PLANT JOURNAL
AB - Predicting gene regulatory networks (GRNs) from expression profiles is a common approach for identifying important biological regulators. Despite the increased use of inference methods, existing computational approaches often do not integrate RNA‐sequencing data analysis, are not automated or are restricted to users with bioinformatics backgrounds. To address these limitations, we developed tuxnet, a user‐friendly platform that can process raw RNA‐sequencing data from any organism with an existing reference genome using a modified tuxedo pipeline (hisat2 + cufflinks package) and infer GRNs from these processed data. tuxnet is implemented as a graphical user interface and can mine gene regulations, either by applying a dynamic Bayesian network (DBN) inference algorithm, genist, or a regression tree‐based pipeline, rtp‐star. We obtained time‐course expression data of a PERIANTHIA (PAN) inducible line and inferred a GRN using genist to illustrate the use of tuxnet while gaining insight into the regulations downstream of the Arabidopsis root stem cell regulator PAN. Using rtp‐star, we inferred the network of ATHB13, a downstream gene of PAN, for which we obtained wild‐type and mutant expression profiles. Additionally, we generated two networks using temporal data from developmental leaf data and spatial data from root cell‐type data to highlight the use of tuxnet to form new testable hypotheses from previously explored data. Our case studies feature the versatility of tuxnet when using different types of gene expression data to infer networks and its accessibility as a pipeline for non‐bioinformaticians to analyze transcriptome data, predict causal regulations, assess network topology and identify key regulators.
DA - 2020/2//
PY - 2020/2//
DO - 10.1111/tpj.14558
VL - 101
IS - 3
SP - 716-730
SN - 1365-313X
KW - Arabidopsis thaliana
KW - gene regulatory network inference
KW - graphical user interface
KW - RNA sequencing processing
KW - stem cell maintenance
KW - technical advance
ER -
TY - JOUR
TI - 3D Printing of Textiles: Potential Roadmap to Printing with Fibers
AU - Chatterjee, Kony
AU - Ghosh, Tushar K.
T2 - Advanced Materials
AB - 3D printing (3DP) has transformed engineering, manufacturing, and the use of advanced materials due to its ability to produce objects from a variety of materials, ranging from soft polymers to rigid ceramics. 3DP offers the advantage of being able to print at a variety of length scales, from a few micrometers to many meters. 3DP has the unique ability to produce customized small lots efficiently. Yet, one crucial industry that has not been able to adequately explore its potential is textile manufacturing. The research in 3DP of textiles has lagged behind other areas primarily due to the difficulty in obtaining some of the unique characteristics of strength, flexibility, etc., of textiles, utilizing a fundamentally different manufacturing technology. Textiles are their own class of materials due to the specific structural developments that occur during the various stages of textile manufacturing: from fiber extrusion to assembly of the fibers to fabrics. Here, the current 3DP technologies are reviewed with emphasis on soft and anisotropic structures, as well as the efforts toward 3DP of textiles. Finally, a potential pathway to 3DP of textiles, dubbed printing with fibers to create textile structures, is proposed for further exploration.
DA - 2020/1//
PY - 2020/1//
DO - 10.1002/adma.201902086
VL - 32
SP - 1902086
UR - https://doi.org/10.1002/adma.201902086
KW - 3D printing textiles
KW - additive manufacturing
KW - printing with fibers
KW - soft materials
ER -
TY - JOUR
TI - Solution paths for the generalized lasso with applications to spatially varying coefficients regression
AU - Zhao, Yaqing
AU - Bondell, Howard
T2 - COMPUTATIONAL STATISTICS & DATA ANALYSIS
AB - Penalized regression can improve prediction accuracy and reduce dimension. The generalized lasso problem is used in many applications in various fields. The generalized lasso penalizes a linear transformation of the coefficients rather than the coefficients themselves. The proposed algorithm solves the generalized lasso problem and provides the full solution path. A confidence set can then be constructed on the generalized lasso parameters based on the modified residual bootstrap lasso. The approach is demonstrated using spatially varying coefficients regression, and it is shown to be both accurate and efficient compared to previous work.
DA - 2020/2//
PY - 2020/2//
DO - 10.1016/j.csda.2019.106821
VL - 142
SP -
SN - 1872-7352
KW - Generalized lasso
KW - Penalized regression
KW - Regularization
KW - Solution path algorithm
ER -
TY - JOUR
TI - A multivariate spatial skew-t process for joint modeling of extreme precipitation indexes
AU - Hazra, Arnab
AU - Reich, Brian J.
AU - Staicu, Ana-Maria
T2 - ENVIRONMETRICS
AB - To study trends in extreme precipitation across the United States over the years 1951–2017, we analyze 10 climate indexes that represent extreme precipitation, such as annual maximum of daily precipitation and annual maximum of consecutive five‐day average precipitation. We consider the gridded data produced by the CLIMDEX project (http://www.climdex.org/gewocs.html), constructed using daily precipitation data. These indexes exhibit spatial and mutual dependence. In this paper, we propose a multivariate spatial skew‐t process for joint modeling of extreme precipitation indexes and discuss its theoretical properties. The model framework allows Bayesian inference while maintaining a computational time that is competitive with common multivariate geostatistical approaches. In a numerical study, we find that the proposed model outperforms several simpler alternatives in terms of various model selection criteria. We apply the proposed model to estimate the average decadal change in the extreme precipitation indexes throughout the United States and find several significant local changes.
DA - 2020/5//
PY - 2020/5//
DO - 10.1002/env.2602
VL - 31
IS - 3
SP -
SN - 1099-095X
KW - climate change
KW - extremal dependence
KW - extremal trend analysis
KW - extreme precipitation indexes
KW - multivariate spatial skew-t process
KW - separable covariance
ER -
TY - JOUR
TI - Fine-Scale Spatiotemporal Air Pollution Analysis Using Mobile Monitors on Google Street View Vehicles
AU - Guan, Yawen
AU - Johnson, Margaret C.
AU - Katzfuss, Matthias
AU - Mannshardt, Elizabeth
AU - Messier, Kyle P.
AU - Reich, Brian J.
AU - Song, Joon J.
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - People are increasingly concerned with understanding their personal environment, including possible exposure to harmful air pollutants. To make informed decisions on their day-to-day activities, they are interested in real-time information on a localized scale. Publicly available, fine-scale, high-quality air pollution measurements acquired using mobile monitors represent a paradigm shift in measurement technologies. A methodological framework utilizing these increasingly fine-scale measurements to provide real-time air pollution maps and short-term air quality forecasts on a fine-resolution spatial scale could prove to be instrumental in increasing public awareness and understanding. The Google Street View study provides a unique source of data with spatial and temporal complexities, with the potential to provide information about commuter exposure and hot spots within city streets with high traffic. We develop a computationally efficient spatiotemporal model for these data and use the model to make short-term forecasts and high-resolution maps of current air pollution levels. We also show via an experiment that mobile networks can provide more nuanced information than an equally sized fixed-location network. This modeling framework has important real-world implications in understanding citizens’ personal environments, as data production and real-time availability continue to be driven by the ongoing development and improvement of mobile measurement technologies. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
DA - 2020/7/2/
PY - 2020/7/2/
DO - 10.1080/01621459.2019.1665526
VL - 115
IS - 531
SP - 1111-1124
SN - 1537-274X
KW - Google Street View Air Quality Data
KW - Kriging
KW - Mobile sensors
KW - Spatiotemporal models
KW - Vecchia approximation
ER -
TY - JOUR
TI - Bayesian Nonparametric Policy Search With Application to Periodontal Recall Intervals
AU - Guan, Qian
AU - Reich, Brian J.
AU - Laber, Eric B.
AU - Bandyopadhyay, Dipankar
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - Tooth loss from periodontal disease is a major public health burden in the United States. Standard clinical practice is to recommend a dental visit every six months; however, this practice is not evidence-based, and poor dental outcomes and increasing dental insurance premiums indicate room for improvement. We consider a tailored approach that recommends recall time based on patient characteristics and medical history to minimize disease progression without increasing resource expenditures. We formalize this method as a dynamic treatment regime which comprises a sequence of decisions, one per stage of intervention, that follow a decision rule which maps current patient information to a recommendation for their next visit time. The dynamics of periodontal health, visit frequency, and patient compliance are complex, yet the estimated optimal regime must be interpretable to domain experts if it is to be integrated into clinical practice. We combine nonparametric Bayesian dynamics modeling with policy-search algorithms to estimate the optimal dynamic treatment regime within an interpretable class of regimes. Both simulation experiments and application to a rich database of electronic dental records from the HealthPartners HMO show that our proposed method leads to better dental health without increasing the average recommended recall time relative to competing methods. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
DA - 2020/7/2/
PY - 2020/7/2/
DO - 10.1080/01621459.2019.1660169
VL - 115
IS - 531
SP - 1066-1078
SN - 1537-274X
KW - Dirichlet process prior
KW - Dynamic treatment regimes
KW - Observational data
KW - Periodontal disease
KW - Practice-based setting
KW - Precision medicine
KW - Sequential optimization
ER -
TY - JOUR
TI - Construction, Properties, and Analysis of Group-Orthogonal Supersaturated Designs
AU - Jones, Bradley
AU - Lekivetz, Ryan
AU - Majumdar, Dibyen
AU - Nachtsheim, Christopher J.
AU - Stallrich, Jonathan W.
T2 - TECHNOMETRICS
AB - In this article, we propose a new method for constructing supersaturated designs that is based on the Kronecker product of two carefully chosen matrices. The construction method leads to a partitioning of the factors of the design such that the factors within a group are correlated to the others within the same group, but are orthogonal to any factor in any other group. We refer to the resulting designs as group-orthogonal supersaturated designs. We leverage this group structure to obtain an unbiased estimate of the error variance, and to develop an effective, design-based model selection procedure. Simulation results show that the use of these designs, in conjunction with our model selection procedure enables the identification of larger numbers of active main effects than have previously been reported for supersaturated designs. The designs can also be used in group screening; however, unlike previous group-screening procedures, with our designs, main effects in a group are not confounded. Supplementary materials for this article are available online.
DA - 2020/7/2/
PY - 2020/7/2/
DO - 10.1080/00401706.2019.1654926
VL - 62
IS - 3
SP - 403-414
SN - 1537-2723
KW - E(s^2)-optimality
KW - UE(s^2)-optimality
KW - Group screening designs
KW - Hadamard matrices
KW - Model selection
ER -
TY - JOUR
TI - Detection, variability, and predictability of monsoon onset and withdrawal dates: A review
AU - Bombardi, Rodrigo J.
AU - Moron, Vincent
AU - Goodnight, James S.
T2 - INTERNATIONAL JOURNAL OF CLIMATOLOGY
AB - This article presents a review of the scientific literature on detection, sources of variability, and predictability of the timing of monsoons. The timing of monsoons is characterized by the beginning (commonly referred to as onset) and end (commonly referred to as demise, cessation, retreat, or withdrawal) dates of the summer monsoons. The main methods used to detect the timing of monsoons are divided into two categories: local‐scale methods and regional to large‐scale methods. The sources of variability of the timing of monsoons are also separated into two categories: local‐scale and large‐scale sources. Finally, the article presents a summary of the literature on the predictability of the timing of monsoons using both dynamical and statistical approaches. We show that all methods are parameterized in some way. A comparison between two different methods shows that while there might be large differences in the definition of onset and demise dates at the local level, spatial aggregation usually reduces the noise and enhances the regional monsoonal signal, which may be predictable.
DA - 2020/2//
PY - 2020/2//
DO - 10.1002/joc.6264
VL - 40
IS - 2
SP - 641-667
SN - 1097-0088
KW - demise
KW - monsoon
KW - onset
KW - predictability
KW - timing
KW - variability
ER -
TY - JOUR
TI - Modelling the effects of field spatial scale and natural enemy colonization behaviour on pest suppression in diversified agroecosystems
AU - Banks, John E.
AU - Laubmeier, Amanda N.
AU - Banks, H. Thomas
T2 - AGRICULTURAL AND FOREST ENTOMOLOGY
AB - Diversifying agroecosystems by establishing or retaining natural vegetation in and around crop areas has long been recognized as a potentially effective means of bolstering pest control as a result of attracting more numerous and diverse natural enemies, although outcomes are inconsistent across species. Little is known about the underlying mechanisms driving such differences in species responses, creating challenges for determining how best to manage landscapes for maximizing environmental services such as biological control. The present study addresses gaps in our understanding of the link between noncrop vegetation in field margins and pest suppression by using a system of partial differential equations to model population‐level predator–prey interactions, as well as spatial processes, aiming to capture the dynamics of crop plants, herbivores and two generalist predators. We focus on differences in how two predators (a carabid and a ladybird beetle) colonize crop fields where they forage for prey, examining differences in how they move into the fields from adjacent vegetation as a potential driver of differences in overall pest suppression. The results obtained demonstrate that predator colonization behaviour and spatial scale are important factors with respect to determining the effectiveness of biological control.
DA - 2020/2//
PY - 2020/2//
DO - 10.1111/afe.12354
VL - 22
IS - 1
SP - 30-40
SN - 1461-9563
KW - Beetle
KW - differential equation
KW - diffusion
KW - dispersal
KW - habitat heterogeneity
ER -
TY - JOUR
TI - Nonparametric Estimation of Multivariate Mixtures
AU - Zheng, Chaowen
AU - Wu, Yichao
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - A multivariate mixture model is determined by three elements: the number of components, the mixing proportions, and the component distributions. Assuming that the number of components is given and ...
DA - 2020/7/2/
PY - 2020/7/2/
DO - 10.1080/01621459.2019.1635481
VL - 115
IS - 531
SP - 1456-1471
SN - 1537-274X
KW - Density estimation
KW - Nonparametric mixture model
KW - Tensor
ER -
TY - JOUR
TI - A Sparse Random Projection-Based Test for Overall Qualitative Treatment Effects
AU - Shi, Chengchun
AU - Lu, Wenbin
AU - Song, Rui
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - In contrast to the classical “one-size-fits-all” approach, precision medicine proposes the customization of individualized treatment regimes to account for patients’ heterogeneity in response to treatments. Most existing work in the literature has focused on estimating optimal individualized treatment regimes. However, there has been less attention devoted to hypothesis testing regarding the existence of overall qualitative treatment effects, especially when there are a large number of prognostic covariates. When covariates do not have qualitative treatment effects, the optimal treatment regime will assign the same treatment to all patients regardless of their covariate values. In this article, we consider testing the overall qualitative treatment effects of patients’ prognostic covariates in a high-dimensional setting. We propose a sample splitting method to construct the test statistic, based on a nonparametric estimator of the contrast function. When the dimension of covariates is large, we construct the test based on sparse random projections of covariates into a low-dimensional space. We prove the consistency of our test statistic. In the regular cases, we show the asymptotic power function of our test statistic is asymptotically the same as the “oracle” test statistic which is constructed based on the “optimal” projection matrix. Simulation studies and real data applications validate our theoretical findings. Supplementary materials for this article are available online.
DA - 2020/7/2/
PY - 2020/7/2/
DO - 10.1080/01621459.2019.1604368
VL - 115
IS - 531
SP - 1201-1213
SN - 1537-274X
KW - High-dimensional testing
KW - Optimal treatment regime
KW - Precision medicine
KW - Qualitative treatment effects
KW - Sparse random projection
ER -
TY - JOUR
TI - MIMIX: A Bayesian Mixed-Effects Model for Microbiome Data From Designed Experiments
AU - Grantham, Neal S.
AU - Guan, Yawen
AU - Reich, Brian J.
AU - Borer, Elizabeth T.
AU - Gross, Kevin
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - Recent advances in bioinformatics have made high-throughput microbiome data widely available, and new statistical tools are required to maximize the information gained from these data. For example, analysis of high-dimensional microbiome data from designed experiments remains an open area in microbiome research. Contemporary analyses work on metrics that summarize collective properties of the microbiome, but such reductions preclude inference on the fine-scale effects of environmental stimuli on individual microbial taxa. Other approaches model the proportions or counts of individual taxa as response variables in mixed models, but these methods fail to account for complex correlation patterns among microbial communities. In this article, we propose a novel Bayesian mixed-effects model that exploits cross-taxa correlations within the microbiome, a model we call microbiome mixed model (MIMIX). MIMIX offers global tests for treatment effects, local tests and estimation of treatment effects on individual taxa, quantification of the relative contribution from heterogeneous sources to microbiome variability, and identification of latent ecological subcommunities in the microbiome. MIMIX is tailored to large microbiome experiments using a combination of Bayesian factor analysis to efficiently represent dependence between taxa and Bayesian variable selection methods to achieve sparsity. We demonstrate the model using a simulation experiment and on a 2 × 2 factorial experiment of the effects of nutrient supplement and herbivore exclusion on the foliar fungal microbiome of Andropogon gerardii, a perennial bunchgrass, as part of the global Nutrient Network research initiative. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
DA - 2020/4/2/
PY - 2020/4/2/
DO - 10.1080/01621459.2019.1626242
VL - 115
IS - 530
SP - 599-609
SN - 1537-274X
KW - Continuous shrinkage prior
KW - Factor analysis
KW - Microbiome
KW - Mixed model
KW - Nutrient Network
KW - OTU abundance data
ER -
TY - JOUR
TI - Testing and Estimation of Social Network Dependence With Time to Event Data
AU - Su, Lin
AU - Lu, Wenbin
AU - Song, Rui
AU - Huang, Danyang
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - Lin Su (a), Wenbin Lu (a, *), Rui Song (a) & Danyang Huang (b). (a) Department of Statistics, North Carolina State University, Raleigh, NC; (b) School of Statistics, Renmin University, Beijing, China
DA - 2020/4/2/
PY - 2020/4/2/
DO - 10.1080/01621459.2019.1617153
VL - 115
IS - 530
SP - 570-582
SN - 1537-274X
KW - Cox model
KW - EM algorithm
KW - Social network dependence
KW - Time-to-event data
ER -
TY - JOUR
TI - Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning
AU - Luckett, Daniel J.
AU - Laber, Eric B.
AU - Kahkoska, Anna R.
AU - Maahs, David M.
AU - Mayer-Davis, Elizabeth
AU - Kosorok, Michael R.
T2 - JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
AB - The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best possible health-care for each patient. Mobile technologies have an important role to play in this vision as they offer a means to monitor a patient's health status in real-time and subsequently to deliver interventions if, when, and in the dose that they are needed. Dynamic treatment regimes formalize individualized treatment plans as sequences of decision rules, one per stage of clinical intervention, that map current patient information to a recommended treatment. However, most existing methods for estimating optimal dynamic treatment regimes are designed for a small number of fixed decision points occurring on a coarse time-scale. We propose a new reinforcement learning method for estimating an optimal treatment regime that is applicable to data collected using mobile technologies in an out-patient setting. The proposed method accommodates an indefinite time horizon and minute-by-minute decision making that are common in mobile health applications. We show that the proposed estimators are consistent and asymptotically normal under mild conditions. The proposed methods are applied to estimate an optimal dynamic treatment regime for controlling blood glucose levels in patients with type 1 diabetes.
DA - 2020/4/2/
PY - 2020/4/2/
DO - 10.1080/01621459.2018.1537919
VL - 115
IS - 530
SP - 692-706
SN - 1537-274X
KW - Markov decision processes
KW - Precision medicine
KW - Reinforcement learning
KW - Type 1 diabetes
ER -