1. A Penalized Two-pass Regression to Predict Stock Returns with Time-Varying Risk Premia
    Gaetan Bakalli, Stéphane Guerrier and Olivier Scaillet.

    Journal of Econometrics, major revision invited.

    We develop a penalized two-pass regression with time-varying factor loadings. The penalization in the first pass enforces sparsity for the time-variation drivers while maintaining compatibility with the no-arbitrage restrictions by regularizing appropriate groups of coefficients. The second pass delivers risk premia estimates to predict equity excess returns. Our Monte Carlo results, and our empirical results on a large cross-sectional data set of US individual stocks, show that penalization without grouping can lead to nearly all estimated time-varying models violating the no-arbitrage restrictions. Moreover, our results demonstrate that the proposed method reduces prediction errors compared to a penalized approach without appropriate grouping or a time-invariant factor model.
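    The grouped penalization at the core of the first pass can be illustrated with the proximal operator of the group-lasso penalty, which shrinks or zeroes out a whole block of coefficients at once. This is a generic textbook sketch, not the paper's estimator; the grouping shown is hypothetical:

```python
import numpy as np

def group_soft_threshold(beta, groups, lam):
    """Proximal operator of the group-lasso penalty: shrink each group of
    coefficients toward zero jointly, so a whole group (e.g. all the
    time-variation drivers of one loading) is kept or dropped together."""
    out = np.array(beta, dtype=float)
    for g in groups:
        norm = np.linalg.norm(out[g])
        # the whole group is zeroed if its norm falls below the threshold
        out[g] = 0.0 if norm <= lam else out[g] * (1.0 - lam / norm)
    return out

# illustrative: coefficients 0-1 form one group, coefficient 2 its own group
shrunk = group_soft_threshold([3.0, 4.0, 0.1], [[0, 1], [2]], lam=1.0)
```

    Here the small third group is dropped entirely while the first group is only shrunk, which is the mechanism that lets a grouped penalty respect block-level restrictions.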
  2. Prevalence Estimation from Random Samples and Census Data with Participation Bias
    Stéphane Guerrier, Christoph Kuzmics and Maria-Pia Victoria-Feser.

    Annals of Applied Statistics, major revision invited.

    Countries officially record the number of COVID-19 cases based on medical tests of a subset of the population with unknown participation bias. For prevalence estimation, the official information is typically discarded and, instead, small random survey samples are taken. We derive (maximum likelihood and method of moments) prevalence estimators, based on a survey sample, that additionally utilize the official information and that are substantially more accurate than the simple sample proportion of positive cases. Put differently, using our estimators, the same level of precision can be obtained with substantially smaller survey samples. We take into account the possibility of measurement errors due to the sensitivity and specificity of the medical testing procedure. The proposed estimators and associated confidence intervals are implemented in the companion open-source R package cape.
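    The measurement-error adjustment mentioned above can be illustrated with the standard Rogan-Gladen correction, which maps an observed positive proportion to a prevalence estimate given the test's sensitivity and specificity. This is a generic textbook sketch for intuition, not the estimators of the paper or of the cape package:

```python
def adjusted_prevalence(p_obs, sensitivity, specificity):
    """Correct an observed positive proportion for imperfect testing
    (the standard Rogan-Gladen adjustment), clamped to [0, 1]."""
    p = (p_obs + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(max(p, 0.0), 1.0)

# example: 6% observed positives, 90% sensitivity, 98% specificity
p_hat = adjusted_prevalence(0.06, 0.90, 0.98)   # ≈ 0.0455
```

    With a highly specific test the correction is small; as specificity drops toward 1 minus the true prevalence, the raw proportion becomes badly biased upward.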
  3. Multi-Signal Approaches for Repeated Sampling Schemes in Inertial Sensor Calibration

    IEEE Transactions on Signal Processing, major revision submitted.

    Inertial sensor calibration plays an increasingly important role in many areas of research, including navigation engineering. By performing this task accurately, it is possible to significantly increase general navigation performance by correctly filtering out the deterministic and stochastic measurement errors that characterize such devices. While different techniques are available to model and remove the deterministic errors, there has been considerable research in recent years on modelling the stochastic errors, which have complex structures. To do so, different replicates of these error signals are collected and a model is identified and estimated based on one of these replicates. While this procedure has improved navigation performance, it does not take advantage of the information coming from all the other replicates collected on the same sensor. Indeed, it has been observed that there is often a change of error behaviour between replicates, which can be explained, for example, by the different (constant) external conditions under which each replicate was taken. Whatever the reason for the difference, the model structure appears to remain the same between replicates while the parameter values vary. In this work we therefore consider and study the properties of different approaches that combine the information from all replicates while accounting for this phenomenon, confirming their validity both in simulation settings and when applied to real inertial sensor error signals. By taking into account parameter variation between replicates, this work highlights how these approaches can improve average navigation precision as well as deliver reliable estimates of the uncertainty of the navigation solution.
  4. Scale-wise Variance Minimization for Optimal Virtual Signals
    Yuming Zhang, Davide A. Cucci, Roberto Molinari and Stéphane Guerrier.

    IEEE Transactions on Signal Processing, major revision submitted.

    The increased use of low-cost gyroscopes within inertial sensors for navigation purposes, among others, has led to a considerable amount of research on improving their measurement precision. Aside from developing methods to model and account for the deterministic and stochastic components that contribute to the measurement errors of these devices, an approach that has been put forward in recent years is to use arrays of such sensors and combine their measurements, thereby reducing the impact of individual sensor noise. Nevertheless, combining these measurements is not straightforward given the complex stochastic nature of these errors and, although some solutions have been suggested, they are limited to specific settings and do not extend to more general and common circumstances. Hence, in this work we put forward a non-parametric method that makes use of the wavelet cross-covariance at different scales to combine the measurements coming from an array of gyroscopes and deliver an optimal measurement signal, without requiring any assumption on the processes underlying the individual error signals. We also study an appropriate non-parametric approach for the estimation of the asymptotic covariance matrix of the wavelet cross-covariance estimator, which has important applications beyond the scope of this work. The theoretical properties of the proposed approach are studied and supported by simulations and real applications, indicating that this method represents an appropriate and general tool for the construction of optimal virtual signals that are particularly relevant for arrays of gyroscopes. Moreover, our results can support the creation of optimal signals for inertial sensors other than gyroscopes, as well as for redundant measurements in domains other than navigation.
  5. Chameleon MicroRNAs in Breast Cancer: the Elusive Role as Regulatory Factors in Cancer Progression
    Cesare Miglioli, Gaetan Bakalli, Samuel Orso, Mucyo Karemera, Roberto Molinari, Stéphane Guerrier and Nabil Mili.

    Scientific Reports, major revision invited.

    Breast cancer is one of the most frequent cancers affecting women. Non-coding micro RNAs (miRNAs) seem to play an important role in the regulation of pathways involved in tumor occurrence and progression. Extending the research of Haakensen et al., where significant miRNAs were selected as being associated with the progression from normal breast tissue to breast cancer, in this work we put forward 112 sets of miRNA combinations, each including at most 5 expressions, with high accuracy in discriminating healthy breast tissue from breast carcinoma. Our results are based on a recently developed machine learning technique which, instead of selecting a single model (or combination of features), delivers a set of models with equivalent predictive capabilities that allow one to interpret and visualize the interaction of these features. These results shed new light on the biological action of the selected miRNAs, which can behave in different ways according to the miRNA network with which they interact. Indeed, these revealed connections may help explain why, in some cases, different studies attribute opposite functions to the same miRNA. It is therefore possible to understand how the role of a genomic variable may change when considered in interaction with other sets of variables, as opposed to only evaluating its effect within a unique combination of features. The approach proposed in this work provides a statistical basis for the notion of chameleon miRNAs and is inspired by the emerging field of systems biology.
  6. Robust Two-step Wavelet-based Inference for Time Series Models
    Stéphane Guerrier, Roberto Molinari, Maria-Pia Victoria-Feser and Haotian Xu.

    Journal of the American Statistical Association (Theory & Methods), in press.

    Latent time series models such as (the independent sum of) ARMA(p, q) models with additional stochastic processes are increasingly used for data analysis in biology, ecology, engineering, and economics. Inference on and/or prediction from these models can be highly challenging: (i) the data may contain outliers that can adversely affect the estimation procedure; (ii) the computational complexity can become prohibitive when the time series are extremely large; (iii) model selection adds another layer of (computational) complexity; and (iv) solutions that address (i), (ii), and (iii) simultaneously do not exist in practice. This paper aims to address these challenges jointly by proposing a general framework for robust two-step estimation based on a bounded-influence M-estimator of the wavelet variance. We first develop the conditions for the joint asymptotic normality of the latter estimator, thereby providing the necessary tools to perform (direct) inference for scale-based analysis of signals. Taking advantage of the model-independent weights of this first-step estimator, we then develop the asymptotic properties of two-step robust estimators using the framework of the generalized method of wavelet moments (GMWM). Simulation studies illustrate the good finite-sample performance of the robust GMWM estimator, and applied examples highlight the practical relevance of the proposed approach.
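    As a point of reference for the first-step estimator, the classical (non-robust) Haar wavelet variance can be computed in a few lines. The paper's contribution is to replace the sample variance of the wavelet coefficients with a bounded-influence M-estimator, which this sketch deliberately does not do:

```python
import numpy as np

def haar_wavelet_variance(x, max_scale):
    """Classical (non-robust) Haar wavelet variance at dyadic scales
    2, 4, ..., 2**max_scale.  For white noise with variance s2, the
    theoretical value at scale tau is s2 / tau."""
    x = np.asarray(x, dtype=float)
    c = np.cumsum(np.concatenate(([0.0], x)))   # prefix sums for block means
    nu2 = {}
    for j in range(1, max_scale + 1):
        h = 2 ** (j - 1)                        # half-window length
        means = (c[h:] - c[:-h]) / h            # rolling means of length h
        w = (means[h:] - means[:-h]) / 2.0      # Haar coefficients, scale 2**j
        nu2[2 ** j] = w.var()
    return nu2

rng = np.random.default_rng(0)
nu2 = haar_wavelet_variance(rng.normal(size=2**13), max_scale=3)
# nu2[2], nu2[4], nu2[8] should be close to 0.5, 0.25, 0.125
```

    Plotting these values against scale (log-log) is the scale-based signal analysis the abstract refers to; a robust version replaces `w.var()` with a bounded-influence scale estimate.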
  7. Granger-Causal Testing for Irregularly Sampled Time Series with Application to Nitrogen Signaling in Arabidopsis
    Sachin Heerah, Roberto Molinari, Stéphane Guerrier and Amy Marshall-Colon.

    Bioinformatics, 37(16), p.2450-2460.

    Identification of system-wide causal relationships can contribute to our understanding of long-distance, intercellular signalling in biological organisms. Dynamic transcriptome analysis holds great potential to uncover coordinated biological processes between organs. However, many existing dynamic transcriptome studies are characterized by sparse and often unevenly spaced time points that make the identification of causal relationships across organs analytically challenging. Application of existing statistical models, designed for regular time series with abundant time points, to sparse data may fail to reveal biologically significant, causal relationships. With increasing research interest in biological time series data, there is a need for new statistical methods that are able to determine causality within and between time series data sets. Here, a statistical framework was developed to identify (Granger) causal gene-gene relationships of unevenly spaced, multivariate time series data from two different tissues of Arabidopsis thaliana in response to a nitrogen signal.
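    For intuition, the classical Granger test for regularly spaced series compares a restricted autoregression of y on its own lags with an unrestricted one that also includes lags of x. The paper's contribution is to extend this logic to unevenly spaced data, which the textbook sketch below does not handle; all names are illustrative:

```python
import numpy as np

def granger_f_test(x, y, p=2):
    """Classical Granger F-statistic (regularly spaced data): does adding
    p lags of x improve the AR(p) prediction of y?"""
    n = len(y)
    Y = y[p:]
    lags_y = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])
    lags_x = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    ones = np.ones((n - p, 1))
    X_r = np.hstack([ones, lags_y])            # restricted: own lags only
    X_u = np.hstack([ones, lags_y, lags_x])    # unrestricted: + lags of x
    rss = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    rss_r, rss_u = rss(X_r), rss(X_u)
    df2 = n - p - X_u.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df2)

# simulate y genuinely driven by lagged x: the F-statistic should be large
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.5 * rng.normal()
F = granger_f_test(x, y, p=2)
```

    The F-statistic is compared against an F(p, df2) distribution; the irregular-sampling framework of the paper replaces the fixed-lag design matrices with ones adapted to the actual sampling times.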
  8. A Two-sample Nonparametric Test for Circular Data – its Exact Distribution and Performance
    S Rao Jammalamadaka, Stéphane Guerrier and Vasudevan Mangalam.

    Sankhya B, 83, p.140-166.

    A nonparametric test labelled the ‘Rao spacing-frequencies test’ is explored and developed for testing whether two circular samples come from the same population. Its exact distribution and its performance relative to comparable tests, such as the Wheeler-Watson test and the Dixon test, in small samples are discussed. Although this test statistic is shown to be asymptotically normal, as one would expect, this large-sample distribution does not provide satisfactory approximations for small to moderate samples. Exact critical values for small samples are obtained using combinatorial techniques and tabulated here, and the asymptotic critical regions are assessed against them. For moderate sample sizes in between, i.e., when the samples are too large for combinatorial techniques to remain computationally feasible yet too small for the asymptotic regions to provide a good approximation, we provide a simple Monte Carlo procedure that gives very accurate critical values. As is well known, most of the usual rank-based tests are not applicable in the context of circular data since the values of such ranks depend on the arbitrary choice of origin and the sense of rotation used (clockwise or anti-clockwise). Tests that are invariant under the group of rotations depend on the data through the so-called ‘spacing frequencies’, i.e., the frequencies of one sample that fall between the spacings (or gaps) made by the other. The Wheeler-Watson, Dixon, and the proposed Rao tests are of this form and are explicitly useful for circular data, but they have the added advantage of being valid and useful for comparing any two samples on the real line. Our study and simulations establish the ‘Rao spacing-frequencies test’ as a desirable, indeed preferable, test in a wide variety of contexts for comparing two circular samples, and as a viable competitor even for data on the real line. Computational help for implementing any of these tests is made available online in the “TwoCircles” R package, which accompanies this paper.
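    The ‘spacing frequencies’ on which these tests are built are simple to compute: sort one sample around the circle and count how many points of the other fall in each arc. The sketch below, paired with a generic permutation p-value and an illustrative statistic, conveys the idea only and is not the implementation in “TwoCircles”:

```python
import numpy as np

def spacing_frequencies(u, v):
    """Counts of points of sample v falling in each circular gap, i.e. the
    arc between consecutive order statistics of sample u; invariant to the
    choice of origin and sense of rotation.  Angles are taken mod 2*pi."""
    u = np.sort(np.mod(u, 2 * np.pi))
    idx = np.searchsorted(u, np.mod(v, 2 * np.pi), side="right")
    # idx == 0 and idx == len(u) both denote the wrap-around arc
    return np.bincount(idx % len(u), minlength=len(u))

def perm_pvalue(u, v, stat, n_rep=999, seed=0):
    """Monte Carlo p-value under the null of a common population:
    randomly re-split the pooled sample and recompute the statistic."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([u, v])
    obs = stat(spacing_frequencies(u, v))
    hits = 0
    for _ in range(n_rep):
        z = rng.permutation(pooled)
        hits += stat(spacing_frequencies(z[:len(u)], z[len(u):])) >= obs
    return (1 + hits) / (1 + n_rep)

# an illustrative statistic: absolute deviations of the gap counts from
# their common expected value under homogeneity
rao_like = lambda f: np.abs(f - f.mean()).sum()
```

    Because the frequencies depend only on the interleaving of the two samples, the same machinery applies unchanged to samples on the real line.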
  9. Polymeric Micelle Formulations for the Cutaneous Delivery of Sirolimus: A New Approach for the Treatment of Facial Angiofibromas in Tuberous Sclerosis Complex
    Julie Quartier, Maria Lapteva, Younes Boulaguiem, Stéphane Guerrier and Yogeshvar Kalia.

    International Journal of Pharmaceutics, 604, p.1-13.

    Facial angiofibromas are benign tumors characteristic of tuberous sclerosis complex. The disease involves the mTOR pathway and the cutaneous manifestation responds to topical treatment with sirolimus (SIR). However, there are no approved topical SIR products and extemporaneous formulations have been sub-optimal. The aims of this study were (i) to develop aqueous formulations of SIR loaded in polymeric micelles prepared using D-α-tocopherol polyethylene glycol 1000 succinate (TPGS) and (ii) to use the cutaneous biodistribution method, in conjunction with a new statistical approach, to investigate the feasibility of SIR delivery to the viable epidermis. Optimized micelle solutions and hydrogels (0.2%) were developed and remained stable at 4 °C for at least 6 and 3 months, respectively. Cutaneous delivery experiments (infinite and finite dose) using porcine skin demonstrated that both formulations increased SIR cutaneous bioavailability as compared to the control (ointment 0.2%). Moreover, studies with the micellar hydrogel 0.2% demonstrated SIR deposition in the viable epidermis with no transdermal permeation. These encouraging results confirmed that polymeric micelles enable the development of aqueous SIR formulations capable of targeted epidermal delivery. Furthermore, the cutaneous biodistribution method provided detailed insight into drug bioavailability in the different skin compartments that could complement/explain clinical observations of formulation efficacy.
  10. Empirical Predictive Modeling Approach to Quantifying Social Vulnerability to Natural Hazards
    Yi Wang, Paolo Gardoni, Colleen Murphy and Stéphane Guerrier.

    Annals of the American Association of Geographers, 111(5), p.1559-1583.

    Conventionally, natural hazard scholars quantify social vulnerability based on social indicators to capture the extent to which local communities are susceptible to the adverse impacts of natural hazard events and are prone to limited or delayed recoveries. However, they usually overlook the different geographical distributions of social vulnerability at different hazard intensities and in distinct response and recovery phases. In addition, conventional approaches to quantifying social vulnerability usually establish the relationship between social indicators and social vulnerability with little evidence from empirical data. In this article, we introduce a general framework of a predictive modeling approach to quantifying social vulnerability given hazard intensity during a response or recovery phase. We establish the relationship between social indicators and social vulnerability with an empirical statistical method and historical data on hazard effects. The new metric of social vulnerability given an intensity measure can be coupled with hazard maps for risk analysis to predict adverse impacts or poor recoveries associated with future natural hazard events. An example based on data on casualties, house damage, and peak ground accelerations of the 2015 Gorkha earthquake in Nepal, together with pre-event social indicators at the district level, shows that the proposed approach can be applied for vulnerability quantification and risk analysis in terms of specific hazard impacts.
  11. Non Applicability of Validated Predictive Models for Intensive Care Admission and Death of COVID-19 Patients in a Secondary Care Hospital in Belgium
    Nicolas Parisi, Aurore Janier-Dubry, Ester Ponzetto, Charalambos Pavlopoulos, Gaetan Bakalli, Roberto Molinari, Stéphane Guerrier and Nabil Mili.

    Journal of Emergency and Critical Care Medicine, 5(22), p.1-13.

    Background: Simple and reliable predictive scores for intensive care admission and death based on clinical data in coronavirus disease 2019 (COVID-19) patients are numerous but may be misleading. Predictive scores for admission to the intensive care unit (ICU) or death based on clinical and easily affordable laboratory data are still needed in secondary hospitals and in hospitals in developing countries that do not have high-performance laboratories. Methods: The goal of this study is to verify whether a recently published predictive score developed on a large scale in China (the Liang score) can be used on patients coming from a Belgian population catchment area. We conducted a monocentric retrospective cohort study of 66 patients with known COVID-19 disease, run from early March to the end of May at Clinique Saint-Pierre Ottignies, a secondary care hospital in Belgium. The outcomes of the study are (I) admission to the ICU and (II) death. All patients admitted to the Emergency Department with a positive RT-PCR SARS-CoV-2 test were included in the study. Routine clinical and laboratory data were collected at admission and during their stay, as well as chest X-rays and CT-scans. The Liang score was used as a benchmark, and logistic regression models were used to develop predictive models. Results: The Liang score performs poorly, both in terms of admission to intensive care and in terms of death. In our cohort, it appears that lactate dehydrogenase (LDH) above 579 UI/L and venous lactate above 3.02 mmol/L may be considered good predictive biological factors for ICU admission. With regard to death risk, a neutrophil-lymphocyte ratio (NLR) above 22.1, tobacco abuse status and respiratory impairment appear to be relevant predictive factors. Conclusions: Firstly, a promising score from a large-scale study in China appears to perform poorly when applied to a European cohort, whether predicting admission to the ICU or death. Secondly, biological features that are quite significant for admission to the ICU, such as LDH or venous lactate, cannot predict death. Thirdly, simple and affordable variables such as LDH, LDH + sex, or LDH + sex + venous lactate have a very good sensitivity and an acceptable specificity for ICU admission.


  1. Wavelet-Based Moment-Matching Techniques for Inertial Sensor Calibration
    Stéphane Guerrier, Juan Jurado, Mehran Khaghani, Gaetan Bakalli, Mucyo Karemera, Roberto Molinari, Samuel Orso, John Raquet, Christine Schubert and Jan Skaloud.

    IEEE Transactions on Instrumentation & Measurement, 69(10), p.7542-7551.

    The task of inertial sensor calibration has required the development of various techniques to take into account the sources of measurement error coming from such devices. The calibration of the stochastic errors of these sensors has been the focus of an increasing amount of research, in which the method of reference has been the so-called “Allan variance (AV) slope method” which, in addition to lacking appropriate statistical properties, requires a subjective input that makes it prone to mistakes. To overcome this, recent research has started proposing “automatic” approaches in which the parameters of the probabilistic models underlying the error signals are estimated by matching functions of the AV or wavelet variance with their model-implied counterparts. However, despite the increased use of such techniques, there has been no study or clear direction for practitioners on which approach is optimal for the purpose of sensor calibration. This article, for the first time, formally defines the class of estimators based on this technique and puts forward theoretical and applied results that, comparing estimators within this class, suggest the use of the Generalized Method of Wavelet Moments (GMWM) as an optimal choice. In addition to analytical proofs, experiment-driven Monte Carlo simulations demonstrate the superior performance of this estimator. An analysis of the error signal from a gyroscope is also provided to motivate performing such analyses, as real-world observed error signals may show significant deviation from manufacturer-provided error models.
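    The Allan variance on which the slope method is built is itself straightforward to compute: half the mean squared difference between successive cluster averages. A minimal non-overlapping version is sketched below; the article's criticism concerns how model parameters are fitted from this quantity, not the quantity itself:

```python
import numpy as np

def allan_variance(x, taus):
    """Non-overlapping Allan variance of an error signal at the given
    cluster sizes (in samples): half the mean squared difference of
    successive cluster averages."""
    x = np.asarray(x, dtype=float)
    av = {}
    for m in taus:
        k = len(x) // m                         # number of full clusters
        means = x[:k * m].reshape(k, m).mean(axis=1)
        av[m] = 0.5 * np.mean(np.diff(means) ** 2)
    return av

rng = np.random.default_rng(0)
av = allan_variance(rng.normal(size=2**14), taus=[1, 2, 4, 8])
# for unit-variance white noise the theoretical value at cluster size m is 1/m
```

    The slope method reads model parameters off the log-log slope of such a curve by eye; moment-matching estimators like the GMWM instead fit the model-implied curve to the estimated one.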
  2. Generalized Additive Models: An Efficient Method for Short-Term Energy Prediction in Office Buildings
    Thulasi Ram Khamma, Yuming Zhang, Stéphane Guerrier and Mohamed Boubekri.

    Energy, 213, p.118834.

    In 2018, commercial buildings accounted for nearly 18.2% of the total energy consumption in the USA, making them a significant contributor to greenhouse gas emissions (see, e.g., [1]). Specifically, office buildings accounted for 14% of the energy usage of the commercial sector. Hence, their energy performance has to be closely monitored and evaluated to address the critical issue of greenhouse gas emissions. Several data-driven statistical and machine learning models have been developed to assess the energy performance of office buildings based on historical data. While these methods often provide reliable prediction accuracy, they typically offer little interpretation of the relationships between variables and their impacts on energy consumption. However, model interpretability is essential to understand, control and manage the variables affecting energy consumption; such a feature is therefore crucial and should be emphasized in the modeling procedure in order to obtain reliable and actionable results. For this reason, we use generalized additive models as a flexible, efficient and interpretable alternative to existing approaches for modeling and predicting the energy consumption of office buildings. To demonstrate the advantages of this approach, we consider an application to energy consumption data of HVAC systems in a mixed-use multi-tenant office building in Chicago, Illinois, USA. We present the building characteristics and various influential variables, based on which we construct a generalized additive model. We compare the prediction performance, using various commonly used calibration metrics, between the proposed model and existing methods, including support vector machines as well as classification and regression trees. We find that the proposed method outperforms the existing approaches, especially for short-term prediction.
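    The additive structure that makes these models interpretable can be sketched with textbook backfitting: each predictor's smooth effect is re-estimated on the partial residuals of the others. This toy version uses a kernel smoother as a stand-in for the penalized splines of a production GAM implementation; all names and data are illustrative:

```python
import numpy as np

def smooth(x, y, bandwidth=0.4):
    """Nadaraya-Watson kernel smoother (stand-in for penalized splines)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

def fit_additive(X, y, n_iter=20):
    """Backfitting for y ~ alpha + f_1(x_1) + ... + f_p(x_p): cycle over
    predictors, smoothing partial residuals; components are centered for
    identifiability."""
    n, p = X.shape
    alpha, f = y.mean(), np.zeros((p, n))
    for _ in range(n_iter):
        for j in range(p):
            others = f.sum(axis=0) - f[j]
            fj = smooth(X[:, j], y - alpha - others)
            f[j] = fj - fj.mean()
    return alpha, f

# toy data with two nonlinear effects
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.2 * rng.normal(size=300)
alpha, f = fit_additive(X, y)
fitted = alpha + f.sum(axis=0)
```

    The interpretability the abstract emphasizes comes from inspecting each recovered component f_j against its predictor, rather than a single black-box prediction surface.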
  3. Worldwide Predictions of Earthquake Casualty Rates with Seismic Intensity Measure and Socioeconomic Data: a Fragility-based Formulation
    Yi Wang, Paolo Gardoni, Colleen Murphy and Stéphane Guerrier.

    Natural Hazards Review, 21(2), p.1-40.

    This paper presents a fragility-based Bayesian formulation to predict earthquake casualty rates for countries worldwide. The earthquake casualty rate of a community is defined as the probability that a person in the community is killed or injured given an intensity measure of the earthquake at the site of the community. Casualty data of 902 earthquakes worldwide from 2013 to 2017, information on population distributions, and the national socioeconomic data are used to calibrate the model. A model based on data from 2013 to 2016 is used to predict casualty rates of earthquakes in 2017. The comparisons of the model predictions with the actual observations show good agreement. With the fragility-based formulation, the proposed model can be fully coupled with seismic hazard maps for risk analysis. An example is shown in this paper to apply the model calibrated with the full data set with reference to a worldwide seismic hazard map to conduct a fully coupled seismic risk analysis and predict the expected casualty rates and counts due to earthquakes in future years for countries worldwide.
  4. Targeting Hallmarks of Cancer with a Food-system–based Approach
    James C Lachance, Sridhar Radhakrishnan, Gaurav Madiwale, Stéphane Guerrier and Jairam Vanamala.

    Nutrition, 69 (110563), p.1-23.

    Although extensive resources are dedicated to the development and study of cancer drugs, the cancer burden is expected to rise by about 70% over the next two decades. This highlights a critical need to develop effective, evidence-based strategies for countering the global rise in cancer incidence. Except in high-risk populations, cancer drugs are not generally suitable for use in cancer prevention owing to potential side effects and substantial monetary costs (Sporn, 2011). There is overwhelming epidemiological and experimental evidence that the dietary bioactive compounds found in whole plant-based foods have significant anticancer and chemopreventative properties. These bioactive compounds often exert pleiotropic effects and act synergistically to simultaneously target multiple pathways of cancer. Common bioactive compounds in fruits and vegetables include carotenoids, glucosinolates, and polyphenols. These compounds have been shown to target multiple hallmarks of cancer in vitro and in vivo and potentially to address the diversity and heterogeneity of certain cancers. Although many studies have been conducted over the past 30 years, the scientific community has still not reached a consensus on exactly how the benefit of bioactive compounds in fruits and vegetables can best be harnessed to help reduce the risk for cancer. Different stages of the food processing system, from "farm to fork," can affect the retention of bioactive compounds and thus the chemopreventative properties of whole foods, and there are opportunities to improve the handling of foods throughout these stages in order to best retain their chemopreventative properties. Potential target stages include, but are not limited to, pre- and postharvest management, storage, processing, and consumer practices. Therefore, there is a need for a comprehensive food-system-based approach that not only takes into account the effects of the food system on the anticancer activity of whole foods, but also explores solutions for consumers, policymakers, processors, and producers. Improved knowledge of this area of the food system can help us adjust farm-to-fork operations in order to consistently and predictably deliver desired bioactive compounds, thus better utilizing them as invaluable chemopreventative tools in the fight to reduce the growing burden of cancer worldwide.


  1. Simulation-based Bias Correction Methods for Complex Models
    Stéphane Guerrier, Elise Dupuis-Lozeron, Yanyuan Ma and Maria-Pia Victoria-Feser.

    Journal of the American Statistical Association (Theory & Methods), 114(525), p.146-157.

    Along with ever-increasing data size and model complexity, an important challenge frequently encountered in constructing new estimators, or in implementing a classical one such as the maximum likelihood estimator, is the computational aspect of the estimation procedure. To carry out estimation, approximate methods such as pseudo-likelihood functions or approximated estimating equations are increasingly used in practice as they are typically easier to implement numerically, although they can lead to inconsistent and/or biased estimators. In this context, we extend and provide refinements on the known bias correction properties of two simulation-based methods, respectively indirect inference and the bootstrap, each with two alternatives. These results allow one to build a framework defining simulation-based estimators that can be implemented for complex models. Indeed, based on a biased or even inconsistent estimator, several simulation-based methods can be used to define new estimators that are both consistent and have reduced finite sample bias. This framework includes the classical method of indirect inference for bias correction without requiring the specification of an auxiliary model. We demonstrate the equivalence between one version of the indirect inference and the iterative bootstrap, both of which correct sample biases up to order n^{-3}. The iterative method can be thought of as a computationally efficient algorithm to solve the optimization problem of the indirect inference. Our results provide different tools to correct the asymptotic as well as finite sample biases of estimators and give insight into which method should be applied for the problem at hand. The usefulness of the proposed approach is illustrated with the estimation of robust income distributions and generalized linear latent variable models. Supplementary materials for this article are available online.
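    The iterative bootstrap can be written in a few lines: starting from the initial (biased) estimate, repeatedly shift the parameter by the gap between that estimate and its Monte Carlo average under the current parameter value. The toy model below (a divide-by-n variance estimator for centered Gaussian data, biased downward by theta/n) is ours for illustration only and is not an example from the paper:

```python
import numpy as np

def iterative_bootstrap(pi_hat, simulate, estimate, theta0, H=200, n_iter=20, seed=0):
    """Iterative bootstrap sketch: shift the current parameter value by the
    gap between the initial (biased) estimate on the data and its average
    over H samples simulated at the current parameter."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(n_iter):
        sims = np.mean([estimate(simulate(theta, rng)) for _ in range(H)])
        theta = theta + (pi_hat - sims)
    return theta

# toy model: X ~ N(0, theta); divide-by-n variance estimator, biased by -theta/n
n = 30
rng = np.random.default_rng(1)
x = rng.normal(scale=2.0, size=n)                       # true theta = 4
estimate = lambda s: np.mean((s - s.mean()) ** 2)
simulate = lambda th, g: g.normal(scale=np.sqrt(max(th, 1e-8)), size=n)
pi_hat = estimate(x)
theta_ib = iterative_bootstrap(pi_hat, simulate, estimate, theta0=pi_hat)
```

    In this toy case the fixed point solves E[pi(theta)] = pi_hat, i.e. theta = pi_hat * n / (n - 1), recovering the usual bias correction without ever writing it down analytically.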
  2. Multivariate Signal Modeling with Applications to Inertial Sensor Calibration
    Haotian Xu, Stéphane Guerrier, Roberto Molinari and Mucyo Karemera.

    IEEE Transactions on Signal Processing, 67(19), p.5143-5152.

    The common approach to inertial sensor calibration has been to model the stochastic error signals of individual sensors independently, whether as components of a single inertial measurement unit (IMU) in different directions or arrayed in the same direction for redundancy. For this purpose, research in this domain has focused on the proposal of various methods to improve the estimation of these models from both a computational and a statistical point of view. However, the separate calibration of the individual sensors is unable to take into account the dependence between them, which can have an important impact on the precision of the navigation systems. In this paper, we develop a new approach to simultaneously model the individual signals and the dependence between them by studying the quantity called the Wavelet Cross-Covariance and using it to extend the application of the Generalized Method of Wavelet Moments. This new method can be used in other settings for time series modeling, especially in cases where the dependence among signals may be hard to detect. Moreover, in the field of inertial sensor calibration, this approach can deliver important contributions, including the possibility of testing dependence between sensors, integrating their dependence within the navigation filter, and constructing an optimal virtual sensor that can be used to simplify and improve navigation accuracy. The advantages of this method and its usefulness for inertial sensor calibration are highlighted through a simulation study and an applied example with a small array of XSens MTi-G IMUs.
  3. A Multisignal Wavelet Variance-based Framework for Inertial Sensor Stochastic Error Modeling
    Ahmed Radi, Gaetan Bakalli, Stéphane Guerrier, Naser El-Sheimy, Abu Sesay and Roberto Molinari.

    IEEE Transactions on Instrumentation and Measurement, 68(12), p.4924-4936.

    The calibration of low-cost inertial sensors has become increasingly important over the last couple of decades, especially when dealing with sensor stochastic errors. This procedure is commonly performed on a single error measurement from an inertial sensor taken over a certain amount of time, although it is extremely frequent for different replicates to be taken for the same sensor, thereby delivering important information which is often left unused. To address this problem, this paper presents a general wavelet variance-based framework for multisignal inertial sensor calibration, which can improve the modeling and model selection procedures for sensor stochastic errors by using all replicates from a calibration procedure, and which allows one to understand properties of these stochastic errors such as stationarity. Applications using microelectromechanical system inertial measurement units confirm the importance of this new framework, and a new graphical user interface makes these tools available to the general user. The latter is developed based on an R package called mgmwm and allows the user to select the type of sensor for which different replicates are available and to easily make use of the approaches presented in this paper in order to carry out the appropriate calibration procedure.
  4. Predicting Fatality Rates due to Earthquakes Accounting for Community Vulnerability
    Yi Wang, Paolo Gardoni, Colleen Murphy and Stéphane Guerrier.

    Earthquake Spectra, 35(2), p.513-536.

    Existing prediction models for earthquake fatalities usually require a detailed building inventory that might not be readily available. In addition, existing models tend to overlook the socioeconomic characteristics of the communities of interest as well as zero-fatality data points. This paper presents a methodology based on a probabilistic zero-inflated beta regression model to predict earthquake fatality rates given the geographic distribution of earthquake intensities, using data reflecting community vulnerability. As an illustration, the prediction model is calibrated using fatality data from 61 earthquakes affecting Taiwan from 1999 to 2016, as well as information on the socioeconomic and environmental characteristics of the affected communities. Using a local seismic hazard map, the calibrated prediction model is employed in a seismic risk analysis for Taiwan that predicts the expected fatality rates and counts caused by earthquakes in future years.


  1. Parametric Inference for Index Functionals
    Stéphane Guerrier, Samuel Orso and Maria-Pia Victoria-Feser.

    Econometrics, invited paper for the special issue Econometrics and Income Inequality, 6(2), 22.

    In this paper, we study the finite sample accuracy of confidence intervals for index functionals built via parametric bootstrap, in the case of inequality indices. To estimate the parameters of the assumed parametric data generating distribution, we propose a Generalized Method of Moments estimator that targets the quantity of interest, namely the considered inequality index. Its primary advantage is that the scale parameter does not need to be estimated to perform the parametric bootstrap, since inequality measures are scale invariant. The very good finite sample coverages found in a simulation study suggest that this feature provides an advantage over the parametric bootstrap using the maximum likelihood estimator. We also find that, overall, a parametric bootstrap provides more accurate inference than its nonparametric or semi-parametric counterparts, especially for heavy-tailed income distributions.
  2. Use of a New Online Calibration Platform with Applications to Inertial Sensors
    Philipp Clausen, Jan Skaloud, Roberto Molinari, Justin Lee and Stéphane Guerrier.

    IEEE Aerospace and Electronic Systems Magazine, 33(8), p.30-36.

    In many fields, ranging from economics to physics, it is common to deal with measurements that are taken over time. These measurements are often explained by known external factors that describe a large part of their behavior. For example, the evolution of the unemployment rate over time can be explained by the behavior of the gross domestic product (the external factor in this case). However, in many cases the external factors are not enough to explain the entire behavior of the measurements, and it is necessary to use so-called stochastic models (or probabilistic models) that describe how the measurements depend on each other through time (i.e., the measurements are explained by the behavior of the previous measurements themselves). The treatment and analysis of this kind of behavior is known by various names, such as time series analysis or signal processing. In the majority of cases, the goal of this analysis is to estimate the parameters of the underlying models which, in some sense, explain how and to what extent the observations depend on each other through time.
  3. Is Nonmetastatic Cutaneous Melanoma Predictable through Genomic Biomarkers?
    Mattia Branca, Samuel Orso, Roberto Molinari, Haotian Xu, Stéphane Guerrier, Yuming Zhang and Nabil Mili.

    Melanoma Research, 28(1), p.21-29.

    Cutaneous melanoma is a highly aggressive skin cancer whose treatment and prognosis are critically affected by the presence of metastasis. In this study, we address the following issue: which gene transcripts, and what kind of interactions between them, make it possible to distinguish nonmetastatic from metastatic melanomas with a high level of accuracy? We carry out a meta-analysis on the first gene expression set of the Leeds melanoma cohort, as made available online on 11 May 2016 through the ArrayExpress platform with MicroArray Gene Expression number 4725. According to the authors, primary melanoma mRNA expression was measured in 204 tumours using an Illumina DASL HT-12.4 whole-genome array. The tumour transcripts were selected through a recently proposed predictive-based regression algorithm for gene-network selection. A set of 64 equivalent models was identified, each including only two gene transcripts and each sufficient to accurately classify primary tumours into metastatic and nonmetastatic melanomas. The sensitivity and specificity of the genomic-based models were, respectively, 4% (95% confidence interval: 0.11-21.95%) and 99% (95% confidence interval: 96.96-99.99%). The very high specificity, coupled with a significantly large positive likelihood ratio, leads to a conclusive increase in the likelihood of disease when these biomarkers are present in the primary tumour. In conjunction with other highly sensitive methods, this approach can aspire to be part of future standard diagnostic methods for the screening of metastatic cutaneous melanoma. The small dimension of the selected transcript models enables easy handling of large-scale genomic testing procedures. Moreover, some of the selected transcripts have an understandable link with what is known about cutaneous melanoma oncogenesis, opening a window on the molecular pathways underlying the metastatic process of this disease.
  4. A Computationally Efficient Framework for Automatic Inertial Sensor Calibration
    James Balamuta, Roberto Molinari, Stéphane Guerrier and Wenchao Yang.

    IEEE Sensors Journal, 18(4), p.1636-1646.

    The calibration of (low-cost) inertial sensors has become increasingly important over the past years, since their use has grown exponentially in many applications ranging from unmanned aerial vehicle navigation to 3-D animation. However, this calibration procedure is often quite problematic since, aside from compensating for deterministic measurement errors due to physical phenomena such as dynamics or temperature, the stochastic signals issued from these sensors in static settings have a complex spectral structure, and the methods available to estimate the parameters of these models are often unstable, computationally intensive, and/or statistically inconsistent. This paper presents a new software platform for the calibration of the stochastic component of inertial sensor measurement errors based on the generalized method of wavelet moments, which provides a computationally efficient, flexible, user-friendly, and statistically sound tool to estimate and select from a wide range of complex models. In addition, all of this is possible within a robust framework, allowing sensor calibration to be performed when the data are affected by outliers. The software is developed within the open-source statistical software R and relies on the C++ language, allowing it to achieve high computational performance.


  1. A Study of the Allan Variance for Constant-Mean Nonstationary Processes
    Haotian Xu, Stéphane Guerrier, Roberto Molinari and Yuming Zhang.

    IEEE Signal Processing Letters, 24(8), p.1257-1260.

    The Allan variance (AV) is a widely used quantity in areas focusing on error measurement as well as in the general analysis of variance for autocorrelated processes in domains such as engineering and, more specifically, metrology. This quantity is widely used to detect noise patterns and indications of stability within signals. However, its properties are not known for commonly occurring processes whose covariance structure is nonstationary and, in these cases, an erroneous interpretation of the AV could lead to misleading conclusions. This letter generalizes the theoretical form of the AV to some nonstationary processes while remaining valid for weakly stationary processes. Simulation examples show how this new form can help to understand which of these processes the AV is able to distinguish from the stationary cases, hence allowing for a better interpretation of this quantity in applied settings.
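    As a simple illustration of the quantity discussed above (a minimal sketch under the standard non-overlapping convention, not the code from the letter), the AV at cluster size τ is half the mean squared difference between successive cluster averages:

    ```python
    import numpy as np

    def allan_variance(x, taus):
        """Non-overlapping Allan variance of x at the cluster sizes in taus:
        AVAR(tau) = 0.5 * E[(ybar_{k+1} - ybar_k)^2], where ybar_k is the
        average of the k-th consecutive cluster of tau observations."""
        x = np.asarray(x, dtype=float)
        av = []
        for tau in taus:
            n = len(x) // tau                       # number of complete clusters
            means = x[: n * tau].reshape(n, tau).mean(axis=1)
            av.append(0.5 * np.mean(np.diff(means) ** 2))
        return np.array(av)

    # For (weakly stationary) white noise, AVAR(tau) decays as sigma^2 / tau
    rng = np.random.default_rng(0)
    av = allan_variance(rng.normal(0.0, 1.0, 100_000), [1, 10, 100])
    ```

    The slope of log AVAR against log τ is what practitioners read off an Allan variance plot to identify noise types; the letter's point is that, for constant-mean nonstationary processes, this reading requires the generalized theoretical form.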


  1. A Predictive based Regression Algorithm for Gene Network Selection
    Stéphane Guerrier, Nabil Mili, Roberto Molinari, Samuel Orso, Marco Avella-Medina and Yanyuan Ma.

    Frontiers in Genetics, Statistical Genetics and Methodology, 7(97), p.1-11.

    Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. To do so, many of the recently proposed classification methods require some form of dimension reduction of the problem, finally provide a single model as an output and, in most cases, rely on the likelihood function in order to achieve variable selection. We propose a new prediction-based objective function that can be tailored to the requirements of practitioners and can be used to assess and interpret a given problem. Based on cross-validation techniques and the idea of importance sampling, our proposal scans low-dimensional models under the assumption of sparsity and, for each of them, estimates the objective function to assess their predictive power in order to select the best ones. Two applications on cancer data sets and a simulation study show that the proposal compares favorably with competing alternatives such as, for example, the Elastic Net and the Support Vector Machine. Indeed, the proposed method not only selects smaller models for better, or at least comparable, classification errors but also provides a set of selected models instead of a single one, allowing practitioners to construct a network of possible models for a target prediction accuracy level.
  2. Theoretical Limitations of Allan Variance-based Regression for Time Series Model Estimation
    Stéphane Guerrier, Roberto Molinari and Yannick Stebler.

    IEEE Signal Processing Letters, 23(5), p.597-601.

    This letter formally proves the statistical inconsistency of the Allan variance-based estimation of latent (composite) model parameters. This issue has not been sufficiently investigated or highlighted, even though the technique is still widely used in practice, especially within the engineering domain. Indeed, among others, this method is frequently used for inertial sensor calibration, which often deals with latent time series models, and practitioners in these domains are often unaware of its limitations. To prove the inconsistency of this method, we first provide a formal definition of it and subsequently derive its theoretical properties, highlighting its limitations by comparing it with another statistically sound method.
  3. Wavelet-based Improvements for Inertial Sensor Error Modeling
    Stéphane Guerrier, Roberto Molinari and Yannick Stebler.

    IEEE Transactions on Instrumentation and Measurement, 65(12), p.2693-2700.

    The parametric estimation of stochastic error signals is a common task in many engineering applications, such as inertial sensor calibration. In the latter case, the error signals are often of a complex nature, and very few approaches are available to estimate the parameters of these processes. A frequently used approach for this purpose is maximum likelihood (ML), which is usually implemented through a Kalman filter and computed via the expectation-maximization algorithm. Although the ML estimator is statistically sound and efficient, its numerical instability has led to the use of alternative methods, the main one being the generalized method of wavelet moments (GMWM). The latter is a straightforward, consistent, and computationally efficient approach, which nevertheless loses statistical efficiency compared with the ML method. To narrow this gap, in this paper we show that the performance of the GMWM estimator can be enhanced by making use of model moments in addition to those provided by the vector of wavelet variances. The theoretical findings are supported by simulations that highlight how the new estimator not only improves the finite sample performance of the GMWM but also allows it to approach the statistical efficiency of the ML. Finally, a case study with an inertial sensor demonstrates how useful this development is for the purposes of sensor calibration.
  4. Discussion on Maximum Likelihood-based Methods for Inertial Sensor Calibration
    Stéphane Guerrier, Roberto Molinari and James Balamuta.

    IEEE Sensors Journal, 16(14), p.5522-5523.

    This letter highlights some issues which were overlooked in a recently published paper entitled "Maximum Likelihood Identification of Inertial Sensor Noise Model Parameters." The latter paper does not consider existing alternative methods, which specifically tackle this issue in a possibly more direct manner, and, although remaining a generally valid proposal, it does not appear to improve on the earlier proposals. Finally, a simulation study rectifies the poor results reported for an estimator of reference in the same publication.
  5. Member Plan Choice and Migration in Response to Changes in Member Premiums after Massachusetts Health Insurance Reform
    Ian Duncan and Stéphane Guerrier.

    North American Actuarial Journal, p.1-16.

    In 2006 Massachusetts implemented a substantial reform of its health insurance market that included a new program for uninsured individuals with income between 100% of Federal Poverty (the upper limit for state Medicaid benefits) and 300% of Federal Poverty. Enrollment was compulsory for all citizens because of a mandate. Consumers who enrolled in this program, which offered generous benefits with low copays, received graduated subsidies depending on their income. Five insurers were contracted to underwrite the program, and consumers were able to choose their insurer. Insurers bid annually, and the member contribution was set according to an affordability schedule for the lowest-bidding insurer. Consumers could choose from the range of insurers, but if they chose a plan other than the lowest cost, their contributions reflected the difference. Premiums were changed annually on July 1, and members were eligible to move to a different plan at this date; a number of members migrated each year. This study aims to quantify the effect of this premium-induced switching behavior. Prior studies of member switching behavior have looked at employer plans and estimated the elasticity of response to changes in member contributions. The Massachusetts environment is unique in that there is a mandate (so being uninsured is not an option) and members may choose their insurer but not their benefit plan. Thus a study of migration in Massachusetts is uniquely able to quantify the effect of price (contribution rates) on member switching behavior. We find an elasticity averaging −0.21 for 2013 (the last year of the study), somewhat lower (in absolute value) than in previous studies of employer populations. Elasticity has also been increasing significantly with time and appears to have at least doubled over the studied period (i.e., 2008–2013). Prior studies have estimated higher elasticities in the range −0.3 to −0.6.
We found that the data contained many outliers in terms of both changes in contributions and the percentage of members switching plans. The effect of outliers was moderated by the choice of robust regression models, which leads us to question whether other studies may have been affected by outliers, resulting in overestimates of the elasticities.
  6. Use of a Publicly Available Database to Determine the Impact of Diabetes on Length of Hospital Stay for Elective Orthopedic Procedures in California
    David Kerr, Meroe Yadollahi, Hemerson Bautista, Xin Chen, Shuyan Dong, Stéphane Guerrier, Remmert J Laan and Ian Duncan.

    Population Health Management, p.1-17.

    In California, 1 in 3 hospital beds is occupied by an adult with diabetes. The aim of this study was to examine whether diabetes impacts length of stay (LOS) following common elective orthopedic procedures relative to nondiabetic individuals, and to examine the performance of hospitals across California for these procedures. Using the Public Use California Patient Discharge Data Files for 2010-2012, the authors examined LOS for elective discharges for hip, spine, or knee surgery (n = 318,861) from the total population of all discharges (n = 11,476,073) for 309 hospitals across California. In all, 16% of discharges had a codiagnosis of diabetes. Unadjusted average LOS was 3.11 days without and 3.40 days with diabetes (mean difference 0.29 [95% confidence interval (0.27, 0.31)] days, P < 0.01). After adjusting for covariates, diabetes no longer resulted in a significant difference in LOS. However, the presence of common comorbidities did significantly impact LOS. Average LOS for patients with diabetes also varied widely by hospital, ranging between -50% and +100% of the mean LOS for all hospitals. Diabetes does not prolong LOS after orthopedic procedures unless comorbidities are present. Nevertheless, across California there is significant variation in LOS between individual hospitals, which may inform the decision-making process for prospective patients and payers.


  1. Automatic Identification and Calibration of Stochastic Parameters in Inertial Sensors
    Stéphane Guerrier, Roberto Molinari and Jan Skaloud.

    Journal of the Institute of Navigation, 62(4), p.265-272.

    We present an algorithm for determining the nature of stochastic processes and their parameters based on the analysis of time series of inertial errors. The algorithm is suitable mainly (but not only) for situations where several stochastic processes are superposed. The proposed approach is based on a recently developed method called the Generalized Method of Wavelet Moments (GMWM), whose estimator was proven to be consistent and asymptotically normally distributed. This method delivers a global selection criterion based on the wavelet variance that can be used to determine the suitability of a candidate model (compared to other models), and we apply it to low-cost inertial sensors. By allowing candidate models to be ranked, this approach enables us to construct an algorithm for automatic model identification and determination. The benefits of this methodology are highlighted with practical examples of model selection for two types of MEMS IMUs.
  2. An Approach for Observing and Modeling Errors in MEMS-based Inertial Sensors under Vehicle Dynamic
    Yannick Stebler, Stéphane Guerrier and Jan Skaloud.

    IEEE Transactions on Instrumentation and Measurement, 64(11), p.2926-2936.

    This paper studies the error behavior of low-cost inertial sensors in dynamic conditions. After proposing a method for obtaining error observations per sensor (i.e., gyroscope or accelerometer) and axis, their properties are estimated via the generalized method of wavelet moments. The resulting model parameters are compared with those obtained under static conditions. Then, an attempt is made to link the parameters of the established model to the dynamics of the vehicle. It is found that a linear relation explains a large portion of the exhibited variability. These findings suggest that the static methods employed for the calibration of inertial sensors could be improved by exploiting such a relationship.


  1. Generalized Method of Wavelet Moments for Inertial Navigation Filter Design
    Yannick Stebler, Stéphane Guerrier, Jan Skaloud and Maria-Pia Victoria-Feser.

    IEEE Transactions on Aerospace and Electronic Systems, 50(3), p.2269-2283.

    The integration of observations issued from a satellite-based system (GNSS) with an inertial navigation system (INS) is usually performed through a Bayesian filter such as the extended Kalman filter (EKF). The task of designing the navigation EKF is strongly related to the inertial sensor error modeling problem. Accelerometers and gyroscopes may be corrupted by random errors of complex spectral structure. Consequently, identifying correct error-state parameters in the INS/GNSS EKF becomes difficult when several stochastic processes are superposed. In such situations, classical approaches like the Allan variance (AV) or power spectral density (PSD) analysis fail due to the difficulty of separating the error processes in the spectral domain. For this purpose, we propose applying a recently developed estimator based on the generalized method of wavelet moments (GMWM), which was proven to be consistent and asymptotically normally distributed. The GMWM estimator matches theoretical and sample-based wavelet variances (WVs), and can be computed using the method of indirect inference. This article mainly focuses on the implementation aspects related to the GMWM and its integration within a general navigation filter calibration procedure. To this end, we apply the GMWM to error signals issued from MEMS-based inertial sensors by building and estimating composite stochastic processes for which classical methods cannot be used. In a first stage, we validate the resulting models using AV and PSD analyses, and then, in a second stage, we study the impact of the resulting stochastic model design on positioning accuracy using an emulated scenario with statically observed error signatures. We demonstrate that the GMWM-based calibration framework enables the estimation of complex stochastic models, relevant for the observed structure of errors, in terms of the resulting navigation accuracy.
  2. Estimation of Time Series Models via Robust Wavelet Variance
    Stéphane Guerrier, Roberto Molinari and Maria-Pia Victoria-Feser.

    Austrian Journal of Statistics, 43(3-4), p.267-277.

    A robust approach to the estimation of time series models is proposed. Starting from a new estimation method called the Generalized Method of Wavelet Moments (GMWM), an indirect method based on the Wavelet Variance (WV), we replace the classical estimator of the WV with a recently proposed robust M-estimator to obtain a robust version of the GMWM. The simulation results show that the proposed approach can be considered a valid robust approach to the estimation of time series and state-space models.


  1. Wavelet-variance-based Estimation for Composite Stochastic Processes
    Stéphane Guerrier, Jan Skaloud, Yannick Stebler and Maria-Pia Victoria-Feser.

    Journal of the American Statistical Association (Theory & Methods), 108(503), p.1021-1030.

    This article presents a new estimation method for the parameters of a time series model. We consider here composite Gaussian processes that are the sum of independent Gaussian processes, each of which explains an important aspect of the time series, as is the case in engineering and the natural sciences. The proposed estimation method offers an alternative to classical likelihood-based estimation that is straightforward to implement and is often the only feasible estimation method with complex models. The estimator is obtained by optimizing a criterion based on a standardized distance between the sample wavelet variance (WV) estimates and the model-based WV. Indeed, the WV provides a decomposition of the process variance across different scales, so that it contains information about different features of the stochastic model. We derive the asymptotic properties of the proposed estimator for inference and perform a simulation study to compare our estimator to the MLE and the LSE under different models. We also provide easy-to-verify sufficient conditions on composite models for our estimator to be consistent. We use the new estimator to estimate the parameters of the stochastic error, modeled as the sum of three first-order Gauss–Markov processes, using a sample of over 800,000 observations issued from gyroscopes that compose inertial navigation systems. Supplementary materials for this article are available online.
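    As a rough illustration of the wavelet-variance matching idea (a minimal sketch, not the authors' implementation), the following estimates the variance of a white noise process by matching the sample Haar WV at dyadic scales to its model-implied form σ²/(2τ); for composite models the same criterion is minimized jointly over all model parameters:

    ```python
    import numpy as np

    def haar_wv(x, J):
        """Sample Haar wavelet variance at dyadic scales tau_j = 2^j.

        The Haar coefficient at scale tau is half the difference between
        the means of two adjacent blocks of tau observations each."""
        x = np.asarray(x, dtype=float)
        wv = []
        for j in range(1, J + 1):
            tau = 2 ** j
            win = 2 * tau                       # support of the Haar filter
            n = len(x) // win
            blocks = x[: n * win].reshape(n, win)
            d = (blocks[:, :tau].mean(axis=1) - blocks[:, tau:].mean(axis=1)) / 2
            wv.append(np.mean(d ** 2))
        return np.array(wv)

    def gmwm_white_noise(x, J):
        """Least-squares match of the sample WV to the white noise model WV,
        nu^2(tau) = sigma^2 / (2 tau); returns the sigma^2 estimate."""
        nu_hat = haar_wv(x, J)
        taus = 2.0 ** np.arange(1, J + 1)
        g = 1.0 / (2.0 * taus)                  # model WV per unit sigma^2
        return float(g @ nu_hat / (g @ g))      # closed-form LS minimizer

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.5, 2 ** 17)           # white noise with sigma^2 = 2.25
    sigma2_hat = gmwm_white_noise(x, J=6)
    ```

    For white noise the minimizer is available in closed form; for the composite models treated in the article the distance is minimized numerically, with a standardizing weight matrix in place of the plain least-squares criterion used here.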


  1. Fault Detection and Isolation in Multiple MEMS-IMUs Configurations
    Stéphane Guerrier, Adrian Waegli, Jan Skaloud and Maria-Pia Victoria-Feser.

    IEEE Transactions on Aerospace and Electronic Systems, 48(3), p.2015-2031.

    This research presents methods for detecting and isolating faults in multiple micro-electro-mechanical system inertial measurement unit (MEMS-IMU) configurations. First, geometric configurations with n sensor triads are investigated. It is proved that the relative orientation between sensor triads is irrelevant to system optimality in the absence of failures. Then, the impact of sensor failure or decreased performance is investigated. Three fault detection and isolation (FDI) approaches (i.e., the parity space method, Mahalanobis distance method and its direct robustification) are reviewed theoretically and in the context of experiments using reference signals. It is shown that in the presence of multiple outliers the best performing detection algorithm is the robust version of the Mahalanobis distance.
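    As a minimal sketch of the classical (non-robust) Mahalanobis distance check discussed above (the robustified version studied in the paper replaces the sample mean and covariance with robust counterparts), observations can be flagged against a chi-square cutoff; the threshold value below is an assumed 99.9% chi-square quantile with 3 degrees of freedom:

    ```python
    import numpy as np

    def mahalanobis_outliers(X, threshold):
        """Flag rows of X whose squared Mahalanobis distance from the
        sample mean exceeds the given threshold."""
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        inv_cov = np.linalg.inv(cov)
        diff = X - mu
        # Squared Mahalanobis distance of each row: diff_i^T cov^{-1} diff_i
        d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
        return d2 > threshold

    rng = np.random.default_rng(1)
    clean = rng.normal(size=(2000, 3))          # nominal 3-axis measurements
    X = np.vstack([clean, [[10.0, 10.0, 10.0]]])  # one injected fault
    flags = mahalanobis_outliers(X, 16.27)      # ~chi2(3) 99.9% quantile
    ```

    The paper's point is precisely that this classical estimator breaks down under multiple simultaneous outliers (masking), which is why its direct robustification performs best in the reported experiments.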


  1. Constrained Expectation-Maximization Algorithm for Stochastic Inertial Error Modeling: Study of Feasibility
    Yannick Stebler, Stéphane Guerrier, Jan Skaloud and Maria-Pia Victoria-Feser.

    Measurement Science and Technology, 22(8), p.121-135.

    Stochastic modeling is a challenging task for low-cost sensors whose errors can have complex spectral structures. This often makes the tuning process of the INS/GNSS Kalman filter sensitive and difficult. For example, first-order Gauss–Markov processes are very often used in inertial sensor models, but the estimation of their parameters is a non-trivial task if the error structure is mixed with other types of noises. Such an estimation is often attempted by computing and analyzing Allan variance plots. This contribution addresses situations in which the estimation of error parameters through graphical interpretation is rather difficult. The novel strategy performs direct estimation of these parameters by means of the expectation-maximization (EM) algorithm. The algorithm results are first analyzed from a critical and practical point of view using simulations with typically encountered error signals. These simulations show that the EM algorithm seems to perform better than the Allan variance and offers a procedure to estimate first-order Gauss–Markov processes mixed with other types of noises. At the same time, the conducted tests revealed limits of this approach related to convergence and stability issues. Suggestions are given to circumvent or mitigate these problems when the complexity of the error structure is 'reasonable'. This work also highlights the fact that the suggested EM-based approach and the Allan variance may not be able to estimate the parameters of complex error models reasonably well, showing the need for new estimation procedures to be developed in this context. Finally, an empirical scenario is presented to support the former findings; there, the positive effect of using the more sophisticated EM-based error modeling on a filtered trajectory is highlighted.


  1. Noise Reduction and Estimation in Multiple Micro-electro-mechanical Inertial Systems
    Adrian Waegli, Jan Skaloud, Stéphane Guerrier, Maria Eulalia Parés and Ismael Colomina.

    Measurement Science and Technology, 21(6), p.231-242.

    This research studies the reduction and the estimation of the noise level within a redundant configuration of low-cost (MEMS-type) inertial measurement units (IMUs). First, independent observations between units and sensors are assumed, and the theoretical decrease in the system noise level is analyzed in an experiment with four MEMS-IMU triads. Then, more complex scenarios are presented in which the noise level can vary in time and for each sensor. A statistical method employed for studying the volatility of financial markets (GARCH) is adapted and tested for use with inertial data. This paper demonstrates, experimentally and through simulations, the benefit of direct noise estimation in redundant IMU setups.