Publications
2024
Assessing COVID-19 Prevalence in Austria with Infection Surveys and Case Count Data as Auxiliary Information
Journal of the American Statistical Association 119, p. 1722-1735.
Countries officially record the number of COVID-19 cases based on medical tests of a subset of the population. These case count data suffer from participation bias and, for prevalence estimation, are typically discarded in favor of infection surveys, possibly complemented with auxiliary information. One exception is the series of infection surveys recorded by the Statistics Austria Federal Institute to study the prevalence of COVID-19 in Austria in April, May, and November 2020. In these infection surveys, participants were additionally asked if they were simultaneously recorded as COVID-19 positive in the case count data. In this article, we analyze the benefits of properly combining the outcomes from the infection survey with the case count data to estimate the prevalence of COVID-19 in Austria in 2020, from which the case ascertainment rate can be deduced. The results show that our approach leads to a significant efficiency gain: considerably smaller infection survey samples suffice to obtain the same level of estimation accuracy. Our estimation method can also handle measurement errors due to the sensitivity and specificity of medical testing devices and to the nonrandom sample weighting scheme of the infection survey. The proposed estimators and associated confidence intervals are implemented in the companion open source R package pempi, available on the Comprehensive R Archive Network (CRAN). Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
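To illustrate one ingredient of the abstract above — correcting an apparent prevalence for the sensitivity and specificity of the testing device — here is a minimal R sketch of the classical Rogan–Gladen adjustment. It is a textbook formula, not the estimator implemented in pempi (whose API is not reproduced here), and the numbers are hypothetical.

```r
# Rogan-Gladen correction: adjust an apparent (test-based) prevalence
# for the sensitivity (se) and specificity (sp) of the testing device.
rogan_gladen <- function(p_apparent, se, sp) {
  p <- (p_apparent + sp - 1) / (se + sp - 1)
  min(max(p, 0), 1)  # truncate to the [0, 1] range
}

# Hypothetical numbers: 4% of survey participants test positive with a
# device of 85% sensitivity and 99% specificity.
rogan_gladen(0.04, se = 0.85, sp = 0.99)  # ~0.036
```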
Accounting for Vibration Noise in Stochastic Measurement Errors of Inertial Sensors
IEEE Transactions on Signal Processing, 72.
The measurement of data over time and/or space is of utmost importance in a wide range of domains from engineering to physics. Devices that perform these measurements, such as inertial sensors, need to be extremely precise to obtain correct system diagnostics and accurate predictions, consequently requiring a rigorous calibration procedure before being employed. Most of the research over the past years has focused on delivering methods that can explain and estimate the complex stochastic components of these errors. In this context, the Generalized Method of Wavelet Moments emerges as a computationally efficient estimator with appropriate statistical properties and with different advantages over existing methods such as those based on likelihood estimation and the Allan variance. However, it has thus far not accounted for a significant stochastic noise that arises in many of these devices: vibration noise. This component can originate from different sources, including the internal mechanics of the sensors as well as the movement of these devices when placed on moving objects. To remove this disturbance from signals, this work puts forward a modelling framework for this specific type of noise and adapts the Generalized Method of Wavelet Moments to estimate these models. We deliver the asymptotic properties of this method when applied to processes that include vibration noise and show the considerable practical advantages of this approach in simulation and applied case studies.
Finite Sample Corrections for Average Equivalence Testing
Statistics in Medicine, 43(5), p. 833-854.
Average (bio)equivalence tests are used to assess whether a parameter, such as the mean difference in treatment response between two conditions, lies within a given equivalence interval, hence allowing one to conclude that the conditions have “equivalent” means. The two one-sided tests (TOST) procedure, which consists of testing whether the target parameter is respectively significantly greater and lower than some pre-defined lower and upper equivalence limits, is typically used in this context, usually by checking whether the confidence interval for the target parameter lies within these limits. This intuitive and visual procedure is however known to be conservative, especially in the case of highly variable drugs, where it shows a rapid power loss, often reaching zero, hence making it impossible to conclude equivalence when it is actually true. Here, we propose a finite sample correction of the TOST procedure, the α-TOST, which adjusts the significance level of the TOST to guarantee a test size (or type-I error rate) of α. This new procedure essentially corresponds to a finite sample and variability correction of the TOST procedure. We show that this procedure is uniformly more powerful than the TOST, easy to compute, and that its operating characteristics outperform those of its competitors. A case study about econazole nitrate deposition in porcine skin is used to illustrate the benefits of the proposed method and its advantages compared to other available procedures.
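For background, here is a minimal R sketch of the standard, uncorrected TOST decision rule for two independent groups; the α-TOST of the paper replaces the significance level used below with a corrected one, which is not reproduced here. Data and equivalence limits are hypothetical.

```r
# Standard two one-sided tests (TOST) for average equivalence of two
# independent groups: declare equivalence when the mean difference is
# significantly above the lower limit AND significantly below the
# upper limit. This is the uncorrected baseline procedure; the
# alpha-TOST of the paper replaces alpha below with a corrected level.
tost <- function(x, y, lower, upper, alpha = 0.05) {
  p_lo <- t.test(x, y, mu = lower, alternative = "greater")$p.value
  p_up <- t.test(x, y, mu = upper, alternative = "less")$p.value
  c(p_lower = p_lo, p_upper = p_up,
    equivalent = as.numeric(max(p_lo, p_up) < alpha))  # 1 = equivalent
}

set.seed(1)
tost(rnorm(20, mean = 0.1), rnorm(20, mean = 0), lower = -0.5, upper = 0.5)
```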
Estimation and Uncertainty Quantification of Magma Interaction Times Using Statistical Emulation
Volcanica, 7(2), p. 525-539.
Evolution of volcanic plumbing systems towards eruptions of different styles and sizes largely depends on processes at crustal depths that are outside our observational capabilities. These processes can be modelled, and the outputs of the simulations can be compared with the chemistry of the erupted products and with geophysical and geodetic data to retrieve information on the architecture of the plumbing system and the processes leading to eruption. The interaction between magmas with different physical and chemical properties often precedes volcanic eruptions. Thus, sophisticated numerical models have been developed that describe in detail the dynamics of interacting magmas, specifically aimed at evaluating pre-eruptive magma mingling and mixing timescales. However, our ability to explore the parameter space in order to match petrological and geophysical observations is limited by the extremely high computational costs of these multiphase, multicomponent computational fluid dynamics simulations. To overcome these limitations, we present a statistical emulator that is able to reproduce the numerical simulation results, providing the temporal evolution of the distribution of magma chemistry as a function of a set of input parameters such as magma densities and reservoir shapes. The whole rock composition of volcanic rocks is one of the most commonly measured parameters collected for eruptions. The statistical emulator can be used to invert the observed distribution of whole rock chemistry to determine the duration of interaction between magmas preceding an eruption and to identify the best matching input parameters of the numerical model. Importantly, the statistical emulator intrinsically includes error propagation, thus providing confidence intervals on predicted interaction timescales based on the intrinsic uncertainty of the input parameters of the numerical simulations.
The Maternal-Fetal Neurodevelopmental Groundings of Preterm Birth Risk
Heliyon 10 (7).
Background: Altered neurodevelopment is a major clinical sequela of Preterm Birth (PTB), currently unexplored in utero. Aim: To study the link between fetal brain functional (FbF) connectivity and preterm birth, using resting-state functional magnetic resonance imaging (rs-fMRI). Study design: Prospective single-centre cohort study. Subjects: A sample of 31 singleton pregnancies at 28–34 weeks assigned to a low PTB risk (LR) (n = 19) or high PTB risk (HR) (n = 12) group based on a) the Maternal Frailty Inventory (MaFra) for PTB risk; b) a case-specific PTB risk gradient. Methods: Fetal brain rs-fMRI was performed on a 1.5T MRI scanner. First, directed causal relations representing fetal brain functional connectivity measurements were estimated using the Greedy Equivalence Search (GES) algorithm. HR vs. LR group differences were then tested with a novel Monte Carlo permutation test developed ad hoc. Second, a MaFra-only random forest (RF) was compared against a MaFra-Neuro RF, trained by also including the most important fetal brain functional connections. Third, correlation and regression analyses were performed between MaFra-Neuro class probabilities and i) the GA at birth; ii) the PTB risk gradient; iii) perinatal clinical conditions; and iv) PTB below 37 weeks. Results: First, fewer fetal brain functional connections were evident in the HR group. Second, the MaFra-Neuro RF improved PTB risk prediction. Third, MaFra-Neuro class probabilities showed a significant association with i) GA at birth; ii) the PTB risk gradient; iii) perinatal clinical conditions; and iv) PTB below 37 weeks. Conclusion: Fetal brain functional connectivity is a novel promising predictor of PTB, linked to maternal risk profiles, ahead of birth, and clinical markers of neurodevelopmental risk, at birth, thus potentially “connecting” different PTB phenotypes.
Patient-Perceived Impact of the COVID-19 Pandemic on Medication Adherence and Access to Care for Long-Term Diseases: A Cross-Sectional Online Survey
COVID 4 (2) p. 191-207.
The COVID-19 pandemic has been associated with lifestyle changes, reduced access to care and potential impacts on medication self-management. Our main objectives are to evaluate the impact of the pandemic on patient adherence and access to care and long-term medications, and to determine its association with sociodemographic and clinical factors. This study is part of the Specchio-COVID-19 longitudinal cohort study in Geneva, Switzerland, conducted through an online questionnaire. Among the 982 participants (median age: 56; 61% female), 827 took long-term medications. There were 76 reported changes in medication dosages, of which 24 (31%) were without a physician’s recommendation, and 51 delays in initiation or premature medication interruptions, of which 24 (47%) were without a physician’s recommendation. Only 1% (9/827) of participants faced medication access issues. Participants taking a respiratory medication had four times greater odds of reporting more regular medication intake (OR = 4.27; 95% CI: 2.11–8.63), whereas each year increase in age was significantly associated with 6% lower odds of discontinuation (OR = 0.94; 95% CI: 0.91–0.97) and 3% lower odds of changes in medication dosage (OR = 0.97; 95% CI: 0.95–1.00). Despite the limited impact of the pandemic on adherence and access to medications, our results emphasize the need for understanding patient challenges when self-managing their long-term medication, notably during public health crises.
Dermobile: A Cost-Effective Portable Device for Erythema Evaluation
Skin Research and Technology 30 (4).
A common cause of skin erythema is the increased blood flow and angiogenesis associated with skin irritation. Visual assessment done by specialists is the gold standard for evaluating erythema. Nevertheless, visual assessment alone has significant drawbacks since it depends heavily on the observer's experience and skills. Therefore, instrument-based solutions were developed to overcome user subjectivity while providing an objective assessment of skin redness. We conducted a prospective in-vivo, in-human clinical trial to investigate the feasibility of using a combination of dermatoscope, adapter, and mobile phone—which we have named Dermobile—as an alternative to the Mexameter MX18 for objective skin erythema evaluation. The study was performed at the Geneva University Hospitals following the approval of the local ethics committee (BASEC2022-D0083) and the principles of the Declaration of Helsinki.
2023
A Penalized Two-pass Regression to Predict Stock Returns with Time-Varying Risk Premia
Journal of Econometrics, 237 (2).
We develop a penalized two-pass regression with time-varying factor loadings. The penalization in the first pass enforces sparsity for the time-variation drivers while also maintaining compatibility with the no-arbitrage restrictions by regularizing appropriate groups of coefficients. The second pass delivers risk premia estimates to predict equity excess returns. Our Monte Carlo results and our empirical results on a large cross-sectional data set of US individual stocks show that penalization without grouping can lead to nearly all estimated time-varying models violating the no-arbitrage restrictions. Moreover, our results demonstrate that the proposed method reduces the prediction errors compared to a penalized approach without appropriate grouping or a time-invariant factor model.
Multi-Signal Approaches for Repeated Sampling Schemes in Inertial Sensor Calibration
IEEE Transactions on Signal Processing 71, p. 1103-1114.
Inertial sensor calibration plays an increasingly important role in many areas of research, among which navigation engineering. By performing this task accurately, it is possible to significantly increase general navigation performance by correctly filtering out the deterministic and stochastic measurement errors that characterize such devices. While different techniques are available to model and remove the deterministic errors, there has been considerable research over the past years with respect to modelling the stochastic errors, which have complex structures. To do so, different replicates of these error signals are collected and a model is identified and estimated based on one of these replicates. While this procedure has made it possible to improve navigation performance, it does not take advantage of the information coming from all the other replicates collected on the same sensor. However, it has been observed that there is often a change of error behavior between replicates: the model structure remains the same between replicates but the parameter values vary. Assuming the model structure has been identified, in this work we therefore consider and study the properties of different approaches that combine the information from all replicates while accounting for this phenomenon, confirming their validity both in simulation settings and when applied to real inertial sensor error signals. By taking into account parameter variation between replicates, this work highlights how these approaches can improve the average navigation precision as well as obtain reliable estimates of the uncertainty of the navigation solution.
On Performance Evaluation of Inertial Navigation Systems: The Case of Stochastic Calibration
IEEE Transactions on Instrumentation and Measurement 72, p.1-17.
In this work, we address the problem of rigorously evaluating the performance of an inertial navigation system (INS) during its design phase in the presence of multiple alternative choices. We introduce a framework based on Monte Carlo simulations in which a standard extended Kalman filter is coupled with realistic and user-configurable noise generation mechanisms to recover a reference trajectory from noisy measurements. The evaluation of several statistical metrics of the solution, aggregated over hundreds of simulated realizations, provides reasonable estimates of the expected performance of the system in real-world conditions. This framework allows the user to make a choice between alternative setups. To show the generality of our approach, we consider an example application to the problem of stochastic calibration. Two competing stochastic modeling techniques, namely the widely popular Allan variance linear regression and the emerging generalized method of wavelet moments, are rigorously compared in terms of the framework's defined metrics and in multiple scenarios. We find that the latter provides substantial advantages for certain classes of inertial sensors. Our framework allows considering a wide range of problems related to the quantification of navigation system performance, such as the robustness of integrated navigation systems [such as INS/global navigation satellite system (GNSS)] with respect to outliers or other modeling imperfections. While real-world experiments are essential to assess the performance of new methods, they tend to be costly and are typically unable to provide a sufficient number of replicates to give suitable estimates of, for example, the correctness of the estimated uncertainty. Therefore, our method can contribute to bridging the gap between these experiments and the purely statistical considerations usually found in the stochastic calibration literature.
The Generalized Method of Wavelet Moments with eXogenous inputs: a fast approach for the analysis of GNSS position time series
Journal of Geodesy 97 (2).
The global navigation satellite system (GNSS) daily position time series are often described as the sum of stochastic processes and geophysical signals, which allows the study of global and local geodynamical effects such as plate tectonics, earthquakes, or ground water variations. In this work, we propose to extend the Generalized Method of Wavelet Moments (GMWM) to estimate the parameters of linear models with correlated residuals. This statistical inferential framework is applied to GNSS daily position time-series data to jointly estimate functional (geophysical) as well as stochastic noise models. Our method is called GMWMX, with X standing for eXogenous variables: it is semi-parametric, computationally efficient and scalable. Unlike standard methods such as the widely used maximum likelihood estimator (MLE), our methodology offers statistical guarantees, such as consistency and asymptotic normality, without relying on strong parametric assumptions. At the Gaussian model, our results (theoretical and obtained in simulations) show that the estimated parameters are similar to the ones obtained with the MLE. The computational performances of our approach have important practical implications. Indeed, the estimation of the parameters of large networks of thousands of GNSS stations (some of them being recorded over several decades) quickly becomes computationally prohibitive. Compared to standard likelihood-based methods, the GMWMX has a considerably reduced algorithmic complexity of order O(n log(n)) for a time series of length n. Thus, the GMWMX appears to provide a reduction in processing time of a factor of 10–1000 compared to likelihood-based methods, depending on the considered stochastic model, the length of the time series and the amount of missing data. As a consequence, the proposed method allows the estimation of large-scale problems within minutes on a standard computer. We validate the performance of our method via Monte Carlo simulations by generating GNSS daily position time series with missing observations, and we consider composite stochastic noise models including processes presenting long-range dependence such as power law or Matérn processes. The advantages of our method are also illustrated using real time series from GNSS stations located in the Eastern part of the USA.
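A minimal R sketch of the kind of computation behind the stated O(n log(n)) complexity: the empirical Haar wavelet variance of a signal, computed with cumulative sums so that each of the roughly log2(n) scales costs O(n). This is a generic illustration, not the GMWMX code.

```r
# Empirical Haar wavelet variance of a signal, computed with cumulative
# sums: each of the ~log2(n) scales costs O(n), giving the O(n log(n))
# regime mentioned above.
haar_wv <- function(x) {
  n <- length(x)
  cs <- c(0, cumsum(x))
  J <- floor(log2(n)) - 1
  sapply(1:J, function(j) {
    tau <- 2^j                      # scale tau_j = 2^j
    t0 <- 1:(n - tau + 1)
    # difference of the two half-block sums, scaled by 1/tau
    w <- (2 * cs[t0 + tau / 2] - cs[t0] - cs[t0 + tau]) / tau
    mean(w^2)
  })
}

set.seed(1)
nu <- haar_wv(rnorm(2^12))
# For unit-variance white noise, nu[j] should be close to 1 / 2^j:
round(nu * 2^seq_along(nu), 2)
```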
Generation of Potent Antibacterial Compounds through Enzymatic and Chemical Modifications of the trans-δ-viniferin Scaffold
Scientific Reports, 13(1).
Stilbene dimers are well-known for their diverse biological activities. In particular, previous studies have demonstrated the high antibacterial potential of a series of trans-δ-viniferin-related compounds against gram-positive bacteria such as Staphylococcus aureus. The trans-δ-viniferin scaffold has multiple chemical functions and can therefore be modified in various ways to generate derivatives. Here we report the synthesis of 40 derivatives obtained by light isomerization, O-methylation, halogenation and dimerization of other stilbene monomers. The antibacterial activities of all generated trans-δ-viniferin derivatives were evaluated against S. aureus and information on their structure–activity relationships (SAR) was obtained using a linear regression model. Our results show how several parameters, such as the O-methylation pattern and the presence of halogen atoms at specific positions, can determine the antibacterial activity. Taken together, these results can serve as a starting point for further SAR investigations.
The Action of Physiological and Synthetic Steroids on the Calcium Channel CatSper in Human Sperm
Frontiers in Cell and Developmental Biology, 11.
The sperm-specific Ca2+ channel CatSper (cation channel of sperm) controls intracellular Ca2+ and plays an essential role in sperm function. It is also promiscuously activated by a wide range of synthetic and natural compounds. To investigate the effect of pharmaceutical compounds on CatSper and sperm function, we developed a high-throughput-screening (HTS) assay to measure changes in intracellular calcium concentration ([Ca2+]i) in human sperm and screened 1,280 approved and off-patent drugs from the Prestwick chemical library. Of the 90 steroids tested, more than half (48/90, 53%) were able to both induce an increase in [Ca2+]i and reduce the progesterone (P4)-induced Ca2+ influx in human sperm in a dose-dependent manner. Among a selection of 10 steroids with potent activating and inhibiting effects on P4-induced CatSper activation, we found that P4, pregnenolone, dydrogesterone, epiandrosterone, nandrolone and DHEA activated CatSper at physiological concentrations. Stanozolol, epiandrosterone and pregnenolone induced an acrosomal response (AR), while stanozolol and estropipate promoted sperm penetration into viscous media. Structure-activity relationship (SAR) analysis revealed the structural features of steroids capable of activating CatSper. This allowed the identification of new commercially available steroids with efficacy close to that of P4. Overall, our results indicate that the majority of natural or synthetic steroids modulate human CatSper with varying potency and share the same binding site as P4. They may have an off-target effect on CatSper in human sperm, which could interfere with the fertilization process.
Influence of Molecular Structure and Physicochemical Properties of Immunosuppressive Drugs on Micelle Formulation Characteristics and Cutaneous Delivery
Pharmaceutics 15 (4), p.1278.
The aim of this study was to investigate whether subtle differences in molecular properties affected polymeric micelle characteristics and their ability to deliver poorly water-soluble drugs into the skin. D-α-tocopherol-polyethylene glycol 1000 was used to prepare micelles containing ascomycin-derived immunosuppressants—sirolimus (SIR), pimecrolimus (PIM) and tacrolimus (TAC)—which have similar structures and physicochemical properties and have dermatological applications. Micelle formulations were prepared by thin-film hydration and extensively characterized. Cutaneous delivery and biodistribution were determined and compared. Sub-10 nm micelles were obtained for the three immunosuppressants with incorporation efficiencies >85%. However, differences were observed for drug loading, stability (at the highest concentration), and their in vitro release kinetics. These were attributed to differences in drug aqueous solubility and lipophilicity. Differences between the cutaneous biodistribution profiles and drug deposition in the different skin compartments pointed to the impact of differences in thermodynamic activity. Therefore, despite their structural similarities, SIR, TAC and PIM did not demonstrate the same behaviour either in the micelles or when applied to the skin. These outcomes indicate that polymeric micelles should be optimized even for closely related drug molecules and support the hypothesis that drugs are released from micelles prior to skin penetration.
Platform Combining Statistical Modeling and Patient-Derived Organoids to Facilitate Personalized Treatment of Colorectal Carcinoma
Journal of Experimental & Clinical Cancer Research 42 (1).
This study presents a novel approach for designing personalized treatment for colorectal cancer (CRC) patients. The approach combines ex vivo organoid efficacy testing with mathematical modeling of the results. The study utilized a validated phenotypic approach called Therapeutically Guided Multidrug Optimization (TGMO) to identify optimized drug combinations (ODC) that showed low-dose synergistic effects in 3D human CRC models. The ODCs were validated using patient-derived organoids (PDO) from both primary and metastatic CRC cases. Molecular characterization of the CRC material was performed using whole-exome sequencing and RNAseq. In PDO from patients with liver metastases, the identified ODCs demonstrated significant inhibition of cell viability, outperforming the standard CRC chemotherapy (FOLFOXIRI) administered at clinical doses. Additionally, patient-specific ODCs based on TGMO showed superior efficacy compared to the current chemotherapy standard of care. This approach enables the optimization of synergistic multi-drug combinations tailored to individual patients within a clinically relevant timeframe.
Dynamics of Career Intentions in a Medical Student Cohort: a Four-year Longitudinal Study
BMC Medical Education 23 (1), p.1-11.
This study examines the stability of medical students' career intentions over a four-year period and investigates the associations between unstable career intentions and students' characteristics. Two cohorts of medical students were surveyed annually from the end of their pre-clinical curriculum to graduation. The survey included measures of career intention, personality, coping strategies, empathy, and motives for becoming a physician. A score ranging from 0 to 10 was developed to quantify the instability of career intentions. The results showed that most students fell on a continuum between being firmly committed and undecided. Only a small proportion of students did not change their specialty intention over the four years, while another group changed every year. The study found that an intention to work in private practice in year 3 and the motive to care for patients were associated with more stable career intentions. The findings suggest that external factors may play a significant role in career decision-making and highlight the need for further research and support to assist students in making informed career choices.
2022
Robust Two-step Wavelet-based Inference for Time Series Models
Journal of the American Statistical Association, 117(540), 1996-2013.
Latent time series models such as (the independent sum of) ARMA(p, q) models with additional stochastic processes are increasingly used for data analysis in biology, ecology, engineering, and economics. Inference on and/or prediction from these models can be highly challenging: (i) the data may contain outliers that can adversely affect the estimation procedure; (ii) the computational complexity can become prohibitive when the time series are extremely large; (iii) model selection adds another layer of (computational) complexity; and (iv) solutions that address (i), (ii), and (iii) simultaneously do not exist in practice. This paper aims at jointly addressing these challenges by proposing a general framework for robust two-step estimation based on a bounded influence M-estimator of the wavelet variance. We first develop the conditions for the joint asymptotic normality of the latter estimator, thereby providing the necessary tools to perform (direct) inference for scale-based analysis of signals. Taking advantage of the model-independent weights of this first-step estimator, we then develop the asymptotic properties of two-step robust estimators using the framework of the generalized method of wavelet moments (GMWM). Simulation studies illustrate the good finite sample performance of the robust GMWM estimator and applied examples highlight the practical relevance of the proposed approach.
Scale-wise Variance Minimization for Optimal Virtual Signals
IEEE Transactions on Signal Processing, 70, 5320-5333.
The increased use of low-cost gyroscopes within inertial sensors for navigation purposes, among others, has led to a considerable amount of research on improving their measurement precision. Aside from developing methods that model and account for the deterministic and stochastic components that contribute to the measurement errors of these devices, an approach that has been put forward in recent years is to make use of arrays of such sensors in order to combine their measurements, thereby reducing the impact of individual sensor noise. Nevertheless, combining these measurements is not straightforward given the complex stochastic nature of these errors and, although some solutions have been suggested, these are limited to specific settings and do not deliver solutions in more general and common circumstances. Hence, in this work we put forward a non-parametric method that makes use of the wavelet cross-covariance at different scales to combine the measurements coming from an array of gyroscopes in order to deliver an optimal measurement signal, without needing any assumption on the processes underlying the individual error signals. We also study an appropriate non-parametric approach for the estimation of the asymptotic covariance matrix of the wavelet cross-covariance estimator, which has important applications beyond the scope of this work. The theoretical properties of the proposed approach are studied and are supported by simulations and real applications, indicating that this method represents an appropriate and general tool for the construction of optimal virtual signals that are particularly relevant for arrays of gyroscopes. Moreover, our results can support the creation of optimal signals for types of inertial sensors other than gyroscopes, as well as for redundant measurements in domains other than navigation.
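To give a concrete sense of the combination step described above, here is a single-scale R sketch of a minimum-variance combination of correlated error signals on simulated data; the paper performs the analogous computation scale by scale via the wavelet cross-covariance, which is not reproduced here.

```r
# Minimum-variance combination of K correlated sensor error signals:
# the weights w = S^{-1} 1 / (1' S^{-1} 1) minimize Var(w' e) subject
# to sum(w) = 1. Here S is a plain sample covariance matrix, as a
# one-scale stand-in for the scale-wise procedure of the paper.
min_var_weights <- function(S) {
  w <- solve(S, rep(1, nrow(S)))
  w / sum(w)
}

set.seed(1)
K <- 4; n <- 5000
common <- rnorm(n, sd = 0.3)  # disturbance shared across the array
E <- sapply(1:K, function(k) common + rnorm(n, sd = k / 2))
w <- min_var_weights(cov(E))
virtual <- drop(E %*% w)      # combined "virtual" signal
c(best_single_var = min(apply(E, 2, var)), virtual_var = var(virtual))
```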
Dexamethasone Exposure in Normal-weight and Obese Hospitalized COVID-19 Patients: An Observational Exploratory Trial
Clinical and Translational Science, 15(7), 1796-1804.
During the latest pandemic, the RECOVERY study showed the benefits of dexamethasone (DEX) use in COVID-19 patients. Obesity has been proven to be an independent risk factor for severe forms of infection, but little information is available in the literature regarding DEX dose adjustment according to body weight. We conducted a prospective, observational, exploratory study at Geneva University Hospitals to assess the impact of weight on DEX pharmacokinetics (PK) in normal-weight versus obese COVID-19 hospitalized patients.
Students’ intentions to practice primary care are associated with their motives to become doctors: a longitudinal study
BMC Medical Education 22 (1), p.1-10.
Background: Medical schools can contribute to addressing the insufficient primary care physician workforce by influencing students’ career preferences. Primary care career choice evolves between matriculation and graduation and is influenced by several individual and contextual factors. This study explored the longitudinal dynamics of primary care career intentions and the association of students’ motives for becoming doctors with these intentions in a cohort of undergraduate medical students followed over a four-year period. Methods: The sample consisted of medical students from two classes recruited into a cohort study during their first academic year, who completed a yearly survey over a four-year period from their third (end of pre-clinical curriculum) to their sixth (before graduation) academic year. Main outcome measures were students’ motives for becoming doctors (ten motives rated on a 6-point scale) and career intentions (categorized into primary care, non-primary care, and undecided). Population-level flows of career intentions were investigated descriptively. Changes in the rating of motives over time were analyzed using Wilcoxon tests. Two generalized linear mixed models were used to estimate which motives were associated with primary care career intentions. Results: The sample included 217 students (60% females). Career intentions mainly evolved during clinical training, with smaller changes at the end of pre-clinical training. The proportion of students intending to practice primary care increased over time from 12.8% (year 3) to 24% (year 6). Caring for patients was the most highly rated motive for becoming a doctor. The importance of the motives cure diseases, saving lives, and vocation decreased over time. Primary care career intentions were positively associated with the motives altruism and private practice, and negatively associated with the motives prestige, academic interest and cure diseases. Conclusion: Our study indicates that career intentions are not fixed and change mainly during clinical training, supporting the influence of clinical experiences on career-related choices. The impact of students’ motives on primary care career choice suggests strategies to increase the attractiveness of this career, such as reinforcing students’ altruistic values and increasing the academic recognition of primary care.
Evidence of Antagonistic Predictive Effects of miRNAs in Breast Cancer Cohorts Through Data-Driven Networks
Scientific reports, 12(5166), p.1-16.
Dysregulation of non-coding micro RNAs (miRNAs) seems to play an important role in the pathways involved in breast cancer occurrence and progression. In different studies, opposite functions may be assigned to the same miRNA, either promoting the disease or protecting from it. Our research tackles the following issues: (i) why aren’t there any concordant findings in many research studies regarding the role of miRNAs in the progression of breast cancer? (ii) could a miRNA have either an activating effect or an inhibiting one in cancer progression according to the other miRNAs with which it interacts? For this purpose, we analyse the AHUS dataset made available on the ArrayExpress platform by Haakensen et al. The breast tissue specimens were collected over 7 years between 2003 and 2009. miRNA-expression profiling was obtained for 55 invasive carcinomas and 70 normal breast tissue samples. Our statistical analysis is based on a recently developed model and feature selection technique which, instead of selecting a single model (i.e. a unique combination of miRNAs), delivers a set of models with equivalent predictive capabilities that allows one to interpret and visualize the interaction of these features. As a result, we discover a set of 112 indistinguishable models (in a predictive sense), each with 4 or 5 miRNAs. Within this set, by comparing the model coefficients, we are able to identify three classes of miRNA: (i) oncogenic miRNAs; (ii) protective miRNAs; (iii) undefined miRNAs, which can play both an oncogenic and a protective role according to the network with which they interact. These results shed new light on the biological action of miRNAs in breast cancer and may contribute to explaining why, in some cases, different studies attribute opposite functions to the same miRNA.
The Variance Inflation Factor to Account for Correlations in Likelihood Ratio Tests: Deformation Analysis with Terrestrial Laser Scanners
Journal of Geodesy, 96(11), 1-18.
The measurement noise of a terrestrial laser scanner (TLS) is correlated. Neglecting those correlations affects the dispersion of the parameters when the TLS point clouds are mathematically modelled: statistical tests for the detection of outliers or deformation become misleading. Accounting for correlations is thus mandatory to avoid unfavourable decisions. Unfortunately, fully populated variance-covariance matrices (VCM) are often associated with a computational burden. To face that challenge, one answer is to rescale a diagonal VCM with a simple and physically justifiable variance inflation factor (VIF). Originally developed for a short-range correlation model, we extend the VIF to account for long-range dependence coming from, for example, atmospheric turbulent effects. The validation of the VIF is performed for the congruency test for deformation with Monte Carlo simulations. Our real application uses data from a bridge under load.
2021
Granger-Causal Testing for Irregularly Sampled Time Series with Application to Nitrogen Signaling in Arabidopsis
Bioinformatics, 37(16), p.2450-2460.
Identification of system-wide causal relationships can contribute to our understanding of long-distance, intercellular signalling in biological organisms. Dynamic transcriptome analysis holds great potential to uncover coordinated biological processes between organs. However, many existing dynamic transcriptome studies are characterized by sparse and often unevenly spaced time points that make the identification of causal relationships across organs analytically challenging. Application of existing statistical models, designed for regular time series with abundant time points, to sparse data may fail to reveal biologically significant, causal relationships. With increasing research interest in biological time series data, there is a need for new statistical methods that are able to determine causality within and between time series data sets. Here, a statistical framework was developed to identify (Granger) causal gene-gene relationships of unevenly spaced, multivariate time series data from two different tissues of Arabidopsis thaliana in response to a nitrogen signal.
A Two-sample Nonparametric Test for Circular Data – its Exact Distribution and Performance
Sankhya B, 83, p.140-166.
A nonparametric test labelled the ‘Rao Spacing-frequencies test’ is explored and developed for testing whether two circular samples come from the same population. Its exact distribution and performance relative to comparable tests, such as the Wheeler-Watson test and the Dixon test in small samples, are discussed. Although this test statistic is shown to be asymptotically normal, as one would expect, this large sample distribution does not provide satisfactory approximations for small to moderate samples. Exact critical values for small samples are obtained and tables provided here using combinatorial techniques, and asymptotic critical regions are assessed against these. For moderate sample sizes in between, i.e., when the samples are too large for combinatorial techniques to be computationally feasible and yet the asymptotic regions do not provide a good approximation, we provide a simple Monte Carlo procedure that gives very accurate critical values. As is well known, most of the usual rank-based tests are not applicable in the context of circular data since the values of such ranks depend on the arbitrary choice of origin and the sense of rotation used (clockwise or anti-clockwise). Tests that are invariant under the group of rotations depend on the data through the so-called ‘spacing frequencies’, the frequencies of one sample that fall in between the spacings (or gaps) made by the other. The Wheeler-Watson, Dixon, and the proposed Rao tests are of this form and are explicitly useful for circular data, but they also have the added advantage of being valid and useful for comparing any two samples on the real line. Our study and simulations establish the ‘Rao spacing-frequencies test’ as a desirable, and indeed preferable, test in a wide variety of contexts for comparing two circular samples, and as a viable competitor even for data on the real line. Computational help for implementing any of these tests is made available online in the “TwoCircles” R package, which is part of this paper.
Polymeric Micelle Formulations for the Cutaneous Delivery of Sirolimus: A New Approach for the Treatment of Facial Angiofibromas in Tuberous Sclerosis Complex
International Journal of Pharmaceutics, 604, p.1-13.
Facial angiofibromas are benign tumors characteristic of tuberous sclerosis complex. The disease involves the mTOR pathway and the cutaneous manifestation responds to topical treatment with sirolimus (SIR). However, there are no approved topical SIR products and extemporaneous formulations have been sub-optimal. The aims of this study were (i) to develop aqueous formulations of SIR loaded in polymeric micelles prepared using D-α-tocopherol polyethylene glycol 1000 succinate (TPGS) and (ii) to use the cutaneous biodistribution method, in conjunction with a new statistical approach, to investigate the feasibility of SIR delivery to the viable epidermis. Optimized micelle solutions and hydrogels (0.2%) were developed and stable at 4 °C for at least 6 and 3 months, respectively. Cutaneous delivery experiments (infinite and finite dose) using porcine skin demonstrated that both formulations increased SIR cutaneous bioavailability as compared to the control (ointment 0.2%). Moreover, studies with the micellar hydrogel 0.2% demonstrated SIR deposition in the viable epidermis with no transdermal permeation. These encouraging results confirmed that polymeric micelles enabled development of aqueous SIR formulations capable of targeted epidermal delivery. Furthermore, the cutaneous biodistribution provided a detailed insight into drug bioavailability in the different skin compartments that could complement/explain clinical observations of formulation efficacy.
Empirical Predictive Modeling Approach to Quantifying Social Vulnerability to Natural Hazards
Annals of the American Association of Geographers, 111(5), p.1559-1583.
Conventionally, natural hazard scholars quantify social vulnerability based on social indicators to manifest the extent to which locational communities are susceptible to adverse impacts of natural hazard events and are prone to limited or delayed recoveries. However, they usually overlook the different geographical distributions of social vulnerability at different hazard intensities and in distinct response and recovery phases. In addition, conventional approaches to quantifying social vulnerability usually establish the relationship between social indicators and social vulnerability with little evidence from empirical data science. In this article, we introduce a general framework of a predictive modeling approach to quantifying social vulnerability given intensity during a response or recovery phase. We establish the relationship between social indicators and social vulnerability with an empirical statistical method and historical data on hazard effects. The new metric of social vulnerability given an intensity measure can be coupled with hazard maps for risk analysis to predict adverse impacts or poor recoveries associated with future natural hazard events. An example based on data on casualties, house damages, and peak ground accelerations of the 2015 Gorkha earthquake in Nepal and pre-event social indicators at the district level shows that the proposed approach can be applied for vulnerability quantification and risk analysis in terms of specific hazard impacts.
Non Applicability of Validated Predictive Models for Intensive Care Admission and Death of COVID-19 Patients in a Secondary Care Hospital in Belgium
Journal of Emergency and Critical Care Medicine, 5(22), p.1-13.
Background: Simple and reliable predictive scores for intensive care admissions and death based on clinical data in coronavirus disease 2019 (COVID-19) patients are numerous but may be misleading. Predictive scores for admission to the intensive care unit (ICU) or death based on clinical and easily affordable laboratory data are still needed in secondary hospitals and hospitals in developing countries that do not have high-performance laboratories. Methods: The goal of this study is to verify whether a recently published predictive score developed on a large scale in China (Liang score) can be used on patients coming from a Belgian population catchment area. Monocentric retrospective cohort study of 66 patients with known COVID-19 disease, run from early March to the end of May in Clinique Saint-Pierre Ottignies, a secondary care hospital in Belgium. The outcomes of the study are (I) admission to the ICU and (II) death. All patients admitted to the Emergency Department with a positive RT-PCR SARS-CoV-2 test were included in the study. Routine clinical and laboratory data were collected at their admission and during their stay, as well as chest X-rays and CT-scans. The Liang score was used as benchmark. Logistic regression models were used to develop predictive models. Results: The Liang score performs poorly, both in terms of admission to intensive care and in terms of death. In our cohort, it appears that lactate dehydrogenase (LDH) above 579 UI/L and venous lactate above 3.02 mmol/L may be considered good predictive biological factors for ICU admission. With regard to death risk, a neutrophil-lymphocyte ratio (NLR) above 22.1, tobacco abuse status and respiratory impairment appear to be relevant predictive factors. Conclusions: Firstly, a promising score from a large-scale study in China appears to perform poorly when applied to a European cohort, whether to predict admission to the ICU or death. Secondly, biological features that are quite significant for admission to the ICU, such as LDH or venous lactate, cannot predict death. Thirdly, simple and affordable variables such as LDH, LDH + sex, or LDH + sex + venous lactate have a very good sensitivity and an acceptable specificity for ICU admission.
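As a generic illustration of how predictive scores of the kind "LDH + sex + venous lactate" are built with logistic regression — on simulated data with hypothetical coefficients, not the study's cohort or its fitted model — here is a short R sketch:

```r
# Simulated cohort with hypothetical covariates mirroring those named
# in the abstract (LDH, venous lactate, sex); coefficients are made up.
set.seed(1)
n <- 66
d <- data.frame(
  ldh     = rlnorm(n, log(400), 0.5),  # hypothetical LDH, UI/L
  lactate = rlnorm(n, log(2), 0.4),    # hypothetical venous lactate, mmol/L
  sex     = rbinom(n, 1, 0.5)
)
d$icu <- rbinom(n, 1, plogis(-8 + 0.008 * d$ldh + 0.6 * d$lactate))

# Fit the predictive model and score a new (hypothetical) patient.
fit <- glm(icu ~ ldh + sex + lactate, data = d, family = binomial)
summary(fit)$coefficients
predict(fit, newdata = data.frame(ldh = 600, sex = 1, lactate = 3.1),
        type = "response")  # predicted ICU-admission probability
```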
2020
Wavelet-Based Moment-Matching Techniques for Inertial Sensor Calibration
IEEE Transactions on Instrumentation & Measurement, 69(10), p.7542-7551.
The task of inertial sensor calibration has required the development of various techniques to take into account the sources of measurement error coming from such devices. The calibration of the stochastic errors of these sensors has been the focus of an increasing amount of research, in which the method of reference has been the so-called “Allan variance (AV) slope method” which, in addition to not having appropriate statistical properties, requires a subjective input that makes it prone to mistakes. To overcome this, recent research has started proposing “automatic” approaches where the parameters of the probabilistic models underlying the error signals are estimated by matching functions of the AV or wavelet variance with their model-implied counterparts. However, despite the increased use of such techniques, there has been no study or clear direction for practitioners on which approach is optimal for the purpose of sensor calibration. This article, for the first time, formally defines the class of estimators based on this technique and puts forward theoretical and applied results that, comparing estimators in this class, suggest the use of the Generalized Method of Wavelet Moments (GMWM) as an optimal choice. In addition to analytical proofs, experiment-driven Monte Carlo simulations demonstrate the superior performance of this estimator. Further analysis of the error signal from a gyroscope is also provided to further motivate performing such analyses, as real-world observed error signals may show significant deviation from manufacturer-provided error models.
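A minimal R sketch of the moment-matching idea behind this class of estimators: the empirical Haar wavelet variance of a simulated white noise plus random walk signal is matched to its model-implied counterpart by least squares on the log scale (a simple stand-in for the estimator's weighting matrix; the model-implied expressions below follow the filter convention used in the haar_wv() function, under which they can be verified directly).

```r
haar_wv <- function(x) {
  n <- length(x); cs <- c(0, cumsum(x)); J <- floor(log2(n)) - 1
  sapply(1:J, function(j) {
    tau <- 2^j; t0 <- 1:(n - tau + 1)
    w <- (2 * cs[t0 + tau / 2] - cs[t0] - cs[t0 + tau]) / tau
    mean(w^2)
  })
}

set.seed(1)
n <- 2^13
x <- rnorm(n, sd = 2) + cumsum(rnorm(n, sd = 0.01))  # WN + random walk
nu_hat <- haar_wv(x)
tau <- 2^seq_along(nu_hat)

# Model-implied Haar wavelet variance of white noise (variance sigma2)
# plus random walk (innovation variance gamma2):
#   nu_j = sigma2 / tau_j + gamma2 * (tau_j^2 + 2) / (12 * tau_j)
nu_model <- function(p, tau)
  exp(p[1]) / tau + exp(p[2]) * (tau^2 + 2) / (12 * tau)

# Least-squares matching on the log scale.
fit <- optim(c(0, -5), function(p)
  sum((log(nu_hat) - log(nu_model(p, tau)))^2))
exp(fit$par)  # estimates of (sigma2, gamma2); simulated truth: (4, 1e-4)
```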
Generalized Additive Models: An Efficient Method for Short-Term Energy Prediction in Office Buildings
Energy, 213, p.118834.
In 2018, commercial buildings accounted for nearly 18.2% of the total energy consumption in the USA, making them a significant contributor to greenhouse gas emissions (see, e.g. [1]). Specifically, office buildings accounted for 14% of the energy usage by the commercial sector. Hence, their energy performance has to be closely monitored and evaluated to address the critical issue of greenhouse gas emissions. Several data-driven statistical and machine learning models have been developed to assess the energy performance of office buildings based on historical data. While these methods often provide reliable prediction accuracy, they typically offer little interpretation of the relationships between variables and their impacts on energy consumption. However, model interpretability is essential to understand, control and manage the variables affecting energy consumption; such a feature is therefore crucial and should be emphasized in the modeling procedure in order to obtain reliable and actionable results. For this reason, we use generalized additive models as a flexible, efficient and interpretable alternative to existing approaches for modeling and predicting the energy consumption of office buildings. To demonstrate the advantages of this approach, we consider an application to energy consumption data of HVAC systems in a mixed-use multi-tenant office building in Chicago, Illinois, USA. We present the building characteristics and various influential variables, based on which we construct a generalized additive model. We compare the prediction performance, using various commonly used calibration metrics, between the proposed model and existing methods, including support vector machines as well as classification and regression trees. We find that the proposed method outperforms the existing approaches, especially in terms of short-term prediction.
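A brief sketch of the modeling approach in R with the mgcv package, on simulated data (the building data are not reproduced here): smooth, interpretable effects of outdoor temperature and hour of day on hourly energy use.

```r
# Generalized additive model for short-term energy prediction:
# simulated hourly data with a cooling-load effect of temperature
# and a daily cycle; variable names and effects are hypothetical.
library(mgcv)

set.seed(1)
n <- 24 * 200                              # 200 days of hourly data
hour <- rep(0:23, 200)
temp <- 10 + 12 * sin(2 * pi * (1:n) / (24 * 365)) + rnorm(n, sd = 3)
load <- 50 + 20 * pmax(temp - 18, 0) +     # cooling load above 18 C
        10 * sin(2 * pi * hour / 24) + rnorm(n, sd = 5)
d <- data.frame(load, temp, hour)

# s(temp) is a smooth temperature effect; bs = "cc" makes the hour
# effect a cyclic smooth so 23:00 joins up with 00:00.
fit <- gam(load ~ s(temp) + s(hour, bs = "cc"), data = d)
summary(fit)$r.sq
predict(fit, newdata = data.frame(temp = 25, hour = 14))
```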
Worldwide Predictions of Earthquake Casualty Rates with Seismic Intensity Measure and Socioeconomic Data: a Fragility-based Formulation
Natural Hazards Review, 21(2), p.1-40.
This paper presents a fragility-based Bayesian formulation to predict earthquake casualty rates for countries worldwide. The earthquake casualty rate of a community is defined as the probability that a person in the community is killed or injured given an intensity measure of the earthquake at the site of the community. Casualty data of 902 earthquakes worldwide from 2013 to 2017, information on population distributions, and the national socioeconomic data are used to calibrate the model. A model based on data from 2013 to 2016 is used to predict casualty rates of earthquakes in 2017. The comparisons of the model predictions with the actual observations show good agreement. With the fragility-based formulation, the proposed model can be fully coupled with seismic hazard maps for risk analysis. An example is shown in this paper to apply the model calibrated with the full data set with reference to a worldwide seismic hazard map to conduct a fully coupled seismic risk analysis and predict the expected casualty rates and counts due to earthquakes in future years for countries worldwide.
Targeting Hallmarks of Cancer with a Food-system–based Approach
Nutrition, 69 (110563), p.1-23.
Although extensive resources are dedicated to the development and study of cancer drugs, the cancer burden is expected to rise by about 70% over the next two decades. This highlights a critical need to develop effective, evidence-based strategies for countering the global rise in cancer incidence. Except in high-risk populations, cancer drugs are not generally suitable for use in cancer prevention owing to potential side effects and substantial monetary costs (Sporn, 2011). There is overwhelming epidemiological and experimental evidence that the dietary bioactive compounds found in whole plant-based foods have significant anticancer and chemopreventative properties. These bioactive compounds often exert pleiotropic effects and act synergistically to simultaneously target multiple pathways of cancer. Common bioactive compounds in fruits and vegetables include carotenoids, glucosinolates, and polyphenols. These compounds have been shown to target multiple hallmarks of cancer in vitro and in vivo and potentially to address the diversity and heterogeneity of certain cancers. Although many studies have been conducted over the past 30 years, the scientific community has still not reached a consensus on exactly how the benefit of bioactive compounds in fruits and vegetables can be best harnessed to help reduce the risk for cancer. Different stages of the food processing system, from "farm-to-fork," can affect the retention of bioactive compounds and thus the chemopreventative properties of whole foods, and there are opportunities to improve the handling of foods throughout these stages in order to best retain their chemopreventative properties. Potential target stages include, but are not limited to, pre- and postharvest management, storage, processing, and consumer practices. Therefore, there is a need for a comprehensive food-system-based approach that not only takes into account the effects of the food system on the anticancer activity of whole foods, but also explores solutions for consumers, policymakers, processors, and producers. Improved knowledge about this area of the food system can help us adjust farm-to-fork operations in order to consistently and predictably deliver desired bioactive compounds, thus better utilizing them as invaluable chemopreventative tools in the fight to reduce the growing burden of cancer worldwide.
2019
Simulation-based Bias Correction Methods for Complex Models
Journal of the American Statistical Association, 114(525), p.146-157.
Along with the ever-increasing data size and model complexity, an important challenge frequently encountered in constructing new estimators or in implementing a classical one such as the maximum likelihood estimator is the computational aspect of the estimation procedure. To carry out estimation, approximate methods such as pseudo-likelihood functions or approximated estimating equations are increasingly used in practice as these methods are typically easier to implement numerically, although they can lead to inconsistent and/or biased estimators. In this context, we extend and provide refinements on the known bias correction properties of two simulation-based methods, respectively indirect inference and bootstrap, each with two alternatives. These results allow one to build a framework defining simulation-based estimators that can be implemented for complex models. Indeed, based on a biased or even inconsistent estimator, several simulation-based methods can be used to define new estimators that are both consistent and with reduced finite sample bias. This framework includes the classical method of indirect inference for bias correction without requiring the specification of an auxiliary model. We demonstrate the equivalence between one version of the indirect inference and the iterative bootstrap, both of which correct sample biases up to the order n⁻³. The iterative method can be thought of as a computationally efficient algorithm to solve the optimization problem of the indirect inference. Our results provide different tools to correct the asymptotic as well as finite sample biases of estimators and give insight on which method should be applied for the problem at hand. The usefulness of the proposed approach is illustrated with the estimation of robust income distributions and generalized linear latent variable models. Supplementary materials for this article are available online.
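A minimal R sketch of the iterative bootstrap idea on a toy problem: starting from a deliberately biased estimator (the normal-variance MLE, which underestimates by a factor (n-1)/n), the iteration shifts the estimate until the simulated expectation of the biased estimator matches the observed one. This is an illustration of the mechanism, not the paper's applications.

```r
set.seed(1)
n <- 10
x <- rnorm(n, sd = 2)
var_mle <- function(x) mean((x - mean(x))^2)  # biased initial estimator

ib <- function(theta_obs, n, B = 500, iter = 25) {
  theta <- theta_obs
  for (k in 1:iter) {
    # bias-correct by matching the simulated expectation of the biased
    # estimator, evaluated at the current theta, to theta_obs
    sims <- replicate(B, var_mle(rnorm(n, sd = sqrt(theta))))
    theta <- theta + (theta_obs - mean(sims))
  }
  theta
}

# The fixed point recovers the unbiased estimate (up to Monte Carlo error).
c(biased = var_mle(x), iterative_bootstrap = ib(var_mle(x), n),
  unbiased = var(x))
```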
Multivariate Signal Modeling with Applications to Inertial Sensor Calibration
IEEE Transactions on Signal Processing, 67(19), p.5143-5152.
The common approach to inertial sensor calibration has been to model the stochastic error signals of individual sensors independently, whether as components of a single inertial measurement unit (IMU) in different directions or arrayed in the same direction for redundancy. For this purpose, research in this domain has focused on proposing various methods to improve the estimation of these models from both a computational and a statistical point of view. However, the separate calibration of the individual sensors is unable to take into account the dependence between them, which can have an important impact on the precision of the navigation systems. In this paper, we develop a new approach to simultaneously model the individual signals and the dependence between them by studying the quantity called wavelet cross-covariance and using it to extend the application of the Generalized Method of Wavelet Moments. This new method can be used in other settings for time series modeling, especially in cases where the dependence among signals may be hard to detect. Moreover, in the field of inertial sensor calibration, this approach can deliver important contributions, among which the possibility of testing dependence between sensors, integrating their dependence within the navigation filter, and constructing an optimal virtual sensor that can be used to simplify and improve navigation accuracy. The advantages of this method and its usefulness for inertial sensor calibration are highlighted through a simulation study and an applied example with a small array of XSens MTi-G IMUs.
A Multisignal Wavelet Variance-based Framework for Inertial Sensor Stochastic Error Modeling
IEEE Transactions on Instrumentation and Measurement, 68(12), p.4924-4936.
The calibration of low-cost inertial sensors has become increasingly important over the last couple of decades, especially when dealing with sensor stochastic errors. This procedure is commonly performed on a single error measurement from an inertial sensor taken over a certain amount of time, although it is extremely frequent for different replicates to be taken for the same sensor, thereby delivering important information which is often left unused. To address this problem, this paper presents a general wavelet variance-based framework for multisignal inertial sensor calibration, which can improve the modeling and model selection procedures for sensor stochastic errors using all replicates from a calibration procedure and makes it possible to understand the properties, such as stationarity, of these stochastic errors. Applications using microelectromechanical system inertial measurement units confirm the importance of this new framework, and a new graphical user interface makes these tools available to the general user. The latter is developed based on an R package called mgmwm and allows the user to select a type of sensor for which different replicates are available and to easily make use of the approaches presented in this paper in order to carry out the appropriate calibration procedure.
Predicting Fatality Rates due to Earthquakes Accounting for Community Vulnerability
Earthquake Spectra, 35(2), p.513-536.
The existing prediction models for earthquake fatalities usually require a detailed building inventory that might not be readily available. In addition, existing models tend to overlook the socioeconomic characteristics of the communities of interest as well as zero-fatality data points. This paper presents a methodology based on a probabilistic zero-inflated beta regression model to predict earthquake fatality rates given the geographic distribution of earthquake intensities, with data reflecting community vulnerability. As an illustration, the prediction model is calibrated using fatality data from 61 earthquakes affecting Taiwan from 1999 to 2016, as well as information on the socioeconomic and environmental characteristics of the affected communities. Using a local seismic hazard map, the calibrated prediction model is then used in a seismic risk analysis for Taiwan that predicts the expected fatality rates and counts caused by earthquakes in future years.
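As a minimal illustration of the model class, the R sketch below writes down a zero-inflated beta log-likelihood with a logit link for the beta mean (variable names and parameterization are illustrative assumptions, not the paper's exact specification):

    # y: fatality rates in [0, 1) with a point mass pi0 at zero;
    # X: design matrix including an intercept; phi: beta precision parameter.
    zib_nll <- function(par, y, X) {
      pi0  <- plogis(par[1])              # probability of a zero-fatality event
      beta <- par[2:(1 + ncol(X))]
      phi  <- exp(par[length(par)])
      mu   <- plogis(X %*% beta)          # beta mean via logit link
      ll   <- numeric(length(y))
      z    <- y == 0
      ll[z]  <- log(pi0)
      ll[!z] <- log(1 - pi0) +
        dbeta(y[!z], (mu * phi)[!z], ((1 - mu) * phi)[!z], log = TRUE)
      -sum(ll)
    }
    # fit <- optim(rep(0, ncol(X) + 2), zib_nll, y = y, X = X, method = "BFGS")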
2018
Parametric Inference for Index Functionals
Econometrics, invited paper for the special issue Econometrics and Income Inequality, 6(2), 22.
In this paper, we study the finite sample accuracy of confidence intervals for index functionals built via the parametric bootstrap, in the case of inequality indices. To estimate the parameters of the assumed parametric data generating distribution, we propose a Generalized Method of Moments estimator that targets the quantity of interest, namely the considered inequality index. Its primary advantage is that the scale parameter does not need to be estimated to perform the parametric bootstrap, since inequality measures are scale invariant. The very good finite sample coverages found in a simulation study suggest that this feature provides an advantage over the parametric bootstrap based on the maximum likelihood estimator. We also find that, overall, the parametric bootstrap provides more accurate inference than its nonparametric or semiparametric counterparts, especially for heavy tailed income distributions.
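As a brief illustration of the scale-invariance argument under a lognormal income model, where the Gini index equals 2Φ(σ/√2) − 1 and does not involve the scale (meanlog) parameter, a percentile parametric-bootstrap interval can be sketched in R as follows (an illustrative moment-style fit; the paper instead targets the index with a GMM estimator):

    set.seed(1)
    y <- rlnorm(500, meanlog = 0, sdlog = 1)       # synthetic income sample
    sigma_hat <- sd(log(y))                        # scale-free estimate of sigma
    gini <- function(s) 2 * pnorm(s / sqrt(2)) - 1 # closed-form lognormal Gini
    boot <- replicate(2000,
      gini(sd(log(rlnorm(length(y), 0, sigma_hat)))))  # meanlog irrelevant
    c(estimate = gini(sigma_hat), quantile(boot, c(0.025, 0.975)))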
Use of a New Online Calibration Platform with Applications to Inertial Sensors
IEEE Aerospace and Electronic Systems Magazine, 33(8), p.30-36.
In many fields, ranging from economics to physics, it is common to deal with measurements that are taken over time. These measurements are often explained by known external factors that describe a large part of their behavior. For example, the evolution of the unemployment rate over time can be explained by the behavior of the gross domestic product (the external factor in this case). However, in many cases the external factors are not enough to explain the entire behavior of the measurements, and it is necessary to use so-called stochastic models (or probabilistic models) that describe how the measurements depend on each other through time (i.e., the measurements are explained by the behavior of the previous measurements themselves). The treatment and analysis of the latter kind of behavior is known by various names, such as time series analysis or signal processing. In the majority of cases, the goal of this analysis is to estimate the parameters of the underlying models which, in some sense, explain how and to what extent the observations depend on each other through time.
Is Nonmetastatic Cutaneous Melanoma Predictable through Genomic Biomarkers?
Melanoma Research, 28(1), p.21-29.
Cutaneous melanoma is a highly aggressive skin cancer whose treatment and prognosis are critically affected by the presence of metastasis. In this study, we address the following issue: which gene transcripts, and what kind of interactions between them, can allow one to discriminate nonmetastatic from metastatic melanomas with a high level of accuracy? We carry out a meta-analysis on the first gene expression set of the Leeds melanoma cohort, as made available online on 11 May 2016 through the ArrayExpress platform with MicroArray Gene Expression number 4725. According to the authors, primary melanoma mRNA expression was measured in 204 tumours using an Illumina DASL HT12 4 whole-genome array. The tumour transcripts were selected through a recently proposed predictive-based regression algorithm for gene-network selection. A set of 64 equivalent models, each including only two gene transcripts, was identified, each of which was sufficient to accurately classify primary tumours into metastatic and nonmetastatic melanomas. The sensitivity and specificity of the genomic-based models were, respectively, 4% (95% confidence interval: 0.11-21.95%) and 99% (95% confidence interval: 96.96-99.99%). The very high specificity, coupled with a significantly large positive likelihood ratio, leads to a conclusive increase in the likelihood of disease when these biomarkers are present in the primary tumour. In conjunction with other highly sensitive methods, this approach can aspire to be part of future standard diagnosis methods for the screening of metastatic cutaneous melanoma. The small dimension of the selected transcript models enables easy handling of large-scale genomic testing procedures. Moreover, some of the selected transcripts have an understandable link with what is known about cutaneous melanoma oncogenesis, opening a window on the molecular pathways underlying the metastatic process of this disease.
A Computationally Efficient Framework for Automatic Inertial Sensor Calibration
IEEE Sensors Journal, 18(4), p.1636-1646.
The calibration of (low-cost) inertial sensors has become increasingly important over the past years, since their use has grown exponentially in many applications, ranging from unmanned aerial vehicle navigation to 3-D animation. However, this calibration procedure is often quite problematic since, aside from compensating for deterministic measurement errors due to physical phenomena such as dynamics or temperature, the stochastic signals issued from these sensors in static settings have a complex spectral structure, and the methods available to estimate the parameters of the resulting models are either unstable, computationally intensive, and/or statistically inconsistent. This paper presents a new software platform for the calibration of the stochastic component of inertial sensor measurement errors based on the generalized method of wavelet moments, which provides a computationally efficient, flexible, user-friendly, and statistically sound tool to estimate and select from a wide range of complex models. Moreover, all of this is also possible within a robust framework, allowing the user to perform sensor calibration when the data are affected by outliers. The software is developed within the open-source statistical software R and relies on the C++ language, allowing it to achieve high computational performance.
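A hypothetical usage sketch of the kind of workflow enabled by a gmwm-style R package in this line of work is shown below (the model-building syntax is assumed here and may differ across versions of the software):

    # Assumed interface: composite models built by summing latent components,
    # then estimated from a static error recording via the GMWM.
    library(gmwm)
    # imu_gyro: a numeric vector of static gyroscope error measurements (assumed)
    model <- AR1() + WN() + RW()           # candidate composite stochastic model
    fit   <- gmwm(model, data = imu_gyro)  # estimate parameters via the GMWM
    summary(fit)
    plot(fit)                              # sample vs. implied wavelet variance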
2017
A Study of the Allan Variance for Constant-Mean Nonstationary Processes
IEEE Signal Processing Letters, 24(8), p.1257-1260.
The Allan variance (AV) is a widely used quantity in areas focusing on error measurement as well as in the general analysis of variance for autocorrelated processes in domains such as engineering and, more specifically, metrology. The form of this quantity is widely used to detect noise patterns and indications of stability within signals. However, the properties of this quantity are not known for commonly occurring processes whose covariance structure is nonstationary and, in these cases, an erroneous interpretation of the AV could lead to misleading conclusions. This letter generalizes the theoretical form of the AV to some nonstationary processes, while remaining valid for weakly stationary processes. Simulation examples show how this new form can help to understand which nonstationary processes the AV is able to distinguish from the stationary cases, hence allowing for a better interpretation of this quantity in applied settings.
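For reference, a minimal R implementation of the non-overlapping AV at dyadic averaging times, consistent with the standard definition discussed in the letter, is:

    # Non-overlapping Allan variance at averaging times tau = 2^j.
    allan_variance <- function(x, J = floor(log2(length(x))) - 1) {
      sapply(seq_len(J), function(j) {
        tau  <- 2^j
        m    <- floor(length(x) / tau)                       # complete blocks
        ybar <- colMeans(matrix(x[seq_len(m * tau)], nrow = tau))  # block averages
        mean(diff(ybar)^2) / 2                               # AV at this tau
      })
    }
    # Example: for white noise the AV decays as 1/tau
    allan_variance(rnorm(2^14))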
2016
A Predictive based Regression Algorithm for Gene Network Selection
Frontiers in Genetics, Statistical Genetics and Methodology, 7(97), p.1-11.
Gene selection has become a common task in most gene expression studies. The objective of such research is often to identify the smallest possible set of genes that can still achieve good predictive performance. To do so, many of the recently proposed classification methods require some form of dimension reduction of the problem, finally provide a single model as an output and, in most cases, rely on the likelihood function in order to achieve variable selection. We propose a new prediction-based objective function that can be tailored to the requirements of practitioners and can be used to assess and interpret a given problem. Based on cross-validation techniques and the idea of importance sampling, our proposal scans low-dimensional models under the assumption of sparsity and, for each of them, estimates the objective function to assess its predictive power, in order to select among them. Two applications on cancer data sets and a simulation study show that the proposal compares favorably with competing alternatives such as, for example, the Elastic Net and the Support Vector Machine. Indeed, the proposed method not only selects smaller models with better, or at least comparable, classification errors, but also provides a set of selected models instead of a single one, allowing the construction of a network of possible models for a target prediction accuracy level.
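As a simplified stand-in for the proposed scan, the R sketch below ranks all two-transcript logistic models by cross-validated accuracy (df, with a binary 0/1 outcome y and gene expression columns, is a hypothetical data frame; plain CV accuracy replaces the paper's tailored objective function and importance-sampling search):

    cv_accuracy <- function(df, pair, K = 5) {
      folds <- sample(rep(seq_len(K), length.out = nrow(df)))
      preds <- numeric(nrow(df))
      for (k in seq_len(K)) {
        fit <- glm(reformulate(pair, response = "y"),
                   family = binomial, data = df[folds != k, ])
        preds[folds == k] <- predict(fit, df[folds == k, ], type = "response")
      }
      mean((preds > 0.5) == df$y)            # out-of-fold classification accuracy
    }
    pairs  <- combn(setdiff(names(df), "y"), 2, simplify = FALSE)
    scores <- vapply(pairs, function(p) cv_accuracy(df, p), numeric(1))
    head(pairs[order(scores, decreasing = TRUE)])   # best two-gene models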
Theoretical Limitations of Allan Variance-based Regression for Time Series Model Estimation
IEEE Signal Processing Letters, 23(5), p.597-601.
This letter formally proves the statistical inconsistency of the Allan variance-based estimation of latent (composite) model parameters, an issue that has not been sufficiently investigated even though the technique is still widely used in practice, especially within the engineering domain. Indeed, among others, this method is frequently used for inertial sensor calibration, which often deals with latent time series models, and practitioners in these domains are often unaware of its limitations. To prove the inconsistency of this method, we first provide a formal definition of it and subsequently deliver its theoretical properties, highlighting its limitations by comparing it with another statistically sound method.
Wavelet-based Improvements for Inertial Sensor Error Modeling
IEEE Transactions on Instrumentation and Measurement, 65(12), p.2693-2700.
The parametric estimation of stochastic error signals is a common task in many engineering applications, such as inertial sensor calibration. In the latter case, the error signals are often of a complex nature, and very few approaches are available to estimate the parameters of these processes. A frequently used approach for this purpose is maximum likelihood (ML), which is usually implemented through a Kalman filter and found via the expectation-maximization algorithm. Although the ML is a statistically sound and efficient estimator, its numerical instability has led to the use of alternative methods, the main one being the generalized method of wavelet moments (GMWM). The latter is a straightforward, consistent, and computationally efficient approach, which nevertheless loses statistical efficiency compared with the ML method. To narrow this gap, in this paper we show that the performance of the GMWM estimator can be enhanced by making use of model moments in addition to those provided by the vector of wavelet variances. The theoretical findings are supported by simulations that highlight how the new estimator not only improves the finite sample performance of the GMWM but also allows it to approach the statistical efficiency of the ML. Finally, a case study with an inertial sensor demonstrates how useful this development is for the purposes of sensor calibration.
Discussion on Maximum Likelihood-based Methods for Inertial Sensor Calibration
IEEE Sensors Journal, 16(14), p.5522-5523.
This letter highlights some issues which were overlooked in a recently published paper entitled "Maximum Likelihood Identification of Inertial Sensor Noise Model Parameters". The latter paper does not consider existing alternative methods which specifically tackle this issue in a possibly more direct manner and, although it remains a generally valid proposal, it does not appear to improve on the earlier proposals. Finally, a simulation study rectifies the poor results reported in the same publication for an estimator of reference.
Member Plan Choice and Migration in Response to Changes in Member Premiums after Massachusetts Health Insurance Reform
North American Actuarial Journal, p.1-16.
In 2006 Massachusetts implemented a substantial reform of its health insurance market that included a new program for uninsured individuals with income between 100% of Federal Poverty (the upper limit for state Medicaid benefits) and 300% of Federal Poverty. Enrollment was compulsory for all citizens because of a mandate. Consumers who enrolled in this program, which offered generous benefits with low copays, received graduated subsidies depending on their income. Five insurers were contracted to underwrite the program, and consumers were able to choose their insurer. Insurers bid annually, and the member contribution was set according to an affordability schedule for the lowest-bidding insurer. Consumers could choose from the range of insurers, but if they chose a plan other than the lowest cost, their contributions reflected the difference. Premiums were changed annually on July 1, and members were eligible to move to a different plan at this date; a number of members migrated each year. This study aims to quantify the effect of this premium-induced switching behavior. Prior studies of member switching behavior have looked at employer plans and estimated the elasticity of response to changes in member contributions. The Massachusetts environment is unique in that there is a mandate (so being uninsured is not an option) and members may choose their insurer but not their benefit plan. Thus a study of migration in Massachusetts is uniquely able to quantify the effect of price (contribution rates) on member switching behavior. We find an elasticity averaging −0.21 for 2013 (the last year of the study), somewhat lower in absolute value than the estimates from previous studies of employer populations. Elasticity has also been significantly increasing with time and appears to have at least doubled over the studied period (i.e., 2008–2013). Prior studies have estimated higher elasticities in the range −0.3 to −0.6. We found that the data contained many outliers in terms of both changes in contributions and percentage of members switching plans. The effect of outliers was moderated by the choice of robust regression models, which leads us to question whether other studies may have been affected by outliers and hence overestimated the elasticities.
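As a rough sketch of the outlier-robust estimation step, an elasticity can be read off the slope of a robust log-log regression in R (variable and data names here are hypothetical, not those of the study):

    # plan_years: hypothetical data frame with one row per plan-year, holding
    # switch_rate (share of members switching) and contribution (member premium).
    library(MASS)
    fit <- rlm(log(switch_rate) ~ log(contribution), data = plan_years)
    coef(fit)["log(contribution)"]   # robust elasticity estimate (log-log slope)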
Use of a Publicly Available Database to Determine the Impact of Diabetes on Length of Hospital Stay for Elective Orthopedic Procedures in California
Population Health Management, p.1-17.
In California, 1 in 3 hospital beds is occupied by an adult with diabetes. The aim of this study was to examine whether diabetes impacts length of stay (LOS) following common elective orthopedic procedures compared to nondiabetic individuals, as well as the performance of hospitals across California for these procedures. Using the Public Use California Patient Discharge Data Files for 2010-2012, the authors examined LOS for elective discharges for hip, spine, or knee surgery (n = 318,861) from the total population of all discharges (n = 11,476,073) for 309 hospitals across California. In all, 16% of discharges had a codiagnosis of diabetes. Unadjusted average LOS was 3.11 days without and 3.40 days with diabetes (mean difference 0.29 days, 95% confidence interval 0.27-0.31, P < 0.01). After adjusting for covariates, diabetes no longer resulted in a significant difference in LOS. However, the presence of common comorbidities did significantly impact LOS. Average LOS for patients with diabetes also varied widely by hospital, ranging between -50% and +100% of the mean LOS for all hospitals. Diabetes does not prolong LOS after orthopedic procedures unless comorbidities are present. Nevertheless, across California there is significant variation in LOS between individual hospitals, which may inform the decision-making process for prospective patients and payers.
2015
Automatic Identification and Calibration of Stochastic Parameters in Inertial Sensors
Journal of the Institute of Navigation, 62(4), p.265-272.
We present an algorithm for determining the nature of stochastic processes and their parameters based on the analysis of time series of inertial errors. The algorithm is suitable mainly (but not only) for situations where several stochastic processes are superposed. The proposed approach is based on a recently developed method called the Generalized Method of Wavelet Moments (GMWM), whose estimator was proven to be consistent and asymptotically normally distributed. This method delivers a global selection criterion based on the wavelet variance that can be used to determine the suitability of a candidate model compared to other models, and we apply it to low-cost inertial sensors. By allowing candidate models to be ranked, this approach enables us to construct an algorithm for automatic model identification and determination. The benefits of this methodology are highlighted by providing practical examples of model selection for two types of MEMS IMUs.
An Approach for Observing and Modeling Errors in MEMS-based Inertial Sensors under Vehicle Dynamic
IEEE Transactions on Instrumentation and Measurement, 64(11), p.2926-2936.
This paper studies the error behavior of low-cost inertial sensors in dynamic conditions. After proposing a method for obtaining error observations per sensor (i.e., gyroscope or accelerometer) and axis, their properties are estimated via the generalized method of wavelet moments. The resulting model parameters are compared with those obtained under static conditions. Then, an attempt is made to link the parameters of the established model to the dynamics of the vehicle. It is found that a linear relation explains a large portion of the exhibited variability. These findings suggest that the static methods employed for the calibration of inertial sensors could be improved by exploiting such a relationship.
2014
Generalized Method of Wavelet Moments for Inertial Navigation Filter Design
IEEE Transactions on Aerospace and Electronic Systems, 50(3), p.2269-2283.
The integration of observations issued from a satellite-based system (GNSS) with an inertial navigation system (INS) is usually performed through a Bayesian filter such as the extended Kalman filter (EKF). The task of designing the navigation EKF is strongly related to the inertial sensor error modeling problem. Accelerometers and gyroscopes may be corrupted by random errors of complex spectral structure. Consequently, identifying correct error-state parameters in the INS/GNSS EKF becomes difficult when several stochastic processes are superposed. In such situations, classical approaches like the Allan variance (AV) or power spectral density (PSD) analysis fail due to the difficulty of separating the error processes in the spectral domain. For this purpose, we propose applying a recently developed estimator based on the generalized method of wavelet moments (GMWM), which was proven to be consistent and asymptotically normally distributed. The GMWM estimator matches theoretical and sample-based wavelet variances (WVs), and can be computed using the method of indirect inference. This article mainly focuses on the implementation aspects of the GMWM and its integration within a general navigation filter calibration procedure. In this regard, we apply the GMWM to error signals issued from MEMS-based inertial sensors by building and estimating composite stochastic processes for which classical methods cannot be used. In a first stage, we validate the resulting models using AV and PSD analyses; then, in a second stage, we study the impact of the resulting stochastic model design on positioning accuracy using an emulated scenario with statically observed error signatures. We demonstrate that the GMWM-based calibration framework enables the estimation of complex stochastic models that are relevant, in terms of the resulting navigation accuracy, for the observed structure of errors.
Estimation of Time Series Models via Robust Wavelet Variance
Austrian Journal of Statistics, 43(3-4), p.267-277.
A robust approach to the estimation of time series models is proposed. Starting from a new estimation method called the Generalized Method of Wavelet Moments (GMWM), an indirect method based on the Wavelet Variance (WV), we replace the classical estimator of the WV with a recently proposed robust M-estimator to obtain a robust version of the GMWM. The simulation results show that the proposed approach can be considered a valid robust approach for the estimation of time series and state-space models.
2013
Wavelet-variance-based Estimation for Composite Stochastic Processes
Journal of the American Statistical Association, 108(503), p.1021-1030.
This article presents a new estimation method for the parameters of a time series model. We consider here composite Gaussian processes that are the sum of independent Gaussian processes which, in turn, explain an important aspect of the time series, as is the case in engineering and natural sciences. The proposed estimation method offers an alternative to classical estimation based on the likelihood that is straightforward to implement and often the only feasible estimation method with complex models. The estimator is obtained by optimizing a criterion based on a standardized distance between the sample wavelet variance (WV) estimates and the model-based WV. Indeed, the WV provides a decomposition of the process variance through different scales, so that it contains the information about different features of the stochastic model. We derive the asymptotic properties of the proposed estimator for inference and perform a simulation study to compare our estimator to the MLE and the LSE under different models. We also set sufficient conditions on composite models for our estimator to be consistent, which are easy to verify. We use the new estimator to estimate the parameters of a stochastic error modeled as the sum of three first-order Gauss–Markov processes, by means of a sample of over 800,000 observations issued from gyroscopes that compose inertial navigation systems. Supplementary materials for this article are available online.
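In compact form, and as a standard way to write this kind of criterion (the weighting matrix $\Omega$ is assumed here to approximate the inverse covariance of the empirical WV), the estimator solves

$$\hat{\theta} = \underset{\theta \in \Theta}{\mathrm{argmin}} \; \left( \hat{\nu} - \nu(\theta) \right)^{\top} \Omega \left( \hat{\nu} - \nu(\theta) \right),$$

where $\hat{\nu}$ denotes the vector of sample WV estimates and $\nu(\theta)$ the WV implied by the model with parameter $\theta$.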
2012
Fault Detection and Isolation in Multiple MEMS-IMUs Configurations
IEEE Transactions on Aerospace and Electronic Systems, 48(3), p.2015-2031.
This research presents methods for detecting and isolating faults in multiple micro-electro-mechanical system inertial measurement unit (MEMS-IMU) configurations. First, geometric configurations with n sensor triads are investigated. It is proved that the relative orientation between sensor triads is irrelevant to system optimality in the absence of failures. Then, the impact of sensor failure or decreased performance is investigated. Three fault detection and isolation (FDI) approaches (i.e., the parity space method, the Mahalanobis distance method, and its direct robustification) are reviewed theoretically and in the context of experiments using reference signals. It is shown that in the presence of multiple outliers the best performing detection algorithm is the robust version of the Mahalanobis distance.
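A minimal sketch of the Mahalanobis-distance test in R is given below (r is a residual or parity vector from the redundant configuration and S its covariance, both assumed available; the robust variant replaces these by outlier-resistant estimates such as the MCD):

    # Flag a fault when the squared Mahalanobis distance exceeds a
    # chi-square quantile with df equal to the residual dimension.
    fault_test <- function(r, S, alpha = 0.01) {
      d2 <- drop(t(r) %*% solve(S) %*% r)     # squared Mahalanobis distance
      list(d2 = d2, fault = d2 > qchisq(1 - alpha, df = length(r)))
    }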
2011
Constrained Expectation-Maximization Algorithm for Stochastic Inertial Error Modeling: Study of Feasibility
Measurement Science and Technology, 22(8), p.121-135.
Stochastic modeling is a challenging task for low-cost sensors whose errors can have complex spectral structures, which makes the tuning process of the INS/GNSS Kalman filter often sensitive and difficult. For example, first-order Gauss–Markov processes are very often used in inertial sensor models, but the estimation of their parameters is a non-trivial task if the error structure is mixed with other types of noise. Such an estimation is often attempted by computing and analyzing Allan variance plots. This contribution addresses situations in which the estimation of error parameters by graphical interpretation is rather difficult. The novel strategy performs direct estimation of these parameters by means of the expectation-maximization (EM) algorithm. The algorithm's results are first analyzed from a critical and practical point of view using simulations with typically encountered error signals. These simulations show that the EM algorithm seems to perform better than the Allan variance approach and offers a procedure to estimate first-order Gauss–Markov processes mixed with other types of noise. At the same time, the conducted tests revealed limits of this approach related to convergence and stability issues. Suggestions are given to circumvent or mitigate these problems when the complexity of the error structure is 'reasonable'. This work also highlights the fact that the suggested EM-based approach and the Allan variance may not be able to estimate the parameters of complex error models reasonably well, showing the need for new estimation procedures to be developed in this context. Finally, an empirical scenario is presented to support the former findings. There, the positive effect of using the more sophisticated EM-based error modeling on a filtered trajectory is highlighted.
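As a small illustration of the composite signals considered here, the R sketch below simulates a first-order Gauss–Markov process plus white noise (parameter values are illustrative assumptions, not those of the study):

    # beta: inverse correlation time of the GM process; dt: sampling interval;
    # q, r: driving-noise and white-measurement-noise variances (assumed).
    simulate_gm_wn <- function(n, beta = 0.1, dt = 1, q = 0.01, r = 0.05) {
      phi <- exp(-beta * dt)                           # discretized GM coefficient
      gm  <- stats::filter(rnorm(n, sd = sqrt(q)), phi, method = "recursive")
      as.numeric(gm) + rnorm(n, sd = sqrt(r))          # add white measurement noise
    }
    x <- simulate_gm_wn(1e4)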
2010
Noise Reduction and Estimation in Multiple Micro-electro-mechanical Inertial Systems
Measurement Science and Technology, 21(6), p.231-242.
This research studies the reduction and estimation of the noise level within a redundant configuration of low-cost (MEMS-type) inertial measurement units (IMUs). First, independent observations between units and sensors are assumed, and the theoretical decrease in the system noise level is analyzed in an experiment with four MEMS-IMU triads. Then, more complex scenarios are presented in which the noise level can vary in time and for each sensor. A statistical method employed for studying the volatility of financial markets (GARCH) is adapted and tested for use with inertial data. This paper demonstrates, experimentally and through simulations, the benefit of direct noise estimation in redundant IMU setups.
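As a sketch of the baseline independence result, averaging independent white-noise signals divides the noise variance by the number of sensors, which a few lines of R confirm empirically (values are illustrative; the GARCH-based approach then addresses time-varying noise levels):

    n_sensors <- 4; n <- 1e5; sigma <- 0.05
    signals <- matrix(rnorm(n_sensors * n, sd = sigma), ncol = n_sensors)
    # single-sensor variance ~ sigma^2; averaged variance ~ sigma^2 / n_sensors
    c(single = var(signals[, 1]), averaged = var(rowMeans(signals)))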