Estimating the number of kinds using proxy sampling effort data

Simon Wilson

Data in a number of kinds problem are usually modelled as one of two types:

  1. Complete sampling data, where the number of individuals sampled and their kind are observed;
  2. Temporal data that describe when new kinds were observed but lack information on numbers sampled.

Very different models and estimation methods apply to these types.

Inconveniently, one of the most important applications of this problem, estimation of the number of species, falls into neither type: complete sampling information is lacking, but some proxy information on it is available, typically a measure of effort such as estimates of the numbers of individuals that can be sampled. In this talk we propose a hybrid model that allows such proxy information to be incorporated. The advantage of this approach is that it produces a framework within which the uncertainties in the number of species can be modelled and quantified, something that is certainly needed for a question where estimates vary by at least an order of magnitude and estimates of uncertainty are often lacking. Inference is implemented via approximate Bayesian computation (ABC) and applied to two large databases: the Catalogue of Life and the World Register of Marine Species. Prior sensitivity and approaches to speeding up the implementation are discussed.
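
As a rough indication of the kind of ABC scheme involved, the sketch below works through a hypothetical, heavily simplified setting (not the model of the talk): the proxy effort is taken as a known count of individuals examined per campaign, all kinds are assumed equally abundant, and draws of the number of kinds are retained when the simulated discovery record is close to the observed one. All data, the equal-abundance assumption and the prior are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical proxy effort (individuals examined per campaign) and the
    # observed cumulative number of kinds discovered after each campaign.
    effort = np.array([100, 200, 400, 800, 1600])
    obs_cum_kinds = np.array([95, 260, 505, 780, 955])

    def simulate_record(n_kinds):
        # Simulate the cumulative discovery record for a candidate number of
        # kinds, assuming every sampled individual is equally likely to
        # belong to any kind (a strong simplifying assumption).
        seen, record = set(), []
        for e in effort:
            seen.update(rng.integers(0, n_kinds, size=int(e)).tolist())
            record.append(len(seen))
        return np.array(record)

    def abc_rejection(n_draws=5000, eps=25.0):
        accepted = []
        for _ in range(n_draws):
            n_kinds = int(rng.lognormal(mean=6.5, sigma=1.0))  # prior draw for the number of kinds
            if n_kinds < obs_cum_kinds[-1]:
                continue  # cannot have fewer kinds than already discovered
            rmse = np.sqrt(np.mean((simulate_record(n_kinds) - obs_cum_kinds) ** 2))
            if rmse < eps:
                accepted.append(n_kinds)
        return np.array(accepted)

    draws = abc_rejection()
    if draws.size:
        print(draws.size, "accepted; posterior median ~", int(np.median(draws)))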

Revisiting Bayesian p-values for model checking

Maria Eugenia Castellanos

In the last few years, the importance of assessing the goodness of fit of models has emerged in applied modelling, especially in the so-called Bayesian modelling workflows (Gelman et al., 2020, Bayesian workflow; Gabry et al., 2019, Visualization in Bayesian workflow). In this context, the main tool used to assess the fit of a Bayesian model is the posterior predictive p-value, although it has been criticised in many works for its conservativeness and lack of power. Several proposals exist to mitigate this poor behaviour, which is mainly due to the double use of the data: some carry out a post-processing of the p-value, while others use sampled or plug-in estimates of the parameters, or discrepancy measures that are pivotal quantities. On the other hand, Bayarri and Berger worked on conditional and partial posterior predictive p-values, from their 1997 ISDS Discussion Paper to their 2000 JASA paper, to obtain calibrated p-values. These p-values have an attractive property: when considered as random variables, p(X), their null distribution is uniform, at least asymptotically. This gives them the very desirable property of having the same interpretation across problems. The main drawback is that they are computationally expensive and difficult to obtain in complex models.
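
For concreteness, the posterior predictive p-value in question is, for a discrepancy measure T(y, \theta),

    p_{post}(y) = \Pr\{ T(y^{rep}, \theta) \ge T(y, \theta) \mid y \} = \iint \mathbf{1}\{ T(y^{rep}, \theta) \ge T(y, \theta) \} \, f(y^{rep} \mid \theta) \, \pi(\theta \mid y) \, dy^{rep} \, d\theta,
so the same data y enter both the posterior \pi(\theta \mid y) and the observed discrepancy; this double use of the data is what drives the conservativeness mentioned above.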

This paper reviews various proposals in the literature; in particular, it presents the conditional predictive p-value obtained when conditioning on maximum likelihood estimators of the hyperparameters. The general proposal of conditional predictive p-values conditioned on the m.l.e. is due to papers by Robert and Rousseau (2002) and Fraser and Rousseau (2008). These conditional p-values are easier to obtain and are also asymptotically uniformly distributed. We will present the application of these measures to goodness of fit in the particularly interesting context of hierarchical models.
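
A schematic form of such a conditional predictive p-value, using notation not taken from the cited papers and writing \hat{\lambda}(y) for the maximum likelihood estimator of the hyperparameters and t_{obs} for the observed value of a checking statistic T, is

    p_{c}(y) = \Pr\{ T(Y^{rep}) \ge t_{obs} \mid \hat{\lambda}(y) \}, \qquad Y^{rep} \sim m(\cdot \mid \hat{\lambda}(y)) = \int f(\cdot \mid \theta) \, \pi(\theta \mid \hat{\lambda}(y)) \, d\theta,
where the lower-level parameters are integrated out with the hyperparameters fixed at their m.l.e.; the precise conditioning statistic and predictive distribution vary across the proposals reviewed.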

Is European sardine fishing in the Mediterranean Sea at risk? A Bayesian joint longitudinal approach

Gabriel Calvo, Carmen Armero, Luigi Spezia, and Maria Grazia Pennino

In this work, we develop Bayesian joint longitudinal models to assess the fishing of the European sardine over the last few decades in the Mediterranean Sea. This species is probably the most ecologically and economically important small pelagic fish. In recent years, a concerning reduction in the weight and life expectancy of this type of fish has been observed. This decline is the result of a complex process likely caused by several factors.

This study focuses on the evolution of sardine catches in the Mediterranean Sea from 1985 to 2018 according to the Mediterranean country and the type of fishing practised, artisanal and industrial. We propose three Bayesian longitudinal mixed linear models to assess the differences in the temporal evolution of artisanal and industrial fishing between and within countries.

In this case we consider a bivariate response in which y(A) represents the logarithm of the artisanal catch in tonnes and y(I) that of the industrial catch. The two responses are conditionally independent given the parameters, hyperparameters, random effects, and serial correlation terms. Three alternative models are considered. The first is a mixed model with common intercepts and slopes plus their corresponding individual random effects, and normally distributed measurement errors. The second model differs only in the errors: instead of being independent and normally distributed, they are autoregressive of order 1. The last option has all four components: fixed effects, random effects, autoregressive terms, and normal errors. The key point is that we assume a linear relationship between the individual random effects of the artisanal response and those of the industrial response through two variance-covariance matrices.
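
A schematic version of the fullest of the three specifications, in generic notation (country i, year t, response k in {A, I}) rather than that of the paper, would read

    y_{it}^{(k)} = (\beta_0^{(k)} + b_{0i}^{(k)}) + (\beta_1^{(k)} + b_{1i}^{(k)}) t + w_{it}^{(k)} + \varepsilon_{it}^{(k)}, \qquad w_{it}^{(k)} = \rho^{(k)} w_{i,t-1}^{(k)} + \eta_{it}^{(k)}, \qquad \varepsilon_{it}^{(k)} \sim N(0, \sigma_k^2),
with the artisanal and industrial random effects (b^{(A)}, b^{(I)}) linked through the two variance-covariance matrices mentioned above; the first model omits the autoregressive terms w, and the second replaces the independent normal errors \varepsilon by the autoregressive terms.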

Bayesian sensitivity analysis in the case of the compound Poisson Process

F. Ruggeri, M. Sánchez-Sánchez, A. Suárez-Llorens

The Bayesian approach is widely used in determining insurance premiums, but it often faces criticism due to the subjectivity involved in selecting the prior distribution. In the area of robust Bayesian analysis, recent studies have proposed a novel class of prior distributions based on stochastic orders and weight functions. Given a risk that depends on a multidimensional parameter, we study how uncertainty about the prior propagates to collective and Bayesian premiums, establishing upper and lower bounds for the Bayesian premiums that are attained at the extreme members of the class. Crucial to this are the order-preserving properties, through multivariate stochastic orderings, that certain premium principles exhibit with respect to the prior and posterior distributions. We illustrate this methodology with examples based on the compound Poisson process, which models the frequency and severity of claims in automobile insurance, and we show its applicability to assessing sensitivity within a Bonus-Malus system (BMS).
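
For reference, and in standard notation rather than that of the paper, the aggregate claims of the compound Poisson process considered are

    S(t) = \sum_{i=1}^{N(t)} X_i, \qquad N(t) \sim \mathrm{Poisson}(\lambda t),
with i.i.d. claim severities X_i independent of the claim count N(t). Under a net premium principle, the risk premium is E[S(t) \mid \theta] with \theta collecting the frequency and severity parameters, the collective premium averages this quantity over the prior \pi(\theta), and the Bayesian premium averages it over the posterior given the observed claims; the bounds discussed here hold over the whole class of priors.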

A framework for managing cybersecurity risks in systems including Artificial Intelligence components

J.M. Camacho, A. Couce, D. Arroyo, D. Ríos Insua

The introduction of the European Union Artificial Intelligence Act, the NIST Artificial Intelligence Risk Management Framework, and related norms demands a better understanding and implementation of novel risk analysis approaches to deal with systems including Artificial Intelligence (AI) components.
This comprises dealing with novel AI-related impacts; incorporating AI-based assets within the cyber architecture; considering AI-based security and recovery controls within the cybersecurity portfolio;
and facing novel types of AI-based targeted attacks. This talk suggests (mostly) Bayesian solutions to such issues and integrates them within a broad cybersecurity risk analysis framework to support managing risks in systems with AI-based components. An example concerning automated driving systems illustrates the framework.

Spatio-temporal Dirichlet regression models 

Mario Figueira, David Conesa and Antonio López-Quílez

Compositional Data Analysis (CoDa) has experienced a surge in popularity in recent years due to its applicability in various fields, e.g. land use management, demography, or the analysis of microbiome composition. This analytical approach deals with data comprising values from distinct categories that collectively sum up to a constant. Fitting multivariate models within the CoDa framework presents challenges, particularly when incorporating structured random effects like temporal or spatial variations.

In this work we present two different approaches for analysing compositional data, along with the extension implemented in the dirinla package. Both approaches allow us to construct highly versatile models for analysing spatial and spatio-temporal structure. Thanks to the implementation in INLA, we can obtain results at a low computational cost, enabling us to carry out stepwise regression processes. Thus, we can select the model that best fits a given set of explanatory variables under different basic structures for the spatial and temporal components. Furthermore, downscaling models can be implemented to avoid effects stemming from highly irregular areal structures. As an example of the use of this methodology, we present an analysis of data provided by a European project on Land Use Management for Sustainability. Data have been categorised into five distinct types: cropland, grassland, forest, urban, and other natural lands. The spatial structure of the data is at the NUTS3 level, with yearly data available from 2008 to 2017.
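
A schematic form of the Dirichlet regression structure involved, in generic notation rather than that of the talk, models the observed composition y_i = (y_{i1}, ..., y_{iC}) as

    y_i \mid \alpha_i \sim \mathrm{Dirichlet}(\alpha_{i1}, \ldots, \alpha_{iC}), \qquad \log \alpha_{ic} = x_i^{\top} \beta_c + u_c(s_i) + \gamma_c(t_i),
where u_c and \gamma_c are structured spatial and temporal random effects for category c; it is the latent Gaussian structure of these linear predictors that makes the INLA machinery applicable at low computational cost.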

Variable selection with groups: a Bayesian Model Uncertainty perspective

Gonzalo García-Donato, Anabel Forte and Camilla Savaranesse

In the design of many observational studies, variables are collected in groups that represent population characteristics that cannot be measured directly.

In this research, we consider such a situation when the purpose of the study is to identify the causes that explain an outcome of interest. This problem can be called variable selection with groups or feature selection. In our proposal, the problem is treated from a Bayesian model uncertainty perspective, and we argue that the focus should be on the prior distribution over the model space. We propose a hierarchical distribution that controls for multiplicity and favors an appropriate representation of the submodels that make up a group. We illustrate the benefits of this approach on synthetic examples and a real data set from a respiratory study.
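
For orientation, a standard hierarchical prior over the model space that controls for multiplicity in the ungrouped case is the following: with p candidate variables and inclusion indicators \gamma = (\gamma_1, \ldots, \gamma_p),

    \gamma_j \mid w \sim \mathrm{Bernoulli}(w), \qquad w \sim \mathrm{Uniform}(0, 1) \;\Rightarrow\; \Pr(M_\gamma) = \frac{1}{p + 1} \binom{p}{|\gamma|}^{-1},
so that prior mass is shared equally across model sizes rather than across individual models, which penalises the explosion in the number of models as p grows; the prior proposed here plays an analogous role while additionally ensuring that the submodels that make up a group are weighted appropriately.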

Heterogeneous influence of banking risk on cost efficiency: a hierarchical Bayesian analysis

Pilar Gargallo, Jordi Moreno and Manuel Salvador

During the past two decades, European banking sectors have undergone a profound process of integration and financial globalization. In this highly competitive context, and following the global financial crisis (2007-2009), concerns have arisen regarding the risk assumed by banking systems and its impact on their performance. Despite the regulations implemented, such as those proposed in Basel III aimed at enhancing the stability and security of the global banking system, recent studies have revealed that these measures have not achieved their objectives for European banks.

Understanding the influence of risk on banking outcomes is essential, particularly given the high-competition environment that incentivizes risky behaviours. While the literature addressing this issue is abundant, it remains unclear to what extent and how different types of banking risk affect their performance. Moreover, previous studies examining this relationship have often considered a homogeneous effect of risk on banking efficiency, without accounting for the possibility that this risk assumed by banks could have heterogeneous effects depending on the bank's own characteristics and the conditions of the operating environment.

To shed light on this matter, we analysed a sample of commercial banks operating in European Union countries between 2004 and 2020. For the analysis, we adopted an approach based on the modified value-added method, in which deposits are simultaneously treated as inputs and outputs since they involve value creation. We proposed a Bayesian stochastic frontier model applied to an unbalanced data panel to estimate the evolution of cost efficiency for each bank. This efficiency is assumed to depend on the bank's size and the levels of risk it takes on. The model is hierarchical and assumes, in a second stage, that the influence exerted by these characteristics depends, in turn, on the environment (country, industry) in which the bank operates. Using Bayesian tools, we compared various hypotheses regarding the degree of homogeneity of this influence with respect to country and industry. The economic and financial data were obtained from Orbis Bank Focus (2004-2016) and Orbis (2017-2020), while data on industry and country were collected from the World Bank's World Development Indicators database.
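
A schematic form of the cost frontier involved, in generic stochastic frontier notation rather than the paper's own, is

    \ln C_{it} = x_{it}^{\top} \beta + v_{it} + u_{it}, \qquad v_{it} \sim N(0, \sigma_v^2), \quad u_{it} \ge 0,
where C_{it} is the cost of bank i in year t, v_{it} is symmetric noise and u_{it} is the one-sided inefficiency term, with cost efficiency exp(-u_{it}); in the hierarchical second stage the distribution of u_{it} depends on the bank's size and risk levels, and the parameters governing that dependence are allowed to vary with the country and industry environment.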

Colorectal cancer risk mapping through Bayesian networks

Daniel Corrales Alonso and David Ríos

Only about 14% of susceptible EU citizens participate in colorectal cancer (CRC) screening programs, despite CRC being the third most common type of cancer worldwide. The development of predictive models can facilitate personalized CRC predictions, which can be embedded in decision-support tools that facilitate screening and treatment recommendations. This work develops a predictive model that aids in characterizing risk groups and assessing the influence of a variety of risk factors on the population. A Bayesian Network (BN) is learned by aggregating extensive expert knowledge and data from an observational study, making use of a structure learning algorithm to represent the set of relations between variables. The network is then parametrized to characterize these relations in terms of local probability distributions at each of the nodes. It is finally used to predict an individual's risk of developing CRC together with its uncertainty. A CRC risk graphical mapping tool is developed from the model and used to segment the population into risk subgroups according to variables of interest. Furthermore, the network provides insights into the influence of modifiable risk factors, such as alcohol consumption and smoking, and of lifestyle-related medical conditions, such as diabetes or hypertension, that have an impact on the risk of developing CRC.
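
In generic terms, the learned network encodes the joint distribution of the risk factors, medical conditions and the CRC indicator through the factorisation

    P(X_1, \ldots, X_n) = \prod_{j=1}^{n} P(X_j \mid \mathrm{pa}(X_j)),
with one local conditional distribution per node; the risk map is then built from queries of the form P(CRC = 1 \mid observed risk factors), propagated through the network together with the posterior uncertainty of its parameters.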

A Bayesian competing risks joint model to study the cause of death in patients with heart failure

Jesús Gutiérrez-Botella, Carmen Armero, Thomas Kneib, María Pata and Javier García-Seara

Heart Failure (HF) occurs when the heart is unable to pump blood around the body properly. It usually happens because the heart has become too weak or stiff. Cardiac Resynchronization Therapy (CRT) is a procedure that consists of implanting a device in the heart's chambers to help it work more efficiently. This therapy has been shown to improve the short-term prognosis of HF patients, but data about its long-term benefits are scarce.

Joint modeling of longitudinal and survival data (JM-LS) deals with statistical models that allow the combination of longitudinal and time-to-event information. JM-LSs are used to introduce internal time-varying covariates in survival processes as well as to provide inferential frameworks for the inclusion of non-ignorable dropout mechanisms through survival tools. Basic JM-LSs are based on (generalized) linear mixed effects models and on the Cox survival regression model.

The objective of this work is to study cardiovascular and non-cardiovascular death in HF patients who underwent CRT in relation to longitudinal and baseline covariates. For that purpose, we considered a Bayesian JM-LS which combines a mixed linear Beta regression model and a competing risks model. The cause-specific hazard function for each relevant event is defined in terms of the Cox proportional hazards structure, with Weibull baseline hazards, baseline covariates, and a final term including the longitudinal information. We discuss different proposals for the latter term, such as the mean of the time-dependent covariate or the specific random effects associated with individuals. Markov chain Monte Carlo (MCMC) methods have been employed to approximate the relevant posterior distribution using the JAGS software. Posterior outputs and predictions are discussed.
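
Schematically, writing m_i(t) for the term carrying the longitudinal information of patient i (e.g. the mean of the time-dependent covariate or the individual random effects), the cause-specific hazards described take the form

    h_{ik}(t) = h_{0k}(t) \exp\{ x_i^{\top} \gamma_k + \alpha_k m_i(t) \}, \qquad h_{0k}(t) = \lambda_k \nu_k t^{\nu_k - 1}, \qquad k \in \{\text{cardiovascular}, \text{non-cardiovascular}\},
with Weibull baseline hazards h_{0k}, cause-specific coefficients \gamma_k for the baseline covariates x_i, and association parameters \alpha_k linking the longitudinal and survival processes.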

Product launching through adversarial risk analysis

Pablo García Arce and Davíd Ríos

In a world of utility-driven marketing, each company acts as an adversary to the other contenders, all of them having competing interests. A major challenge for companies at the time of launching a new product is that, despite testing, some endogenous flaws may remain in the product, potentially risking a loss of market share. However, delaying the launch decision can lead to losing any `first-mover advantage'. Furthermore, each company generally has incomplete information regarding both the launch strategy and the quality of the products of competing brands. From a buyer's perspective, along with the price, the buying decision must be made based on noisy signals regarding the quality of competing brands, generally received through advertisements. This paper proposes a broad adversarial risk analysis framework to support product launching decisions by a company in the presence of multiple competitors and multiple buyers, with both sides aiming to maximize their expected utilities under various incomplete signals. We illustrate the framework with a software release case study.

Spatial Generalized Dissimilarity Mixed Models for Beta Diversity

A. Gelfand, Phil White, Henry Frye, Jasper Slingsby, and John Silander

Turnover, or change in the composition of species over space and time, is one of the primary ways to define beta diversity. Inferring what factors impact beta diversity is important not only for understanding biodiversity processes but also for conservation planning. A popular approach to understanding the drivers of compositional turnover is generalized dissimilarity modeling (GDM). However, the current GDM approach suffers from several limitations, which we detail, so we provide an alternative approach that remedies these issues. In particular, we propose a model that provides improvements including a flexible spatially varying mean function, spatial random effects that capture dependence unaccounted for by explanatory variables, and a spatially heterogeneous variance structure. Further, these features are offered in a model that can handle a large incidence of total dissimilarity through "1-inflation." Such inflation would be expected from highly biodiverse areas with steep turnover gradients.
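
A schematic form of such a one-inflated dissimilarity model, in generic notation rather than that of the paper, is: the dissimilarity d_{ij} between sites i and j equals 1 with probability \pi_{ij} and otherwise follows a \mathrm{Beta}(\mu_{ij} \phi_{ij}, (1 - \mu_{ij}) \phi_{ij}) distribution, with \mathrm{logit}(\mu_{ij}) carrying the spatially varying mean function of the environmental differences plus the spatial random effects, \mathrm{logit}(\pi_{ij}) governing the chance of total dissimilarity, and \phi_{ij} providing the spatially heterogeneous variance structure.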

The models are implemented in a Bayesian framework, employing hierarchical specifications to yield full regression and spatial predictive inference, both with associated full uncertainties. We illustrate by examining dissimilarity in three datasets: tree survey data from Panama's Barro Colorado Island (BCI), plant occurrence data from southwest Australia, and plant abundance surveys from the Greater Cape Floristic Region (GCFR) of South Africa. We select a best model using out-of-sample predictive performance. The form of the best model differs across the three datasets, but, regardless, our models provide improved performance, in some cases consequential, over GDMs. We focus on the GCFR where the spatial random effects play a more important role in the modeling than all the environmental variables.

A Bayesian approach to modeling imperfect diagnostic tests

R. Susi, C.M. Rodríguez-Leal, E. Dacal, J. Amador, C. Nieto.

In this work, a Bayesian approach is presented to estimate the validity, measured by sensitivity and specificity, of one or more imperfect diagnostic tests, initially considering dichotomous tests. Additionally, when working in the absence of a gold standard, Bayesian inference proves useful for determining illness prevalence. The estimation of diagnostic test validity and illness prevalence is considered in various scenarios, including situations with at least two imperfect diagnostic tests. These tests can be either independent or correlated, given the illness status, across one or more populations.
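
For a single dichotomous test applied in a population with prevalence \pi, sensitivity Se and specificity Sp, the probability of a positive result is

    \Pr(T^{+}) = \pi \, Se + (1 - \pi)(1 - Sp),
so, without a gold standard, the three quantities cannot be identified separately from one test in one population; this is what makes the settings considered here, with two or more tests across one or more populations (in the spirit of the classical Hui-Walter design) combined with prior information, so useful.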

Furthermore, beyond Bayesian inference for dichotomous imperfect diagnostic tests, we delve into a more complex scenario involving a continuous variable or biomarker used for diagnosis. Specifically, we explore the receiver operating characteristic (ROC) curve to evaluate the discriminatory ability of the biomarker. Bayesian estimation of the ROC curve for two diagnostic tests is then presented, considering both the binormal and bigamma models.
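
For reference, under the binormal model, with biomarker values distributed as N(\mu_H, \sigma_H^2) in the non-diseased group and N(\mu_D, \sigma_D^2) in the diseased group, the ROC curve has the closed form

    \mathrm{ROC}(t) = \Phi( a + b \, \Phi^{-1}(t) ), \qquad a = \frac{\mu_D - \mu_H}{\sigma_D}, \quad b = \frac{\sigma_H}{\sigma_D}, \qquad \mathrm{AUC} = \Phi\!\left( \frac{a}{\sqrt{1 + b^2}} \right),
with an analogous construction for the bigamma model, in which gamma distribution functions play the role of \Phi.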

The theoretical concepts discussed in this presentation are illustrated using two real-world cases:

(1) Estimation of the validity of two uncorrelated dichotomous imperfect diagnostic tests and the prevalence of strongyloidiasis.

(2) Estimation of the ROC curve for two biomarkers, assuming gamma-distributed data, in the context of pulmonary thromboembolism disease.

Quantitative system risk assessment from incomplete data with belief networks and pairwise comparison elicitation

M. Remedios Sillero-Denamiel, José Luis Bosque, Cristina De Persis, Irene Huertas, Simon Wilson

A method for conducting Bayesian elicitation and learning in risk assessment is presented. It assumes that the risk process can be described as a fault tree, which is viewed as a belief network for which prior distributions on the primary event probabilities are elicited by means of a pairwise comparison approach. A novel, fully Bayesian updating procedure is described for assessing the posterior probabilities following successive observation campaigns of the events in the fault tree. In particular, the goal is to handle contexts where data are limited, keeping the elicitation process simple while adequately quantifying the uncertainties in the analysis. The application is illustrated through the motivating example of the risk assessment of spacecraft explosion during controlled re-entry.
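
As an indication of the structure exploited, a fault tree in which the top event occurs when any of a set of independent primary events occurs is encoded in the belief network through the OR-gate relation

    \Pr(\text{top event}) = 1 - \prod_{i} (1 - p_i),
with AND gates encoded analogously as products of the p_i; the pairwise comparison approach elicits the experts' relative judgements about the primary events, from which the prior distributions on the p_i are constructed and then updated with the campaign data.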

Spatio-temporal modeling for record-breaking temperature events in Spain

Ana C. Cebrián, J. Castillo, A. Gelfand, J. Asín, and Z. Gracia

The occurrence of record-breaking temperature events is one piece of evidence of climate change. In this context, we present an approach to investigate the occurrence of record-breaking temperatures across years, for any given day of the year, within a space-time framework. Formal statistical analysis of record-breaking events has primarily been developed within the probability community, using results from the stationary record-breaking setting. However, that framework is not sufficient for analyzing actual record-breaking data, which requires rich modeling of the indicator events defining record-breaking series. We work with a dataset consisting of over sixty years (1960–2021) of daily maximum temperatures across peninsular Spain. A novel and thorough exploratory data analysis leads us to propose hierarchical conditional models for the indicator events. The final model includes an explicit trend, necessary autoregression terms, spatial behavior captured by the distance to the coast, useful interactions, helpful spatial random effects, and very strong daily random effects. The fitted model shows that global warming trends have increased the number of records expected in the past decade almost two-fold, but it also estimates highly differentiated climate warming rates in space and by season.
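
Schematically, writing Y_{t,d,s} for the maximum temperature in year t, on day d, at site s, the indicator events being modelled are

    I_{t,d,s} = \mathbf{1}\{ Y_{t,d,s} > \max_{r < t} Y_{r,d,s} \},
for which the stationary (exchangeable) benchmark gives \Pr(I_{t,d,s} = 1) = 1/t; the hierarchical conditional models place a regression on these indicators whose linear predictor collects the trend, the autoregression terms, the distance-to-coast effect, the interactions and the spatial and daily random effects listed above, so that departures from the 1/t benchmark quantify the warming signal.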

The point null hypothesis, you can do better than p-values

Miguel Ángel Gómez-Villegas

Statistics has struggled for nearly a century over the issue of whether the Bayesian or the frequentist approach is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the debate has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for the full development of the other. In this contribution, we embark upon a rather idiosyncratic walk through some of these issues. For the most common scenarios, when "objective" prior distributions are used, the Bayesian approach performs better than the frequentist one, and if forced to choose we would be in favour of the Bayesian approach; there is also Birnbaum's theorem, which places the Bayesian approach in a better position.
