The problem of estimating the dimensionality of a model occurs in various forms in applied statistics: estimating the number of factors in factor analysis, estimating the degree of a polynomial describing the data, selecting the variables to be introduced in a multiple regression equation, estimating the order of an AR or MA time series model, and so on.

I study the typical configuration of a Cp plot when the number of variables in the regression problem is large and there are many weak effects. I show that a particular configuration that is very commonly seen can arise in a simple way. I give a formula by means of which the risk incurred by the “minimum Cp” rule can be estimated.
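A minimal sketch of how a Cp plot is usually populated, assuming ordinary least squares with an intercept in every subset and σ² estimated from the full-model residuals (these are assumptions, not the author's stated setup): for each subset with p fitted coefficients, Cp = RSS_p/σ̂² − n + 2p, and the “minimum Cp” rule picks the subset with the smallest value. The function name `mallows_cp_all_subsets` is hypothetical.

```python
import itertools
import numpy as np

def mallows_cp_all_subsets(X, y):
    """Return (subset, p, Cp) for every subset of the columns of X.

    Assumes an intercept is always included and that sigma^2 is estimated
    from the residuals of the full least-squares model (hypothetical helper).
    """
    n, K = X.shape
    ones = np.ones(n)
    X_full = np.column_stack([ones, X])
    beta, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    sigma2 = float(np.sum((y - X_full @ beta) ** 2)) / (n - K - 1)

    results = []
    for r in range(K + 1):
        for subset in itertools.combinations(range(K), r):
            Xs = np.column_stack([ones] + [X[:, j] for j in subset])
            b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ b) ** 2))
            p = r + 1  # fitted coefficients, intercept included
            results.append((subset, p, rss / sigma2 - n + 2 * p))
    return results

# Simulated data: only the first of four predictors has a real effect.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 4))
y = 2.0 + 1.5 * X[:, 0] + rng.standard_normal(60)
results = mallows_cp_all_subsets(X, y)
best_subset, best_p, best_cp = min(results, key=lambda t: t[2])  # "minimum Cp" rule
print(best_subset, best_p, round(best_cp, 2))
```

By construction the full model always has Cp = p exactly, which is why points on a Cp plot are usually compared against the line Cp = p.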

We discuss the interpretation of Cp plots and show how they can be calibrated in several ways. We comment on the practice of using the display as a basis for formal selection of a subset-regression model, and extend the range of application of the device to encompass arbitrary linear estimates of the regression coefficients, for example, ridge estimates.
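One way to carry the Cp idea over to an arbitrary linear estimate ŷ = Hy (often written C_L) is to replace the subset size p by the trace of the smoother matrix: C_L = RSS/σ̂² − n + 2 tr(H). The sketch below applies this to ridge regression, where H_λ = X(XᵀX + λI)⁻¹Xᵀ. It assumes centred X and y (so no intercept is fitted, and for simplicity the degree of freedom used by centring is ignored) and estimates σ² from the ordinary least-squares fit; `ridge_cl` is a hypothetical name.

```python
import numpy as np

def ridge_cl(X, y, lambdas):
    """Return (lambda, tr(H), C_L) for each ridge penalty (hypothetical helper).

    Assumes X and y are centred, so no intercept is fitted, and estimates
    sigma^2 from the ordinary least-squares residuals.
    """
    n, K = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.sum((y - X @ beta_ols) ** 2)) / (n - K)
    out = []
    for lam in lambdas:
        # Smoother ("hat") matrix of the ridge estimate: yhat = H y.
        H = X @ np.linalg.solve(X.T @ X + lam * np.eye(K), X.T)
        resid = y - H @ y
        df = float(np.trace(H))  # effective number of parameters
        out.append((lam, df, float(resid @ resid) / sigma2 - n + 2 * df))
    return out

rng = np.random.default_rng(2)
X = rng.standard_normal((80, 6))
X -= X.mean(axis=0)
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5, 0.0]) + rng.standard_normal(80)
y -= y.mean()
table = ridge_cl(X, y, [0.0, 1.0, 10.0, 100.0])
best_lam = min(table, key=lambda t: t[2])[0]
print(best_lam)
```

At λ = 0 the ridge estimate is the least-squares fit, tr(H) equals the number of columns, and C_L reduces to the ordinary Cp of the full model; increasing λ shrinks tr(H) smoothly instead of in integer steps.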

A review of model-selection criteria is presented, with a view toward showing their similarities. It is suggested that some problems treated by sequences of hypothesis tests may be more expeditiously treated by the application of model-selection criteria. Consideration is given to application of model-selection criteria to some problems of multivariate analysis, especially the clustering of variables, factor analysis and, more generally, describing a complex of variables.
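A small illustration of the kind of similarity such a review points to: several common criteria can be written as −2 log L̂ + a_n k, differing only in the penalty a_n charged per parameter (2 for AIC, log n for BIC, log n + 1 for CAIC). The fitted log-likelihoods below are made-up numbers chosen for illustration, and `criterion` is a hypothetical helper.

```python
import math

def criterion(loglik, k, n, kind):
    # All three share the form -2 log L_hat + a_n * k; only a_n differs.
    a_n = {"AIC": 2.0, "BIC": math.log(n), "CAIC": math.log(n) + 1.0}[kind]
    return -2.0 * loglik + a_n * k

# Hypothetical fitted models: (maximised log-likelihood, number of parameters).
candidates = [(-120.0, 2), (-110.0, 4), (-106.5, 6)]
n = 50

choices = {}
for kind in ("AIC", "BIC", "CAIC"):
    values = [criterion(ll, k, n, kind) for ll, k in candidates]
    choices[kind] = min(range(len(values)), key=values.__getitem__)
print(choices)  # {'AIC': 2, 'BIC': 1, 'CAIC': 1}
```

Because only a_n changes, a heavier per-parameter penalty can never select a larger model from the same candidate list; here AIC prefers the six-parameter model while BIC and CAIC stop at four.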

During the last fifteen years, Akaike's entropy-based Information Criterion (AIC) has had a fundamental impact on statistical model evaluation problems. This paper studies the general theory of the AIC procedure and provides two analytical extensions of it without violating Akaike's main principles. These extensions make AIC asymptotically consistent and penalize overparameterization more stringently, so as to pick only the simplest of the “true” models. These selection criteria are called CAIC a...
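The effect of the stiffer penalty can be sketched on polynomial-degree selection. This assumes Gaussian errors, the usual form CAIC = −2 log L̂ + k(log n + 1), and k counting the polynomial coefficients plus the error variance; `select_degree` and the simulated quadratic data are hypothetical.

```python
import numpy as np

def gaussian_loglik(rss, n):
    # Maximised Gaussian log-likelihood with sigma^2 estimated as RSS / n.
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1.0)

def select_degree(x, y, max_degree):
    """Return the polynomial degrees chosen by AIC and by CAIC."""
    n = len(y)
    aic, caic = [], []
    for d in range(max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)  # columns 1, x, ..., x^d
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        ll = gaussian_loglik(rss, n)
        k = d + 2  # d + 1 coefficients plus sigma^2
        aic.append(-2 * ll + 2 * k)
        caic.append(-2 * ll + k * (np.log(n) + 1))
    return int(np.argmin(aic)), int(np.argmin(caic))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 1 + 2 * x - 3 * x ** 2 + 0.3 * rng.standard_normal(100)
aic_deg, caic_deg = select_degree(x, y, max_degree=8)
print("AIC degree:", aic_deg, "CAIC degree:", caic_deg)
```

For n ≥ 3 the CAIC penalty per parameter, log n + 1, exceeds AIC's 2, so on the same nested candidates CAIC never picks a higher degree than AIC.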

The number of digits it takes to write down an observed sequence x_1, ..., x_N of a time series depends on the model, with its parameters, that one assumes to have generated the observed data. Accordingly, by finding the model that minimizes the description length, one obtains estimates of both the integer-valued structure parameters and the real-valued system parameters.
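A minimal sketch of the idea for choosing the order of an autoregressive model, assuming a zero-mean Gaussian AR(p) fitted by least squares and the common large-sample approximation of the description length, (m/2) log σ̂²_p + (p/2) log m (in nats), where m is the number of observations used. This is an illustrative approximation rather than the paper's exact code-length formula; `simulate_ar` and `mdl_ar_order` are hypothetical names.

```python
import numpy as np

def simulate_ar(phi, n, seed=0, burn=200):
    """Simulate a zero-mean Gaussian AR(p) series with coefficients phi."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        # x[t-1], x[t-2], ..., x[t-p] paired with phi[0], phi[1], ...
        x[t] = np.dot(phi, x[t - p:t][::-1]) + e[t]
    return x[burn:]

def mdl_ar_order(x, max_order):
    """Pick the AR order minimising an approximate two-part code length."""
    n = len(x)
    y = x[max_order:]  # common sample so all orders are comparable
    m = len(y)
    lengths = []
    for p in range(max_order + 1):
        if p == 0:
            resid = y
        else:
            lags = np.column_stack([x[max_order - j:n - j] for j in range(1, p + 1)])
            coef, *_ = np.linalg.lstsq(lags, y, rcond=None)
            resid = y - lags @ coef
        s2 = float(np.mean(resid ** 2))
        # data cost (m/2) log s2 plus parameter cost (p/2) log m
        lengths.append(0.5 * m * np.log(s2) + 0.5 * p * np.log(m))
    return int(np.argmin(lengths)), lengths

x = simulate_ar([0.75, -0.5], n=2000, seed=0)
order, lengths = mdl_ar_order(x, max_order=6)
print("MDL order:", order)
```

The integer-valued structure parameter (the order p) is chosen by the minimum, and the real-valued system parameters are the least-squares coefficients of that order.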

The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.
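As a sketch of how those leading terms are often used, assume they take the familiar Schwarz form log L̂_j − (k_j/2) log n. With equal prior probabilities on the models, exponentiating and normalizing these scores gives rough posterior model probabilities (lower-order terms are ignored, so these are only approximations). The log-likelihood values are made up, and the helper names are hypothetical.

```python
import math

def schwarz_score(loglik, k, n):
    # Leading terms of the log posterior: log L_hat - (k/2) log n.
    return loglik - 0.5 * k * math.log(n)

def approx_posterior_probs(logliks, ks, n):
    """Normalise exp(score) across models, assuming equal prior weights."""
    scores = [schwarz_score(ll, k, n) for ll, k in zip(logliks, ks)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return scores, [w / total for w in weights]

scores, probs = approx_posterior_probs([-120.0, -110.0, -106.5], [2, 4, 6], n=50)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(p, 3) for p in probs])
```

The criterion itself does not involve the prior on parameters, which is why it can be used as a large-sample rule outside a fully Bayesian analysis.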

Purpose: Chronic post-surgical pain (CPSP) is a highly prevalent complication following thoracic surgery. This prospective cohort study aims to describe the pain trajectories of patients undergoing thoracic surgery, beginning preoperatively and continuing up to 1 year after surgery. Methods: Two hundred and seventy-nine patients undergoing elective thoracic surgery were enrolled. Participants filled out a preoperative questionnaire containing questions about their sociodemographic information, comor...

Research on communities and crime has predominantly focused on social conditions within an area or in its immediate proximity. However, a growing body of research shows that people often tr...

BACKGROUND: Misdiagnosis, arbitrary charges, annoying queues, and clinic waiting times, among others, are long-standing phenomena in the medical industry across the world. These factors can contribute to patient anxiety about misdiagnosis by clinicians. However, with the growth of big data in the biomedical and healthcare communities, the performance of artificial intelligence (AI) diagnostic techniques is improving and can help avoid medical practice errors, including under the ...

Recent studies suggest that a large proportion of new HIV-1 infections in mature epidemics occurs within discordant couples, making discordancy a major contributor to the spread of HIV/AIDS in Africa. This paper aims to assess changes over a five-year period (2009–2015) in the (risk) factors associated with HIV serodiscordance among couples in Mozambique, using cross-sectional data from the INSIDA and IMASIDA surveys. The pooled data of both surveys were analyzed using a joint model for three...

Collisions between ships and whales raise environmental, safety, and economic concerns. The management of whale-ship collisions, however, lacks a holistic approach, unlike the management of other types of wildlife-vehicle collisions, which has been more standardized for several years now. In particular, safety and economic factors are routinely omitted in the assessment of proposed mitigation solutions to ship strikes, possibly leading to under-compliance and a lack of acceptance from ...

The study aims to test the applicability of a variant of the model proposed by Hockerts (2017) for assessing the social entrepreneurial intention (SEI) of male and female students. It extends the model by incorporating the university's environment and support system (ESS) as an additional, more distal construct. The university's ESS, coupled with experience of social, cultural and environmental issues, can affect SEI by influencing the more proximal precursors of empathy towards others, perc...

Many researchers have studied gender differences in the entrepreneurial intention of students by analyzing the influence of several intrinsic and extrinsic factors on the antecedents of entrepreneurial intention. Fewer researchers have analyzed the influence of the university’s environment and support system on the precursors of the entrepreneurial intention of students in general and of female students in particular. This study aims to fill that gap by analyzing the influence of the university’...

We provide a novel family of generative block-models for random graphs that naturally incorporates degree distributions: the block-constrained configuration model. Block-constrained configuration models build on the generalized hypergeometric ensemble of random graphs and extend the well-known configuration model by enforcing block-constraints on the edge-generating process. The resulting models are practical to fit even to large networks. These models provide a new, flexible tool for the study ...
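The block constraint can be illustrated with a small sampler. As a simplification (not the authors' fitting procedure), assume directed multi-edges are drawn one at a time without replacement from an urn holding k_out(i)·k_in(j) balls for each ordered node pair, with every remaining ball weighted by a block propensity ω[b(i), b(j)]; this sequential weighted draw is the urn scheme behind Wallenius' noncentral hypergeometric distribution. `sample_bccm` and its parameters are hypothetical.

```python
import numpy as np

def sample_bccm(k_out, k_in, blocks, omega, m, seed=0):
    """Draw m directed multi-edges by sequential weighted urn sampling.

    Hypothetical sketch: ball counts follow the configuration model
    (k_out[i] * k_in[j]) and are weighted by omega[block(i), block(j)].
    Requires m not to exceed the number of balls with positive weight.
    """
    rng = np.random.default_rng(seed)
    blocks = np.asarray(blocks)
    xi = np.outer(k_out, k_in).astype(float)  # configuration-model ball counts
    w = np.asarray(omega)[np.ix_(blocks, blocks)]  # block propensity per node pair
    A = np.zeros(xi.shape, dtype=int)
    for _ in range(m):
        probs = (xi * w).ravel()
        probs /= probs.sum()
        idx = rng.choice(probs.size, p=probs)
        i, j = divmod(idx, xi.shape[1])
        A[i, j] += 1
        xi[i, j] -= 1  # without replacement
    return A

# Two blocks of two nodes; zero off-diagonal propensity forbids cross-block edges.
k = [2, 2, 2, 2]
blocks = [0, 0, 1, 1]
omega = np.array([[1.0, 0.0], [0.0, 1.0]])
A = sample_bccm(k, k, blocks, omega, m=8, seed=3)
print(A)
```

Setting all entries of ω equal recovers an unconstrained configuration-model draw, while varying ω across block pairs shifts edges between blocks without changing the urn built from the degrees.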

Background and objectives: Metabolic syndrome is a cluster of risk factors associated with CKD. By studying the genetic and environmental influences on how traits of metabolic syndrome correlate with CKD, the understanding of the etiological relationships can be improved. Design, setting, participants, & measurements: From the population-based TwinGene project within the Swedish Twin Registry, 4721 complete twin pairs (9442 European ancestry participants) were included in this cross-sectional twin...