Mathematik, Informatik und Statistik - Open Access LMU - Teil 01/03

Ludwig-Maximilians-Universität München

About

The University Library (UB) maintains an extensive archive of electronic media, ranging from full-text collections, newspaper archives, dictionaries and encyclopedias to comprehensive bibliographies and more than 1000 databases. On iTunes U, the UB provides, among other things, a selection of electronic publications by researchers at LMU. (This is part 1 of 3 of the collection 'Mathematik, Informatik und Statistik - Open Access LMU'.)

250 episodes

Quantifying overdispersion effects in count regression data

The Poisson regression model is often used as a first model for count data with covariates. Since this model is a GLM with canonical link, regression parameters can be easily fitted using standard software. However, the model requires equidispersion, which might not be valid for the data set under consideration. Many models have been proposed in the literature to allow for overdispersion; one such model is the negative binomial regression model. In addition, score tests have been commonly used to detect overdispersion in the data, but these tests do not allow one to quantify its effects. In this paper we propose easily interpretable discrepancy measures which quantify the overdispersion effects when comparing a negative binomial regression to a Poisson regression. We propose asymptotic $\alpha$-level tests for the size of the overdispersion effects in terms of the developed discrepancy measures. A graphical display of p-value curves can then be used for an exact quantification of the overdispersion effects. This can lead to a validation of the Poisson regression or to a discrimination between the Poisson and the negative binomial regression. The proposed asymptotic tests are investigated in small samples using simulation and applied to two examples.
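
As a minimal illustration of the modelling contrast (simulated data; these are not the paper's discrepancy measures), one can fit a Poisson GLM, gauge overdispersion via the Pearson statistic, and refit with a negative binomial family:

```python
# Hedged sketch: Poisson vs. negative binomial fits on simulated overdispersed
# counts. Variable names and the dispersion check are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = sm.add_constant(x)
mu = np.exp(0.5 + 0.8 * x)
k = 2.0                                    # gamma-Poisson mixture: Var = mu + mu^2/k
y = rng.poisson(rng.gamma(shape=k, scale=mu / k))

pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print("Pearson chi2 / df:", pois.pearson_chi2 / pois.df_resid)  # >> 1 flags overdispersion

# Negative binomial GLM; alpha = 1/k here. In practice alpha is unknown and can
# be estimated, e.g. with statsmodels' discrete NegativeBinomial model.
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=1.0 / k)).fit()
print(nb.params)
```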

Jan 01, 2002
Comparing Different Estimators in a Nonlinear Measurement Error Model

A nonlinear structural errors-in-variables model is investigated, where the response variable has a density belonging to an exponential family and the error-prone covariate follows a Gaussian distribution. Assuming the error variance to be known, we consider two consistent estimators in addition to the naive estimator. We compare their relative efficiencies by means of their asymptotic covariance matrices for small error variances. The structural quasi score (SQS) estimator is based on a quasi score function, which is constructed from a conditional mean-variance model. Consistency and asymptotic normality of this estimator are proved. The corrected score (CS) estimator is based on an error-corrected likelihood score function. For small error variances the SQS and CS estimators are approximately equally efficient. The polynomial model and the Poisson regression model are explored in greater detail.
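
In generic notation (ours, not the paper's), the set-up and the two estimation ideas can be summarized as

\[
  E(y \mid \xi) = m(\xi; \beta), \qquad w = \xi + u, \quad u \sim N(0, \sigma_u^2), \quad \xi \sim N(\mu_\xi, \sigma_\xi^2),
\]

with $\sigma_u^2$ known. The SQS estimator solves a quasi score equation built from the induced mean and variance of $y$ given the observed $w$; the CS estimator solves a corrected score equation whose conditional expectation, given the true covariate $\xi$, equals the error-free likelihood score.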

Jan 01, 2002
Missing at Random (MAR) in Nonparametric Regression - A Simulation Experiment

This paper considers an additive model y = f(x) + e when some observations on x are missing at random but the corresponding observations on y are available. Missing at random is an especially interesting case for this model because complete case analysis is not expected to be suitable. A simulation study is reported, and methods are compared by means of superiority measures such as the sample mean squared error, sample variance and estimated sample bias. In detail, complete case analysis, zero order regression plus random noise, single imputation and nearest neighbor imputation are discussed.
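
As a toy sketch of the last of these methods (data and missingness mechanism invented here), nearest neighbor imputation lets each incomplete case borrow the x-value of the complete case whose y is closest:

```python
# Nearest neighbor imputation for y = f(x) + e with x missing at random (MAR):
# the missingness probability depends only on the observed y.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 1, n)
y = np.exp(x) + rng.normal(0, 0.2, n)        # monotone f makes y a useful donor key

miss = rng.random(n) < 1 / (1 + np.exp(-2 * (y - np.mean(y))))  # MAR via observed y
x_obs = np.where(miss, np.nan, x)

donors = np.flatnonzero(~miss)               # complete cases
for i in np.flatnonzero(miss):
    j = donors[np.argmin(np.abs(y[donors] - y[i]))]  # nearest neighbor in y
    x_obs[i] = x[j]                                  # impute the donor's x

print("imputation RMSE:", np.sqrt(np.mean((x_obs[miss] - x[miss]) ** 2)))
```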

Jan 01, 2002
On the bias of structural estimation methods in a polynomial regression with measurement error when the distribution of the latent covariate is a mixture of normals

The structural variant of a regression model with measurement error is characterized by the assumption of an underlying known distribution of the latent covariate. Several estimation methods, like regression calibration or structural quasi score estimation, take this distribution into account. In the case of a polynomial regression, which is studied here, structural quasi score takes the form of structural least squares (SLS). Usually the underlying latent distribution is assumed to be the normal distribution because then the estimation methods take a particularly simple form. SLS is consistent as long as this assumption is true. The purpose of the paper is to investigate the amount of bias that results from violations of the normality assumption for the covariate distribution. Deviations from normality are introduced by switching to a mixture of normal distributions. It turns out that the bias reacts only mildly to slight deviations from normality.

Jan 01, 2002
The Additive Model with Missing Values in the Independent Variable - Theory and Simulation

After a short introduction of the model, the missing mechanism and the method of inference, some imputation procedures are introduced with special focus on the simulation experiment. Within this experiment, the simple additive model y = f(x) + e is assumed to have missing values in the independent variable according to MCAR. Besides the well-known complete case analysis, mean imputation plus random noise, a single imputation and two ways of nearest neighbor imputation are used. These methods are compared within a simulation experiment based on the average mean square error, variances and biases of \hat{f}(x) at the knots.

Jan 01, 2002
Modelling Data from Inside of Earth: Local Smoothing of Mean and Dispersion Structure in Deep Drill Data

In this paper we analyse data originating from the German Deep Drill Program. We model the amount of 'cataclastic rocks' in a series of measurements taken from deep drill samples ranging from 1000 to 5000 meters in depth. The measurements describe the amount of strongly deformed rock particles and serve as an indicator for the occurrence of cataclastic shear zones, which are, loosely speaking, areas of severely 'ground' rock caused by movements of different layers in the earth's crust. The data represent a 'depth series' as an analogue to a 'time series', with mean, dispersion and correlation structure varying in depth. The generally smooth structure is disturbed by peaks and outliers, so that robust procedures have to be applied for estimation. In terms of statistical modelling we have to tackle three different peculiarities of the data simultaneously: estimation of the correlation structure, local bandwidth selection and robust smoothing. To do so, existing routines are adapted and combined in new 'two stage' estimation procedures.

Jan 01, 2002
A Selection Model for Bivariate Normal Data, with a Flexible Nonparametric Missing Model and a Focus on Variance Estimates

Nonignorable nonresponse is a common problem in bivariate or multivariate data. Here a selection model for bivariate normally distributed data (Y1, Y2) is proposed. The missingness of Y2 is assumed to depend on its own values. The model for missingness describes the probability of nonresponse as a function of Y2 itself, and it is chosen nonparametrically to allow flexible patterns. We try to obtain a reasonable estimate for the expectation and especially for the variance of Y2. Estimation is done by data augmentation and computation by common sampling methods.
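
Schematically (notation ours), the model couples a bivariate normal outcome with a nonparametric, nonignorable missingness mechanism:

\[
  (Y_1, Y_2) \sim N_2(\mu, \Sigma), \qquad P(R = 0 \mid Y_1, Y_2) = \pi(Y_2),
\]

where $R = 0$ indicates that $Y_2$ is missing and the response function $\pi$ is left unspecified.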

Jan 01, 2002
A comparison of asymptotic covariance matrices of three consistent estimators in the Poisson regression model with measurement errors

We consider a Poisson model, where the mean depends on certain covariates in a log-linear way with unknown regression parameters. Some or all of the covariates are measured with errors. The covariates as well as the measurement errors are jointly normally distributed, and the error covariance matrix is supposed to be known. Three consistent estimators of the parameters (the corrected score, the structural, and the quasi-score estimators) are compared to each other with regard to their relative asymptotic efficiencies. The paper extends an earlier result for a scalar covariate.

Jan 01, 2002
Generalized basic probability assignments

Dempster-Shafer theory allows one to construct belief functions from (precise) basic probability assignments. The present paper extends this idea substantially. By considering SETS of basic probability assignments, an appealing constructive approach to general interval probability (general imprecise probabilities) is achieved, which allows for very flexible modelling of uncertain knowledge.
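
For orientation, a small sketch of the precise starting point that the paper generalizes (frame and masses invented here): computing belief and plausibility from a single basic probability assignment.

```python
# Belief Bel(A) sums the mass of focal sets contained in A; plausibility Pl(A)
# sums the mass of focal sets intersecting A.
frame = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.4, frozenset({"a", "b"}): 0.3, frame: 0.3}  # masses sum to 1

def bel(A, m):
    return sum(mass for B, mass in m.items() if B <= A)

def pl(A, m):
    return sum(mass for B, mass in m.items() if B & A)

A = frozenset({"a", "b"})
print(bel(A, m), pl(A, m))   # 0.7 1.0
```

The paper's construction replaces the single assignment m by a set of such assignments; roughly, lower and upper envelopes over that set then yield interval-valued probabilities.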

Jan 01, 2002
Why are West African children underweight?

The incidence of underweight amongst children under five in Western Africa has been increasing over the last decade (UNICEF, 2002). In Asia, where about two thirds of the world's underweight children live, the rate of underweight declined from about 36 per cent to some 29 per cent between 1990 and 2000. In sub-Saharan Africa, the absolute number of underweight children has increased, and the rate is now about 36 per cent. Using new data from Demographic and Health Surveys, I estimate the probability of underweight for a sample of West African children, controlling for selective survival.

Jan 01, 2002
The association between reported and calculated reservation wages

Do reported reservation wages correspond to the concept of reservation wages that economists have? Using panel data on unemployed British workers, I calculate reservation wages from a search model and compare these with reported reservation wages. It is shown that men's reported reservation wages are greater than what the model predicts, and that for women there is hardly any relation between the two variables.

Jan 01, 2002
Application of Survival Analysis Methods to Long Term Care Insurance

With the introduction of compulsory long term care (LTC) insurance in Germany in 1995, a large claims portfolio with a significant proportion of censored observations became available. In the first part of this paper we present an analysis of part of this portfolio using the Cox proportional hazards model (Cox, 1972) to estimate transition intensities. It is shown that this approach allows the inclusion of censored observations as well as the inclusion of time-dependent risk factors such as time spent in LTC. This is in contrast to the more commonly used Poisson regression with graduation approach (see for example Renshaw and Haberman, 1995), where censored observations and time-dependent risk factors are ignored. In the second part we show how these estimated transition intensities can be used in a multiple state Markov process (see Haberman and Pitacco, 1999) to calculate premiums for LTC insurance plans.
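
A minimal, runnable sketch of the first estimation step (lifelines' bundled Rossi recidivism data stands in for the LTC claims portfolio, which is not public): a Cox fit that keeps censored observations in the likelihood.

```python
# Cox proportional hazards fit with right-censored observations included.
# For the time-dependent covariate "time spent in LTC" one would switch to
# lifelines' CoxTimeVaryingFitter on (start, stop] formatted data.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                  # week = duration, arrest = event indicator
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
cph.print_summary()                # estimated log-hazard ratios
```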

Jan 01, 2002
Using Genetic Algorithms for Model Selection in Graphical Models

Model selection in graphical models is still not fully investigated. The main difficulty lies in the search space of all possible models, which grows more than exponentially with the number of variables involved. Here, genetic algorithms seem to be a reasonable strategy for finding well-fitting models for a given data set. In this paper, we adapt them to the problem of model search in graphical models and discuss their performance by conducting simulation studies.
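
A hedged sketch of such a search (all settings invented): a tiny genetic algorithm over edge-indicator bitstrings for a Gaussian DAG with a fixed variable ordering, so that every bitstring encodes an acyclic graph, scored by a node-wise BIC.

```python
# GA over candidate edges i -> j (i < j in a fixed ordering). Fitness is the
# negative BIC of node-wise Gaussian regressions on the chosen parents.
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 4
X = rng.normal(size=(n, p))
X[:, 2] += 0.9 * X[:, 0]                      # plant edges 0->2 and 1->3
X[:, 3] += 0.7 * X[:, 1]
pairs = [(i, j) for j in range(p) for i in range(j)]

def fitness(bits):
    score = 0.0
    for j in range(p):
        parents = [i for k, (i, jj) in enumerate(pairs) if jj == j and bits[k]]
        Z = np.column_stack([np.ones(n)] + [X[:, i] for i in parents])
        resid = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
        score += -0.5 * n * np.log(resid @ resid / n) - 0.5 * np.log(n) * Z.shape[1]
    return score

pop = rng.integers(0, 2, size=(30, len(pairs)))
for _ in range(40):
    scores = np.array([fitness(b) for b in pop])
    elite = np.argsort(scores)[-10:]          # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = pop[rng.choice(elite, 2)]
        cut = rng.integers(1, len(pairs))     # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(len(pairs)) < 0.05  # mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.array(children)

best = pop[np.argmax([fitness(b) for b in pop])]
print([pairs[k] for k in range(len(pairs)) if best[k]])   # ideally [(0, 2), (1, 3)]
```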

Jan 01, 2002
Risk Management with Extreme Value Theory

In this paper we review certain aspects of Value-at-Risk, which is nowadays the industry benchmark risk measure. As a small quantile (usually 1%), Value-at-Risk is closely related to extreme value theory. We explain an estimation method based on extreme value theory. Since the variance of the estimated Value-at-Risk may depend on the dependence structure of the data, we investigate the extreme behaviour of some of the most prominent time series models in finance, in continuous as well as discrete time. We also determine optimal portfolios when risk is measured by the Value-at-Risk. Again we use realistic models, moving away from the traditional Black-Scholes model to the class of Lévy processes. This paper is a contribution to a book by several authors on Extreme Value Theory, to appear with CRC/Chapman and Hall.
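
A compact peaks-over-threshold sketch of the EVT estimate mentioned above (simulated heavy-tailed losses; the threshold choice is ad hoc here, whereas in practice it is a delicate step):

```python
# Fit a generalized Pareto distribution to excesses over a high threshold and
# read off the 99% Value-at-Risk via the standard POT quantile formula.
import numpy as np
from scipy.stats import genpareto, t as student_t

rng = np.random.default_rng(2)
losses = -student_t.rvs(df=3, size=5000, random_state=rng)  # losses = negative returns

u = np.quantile(losses, 0.95)              # threshold
exc = losses[losses > u] - u               # excesses over u
xi, _, beta = genpareto.fit(exc, floc=0)   # shape xi, scale beta

n, n_u, q = len(losses), len(exc), 0.99
var_q = u + beta / xi * ((n / n_u * (1 - q)) ** (-xi) - 1)
print("EVT VaR(99%):", var_q, "empirical quantile:", np.quantile(losses, q))
```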

Jan 01, 2002
An exact corrected log-likelihood function for Cox's proportional hazards model under measurement error and some extensions

This paper studies Cox's proportional hazards model under covariate measurement error. Nakamura's (1990) methodology of corrected log-likelihood will be applied to the so-called Breslow likelihood, which is, in the absence of measurement error, equivalent to partial likelihood. For a general error model with possibly heteroscedastic and non-normal additive measurement error, corrected estimators of the regression parameter as well as of the baseline hazard rate are obtained. The estimators proposed by Nakamura (1992), Kong, Huang and Li (1998) and Kong and Gu (1999) are re-established in the special cases considered there. This sheds new light on these estimators and justifies them as exact corrected score estimators. Finally, the method will be extended to some variants of the Cox model.
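
The corrected-score principle used here can be stated compactly (generic notation): with the error-prone covariate $W$ in place of the true $X$, one seeks a corrected log-likelihood $\ell^*$ satisfying

\[
  E\bigl[\ell^{*}(\beta; W) \mid X\bigr] = \ell(\beta; X),
\]

so that the corrected score equations inherit the unbiasedness of the error-free ones; in this paper $\ell$ is the Breslow log-likelihood rather than the partial likelihood.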

Jan 01, 2002
Parametric and Nonparametric Regression with Missing X's - A Review

This paper gives a detailed overview of the problem of missing data in parametric and nonparametric regression. Theoretical basics, properties as well as simulation results may help the reader to get familiar with the common problem of incomplete data sets. Of course, not all occurrences can be discussed, so this paper should be seen as an introduction to missing data within regression analysis and as an extension of the early paper of Little (1992).

Jan 01, 2002
Geo-additive models of Childhood Undernutrition in three Sub-Saharan African Countries

We investigate the geographical and socioeconomic determinants of childhood undernutrition in Malawi, Tanzania and Zambia, three neighboring countries in Southern Africa, using the 1992 Demographic and Health Surveys. We estimate models of undernutrition jointly for the three countries to explore regional patterns of undernutrition that transcend boundaries, while allowing for country-specific interactions. We use semiparametric models to flexibly model the effects of selected socioeconomic covariates and spatial effects. Our spatial analysis is based on a flexible geo-additive model using the district as the geographic unit of analysis, which allows us to separate smooth structured spatial effects from random effects. Inference is fully Bayesian and uses recent Markov chain Monte Carlo techniques. While the socioeconomic determinants generally confirm what is known in the literature, we find distinct residual spatial patterns that are not explained by the socioeconomic determinants. In particular, there appears to be a belt running from Southern Tanzania to Northeastern Zambia which exhibits much worse undernutrition, even after controlling for socioeconomic effects. These effects do transcend borders between the countries, but to a varying degree. These findings have important implications for targeting policy as well as for the search for left-out variables that might account for these residual spatial patterns.
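
A generic form of such a geo-additive predictor (notation ours) is

\[
  \eta_i = x_i^{\top} \beta + \sum_j f_j(v_{ij}) + f_{\mathrm{spat}}(s_i) + b_{s_i},
\]

with smooth functions $f_j$ of selected socioeconomic covariates, a structured spatial effect $f_{\mathrm{spat}}$ of district $s_i$, and an unstructured random effect $b_{s_i}$, all estimated jointly within the fully Bayesian MCMC scheme.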

Jan 01, 2002
Graphical chain models for the analysis of complex genetic diseases: an application to hypertension

A crucial task in modern genetic medicine is the understanding of complex genetic diseases. The main complicating features are that a combination of genetic and environmental risk factors is involved, and that the phenotype of interest may be complex. Traditional statistical techniques based on lod scores fail when the disease is no longer monogenic and the underlying disease transmission model is not defined. Different kinds of association tests have proved to be an appropriate and powerful statistical tool to detect a candidate gene for a complex disorder. However, statistical techniques able to investigate direct and indirect influences among phenotypes, genotypes and environmental risk factors are required to analyse the association structure of complex diseases. In this paper we propose graphical models as a natural tool to analyse the multifactorial structure of complex genetic diseases. An application of this model to a primary hypertension data set is illustrated.

Jan 01, 2002
A Smooth Test in Proportional Hazard Survival Models using Local Partial Likelihood Fitting

Proportional hazard models for survival data, even though popular and numerically handy, suffer from the restrictive assumption that covariate effects are constant over survival time. A number of tests have been proposed to check this assumption. This paper contributes to this area by employing local estimates, allowing one to fit hazard models with covariate effects varying smoothly with time. A formal test is derived for the model with proportional hazards against the smooth general model as alternative. The test proves to possess omnibus power. Comparative simulations and two data examples accompany the presentation. Extensions are provided to multiple covariate settings, where the focus of interest is to decide which of the covariate effects vary with time.
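
In generic notation, the smooth alternative fitted by local partial likelihood and the proportional hazards null read

\[
  \lambda(t \mid x) = \lambda_0(t)\, \exp\bigl(x^{\top} \beta(t)\bigr), \qquad H_0 : \beta(t) \equiv \beta .
\]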

Jan 01, 2002
Model Selection for Dags via RJMCMC for the Discrete and Mixed Case

Based on a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm developed by Fronk and Giudici (2000) for model selection for Gaussian DAGs, we propose a new approach for the purely discrete case. Here, the main idea is to introduce latent variables, which then allow us to fall back on the already treated continuous case. This also makes it straightforward to tackle the mixed case, i.e. to deal simultaneously with continuous and discrete variables. The performance of the approach is investigated by means of a simulation study for different standard situations. In addition, a real data application is provided.

Jan 01, 2002
The Tail of the Stationary Distribution of a Random Coefficient AR(q) Model

We investigate a stationary random coefficient autoregressive process. Using renewal-type arguments tailor-made for such processes, we show that the stationary distribution has a power-law tail. When the model is normal, we show that it is equivalent in distribution to an autoregressive process with ARCH errors. Hence we obtain the tail behaviour of any such model of arbitrary order.
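
In standard notation for random coefficient autoregression (symbols ours), the process and the tail result read

\[
  X_t = \sum_{i=1}^{q} (\varphi_i + B_{t,i})\, X_{t-i} + \varepsilon_t, \qquad
  P(X > x) \sim c\, x^{-\kappa} \quad (x \to \infty),
\]

with fixed coefficients $\varphi_i$, random perturbations $B_{t,i}$ and i.i.d. noise $\varepsilon_t$; the power-law tail of the stationary distribution is the Kesten-type behaviour familiar from ARCH models.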

Jan 01, 2002
Bayesian varying-coefficient models using adaptive regression splines

Varying-coefficient models provide a flexible framework for semi- and nonparametric generalized regression analysis. We present a fully Bayesian B-spline basis function approach with adaptive knot selection. For each of the unknown regression functions or varying coefficients, the number and location of knots and the B-spline coefficients are estimated simultaneously using reversible jump Markov chain Monte Carlo sampling. The overall procedure can therefore be viewed as a kind of Bayesian model averaging. Although Gaussian responses are covered by the general framework, the method is particularly useful for fundamentally non-Gaussian responses, where fewer alternatives are available. We illustrate the approach with a thorough application to two data sets analysed previously in the literature: the kyphosis data set with a binary response and survival data from the Veterans' Administration lung cancer trial.
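
Schematically (notation ours), each varying coefficient is expanded in B-splines over a knot set of variable size,

\[
  \eta_i = \sum_{j=1}^{p} z_{ij}\, \beta_j(v_{ij}), \qquad
  \beta_j(v) = \sum_{m=1}^{M_j} c_{jm} B_m(v; \kappa_j),
\]

where the knot vector $\kappa_j$, its length $M_j$ and the coefficients $c_{jm}$ are all sampled by reversible jump MCMC; averaging over sampled knot configurations gives the model-averaging interpretation mentioned above.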

Jan 01, 2001
Local Fitting with General Basis Functions

Local polynomial modelling can be seen as a local fit of the data against the basis functions 1, x, ... , x^p. In this paper we extend this method to a wide range of other basis functions. We will focus on the power basis, i.e. a basis which consists of the powers of an arbitrary function, and derive an extended Taylor theorem for this basis. We describe the estimation procedure and calculate asymptotic expressions for bias and variance of this local basis estimator. We apply this method to a simulated data set for various basis functions and propose a data-driven method to find a suitable basis function in each situation.
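
As a schematic of the local basis estimator (generic notation, not necessarily the paper's exact parametrization), the fit at a target point $x$ solves a kernel-weighted least squares problem in powers of the chosen function $g$, reducing to local polynomial fitting for $g(u) = u$:

\[
  \hat{\beta}(x) = \arg\min_{\beta} \sum_{i=1}^{n} K_h(x_i - x)
  \Bigl( y_i - \sum_{j=0}^{p} \beta_j \bigl( g(x_i) - g(x) \bigr)^{j} \Bigr)^{2}.
\]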

Jan 01, 2001
Edge Preserving Smoothing by Local Mixture Modelling

Smooth models have become more and more popular over the last couple of years. Standard smoothing methods, however, cannot cope with discontinuities in a function or its first derivative. In particular, this implies that structural changes in data may be hidden in smooth estimates. Recently, Chu, Glad, Godtliebsen & Marron (1998) suggested local M estimation as an edge-preserving smoother. The basic idea behind local M estimation is that observations beyond a jump are considered as outliers and down-weighted or neglected in the estimation. We pursue a different, but related, idea here and treat observations beyond a jump as stemming from a different population, one which differs from the current one by a shift in the mean. This means we locally impose a mixture model where mixing takes place due to different mean values. For fitting we apply a local version of the EM algorithm. The advantage of our approach lies in its general formulation. In particular, it easily extends to non-Gaussian data. The procedure is applied in two examples, the first concerning the analysis of structural changes in the duration of unemployment, the second focusing on disease mapping.
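
A toy sketch of the local mixture idea (invented data; bandwidth and variances held fixed): at a target point, observations are kernel-weighted and a two-component normal mixture in the means is fitted by weighted EM steps, so points beyond a jump are absorbed by the second component instead of biasing the local mean.

```python
# Local two-component mixture smoothing near a jump at x = 0.5.
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 300))
y = np.where(x < 0.5, 0.0, 2.0) + rng.normal(0, 0.3, 300)   # mean jump at 0.5

def local_mixture_mean(x0, h=0.1, iters=50):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)        # kernel weights
    mu = np.array([y[w.argmax()] - 0.5, y[w.argmax()] + 0.5])
    pi, sigma2 = np.array([0.5, 0.5]), 0.3 ** 2   # variances fixed for simplicity
    for _ in range(iters):                        # weighted EM updates
        dens = pi * np.exp(-0.5 * (y[:, None] - mu) ** 2 / sigma2)
        r = dens / dens.sum(axis=1, keepdims=True)
        wr = w[:, None] * r
        mu = (wr * y[:, None]).sum(axis=0) / wr.sum(axis=0)
        pi = wr.sum(axis=0) / w.sum()
    return mu[np.argmax(pi)]                      # mean of the locally dominant component

print(local_mixture_mean(0.45), local_mixture_mean(0.55))   # ~0 and ~2
```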

Jan 01, 2001
The Hungarian Unemployment Insurance Benefit System and Incentives to Return to Work

This paper analyses the impact of the Hungarian unemployment insurance (UI) benefit system on the speed of exit from unemployment to regular employment. The duration analysis relies on unemployment spells from two inflow cohorts, which were administered under distinct UI rules. It thus exploits a natural experiment to identify disincentive effects. Kaplan-Meier estimates suggest that the benefit reform did not significantly change the transition rates. Moreover, a semi-parametric analysis finds no remarkable disincentive effects, but it does find an entitlement effect: the hazards of men and women rise somewhat in the last two months before they run out of UI benefits.

Jan 01, 2001
Nonparametric predictive inference and interval probability

This paper presents the unique position of A(n)-based nonparametric predictive inference within the theory of interval probability. It provides a completely new understanding, leading to powerful new results and a well-founded justification of such inferences by proving strong internal consistency results.

Jan 01, 2001
A Bayesian Model for Spatial Disease Prevalence Data

The analysis of the geographical distribution of disease on the scale of geographic areas such as administrative boundaries plays an important role in veterinary epidemiology. Prevalence estimates of wildlife population surveys are often based on regional count data generated by sampling animals shot by hunters. The observed disease rate per spatial unit is not a useful estimate of the underlying disease prevalence due to different sample sizes and spatial dependencies between neighbouring areas. Therefore, it is necessary to account for extra-sample variation and spatial correlation in the data to produce more accurate maps of disease incidence. For this purpose a hierarchical Bayesian model, in which structured and unstructured overdispersion is modelled explicitly in terms of spatial and non-spatial components, was implemented by Markov chain Monte Carlo methods. The model was empirically compared with the results of the non-spatial beta-binomial model using surveillance data on Pseudorabies virus infections of wild boars in the Federal State of Brandenburg, Germany.

Jan 01, 2001
Synthesizing the classical and inverse methods in linear calibration

This paper considers the problem of linear calibration and presents two estimators arising from a synthesis of the classical and inverse calibration approaches. Their performance properties are analyzed employing small error asymptotic theory. Using the criteria of bias and mean squared error, the proposed estimators are compared with the traditional classical and inverse calibration estimators. Finally, some remarks related to future work are made.
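
The two ingredients being synthesized, in generic notation: with a calibration line fitted as $y = \hat{\alpha} + \hat{\beta} x$, the classical estimator inverts the fitted line at a new observation $y_0$, while the inverse estimator regresses $x$ on $y$ directly,

\[
  \hat{x}_{C} = \frac{y_0 - \hat{\alpha}}{\hat{\beta}}, \qquad
  \hat{x}_{I} = \hat{\gamma} + \hat{\delta}\, y_0 ,
\]

where $(\hat{\gamma}, \hat{\delta})$ come from the least squares regression of $x$ on $y$.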

Jan 01, 2001
Disease Mapping of Stage-specific Cancer Incidence Data

We propose two approaches for the spatial analysis of cancer incidence data with additional information on the stage of the disease at time of diagnosis. The two formulations are extensions of commonly used models for multicategorical response data on an ordinal scale. We include spatial and age group effects in both formulations, which we estimate in a nonparametric smooth way. More specifically, we adopt a fully Bayesian approach based on Gaussian pairwise difference priors where additional smoothing parameters are treated as unknown as well. We apply our methods to data on cervical cancer in the former German Democratic Republic. The results suggest that there are large spatial differences in the stage proportions, which indicates spatial variability with respect to the introduction and effectiveness of screening programs.

Jan 01, 2001
Generalized semiparametrically structured ordinal models

Semiparametrically structured models are defined as a class of models in which the predictor may contain parametric parts, additive parts of covariates with an unspecified functional form, and interactions which are described as varying coefficients. In the case of an ordinal response the complexity of the predictor is determined by different sorts of effects. A distinction is made between global effects and category-specific effects, where the latter allow the effect to vary across response categories. A general framework is developed in which global as well as category-specific effects may have unspecified functional form. The framework extends various existing methods of modeling ordinal responses. The Wilkinson-Rogers notation is extended to incorporate smooth model parts and varying coefficient terms, the latter being important for the smooth specification of category-specific effects.
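
A generic cumulative-type backbone for such models (notation ours) separates the two kinds of effects,

\[
  P(Y \le r \mid x, z) = F\bigl(\theta_r + x^{\top}\gamma + z^{\top}\beta_r\bigr), \qquad r = 1, \dots, k-1,
\]

with global effects $\gamma$ constant across categories and category-specific effects $\beta_r$; the framework of the paper lets both kinds of terms enter through unspecified smooth functions and varying coefficients rather than linearly.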

Jan 01, 2001