Hereditary hemochromatosis is a disorder of iron metabolism characterized by increased iron intake and progressive storage and is related to mutations in the HFE gene. Interactions between thalassemia and hemochromatosis may further increase iron overload. The ethnic background of the Brazilian population is heterogeneous and studies analyzing the simultaneous presence of HFE and thalassemia-related mutations have not been carried out. The aim of this study was to evaluate the prevalence of the H63D, S65C and C282Y mutations in the HFE gene among 102 individuals with alpha-thalassemia and 168 beta-thalassemia heterozygotes and to compare them with 173 control individuals without hemoglobinopathies. The allelic frequencies found in these three groups were 0.98, 2.38, and 0.29% for the C282Y mutation, 13.72, 13.70, and 9.54% for the H63D mutation, and 0, 0.60, and 0.87% for the S65C mutation, respectively. The chi-square test for multiple independent individuals indicated a significant difference among groups for the C282Y mutation, which was shown to be significant between the beta-thalassemia heterozygote and the control group by the Fisher exact test (P value = 0.009). The higher frequency of inheritance of the C282Y mutation in the HFE gene among beta-thalassemic patients may contribute to worsening the clinical picture of these individuals. In view of the characteristics of the Brazilian population...
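As an illustration of the Fisher exact test mentioned above, the following is a minimal sketch in plain Python. The function name fisher_exact_two_sided is hypothetical, and the allele counts in the usage lines are an assumption: they are back-calculated from the reported C282Y frequencies (2.38% of 336 alleles, 0.29% of 346 alleles), not taken from the study's tables. The abstract does not say which table the authors tested, so the result is not expected to reproduce the reported P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is no more probable than the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Assumed allele counts (back-calculated, see the note above):
# 8 mutant of 336 alleles in beta-thalassemia heterozygotes,
# 1 mutant of 346 alleles in controls.
p = fisher_exact_two_sided(8, 328, 1, 345)
print(f"two-sided p = {p:.4f}")
```

The two-sided rule used here (summing all tables at most as likely as the observed one) is one common convention; other conventions exist and can give slightly different values.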
Objectives: The stair-climbing test as measured in meters or number of steps has been proposed to predict the risk of postoperative complications. The study objective was to determine whether the stair-climbing time can predict the risk of postoperative complications. Methods: Patients aged more than 18 years with a recommendation of thoracotomy for lung resection were included in the study. Spirometry was performed according to the criteria by the American Thoracic Society. The stair-climbing test was performed on shaded stairs with a total of 12.16 m in height, and the stair-climbing time in seconds elapsed during the climb of the total height was measured. The accuracy test was applied to obtain stair-climbing time predictive values, and the receiver operating characteristic curve was calculated. Variables were tested for association with postoperative cardiopulmonary complications using the Student t test for independent populations, the Mann-Whitney test, and the chi-square or Fisher exact test. Logistic regression analysis was performed. Results: Ninety-eight patients were evaluated. Of these, 27 showed postoperative complications. Differences were found between the groups for age and attributes obtained from the stair-climbing test. The cutoff point for stair-climbing time obtained from the receiver operating characteristic curve was 37.5 seconds. No differences were found between the groups for forced expiratory volume in 1 second. In the logistic regression...
This paper explores laboratory analyses of ground layers of Portuguese wooden painted panels of the 15th and 16th centuries, performed in the Laboratório de Conservação e
Restauro José de Figueiredo – Instituto dos Museus e da Conservação. Based on this information, a database of materials was built. A subset of the database was selected
to perform a data mining analysis.
We used a method based on decision tree learning, Fisher’s exact test and permutation testing and it was possible to
find a ground layer technique that distinguishes António Nogueira’s Ferreira do Alentejo retable from other paintings
in our database. This constitutes a small contribution to a better understanding of Portuguese painting since it was possible to establish that António Nogueira used coloured
ground layers, which was an emerging technique at the time. Another contribution of this work is to show that the
methodology we present here can be applied to other case studies based on similar data.
Much forensic inference based upon DNA evidence is made assuming that the Hardy-Weinberg equilibrium (HWE) is valid for the genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, each having advantages and limitations. The limitations become more obvious when testing for deviation within multiallelic DNA loci is attempted. Here we present an exact test for HWE in the biallelic case, based on the ratio of weighted likelihoods under the null and alternative hypotheses, the Bayes factor. This test does not depend on asymptotic results and minimizes a linear combination of type I and type II errors. By ordering the sample space using the Bayes factor, we also define a significance (evidence) index, P value, using the weighted likelihood under the null hypothesis. We compare it to the conditional exact test for the case of sample size n = 10. Using the idea underlying the chi-square partition method, the test is used sequentially to test equilibrium in the multiple allele case and then applied to two short tandem repeat loci, using a real Caucasian data bank, showing its usefulness.
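For orientation, the conditional exact test used as the comparison baseline above can be sketched as follows. This is the classical test that conditions on the observed allele counts, not the Bayes-factor test proposed in the abstract. The function names and the genotype counts in the usage lines are hypothetical, chosen only to match the n = 10 setting.

```python
from math import factorial


def hwe_conditional_probs(n_aa, n_ab, n_bb):
    """Map each feasible heterozygote count to its probability under HWE,
    conditional on the observed allele counts (biallelic locus)."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab      # copies of allele A
    n_b = 2 * n - n_a          # copies of allele B
    probs = {}
    # The heterozygote count must have the same parity as n_a.
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        ways = factorial(n) // (factorial(hom_a) * factorial(het) * factorial(hom_b))
        probs[het] = ways * 2**het * factorial(n_a) * factorial(n_b) / factorial(2 * n)
    return probs


def hwe_exact_p(n_aa, n_ab, n_bb):
    """P value: total probability of samples no more likely than the observed one."""
    probs = hwe_conditional_probs(n_aa, n_ab, n_bb)
    p_obs = probs[n_ab]
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))


# Hypothetical sample of n = 10 individuals: 4 AA, 2 AB, 4 BB.
print(hwe_exact_p(4, 2, 4))
```

Ordering the sample space by conditional probability, as done here, is the conventional choice; the abstract's proposal orders it by the Bayes factor instead.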
Copy number variations (CNVs) constitute a major source of genetic variations in human populations and have been reported to be associated with complex diseases. Methods have been developed for detecting CNVs and testing CNV associations in genome-wide association studies (GWAS) based on SNP arrays. Commonly used two-step testing procedures work well only for long CNVs while direct CNV association testing methods work only for recurrent CNVs. Assuming that short CNVs disrupting any part of a given genomic region increase disease risk, we developed a variable threshold exact test (VTET) for testing disease associations of CNVs randomly distributed in the genome using intensity data from SNP arrays. By extensive simulations, we found that VTET outperformed two-step testing procedures based on existing CNV calling algorithms for short CNVs and that the performance of VTET was robust to the length of the genomic region. In addition, VTET had a comparable performance with CNVtools for testing the association of recurrent CNVs. Thus, we expect VTET to be useful for testing disease associations of both recurrent and randomly distributed CNVs using existing GWAS data. We applied VTET to a lung cancer GWAS and identified a genome-wide significant region on chromosome 18q22.3 for lung squamous cell carcinoma.
Source: Oxford University Press; Publisher: Oxford University Press
Type: Scientific journal article
Summary: Next-generation sequencing platforms for measuring digital expression such as RNA-Seq are displacing traditional microarray-based methods in biological experiments. The detection of differentially expressed genes between groups of biological conditions has led to the development of numerous bioinformatics tools, but so far, few exploit the expanded dynamic range afforded by the new technologies. We present edgeRun, an R package that implements an unconditional exact test that is a more powerful version of the exact test in edgeR. This increase in power is especially pronounced for experiments with as few as two replicates per condition, for genes with low total expression and with large biological coefficient of variation. In comparison with a panel of other tools, edgeRun consistently captures functionally similar differentially expressed genes. Availability and implementation: The package is freely available under the MIT license from CRAN (http://cran.r-project.org/web/packages/edgeRun). Contact: email@example.com Supplementary information: Supplementary data are available at Bioinformatics online.
In this paper, we study several tests for the equality of two unknown distributions. Two are based on empirical distribution functions, three others on nonparametric probability density estimates, and the last three on differences between sample moments. We suggest controlling the size of such tests (under nonparametric assumptions) by using permutational versions of the tests jointly with the method of Monte Carlo tests properly adjusted to deal with discrete distributions. We also propose a combined test procedure, whose level is again perfectly controlled through the Monte Carlo test technique and has better power properties than the individual tests that are combined. Finally, in a simulation experiment, we show that the technique suggested provides perfect control of test size and that the new tests proposed can yield sizeable power improvements.
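The combination of a permutational test with the Monte Carlo test technique can be sketched as below. This is a simplified illustration, not the authors' procedure: the function name mc_permutation_test is hypothetical, the statistic is a plain absolute difference in sample means (a moment-based statistic, chosen for brevity), and ties are counted against the null rather than broken at random, which makes the sketch conservative for discrete data.

```python
import random


def mc_permutation_test(x, y, n_draws=999, seed=0):
    """Monte Carlo permutation p-value for H0: x and y share one distribution."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def stat(sample):
        # Absolute difference between the means of the two pseudo-groups.
        left, right = sample[:n_x], sample[n_x:]
        return abs(sum(left) / len(left) - sum(right) / len(right))

    observed = stat(pooled)
    exceed = 0
    for _ in range(n_draws):
        rng.shuffle(pooled)            # random relabelling of the pooled data
        if stat(pooled) >= observed:   # ties count as exceedances
            exceed += 1
    # Monte Carlo p-value: the observed statistic is ranked among n_draws + 1 values.
    return (exceed + 1) / (n_draws + 1)


print(mc_permutation_test([1.2, 0.7, 1.9, 1.1], [2.4, 3.1, 2.2, 2.9]))
```

With n_draws = 999, rejecting when the returned value is at most 0.05 gives a test whose size does not exceed 5% for any sample size, which is the property the abstract refers to as control of test size.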
Follicular lymphoma (FL) is an attractive model for discovering biomarkers and elucidating mechanisms of tumour progression. We hypothesized that alterations in the expression of proteins with known roles in cancer biology and hematological cells might correlate with clinical outcome and thereby shed light on biological mechanisms. Sections from a tissue microarray (TMA) containing FL samples from 67 patients were immunostained for candidate biomarkers, including p53, p16INK4a, Bcl-2, Bcl-6, MUM1, PML, phospho-ERK, and p27Kip1. The Kaplan-Meier method and log-rank test were used to identify markers that correlate significantly (p<0.05) with overall survival (OS). The chi-squared or Fisher exact test was used to examine associations between histological markers and baseline clinical features, including the Follicular Lymphoma International Prognostic Index (FLIPI) score. Expression of p16INK4a or p53, or absent CD10 expression correlated with poor survival. Patients with p16INK4a-negative tumours had a median OS of 13.4 years compared to 8.3 years for those with p16INK4a-positive tumours (p=0.006). Expression of p16INK4a was significantly associated with low hemoglobin, elevated serum lactate dehydrogenase (LDH), high histological grade...
In this paper we propose exact likelihood-based mean-variance efficiency tests of the market portfolio in the context of the Capital Asset Pricing Model (CAPM), allowing for a wide class of error distributions which include normality as a special case. These tests are developed in the framework of multivariate linear regressions (MLR). It is well known however that despite their simple statistical structure, standard asymptotically justified MLR-based tests are unreliable. In financial econometrics, exact tests have been proposed for a few specific hypotheses [Jobson and Korkie (Journal of Financial Economics, 1982), MacKinlay (Journal of Financial Economics, 1987), Gibbons, Ross and Shanken (Econometrica, 1989), Zhou (Journal of Finance 1993)], most of which depend on normality. For the Gaussian model, our tests correspond to Gibbons, Ross and Shanken’s mean-variance efficiency tests. In non-Gaussian contexts, we reconsider mean-variance efficiency tests allowing for multivariate Student-t and Gaussian mixture errors. Our framework allows us to shed more light on whether the normality assumption is too restrictive when testing the CAPM. We also propose exact multivariate diagnostic checks (including tests for multivariate GARCH and a multivariate generalization of the well known variance ratio tests) and goodness of fit tests as well as a set estimate for the intervening nuisance parameters. Our results [over five-year subperiods] show the following: (i) multivariate normality is rejected in most subperiods...
We study the problem of testing the error distribution in a multivariate linear regression (MLR) model. The tests are functions of appropriately standardized multivariate least squares residuals whose distribution is invariant to the unknown cross-equation error covariance matrix. Empirical multivariate skewness and kurtosis criteria are then compared to simulation-based estimates of their expected value under the hypothesized distribution. Special cases considered include testing multivariate normal, Student t, normal mixtures and stable error models. In the Gaussian case, finite-sample versions of the standard multivariate skewness and kurtosis tests are derived. To do this, we exploit simple, double and multi-stage Monte Carlo test methods. For non-Gaussian distribution families involving nuisance parameters, confidence sets are derived for the nuisance parameters and the error distribution. The procedures considered are evaluated in a small simulation experiment. Finally, the tests are applied to an asset pricing model with observable risk-free rates, using monthly returns on New York Stock Exchange (NYSE) portfolios over five-year subperiods from 1926-1995.
This paper proposes finite-sample procedures for testing the SURE specification in multi-equation regression models, i.e. whether the disturbances in different equations are contemporaneously uncorrelated or not. We apply the technique of Monte Carlo (MC) tests [Dwass (1957), Barnard (1963)] to obtain exact tests based on standard LR and LM zero correlation tests. We also suggest a MC quasi-LR (QLR) test based on feasible generalized least squares (FGLS). We show that the latter statistics are pivotal under the null, which provides the justification for applying MC tests. Furthermore, we extend the exact independence test proposed by Harvey and Phillips (1982) to the multi-equation framework. Specifically, we introduce several induced tests based on a set of simultaneous Harvey/Phillips-type tests and suggest a simulation-based solution to the associated combination problem. The properties of the proposed tests are studied in a Monte Carlo experiment which shows that standard asymptotic tests exhibit important size distortions, while MC tests achieve complete size control and display good power. Moreover, MC-QLR tests performed best in terms of power, a result of interest from the point of view of simulation-based tests. The power of the MC induced tests improves appreciably in comparison to standard Bonferroni tests and...
In this paper, we propose exact inference procedures for asset pricing models that can be formulated in the framework of a multivariate linear regression (CAPM), allowing for stable error distributions. The normality assumption on the distribution of stock returns is usually rejected in empirical studies, due to excess kurtosis and asymmetry. To model such data, we propose a comprehensive statistical approach which allows for alternative, possibly asymmetric, heavy-tailed distributions without the use of large-sample approximations. The methods suggested are based on Monte Carlo test techniques. Goodness-of-fit tests are formally incorporated to ensure that the error distributions considered are empirically sustainable, from which exact confidence sets for the unknown tail area and asymmetry parameters of the stable error distribution are derived. Tests for the efficiency of the market portfolio (zero intercepts) which explicitly allow for the presence of (unknown) nuisance parameters in the stable error distribution are derived. The methods proposed are applied to monthly returns on 12 portfolios of the New York Stock Exchange over the period 1926-1995 (five-year subperiods). We find that stable, possibly skewed distributions provide a statistically significant improvement in goodness-of-fit and lead to fewer rejections of the efficiency hypothesis.
We extend the analysis of the statistical properties of cytonuclear disequilibria in two major ways. First, we develop the asymptotic sampling theory for the nonrandom associations between the alleles at a haploid cytoplasmic locus and the alleles and genotypes at a diploid nuclear locus, when there are an arbitrary number of alleles at each marker. This includes the derivation of the maximum likelihood estimators and their sampling variances for each disequilibrium measure, together with simple tests of the null hypothesis of no disequilibrium. In addition to these new asymptotic tests, we provide the first implementation of Fisher's exact test for the genotypic cytonuclear disequilibria and some approximations of the exact test. We also outline an exact test for allelic cytonuclear disequilibria in multiallelic systems. An exact test should be used for data sets when either the marginal frequencies are extreme or the sample size is small. The utility of this new sampling theory is illustrated through applications to recent nuclear-mtDNA and nuclear-cpDNA data sets. The results also apply to population surveys of nuclear loci in conjunction with markers in cytoplasmically inherited microorganisms.
We describe an exact test of the null hypothesis that a Markov chain is nth
order versus the alternate hypothesis that it is $(n+1)$-th order. The
procedure does not rely on asymptotic properties, but instead builds up the
test statistic distribution via surrogate data and is valid for any sample
size. Surrogate data are generated using a novel algorithm that guarantees, per
shot, a uniform sampling from the set of sequences that exactly match the nth
order properties of the observed data.; Comment: 7 pages, 2 figures, 3 tables
Fisher's exact test is often a preferred method to estimate the significance
of statistical dependence. However, in large data sets the test is usually too
laborious to be applied, especially in an exhaustive search (data mining). The
traditional solution is to approximate the significance with the
$\chi^2$-measure, but the accuracy is often unacceptable. As a solution, we
introduce a family of upper bounds, which are fast to calculate and approximate
Fisher's $p$-value accurately. In addition, the new approximations are not
sensitive to the data size, distribution, or smallest expected counts like the
$\chi^2$-based approximation. According to both theoretical and experimental
analysis, the new approximations produce accurate results for all sufficiently
strong dependencies. The basic form of the approximation can fail with weak
dependencies, but the general form of the upper bounds can be adjusted to be...
This study proposes a segmentation procedure for univariate time series
based on Fisher's exact test. We show that an adequate change point can be
detected as the point at which the p-value attains its minimum. The proposed
procedure is shown to detect change points in an artificial time series. We
apply the proposed method recursively to find segments of foreign exchange
rates. It is also applied to randomly shuffled time series. We conclude that
randomly shuffled data can serve as a reference level for the null hypothesis.
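The idea of locating a change point at the minimum p-value can be sketched as follows. The abstract does not specify how the series is turned into a contingency table, so this sketch assumes a binary series (for example, 1 for an upward move and 0 otherwise) and compares the left and right segments at every candidate split. The names fisher_p and best_split are hypothetical, and the recursive application to sub-segments is left out.

```python
from math import comb


def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def best_split(binary_series):
    """Return (t, p): the split index t whose left/right segments differ most,
    i.e. the split with the smallest Fisher p-value. None if fewer than 2 points."""
    n = len(binary_series)
    total_ones = sum(binary_series)
    best = None
    ones_left = 0
    for t in range(1, n):
        ones_left += binary_series[t - 1]
        zeros_left = t - ones_left
        ones_right = total_ones - ones_left
        zeros_right = (n - t) - ones_right
        p = fisher_p(ones_left, zeros_left, ones_right, zeros_right)
        if best is None or p < best[1]:
            best = (t, p)
    return best


# Artificial series with a single change after 20 observations.
print(best_split([0] * 20 + [1] * 20))
```

Because the minimum is taken over many candidate splits, the minimal p-value is not itself a valid significance level; comparing it with the minimum obtained on shuffled copies of the series, as the abstract suggests, is one way to calibrate it.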
Assessing the statistical significance of an observed 2x2 contingency table
can easily be accomplished using Fisher's exact test (FET). However, if the
cell entries are continuous or represent values inferred from a continuous
parametric model, then FET cannot be applied. Such tables arise frequently in
areas of biostatistical research including population genetics and evolutionary
genomics, where cell entries are estimated by computational methods and result
in cell entries drawn from the non-negative real line R+. Simply rounding cell
entries to conform to the assumptions of FET is an ill-suited approach that we
show creates problems related to both type-I and type-II errors. Pearson's
$\chi^2$ test for independence, while technically applicable, is not often
effective for these tables, as the test has several limiting assumptions that
make application of this method inadvisable in many common instances
(particularly with small cell entries). Here we develop a novel method for
tables with continuous entries, which we term continuous Fisher's Exact Test
(cFET). Through simulations, we show that cFET has a close-to-uniform
distribution of p-values under the null hypothesis of independence, and more
power when applied to tables where the null hypothesis is false (compared to
FET applied to rounded cell entries). We apply cFET to an example from
comparative genomics to confirm an overall increased evolutionary rate among
primates compared to rodents...
The two years of the Master of Science in Statistical and Economic Modeling program have been the most rewarding time of my life. This thesis serves as a portfolio of the project and applied experience I gained while enrolled in the program. It summarizes my graduate study in two parts: Simulation Study of Exchangeability for Binary Data, and Summary of Summer Internship at Center for Responsible Lending. The Simulation Study of Exchangeability for Binary Data project contains material from a team project carried out jointly by Sheng Jiang, Xuan Sun and me. Abstracts for both projects are given below, in order.
(1) Simulation Study of Exchangeability for Binary Data
To investigate tractable Bayesian tests of exchangeability, this project considers a special case of nonexchangeable random sequences: Markov chains. Asymptotic results for the Bayes factor (BF) are derived. When the null hypothesis is true, the Bayes factor in favor of the null goes to infinity at a geometric rate (when the true odds are not one half). When the null hypothesis is not true, the Bayes factor in favor of the null goes to 0 faster than a geometric rate. The results are robust under misspecification. Simulation studies are employed to assess the performance of the test when the sample size is small...
In this paper a nonparametric procedure for testing for monotonicity of a regression mean with guaranteed level is proposed. The procedure is based on signs of differences of observations from the response variable. The test is calibrated against the most difficult null hypothesis, when the regression function is constant, and produces an exact test in this context. In general, the test is conservative. The power of the test is good, and comparable with that of other nonparametric tests. It is shown that the testing procedure has asymptotic power 1 against certain local alternatives. The method is also robust against heavy-tailed error distributions, and even maintains good power when the errors are for example Cauchy distributed. A simulation study is provided to demonstrate finite-sample behaviour of the testing procedure.
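A much simplified sign-based test in the same spirit can be sketched as below. It is not the procedure of the paper, whose statistic is not given in the abstract. The sketch assumes a fixed design with distinct x values and i.i.d. continuous errors, tests the null hypothesis that the regression mean is nondecreasing, and uses disjoint pairs of observations so that the signs are independent. The function name sign_monotonicity_p is hypothetical.

```python
from math import comb


def sign_monotonicity_p(x, y):
    """Exact p-value against H0: the regression mean is nondecreasing in x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ys = [y[i] for i in order]
    m = len(ys) // 2   # with an odd sample size the largest-x point is unused
    # Pair the i-th smallest x with the (i + m)-th smallest: the pairs are
    # disjoint, so the signs of the differences are independent.
    negatives = sum(1 for i in range(m) if ys[i + m] - ys[i] < 0)
    # Under a constant regression function each difference is negative with
    # probability 1/2, so the count of negatives is Binomial(m, 1/2).
    return sum(comb(m, k) for k in range(negatives, m + 1)) / 2**m


xs = list(range(20))
print(sign_monotonicity_p(xs, [0.1 * v for v in xs]))    # increasing trend
print(sign_monotonicity_p(xs, [-0.1 * v for v in xs]))   # decreasing trend
```

The binomial calibration is exact when the regression function is constant. For any other nondecreasing mean, a negative difference has probability at most 1/2, so the test is conservative, which matches the behaviour described in the abstract.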