# The best tool for your research, assignments, and undergraduate thesis (TCC)!

Page 1 of the results: 4600 digital items found in 0.018 seconds

- ELSEVIER SCIENCE INC
- JOHN WILEY & SONS LTD
- USP Digital Library of Theses and Dissertations
- Unicamp Digital Library
- Springer-Verlag
- Medknow Publications Pvt Ltd
- Australian National University
- Cognitive Science Society
- Indiana University
- Carlos III University of Madrid
- Queen's University
- Cornell University
- Duke University
- University of Delaware

## Methods for Equivalence and Noninferiority Testing

Source: ELSEVIER SCIENCE INC
Publisher: ELSEVIER SCIENCE INC

Type: Journal Article

Portuguese

Search relevance: 55.99%

#Equivalence#Noninferiority#Hypothesis testing#Confidence intervals#Power#Survival#Odds ratio#Relative risk#NON-INFERIORITY TRIALS#CLINICAL-TRIALS#END-POINTS

Classical hypothesis testing focuses on testing whether treatments have differential effects on outcome. However, sometimes clinicians may be more interested in determining whether treatments are equivalent or whether one has noninferior outcomes. We review the hypotheses for these noninferiority and equivalence research questions, consider power and sample size issues, and discuss how to perform such a test for both binary and survival outcomes. The methods are illustrated on 2 recent studies in hematopoietic cell transplantation.
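
For a binary outcome, the noninferiority and equivalence hypotheses can be made concrete with a minimal Python sketch that uses a Wald-type z-test on the difference in success proportions. This is not necessarily the procedure used in the article. The function names, the unpooled standard error, the example counts, and the assumption that larger proportions are better are all illustrative choices. Score-based tests such as Farrington-Manning are a common alternative.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal


def _diff_and_se(x_t, n_t, x_c, n_c):
    """Observed difference p_t - p_c and its unpooled (Wald) standard error."""
    p_t, p_c = x_t / n_t, x_c / n_c
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return p_t - p_c, se


def noninferiority_test(x_t, n_t, x_c, n_c, margin, alpha=0.025):
    """One-sided test of H0: p_t - p_c <= -margin vs H1: p_t - p_c > -margin.

    Assumes higher success proportions are better and margin > 0.
    Returns (z, one-sided p-value, whether noninferiority is concluded).
    """
    diff, se = _diff_and_se(x_t, n_t, x_c, n_c)
    z = (diff + margin) / se
    p_value = 1 - _Z.cdf(z)
    return z, p_value, p_value < alpha


def equivalence_tost(x_t, n_t, x_c, n_c, margin, alpha=0.05):
    """Two one-sided tests (TOST): equivalence is concluded only if both
    H0: diff <= -margin and H0: diff >= +margin are rejected at level alpha."""
    diff, se = _diff_and_se(x_t, n_t, x_c, n_c)
    p_lower = 1 - _Z.cdf((diff + margin) / se)  # tests diff <= -margin
    p_upper = _Z.cdf((diff - margin) / se)      # tests diff >= +margin
    return max(p_lower, p_upper) < alpha


# Hypothetical data: 160/200 successes on the new treatment, 164/200 on control.
z, p, noninferior = noninferiority_test(160, 200, 164, 200, margin=0.10)
print(f"z = {z:.3f}, p = {p:.4f}, noninferior: {noninferior}")
print("equivalent:", equivalence_tost(160, 200, 164, 200, margin=0.10))
```

With a 10-percentage-point margin, the observed 2-point deficit is compatible with noninferiority. Shrinking the margin to 0.05 with the same data would not allow that conclusion, which shows how strongly the choice of margin drives the result.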


## Hypothesis testing in an errors-in-variables model with heteroscedastic measurement errors

Source: JOHN WILEY & SONS LTD
Publisher: JOHN WILEY & SONS LTD

Type: Journal Article

Portuguese

Search relevance: 55.95%

#errors-in-variables models#equation-error models#maximum likelihood#hypothesis testing#goodness of fit#RISK#Mathematical & Computational Biology#Public, Environmental & Occupational Health#Medical Informatics#Medicine, Research & Experimental#Statistics & Probability

In many epidemiological studies it is common to resort to regression models relating the incidence of a disease to its risk factors. The main goal of this paper is to consider inference on such models with error-prone observations and measurement-error variances that change across observations. We suppose that the observations follow a bivariate normal distribution and that the measurement errors are normally distributed. Aggregate data allow the estimation of the error variances. Maximum likelihood estimates are computed numerically via the EM algorithm. Consistent estimation of the asymptotic variance of the maximum likelihood estimators is also discussed. Test statistics are proposed for testing hypotheses of interest. Further, we implement a simple graphical device that enables an assessment of the model's goodness of fit. Results of simulations concerning the properties of the test statistics are reported. The approach is illustrated with data from the WHO MONICA Project on cardiovascular disease. Copyright (C) 2008 John Wiley & Sons, Ltd.; FONDECYT (Fondo Nacional de Desarrollo Científico y Tecnológico, Chile) [1070919]


## Monotonicidade em testes de hipóteses; Monotonicity in hypothesis tests

Source: USP Digital Library of Theses and Dissertations
Publisher: USP Digital Library of Theses and Dissertations

Type: Master's Dissertation
Format: application/pdf

Published 09/03/2010
Portuguese

Search relevance: 56.23%

#Bayes test#class of hypothesis testing#classes de testes de hipóteses#decision theory#monotonicidade#monotonicity#teoria da decisão#testes de Bayes

Most of the hypothesis-testing literature deals with optimality criteria for a given decision problem. There are, however, fewer texts on the problems of performing simultaneous hypothesis tests and on the logical consistency of their optimal solutions. One property expected of simultaneous hypothesis tests is that, if a hypothesis H1 implies a hypothesis H0, then rejecting H0 should necessarily imply rejecting H1 for the same observed sample. This property is called monotonicity here. To study this property from a more general point of view, this work defines the notion of a class of hypothesis tests, which extends the test function to a sigma-algebra of possible null hypotheses, and introduces a definition of monotonicity. It is also shown, through some simple examples, that for a fixed significance level the class of Generalized Likelihood Ratio (GLR) tests is not monotonic. By contrast, tests formulated from a Bayesian perspective, such as the Bayes test based on posterior probabilities, Lindley's test, and the FBST, are monotonic. However...
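
A standard toy example, constructed here for illustration and not taken from the dissertation, shows how fixed-level generalized likelihood ratio tests can violate monotonicity. Observe a single X = (X1, X2) ~ N((mu1, mu2), I) and take H1: mu1 = mu2 = 0, which implies H0: mu1 = 0. The GLR statistics are X1^2 (chi-square with 1 df) and X1^2 + X2^2 (chi-square with 2 df). At level 0.05 the two critical values differ, so the larger hypothesis can be rejected while the smaller one is retained.

```python
import math
from statistics import NormalDist

ALPHA = 0.05
# Chi-square critical values at level ALPHA.
CRIT_1DF = NormalDist().inv_cdf(1 - ALPHA / 2) ** 2  # chi2(1): square of a N(0,1) quantile
CRIT_2DF = -2 * math.log(ALPHA)                      # chi2(2) is exponential with mean 2


def glr_decisions(x1, x2):
    """GLR tests for one observation X ~ N((mu1, mu2), I).

    H0: mu1 = 0        -> -2 log LR = x1**2,         compared with chi2(1)
    H1: mu1 = mu2 = 0  -> -2 log LR = x1**2 + x2**2, compared with chi2(2)
    Returns (reject_H0, reject_H1).
    """
    return x1 ** 2 > CRIT_1DF, x1 ** 2 + x2 ** 2 > CRIT_2DF


reject_h0, reject_h1 = glr_decisions(2.0, 0.0)
print(reject_h0, reject_h1)  # True False
# H1 implies H0, yet H0 is rejected while H1 is not: monotonicity fails.
```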


## Inferência estatística para regressão múltipla h-splines; Statistical inference for h-splines multiple regression

Source: Unicamp Digital Library
Publisher: Unicamp Digital Library

Type: Doctoral Thesis
Format: application/pdf

Published 14/04/2014
Portuguese

Search relevance: 46.14%

#Modelos aditivos generalizados#Spline#Teoria do#Métodos MCMC#Testes de hipóteses estatísticas#Análise de regressão#Generalized additive models#Spline theory#MCMC methods#Statistical hypothesis testing#Regression analysis

This work addresses two inference problems related to nonparametric multiple regression: estimation in additive models using a nonparametric method, and hypothesis testing for the equality of curves fitted from the model. In the estimation stage, a generalization of the h-splines methods is constructed, both in the adaptive sequential context proposed by Dias (1999) and in the Bayesian context proposed by Dias and Gamerman (2002). The h-splines methods provide an automatic choice of the number of basis functions used to estimate the model. Simulation studies show that the proposed estimation methods outperform the R packages gamlss, mgcv, and DPpackage. Two hypothesis tests are constructed to test H0 : f = f0. The first, belonging to the adaptive sequential approach, has a decision rule based on the integrated squared distance between two curves. The second is based on the Bayesian evidence measure proposed by Pereira and Stern (1999). For the Bayesian test, the performance of the evidence measure is examined in several simulation scenarios. The proposed measure behaved consistently with an evidence measure in favor of H0. In the test based on the distance between curves...
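
The first test's decision rule is based on the integrated squared distance between two curves, the integral of (f(x) - f0(x))^2 over the domain. The sketch below only shows how that distance might be approximated numerically with the trapezoidal rule on an evenly spaced grid. The h-splines fit and the critical value of the actual test are not reproduced, and the function name and grid size are illustrative assumptions.

```python
def integrated_squared_distance(f, g, a, b, n=1000):
    """Trapezoidal-rule approximation of the integral of (f(x) - g(x))**2 over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = a + i * h
        weight = 0.5 if i in (0, n) else 1.0  # endpoints get half weight
        total += weight * (f(x) - g(x)) ** 2
    return total * h


# Example: f(x) = x versus the null curve f0(x) = 0 on [0, 1]; the exact value is 1/3.
print(integrated_squared_distance(lambda x: x, lambda x: 0.0, 0.0, 1.0))
```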


## Testing for bimodality in frequency distributions of data suggesting polymorphisms of drug metabolism--hypothesis testing.

Source: PubMed
Publisher: PubMed

Type: Journal Article

Published 12/1989
Portuguese

Search relevance: 46.12%

1. The theory of methods of hypothesis testing in relation to the detection of bimodality in density distributions is discussed.
2. Practical problems arising from these methods are outlined.
3. The power of three methods of hypothesis testing was compared using simulated data from bimodal distributions with varying separation between components. None of the methods could detect bimodality until the separation between components reached 2 standard deviation units, and they could only do so reliably (greater than 90%) when the separation was as great as 4-6 standard deviation units.
4. The robustness of a parametric and a non-parametric method of hypothesis testing was compared using simulated unimodal distributions known to deviate markedly from normality. Both methods frequently indicated bimodality falsely for distributions whose components had markedly different variances.
5. A further test of robustness, using power transformation of data from a normal distribution, showed that the algorithms could accurately determine unimodality only when the skew of the distribution was in the range 0-1.45.
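
The finding that no method detected bimodality below a separation of about 2 standard deviations may partly reflect the shape of the density itself. An equal-weight mixture of two normals with common standard deviation sigma is unimodal whenever the means are at most 2*sigma apart, and bimodal beyond that. The sketch below checks this numerically by counting local maxima on a grid. The equal weights and variances are assumptions, and the code is not one of the three tests compared in the paper.

```python
import math


def mixture_density(x, sep, sigma=1.0):
    """Equal-weight mixture of N(-sep/2, sigma^2) and N(+sep/2, sigma^2)."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    left = math.exp(-((x + sep / 2) ** 2) / (2 * sigma ** 2))
    right = math.exp(-((x - sep / 2) ** 2) / (2 * sigma ** 2))
    return 0.5 * norm * (left + right)


def count_modes(sep, sigma=1.0, lo=-10.0, hi=10.0, steps=4001):
    """Count local maxima of the mixture density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [mixture_density(x, sep, sigma) for x in xs]
    return sum(
        1
        for i in range(1, steps - 1)
        if ys[i] >= ys[i - 1] and ys[i] > ys[i + 1]
    )


for sep in (1.0, 1.5, 2.5, 3.0, 4.0):
    print(f"separation {sep} SD -> {count_modes(sep)} mode(s)")
```

Even just above 2 SD the dip between the two peaks is shallow, so small samples give a test little to work with.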


## Resampling methods for improved wavelet-based multiple hypothesis testing of parametric maps in functional MRI

Source: PubMed
Publisher: PubMed

Type: Journal Article

Portuguese

Search relevance: 46.12%

Two- or three-dimensional wavelet transforms have been considered as a basis for multiple hypothesis testing of parametric maps derived from functional magnetic resonance imaging (fMRI) experiments. Most of the previous approaches have assumed that the noise variance is equally distributed across levels of the transform. Here we show that this assumption is unrealistic; fMRI parameter maps typically have more similarity to a 1/f-type spatial covariance, with greater variance in 2D wavelet coefficients representing lower spatial frequencies, or coarser spatial features, in the maps. To address this issue we resample the fMRI time series data in the wavelet domain (using a 1D discrete wavelet transform [DWT]) to produce a set of permuted parametric maps that are decomposed (using a 2D DWT) to estimate level-specific variances of the 2D wavelet coefficients under the null hypothesis. These resampling-based estimates of the "wavelet variance spectrum" are substituted in a Bayesian bivariate shrinkage operator to denoise the observed 2D wavelet coefficients, which are then inverted to reconstitute the observed, denoised map in the spatial domain. Multiple hypothesis testing controlling the false discovery rate in the observed, denoised maps then proceeds in the spatial domain...
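
The final step is multiple testing that controls the false discovery rate. A common way to do this is the Benjamini-Hochberg step-up procedure, sketched below on a flat list of p-values. The abstract does not say whether the authors used exactly this variant, and the wavelet resampling and shrinkage steps are not shown. The example p-values are made up.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (True = reject) in the original order.
    Controls the false discovery rate at level q under independence
    (or positive dependence) of the tests.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


p = [0.300, 0.025, 0.001, 0.035, 0.026]
print(benjamini_hochberg(p))  # [False, True, True, True, True]
```

Here 0.025 exceeds its own threshold 2q/m = 0.02 but is still rejected, because a higher-ranked p-value (0.035 <= 4q/m = 0.04) passes. That step-up behavior distinguishes the procedure from a per-rank cutoff.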


## A Computationally Efficient Hypothesis Testing Method for Epistasis Analysis using Multifactor Dimensionality Reduction

Source: PubMed
Publisher: PubMed

Type: Journal Article

Published 01/2009
Portuguese

Search relevance: 46.13%

Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm so that nonadditive interactions are easier to detect with any classification method, such as naïve Bayes or logistic regression. Traditionally, MDR-constructed variables have been evaluated with a naïve Bayes classifier combined with 10-fold cross-validation to estimate the predictive accuracy, or generalizability, of epistasis models. We have also traditionally used permutation testing to evaluate the statistical significance of models obtained through MDR. The advantage of permutation testing is that it controls for false positives due to multiple testing. The disadvantage is that permutation testing is computationally expensive. This is an important issue when detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models...
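
The cost problem described above already appears in the simplest permutation test. As a deliberate simplification, the sketch below uses a difference in group means rather than an MDR model's accuracy. It enumerates all relabellings exactly. Practical implementations usually draw a random subset of permutations instead, which is still expensive at genome-wide scale.

```python
from itertools import combinations
from statistics import fmean


def exact_permutation_pvalue(a, b):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of relabelling the pooled observations into groups
    of the original sizes, so the cost grows like C(len(a)+len(b), len(a)).
    """
    pooled = list(a) + list(b)
    n, n_a = len(pooled), len(a)
    observed = abs(fmean(a) - fmean(b))
    extreme = total = 0
    for chosen in combinations(range(n), n_a):
        in_a = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(n) if i not in in_a]
        # Small tolerance so ties with the observed value are not lost to rounding.
        if abs(fmean(group_a) - fmean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


print(exact_permutation_pvalue([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))
```

Only 2 of the 252 relabellings are as extreme as the observed split, giving p = 2/252, about 0.0079. With 20 observations per group there would already be about 1.4 x 10^11 relabellings, before any multiplication by the number of candidate models.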


## P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers

Source: Springer-Verlag
Publisher: Springer-Verlag

Type: Journal Article

Portuguese

Search relevance: 46.2%

In the 1920s, Ronald Fisher developed the theory behind the p value, and Jerzy Neyman and Egon Pearson developed the theory of hypothesis testing. These distinct theories have provided researchers with important quantitative tools to confirm or refute their hypotheses. The p value is the probability of obtaining an effect equal to or more extreme than the one observed, assuming the null hypothesis of no effect is true; it gives researchers a measure of the strength of evidence against the null hypothesis. As commonly used, investigators select a threshold p value below which they reject the null hypothesis. The theory of hypothesis testing allows researchers to reject a null hypothesis in favor of an alternative hypothesis of some effect. As commonly used, investigators choose Type I error (rejecting the null hypothesis when it is true) and Type II error (accepting the null hypothesis when it is false) levels and determine a critical region. If the test statistic falls into that critical region, the null hypothesis is rejected in favor of the alternative hypothesis. Despite similarities between the two, the p value and the theory of hypothesis testing are different theories that are often misunderstood and confused, leading researchers to improper conclusions. Perhaps the most common misconception is to consider the p value as the probability that the null hypothesis is true rather than the probability of obtaining the difference observed...
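
For a test statistic that is standard normal under the null hypothesis, the two ideas can be written side by side. Fisher's p value is a continuous measure of evidence. The Neyman-Pearson procedure only reports whether the statistic fell in a critical region chosen in advance. The minimal Python sketch below assumes a two-sided z-test, and its function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def p_value_two_sided(z):
    """Fisher-style evidence: P(|Z| >= |z|) when H0 is true."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def neyman_pearson_reject(z, alpha=0.05):
    """Neyman-Pearson decision: reject H0 iff z lies in the critical region
    |z| > z_(1 - alpha/2), which is fixed before seeing the data."""
    return abs(z) > STD_NORMAL.inv_cdf(1 - alpha / 2)


for z in (1.5, 2.2, 3.1):
    print(f"z = {z}: p = {p_value_two_sided(z):.4f}, reject at 5%: {neyman_pearson_reject(z)}")
```

Neither function returns the probability that the null hypothesis is true. Computing that quantity would require a prior, which neither framework uses.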


## A critique of statistical hypothesis testing in clinical research

Source: Medknow Publications Pvt Ltd
Publisher: Medknow Publications Pvt Ltd

Type: Journal Article

Published 2011
Portuguese

Search relevance: 46.17%

Many have documented the difficulty of using the current paradigm of Randomized Controlled Trials (RCTs) to test and validate the effectiveness of alternative medical systems such as Ayurveda. This paper critiques the applicability of RCTs for all clinical knowledge-seeking endeavors, of which Ayurveda research is a part. This is done by examining statistical hypothesis testing, the underlying foundation of RCTs, from a practical and philosophical perspective. In the philosophical critique, the two main worldviews of probability are that of the Bayesian and the frequentist. The frequentist worldview is a special case of the Bayesian worldview requiring the unrealistic assumptions of knowing nothing about the universe and believing that all observations are unrelated to each other. Many have claimed that the first belief is necessary for science, and this claim is debunked by comparing variations in learning with different prior beliefs. Moving beyond the Bayesian and frequentist worldviews, the notion of hypothesis testing itself is challenged on the grounds that a hypothesis is an unclear distinction, and assigning a probability on an unclear distinction is an exercise that does not lead to clarity of action. This critique is of the theory itself and not any particular application of statistical hypothesis testing. A decision-making frame is proposed as a way of both addressing this critique and transcending ideological debates on probability. An example of a Bayesian decision-making approach is shown as an alternative to statistical hypothesis testing...


## Predictive hypothesis identification

Source: Australian National University
Publisher: Australian National University

Type: Conference paper

Portuguese

Search relevance: 56.05%

#parameter estimation#hypothesis testing#model selection#predictive inference#composite hypotheses#MAP versus ML#moment fitting#Bayesian statistics

While statistics focusses on hypothesis testing and on estimating (properties
of) the true sampling distribution, in machine learning the performance of
learning algorithms on future data is the primary issue. In this paper we bridge
the gap with a general principle (PHI) that identifies hypotheses with best
predictive performance. This includes predictive point and interval estimation,
simple and composite hypothesis testing, (mixture) model selection, and
others as special cases. For concrete instantiations we will recover well-known
methods, variations thereof, and new ones. PHI nicely justifies, reconciles,
and blends (a reparametrization invariant variation of) MAP, ML, MDL, and
moment estimation. One particular feature of PHI is that it can genuinely
deal with nested hypotheses.


## Adaptive information source selection during hypothesis testing

Source: Cognitive Science Society
Publisher: Cognitive Science Society

Type: Conference paper

Published 2014
Portuguese

Search relevance: 66.08%

We consider how the information sources people use to test hypotheses change as the sparsity of the hypotheses (the proportion of items in the hypothesis space they include) changes. Specifically, we focus on understanding how requests for positive and negative evidence, which have been shown to be sensitive to hypothesis sparsity (Hendrickson, Navarro, & Perfors, in prep), are influenced by requests for specific instances, which show a positive bias and less sensitivity to sparsity (Markant & Gureckis, 2013). We find that people modify their information requests as a function of the sparsity of the hypotheses, and that in this task they do so primarily by manipulating the rate of requesting positive and negative evidence. Furthermore, by simulating the set of possible remaining hypotheses, we find that people were most likely to select the information source that maximized the expected reduction in uncertainty across hypotheses. We conclude by discussing the implications of these results for models of hypothesis testing.; Andrew T. Hendrickson, Amy F. Perfors, Daniel J. Navarro
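
The "expected reduction in uncertainty" criterion can be made concrete with a small sketch. Hypotheses are sets of items, the prior is uniform, and a query asks whether one item belongs to the true hypothesis, with noise-free answers. This is a simplified, hypothetical setup for illustration, not the authors' task or model.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(hypotheses, prior, item):
    """Expected drop in entropy over hypotheses from asking whether `item`
    belongs to the true hypothesis (answers assumed noise-free)."""
    mass = defaultdict(float)  # probability of each possible answer
    for h, p in zip(hypotheses, prior):
        mass[item in h] += p
    expected_posterior = 0.0
    for answer, m in mass.items():
        if m == 0:
            continue
        posterior = [p / m for h, p in zip(hypotheses, prior) if (item in h) == answer]
        expected_posterior += m * entropy(posterior)
    return entropy(prior) - expected_posterior


# Hypothetical hypothesis space: four hypotheses over items 1-4.
hyps = [{1, 2}, {1, 3}, {2, 4}, {3, 4}]
prior = [0.25] * 4
for item in (1, 2, 5):
    print(item, expected_information_gain(hyps, prior, item))
```

Asking about item 1 or item 2 splits the hypotheses evenly and yields one bit. Asking about item 5, which no hypothesis contains, yields nothing. A learner maximizing this quantity would never ask that question.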


## Hypothesis Testing and Statistical Power of a Test

Source: Indiana University
Publisher: Indiana University

Portuguese

Search relevance: 66.02%

How powerful is my study (test)? How many observations do I need for what I want to get from the study? You may want to know the statistical power of a test to detect a meaningful effect, given the sample size, test size (significance level), and standardized effect size. You may also want to determine the minimum sample size required to get a significant result, given the statistical power, test size, and standardized effect size. These analyses examine how sensitive statistical power and sample size are to the other components, enabling researchers to use research resources efficiently. This document summarizes the basics of hypothesis testing and statistical power analysis, and then illustrates how to perform these analyses using SAS 9, Stata 10, and G*Power 3.
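
As a rough cross-check of what such tools compute, the normal-approximation formulas for a two-sided, two-sample comparison of means with standardized effect size d can be coded directly. This sketch relies on the usual large-sample assumptions. Exact t-based calculations, such as those in the software listed above, typically give a slightly larger n.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardized effect size d."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) / effect_size) ** 2)


def power_two_sample(effect_size, n, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(effect_size * math.sqrt(n / 2) - z_a)


print(n_per_group(0.5))            # medium effect, 5% level, 80% power
print(power_two_sample(0.5, 63))
```

For a medium effect (d = 0.5), `n_per_group` returns 63 per group. Plugging that back into `power_two_sample` gives a power of about 0.80, so the two functions are consistent inverses of each other.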


## Bootstrapping the general linear hypothesis test

Source: Carlos III University of Madrid
Publisher: Carlos III University of Madrid

Type: Working Paper
Format: application/pdf

Published 02/1993
Portuguese

Search relevance: 56.01%

#Bootstrap#F-test#General linear hypothesis#Hypothesis testing#Linear model#One-way model#Resampling#Estadística

We discuss the use of bootstrap methodology in hypothesis testing, focusing on the classical F-test for linear hypotheses in the linear model. A modification of the F-statistic that allows resampling under the null hypothesis is proposed. This approach is considered specifically for the one-way analysis of variance model. A simulation study illustrating the behaviour of our proposal is presented.
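
One common way to bootstrap the one-way F-test under the null hypothesis is to resample residuals from the group means, which all have mean zero, into groups of the original sizes. This sketch illustrates that generic idea under stated assumptions. It is not necessarily the specific modification of the F-statistic proposed in the paper, and the seed, the number of replicates, and the toy data are arbitrary.

```python
import math
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (lists of numbers)."""
    all_obs = [y for g in groups for y in g]
    grand = fmean(all_obs)
    means = [fmean(g) for g in groups]
    k, n = len(groups), len(all_obs)
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    if ssw == 0:
        return math.inf if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def bootstrap_f_test(groups, n_boot=999, seed=0):
    """Residual bootstrap p-value for H0: all group means are equal."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    # Centring each group at its own mean makes the pooled residuals satisfy H0.
    residuals = [y - fmean(g) for g in groups for y in g]
    exceed = 0
    for _ in range(n_boot):
        boot = [rng.choices(residuals, k=len(g)) for g in groups]
        if f_statistic(boot) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


groups = [[1, 2, 3, 2, 1], [5, 6, 7, 6, 5], [9, 10, 11, 10, 9]]
f_obs, p_boot = bootstrap_f_test(groups)
print(f"F = {f_obs:.2f}, bootstrap p = {p_boot:.3f}")
```

The "+1" in the numerator and denominator counts the observed statistic as one of the replicates. This keeps the bootstrap p-value strictly positive.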


## Multiple hypothesis testing and clustering with mixtures of non-central t-distributions applied in microarray data analysis

Source: Carlos III University of Madrid
Publisher: Carlos III University of Madrid

Type: Working Paper
Format: application/octet-stream; application/octet-stream; application/pdf

Published 11/2010
Portuguese

Search relevance: 55.95%

#Clustering#MCMC computation#Microarray analysis#Mixture distributions#Multiple hypothesis testing#Non-central t-distribution#Estadística

Multiple testing analysis based on clustering methodologies is commonly applied in microarray data analysis for comparisons between pairs of groups. In this paper, we generalize this methodology to multiple comparisons among more than two groups obtained from microarray gene expression data. Assuming normal data, we define a statistic that depends on sample means and sample variances and follows a non-central t-distribution. Because we consider multiple comparisons among groups, a mixture of non-central t-distributions is derived. The components of the mixture are estimated via a Bayesian approach, and the model is applied to a multiple comparison problem from a microarray experiment on cultured fibroblasts from gorilla, bonobo, and human.


## AUDITOR MENTAL REPRESENTATIONS AND HYPOTHESIS TESTING OF THE CONTROL ENVIRONMENT

Source: Queen's University
Publisher: Queen's University

Type: Doctoral Thesis

Portuguese

Search relevance: 66.18%

In this thesis, I examine how auditors construct their mental representations and test their hypotheses about the strength of a client's control environment. With regard to the former, I hypothesize that management's frame of the control system and the auditor's retrieval of control environment information from memory may influence the auditor's mental representation of the control environment and affect subsequent audit judgments. Consistent with my theoretical predictions, I find that retrieval of control environment information from memory biases an auditor's mental representation, and that this biased mental representation affects subsequent fraud assessment. In addition, there is limited evidence to support the conjecture that auditors may be susceptible to management's framing of the internal control system, resulting in relatively positive control environment evaluations that were found to carry over to some subsequent audit judgments. With regard to the latter, prior audit literature has examined how auditors evaluate person-specific characteristics of other auditors, such as competence; however, no research has examined how auditors test such characteristics of client management. I disentangle whether auditors use a diagnostic and/or a conservative hypothesis testing strategy when testing client management's ethicality and competence, as these are fundamental components of the client's control environment. A diagnostic testing strategy is evidenced by the auditor searching for the most informative information...


## "Testes de hipótese e critério bayesiano de seleção de modelos para séries temporais com raiz unitária" ; "Hypothesis testing and bayesian model selection for time series with a unit root"

Source: USP Digital Library of Theses and Dissertations
Publisher: USP Digital Library of Theses and Dissertations

Type: Master's Dissertation
Format: application/pdf

Published 23/06/2004
Portuguese

Search relevance: 56.15%

#bayesian inference#Brownian motion#hypothesis testing#importance sampling#inferência bayesiana#MCMC#movimento Browniano#priori de Jeffreys#raiz unitária#séries temporais

The literature on hypothesis testing in autoregressive models with a possible unit root is quite vast and includes research from several fields. This dissertation first reviews the main existing results, from both the classical and the Bayesian views of inference. Regarding the classical toolkit, the role of Brownian motion is presented in detail, emphasizing its applicability in deriving the asymptotic statistics used to test for the presence of a unit root. Regarding Bayesian inference, a detailed examination of the current state of the literature was conducted first. Next, a comparative study was carried out in which the unit-root hypothesis is tested based on the posterior probability of the model parameter, considering the following prior densities: Flat, Jeffreys, Normal, and Beta. Inference was based on the Metropolis-Hastings algorithm, using Markov chain Monte Carlo (MCMC) simulation. The power, size, and confidence of the tests presented were computed using simulated series. Finally...
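
Testing a unit root through the posterior of the autoregressive parameter is easiest to see in the simplest case. Take an AR(1) model y_t = rho*y_{t-1} + e_t with a known noise standard deviation sigma and a flat prior on rho, conditioning on the first observation. The posterior of rho is then normal, and P(rho >= 1 | y) has a closed form, so no Metropolis-Hastings sampler is needed. Several of the priors compared in the dissertation (for example Jeffreys and Beta) do not lead to such a simple form, so this is only a simplified sketch. The simulation helper and its settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def prob_rho_at_least_one(y, sigma):
    """P(rho >= 1 | y) for y_t = rho * y_{t-1} + e_t, e_t ~ N(0, sigma^2),
    under a flat prior on rho and conditioning on y[0].

    The conditional likelihood is Gaussian in rho, so the posterior is
    Normal(rho_hat, sigma^2 / sum(y_{t-1}^2)).
    """
    num = sum(cur * prev for cur, prev in zip(y[1:], y[:-1]))
    den = sum(prev * prev for prev in y[:-1])
    rho_hat = num / den
    post_sd = sigma / math.sqrt(den)
    return 1.0 - NormalDist(rho_hat, post_sd).cdf(1.0)


def simulate_ar1(rho, n, sigma=1.0, seed=0):
    """Hypothetical helper: simulate an AR(1) path starting at y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + rng.gauss(0.0, sigma))
    return y


p_stationary = prob_rho_at_least_one(simulate_ar1(0.5, 500, seed=1), sigma=1.0)
p_random_walk = prob_rho_at_least_one(simulate_ar1(1.0, 500, seed=1), sigma=1.0)
print(p_stationary, p_random_walk)
```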


## Bayesian Hypothesis Testing for Sparse Representation

Source: Cornell University
Publisher: Cornell University

Type: Journal Article

Published 21/08/2010
Portuguese

Search relevance: 46.15%

In this paper, we propose a Bayesian Hypothesis Testing Algorithm (BHTA) for
sparse representation. It uses the Bayesian framework to determine the active
atoms in the sparse representation of a signal.
Based on three assumptions, the Bayesian hypothesis test determines the
active atoms from the correlations and leads to the activity measure
proposed in the Iterative Detection Estimation (IDE) algorithm. IDE uses
an arbitrary decreasing sequence of thresholds, whereas the proposed algorithm
uses a sequence derived from hypothesis testing. The Bayesian
hypothesis testing framework therefore leads to an improved version of the IDE algorithm.
The simulations show that the hard version of our algorithm achieves
one of the best results in terms of estimation accuracy among the algorithms
implemented in our simulations, although it has the greatest
complexity in terms of simulation time.


## A quantum linguistic characterization of the reverse relation between confidence interval and hypothesis testing

Source: Cornell University
Publisher: Cornell University

Type: Journal Article

Portuguese

Search relevance: 46.15%

Although there are many ideas for formulating statistical hypothesis
testing, we consider the likelihood ratio test to be the most reasonable and
orthodox. However, it is not convenient, and thus it is uncommon in elementary
books. That is, the statistical hypothesis testing described in elementary books
differs from the likelihood ratio test. Thus, from the theoretical point
of view, we have the following question: "What is the statistical hypothesis
testing written in elementary books?" For example, we consider that even the
difference between a "one-sided test" and a "two-sided test" is not yet clear. In
this paper, we answer this question. That is, we propose a new
formulation of statistical hypothesis testing that is the reverse of the
confidence interval method. In other words, they are two sides of the same
coin. This will be done in quantum language (or, measurement theory), which is
characterized as the linguistic turn of the Copenhagen interpretation of
quantum mechanics, and also, a kind of system theory such that it is applicable
to both classical and quantum systems. Since quantum language is suited for
theoretical arguments, we believe that our results are essentially final as a
general theory.; Comment: arXiv admin note: substantial text overlap with arXiv:1312.6757


## Unrealistically Optimistic Consumers: a Selective Hypothesis Testing Account for Optimism in Predictions of Future Behavior

Source: Duke University
Publisher: Duke University

Type: Dissertation
Format: 473692 bytes; application/pdf

Published 21/04/2008
Portuguese

Search relevance: 46.19%

Individuals tend to make unrealistically optimistic assessments of themselves and their future behavior. While little studied in marketing, unrealistic optimism by consumers may have negative consequences for both marketers and consumers. This dissertation proposes and explores a selective hypothesis testing view of unrealistic optimism. Specifically, I propose that consumers adopt the tentative hypothesis that they will behave in an ideal fashion when predicting their future behavior. They then selectively test this hypothesis by accessing information consistent with it, with the ultimate consequence being unrealistically optimistic predictions of future behavior.
To validate this theory, I use the following experimental paradigm. I first have individuals provide an idealized estimate for the behavior of interest (e.g., In an ideal world, how often would you exercise next week?) and then provide a second estimate (e.g., How often will you exercise next week?). The idea is that by making the idealized nature of the ideal behavior salient, consumers will be less likely to test a hypothesis of ideal behavior when subsequently providing an estimate. In a series of ten studies, I find that prior consideration of idealistic performance does indeed temper optimism in subsequent self-assessments (henceforth post-ideal estimates). Specifically...


## Within-document term-based index pruning techniques with statistical hypothesis testing

Source: University of Delaware
Publisher: University of Delaware

Type: Doctoral Thesis

Portuguese

Search relevance: 66.02%

#Statistical hypothesis testing#Computer network resources -- Abstracting and indexing#Information retrieval

Carterette, Ben; Static index pruning methods have been proposed to reduce the index size of information retrieval systems while retaining the effectiveness of the search. Document-centric static index pruning methods provide smaller indexes and faster query times by dropping some within-document term information from inverted lists. We present a method for pruning inverted lists derived from the formulation of unigram language models for retrieval. This method is based on the statistical significance of term frequency ratios. Using the two-sample two-proportion (2P2N) test, the frequency of occurrence of a word within a given document is statistically compared to the frequency of its occurrence in the collection to decide whether to prune it. Experimental results show that this technique can significantly decrease the index size and query time while compromising retrieval effectiveness less than similar heuristic methods do. We also implemented a static index pruning algorithm that uses the retrievability of documents to decide whether to remove or keep them in the index, combined with the statistical hypothesis testing method. Retrievability is calculated using the document entropy, which is in turn calculated from the entropies of the individual terms in the document. The experimental results show that this hybrid algorithm improves the performance of the retrieval system. Furthermore...
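
The pruning rule compares a term's relative frequency within a document with its relative frequency in the collection. The pooled two-proportion z statistic below is one hedged way to sketch that comparison. The keep/prune rule is hypothetical: it keeps a posting only if the term is significantly more frequent in the document at a one-sided 5% level. The rule and all counts may differ from the thesis's 2P2N formulation.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def keep_posting(tf, doc_len, cf, coll_len, alpha=0.05):
    """Hypothetical pruning rule: keep the term's posting for this document only
    if the term is significantly more frequent in the document than in the
    collection (one-sided test at level alpha)."""
    z = two_proportion_z(tf, doc_len, cf, coll_len)
    return z > NormalDist().inv_cdf(1 - alpha)


# Hypothetical counts: 8 occurrences in a 200-token document vs 500 in a 1M-token collection.
print(keep_posting(8, 200, 500, 1_000_000))    # distinctive term: kept
print(keep_posting(1, 200, 4_000, 1_000_000))  # term no more frequent than usual: pruned
```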
