
Cross-validation for the selection of spectral variables using the successive projections algorithm

Galvão, Roberto Kawakami Harrop; Araújo, Mário César Ugulino; Silva, Edvan Cirino; José, Gledson Emidio; Soares, Sófacles Figueredo Carreiro; Paiva, Henrique Mohallem
Source: Sociedade Brasileira de Química Publisher: Sociedade Brasileira de Química
Type: Journal Article Format: text/html
Published on 01/01/2007 Language: Portuguese
Search Relevance
66.1%
This work compares the use of a separate validation set and leave-one-out cross-validation to guide the selection of variables in the Successive Projections Algorithm (SPA) for multivariate calibration. Two case studies involving diesel and corn analysis by NIR spectrometry are presented. A graphical interface for SPA is available at www.ele.ita.br/~kawakami/spa/

Using cross-validation to evaluate predictive accuracy of survival risk classifiers based on high-dimensional data

Simon, Richard M.; Subramanian, Jyothi; Li, Ming-Chung; Menezes, Supriya
Source: Oxford University Press Publisher: Oxford University Press
Type: Journal Article
Language: Portuguese
Search Relevance
46.22%
Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell’s concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.

Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context

Martinez, Josue G.; Carroll, Raymond J.; Müller, Samuel; Sampson, Joshua N.; Chatterjee, Nilanjan
Source: PubMed Publisher: PubMed
Type: Journal Article
Published on 01/11/2011 Language: Portuguese
Search Relevance
46.19%
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.

Cross-Validation for Nonlinear Mixed Effects Models

Colby, Emily; Bair, Eric
Source: PubMed Publisher: PubMed
Type: Journal Article
Language: Portuguese
Search Relevance
46.23%
Cross-validation is frequently used for model selection in a variety of applications. However, it is difficult to apply cross-validation to mixed effects models (including nonlinear mixed effects models or NLME models) due to the fact that cross-validation requires “out-of-sample” predictions of the outcome variable, which cannot be easily calculated when random effects are present. We describe two novel variants of cross-validation that can be applied to nonlinear mixed effects models. One variant, where out-of-sample predictions are based on post hoc estimates of the random effects, can be used to select the overall structural model. Another variant, where cross-validation seeks to minimize the estimated random effects rather than the estimated residuals, can be used to select covariates to include in the model. We show that these methods produce accurate results in a variety of simulated data sets and apply them to two publicly available population pharmacokinetic data sets.

Cross-validation in cryo-EM–based structural modeling

Falkner, Benjamin; Schröder, Gunnar F.
Source: National Academy of Sciences Publisher: National Academy of Sciences
Type: Journal Article
Language: Portuguese
Search Relevance
46.23%
Single-particle cryo-EM is a powerful approach to determine the structure of large macromolecules and assemblies thereof in many cases at subnanometer resolution. It has become popular to refine or flexibly fit atomic models into density maps derived from cryo-EM experiments. These density maps are typically significantly lower in resolution than electron density maps obtained from X-ray diffraction experiments, such that the number of parameters that need to be determined is much larger than the number of experimental observables. Overfitting and misinterpretation of the density, thus, become a serious problem. For diffraction data, a cross-validation approach was introduced almost 20 y ago; however, no such approach has been described yet for structure refinement against cryo-EM density maps, although the overfitting problem is, because of the lower resolution, significantly larger. We present a cross-validation approach for real-space refinement against cryo-EM density maps in analogy to cross-validation typically used in crystallography. Our approach is able to detect overfitting and allows for optimizing the choice of restraints used in the refinement. The approach is shown on three protein structures with simulated data and experimental data of the rotavirus double-layer particle. Because cross-validation requires splitting the dataset into at least two independent sets...

Cross-validation in association mapping and its relevance for the estimation of QTL parameters of complex traits

Würschum, T; Kraft, T
Source: Nature Publishing Group Publisher: Nature Publishing Group
Type: Journal Article
Language: Portuguese
Search Relevance
46.21%
Association mapping has become a widely applied genomic approach to identify quantitative trait loci (QTL) and dissect the genetic architecture of complex traits. However, approaches to assess the quality of the obtained QTL results are lacking. We therefore evaluated the potential of cross-validation in association mapping based on a large sugar beet data set. Our results show that the proportion of the population that should be used as estimation and validation sets, respectively, depends on the size of the mapping population. Generally, a fivefold cross-validation, that is, 20% of the lines as independent validation set, appears appropriate for commonly used population sizes. The predictive power for the proportion of genotypic variance explained by QTL was overestimated by on average 38% indicating a strong bias in the estimated QTL effects. The cross-validated predictive power ranged between 4 and 50%, which are more realistic estimates of this parameter for complex traits. In addition, QTL frequency distributions can be used to assess the precision of QTL position estimates and the robustness of the detected QTL. In summary, cross-validation can be a valuable tool to assess the quality of QTL parameters in association mapping.
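The fivefold scheme mentioned in this abstract, with 20% of the lines held out as the validation set in each round, can be sketched as follows. This is only an illustrative partitioning routine, not the authors' procedure: the function name `k_fold_splits`, the population size of 100 and the fixed seed are all assumptions made for the example.

```python
import random

def k_fold_splits(n_items, k=5, seed=0):
    """Yield (estimation_idx, validation_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # The estimation set is everything outside the current validation fold.
        estimation = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield estimation, validation

# Hypothetical mapping population of 100 lines, split fivefold.
splits = list(k_fold_splits(n_items=100, k=5))
```

With 100 lines and k = 5, each round uses 80 lines for estimation and 20 for validation, and every line is used for validation exactly once.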

Prediction of Maize Single Cross Hybrids Using the Total Effects of Associated Markers Approach Assessed by Cross-Validation and Regional Trials

Melo, Wagner Mateus Costa; Pinho, Renzo Garcia Von; Balestre, Marcio
Source: Hindawi Publishing Corporation Publisher: Hindawi Publishing Corporation
Type: Journal Article
Language: Portuguese
Search Relevance
46.21%
The present study aimed to predict the performance of maize hybrids and assess whether the total effects of associated markers (TEAM) method can correctly predict hybrids using cross-validation and regional trials. The training was performed in 7 locations of Southern Brazil during the 2010/11 harvest. The regional assays were conducted in 6 different South Brazilian locations during the 2011/12 harvest. In the training trial, 51 lines from different backgrounds were used to create 58 single cross hybrids. Seventy-nine microsatellite markers were used to genotype these 51 lines. In the cross-validation method the predictive accuracy ranged from 0.10 to 0.96, depending on the sample size. Furthermore, the accuracy was 0.30 when the values of hybrids that were not used in the training population (119) were predicted for the regional assays. Regarding selective loss, the TEAM method correctly predicted 50% of the hybrids selected in the regional assays. There was also loss in only 33% of cases; that is, only 33% of the materials predicted to be good in training trial were considered to be bad in regional assays. Our results show that the predictive validation of different crop conditions is possible, and the cross-validation results strikingly represented the field performance.

Local Cross-validation for Spectrum Bandwidth Choice

Velasco, Carlos
Source: Blackwell Publisher: Blackwell
Type: Journal Article Format: application/pdf
Published on /05/2000 Language: Portuguese
Search Relevance
66.05%
We investigate an automatic method of determining a local bandwidth for non-parametric kernel spectral density estimates at a single frequency. This procedure is a modification of a cross-validation technique for global bandwidth choices, avoiding the computation of any pilot estimate based on initial bandwidths or on approximate parametric models. Only local conditions on the spectral density around the frequency of interest are assumed. We illustrate with a Monte Carlo study the performance in finite samples of the bandwidth estimates proposed.

Local cross validation for spectrum bandwidth choice

Velasco, Carlos
Source: Universidade Carlos III de Madrid Publisher: Universidade Carlos III de Madrid
Type: Working Paper Format: application/pdf
Published on /02/1998 Language: Portuguese
Search Relevance
66.05%
We investigate an automatic method of determining a local bandwidth for nonparametric kernel spectral density estimates at a single frequency. This procedure is a modification of a cross-validation technique for global bandwidth choices, avoiding the computation of any pilot estimate based on initial bandwidths or on approximate parametric models. Only local conditions on the spectral density around the frequency of interest are assumed. We illustrate with a Monte Carlo study the performance in finite samples of the bandwidth estimates proposed.

Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser

Cornec, Matthieu
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 30/10/2010 Language: Portuguese
Search Relevance
46.34%
In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we prove sanity-check bounds in the spirit of \cite{KR99}: "bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate". General loss functions and classes of predictors with finite VC-dimension are considered. We closely follow the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. In particular, we focus on proving the consistency of the various cross-validation procedures. We point out the interest of each cross-validation procedure in terms of rate of convergence. An estimation curve with transition phases depending on the cross-validation procedure and not only on the percentage of observations in the test sample gives a simple rule on how to choose the cross-validation. An interesting consequence is that the size of the test sample is not required to grow to infinity for the consistency of the cross-validation procedure.; Comment: 24 pages...
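To make the family of procedures named in this abstract concrete, the sketch below enumerates train/test index splits for leave-υ-out (with leave-one-out as the υ = 1 case) and builds a single hold-out split. It is not the formalism of the cited work; the function names, the toy sample size n = 6 and the choice to hold out the last items are assumptions made for illustration.

```python
from itertools import combinations

def leave_v_out_splits(n, v):
    """Yield (train, test) index tuples for every possible test set of size v."""
    everything = set(range(n))
    for test in combinations(range(n), v):
        train = tuple(sorted(everything - set(test)))
        yield train, test

def hold_out_split(n, test_fraction):
    """A single split: the last round(n * test_fraction) items form the test set."""
    n_test = round(n * test_fraction)
    return tuple(range(n - n_test)), tuple(range(n - n_test, n))

n = 6
loo = list(leave_v_out_splits(n, 1))   # leave-one-out: one split per observation
l2o = list(leave_v_out_splits(n, 2))   # leave-2-out: one split per pair of observations
hold_out = hold_out_split(n, 0.5)      # one split, half of the data held out
```

Leave-υ-out needs one split per subset of size υ, so the number of splits grows combinatorially with υ, whereas hold-out uses a single split regardless of n.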

Choice of V for V-Fold Cross-Validation in Least-Squares Density Estimation

Arlot, Sylvain; Lerasle, Matthieu
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Language: Portuguese
Search Relevance
46.23%
This paper studies V-fold cross-validation for model selection in least-squares density estimation. The goal is to provide theoretical grounds for choosing V in order to minimize the least-squares loss of the selected estimator. We first prove a non-asymptotic oracle inequality for V-fold cross-validation and its bias-corrected version (V-fold penalization). In particular, this result implies that V-fold penalization is asymptotically optimal in the nonparametric case. Then, we compute the variance of V-fold cross-validation and related criteria, as well as the variance of key quantities for model selection performance. We show that these variances depend on V like 1+4/(V-1), at least in some particular cases, suggesting that the performance increases much from V=2 to V=5 or 10, and then is almost constant. Overall, this can explain the common advice to take V=5 (at least in our setting and when the computational power is limited), as supported by some simulation experiments. An oracle inequality and exact formulas for the variance are also proved for Monte-Carlo cross-validation, also known as repeated cross-validation, where the parameter V is replaced by the number B of random splits of the data.
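The factor 1+4/(V-1) quoted in this abstract can be tabulated directly to see why gains flatten beyond V=5 or 10. The snippet below only evaluates that expression; any proportionality constants from the paper are omitted, and the function name `variance_factor` and the chosen values of V are illustrative assumptions.

```python
def variance_factor(v):
    """Factor 1 + 4/(V - 1) quoted in the abstract; defined for V >= 2."""
    if v < 2:
        raise ValueError("V-fold cross-validation needs V >= 2")
    return 1 + 4 / (v - 1)

# Evaluate the factor for a few common choices of V.
factors = {v: variance_factor(v) for v in (2, 5, 10, 20)}
```

The factor is 5 at V=2 and 2 at V=5, then falls only to about 1.44 at V=10 and about 1.21 at V=20, approaching 1 as V grows.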

Cross-Validation for Nonlinear Mixed Effects Models

Colby, Emily; Bair, Eric
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 09/04/2013 Language: Portuguese
Search Relevance
46.23%
Cross-validation is frequently used for model selection in a variety of applications. However, it is difficult to apply cross-validation to mixed effects models (including nonlinear mixed effects models or NLME models) due to the fact that cross-validation requires "out-of-sample" predictions of the outcome variable, which cannot be easily calculated when random effects are present. We describe two novel variants of cross-validation that can be applied to nonlinear mixed effects models. One variant, where out-of-sample predictions are based on post hoc estimates of the random effects, can be used to select the overall structural model. Another variant, where cross-validation seeks to minimize the estimated random effects rather than the estimated residuals, can be used to select covariates to include in the model. We show that these methods produce accurate results in a variety of simulated data sets and apply them to two publicly available population pharmacokinetic data sets.; Comment: 38 pages, 15 figures. To be published in the Journal of Pharmacokinetics and Pharmacodynamics

Concentration inequalities of the cross-validation estimate for stable predictors

Cornec, Matthieu
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 23/11/2010 Language: Portuguese
Search Relevance
46.32%
In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for stable predictors in the context of risk assessment. The notion of stability was first introduced by \cite{DEWA79} and extended by \cite{KEA95}, \cite{BE01} and \cite{KUNIY02} to characterize classes of predictors with infinite VC dimension. In particular, this covers $k$-nearest neighbors rules, Bayesian algorithms (\cite{KEA95}), boosting, and so on. General loss functions and classes of predictors are considered. We use the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. In particular, we give a simple rule on how to choose the cross-validation, depending on the stability of the class of predictors. In the special case of uniform stability, an interesting consequence is that the number of elements in the test set is not required to grow to infinity for the consistency of the cross-validation procedure. In this special case, the particular interest of leave-one-out cross-validation is emphasized.

Estimating Subagging by cross-validation

Cornec, Matthieu
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 23/11/2010 Language: Portuguese
Search Relevance
46.26%
In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for subagged estimators, both for classification and regression. General loss functions and classes of predictors with both finite and infinite VC-dimension are considered. We slightly generalize the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. An interesting consequence is that the probability upper bound is bounded by the minimum of a Hoeffding-type bound and a Vapnik-type bound, and thus is smaller than 1 even for small learning sets. Finally, we give a simple rule on how to subbag the predictor.

Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

Watanabe, Sumio
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Language: Portuguese
Search Relevance
46.28%
In regular statistical models, the leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of the cross-validation remains unknown. In previous studies, we established the singular learning theory and proposed a widely applicable information criterion, the expectation value of which is asymptotically equal to the average Bayes generalization loss. In the present paper, we theoretically compare the Bayes cross-validation loss and the widely applicable information criterion and prove two theorems. First, the Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable. Therefore, model selection and hyperparameter optimization using these two values are asymptotically equivalent. Second, the sum of the Bayes generalization error and the Bayes cross-validation error is asymptotically equal to $2\lambda/n$, where $\lambda$ is the real log canonical threshold and $n$ is the number of training samples. Therefore the relation between the cross-validation error and the generalization error is determined by the algebraic geometrical structure of a learning machine. We also clarify that the deviance information criteria are different from the Bayes cross-validation and the widely applicable information criterion.

A computationally fast alternative to cross-validation in penalized Gaussian graphical models

Vujacic, Ivan; Abbruzzo, Antonino; Wit, Ernst
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Language: Portuguese
Search Relevance
46.23%
We study the problem of selecting the regularization parameter in penalized Gaussian graphical models. When the goal is to obtain a model with good predictive power, cross-validation is the gold standard. We present a new estimator of Kullback-Leibler loss in Gaussian graphical models which provides a computationally fast alternative to cross-validation. The estimator is obtained by approximating leave-one-out cross-validation. Our approach is demonstrated on simulated data sets for various types of graphs. The proposed formula exhibits superior performance, especially in the typical small sample size scenario, compared to other available alternatives to cross-validation, such as Akaike's information criterion and generalized approximate cross-validation. We also show that the estimator can be used to improve the performance of the BIC when the sample size is small.; Comment: 16 pages, 5 figures

Cross-validation for choosing resolution level for nonlinear wavelet curve estimators

Hall, Peter; Penev, S
Source: Chapman & Hall Publisher: Chapman & Hall
Type: Journal Article
Language: Portuguese
Search Relevance
66.23%
We show that unless the target density is particularly smooth, cross-validation applied directly to nonlinear wavelet estimators produces an empirical value of primary resolution which fails, by an order of magnitude, to give asymptotic optimality. We note, too, that in the same setting, but for different reasons, cross-validation of the linear component of a wavelet estimator fails to give asymptotic optimality, if the primary resolution level that it suggests is applied to the nonlinear form of the estimator. We propose an alternative technique, based on multiple cross-validation of the linear component. Our method involves dividing the region of interest into a number of subregions, choosing a resolution level by cross-validation of the linear part of the estimator in each subregion, and taking the final empirically chosen level to be the minimum of the subregion values. This approach exploits the relative resistance of wavelet methods to over-smoothing: the final resolution level is too small in some parts of the main region, but that has a relatively minor effect on performance of the final estimator. The fact that we use the same resolution level throughout the region, rather than a different level in each subregion, means that we do not need to splice together different estimates and remove artificial jumps where the subregions abut.

On Gauss quadrature and partial cross validation

Kozek, A; Yin, Jiying
Source: Elsevier Publisher: Elsevier
Type: Journal Article
Language: Portuguese
Search Relevance
66.14%
New estimators of expected values Ew(X) of functions of a random variable X are introduced. The new estimators are based on Gauss quadrature, a numerical method frequently used to approximate integrals over finite intervals. The estimators need a small number of numerical evaluations and hence are useful in partial cross validation (PCV), a numerical method for finding optimal smoothing parameters in nonparametric curve estimation. PCV can considerably reduce the computational cost of the generalized cross validation method typically used to determine the optimal smoothing parameter.
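As background for the quadrature idea in this abstract, the sketch below applies the standard 3-point Gauss-Legendre rule to an integral over a finite interval. It is not the estimator proposed by the authors; the function name `gauss_legendre_3`, the uniform distribution on [0, 1] and the test function w(x) = x**4 are assumptions chosen so that the exact answer is known.

```python
from math import sqrt

# Nodes and weights of the 3-point Gauss-Legendre rule on [-1, 1].
NODES = (-sqrt(3 / 5), 0.0, sqrt(3 / 5))
WEIGHTS = (5 / 9, 8 / 9, 5 / 9)

def gauss_legendre_3(f, a, b):
    """Approximate the integral of f over [a, b] with three evaluations of f."""
    half_width = (b - a) / 2
    midpoint = (a + b) / 2
    # Map each reference node from [-1, 1] to [a, b] and form the weighted sum.
    return half_width * sum(w * f(midpoint + half_width * x)
                            for x, w in zip(NODES, WEIGHTS))

# For X uniform on [0, 1] and w(x) = x**4, E[w(X)] is the integral of x**4 over [0, 1], i.e. 1/5.
estimate = gauss_legendre_3(lambda x: x ** 4, 0.0, 1.0)
```

The 3-point rule is exact for polynomials up to degree 5, so three function evaluations recover 1/5 here up to rounding error, which illustrates why quadrature-based estimators can keep the number of evaluations small.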

Body fat in judokas: cross-validation of Lohman’s equation

Glaner, Maria Fatima (Universidade Católica de Brasília); Brito, Ciro José (Universidade Federal de Viçosa)
Source: Universidade Federal de Santa Catarina. Florianópolis, SC. Brazil Publisher: Universidade Federal de Santa Catarina. Florianópolis, SC. Brazil
Type: info:eu-repo/semantics/article; info:eu-repo/semantics/publishedVersion; Peer Reviewed; Descriptive Format: application/pdf
Published on 05/09/2007 Language: Portuguese
Search Relevance
56.09%
Combat sports are contested in weight categories. The greater the proportion of lean mass per kilogram of body mass, the greater a fighter’s capacity to exert force. Estimating percentage body fat (%F) is therefore of fundamental importance for deciding in which category a fighter will compete. The objective of this study was to verify the cross-validity of Lohman’s equation (LE) [7] for the estimation of %F in fighters. The sample comprised 30 male judokas, resident in the Distrito Federal, Brazil, with a mean age of 25.1±4.5 years, mean body mass of 81.8±12.5 kg and mean height of 176.3±7.1 cm. Hydrostatic weighing (HW) was used as the gold standard for cross-validation. The statistical criteria employed were those proposed by Lohman [7] with the addition of residual score analysis [17]. Correlation was high (r = 0.80) and significant (p≤0.0005). Both the constant error and the standard error of estimation were less than 3.5%. The %F_LE (15.1±4.7) was significantly different (p≤0.0005) from the %F_HW (11.9±4.2). Lohman’s equation significantly overestimated the %F. The residual scores demonstrated a lack of agreement between %F_LE and %F_HW, of up to 8.5%F. This being so, Lohman’s equation does not exhibit cross-validity for this sample of judokas.

Parameter Selection in Least Squares-Support Vector Machines Regression Oriented, Using Generalized Cross-Validation

Álvarez Meza, Andrés M.; Daza Santacoloma, Genaro; Acosta Medina, Carlos D.; Castellanos Domínguez, Germán
Source: DYNA Publisher: DYNA
Type: Journal Article Format: text/html
Published on 01/02/2012 Language: Portuguese
Search Relevance
66.05%
In this work, a new methodology for automatic selection of the free parameters in the least squares-support vector machines (LS-SVM) regression oriented algorithm is proposed. We employ a multidimensional generalized cross-validation analysis in the linear equation system of LS-SVM. Our approach does not require prior knowledge about the influence of the LS-SVM free parameters in the results. The methodology is tested on two artificial and two real-world data sets. According to the results, our methodology computes suitable regressions with competitive relative errors.