Additionally, the algorithm employs this parsimonious library of highly accurate pairs to reduce the computational time required for the k-TSP algorithm and for leave-one-out cross-validation analysis. With these optimizations, the algorithm is able to fully analyze even large microarray data sets within one day on a standard desktop computer, including cross-validation analysis and False Discovery Rate prediction.

Combinatoric k-TSP Algorithm

In an extension of the TSP algorithm, k individual TSP classifiers can be combined into a multi-pair k-TSP classifier. In this approach, the TSP algorithm itself is performed, and all possible transcript pairs are ranked in order of their classification accuracy. The top k highest-ranked TSP pairs for a given classification task each represent one vote, with equal weight, for the class of each given sample.
The final predicted class of each sample is the phenotype with the majority of votes. To avoid ties, k is restricted to odd numbers only. For this study, the maximum value of k was held to 11. For each classification task, a leave-one-out cross-validation loop is employed to determine the optimal value of k.

Analysis of Non-Overlapping TSP and k-TSP Classifiers

We employed the TSP and k-TSP algorithms to determine the degree to which these methods can generate multiple unique gene expression-based classifiers. We first determined the optimal TSP and k-TSP classifiers against the previously mentioned GIST/LMS gene expression data. We then removed the top-scoring individual gene pair from the dataset and repeated the algorithm on this reduced gene expression data.
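
The voting rule and the pair-removal step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in this study. It assumes that pairs are ranked by the standard TSP score (the absolute difference between the two classes in the frequency with which transcript i is expressed below transcript j), and that removing a gene pair means dropping both of its transcripts from the data. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def rank_pairs(X, y):
    """Rank all transcript pairs (i, j) by TSP score, best first.

    X: list of transcripts, each a list of expression values (one per sample).
    y: list of 0/1 class labels, one per sample.
    Returns tuples (score, i, j, cls_if_less); cls_if_less is the class a
    pair votes for when X[i] < X[j] in a sample.
    """
    idx0 = [s for s, c in enumerate(y) if c == 0]
    idx1 = [s for s, c in enumerate(y) if c == 1]
    ranked = []
    for i, j in combinations(range(len(X)), 2):
        p0 = sum(X[i][s] < X[j][s] for s in idx0) / len(idx0)
        p1 = sum(X[i][s] < X[j][s] for s in idx1) / len(idx1)
        ranked.append((abs(p0 - p1), i, j, 0 if p0 > p1 else 1))
    ranked.sort(key=lambda t: -t[0])  # stable sort: ties keep index order
    return ranked


def ktsp_predict(pairs, sample):
    """Majority vote over the given pairs; each pair casts one equal vote.

    sample: expression values of one sample, indexed by transcript.
    Use an odd number of pairs so that the vote cannot tie.
    """
    votes = [cls if sample[i] < sample[j] else 1 - cls
             for _, i, j, cls in pairs]
    return 1 if 2 * sum(votes) > len(votes) else 0


def drop_top_pair(X, top):
    """Remove both transcripts of a pair (assumed meaning of 'removal')."""
    _, i, j, _ = top
    return [row for g, row in enumerate(X) if g not in (i, j)]


# Toy data: 4 transcripts x 6 samples, three samples per class.
X = [
    [1, 2, 1, 5, 6, 5],
    [4, 4, 3, 2, 3, 1],
    [2, 5, 2, 4, 3, 6],
    [3, 3, 4, 3, 4, 2],
]
y = [0, 0, 0, 1, 1, 1]

ranked = rank_pairs(X, y)
# k = 3: the three highest-ranked pairs vote on each sample.
preds = [ktsp_predict(ranked[:3], [row[s] for row in X]) for s in range(len(y))]
# Excise the top pair; the algorithm would then be rerun on X_reduced.
X_reduced = drop_top_pair(X, ranked[0])
```

In this sketch, k is fixed by hand; selecting k by an inner cross-validation loop, as described in the text, would wrap `ktsp_predict` in a loop over the odd values of k up to 11.
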
We iteratively performed this gene pair excision, and recorded TSP and k-TSP classifier accuracies at each step. The value of k was held to a maximum of 11, and was determined in each iteration by an internal loop of leave-one-out cross-validation that established the optimal value of k for each classification task.

Leave-One-Out Cross-Validation

To estimate algorithm performance on novel samples, we performed leave-one-out cross-validation (LOOCV), in which the top-scoring pair as determined by N - 1 samples is used to predict the class of the left-out sample. This cross-validation is performed iteratively for each of the N samples, and the number of correct predictions is then divided by N to give the LOOCV accuracy. Cross-validation sensitivity and specificity were also determined.

Calculation of False Discovery Rate

To estimate the statistical power of each classifier, we applied the algorithm to each dataset following random permutations of phenotypic class labels across all samples.
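
The leave-one-out procedure and the label permutation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study. Pairs are scored with the standard TSP score (an assumption), every class is assumed to keep at least one sample after one is left out, and the function names, the permutation count, and the toy data are hypothetical. The sketch only collects LOOCV accuracies under permuted labels; it does not compute a False Discovery Rate from them.

```python
import random
from itertools import combinations


def top_pair(X, y):
    """Return (i, j, cls_if_less) for the highest-scoring pair on (X, y)."""
    idx0 = [s for s, c in enumerate(y) if c == 0]
    idx1 = [s for s, c in enumerate(y) if c == 1]
    best = None
    for i, j in combinations(range(len(X)), 2):
        p0 = sum(X[i][s] < X[j][s] for s in idx0) / len(idx0)
        p1 = sum(X[i][s] < X[j][s] for s in idx1) / len(idx1)
        score = abs(p0 - p1)
        if best is None or score > best[0]:  # first pair wins ties
            best = (score, i, j, 0 if p0 > p1 else 1)
    return best[1:]


def loocv_accuracy(X, y):
    """Fraction of samples predicted correctly when each is left out once."""
    n = len(y)
    correct = 0
    for left_out in range(n):
        keep = [s for s in range(n) if s != left_out]
        X_train = [[row[s] for s in keep] for row in X]
        y_train = [y[s] for s in keep]
        # The pair is chosen from the N - 1 training samples only.
        i, j, cls = top_pair(X_train, y_train)
        pred = cls if X[i][left_out] < X[j][left_out] else 1 - cls
        correct += pred == y[left_out]
    return correct / n


def permuted_accuracies(X, y, n_perm=20, seed=0):
    """LOOCV accuracies after randomly permuting the class labels."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # class sizes are preserved
        accuracies.append(loocv_accuracy(X, y_perm))
    return accuracies


# Toy data: 4 transcripts x 6 samples, three samples per class.
X = [
    [1, 2, 1, 5, 6, 5],
    [4, 4, 3, 2, 3, 1],
    [2, 5, 2, 4, 3, 6],
    [3, 3, 4, 3, 4, 2],
]
y = [0, 0, 0, 1, 1, 1]

observed = loocv_accuracy(X, y)
null_accuracies = permuted_accuracies(X, y, n_perm=5, seed=1)
```

Comparing `observed` with the spread of `null_accuracies` gives a rough sense of how often an equally accurate classifier arises when the labels carry no information.
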