Implementation

T-RFPred is written in Perl and uses the BioPerl Toolkit [17], fuzznuc from the EMBOSS package [18] and the BLASTN program from the NCBI BLAST suite [19]. T-RFPred has been tested in Unix-like environments, but runs on any operating system able to execute Perl, BioPerl, BLAST and EMBOSS; a ready-to-use VMware virtual image is also available for download at http://nodens.ceab.csic.es/t-rfpred/. An interactive shell guides the user through the multiple steps of the analysis. Users can choose to analyze archaeal or bacterial sequences using either forward
or reverse primers. The primer search uses fuzznuc, which allows the user to select the number of nucleotide ambiguities. The program extracts a subset of sequences from the RDP database that will supplement sequence analysis of clone libraries. T-RFPred exports a tab-delimited text file containing: (1) the fragment length for the RDP sequence with the best BLASTN hit to the input sequence(s), (2) the estimated fragment length for the input sequence, (3) the gap length for the input sequence, (4) the percent identity between the input sequence and the best-hit RDP sequence and (5) the taxonomic classification. The BLASTN search results and the Smith-Waterman alignments [20]
are saved so that the user can check the results manually.

Database

The program uses a custom version of the aligned RDP database as a flat file in FASTA format, in which the header has been modified to include the NCBI taxonomic information and the forward/reverse position of the first non-gap character from the RDP alignment. T-RFPred exploits the Bio::DB::Flat capabilities of BioPerl to index the RDP flat file for the rapid retrieval of 16S rRNA gene sequences. All restriction enzymes
available in REBase [21] are stored in a flat file and available for use in the analysis. A list of frequently used forward and reverse primers is provided, although the user may also input custom primers.

Algorithm

In part, the rationale for the described method was to circumvent the need for full-length 16S rRNA gene sequences from representative clone libraries. In addition to requiring multiple sequencing reactions, obtaining full-length sequences is generally complicated by the ambiguous nature of the 5′ end of a sequence generated by the Sanger approach (i.e. the first 10-30 bp of a sequence are missing). When the same primer set used to generate T-RFLP profiles is also used to generate amplicons for libraries and directional sequencing of representative clones, as is often the case, in silico predictions of expected peak sizes are cumbersome. Additionally, the size of the fragment is subject to experimental error [22, 23], which complicates the assignment of chromatogram peaks to specific phylogenetic groups.
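
To make the notion of an in silico predicted fragment concrete, the following minimal Python sketch computes the expected forward terminal fragment length for a sequence that does contain the primer site. It is an illustration only and is not T-RFPred's code, which is written in Perl and relies on fuzznuc and BLASTN. The function names, the toy sequence and the choice of primer (AGAGTTTGATCMTGGCTCAG) and enzyme (MspI, C^CGG) are assumptions made for the example. The sketch expands IUPAC ambiguity codes but, unlike a fuzznuc search, does not tolerate mismatches.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def to_regex(pattern):
    """Compile a primer or recognition site written with IUPAC codes."""
    return re.compile("".join(IUPAC[base] for base in pattern.upper()))


def predict_trf(sequence, primer, site, cut_offset):
    """Predict the forward terminal fragment length in bases (hypothetical helper).

    cut_offset is the number of bases of the recognition site that remain
    on the labeled fragment (1 for C^CGG). Returns None when the primer or
    the restriction site is not found.
    """
    # Drop alignment gaps and treat RNA-style U as T.
    seq = sequence.upper().replace("-", "").replace("U", "T")
    primer_hit = to_regex(primer).search(seq)
    if primer_hit is None:
        return None
    # Look for the first recognition site downstream of the primer.
    site_hit = to_regex(site).search(seq, primer_hit.end())
    if site_hit is None:
        return None
    # The fragment runs from the 5' end of the primer to the cut position.
    return site_hit.start() + cut_offset - primer_hit.start()


# Toy sequence: 3 leading bases, the primer site, 10 spacer bases, one MspI site.
toy = "TTT" + "AGAGTTTGATCCTGGCTCAG" + "A" * 10 + "CCGG" + "TTTT"
print(predict_trf(toy, "AGAGTTTGATCMTGGCTCAG", "CCGG", 1))  # 31
```

A Sanger read that lacks its first 10-30 bp would return None here because the primer site is absent, which is the situation that motivates estimating the fragment length from the best-matching RDP sequence instead.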