urealyticum (14 strains^a)    U. parvum (5 strains^b)
Pan genome                              1020   971   938   688
Core genome                              515   523   553   538
Singletons                               262   246   216    77
Clusters of Orthologous Genes (COGs)     758   725   722   688

Pan genome represents the number of clusters of orthologous genes and singletons. Singletons are genes found only in one of the genomes. Clusters of Orthologous Genes (COGs) have genes orthologous among at least 2 genomes.
a) ATCC UUR2, UUR4, UUR5, UUR7-13, and the clinical isolates 2033, 2608, 4155, 4318.
b) ATCC UPA1, UPA3 (ATCC 27815), UPA3 (ATCC 700970), UPA6, UPA14.

It has been suggested that genes that are not affected by the selective pressure on mycoplasmas gradually mutate at a faster rate than genes whose sequences are highly conserved
to a higher AT content and are eventually lost [25]. Therefore, the %GC content may point out which genes are important for ureaplasmas or have recently been acquired horizontally. We evaluated the percent GC content of all genes across the 19 sequenced strains. Genes encoding hypothetical surface proteins that are conserved across all ureaplasma strains and have high GC content may play an important role for ureaplasmas in processes like adherence to mammalian cells and colonization. An interactive Excel table of the %GC values of all ureaplasma strains can be found in Additional file 3: Comparative paper COGs tables.xls. A histogram of the distribution of %GC values of the ureaplasma pan genome shows that core genome genes with assigned function generally have a higher GC content than hypothetical genes (Figure 2). The median for the core genome was 27% GC; therefore, genes with GC content higher than 27% are likely to be essential and/or acquired. The median for the hypothetical proteins was 24% GC. Considering that the ureaplasma genomes have an overall GC content of 25%, genes with GC content below 25% may be non-essential and on their way to being lost. The lowest GC content, only 13%, belongs to a hypothetical protein.

The genomes of the 14 sequenced ATCC ureaplasma serovar strains are extremely similar across the two species and 14 serovars. Comparison of the finished genomes shows synteny on the gene level and few rearrangements. We obtained percent difference values by whole genome comparison on the nucleotide level. The average intra-species percent difference was 0.62%, with the least difference, only 0.06%, between UUR4 and UUR12, and the greatest difference, 1.27%, between UUR9 and UUR13. On the inter-species level the average percent difference was 9.5%, with the greatest difference, 10.2%, between UPA1 and UUR9 (Table 3). As mentioned earlier, UUR serovars have genomes about 118 kbp (13.5%) larger than those of UPA serovars. As a result, UUR serovars have on average 58 more genes than UPA serovars.

Figure 2 Percent GC Distribution Among Genes of The Ureaplasma Pan Genome (19 Strains).
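
The %GC screen described in the text can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the pipeline used in the study: the function names `gc_percent` and `classify_by_gc`, the label strings, and the toy sequences are hypothetical. Only the two cut-offs (27%, the core genome median, and 25%, the overall genome GC content) are taken from the text.

```python
from typing import Dict


def gc_percent(seq: str) -> float:
    """Return the percent G+C of a nucleotide sequence.

    Only A, C, G and T are counted; ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def classify_by_gc(genes: Dict[str, str],
                   high: float = 27.0,
                   low: float = 25.0) -> Dict[str, str]:
    """Label each gene by its %GC relative to the two cut-offs.

    high: core genome median reported in the text (27% GC).
    low:  overall genome GC content reported in the text (25% GC).
    The label strings are illustrative assumptions.
    """
    labels = {}
    for name, seq in genes.items():
        gc = gc_percent(seq)
        if gc > high:
            labels[name] = "above core median"
        elif gc < low:
            labels[name] = "below genome average"
        else:
            labels[name] = "intermediate"
    return labels


# Hypothetical toy sequences, not real ureaplasma genes.
toy_genes = {
    "gene_a": "ATGGCGCCATTA",          # 6 of 12 bases are G or C
    "gene_b": "ATTATAATTTAA",          # no G or C
    "gene_c": "GC" * 13 + "AT" * 37,   # 26 of 100 bases are G or C
}

labels = classify_by_gc(toy_genes)
for gene_name, label in labels.items():
    print(gene_name, round(gc_percent(toy_genes[gene_name]), 1), label)
```

The thresholds are exposed as parameters so that a different genome-wide average or core median could be substituted. A category assigned this way is only a heuristic flag, as the text itself frames it.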