How big can a gene be? Ten years ago, in the early days of genome sequencing, researchers scoured the genomes of 580 bacterial and archaeal species for large genes. They found that 0.2% of all identified genes were longer than 5,000 bases, and that 80 of them were "giant genes," longer than 15,000 bases. To put this in perspective, the average prokaryotic gene is between 900 and 1,200 bases long.
The two longest genes were found in the green sulfur bacterium Chlorobium chlorochromatii CaD3. They encode proteins 36,806 and 20,647 amino acids long, which corresponds to coding sequences of 110,418 and 61,941 bases, respectively. At the time of the study, the only known protein longer than these was human titin, at 38,138 amino acids. Since then, scientists have identified a slew of genes that exceed one million bases in length.
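The base counts follow from the genetic code, where each amino acid is specified by one three-base codon. The sketch below reproduces the arithmetic. It assumes the quoted figures leave out the stop codon, because including one would add three bases to each gene. The function name `coding_length` is illustrative.

```python
# Each amino acid is encoded by one codon of three bases.
# Assumption: the stop codon is excluded, which matches the figures
# quoted above (including it would add 3 bases per gene).
BASES_PER_CODON = 3


def coding_length(amino_acids: int, include_stop: bool = False) -> int:
    """Return the number of coding bases needed for a protein of this length."""
    codons = amino_acids + (1 if include_stop else 0)
    return codons * BASES_PER_CODON


# The two Chlorobium chlorochromatii CaD3 proteins from the study
for protein_length in (36_806, 20_647):
    print(f"{protein_length} amino acids -> {coding_length(protein_length)} bases")
```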
While bacteria tend to favor streamlined, highly efficient genomes, these giants have slipped under the radar. The kinds of proteins they encode could explain why. Over 90% of the giant genes encode either surface proteins, such as transporters or adhesins, or multienzyme complexes whose enzymes act in sequence to convert a substrate into a final product, usually an antifungal or antibacterial compound. These microbial weapons could give the organisms that carry giant genes an edge when competing with other microbes for nutrients or territory. Multienzyme complexes are typically encoded by genes that sit side by side in the genome, which makes it easy to switch them on and off at the same time. Perhaps it was simply convenient for these neighboring genes to fuse into one giant gene.
But making such an enormous protein is a heavy burden that demands a large investment of time, energy, and material. Under optimal conditions, a cell can link together about 40 amino acids per second, so a "normal" protein takes only a few seconds to make. The largest Chlorobium chlorochromatii protein identified in the study takes at least 15 minutes. That may still seem quick to us, but a bacterial generation can last anywhere from about 20 minutes to a few weeks. If a bacterium is growing quickly, why bother making proteins from giant genes? Giant genes may be useful only during periods of slow growth, or in organisms that always grow slowly.
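The timing estimate is a single division: protein length over elongation rate. The sketch below assumes a constant rate of 40 amino acids per second, which is the optimal-conditions figure above. It ignores translation initiation and folding, so it gives a lower bound for one ribosome. The 300 and 400 amino acid "average" lengths are derived from the 900 to 1,200 base average gene length, and the function name is illustrative.

```python
# Assumption: constant elongation at 40 amino acids per second (the
# optimal-conditions figure quoted above); initiation and folding are
# ignored, so this is a lower bound for a single ribosome.
ELONGATION_RATE_AA_PER_S = 40.0


def synthesis_seconds(amino_acids: int, rate: float = ELONGATION_RATE_AA_PER_S) -> float:
    """Minimum time, in seconds, to translate a protein of this length."""
    return amino_acids / rate


# Average prokaryotic genes of 900 to 1,200 bases encode roughly 300 to 400 amino acids
examples = [
    ("average protein, 300 aa", 300),
    ("average protein, 400 aa", 400),
    ("largest C. chlorochromatii protein", 36_806),
]
for label, length in examples:
    seconds = synthesis_seconds(length)
    print(f"{label}: {seconds:.1f} s ({seconds / 60:.1f} min)")
```

At this rate the largest protein needs about 920 seconds, a little over 15 minutes, compared with under ten seconds for an average one.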
Sure enough, most of these genes were found in environmental bacteria, which tend to replicate more slowly and live through cycles of feast and famine. The 80 giant genes came from 47 species, including six human pathogens, one fish pathogen, one insect pathogen, and one plant pathogen. When the study took place, most sequenced bacteria were human pathogens. Because giant genes turned up mainly in environmental bacteria even though the genome data were skewed toward pathogens, the pattern is unlikely to be a sampling artifact. Giant genes do appear to be rare in pathogens and more common in environmental bacteria.
Microbial life at the extremes doesn't always mean living in hydrothermal vents or clinging to ice floes in the Arctic. Unusual aspects of life may be stealthy, hidden within the most "ordinary" of organisms.