I cannot warn too strongly about the confusion between genome size and assembly size. Sequencing ultimately gives you the total number of bases in the assembly, but that figure is not the same as what is traditionally known as genome size.
Genome size is typically measured by staining nuclear DNA and using flow cytometry to quantify the staining intensity per cell, and the well-established genome size databases (for plants & for animals) collect measurements made with this type of approach. However, these methods require fresh cells sampled from the study organism, as well as standard cell samples from a reference species with a known genome size. These requirements have limited our knowledge of genome size variation in some groups of species, such as elusive sharks, and have hindered validation of the completeness of genome sequencing outputs. Genome size can also be estimated from sequencing data alone by k-mer frequency analysis, but this sometimes gives an erroneous value that considerably underestimates the true size.
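To make the k-mer approach concrete, here is a minimal sketch of the usual calculation: the total number of k-mer occurrences divided by the depth of the main peak in the k-mer histogram. The function name, the `min_depth` cutoff, and the toy histogram are illustrative assumptions, not part of any particular tool, and real analyses usually fit a model to the histogram rather than taking a single peak.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Rough genome size estimate from a k-mer depth histogram.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth.
    min_depth is a hypothetical cutoff for discarding error k-mers.
    """
    # Low-depth k-mers mostly come from sequencing errors, so drop them.
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # Take the depth with the most distinct k-mers as the main peak.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences divided by the peak depth approximates genome size.
    total_kmers = sum(depth * n for depth, n in kept.items())
    return total_kmers / peak_depth


# Toy histogram (made-up numbers): error k-mers at depth 1-2, a main peak
# around depth 20, and a few two-copy k-mers at depth 40.
toy = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50, 40: 10}
print(estimate_genome_size(toy))  # 380.0
```

One plausible source of underestimation in this kind of calculation is that high-copy k-mers from repeats fall outside the histogram range and are never counted, so the numerator comes out too small.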
As a solution, we previously released the protocol sQuantGenome, based on real-time quantitative PCR, to enable genome size estimation for more species, including those with such sampling limitations. The protocol, which requires only DNA and the nucleotide sequences of several single-copy genes, was mainly formulated by Dr. Mitsutaka Kadota, the chief on-site staff member of the DNA analysis station at RIKEN in Kobe, and was originally published in 2023. At the end of May 2024, we released a revised version (version 1.1) to enhance overall usability.
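For readers unfamiliar with the idea, the general principle behind qPCR-based genome size estimation can be sketched as follows. This is not the sQuantGenome calculation itself, whose details are in the protocol; it is a simplified illustration assuming that absolute quantification has already given the copy number of each single-copy gene in a reaction with a known DNA mass. The function name and the example numbers are hypothetical. The conversion of 0.978 × 10^9 bp per picogram of DNA is the widely used standard value.

```python
from statistics import mean

# Widely used conversion: 1 pg of double-stranded DNA is about 0.978e9 bp.
BP_PER_PG = 0.978e9


def genome_size_bp(template_mass_pg, copies_per_gene):
    """Haploid genome size (bp) from template DNA mass and qPCR copy numbers.

    template_mass_pg: mass of genomic DNA in the reaction, in picograms.
    copies_per_gene: measured copy number of each single-copy gene
                     (hypothetical input, one value per gene).
    """
    if template_mass_pg <= 0 or any(c <= 0 for c in copies_per_gene):
        raise ValueError("mass and copy numbers must be positive")
    # Each copy of a single-copy gene stands for one haploid genome, so
    # mass / copies is the mass of one genome; convert that mass to bp.
    per_gene = [template_mass_pg / c * BP_PER_PG for c in copies_per_gene]
    # Averaging over several genes dampens gene-specific amplification bias.
    return mean(per_gene)


# Made-up example: 10 ng (10,000 pg) of DNA and three single-copy genes.
size = genome_size_bp(10_000, [2000, 2100, 1900])
print(f"{size / 1e9:.2f} Gbp")
```

Using several genes rather than one is the point of the design: any single locus may amplify inefficiently or turn out not to be single-copy, and the spread across genes gives a rough check on the estimate.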
We hope that our protocol will contribute to multifaceted validation of whole genome sequencing outputs of diverse species.
Snapshot from the 1st page of the protocol