Recently, my PhD student, Jennifer Warrender, and I have become
interested in the representation of karyotypes. These are descriptions
of the chromosome complement of an individual. In essence, they are a
bird's-eye view of the genome. Normally, they are described using a
karyotype string, so my karyotype would be
46,XY (probably!) which is
normal male. When describing abnormalities, these can get very complex;
take, for example,
which describes a patient with multiple translocations.
There are a couple of reasons why we thought that it would be interesting to turn this into an ontological representation. The karyotype strings are not very parsable, and lack a computable specification, which makes it very hard to check that they are correct, or to search and query over them. Having an ontological representation should help with the specification; additionally, using OWL we should be able to reason over both the specification and individual karyotypes.
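To give a feel for what "computable" might mean here, the following is a minimal Python sketch that reads only the very simplest form of karyotype string, such as 46,XY. The names `SimpleKaryotype` and `parse_simple_karyotype` are our own invention for illustration, and the pattern is an assumption about this one simple form; it is not the ISCN grammar and does not attempt to handle abnormalities.

```python
import re
from typing import NamedTuple


class SimpleKaryotype(NamedTuple):
    """Hypothetical container for the simplest kind of karyotype string."""
    chromosome_count: int
    sex_chromosomes: str


# Assumption: a count, a comma, then one or more X/Y letters (e.g. "46,XY").
# Anything more complex is rejected rather than guessed at.
_SIMPLE_PATTERN = re.compile(r"^(\d+),([XY]+)$")


def parse_simple_karyotype(text: str) -> SimpleKaryotype:
    """Parse a string like '46,XY'; raise ValueError for anything else."""
    match = _SIMPLE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple karyotype string: {text!r}")
    return SimpleKaryotype(int(match.group(1)), match.group(2))


if __name__ == "__main__":
    print(parse_simple_karyotype("46,XY"))
```

Even this toy version shows the problem: the moment abnormalities are involved, a single regular expression is no longer enough, and some explicit specification of the structure is needed.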
So far, it is turning out to be quite an interesting experience. The definitive resource for these strings is called ISCN (n.d.a). One of the things that we thought would happen is that we would detect some inconsistencies in this specification; this often happens when producing a computable specification from a human-readable one, and it is not a reflection on the authors of the original. Sure enough, this is turning out to be the case. On the second page of the specification (after front matter), for instance, we find this statement:
Group G (21-22,Y): Short acrocentric chromosomes with satellites. The Y chromosome bears no satellites
— ISCN pg7
The two sentences here are contradictory in themselves: either the Y chromosome should not be in Group G, or Group G should not be defined to bear satellites.
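The contradiction becomes mechanical once the two sentences are written down as data. The sketch below is plain Python rather than OWL, and the names (`GROUP_G_MEMBERS`, `find_contradictions` and so on) are hypothetical; it encodes only what the quoted statement says, on the assumption that "with satellites" is meant to apply to every member of the group.

```python
# Encoding of the quoted ISCN statement, under the assumption that the
# group description applies to every member of the group.
GROUP_G_MEMBERS = {"21", "22", "Y"}
GROUP_G_BEARS_SATELLITES = True

# The explicit exception made in the second sentence of the quote.
EXPLICIT_SATELLITE_STATEMENTS = {"Y": False}


def find_contradictions(members, group_bears_satellites, explicit):
    """Return the members whose explicit statement disagrees with the group."""
    return sorted(
        chromosome
        for chromosome in members
        if chromosome in explicit
        and explicit[chromosome] != group_bears_satellites
    )


if __name__ == "__main__":
    print(find_contradictions(
        GROUP_G_MEMBERS,
        GROUP_G_BEARS_SATELLITES,
        EXPLICIT_SATELLITE_STATEMENTS,
    ))
```

This check is hand-written for one property; the attraction of an ontological representation is that we would expect a reasoner to find this kind of clash without being told where to look.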
There is an apparently similar statement on the page before, which says:
Not all the chromosomes in the D and G groups show satellites on their short arms in a single cell
— ISCN pg6
However, there is a significant difference here; to fill in the cytogenetic background, a satellite is a differentially staining visible body on the chromosome, found near the centromere of some chromosomes. The name “satellite” actually comes from a different property of satellite DNA: it is often a different density from bulk genomic DNA, so when spun in a density gradient, it appears as a smaller band segregated from the main genomic band. Nowadays, satellite DNA is known to be highly repetitive sequence that we are no longer allowed to call junk DNA (n.d.b). In humans, the most common sequence is known as the alpha satellite. The different densities sometimes seen in gradients occur where the GC content is different from bulk. The differential staining patterns seen cytogenetically occur because the repetitive DNA is normally packed as heterochromatin.
This leads us to understand the difference between our two quotes from ISCN. Although repetitive DNA is variable in detail, it is not that variable, and will remain constant within an individual; if one chromosome 22 has alpha satellite at the centromere, then so will another. The key here is the caveat “in a single cell”. In a chromosome spread from a single cell, an individual chromosome may or may not be at the right stage of condensation for the satellite to be visible.
So, we have two usages of the word chromosome. In the first quote, we are referring to a canonical chromosome; so, canonically, it is true that all human chromosome 22s contain a satellite. In the second statement, we are referring to a single chromosome, in a single cell, from a single human. There is no contradiction here because the existence of a single chromosome 22 without satellites is not inconsistent with the canonical chromosome 22 having satellites.
Actually, ontologically, the situation is slightly more complex still. The second quote says “Not all … show satellites” (our emphasis on “show”). It is not an issue of whether the chromosomes have or do not have the relevant DNA; it is whether we happen to be able to see this after appropriate staining.
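One way to keep these notions apart is to model them as different kinds of thing. The following Python sketch is only an illustration of the distinction, not our ontology: the class and field names are hypothetical, and the rule in `contradicts_canonical` (that seeing a satellite where the canonical chromosome has none would be a clash, while failing to see one never is) is our own modelling assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalChromosome:
    """The textbook chromosome, e.g. 'human chromosome 22'."""
    name: str
    bears_satellite: bool


@dataclass(frozen=True)
class ObservedChromosome:
    """One chromosome, in one cell, as seen after staining."""
    canonical: CanonicalChromosome
    cell_id: str
    satellite_visible: bool


def contradicts_canonical(observed: ObservedChromosome) -> bool:
    """Assumed rule: only a visible satellite on a chromosome whose
    canonical form bears none counts as a contradiction. An invisible
    satellite says nothing about whether the DNA is present."""
    return observed.satellite_visible and not observed.canonical.bears_satellite


if __name__ == "__main__":
    chromosome_22 = CanonicalChromosome("22", bears_satellite=True)
    seen = ObservedChromosome(chromosome_22, "cell-1", satellite_visible=False)
    print(contradicts_canonical(seen))
```

Here the observed chromosome 22 with no visible satellite sits happily alongside a canonical chromosome 22 that bears one, which is the reading we gave to the second ISCN quote.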
We will consider both of these issues, canonicalization and visualisation, in future posts.
This post was authored by [author]Phillip Lord[/author] and [author]Jennifer Warrender[/author].