Pre-recorded examples of Sonified DNA sequences

These audio files were produced by the sonification tools without modification and have been converted to mp3 files for wider browser compatibility. They were imported into video editing software and basic animation was added. In these videos, the region of DNA sequence being sonified is highlighted as the audio plays, which makes it easier to understand how the audio relates to the features of the DNA sequence.

An artificial test DNA sequence 01

A sequence of (GGG)n with three Stop codons added, each in a different reading frame. As each Stop codon is passed, the instrument representing that reading frame stops playing. Stop codons are used in biology to halt translation (the step in gene expression that makes a protein) and may indicate the end of a gene coding sequence.


This simple mononucleotide sequence is useful for understanding the sonification output and highlights the characteristic triplet note phrasing which is the basis of the subsequent auditory displays. Since the same codon motif occurs in each reading frame, the same note is played on each of three instruments, giving rise to a highly repetitive pattern. As each Stop codon (taa) is passed it causes an instrument to stop playing, so the audio passage ends in silence.


An artificial test DNA sequence 02

A sequence of (GGG)n with three Start codons added, each in a different reading frame. As each Start codon is passed, the instrument representing that reading frame starts playing. Start codons may be used in biology to signal the start site of translation (the step in gene expression that makes a protein) and may indicate the beginning of a gene coding sequence.


Notice how the audio is silent until the occurrence of a Start codon in the second reading frame, which triggers the guitar to play. Subsequent Start codons turn on the audio of the other reading frames in which they occur. Also notice how the change in sequence from the ggg codon to the gga, gat, atg and tgg codons causes a flutter of notes that is distinct from the monotonal note that represents ggg.


A repetitive DNA sequence 01

Homo sapiens chromosome 22 terminal deletion breakpoint region sequence, telomere-junction (22q13.3) GenBank: AJ277167.1


The first half of this sequence represents a more complex and naturally occurring DNA sequence. In this example the algorithm options were chosen to ignore Start and Stop codons (since this sequence is not a coding sequence). The second half of this sequence contains the telomere repeat sequences (features found at the ends of chromosomes). The telomere repeats (ttaggg) naturally give rise to a repetitive audio pattern (possibly a more harmonic or melodic phrase?) that is easily discerned. Within this repetitive sequence there is a single nucleotide change (either a polymorphism or a mutation). The absence of this single nucleotide (in this instance a missing 't' base/nucleotide) causes both a flutter of notes and a shift in the voicing of the repeat sequence (i.e. the frame shift causes different instruments to play the notes).


A repetitive DNA sequence 02

Homo sapiens huntingtin repeat instability region (LOC109461479) on chromosome 4 NCBI Reference Sequence: NG_052623.1


This sequence is a naturally occurring gene coding sequence that contains a cag repeat. In nature these repeat sequences may accumulate in certain genes and may result in disease (such as Huntington's disease). Again, sonification of these repeat sequences gives rise to a repetitive audio pattern (possibly a more harmonic or melodic phrase?) that is easily discerned.


Coding Sequence

Homo sapiens v-Ha-ras Harvey rat sarcoma viral oncogene homolog mRNA, complete cds GenBank: BT019421.1


This sequence represents a gene coding sequence. At the beginning of the sequence all reading frames are audible. The first frame (piano) is the open reading frame, which contains no stop codons and is audible throughout. The occurrence of a stop codon immediately in the second reading frame and another in the third frame results in a solo piano, or a piano accompanied by other instrument combinations. This is expected since a coding sequence is defined by a reading frame in which stop codons are absent. Additionally, an option was chosen to sonify Start and Stop codons as percussive instruments to further highlight their occurrence: the Start codon (ATG) was sonified with an electric snare and Stop codons with cymbals.


Non-Coding Sequence

15S_rRNA non-coding RNA from S. cerevisiae strain S288C reference genome sequence (S288C_reference_genome_R64-1-1_20110203.tgz)


This sequence was processed in the same way as the ras sequence above and likewise has no discernible melodic patterns such as those heard in the repetitive sequences; however, the audio is still distinct. This audio is characterised by the absence of "triplet" note passages and the presence of repeated sections of silence (since piano, guitar and organ have been silenced). This is because stop codons occur repeatedly in all three reading frames. In cells this sequence is transcribed into an rRNA (which is not translated). Start and Stop codons were additionally sonified as percussive instruments and these are prominent in passages that would otherwise be silent. Start codons (ATG) are sonified with an electric snare and Stop codons with cymbals. Therefore a cymbal may be associated with the silencing of an instrument and a snare may be associated with an instrument joining back into the audio (depending on the audio status of the reading frame at the time). This is more evident than in the previous example.





Pre-recorded examples of Sonified DNA sequences

These audio files were produced from the sonification tools without modification and have been converted to mp3 files for wider browser compatibility. The following examples show how the sonification tool can be used to identify characteristics of genomic DNA sequences.

Auditory Display Sequence 1

G Sequence

An artificial test DNA sequence that consists of (GGG)n:

GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

G sequence (default settings)


This simple mononucleotide sequence is useful for understanding the sonification output and highlights the characteristic triplet note phrasing which is the basis of the subsequent auditory displays. Since the same codon motif occurs in each reading frame, the same note is played on each of three instruments, giving rise to a highly repetitive pattern. Additionally, no disruption to any instrument occurs, which highlights that the sequence contains no start or stop codons.
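
As a minimal sketch of the triplet phrasing described above, the following Python splits a sequence into its three reading frames. The function name `codons_in_frame` and the shortened 30-base test sequence are illustrative assumptions and are not taken from the sonification tool itself.

```python
def codons_in_frame(seq, frame):
    """Return the complete codons read from `seq` in reading frame 0, 1 or 2."""
    seq = seq.upper()
    # Step through the sequence three bases at a time, starting at the frame offset.
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

# Shortened stand-in for the (GGG)n test sequence.
g_sequence = "G" * 30
frames = [codons_in_frame(g_sequence, frame) for frame in range(3)]

for number, codons in enumerate(frames, start=1):
    print(f"frame {number}: {len(codons)} codons, distinct codons: {sorted(set(codons))}")
```

Every frame yields only GGG, which is why each of the three instruments would play the same note throughout.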


Auditory Display Sequence 2

Mutated G Sequence

An artificial test DNA sequence that consists of (GGG)n with a G->T point mutation:

GGGGGGGGGGGGGGGTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGG

Mutated G Sequence (default settings)


The introduction of a point mutation into the 'G Sequence' causes only a transient change of one note in each reading frame/instrument (i.e. a change of up to three notes in one triplet, allowing for degeneracy in the genetic code). This can be heard in the auditory display of the 'Mutated G Sequence' at approx. 3 seconds from the start; no further change is evident.
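
The transient effect of the point mutation can be checked with a short sketch that lists the codons differing from GGG in each reading frame. The helper name and the shortened sequence (15 G bases, the T, then 20 G bases) are assumptions made for illustration.

```python
def codons_in_frame(seq, frame):
    """Return the complete codons read from `seq` in reading frame 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

# Shortened stand-in for the 'Mutated G Sequence': a single G->T change at base 16.
mutated = "G" * 15 + "T" + "G" * 20

changed = {
    frame: [codon for codon in codons_in_frame(mutated, frame) if codon != "GGG"]
    for frame in range(3)
}
print(changed)  # {0: ['TGG'], 1: ['GGT'], 2: ['GTG']}
```

Exactly one codon changes per frame. GGT encodes glycine just as GGG does, so if notes were assigned per amino acid (an assumption about the mapping), that frame might not change audibly, which is consistent with the "up to three notes" wording above.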


Auditory Display Sequence 3

G Sequence with STOP in the 1st Reading Frame

The sequence is similar to the (GGG)n sequence above except that it contains a stop codon:

GGGGGGGGGGGGGGGTAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGG

STOP in the 1st reading frame (default settings)


This sounds the same as the 'G Sequence' above until the stop codon is parsed (at about 3 seconds from the start), after which instrument 1 (piano) becomes silent for the remainder of the sonification. Following the stop codon the characteristic triplet note phrasing is replaced by a two note phrasing for the remainder of the auditory display with a rest beat in place of the absent audio. The stop codon itself (TAA) is silent in frame one (as is the remainder of that frame) whereas in frames two and three it manifests as AAG and AGG, respectively, and gives rise to distinct notes before the reoccurrence of GGG.
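
A minimal sketch of this behaviour, assuming that every frame starts audible and that a stop codon silences its own frame from that codon onwards. The names `voice` and `STOP_CODONS` and the shortened sequence are illustrative and not taken from the tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def voice(seq, frame):
    """Codons sounded in one frame; None marks a rest once a stop codon is reached."""
    notes, playing = [], True
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            playing = False  # the stop codon itself is silent, as is everything after it
        notes.append(codon if playing else None)
    return notes

# Shortened stand-in: 15 G bases, a TAA stop in the first frame, then more G bases.
seq = "G" * 15 + "TAA" + "G" * 12
voices = [voice(seq, frame) for frame in range(3)]

for number, notes in enumerate(voices, start=1):
    print(f"frame {number}:", notes)
```

In this sketch frame one falls silent at TAA, while frames two and three read through the same bases as GGT, AAG and GTA, AGG respectively before returning to GGG.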


Auditory Display Sequence 4

G Sequence with STOP in all Reading Frames

Again similar to the (GGG)n but with stop codons in all frames:

GGGGGGGGGGGGGGGTAAGTAAGTAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGG

STOP in all reading frames (default settings)



STOP in all reading frames (Restart after 10 codons)


Same as above until the stop codons are parsed; beyond this point (approx. 4 seconds) the audio streams from all reading frames become silent for the remainder of the sonification. In the second sonification of this sequence, the 'Restart after 10 codons' option was selected, which forces the audio to restart in all frames after a short period of silence (10 codons) even in the absence of an ATG start codon.
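
The restart option can be sketched as a per-frame counter of silent codons. This is only an assumed model: the function name, the `restart_after` parameter and the choice to reset the counter whenever another stop codon is met during silence are guesses, the last one chosen because it agrees with the AT-only example described further down this page.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def voice_frame(seq, frame, restart_after=None):
    """Codons sounded in one frame; None marks a rest.

    restart_after: number of silent codons after a stop before audio resumes
    (None means the frame only resumes on an ATG start codon).
    """
    notes, playing, quiet = [], True, 0
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            playing, quiet = False, 0  # silence the frame and restart the count
            notes.append(None)
            continue
        if not playing:
            if codon == "ATG" or (restart_after is not None and quiet >= restart_after):
                playing = True
            else:
                quiet += 1
        notes.append(codon if playing else None)
    return notes

# Shortened stand-in for 'STOP in all reading frames'.
seq = "G" * 15 + "TAAG" * 3 + "G" * 60
default = voice_frame(seq, 0)
restarted = voice_frame(seq, 0, restart_after=10)
print(default.count(None), restarted.count(None))
```

With the default settings the first frame never recovers, whereas with `restart_after=10` it rests for the stop codon plus ten further codons and then resumes.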


Auditory Display Sequence 5

STOP in All, START in 1st RF

Repetitive G sequences with stop codons in all three reading frames followed by a start codon in the 1st frame.

GGGGGGGGGGGGGGGTAAGTAAGTAAGGGGGGGGGGATGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGG

STOP in all reading frames START in 1st RF (default settings)


Same as 'STOP in all reading frames' above through to the point where all reading frames become silent; however, the presence of a start codon in reading frame 1 causes instrument 1 to restart at approx. 7 seconds, whilst the other frames remain silent. Audio notes from the isolated instrument are staggered, with two rest beats between each note due to the two silent frames.
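
The staggered phrasing can be illustrated by interleaving the three frames into a single timeline. This sketch assumes that an ATG codon restarts only its own frame and is itself sounded; the names and the shortened sequence are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

def voice(seq, frame):
    """Codons sounded in one frame; None marks a rest."""
    notes, playing = [], True
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            playing = False
        elif codon == START_CODON:
            playing = True
        notes.append(codon if playing else None)
    return notes

# Shortened stand-in: stops in all three frames, then ATG in the first frame.
seq = "G" * 15 + "TAAG" * 3 + "G" * 9 + "ATG" + "G" * 12
voices = [voice(seq, frame) for frame in range(3)]

# One beat per frame in turn: frame 1, frame 2, frame 3, frame 1, ...
timeline = [note for beat in zip(*voices) for note in beat]
print(timeline[36:])
```

After the start codon the timeline reads note, rest, rest, note, rest, rest, matching the two rest beats between each note described above.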


Auditory Display Sequence 6

AT only DNA

AT rich DNA (absence of GC bases)

AAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAAT
TATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATT
AAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAAT
TATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATTAAATTATT
AAATTATTAAATTATT

AT rich DNA (default settings)



AT rich DNA (Ignore Start/Stop)


Audio plays with the characteristic triplet pattern (three instruments); however, because TAA stop codons occur frequently in every reading frame, the audio becomes silent after only 4 seconds and remains so for the remaining 39 seconds. The absence of G precludes the occurrence of an ATG start codon to restart the audio. Selecting the 'Restart after 10 codons' option causes no change compared to the default settings because TAA stop codons recur within each passage of 10 codons. In contrast, the 'Ignore Start/Stop' option results in sonification of the entire sequence. It may not be obvious, but the audio phrasing repeats every 8 triplets due to the repetitive nature of the artificial DNA sequence.
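
The spacing of the stop codons can be checked directly. The sketch below, with an assumed helper name and a shortened 96-base version of the AAATTATT repeat, lists where stop codons fall in each frame (counted in codons) and the gaps between them.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_indices(seq, frame):
    """Positions, counted in codons, of the stop codons met in one reading frame."""
    starts = range(frame, len(seq) - 2, 3)
    return [n for n, i in enumerate(starts) if seq[i:i + 3] in STOP_CODONS]

# Shortened stand-in for the AT-only sequence (12 copies of the 8-base unit).
at_only = "AAATTATT" * 12
stops = [stop_indices(at_only, frame) for frame in range(3)]
gaps = [b - a for found in stops for a, b in zip(found, found[1:])]

print(stops)               # where each frame meets TAA
print(set(gaps))           # distance between consecutive stops, in codons
print("ATG" in at_only)    # a start codon needs a G
```

In every frame a stop codon recurs every 8 codons, which is shorter than a 10-codon silence, and no ATG can occur without a G.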


Auditory Display Sequence 7

GC only DNA

GC rich DNA (absence of AT bases)

GGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGC
CGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCC
GGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGC
CGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCCGGGCCGCC
GGGCCGCCGGGCCGCC

GC rich DNA (default settings)


Audio plays with the characteristic triplet pattern (three instruments) for the duration of the sequence (approx. 43 seconds) with the default settings. There are no interruptions in the auditory display because all start and stop codons require A and T bases, which are absent. This is in stark contrast to the display of the AT rich DNA using the same default settings.
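
This claim can be confirmed by enumerating every codon that can be built from G and C alone. The variable names below are illustrative.

```python
from itertools import product

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# All 2**3 = 8 codons that can occur in a sequence containing only G and C.
gc_codons = {"".join(bases) for bases in product("GC", repeat=3)}

# Any start or stop codon that could appear in such a sequence.
signals_possible = gc_codons & (STOP_CODONS | {START_CODON})
print(sorted(gc_codons))
print(signals_possible)  # set()
```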


Auditory Display Sequence 8

Human Telomeric DNA

Human DNA sequence that consists of tandem arrays of the hexanucleotide sequence (TTAGGG)n, for example:

TTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGG
TTAGGGTTAGGGTTAGGGAGTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGG
GTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGG
GTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGG
GTGTTAGGGTTAGGGTTAGGG

Play sonified Human Telomeric DNA sequence

Human Telomeric DNA (Ignore Start Stop)


The audio from this sequence is highly repetitive and repeats approximately every 6 bases. This sequence was sonified using the "reading frame algorithm" that reads groups of three (3) bases at a time (as triplets), hence after TWO sets of triplets the notes repeat. Notice the change in the repetitive sound at approx. 13 sec, which reflects a subtle change in the DNA sequence at bp 79 (insertion of AG in place of T), and a further change at 41 sec (due to the insertion of TG). These changes are clearly apparent in the sonification but not so apparent by visual inspection of the sequence.
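
For comparison with listening, a simple computational way to locate such a change is to look for bases that differ from the base one repeat unit earlier. This is a sketch of that idea rather than anything the sonification tool does; the function name and the shortened sequence are assumptions.

```python
def period_breaks(seq, period):
    """0-based positions where a base differs from the base `period` positions earlier."""
    return [i for i in range(period, len(seq)) if seq[i] != seq[i - period]]

# Shortened stand-in: 13 perfect repeats, then AG in place of the leading T, then 3 repeats.
telomere = "TTAGGG" * 13 + "AGTAGGG" + "TTAGGG" * 3
breaks = period_breaks(telomere, 6)

print(breaks)                                  # [78, 79, 80, 81, 84, 85]
print("first change at bp", breaks[0] + 1)     # first change at bp 79
```

The breaks cluster around the altered unit and then stop, because the repeat continues with the same period but shifted by one base.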


Sequence data published by:
Moyzis, R. K., J. M. Buckingham, et al. (1988). "A highly conserved repetitive DNA sequence, (TTAGGG)n, present at the telomeres of human chromosomes." PNAS. 85, 6622-6626.

Auditory Display Sequence 9

Alphoid Repetitive DNA

Human DNA sequence that consists of tandem arrays of the pentanucleotide sequence (CCATT)n, for example:

CCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATT
CCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATTCCATT
ATAGTCCATTCCATTCCATTCCATTCCATTCAATTCCATTCCATTACAAT
TCGTTCCATTCCATTCTATTCCGTACCATTCGATTCCATTCCATACCATC
CATTCCATTCCATTCCATTCATTCCATTCCGTTCCATTCCGTTCATTCAT
TCATTCCATTCTATTCGGATTAATTCCAATCTATTCCATTCATTGCATTC
TATTCCATTCCATTGCAATCGAGTTGAATACATTGCATTCTATTCATTCA
TTCATTCCATTCCATTCCGGAAGATTA

Play sonified Alphoid sequence

Human alphoid repetitive sequence (Ignore Start Stop)



Human alphoid repetitive sequence (Restart on ATG)


The audio from this sequence is clearly repetitive, but notice that the sound is more complex than that of the previous telomeric DNA sequence. This is because the sequence repeats approximately every 5 bases whereas the "reading frame algorithm" reads groups of three bases at a time; hence the melody repeats every 15 bases, that is, FIVE sets of triplets occur before the notes repeat. The first 17 sec (100 bp) is a synthetic alphoid sequence that is purely repetitive with no sequence variation. Following this is an actual alphoid repetitive sequence that contains sequence variations that are clearly audible.
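
The arithmetic behind this is the least common multiple of the repeat length and the codon length. A minimal sketch, with an illustrative function name:

```python
from math import gcd

def codons_per_cycle(repeat_unit_length):
    """Number of codons in one frame before the codon pattern repeats.

    The pattern repeats after lcm(unit, 3) bases, which is unit // gcd(unit, 3) codons.
    """
    return repeat_unit_length // gcd(repeat_unit_length, 3)

print(codons_per_cycle(5))  # CCATT repeat    -> 5 codons (15 bases)
print(codons_per_cycle(6))  # TTAGGG repeat   -> 2 codons (6 bases)
print(codons_per_cycle(8))  # AAATTATT repeat -> 8 codons (24 bases)

# Empirical check on a purely repetitive synthetic alphoid sequence.
alphoid = "CCATT" * 12
codons = [alphoid[i:i + 3] for i in range(0, len(alphoid) - 2, 3)]
print(codons[:5], codons[5:10])
```

The same calculation gives the two-triplet cycle of the telomeric repeat and the eight-triplet cycle of the AT-only sequence described earlier on this page.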


Auditory Display Sequence 10

Ras coding sequence (cDNA)

This sequence represents the first exon of the Human Ras DNA sequence,
an important gene in cell signalling and human disease:

ATGACGGAATATAAGCTGGTGGTGGTGGGCGCCGGCGGTGTGGGCAAGAGTGCGCTGACC
ATCCAGCTGATCCAGAACCATTTTGTGGACGAATACGACCCCACTATAGAGGATTCCTAC
CGGAAGCAGGTGGTCATTGATGGGGAGACGTGCCTGTTGGACATCCTGGATACCGCCGGC
CAGGAGGAGTACAGCGCCATGCGGGACCAGTACATGCGCACCGGGGAGGGCTTCCTGTGT
GTGTTTGCCATCAACAACACCAAGTCTTTTGAGGACATCCACCAGTACAGGGAGCAGATC
AAACGGGTGAAGGACTCGGATGACGTGCCCATGGTGCTGGTGGGGAACAAGTGTGACCTG
GCTGCACGCACTGTGGAATCTCGGCAGGCTCAGGACCTCGCCCGAAGCTACGGCATCCCC
TACATCGAGACCTCGGCCAAGACCCGGCAGGGAGTGGAGGATGCCTTCTACACGTTGGTG
CGTGAGATCCGGCAGCACAAGCTGCGGAAGCTGAACCCTCCTGATGAGAGTGGCCCCGGC
TGCATGAGCTGCAAGTGTGTGCTCTCCTGA

Play sonified ras cDNA coding sequence

Human H-Ras cDNA (Silent until ATG)


Human H-Ras cDNA (Highlight STOP START)


This sequence was sonified using the "reading frame algorithm" in which a different instrument is used to sonify each reading frame; in this example a bright piano, electric bass (pick) and timpani were used to sound the three frames. In addition the "use Start/Stop codons" option was selected, so that whenever a stop codon is detected in any reading frame the instrument for that frame is silenced, as are the following 10 codons (notes). Notice how the bright piano plays throughout (i.e. the Ras open reading frame) whereas both the bass and timpani cut out repeatedly as stop codons occur in their respective reading frames. This leads to sections of audio with solo piano (e.g. at 3 sec and 1:30 min), piano and timpani (e.g. 9 to 17 sec) or piano and bass duets (predominantly from 45 sec to 1:00 min) plus the full trio ensemble (e.g. from 30 sec and 1:05 min).
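
The difference between the open reading frame and the other two frames can be seen by listing the stop codons in each frame. The sketch below uses only the first 60 bases of the sequence shown above; the helper name is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_indices(seq, frame):
    """Positions, counted in codons, of the stop codons met in one reading frame."""
    starts = range(frame, len(seq) - 2, 3)
    return [n for n, i in enumerate(starts) if seq[i:i + 3] in STOP_CODONS]

# First line (60 bases) of the Ras coding sequence listed above.
ras_start = "ATGACGGAATATAAGCTGGTGGTGGTGGGCGCCGGCGGTGTGGGCAAGAGTGCGCTGACC"

stops = {frame: stop_indices(ras_start, frame) for frame in range(3)}
open_frames = [frame for frame, found in stops.items() if not found]

print(stops)        # {0: [], 1: [0, 18], 2: [3]}
print(open_frames)  # [0]
```

Within these 60 bases only the first frame (index 0) is free of stop codons, while the second frame meets a stop at its very first codon, in keeping with the piano playing throughout while the other instruments cut out.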

Sequence data taken from:
Homo sapiens chromosome 11 genomic contig, GRCh37.p5 Primary Assembly, NCBI Reference Sequence: NT_009237.18 (beginning at position 189)

Auditory Display Sequence 11

15S rRNA sequence

Yeast mitochondrial DNA sequence that codes for the 15S ribosomal RNA

GTAAAAAATTTATAAGAATATGATGTTGGTTCAGATTAAGCGCTAAATAAGGACATGACA
CATGCGAATCATACGTTTATTATTGATAAGATAATAAATATGTGGTGTAAACGTGAGTAA
TTTTATTAGGAATTAATGAACTATAGAATAAGCTAAATACTTAATATATTATTATATAAA
AATAATTTATATAATAAAAAGGATATATATATAATATATATTTATCTATAGTCAAGCCAA
TAATGGTTTAGGTAGTAGGTTTATTAAGAGTTAAACCTAGCCAACGATCCATAATCGATA
ATGAAAGTTAGAACGATCACGTTGACTCTGAAATATAGTCAATATCTATAAGATACAGCA
GTGAGGAATATTGGACAATGATCGAAAGATTGATCCAGTTACTTATTAGGATGATATATA
AAAATATTTTATTTTATTTATAAATATTAAATATTTATAATAATAATAATAATAATATAT
ATATATAAATTGATTAAAAATAAAATCCATAAATAATTAAAATAATGATATTAATTACCA
TATATATTTTTATATGGATATATATATTAATAATAATATTAATTTTATTATTATTAATAA
TATATTTTAATAGTCCTGACTAATATTTGTGCCAGCAGTCGCGGTAACACAAAGAGGGCG
AGCGTTAATCATAATGGTTTAAAGGATCCGTAGAATGAATTATATATTATAATTTAGAGT
TAATAAAATATAATTAAAGAATTATAATAGTAAAGATGAAATAATAATAATAATTATAAG
ACTAATATATGTGAAAATATTAATTAAATATTAACTGACATTGAGGGATTAAAACTAGAG
TAGCGAAACGGATTCGATACCCGTGTAGTTCTAGTAGTAAACTATGAATACAATTATTTA
TAATATATATTATATATAAATAATAAATGAAAATGAAAGTATTCCACCTGAAGAGTACGT
TAGCAATAATGAAACTCAAAACAATAGACGGTTACAGACTTAAGCAGTGGAGCATGTTAT
TTAATTCGATAATCCACGACTAACCTTACCATATTTTGAATATTATAATAATTATTATAA
TTATTATATTACAGGCGTTACATTGTTGTCTTTAGTTCGTGCTGCAAAGTTTTAGATTAA
GTTCATAAACGAACAAAACTCCATATATATAATTTTAATTATATATAATTTTATATTATT
TATTAATATAAAGAAAGGAATTAAGACAAATCATAATGATCCTTATAATATGGGTAATAG
ACGTGCTATAATAAAATGATAATAAAATTATATAAAATATATTTAATTATATTTAATTAA
TAATATAAAACATTTTAATTTTTAATATATTTTTTTATTATATATTAATATGAATTATAA
TCTGAAATTCGATTATATGAAAAAAGAATTGCTAGTAATACGTAAATTAGTATGTTACGG
TGAATATTCTAACTGTTTCGCACTAATCACTCATCACGCGTTGAAACATATTATTATCTT
ATTATTTATATAATATTTTTTAATAAATATTAATAATTATTAATTTATATTTATTTATAT
CAGAAATAATATGAATTAATGCGAAGTTGAAATACAGTTACCGTAGGGGAACCTGCGGTG
GGCTTATAAATATCTTAAATATTCTTACA

Play sonified 15S ribosomal RNA sequence

15S_rRNA non-coding RNA (Silent until ATG).



15S_rRNA non-coding RNA (Highlight STOP START).


This sequence was processed in the same way as the ras sequence above and likewise has no discernible melodic patterns such as those used to detect tandem repeats in repetitive DNA; however, the audio is still highly recognisable. This audio is characterised by the complete absence of any "triplet" note passages and the presence of repeated sections of silence. All passages are either single notes or pairs of notes within each triplet (deriving from each of three reading frames). This is because of repeated stop codons occurring in all reading frames (including TGA). The rRNA is not translated and therefore stop codons have no effect (or would act to inhibit translation if it were to occur). Clearly the "reading frame algorithm" combined with the "Start/Stop codons" option is effective in sonifying the rRNA into a distinctive audio stream.

The second example is the same sequence with the Highlight STOP START option selected, which sonifies the occurrence of these motifs with percussion sounds even when the audio from the respective instrument/reading frame is silent.

Sequence data taken from:
15S_rRNA 15S_RRNA SGDID:S000007287, Chr Mito from 6546-8194 (downloaded from http://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/S288C_reference_genome_R64-1-1_20110203.tgz/)


Written by Mark Temple, School of Science and Health, Western Sydney University