Here’s a guide to some basic genetic engineering and synthetic biology for “dummies” / noobs / beginners: how to get started. In the future I intend to add some pictures (hand-drawn, so as not to infringe any IP). genetic-engineering-and-synbio-for-beginners v1 (click for the PDF document; it shows the sequences in colours and is easier to read)

On the DIYbio mailing list, a very frequent question is: Where can I start? What can I read?

So here are some basic guidelines.

The first question we need to ask: how does a cell store information? Let’s start with a bacterium. Our bacterium has a very large circular molecule of DNA (called the chromosome) which contains all the information it needs: how to make enzymes that degrade glucose, how to make cellulose from simple sugars, and how to make other metabolic enzymes. Bacteria can also carry plasmids, which are small molecules of DNA (like very small chromosomes) that are not necessary for living, but they can carry antibiotic resistances or other favourable genes. Think of the chromosome as the hard drive of the computer, where all the necessary information is stored. Plasmids are like USB sticks, which carry only small amounts of information and can be moved around to transfer information between different bacteria.

For our purposes, the job of DNA is to store the information for making proteins; DNA itself does no work in the cell other than saving information. Proteins can act as enzymes (to convert substance A into substance B, for example to convert sugar into biodiesel or to digest cellulose into glucose), some special proteins glow under UV light, some proteins act as construction material, etc.

Got all that information inside your mind? That’s the basic knowledge we’re gonna use now.

So we take a bacterium that is well studied and not pathogenic, like a safe lab strain of E. coli. Now we want to introduce DNA that codes for a green fluorescent protein (GFP).

DNA is made of the four different bases A,T,G,C.

The construction plan for GFP is this:

atgagtaagggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcgcgtatggtcttcaatgctttgcgagatacccagatcatatgaaacggcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacttgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataa

The DNA is first transcribed into RNA (by RNA polymerase), a working copy from which proteins are synthesized. In RNA, the base T is replaced by U. The RNA working copy of the construction plan:

AUGAGUAAGGGAGAAGAACUUUUCACUGGAGUUGUCCCAAUUCUUGUUGAAUUAGAUGGUGAUGUUAAUGGGCACAAAUUUUCUGUCAGUGGAGAGGGUGAAGGUGAUGCAACAUACGGAAAACUUACCCUUAAAUUUAUUUGCACUACUGGAAAACUACCUGUUCCAUGGCCAACACUUGUCACUACUUUCGCGUAUGGUCUUCAAUGCUUUGCGAGAUACCCAGAUCAUAUGAAACGGCAUGACUUUUUCAAGAGUGCCAUGCCCGAAGGUUAUGUACAGGAAAGAACUAUAUUUUUCAAAGAUGACGGGAACUACAAGACACGUGCUGAAGUCAAGUUUGAAGGUGAUACCCUUGUUAAUAGAAUCGAGUUAAAAGGUAUUGAUUUUAAAGAAGAUGGAAACAUUCUUGGACACAAAUUGGAAUACAACUAUAACUCACACAAUGUAUACAUCAUGGCAGACAAACAAAAGAAUGGAAUCAAAGUUAACUUCAAAAUUAGACACAACAUUGAAGAUGGAAGCGUUCAACUAGCAGACCAUUAUCAACAAAAUACUCCAAUUGGCGAUGGCCCUGUCCUUUUACCAGACAACCAUUACUUGUCCACACAAUCUGCCCUUUCGAAAGAUCCCAACGAAAAGAGAGACCACAUGGUCCUUCUUGAGUUUGUAACAGCUGCUGGGAUUACACAUGGCAUGGAUGAACUAUACAAAUAA
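If you like to see things as code, the step from the DNA plan to the RNA working copy can be sketched in a few lines of Python. This is a simplification: it just rewrites the coding strand with U instead of T, which is how the sequences in this guide are written, and it says nothing about how the polymerase physically does it. The function name `transcribe` is my own choice.

```python
def transcribe(dna: str) -> str:
    """Return the RNA working copy of a DNA coding strand (T becomes U)."""
    return dna.upper().replace("T", "U")


# First 13 codons of the GFP construction plan from above.
gfp_start = "atgagtaagggagaagaacttttcactggagttgtccca"

print(transcribe(gfp_start))
```

Feeding the whole GFP plan into `transcribe` should give the long RNA sequence shown above.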

The ribosomes then read this as

AUG-AGU-AAG-GGA-GAA-GAA-CUU-UUC-ACU-GGA-GUU-GUC-CCA-…. and so forth.

The ribosomes then produce protein (a chain of amino acids) out of it. AUG (ATG in the DNA) means “start” and codes for the amino acid methionine (m). The ribosome then slides to the next triplet (“codon”), AGU, which means serine (s). Next comes AAG, which the ribosome translates into lysine (k), etc.

[Picture of the codon wheel to be attached.]

m-s-k-g-e-e-l-f-t-g-v-v-p-… and so forth.

yielding:

mskgeelftgvvpilveldgdvnghkfsvsgegegdatygkltlkficttgklpvpwptlvttfayglqcfarypdhmkrhdffksampegyvqertiffkddgnyktraevkfegdtlvnrielkgidfkedgnilghkleynynshnvyimadkqkngikvnfkirhniedgsvqladhyqqntpigdgpvllpdnhylstqsalskdpnekrdhmvllefvtaagithgmdelyk
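Until the codon wheel picture is there, here is a small Python sketch of what the ribosome does: read three bases at a time and look each codon up in a table. The table is the standard genetic code, packed into one string in the usual T, C, A, G order; the names `CODON_TABLE` and `translate` are my own. It is a toy, not a replacement for a real bioinformatics library.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate DNA or RNA, codon by codon, until the first stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein).lower()  # lower case, as in this guide


print(translate("AUGAGUAAGGGAGAAGAACUUUUCACUGGAGUUGUCCCA"))  # mskgeelftgvvp
```

The input has to start at the ATG/AUG; the sketch does not search for the start codon itself.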

Protein synthesis almost always starts with an ATG, and it stops at the first TAG, TAA or TGA that falls in the same reading frame of triplets.
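The start and stop rule can be written down as a tiny search: find the first ATG, then walk along in steps of three until a stop codon shows up. This is only a sketch under simple assumptions (one strand, first ATG wins, no other start codons); `find_orf` is a name I made up.

```python
from typing import Optional

STOP_CODONS = {"TAG", "TAA", "TGA"}


def find_orf(dna: str) -> Optional[str]:
    """Return the stretch from the first ATG to the first in-frame stop codon."""
    seq = dna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None  # no start codon at all
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]  # include the stop codon
    return None  # ran off the end without meeting a stop


# The "taa" hidden inside "agtaag" is ignored because it is not in frame.
print(find_orf("ccatgagtaagggataacc"))  # ATGAGTAAGGGATAA
```

The example shows why the frame matters: a TAA that straddles two codons does not stop anything.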

What we then need for the ribosome to start is a ribosome binding site shortly before the ATG (but at least 4 bases/nucleotides and no more than 14 away!). The consensus sequence is AGGAGG. So we add

TCAGGAGGTAGTAatgagtaagggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcgcgtatggtcttcaatgctttgcgagatacccagatcatatgaaacggcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacttgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataa

which is then transcribed into RNA:

UCAGGAGGUAGUAAUGAGUAAGGGAGAAGAACUUUUCACUGGAGUUGUCCCAAUUCUUGUUGAAUUAGAUGGUGAUGUUAAUGGGCACAAAUUUUCUGUCAGUGGAGAGGGUGAAGGUGAUGCAACAUACGGAAAACUUACCCUUAAAUUUAUUUGCACUACUGGAAAACUACCUGUUCCAUGGCCAACACUUGUCACUACUUUCGCGUAUGGUCUUCAAUGCUUUGCGAGAUACCCAGAUCAUAUGAAACGGCAUGACUUUUUCAAGAGUGCCAUGCCCGAAGGUUAUGUACAGGAAAGAACUAUAUUUUUCAAAGAUGACGGGAACUACAAGACACGUGCUGAAGUCAAGUUUGAAGGUGAUACCCUUGUUAAUAGAAUCGAGUUAAAAGGUAUUGAUUUUAAAGAAGAUGGAAACAUUCUUGGACACAAAUUGGAAUACAACUAUAACUCACACAAUGUAUACAUCAUGGCAGACAAACAAAAGAAUGGAAUCAAAGUUAACUUCAAAAUUAGACACAACAUUGAAGAUGGAAGCGUUCAACUAGCAGACCAUUAUCAACAAAAUACUCCAAUUGGCGAUGGCCCUGUCCUUUUACCAGACAACCAUUACUUGUCCACACAAUCUGCCCUUUCGAAAGAUCCCAACGAAAAGAGAGACCACAUGGUCCUUCUUGAGUUUGUAACAGCUGCUGGGAUUACACAUGGCAUGGAUGAACUAUACAAAUAA

The exact sequence of the other bases (shown in black in the PDF) is not really important; they just should not interfere with the ribosome binding site (avoid C and T). This part doesn’t get translated, so it won’t influence the cell in any way.
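The spacing rule is easy to check by counting. The sketch below looks for AGGAGG, then for the next ATG, and counts the bases in between. I am assuming that "4 to 14 away" means the number of bases between the end of AGGAGG and the A of ATG; the function names and the default limits are just my reading of the rule above.

```python
from typing import Optional

RBS_CONSENSUS = "AGGAGG"


def rbs_spacing(dna: str, rbs: str = RBS_CONSENSUS) -> Optional[int]:
    """Count the bases between the end of the RBS and the next ATG."""
    seq = dna.upper()
    rbs_start = seq.find(rbs)
    if rbs_start == -1:
        return None  # no ribosome binding site found
    rbs_end = rbs_start + len(rbs)
    atg = seq.find("ATG", rbs_end)
    if atg == -1:
        return None  # no start codon after the RBS
    return atg - rbs_end


def spacing_ok(dna: str, min_gap: int = 4, max_gap: int = 14) -> bool:
    """True if the RBS sits within the allowed distance of the ATG."""
    gap = rbs_spacing(dna)
    return gap is not None and min_gap <= gap <= max_gap


leader = "TCAGGAGGTAGTAatgagtaag"  # start of our construct
print(rbs_spacing(leader), spacing_ok(leader))  # 5 True
```

In our construct the spacer between AGGAGG and the ATG is TAGTA, so five bases.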

Now that we have this RNA in the cell, which includes a ribosome binding site, it gets translated into a protein which glows green under UV-light.

Last bit we need: not all DNA is transcribed into RNA. What we need now is a promoter. A promoter is a sequence that gives the DNA a special structure to which an RNA polymerase can bind. The polymerase then reads the DNA and makes RNA with the corresponding sequence. The promoter also tells the cell when the gene is expressed (“on”): no promoter means never, a constitutive promoter means always, and a lactose-inducible promoter means our gene is expressed only in the presence of lactose. And, at last, we need a terminator, which is a structure at which the RNA polymerase stops making RNA.

So, finally we have our ready gene, which can be expressed in bacteria:

Promoter - ribosome binding site - ATG - code - stop codon (TAA in our GFP) - Terminator.
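The "when is the gene on" part is really just a little on/off rule, so here it is as a toy Python function. Real promoters are not clean switches (they leak, and strength varies), so take this only as a picture of the three cases named above. The string labels and the function name are made up for this sketch.

```python
from typing import Optional


def is_expressed(promoter: Optional[str], lactose_present: bool = False) -> bool:
    """Toy on/off rule for the three promoter cases in the text."""
    if promoter is None:
        return False  # no promoter: never transcribed
    if promoter == "constitutive":
        return True  # always on
    if promoter == "lactose-inducible":
        return lactose_present  # on only when lactose is around
    raise ValueError(f"unknown promoter kind: {promoter!r}")


print(is_expressed("lactose-inducible", lactose_present=True))  # True
```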

A constitutive promoter (from iGEM) ttgacagctagctcagtcctaggtattgtgctagc. So we take ggatccttgacagctagctcagtcctaggtattgtgctagctctagatttagtctTCAGGAGGTAGTAatgagtaagggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcgcgtatggtcttcaatgctttgcgagatacccagatcatatgaaacggcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacttgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataa

The first green part is the promoter (colours as in the PDF). The blue sequence is the untranslated region (RNA which is not translated); we just added some nucleotides. tctaga is a site where a restriction enzyme called XbaI can cut the DNA, so you can cut the DNA at exactly this specific site (for example, if you want to attach another promoter which has the same restriction site added).

A terminator that works in E. coli is
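Finding where a restriction enzyme could cut is a plain text search for its recognition site. Below is a minimal sketch that lists every position of a site in a sequence. It searches only the strand as written, which is enough for XbaI because tctaga reads the same on both strands; `find_sites` and the variable names are my own.

```python
from typing import List


def find_sites(dna: str, site: str) -> List[int]:
    """Return the 0-based start positions of every occurrence of `site`."""
    seq, site = dna.upper(), site.upper()
    positions = []
    pos = seq.find(site)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(site, pos + 1)  # step by one so overlaps are found too
    return positions


PROMOTER = "ttgacagctagctcagtcctaggtattgtgctagc"
# Start of our construct, up to and including the ATG.
upstream = "ggatcc" + PROMOTER + "tctaga" + "tttagtct" + "TCAGGAGGTAGTA" + "atg"

print(find_sites(upstream, "tctaga"))
```

The one hit sits directly behind the promoter, which is exactly where you would cut to swap the promoter for another one.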

AGAGAATATAAAAAGCCAGATTATTAATCCGGCTTTTTTATTATTT

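http://parts.igem.org/Part:BBa_B0011 Here you can see its structure. In the RNA, the A-rich stretch (AAAAAGCC) pairs with the matching U-rich stretch further along (GGCUUUUU, written GGCTTTTT in the DNA), so the strand folds back on itself into a hairpin, and somehow this aborts the transcription of DNA into RNA.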
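You can check the "one stretch binds to the other" idea yourself: two stretches of the same strand can pair if one is the reverse complement of the other. The sketch below searches the terminator for such a pair. The stem length of 8 and the minimum loop of 3 bases are arbitrary choices of mine for this example, not values from the iGEM page, and real structure prediction is much more involved.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the sequence that would pair with `dna`, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_stem(dna: str, stem_len: int = 8, min_loop: int = 3) -> Optional[Tuple[int, int]]:
    """Find the first pair of stretches that could form a hairpin stem.

    Returns (left_start, right_start) or None if nothing pairs.
    """
    seq = dna.upper()
    for left in range(len(seq) - 2 * stem_len - min_loop + 1):
        partner = reverse_complement(seq[left:left + stem_len])
        right = seq.find(partner, left + stem_len + min_loop)
        if right != -1:
            return left, right
    return None


TERMINATOR = "AGAGAATATAAAAAGCCAGATTATTAATCCGGCTTTTTTATTATTT"
hit = find_stem(TERMINATOR)
if hit is not None:
    left, right = hit
    print(TERMINATOR[left:left + 8], "pairs with", TERMINATOR[right:right + 8])
```

The search is done on the DNA letters for convenience; in the cell it is the RNA copy that folds.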

Finally, we have

ggatccttgacagctagctcagtcctaggtattgtgctagctctagatttagtctTCAGGAGGTAGTAatgagtaagggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcgcgtatggtcttcaatgctttgcgagatacccagatcatatgaaacggcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacttgtccacacaatctgccctttcgaaagatcccaacgaaaagagagaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaataaTATAGTCGTAGATCTAGAGAATATAAAAAGCCAGATTATTAATCCGGCTTTTTTATTATTT

This is our gene in a form the cell can read. RNA polymerase binds to the promoter and starts RNA synthesis. RNA synthesis stops at the terminator. Now we have an RNA strand floating in the cell’s interior. Sooner or later a ribosome will encounter our RNA, bind to the ribosome binding site, find the ATG and start protein synthesis. Much RNA (strong promoter) means you get much protein.

The protein is what we want, because it may glow under UV light, act as an enzyme to convert substances (like sugars, fats, …) into other substances, or act as a hormone (think of insulin).

If you want it to work in Bacillus subtilis, add a Bacillus promoter (and terminator, in case this one doesn’t work) to it. If you want it to work in plants, add a plant promoter and plant terminator. (And instead of the bacterial ribosome binding site, the sequence around the ATG should look a bit different for high expression: the Kozak sequence.)


You then have to bring this DNA into a bacterium. What we have now is a linear DNA molecule. When the bacterium divides, this linear DNA fragment would not be replicated with it, so only one cell would have it. Plus, the cell does not like linear DNA: it recognizes the free ends and would digest it quickly. Therefore you usually insert it into a plasmid (which carries a replication origin, so the DNA can be replicated in the cell). Plasmids also often carry an antibiotic resistance, so by adding the antibiotic you can kill all the cells which have not taken up the DNA.

ggatccttgacagctagctcagtcctaggtattgtgctagctctagatttagtctTCAGGAGGTAGTAatgNNNNNNNNtaaTATAGTCGTAGATCTAGAGAATATAAAAAGCCAGATTATTAATCCGGCTTTTTTATTATTT

And this might be what the antibiotic resistance gene looks like. NNNN stands for the coding sequence of the resistance protein. It also needs all the regulatory elements we just learnt.
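Since the regulatory elements stay the same and only the coding sequence changes, the whole gene can be thought of as a list of parts glued together. Here is a minimal Python sketch of that idea, using the exact pieces from this guide. The part names, the `Part` class and the helper functions are my own labels (the text never names the leading ggatcc or the spacers), so treat them as illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Part:
    name: str
    sequence: str


def gene_cassette(coding_sequence: str) -> List[Part]:
    """Wrap a coding sequence (ATG ... stop) in the parts used in this guide."""
    return [
        Part("leading bases", "ggatcc"),
        Part("promoter", "ttgacagctagctcagtcctaggtattgtgctagc"),
        Part("XbaI site", "tctaga"),
        Part("spacer", "tttagtct"),
        Part("ribosome binding site region", "TCAGGAGGTAGTA"),
        Part("coding sequence", coding_sequence),
        Part("spacer", "TATAGTCGTAGATCT"),
        Part("terminator", "AGAGAATATAAAAAGCCAGATTATTAATCCGGCTTTTTTATTATTT"),
    ]


def assemble(parts: List[Part]) -> str:
    """Glue the parts together in order, left to right."""
    return "".join(part.sequence for part in parts)


# Placeholder coding sequence, as in the template above.
placeholder = "atgNNNNNNNNtaa"
for part in gene_cassette(placeholder):
    print(f"{part.name:30s} {len(part.sequence):3d} bases")
print(assemble(gene_cassette(placeholder)))
```

Swapping `placeholder` for the GFP plan (from atg to taa) should reproduce the full GFP gene shown earlier.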
