Genetics Advanced

1. DNA Replication in Detail

The Replication Fork

DNA replication begins at specific sequences called origins of replication (ori). In prokaryotes, there is typically one origin; in eukaryotes, there are thousands, allowing the genome to be replicated in a reasonable time.

At each origin, helicase unwinds the double helix, creating a replication fork --- a Y-shaped region where the two parental strands are separated.

Key Enzymes and Proteins

| Enzyme / Protein | Function |
| --- | --- |
| Helicase | Unwinds and separates the double helix at the origin of replication. Requires ATP hydrolysis. |
| Single-strand binding proteins (SSBs) | Bind to separated strands, preventing re-annealing and protecting from nuclease degradation. |
| Topoisomerase (DNA gyrase) | Relieves torsional strain ahead of the fork by cutting one or both strands, allowing rotation, and resealing. |
| Primase | Synthesises short RNA primers (5--10 nucleotides) complementary to the template strand. RNA primers provide a free 3'-OH for DNA polymerase. |
| DNA polymerase III | (Prokaryotes) The main replicative polymerase. Adds nucleotides to the 3' end, synthesising at $\approx 1000\;\mathrm{nt/s}$. Has $3' \to 5'$ proofreading exonuclease activity. |
| DNA polymerase I | (Prokaryotes) Removes RNA primers ($5' \to 3'$ exonuclease activity) and replaces them with DNA. |
| DNA ligase | Forms phosphodiester bonds between Okazaki fragments on the lagging strand (and between primer replacements). |
| Sliding clamp (PCNA in eukaryotes, $\beta$-clamp in prokaryotes) | Ring-shaped protein that encircles DNA, tethering DNA polymerase to the template for processive synthesis. |

Leading and Lagging Strands

DNA polymerase can only synthesise in the $5' \to 3'$ direction. Since the two template strands are antiparallel, synthesis proceeds differently on each:

  • Leading strand: synthesised continuously in the $5' \to 3'$ direction, toward the replication fork. Only one RNA primer is needed.
  • Lagging strand: synthesised discontinuously, away from the replication fork, in short segments called Okazaki fragments (1000--2000 nt in prokaryotes; 100--200 nt in eukaryotes). Each fragment requires its own RNA primer.

Okazaki Fragment Processing

  1. DNA polymerase III synthesises an Okazaki fragment, stopping when it reaches the RNA primer of the previous fragment.
  2. DNA polymerase I removes the RNA primer ($5' \to 3'$ exonuclease) and replaces it with DNA.
  3. DNA ligase seals the remaining nick (the gap between the 3'-OH of the newly synthesised DNA and the 5'-phosphate of the previous fragment).

Proofreading and Fidelity

DNA polymerase III has $3' \to 5'$ exonuclease (proofreading) activity: if an incorrect nucleotide is incorporated, the polymerase detects the mismatched base pair, reverses direction, removes the incorrect nucleotide, and replaces it. This reduces the error rate from approximately $10^{-5}$ (without proofreading) to $10^{-7}$ per base pair per replication.

Mismatch repair (MMR) corrects errors that escape proofreading: after replication, the newly synthesised strand is identified (by nicks or methylation patterns), and mismatched bases are excised and resynthesised. MMR further reduces the error rate to approximately $10^{-9}$.
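
To get a feel for these rates, the sketch below converts each per-base error rate into an expected number of errors per genome copy. The genome size of $4.6 \times 10^6\;\mathrm{bp}$ (roughly that of E. coli) and the function name `expected_errors` are illustrative assumptions, not values taken from the notes above.

```python
# Expected replication errors per genome copy at each fidelity stage.
# GENOME_BP is an assumed, illustrative genome size (roughly that of E. coli).
GENOME_BP = 4.6e6

# Per-base-pair error rates quoted in the text.
ERROR_RATES = {
    "polymerase without proofreading": 1e-5,
    "with 3'->5' proofreading": 1e-7,
    "with proofreading and mismatch repair": 1e-9,
}


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of misincorporated bases in one round of replication."""
    return genome_bp * error_rate


if __name__ == "__main__":
    for stage, rate in ERROR_RATES.items():
        errors = expected_errors(GENOME_BP, rate)
        print(f"{stage}: {errors:.4g} expected errors per replication")
```

Under these assumptions, the combined mechanisms bring the expectation from tens of errors per copy down to well below one.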

Eukaryotic vs Prokaryotic Replication

| Feature | Prokaryotes | Eukaryotes |
| --- | --- | --- |
| DNA polymerases | Pol I, Pol III (main replicative) | Pol $\alpha$ (primase + short extension), Pol $\delta$ (lagging), Pol $\varepsilon$ (leading) |
| Origins of replication | One (oriC) | Many (thousands); replicons |
| Okazaki fragment length | 1000--2000 nt | 100--200 nt |
| Replication rate | $\approx 1000\;\mathrm{nt/s}$ | $\approx 50\;\mathrm{nt/s}$ (but many forks simultaneously) |
| Telomeres | Circular chromosomes (no ends) | Linear chromosomes; telomeres protect ends |
| Telomerase | Not needed | Extends 3' ends of chromosomes; reverse transcriptase + RNA template |

The End Replication Problem

In eukaryotes, the leading strand can be synthesised to the very end of the chromosome, but the lagging strand cannot: once the RNA primer of the terminal Okazaki fragment is removed, there is no upstream 3'-OH from which DNA polymerase can fill the gap. This results in progressive shortening of chromosomes with each round of replication ($\approx 50$--$200\;\mathrm{bp}$ per division in somatic cells).

Telomerase extends the 3' end by adding tandem repeats of a short sequence (TTAGGG in humans) using its built-in RNA template. This prevents the loss of coding DNA. Telomerase is active in germ cells, stem cells, and most cancer cells but is inactive in most somatic cells.


2. Transcription in Detail

Initiation in Prokaryotes

  1. RNA polymerase binds to the promoter region upstream of the gene.
  2. The $-35$ box (consensus TTGACA) and $-10$ box (consensus TATAAT, Pribnow box) are recognised by the sigma factor ($\sigma$) subunit of RNA polymerase.
  3. The sigma factor positions RNA polymerase at the correct start site (+1) and facilitates local DNA unwinding (forming an open complex).
  4. RNA polymerase begins synthesising mRNA in the $5' \to 3'$ direction, using the template (antisense) strand as a template.
  5. After synthesising approximately 10 nt, the sigma factor dissociates.

Initiation in Eukaryotes

Eukaryotic transcription is more complex, involving three RNA polymerases and multiple transcription factors:

| RNA polymerase | Product |
| --- | --- |
| Pol I | rRNA (28S, 18S, 5.8S) |
| Pol II | mRNA, snRNA, microRNA |
| Pol III | tRNA, 5S rRNA, other small RNAs |

Pol II transcription initiation:

  1. General transcription factors (TFIIA, TFIIB, TFIID, TFIIE, TFIIF, TFIIH) bind sequentially to the promoter.
  2. TFIID contains the TATA-binding protein (TBP), which binds to the TATA box (consensus TATAAA, approximately 25 bp upstream of the transcription start site).
  3. Additional transcription factors and RNA polymerase II assemble, forming the pre-initiation complex (PIC).
  4. TFIIH has helicase activity that unwinds DNA and kinase activity that phosphorylates the C-terminal domain (CTD) of RNA polymerase II, releasing it from the promoter to begin elongation.

Elongation

  • RNA polymerase unwinds DNA ahead and rewinds it behind, transcribing approximately 40--60 nt/s.
  • Nascent mRNA exits through a channel in RNA polymerase.

Termination

Prokaryotes: two main mechanisms:

  • Rho-dependent: the Rho factor (a helicase) binds to the rut site on the mRNA and chases RNA polymerase, dissociating the complex when it catches up.
  • Rho-independent: the mRNA forms a hairpin loop followed by a poly-U tract. The hairpin causes RNA polymerase to pause, and the weak poly-U--dA hybrid dissociates.

Eukaryotes: transcription continues well past the gene. The pre-mRNA is cleaved at a polyadenylation signal (AAUAAA), and the polymerase dissociates.

Post-Transcriptional Modification (Eukaryotes)

  1. 5' capping: addition of a 7-methylguanosine cap ($\mathrm{m}^7\mathrm{G}$). Functions: protects mRNA from 5' exonuclease degradation; aids ribosome binding; facilitates nuclear export.
  2. 3' polyadenylation: cleavage downstream of the AAUAAA signal and addition of 200--250 adenosine residues (poly-A tail). Functions: stabilises mRNA; aids nuclear export; enhances translation.
  3. Splicing: removal of introns by the spliceosome (a complex of snRNPs --- small nuclear ribonucleoproteins). Introns are excised and exons are ligated. Alternative splicing allows one gene to produce multiple mRNA isoforms (and therefore multiple protein variants).

3. Translation in Detail

tRNA Structure

Transfer RNA (tRNA) is a small ($\approx 75\;\mathrm{nt}$) RNA molecule that carries a specific amino acid and recognises the corresponding codon on mRNA.

Key features:

  • Acceptor stem: the 3' end has the sequence CCA, where the amino acid is attached by aminoacyl-tRNA synthetase (one enzyme per amino acid).
  • Anticodon loop: contains the anticodon (three nucleotides complementary to the mRNA codon).
  • D loop and T$\psi$C loop: structural elements involved in tRNA folding and ribosome binding.
  • Modified bases: e.g., inosine (I) in the anticodon can pair with U, C, or A, allowing wobble at the third codon position.

The Wobble Hypothesis (Crick, 1966)

The 5' base of the anticodon (which pairs with the 3' base of the codon) has relaxed base-pairing rules ("wobble"):

| Anticodon 5' base | Codon 3' bases recognised |
| --- | --- |
| C | G |
| A | U |
| U | A or G |
| G | C or U |
| I (inosine) | U, C, or A |

Wobble explains why the 61 sense codons are read by fewer than 61 tRNAs (humans have approximately 45 tRNA species).
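
The wobble rules can be turned into a small lookup, sketched below in Python. The function name `codons_read_by` is hypothetical, and the sketch assumes the anticodon is written $5' \to 3'$ and that only the pairings in the wobble table apply.

```python
# Wobble pairing rules from the table: anticodon 5' base -> codon 3' bases.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}

# Standard Watson-Crick pairing, used for the other two positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read_by(anticodon: str) -> list[str]:
    """Return the codons (written 5'->3') that an anticodon (5'->3') can decode.

    Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    anticodon base 2 with codon base 2, and the anticodon 5' base (base 1)
    wobble-pairs with the codon 3' base.
    """
    first, second, third = anticodon.upper()
    codon_base_1 = WATSON_CRICK[third]
    codon_base_2 = WATSON_CRICK[second]
    return [codon_base_1 + codon_base_2 + codon_base_3 for codon_base_3 in WOBBLE[first]]


if __name__ == "__main__":
    print(codons_read_by("GAA"))  # both phenylalanine codons
    print(codons_read_by("IGC"))  # three of the four alanine codons
```

A single anticodon with G or I at its 5' position covers two or three codons, which is why fewer than 61 tRNAs suffice.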

Ribosome Structure

Ribosomes consist of a large subunit and a small subunit, composed of rRNA and proteins.

| Component | Prokaryotes (70S) | Eukaryotes (80S) |
| --- | --- | --- |
| Small subunit | 30S (16S rRNA + proteins) | 40S (18S rRNA + proteins) |
| Large subunit | 50S (23S + 5S rRNA + proteins) | 60S (28S + 5.8S + 5S rRNA + proteins) |

Three tRNA binding sites:

  • A site (aminoacyl): holds the incoming aminoacyl-tRNA.
  • P site (peptidyl): holds the tRNA carrying the growing polypeptide chain.
  • E site (exit): holds the deacylated tRNA before it exits the ribosome.

Steps of Translation

Initiation:

  1. In eukaryotes, the small ribosomal subunit binds to the 5' cap of the mRNA and scans along the mRNA until it finds the start codon AUG in the context of the Kozak consensus sequence (GCCRCCAUGG). In prokaryotes, the small subunit instead binds at the Shine-Dalgarno sequence just upstream of the start codon.
  2. The initiator tRNA carrying methionine (Met) binds to the start codon in the P site.
  3. The large ribosomal subunit joins, forming the complete translation complex.
  4. Initiation factors (eIFs in eukaryotes, IFs in prokaryotes) are released.

Elongation:

  1. Aminoacyl-tRNA delivery: an aminoacyl-tRNA matching the next codon enters the A site, escorted by elongation factor Tu (EF-Tu) in prokaryotes (eEF1$\alpha$ in eukaryotes), which hydrolyses GTP.
  2. Peptide bond formation: peptidyl transferase (an rRNA ribozyme in the large subunit) catalyses the formation of a peptide bond between the amino acid in the P site and the amino acid in the A site.
  3. Translocation: elongation factor G (EF-G) in prokaryotes (eEF2 in eukaryotes), using GTP, moves the ribosome by one codon: the empty tRNA moves to the E site and exits; the peptidyl-tRNA moves from the A site to the P site; the next codon is positioned in the A site.

Termination:

  1. When a stop codon (UAA, UAG, UGA) enters the A site, there is no corresponding tRNA.
  2. Release factors (RF1, RF2 in prokaryotes; eRF1 in eukaryotes) bind to the stop codon.
  3. Peptidyl transferase hydrolyses the bond between the polypeptide and the tRNA, releasing the polypeptide.
  4. The ribosome dissociates into its subunits. Release factor RF3 (GTPase) facilitates this.
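
The initiation, elongation and termination steps above can be mimicked in a few lines of Python. This is a deliberately simplified sketch: it starts at the first AUG it finds (ignoring the Kozak context and all protein factors), reads one codon at a time, and stops at the first stop codon. The name `translate` and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') into one-letter amino acid codes."""
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: one codon per step
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination: a stop codon has no tRNA
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate("GGAUGUUUGGCUAAGG"))
```

For the example sequence, the reading frame is AUG-UUU-GGC-UAA, so the output is Met-Phe-Gly in one-letter code.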

Post-Translational Modifications

After translation, polypeptides may undergo:

  • Folding: assisted by chaperone proteins (e.g., Hsp70, GroEL/GroES) that prevent aggregation and promote correct folding.
  • Cleavage: removal of signal peptides, propeptides (e.g., insulin is cleaved from proinsulin).
  • Chemical modifications: phosphorylation, glycosylation, acetylation, lipidation (addition of fatty acid groups for membrane anchoring).
  • Assembly: quaternary structure assembly (e.g., haemoglobin $\alpha_2\beta_2$).

4. Gene Regulation

The Lac Operon (Extended)

The lac operon in E. coli is subject to dual regulation:

Negative control (repressor):

  • The lacI gene (constitutively expressed) produces the lac repressor protein.
  • In the absence of lactose, the repressor binds to the operator, blocking RNA polymerase.
  • When lactose is present, allolactose (an isomer of lactose) binds to the repressor, causing a conformational change that reduces its affinity for the operator. The repressor detaches, and RNA polymerase can transcribe the structural genes.

Positive control (CAP-cAMP):

  • When glucose is low, intracellular cAMP levels are high (adenylate cyclase is active).
  • cAMP binds to CAP (catabolite activator protein), and the cAMP-CAP complex binds to the CAP site upstream of the promoter, bending the DNA and facilitating RNA polymerase binding.
  • When glucose is high, cAMP is low, CAP is inactive, and the lac operon is transcribed at very low basal levels (catabolite repression).

Summary table:

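| Glucose | Lactose | cAMP | CAP | Repressor | lac operon |
| --- | --- | --- | --- | --- | --- |
| High | Absent | Low | Inactive | Bound | OFF |
| High | Present | Low | Inactive | Unbound | Very low (basal) |
| Low | Absent | High | Active | Bound | OFF |
| Low | Present | High | Active | Unbound | ON (maximal) |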
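
The summary table is a two-input truth table, which the following Python sketch encodes directly. The function name `lac_operon_state` and the boolean inputs are illustrative choices; real regulation is graded rather than strictly on/off.

```python
def lac_operon_state(glucose_high: bool, lactose_present: bool) -> dict:
    """Return the regulatory state of the lac operon for one growth condition."""
    # Positive control: low glucose -> high cAMP -> active cAMP-CAP complex.
    camp = "Low" if glucose_high else "High"
    cap = "Inactive" if glucose_high else "Active"
    # Negative control: allolactose releases the repressor from the operator.
    repressor = "Unbound" if lactose_present else "Bound"

    if not lactose_present:
        expression = "OFF"  # repressor blocks RNA polymerase
    elif glucose_high:
        expression = "Very low (basal)"  # catabolite repression
    else:
        expression = "ON (maximal)"
    return {"cAMP": camp, "CAP": cap, "repressor": repressor, "lac operon": expression}


if __name__ == "__main__":
    for glucose_high in (True, False):
        for lactose_present in (False, True):
            print(glucose_high, lactose_present, lac_operon_state(glucose_high, lactose_present))
```

Maximal expression needs both conditions at once: the repressor released and CAP active.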

The Trp Operon (Repressible Operon)

The trp operon in E. coli regulates tryptophan biosynthesis. It is normally ON (tryptophan is needed) and is turned OFF when tryptophan is abundant.

  • The repressor protein (product of trpR) is inactive alone.
  • When tryptophan (corepressor) binds to the repressor, the complex binds to the operator, blocking transcription.
  • Attenuation: in addition to the repressor, the trp operon has a leader peptide sequence with two consecutive tryptophan codons. During transcription, if tryptophan is abundant, the ribosome quickly translates the leader peptide, allowing formation of a terminator hairpin (attenuator), causing premature transcription termination. If tryptophan is scarce, the ribosome stalls at the tryptophan codons, preventing terminator formation, and the full mRNA is transcribed.

Epigenetics

Epigenetic regulation involves heritable changes in gene expression that do not alter the DNA sequence.

DNA methylation:

  • Addition of a methyl group ($\mathrm{CH}_3$) to cytosine bases at CpG dinucleotides by DNA methyltransferases.
  • Hypermethylation of promoter regions generally silences gene expression by preventing transcription factor binding or recruiting proteins that condense chromatin.
  • Hypomethylation is associated with active gene expression.
  • Genomic imprinting: certain genes are expressed in a parent-of-origin-specific manner due to differential methylation (e.g., IGF2 is expressed from the paternal allele; H19 from the maternal allele).

Histone modification:

  • Histone proteins have N-terminal "tails" that can be chemically modified:
    • Acetylation (by histone acetyltransferases, HATs): adds acetyl groups to lysine residues, neutralising the positive charge and reducing histone-DNA binding. This loosens chromatin (euchromatin) and promotes transcription.
    • Methylation (by histone methyltransferases): can activate or repress transcription depending on which residue is methylated (e.g., H3K4me3 activates; H3K9me3 and H3K27me3 repress).
    • Phosphorylation: associated with chromosome condensation during mitosis.

Epigenetic inheritance:

  • During DNA replication, maintenance methyltransferases copy the methylation pattern to the new strand, allowing epigenetic marks to be inherited through cell division.
  • Environmental factors (diet, stress, toxins) can alter epigenetic marks, with potential transgenerational effects.

5. Genetic Engineering Techniques

Restriction Enzymes (Restriction Endonucleases)

Bacterial enzymes that recognise specific palindromic DNA sequences (usually 4--8 bp) and cut both strands at specific positions within or near the recognition site.

| Enzyme | Recognition sequence (↓ = cut site) | Cut type |
| --- | --- | --- |
| EcoRI | 5'-G↓AATTC-3' | Sticky ends (5' overhang) |
| BamHI | 5'-G↓GATCC-3' | Sticky ends (5' overhang) |
| HindIII | 5'-A↓AGCTT-3' | Sticky ends (5' overhang) |
| SmaI | 5'-CCC↓GGG-3' | Blunt ends |

Sticky ends (cohesive ends): single-stranded overhangs that can base-pair with complementary sticky ends from another DNA fragment cut with the same enzyme. This facilitates the formation of recombinant DNA molecules.
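
A restriction digest can be sketched as a string search. The Python below cuts the top strand of a linear sequence at the positions shown in the table; the `digest` function, the `ENZYMES` dictionary layout and the test sequences are hypothetical, and the sketch ignores the bottom strand and circular templates.

```python
# Recognition site and cut offset (number of bases from the start of the site
# to the cut on the top strand), taken from the table above.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
    "SmaI": ("CCCGGG", 3),
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Cut a linear top-strand sequence (5'->3') at every recognition site."""
    site, offset = ENZYMES[enzyme]
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


if __name__ == "__main__":
    print(digest("AAGAATTCTT", "EcoRI"))
    print(digest("TTCCCGGGAA", "SmaI"))
```

A linear molecule with n sites yields n + 1 fragments, which is what a gel of the digest would show.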

Recombinant DNA Technology Steps

  1. Isolation: the gene of interest is cut from genomic DNA using restriction enzymes (or synthesised chemically / by PCR).
  2. Insertion into vector: the gene and a plasmid vector (cut with the same restriction enzyme) are mixed. DNA ligase forms phosphodiester bonds between the gene and the plasmid.
  3. Transformation: the recombinant plasmid is introduced into host cells (E. coli) by heat shock, electroporation, or chemical transformation.
  4. Selection: cells that have taken up the plasmid are selected using antibiotic resistance markers. Blue-white screening (using lacZ gene disruption) distinguishes recombinant from non-recombinant colonies.
  5. Expression: the host cell transcribes and translates the inserted gene, producing the desired protein.

Gel Electrophoresis (Extended)

  • Agarose gel electrophoresis: separates DNA fragments (100--25000 bp). Higher agarose concentration = smaller pore size = better resolution of small fragments.
  • Polyacrylamide gel electrophoresis (PAGE): separates smaller fragments (1--1000 bp) with higher resolution. Used for DNA sequencing and protein separation.
  • Pulsed-field gel electrophoresis (PFGE): separates very large DNA fragments ($10^4$--$10^7\;\mathrm{bp}$) by periodically changing the direction of the electric field.

PCR (Extended)

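Quantitative PCR (qPCR / real-time PCR): measures the amount of DNA produced in real time using fluorescent dyes or probes. The cycle threshold ($C_t$) is inversely related to the starting DNA quantity: it falls linearly as the logarithm of the starting quantity rises, so a lower $C_t$ means more starting template.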

Reverse transcription PCR (RT-PCR): uses reverse transcriptase to convert RNA into cDNA, which is then amplified by PCR. Used to measure gene expression (quantitative RT-PCR, qRT-PCR).
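
The relationship between $C_t$ and starting quantity can be made concrete with a short Python sketch. It assumes ideal amplification (the product doubles every cycle, i.e. an efficiency of 1.0); the function names and example $C_t$ values are hypothetical.

```python
import math


def fold_difference(ct_a: float, ct_b: float, efficiency: float = 1.0) -> float:
    """How many times more starting template sample A holds than sample B.

    Each cycle multiplies the product by (1 + efficiency), so a sample that
    crosses the threshold n cycles earlier started with (1 + efficiency)**n
    times more template.
    """
    return (1.0 + efficiency) ** (ct_b - ct_a)


def cycles_per_log10(efficiency: float = 1.0) -> float:
    """Number of cycles corresponding to a 10-fold difference in template."""
    return 1.0 / math.log10(1.0 + efficiency)


if __name__ == "__main__":
    print(fold_difference(20.0, 23.0))   # a 3-cycle gap at perfect doubling
    print(round(cycles_per_log10(), 2))  # cycles per 10-fold change
```

With perfect doubling, a 3-cycle gap is an 8-fold difference, and a 10-fold difference takes about 3.32 cycles.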


6. Genetic Crosses and Chi-Squared Tests

Dihybrid Crosses with Linkage

When two genes are on the same chromosome, they do not assort independently. The phenotypic ratio deviates from 9:3:3:1.

Recombination frequency (RF):

$$\mathrm{RF} = \frac{\text{number of recombinant offspring}}{\text{total offspring}} \times 100\%$$

  • $\mathrm{RF} < 10\%$: genes are closely linked.
  • $\mathrm{RF} \approx 50\%$: genes assort independently (on different chromosomes or very far apart).

Chi-Squared Test

Used to determine whether observed data deviate significantly from expected ratios:

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected counts in category $i$.

Steps:

  1. State the null hypothesis ($H_0$): the observed ratios match the expected ratios.
  2. Calculate expected values from the total and expected ratio.
  3. Calculate $\chi^2$.
  4. Determine degrees of freedom: $\mathrm{df} = n - 1$ (where $n$ is the number of categories).
  5. Compare $\chi^2$ to the critical value at the chosen significance level ($p = 0.05$).
  6. If $\chi^2 >$ critical value, reject $H_0$; the deviation is statistically significant.
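
The steps above can be sketched in plain Python. The observed counts are illustrative, and the names `chi_squared` and `CRITICAL_0_05` are hypothetical; the critical values are the standard table entries for $p = 0.05$.

```python
# Standard chi-squared critical values at p = 0.05 for small degrees of freedom.
CRITICAL_0_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_squared(observed: list[float], ratio: list[float]) -> tuple[float, int]:
    """Return (chi-squared statistic, degrees of freedom) for an expected ratio."""
    total = sum(observed)
    # Step 2: expected counts from the total and the expected ratio.
    expected = [total * part / sum(ratio) for part in ratio]
    # Step 3: sum of (O - E)^2 / E over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: degrees of freedom = number of categories - 1.
    return statistic, len(observed) - 1


if __name__ == "__main__":
    # Illustrative dihybrid cross counts tested against 9:3:3:1.
    statistic, df = chi_squared([315, 108, 101, 32], [9, 3, 3, 1])
    verdict = "reject H0" if statistic > CRITICAL_0_05[df] else "fail to reject H0"
    print(f"chi2 = {statistic:.3f}, df = {df}: {verdict}")
```

For these counts the statistic is about 0.47 with 3 degrees of freedom, well below 7.815, so the data are consistent with 9:3:3:1.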

Pedigree Analysis

| Feature | Autosomal dominant | Autosomal recessive | X-linked recessive |
| --- | --- | --- | --- |
| Affected individuals per generation | Usually every generation | May skip generations | Mostly males; skips generations via carrier females |
| Affected children of unaffected parents | No (at least one parent affected) | Yes (both parents are carriers) | Yes (mother is carrier, father unaffected) |
| Male-to-male transmission | Yes | Yes | No |
| Carrier females | No | Yes | Yes |
| Typical offspring ratio | Affected heterozygote $\times$ unaffected: 1:1 affected:unaffected | Carrier $\times$ carrier: 1:2:1 (affected:carrier:normal) | Carrier female $\times$ unaffected male: sons 1:1 affected:unaffected; daughters 1:1 carrier:non-carrier |

Common Pitfalls

  • Confusing the leading and lagging strands: the leading strand is synthesised continuously toward the fork; the lagging strand is synthesised discontinuously away from the fork.
  • Stating that "DNA polymerase synthesises in the $3' \to 5'$ direction": DNA polymerase always synthesises in the $5' \to 3'$ direction. The template is read $3' \to 5'$.
  • Confusing introns and exons: introns are non-coding regions removed by splicing; exons are coding regions retained in the mature mRNA.
  • Assuming all mutations are harmful: most are neutral, some are beneficial, and the beneficial ones are the raw material for natural selection.
  • Confusing replication and transcription: replication copies DNA to produce DNA; transcription copies DNA to produce RNA.
  • Misapplying the chi-squared test with expected values below 5: the chi-squared test is unreliable when any expected value is less than 5; categories should be combined if necessary.
  • Confusing epigenetic and genetic changes: epigenetic changes alter gene expression without changing the DNA sequence; genetic changes alter the DNA sequence itself.

Practice Problems

Question 1: DNA Replication -- Leading and Lagging Strand

A DNA molecule has the following sequence on one strand: 3'-TACGGAATTCGATCCGAAT-5'. (a) Write the sequence of the complementary strand. (b) Identify the direction of synthesis for each new strand. (c) If the replication fork moves from the left end to the right, which new strand is the leading strand and which is the lagging strand? (d) How many RNA primers would be required to replicate this region if Okazaki fragments are approximately 8 nucleotides long?

Answer

(a) Complementary strand: 5'-ATGCCTTAAGCTAGGCTTA-3'

(b) Both new strands are synthesised in the $5' \to 3'$ direction, and each template is read $3' \to 5'$. On the given template (3'-TACGGAATTCGATCCGAAT-5') the 3' end is on the left, so the new strand (5'-ATGCCTTAAGCTAGGCTTA-3') grows from left to right. On the complementary template (5'-ATGCCTTAAGCTAGGCTTA-3') the 3' end is on the right, so the new strand (3'-TACGGAATTCGATCCGAAT-5') grows from right to left.

(c) If the fork moves from left to right:

  • The new strand copied from the given template grows left to right, in the same direction as the fork, and is synthesised continuously $\to$ leading strand.
  • The new strand copied from the complementary template grows right to left, away from the fork, and must be made as Okazaki fragments $\to$ lagging strand.

(d) The lagging strand template is 19 nucleotides long. With Okazaki fragments of $\approx 8$ nucleotides: $\lceil 19/8 \rceil = 3$ Okazaki fragments, requiring 3 RNA primers (one per fragment). Adding the single primer on the leading strand gives 4 primers in total.
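
Parts (a) and (d) can be checked with a few lines of Python. The helper names are hypothetical; note that complementing a strand written $3' \to 5'$ base by base gives its partner written $5' \to 3'$, so no reversal is needed here.

```python
import math

# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Base-by-base complement, preserving left-to-right order."""
    return strand.translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Complement rewritten so that both strands read 5'->3' left to right."""
    return complement(strand)[::-1]


def lagging_strand_primers(template_length: int, fragment_length: int) -> int:
    """One RNA primer per Okazaki fragment, rounding up for a partial fragment."""
    return math.ceil(template_length / fragment_length)


if __name__ == "__main__":
    given = "TACGGAATTCGATCCGAAT"  # written 3'->5' in the question
    print(complement(given))       # partner strand, written 5'->3'
    print(lagging_strand_primers(len(given), 8))
```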

Question 2: Operon Regulation Prediction

An E. coli culture is growing in a medium containing lactose but no glucose. Predict whether the lac operon is ON or OFF. The culture is then supplemented with a high concentration of glucose. Predict the new state of the lac operon and explain the molecular mechanism, including the roles of allolactose, cAMP, CAP, and the lac repressor.

Answer

With lactose, no glucose: the lac operon is ON (maximal transcription).

Mechanism:

  • Lactose is present: allolactose (a lactose isomer) binds to the lac repressor, causing it to detach from the operator.
  • Glucose is absent: cAMP levels are high (adenylate cyclase is active). cAMP binds to CAP, activating it. The cAMP-CAP complex binds to the CAP site, enhancing RNA polymerase binding.
  • With no repressor blocking the operator and CAP enhancing transcription, RNA polymerase transcribes the lac operon at maximum rate.

After adding glucose: the lac operon is OFF (or very low basal expression).

Mechanism:

  • Glucose is now present: glucose uptake inhibits adenylate cyclase (via the phosphotransferase system, decreasing cAMP). With low cAMP, CAP cannot bind to the CAP site, reducing RNA polymerase affinity for the promoter.
  • Although the repressor is still detached (lactose is still present), the lack of CAP activation means RNA polymerase binds poorly and transcription is minimal (catabolite repression).
  • E. coli preferentially metabolises glucose because glucose enters glycolysis directly, whereas lactose must first be imported by permease and hydrolysed by $\beta$-galactosidase, which requires the cell to synthesise the lac enzymes.

Question 3: Chi-Squared Test with Two Genes

In Drosophila, genes for body colour ($b^+$ = grey, $b$ = black) and wing shape ($v^+$ = normal, $v$ = vestigial) are linked on chromosome 2. A test cross between a double heterozygote ($b^+ v^+ / b\; v$) and a double recessive ($b\; v / b\; v$) produces the following offspring:

| Phenotype | Number |
| --- | --- |
| Grey, normal | 430 |
| Black, vestigial | 445 |
| Grey, vestigial | 65 |
| Black, normal | 60 |
| Total | 1000 |

(a) Calculate the recombination frequency. (b) Perform a chi-squared test to determine whether the data deviate from the expected ratio for linked genes at $p = 0.05$ (critical value = 3.84 for 1 df). (c) Calculate the map distance between the two genes.

Answer

(a) Recombinant phenotypes: grey vestigial (65) and black normal (60). Total recombinants: 65 + 60 = 125.

$$\mathrm{RF} = \frac{125}{1000} \times 100\% = 12.5\%$$

(b) Expected values (assuming linked genes with 12.5% recombination):

  • Parental (grey normal + black vestigial): 87.5% of 1000 = 875 (437.5 each)
  • Recombinant (grey vestigial + black normal): 12.5% of 1000 = 125 (62.5 each)

$$\chi^2 = \frac{(430 - 437.5)^2}{437.5} + \frac{(445 - 437.5)^2}{437.5} + \frac{(65 - 62.5)^2}{62.5} + \frac{(60 - 62.5)^2}{62.5}$$

$$= \frac{56.25}{437.5} + \frac{56.25}{437.5} + \frac{6.25}{62.5} + \frac{6.25}{62.5}$$

$$\approx 0.129 + 0.129 + 0.100 + 0.100 = 0.458$$

$\chi^2 = 0.458 < 3.84$ (critical value). We fail to reject $H_0$. The data do not deviate significantly from the expected ratio for linked genes with 12.5% recombination. Strictly, four categories with one parameter (the RF) estimated from the same data give $\mathrm{df} = 4 - 1 - 1 = 2$ (critical value 5.99); the conclusion is unchanged.

(c) Map distance: 12.5 cM (the recombination frequency approximates map distance in centiMorgans for small distances).
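
The arithmetic in this answer can be reproduced with the Python sketch below. The function name `analyse_test_cross` is hypothetical. As in the answer, expected counts assume the two parental classes are equally frequent and the two recombinant classes are equally frequent.

```python
def analyse_test_cross(parental: list[int], recombinant: list[int]) -> tuple[float, float]:
    """Return (recombination frequency in %, chi-squared statistic)."""
    n_parental = sum(parental)
    n_recombinant = sum(recombinant)
    total = n_parental + n_recombinant

    # RF = recombinant offspring / total offspring x 100%.
    rf_percent = 100.0 * n_recombinant / total

    # Expected counts: each class type split evenly between its two phenotypes.
    observed = list(parental) + list(recombinant)
    expected = [n_parental / 2] * len(parental) + [n_recombinant / 2] * len(recombinant)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return rf_percent, statistic


if __name__ == "__main__":
    rf, chi2 = analyse_test_cross(parental=[430, 445], recombinant=[65, 60])
    print(f"RF = {rf}% (about {rf} cM), chi2 = {chi2:.3f}")
```

Without intermediate rounding the statistic is about 0.457, which agrees with the hand calculation to two decimal places.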

Question 4: Epigenetics and Gene Expression

A gene is silenced by DNA methylation in a patient's cells. (a) Explain the molecular mechanism by which DNA methylation silences gene expression. (b) If the patient takes a drug that inhibits DNA methyltransferases, predict the effect on the gene's expression and explain the potential consequences. (c) Explain why identical twins, who share the same DNA sequence, can have different disease susceptibilities.

Answer

(a) DNA methyltransferases add methyl groups ($\mathrm{CH}_3$) to cytosine residues at CpG dinucleotides in the promoter region of the gene. Methylated DNA recruits methyl-CpG-binding domain proteins (MBDs), which in turn recruit histone deacetylases (HDACs). HDACs remove acetyl groups from histones, increasing the positive charge on histones and strengthening histone-DNA interactions. The chromatin becomes highly condensed (heterochromatin), preventing transcription factors and RNA polymerase from accessing the promoter. The gene is transcriptionally silenced.

(b) Inhibiting DNA methyltransferases would prevent the maintenance of methylation patterns during DNA replication. Over successive cell divisions, the promoter would become progressively hypomethylated. The chromatin would relax (become euchromatin), and transcription factors would regain access to the promoter. Gene expression would increase (or be reactivated). This is the basis for epigenetic therapy in certain cancers (e.g., azacitidine for myelodysplastic syndromes).

(c) Identical twins have the same genome but accumulate different epigenetic modifications over their lifetimes due to: different environmental exposures (diet, toxins, stress, physical activity), different stochastic (random) epigenetic changes, and different in utero environments. These epigenetic differences alter gene expression patterns, contributing to discordant disease susceptibility (e.g., one twin develops autoimmune disease or cancer while the other does not).

Question 5: Genetic Engineering Design

Design a procedure to produce human growth hormone (HGH) using recombinant DNA technology in E. coli. Include the following steps: (a) obtaining the HGH gene, (b) selecting a suitable plasmid vector, (c) creating the recombinant plasmid, (d) transforming E. coli, (e) selecting transformed colonies, (f) inducing HGH expression. Explain the role of each enzyme and genetic element used.

Answer

(a) Obtaining the HGH gene: extract mRNA from human pituitary cells and use reverse transcriptase to synthesise cDNA. Amplify the HGH cDNA by PCR using primers that incorporate restriction sites (e.g., EcoRI and BamHI at the 5' and 3' ends). This produces a gene without introns (essential for expression in prokaryotes, which cannot splice eukaryotic introns).

(b) Vector selection: use a plasmid with an origin of replication (for replication in E. coli), an antibiotic resistance gene (e.g., ampicillin resistance, amp$^R$), a multiple cloning site (MCS) with EcoRI and BamHI sites, and a promoter (e.g., lac promoter for inducible expression).

(c) Creating the recombinant plasmid: digest both the HGH cDNA and the plasmid with EcoRI and BamHI. Mix the fragments with DNA ligase, which forms phosphodiester bonds between the HGH gene and the opened plasmid.

(d) Transformation: introduce the ligation mixture into competent E. coli cells by heat shock ($42^\circ\mathrm{C}$ for $90\;\mathrm{s}$) or electroporation. Allow recovery in rich medium.

(e) Selection: plate cells on agar containing ampicillin. Only cells that have taken up the plasmid (with amp$^R$) will grow. To distinguish recombinant from non-recombinant colonies, use blue-white screening: the MCS lies within the plasmid's lacZ gene, so insertion of the HGH gene disrupts lacZ. On medium containing X-gal and IPTG, colonies carrying the recombinant plasmid are white, while non-recombinant colonies are blue.

(f) Inducing expression: grow the recombinant E. coli in culture, then add IPTG (isopropyl $\beta$-D-1-thiogalactopyranoside) to induce the lac promoter. The E. coli transcribes and translates the HGH gene, producing the protein. The HGH can be purified from the culture using chromatography or antibody-based methods.


Worked Examples

Worked Example: Calculating DNA Replication Time

The human genome is $3.2 \times 10^9\;\mathrm{bp}$. Eukaryotic DNA polymerase synthesises at $\approx 50\;\mathrm{nt/s}$. The average distance between origins of replication is $\approx 150\;\mathrm{kb}$. Calculate the minimum time required to replicate the entire genome.

Solution

Number of replication origins: $\frac{3.2 \times 10^9}{150 \times 10^3} \approx 21333$ origins.

Each replication fork synthesises DNA at $50\;\mathrm{nt/s}$. Each origin produces two replication forks (bidirectional replication), so the rate per origin is $2 \times 50 = 100\;\mathrm{nt/s}$.

Length per replication unit: $150\;\mathrm{kb} = 150000\;\mathrm{bp}$.

Time per replication unit: $\frac{150000}{100} = 1500\;\mathrm{s} = 25\;\mathrm{minutes}$.

If all origins fired simultaneously at the start of S phase, the whole genome would be replicated in approximately 25 minutes; this is the minimum time. In reality origin firing is staggered, and the actual S phase in human cells lasts 6--8 hours, reflecting this staggered activation and other cellular processes.
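
The same calculation is sketched below in Python. It also shows how long replication would take from a single bidirectional origin, which is why thousands of origins are needed. The constant and function names are hypothetical; the numbers are those given in the example.

```python
GENOME_BP = 3.2e9          # human genome size from the example
FORK_RATE_NT_S = 50.0      # synthesis rate per replication fork
ORIGIN_SPACING_BP = 150e3  # average distance between origins


def number_of_origins(genome_bp: float, spacing_bp: float) -> float:
    """Approximate origin count if origins are evenly spaced."""
    return genome_bp / spacing_bp


def replication_time_s(length_bp: float, fork_rate: float, forks: int = 2) -> float:
    """Time to copy a stretch of DNA with the given number of forks."""
    return length_bp / (fork_rate * forks)


if __name__ == "__main__":
    origins = number_of_origins(GENOME_BP, ORIGIN_SPACING_BP)
    per_unit = replication_time_s(ORIGIN_SPACING_BP, FORK_RATE_NT_S)
    single_origin = replication_time_s(GENOME_BP, FORK_RATE_NT_S)
    print(f"origins: {origins:.0f}")
    print(f"time per replication unit: {per_unit / 60:.0f} min")
    print(f"time from a single origin: {single_origin / 86400:.0f} days")
```

With one origin the same genome would take roughly a year to copy at this fork speed, compared with 25 minutes per replication unit.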

Worked Example: Transcription and Translation -- Gene Length Calculation

A protein has a molecular weight of 55 kDa and an average amino acid molecular weight of 110 Da. The gene has 4 introns with an average length of 1200 bp and 5 exons. (a) Calculate the number of amino acids and the minimum coding sequence length. (b) If the mature mRNA is 1800 nt long, what proportion is coding sequence? (c) Calculate the pre-mRNA length.

Solution

(a) Number of amino acids: $\frac{55000}{110} = 500$ amino acids.

Coding sequence: $500 \times 3 = 1500\;\mathrm{bp}$ (excluding the stop codon).

(b) Proportion of mature mRNA that is coding: $\frac{1500}{1800} = 83.3\%$.

The remaining 300 nt (16.7%) are the 5' and 3' untranslated regions (UTRs).

(c) Pre-mRNA length = exons + introns = $1800 + (4 \times 1200) = 1800 + 4800 = 6600\;\mathrm{nt}$.

The introns constitute $\frac{4800}{6600} = 72.7\%$ of the pre-mRNA, consistent with the observation that most human genes consist predominantly of non-coding intronic sequence.
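
A short Python sketch of the same calculation follows. The function name `gene_summary` and its parameter names are hypothetical; like the worked solution, it excludes the stop codon from the coding length and ignores the cap and poly-A tail.

```python
def gene_summary(protein_da: float, residue_da: float, mature_mrna_nt: int,
                 n_introns: int, mean_intron_nt: int) -> dict:
    """Derive amino acid count, coding length and pre-mRNA length for a gene."""
    n_residues = round(protein_da / residue_da)
    coding_nt = 3 * n_residues                    # three nucleotides per codon
    intron_nt = n_introns * mean_intron_nt
    pre_mrna_nt = mature_mrna_nt + intron_nt      # exons plus introns
    return {
        "amino_acids": n_residues,
        "coding_nt": coding_nt,
        "coding_fraction": coding_nt / mature_mrna_nt,
        "pre_mrna_nt": pre_mrna_nt,
        "intron_fraction": intron_nt / pre_mrna_nt,
    }


if __name__ == "__main__":
    summary = gene_summary(55_000, 110, 1800, n_introns=4, mean_intron_nt=1200)
    for key, value in summary.items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```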


Common Pitfalls (Expanded)

  • Confusing leading and lagging strands: leading = continuous, toward fork; lagging = discontinuous, away from fork, Okazaki fragments.
  • DNA polymerase direction: always synthesises $5' \to 3'$; the template is read $3' \to 5'$.
  • Confusing introns and exons: introns = removed by splicing; exons = retained in mature mRNA.
  • All mutations are harmful: most are neutral; beneficial mutations drive evolution.
  • Confusing replication and transcription: replication = DNA $\to$ DNA; transcription = DNA $\to$ RNA.
  • Chi-squared test limitations: unreliable if any expected value $< 5$.
  • Epigenetic vs genetic: epigenetic = gene expression changes without DNA sequence change.

Exam-Style Problems

Problem 1: Extended Response -- DNA Replication and Mutagens

Describe the process of DNA replication, naming all enzymes involved and explaining their functions. Explain how each of the following mutagens causes mutations and the type of mutation produced: (a) UV radiation (thymine dimers), (b) nitrous acid (deamination), (c) benzopyrene (intercalation). Explain how the cell repairs UV-induced damage through nucleotide excision repair (NER).

Problem 2: Quantitative -- PCR and qPCR

A researcher uses qPCR to quantify the expression of gene $X$ in two tissue samples. For sample A, the threshold is reached at cycle 22; for sample B, at cycle 28. The standard curve (using known copy numbers) shows that a difference of 1 $\log_{10}$ unit in copy number corresponds to 3.32 cycles. (a) Calculate the fold difference in gene $X$ expression between samples A and B. (b) Explain why qPCR uses a fluorescent DNA-binding dye or probe rather than measuring the final product by gel electrophoresis. (c) What is the role of the reference (housekeeping) gene in qPCR analysis?

Problem 3: Extended Response -- Operon Regulation Comparison

Compare and contrast the lac operon and trp operon in E. coli, addressing: (a) whether each is inducible or repressible, (b) the role of the effector molecule, (c) the mechanism of regulation at the operator, and (d) any additional regulatory mechanism (catabolite repression for lac; attenuation for trp). Explain the adaptive advantage of each regulatory strategy.

Problem 4: Extended Response -- Genetic Engineering Ethics

Golden Rice is genetically engineered to produce beta-carotene (provitamin A) in the endosperm, addressing vitamin A deficiency in developing countries. (a) Describe the steps used to create Golden Rice. (b) Discuss two ecological concerns about widespread cultivation of GM crops. (c) Evaluate the argument that GM crops should be banned due to the precautionary principle, considering the potential benefits for human health.

Problem 5: Data Analysis -- Pedigree and Probability

A pedigree shows a family with an autosomal recessive condition (cystic fibrosis, CF). Individual I-1 is unaffected, I-2 is unaffected. They have two children: II-1 (unaffected daughter) and II-2 (affected son). II-1 marries an unrelated unaffected man (II-3) with no family history of CF. They have one child, III-1 (unaffected). (a) Determine the genotypes of all individuals. (b) Calculate the probability that individual II-1 is a carrier. (c) If II-1 and II-3 have another child, what is the probability it will have CF? (d) The population carrier frequency for CF is approximately $1$ in $25$. How does this affect the probability for a random mating?


If You Get These Wrong, Revise:


Additional Worked Examples

Worked Example: Nucleotide Counting and Chargaff's Rules

A double-stranded DNA molecule is $1200\;\mathrm{bp}$ long and contains $300$ adenine bases on one strand. (a) Determine the number of each nucleotide on both strands. (b) Calculate the total number of hydrogen bonds holding the two strands together. (c) If this DNA codes for a protein, what is the maximum number of amino acids in the protein?

Solution

(a) In double-stranded DNA, A pairs with T and G pairs with C, so across the whole molecule $A_{total} = T_{total}$ and $G_{total} = C_{total}$ (Chargaff's rules).

Total base pairs $= 1200$, so total nucleotides $= 2400$ and:

$A_{total} + T_{total} + G_{total} + C_{total} = 2400$

$2A_{total} + 2G_{total} = 2400$

$A_{total} + G_{total} = 1200$

Let $A_1 = 300$ be the number of A on strand 1. Each of these pairs with a T on strand 2, so $T_2 = A_1 = 300$. This fixes only one base count on each strand:

  • Strand 1 has $1200 - 300 = 900$ remaining positions, shared between $G_1$, $C_1$ and $T_1$, with $G_1 + C_1 + T_1 = 900$.
  • Strand 2 mirrors strand 1: $A_2 = T_1$, $C_2 = G_1$ and $G_2 = C_1$.

Without additional information, such as the GC content, unique values cannot be determined. As an illustration, if the GC content is $40\%$:

  • $G_{total} = C_{total} = 0.20 \times 2400 = 480$ each.
  • $A_{total} = T_{total} = 0.30 \times 2400 = 720$ each.
  • $A_2 = A_{total} - A_1 = 720 - 300 = 420$, so $T_1 = 420$ and $G_1 + C_1 = 1200 - 300 - 420 = 480$. How these $480$ positions split between G and C on strand 1 remains undetermined.

(b) A--T pairs have 2 hydrogen bonds; G--C pairs have 3 hydrogen bonds. If the GC content is $40\%$: $480$ G--C pairs $\times 3 = 1440$ H-bonds; $720$ A--T pairs $\times 2 = 1440$ H-bonds. Total $= 2880$ H-bonds.

(c) Maximum amino acids $= \frac{1200}{3} = 400$ amino acids (assuming no stop codons in the coding sequence, which is unrealistic; the actual number would be fewer due to stop codons and non-coding regions).
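As a consistency check on base totals and hydrogen bonds, the sketch below takes the duplex length and an assumed GC content (the $40\%$ figure used for illustration above is an assumption, not part of the question). The function name is hypothetical. The number of base pairs must add up to the duplex length, which is a useful test of any hand calculation.

```python
def duplex_composition(length_bp, gc_fraction):
    """Return (base totals for the whole duplex, total hydrogen bonds)."""
    gc_pairs = round(length_bp * gc_fraction)  # each G-C pair holds one G and one C
    at_pairs = length_bp - gc_pairs            # every remaining pair is A-T
    totals = {"A": at_pairs, "T": at_pairs, "G": gc_pairs, "C": gc_pairs}
    h_bonds = 2 * at_pairs + 3 * gc_pairs      # 2 per A-T pair, 3 per G-C pair
    return totals, h_bonds


totals, h_bonds = duplex_composition(1200, 0.40)
print(totals, h_bonds)
```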

Worked Example: Restriction Mapping

A circular plasmid of $5000\;\mathrm{bp}$ is digested individually and in combination with two restriction enzymes, EcoRI and BamHI. The results are:

  • EcoRI alone: $2000\;\mathrm{bp}$, $3000\;\mathrm{bp}$
  • BamHI alone: $1500\;\mathrm{bp}$, $3500\;\mathrm{bp}$
  • EcoRI + BamHI: $500\;\mathrm{bp}$, $1000\;\mathrm{bp}$, $1500\;\mathrm{bp}$, $2000\;\mathrm{bp}$

Determine the restriction map showing the positions and order of the EcoRI and BamHI sites.

Solution

From single digests:

  • EcoRI produces 2 fragments ($2000$ and $3000$), so there are 2 EcoRI sites.
  • BamHI produces 2 fragments ($1500$ and $3500$), so there are 2 BamHI sites.

Total fragments from double digest: 4, which equals $2 + 2$ (on a circular molecule the number of fragments equals the number of cut sites, provided no sites coincide).

The double digest fragments sum to: $500 + 1000 + 1500 + 2000 = 5000\;\mathrm{bp}$ (correct).

To construct the map, compare single and double digest fragments:

  • The $1500\;\mathrm{bp}$ fragment appears in both BamHI alone and the double digest, so this BamHI--BamHI fragment contains no EcoRI site.
  • The $2000\;\mathrm{bp}$ fragment appears in both EcoRI alone and the double digest, so this EcoRI--EcoRI fragment contains no BamHI site.

Working through the overlaps:

  • Both EcoRI sites must lie within the $3500\;\mathrm{bp}$ BamHI fragment, cutting it into $500 + 2000 + 1000 = 3500$ (the intact $2000\;\mathrm{bp}$ EcoRI fragment flanked by two BamHI--EcoRI pieces).
  • Equivalently, both BamHI sites lie within the $3000\;\mathrm{bp}$ EcoRI fragment, cutting it into $500 + 1500 + 1000 = 3000$.

To assign coordinates, treat the circle as a linear map from $0$ to $5000$ and place the BamHI sites at positions $0$ and $1500$ (giving fragments $1500$ and $3500$). Let the EcoRI sites be at positions $a$ and $b$ (with $a < b$), so that $b - a = 2000$ and $(5000 - b) + a = 3000$, or the reverse.

  • Trial 1: $a = 500$, $b = 2500$. Sites in order: $0$ (BamHI), $500$ (EcoRI), $1500$ (BamHI), $2500$ (EcoRI). Double digest fragments: $500, 1000, 1000, 2500$. This does not match the observed $500, 1000, 1500, 2000$, because an EcoRI site falls inside the $1500\;\mathrm{bp}$ BamHI fragment.
  • Trial 2: $a = 2000$, $b = 4000$. Sites in order: $0$ (BamHI), $1500$ (BamHI), $2000$ (EcoRI), $4000$ (EcoRI). Double digest fragments: $1500, 500, 2000, 1000$. This matches. (The alternative $a = 2500$, $b = 4500$ is the same map read in the opposite direction.)

So the restriction map (going clockwise from position 0):

  • BamHI at $0$
  • BamHI at $1500$
  • EcoRI at $2000$
  • EcoRI at $4000$

Fragments: BamHI--BamHI $= 1500$; BamHI--EcoRI $= 500$; EcoRI--EcoRI $= 2000$; EcoRI--BamHI $= 1000$.

Verification: EcoRI alone: $2000$ (between the two EcoRI sites) and $3000$ (the rest of the circle). BamHI alone: $1500$ and $3500$. Double digest: $500, 1000, 1500, 2000$. All correct.
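The verification can also be done programmatically. The sketch below is one possible way to compute digest fragments for a circular plasmid from a list of cut positions; the function name is an illustrative choice, and the site coordinates are those of the map derived above.

```python
def circular_fragments(plasmid_bp, cut_sites):
    """Sorted fragment sizes produced by cutting a circular plasmid at the given positions."""
    sites = sorted({p % plasmid_bp for p in cut_sites})
    if not sites:
        return [plasmid_bp]  # an uncut circle stays as one molecule
    # Distance from each site to the next one, wrapping around the circle.
    # A single site gives a distance of 0, which means one full-length linear fragment.
    return sorted(
        (nxt - cur) % plasmid_bp or plasmid_bp
        for cur, nxt in zip(sites, sites[1:] + sites[:1])
    )


bamhi = [0, 1500]
ecori = [2000, 4000]
print("BamHI:", circular_fragments(5000, bamhi))
print("EcoRI:", circular_fragments(5000, ecori))
print("Double:", circular_fragments(5000, bamhi + ecori))
```

Swapping in a candidate map, such as EcoRI at 500 and 2500, shows immediately whether it reproduces all three digests.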

Worked Example: Sanger Sequencing Reading

A Sanger sequencing reaction using the ddNTP chain termination method produces the following gel electrophoresis results (read from bottom to top, smallest to largest fragment):

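| Fragment size (from bottom) | Lane A | Lane T | Lane C | Lane G |
|---|---|---|---|---|
| 1 (shortest) | | | band | |
| 2 | | band | | |
| 3 | | | | band |
| 4 | band | | | |
| 5 | | | band | |
| 6 | | band | | |
| 7 | band | | | |
| 8 | | | | band |
| 9 | | | band | |
| 10 | band | | | |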

Determine the sequence of the template strand ($5' \to 3'$) and the complementary strand.

Solution

In Sanger sequencing, each lane shows fragments terminating with a specific ddNTP. The bands are read from shortest (bottom) to longest (top), giving the sequence of the newly synthesised strand in the $5' \to 3'$ direction.

Reading from position 1 to 10: C, T, G, A, C, T, A, G, C, A.

New strand ($5' \to 3'$): $5'$-CTGACTAGCA-$3'$

The new strand is synthesised complementary to the template strand. The template strand is read in the $3' \to 5'$ direction by DNA polymerase, so:

Template strand ($3' \to 5'$): $3'$-GACTGATCGT-$5'$, which written $5' \to 3'$ is: $5'$-TGCTAGTCAG-$3'$

The coding strand (same as new strand, $5' \to 3'$): $5'$-CTGACTAGCA-$3'$

Translation (assuming the reading frame starts at position 1):

  • Codons: CTG, ACT, AGC, A
  • Amino acids: Leu, Thr, Ser (the final A is an incomplete codon and is not translated)
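The strand conversions above are easy to get wrong by hand, so a small Python sketch may help. The helper names are hypothetical; the input is the new strand read from the gel.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse so the result is written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def complete_codons(seq):
    """Split a sequence into codons from position 1, dropping any incomplete final codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


new_strand = "CTGACTAGCA"  # read from the gel, 5'->3'
print("template 5'->3':", reverse_complement(new_strand))
print("codons:", complete_codons(new_strand))
```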

Worked Example: Genetic Probability with Multiple Genes

In humans, albinism is caused by an autosomal recessive allele ($a$). Cystic fibrosis is caused by an autosomal recessive allele ($c$). Both conditions are rare. A couple, both phenotypically normal, have a child with both albinism and cystic fibrosis. (a) What are the genotypes of the parents? (b) What is the probability that their next child will be phenotypically normal (neither condition)? (c) What is the probability that their next child will be a carrier of both conditions but phenotypically normal?

Solution

(a) Since the child has both conditions ($aacc$), each parent must carry at least one recessive allele for each gene. Both parents are phenotypically normal, so their genotypes are $AaCc$.

(b) For each gene independently, $Aa \times Aa \to \frac{3}{4}$ normal phenotype.

$P(\text{normal for both}) = \frac{3}{4} \times \frac{3}{4} = \frac{9}{16}$

(c) Carrier of both but normal: genotype must be $AaCc$.

$P(AaCc) = P(Aa) \times P(Cc) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$

(Note: this assumes the two genes are on different chromosomes. If linked, the calculation would differ depending on the recombination frequency.)
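The product rule used here can be confirmed by enumerating the gametes for each gene. This is a sketch that assumes independent assortment, as the note above states; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Genotype probabilities for a single gene, e.g. 'Aa' x 'Aa'."""
    dist = {}
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that 'aA' and 'Aa' count as the same genotype (uppercase sorts first).
        genotype = "".join(sorted(allele1 + allele2))
        dist[genotype] = dist.get(genotype, 0) + Fraction(1, 4)
    return dist


gene_a = offspring_distribution("Aa", "Aa")
gene_c = offspring_distribution("Cc", "Cc")

p_normal = (1 - gene_a["aa"]) * (1 - gene_c["cc"])
p_double_carrier = gene_a["Aa"] * gene_c["Cc"]
print(p_normal, p_double_carrier)
```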

Worked Example: Analysing a Pedigree for X-Linked Dominant Inheritance

A family shows the following pattern for a rare disorder:

  • An affected father (I-1) and unaffected mother (I-2) have 4 children: 2 affected daughters, 1 unaffected daughter, 1 affected son.
  • One affected daughter (II-1) marries an unaffected man (II-2) and has 2 affected daughters and 1 unaffected son.

(a) Determine the mode of inheritance. (b) State the genotypes of all individuals. (c) Calculate the probability that II-1 and II-2's next child will be affected.

Solution

(a) Key observations:

  • Both sons and daughters are affected (rules out Y-linked inheritance).
  • The affected father (I-1) has an unaffected daughter. Under X-linked dominant inheritance, every daughter of an affected father inherits his X chromosome and would be affected, so this argues against X-linked dominant.
  • The affected father has an affected son although the mother is unaffected. Fathers pass their Y chromosome, not their X, to sons, so the son cannot have inherited an X-linked allele from his father.

The trait is therefore autosomal dominant.

(b) Let $D$ = dominant (affected), $d$ = recessive (unaffected).

  • I-1: $Dd$ (affected; must carry $d$ because he has an unaffected daughter, and homozygotes are very unlikely for a rare condition)
  • I-2: $dd$ (unaffected)
  • Children of I-1 $\times$ I-2: each has $\frac{1}{2}$ probability of $Dd$ (affected) or $dd$ (unaffected).
  • II-1 (affected daughter): $Dd$
  • II-2 (unaffected husband): $dd$
  • II-1 $\times$ II-2: each child has $\frac{1}{2}$ probability of being $Dd$ (affected) or $dd$ (unaffected).

(c) Probability next child is affected: $\frac{1}{2}$.


Additional Common Pitfalls

  • Confusing PCR primers with RNA primers: PCR uses DNA primers (short, synthetic, heat-stable); in vivo replication uses RNA primers synthesised by primase.
  • Assuming sticky ends from different enzymes are compatible: only fragments cut with the same restriction enzyme (or enzymes producing identical overhangs) can be ligated directly.
  • Forgetting that qPCR measures DNA amount, not RNA: to measure gene expression (RNA), you must first perform reverse transcription (RT-qPCR).
  • Misidentifying the template strand in Sanger sequencing: the gel bands represent the newly synthesised strand, not the template.
  • Assuming autosomal dominant conditions cannot skip generations: while rare, reduced penetrance can cause an individual to carry the allele without expressing the phenotype.
  • Confusing recombination frequency with map distance for large distances: recombination frequency plateaus at $50\%$ for genes far apart or on different chromosomes, so map distances $> 50\;\mathrm{cM}$ require mapping functions (e.g., Kosambi or Haldane).

Additional Exam-Style Problems with Full Solutions

Problem 6: Extended Response -- Nucleotide Excision Repair

UV radiation causes thymine dimers (covalent bonds between adjacent thymine bases on the same DNA strand). (a) Explain why thymine dimers are problematic for DNA replication and transcription. (b) Describe the process of nucleotide excision repair (NER) in eukaryotic cells, naming all key proteins involved. (c) Explain why individuals with xeroderma pigmentosum (XP) have a greatly increased risk of skin cancer. (d) Compare NER with base excision repair (BER), explaining when each is used.

Answer 6

(a) Thymine dimers distort the DNA double helix, causing a kink. During replication, DNA polymerase cannot read past the dimer, stalling the replication fork and potentially leading to double-strand breaks or error-prone translesion synthesis. During transcription, RNA polymerase may stall or incorporate incorrect nucleotides, producing mutated mRNA and potentially dysfunctional proteins.

(b) Nucleotide excision repair in eukaryotes (global genome NER):

  1. Damage recognition: the XPC-RAD23B complex recognises the helix distortion caused by the thymine dimer.
  2. Verification: TFIIH (containing XPB and XPD helicases) verifies the damage and unwinds approximately $20$--$30\;\mathrm{bp}$ of DNA around the lesion.
  3. Incision: endonucleases XPG (cuts $3'$ to the damage) and XPF-ERCC1 (cuts $5'$ to the damage) excise an oligonucleotide of approximately $24$--$32$ nucleotides.
  4. Gap filling: DNA polymerases $\delta$ or $\varepsilon$ (with PCNA) synthesise new DNA using the undamaged strand as a template.
  5. Ligation: DNA ligase I seals the nick.

(c) Xeroderma pigmentosum is caused by mutations in NER genes (XPA through XPG). Without functional NER, thymine dimers accumulate. Each dimer is a potential mutation site; mutations in tumour suppressor genes (e.g., p53) or proto-oncogenes can lead to uncontrolled cell division and skin cancer. XP patients are extremely sensitive to UV light and develop skin cancers at a young age.

(d) NER removes bulky, helix-distorting lesions (thymine dimers, chemical adducts) by excising a segment of approximately $30$ nucleotides. BER removes small, non-helix-distorting base modifications (deaminated bases, oxidised bases like 8-oxoguanine, alkylated bases) by excising a single damaged base: a specific DNA glycosylase recognises and removes the damaged base, AP endonuclease cuts the backbone, DNA polymerase $\beta$ fills the single-nucleotide gap, and DNA ligase seals it.

Problem 7: Quantitative -- PCR Efficiency and Copy Number

A qPCR reaction starts with $10^4$ copies of a target DNA fragment. After 30 cycles, the fluorescence signal reaches the threshold. (a) Assuming $100\%$ efficiency (doubling each cycle), calculate the theoretical number of copies after 30 cycles. (b) If the actual efficiency is $92\%$ (each cycle produces $1.92\times$ copies), calculate the actual number of copies. (c) The reaction contains $100\;\mathrm{\mu L}$ total volume. Calculate the final concentration in copies/$\mathrm{\mu L}$. (d) Explain why PCR efficiency is typically less than $100\%$.

Answer 7

(a) At $100\%$ efficiency, copies double each cycle: $N = N_0 \times 2^n = 10^4 \times 2^{30}$. As a rough check, $2^{10} = 1024 \approx 10^3$, so $2^{30} = (2^{10})^3 \approx 10^9$ and $N \approx 10^{13}$ copies.

More precisely: $2^{30} = 1073741824$. $N = 10000 \times 1073741824 = 1.074 \times 10^{13}$ copies.

(b) At $92\%$ efficiency: $N = N_0 \times (1 + E)^n = 10^4 \times (1.92)^{30}$. To evaluate $(1.92)^{30}$: $\ln(1.92) = 0.6523$, so $30 \times 0.6523 = 19.57$ and $(1.92)^{30} = e^{19.57} \approx 3.16 \times 10^8$. $N = 10^4 \times 3.16 \times 10^8 = 3.16 \times 10^{12}$ copies.

(c) Concentration: $\frac{3.16 \times 10^{12}}{100} = 3.16 \times 10^{10}$ copies/$\mu\mathrm{L}$.

(d) PCR efficiency is less than $100\%$ because: primers may anneal imperfectly, DNA polymerase may dissociate from the template, secondary structures in the DNA may block polymerase progression, reagents become limiting in later cycles, and the denaturation temperature may not fully separate all DNA strands.
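The exponential growth formula $N = N_0 \times (1 + E)^n$ is simple to evaluate directly, which avoids rounding errors from working through logarithms by hand. The sketch below uses a hypothetical function name and assumes constant efficiency in every cycle, which real reactions do not maintain in late cycles.

```python
def pcr_copies(initial_copies, efficiency, cycles):
    """Copies after a number of cycles, assuming the same efficiency in every cycle."""
    return initial_copies * (1 + efficiency) ** cycles


ideal = pcr_copies(1e4, 1.00, 30)
actual = pcr_copies(1e4, 0.92, 30)
volume_ul = 100

print(f"ideal:  {ideal:.3e} copies")
print(f"actual: {actual:.3e} copies, {actual / volume_ul:.3e} copies per microlitre")
print(f"yield relative to ideal: {actual / ideal:.1%}")
```

The last line makes the point of part (d) concrete: a modest drop in per-cycle efficiency compounds into a large shortfall over 30 cycles.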

Problem 8: Extended Response -- Transcription Factors and Development

The Hox genes are a family of transcription factors that control body plan development in animals. (a) Explain what is meant by "transcription factor" and describe the general structure of a sequence-specific transcription factor. (b) Explain the concept of colinearity as it applies to Hox genes. (c) A mutation in a Hox gene causes a homeotic transformation (e.g., legs developing in place of antennae in Drosophila). Explain how a single gene mutation can cause such a dramatic phenotypic change. (d) Discuss why Hox genes are highly conserved across animal phyla.

Answer 8

(a) A transcription factor is a protein that binds to specific DNA sequences (enhancers or promoters) and regulates (activates or represses) the transcription of target genes. General structure of a sequence-specific transcription factor:

  • DNA-binding domain: recognises and binds to a specific DNA sequence (e.g., zinc finger, helix-turn-helix, leucine zipper, basic helix-loop-helix motifs).
  • Activation domain (or repression domain): interacts with other transcription factors, co-activators, or components of the basal transcription machinery (e.g., RNA polymerase II) to modulate transcription.
  • Regulatory domain: responds to signalling molecules (e.g., ligand binding, phosphorylation) that control the transcription factor's activity.

(b) Colinearity refers to the correspondence between the order of Hox genes on the chromosome and their spatial expression pattern along the anterior-posterior axis of the embryo. Hox genes at the $3'$ end of the cluster are expressed anteriorly; those at the $5'$ end are expressed posteriorly. Additionally, the temporal order of expression follows the chromosomal order: $3'$ genes are expressed earlier in development.

(c) Hox genes encode transcription factors that regulate the expression of many downstream genes involved in segment identity. A mutation in a Hox gene changes the spatial expression pattern of the transcription factor, causing the wrong set of downstream genes to be activated in a given body segment. For example, in Drosophila, the Antennapedia mutation causes the Antennapedia protein (normally expressed in thoracic segments) to be expressed in the head, activating leg-development genes instead of antenna-development genes. This illustrates the concept of a master regulatory gene: one gene controlling the expression of an entire developmental programme.

(d) Hox genes are highly conserved because they control fundamental aspects of body plan development. Mutations in Hox genes typically have severe consequences (often lethal), creating strong purifying selection against changes. The homeobox domain (the DNA-binding region) is particularly conserved because it must recognise specific DNA sequences; even small changes could alter target gene recognition. The conservation of Hox genes across bilaterians (from fruit flies to humans) reflects their essential role in a developmental toolkit that evolved in the common ancestor of all bilaterian animals.

Problem 9: Data Analysis -- Codon Usage and tRNA Abundance

The following table shows the codon usage frequency (per 1000 codons) for leucine in a bacterial genome and the corresponding tRNA gene copy number:

| Codon | Usage (per 1000) | tRNA gene copies |
|---|---|---|
| UUA | 12 | 1 |
| UUG | 35 | 3 |
| CUU | 14 | 2 |
| CUC | 10 | 1 |
| CUA | 7 | 1 |
| CUG | 52 | 5 |

(a) Explain the correlation between codon usage and tRNA gene copy number. (b) A synthetic gene for human insulin is to be expressed in this bacterium. The human insulin gene uses the following leucine codons: CUG ($40\%$), UUA ($20\%$), CUU ($15\%$), CUA ($15\%$), UUG ($10\%$). Suggest how the gene could be optimised for expression in this bacterium. (c) Explain why codon optimisation increases protein yield.

Answer 9

(a) There is a positive correlation between codon usage frequency and tRNA gene copy number. Codons used more frequently (e.g., CUG with $52$ per $1000$) have more tRNA gene copies ($5$ copies), while rare codons (e.g., CUA with $7$ per $1000$) have fewer tRNA gene copies ($1$ copy). This ensures that abundant tRNAs match the most frequently used codons, maximising translational efficiency.

(b) The human insulin gene uses some rare codons in the bacterium (UUA, CUA). To optimise expression, replace rare codons with synonymous codons that are frequently used in the bacterium:

  • Replace UUA ($12$ per $1000$) with CUG ($52$ per $1000$) or UUG ($35$ per $1000$).
  • Replace CUA ($7$ per $1000$) with CUG ($52$ per $1000$).
  • Keep CUG and UUG (already well-matched).
  • Keep CUU ($14$ per $1000$ with $2$ tRNA copies) -- acceptable.

This is codon optimisation: synthesising a gene with codons preferred by the host organism.

(c) Codon optimisation increases protein yield because:

  • When a rare codon is encountered, the corresponding tRNA is scarce. The ribosome pauses or stalls while waiting for the correct aminoacyl-tRNA.
  • Prolonged stalling can cause ribosome dissociation (drop-off), premature termination, or mRNA degradation.
  • Codon optimisation ensures abundant tRNAs are available for each codon, maintaining fast, smooth translation and maximising protein output per mRNA molecule.
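The substitution strategy from part (b) can be sketched in Python. The usage values come from the table in the question; the function name and the cut-off of 14 per 1000 (chosen so that CUU is kept, as in the answer) are assumptions made for illustration. Real optimisation tools weigh further factors such as mRNA secondary structure.

```python
# Leucine codon usage per 1000 codons in the host bacterium (from the table).
LEU_USAGE = {"UUA": 12, "UUG": 35, "CUU": 14, "CUC": 10, "CUA": 7, "CUG": 52}


def optimise_leucine(codons, usage=LEU_USAGE, min_usage=14):
    """Swap rare leucine codons for the host's most-used leucine codon."""
    preferred = max(usage, key=usage.get)
    return [
        preferred if codon in usage and usage[codon] < min_usage else codon
        for codon in codons
    ]


# Hypothetical fragment of a coding sequence, written as mRNA codons.
fragment = ["AUG", "UUA", "CUA", "CUU", "UUG", "CUG"]
print(optimise_leucine(fragment))
```

Because all six codons encode leucine, the protein sequence is unchanged; only the demand on scarce tRNAs is reduced.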
Problem 10: Extended Response -- CRISPR-Cas9 Gene Editing

CRISPR-Cas9 is a revolutionary gene editing technology. (a) Describe the natural function of CRISPR-Cas in bacteria. (b) Explain how CRISPR-Cas9 has been adapted for genome editing in eukaryotic cells, including the roles of the guide RNA and Cas9 nuclease. (c) A researcher wants to knock out a gene in human cells using CRISPR-Cas9. Describe the design of the guide RNA and explain how a frameshift mutation is introduced. (d) Discuss two off-target effects of CRISPR-Cas9 and how they might be minimised. (e) Evaluate the ethical implications of germline gene editing in humans.

Answer 10

(a) CRISPR-Cas is an adaptive immune system in bacteria and archaea. When a bacterium survives a bacteriophage infection, it incorporates fragments of the phage DNA (spacers) into the CRISPR array in its genome. Upon subsequent infection by the same phage, the CRISPR array is transcribed and processed into CRISPR RNAs (crRNAs). These crRNAs guide Cas proteins to the matching phage DNA, where Cas cleaves the DNA, destroying the phage genome and protecting the bacterium.

(b) In the genome editing adaptation:

  • A single guide RNA (sgRNA) is designed: it combines the crRNA (containing a $20$-nucleotide sequence complementary to the target DNA) and the tracrRNA (which binds Cas9) into a single molecule.
  • The Cas9 nuclease is the endonuclease that cuts DNA. It is directed to the target by the sgRNA.
  • The target site must be immediately upstream of a protospacer adjacent motif (PAM), which is $5'$-NGG-$3'$ for Streptococcus pyogenes Cas9.
  • Cas9 creates a double-strand break (DSB) $3$ bp upstream of the PAM.
  • The cell repairs the DSB by either non-homologous end joining (NHEJ), which often introduces insertions/deletions (indels) causing frameshift mutations, or homology-directed repair (HDR), which can incorporate a desired sequence if a donor template is provided.

(c) To knock out a gene:

  1. Design a $20$-nt sgRNA complementary to an early exon of the target gene, immediately upstream of a PAM ($5'$-NGG-$3'$).
  2. The sgRNA directs Cas9 to create a DSB in the exon.
  3. NHEJ repairs the break but often introduces small indels (1--10 bp).
  4. If the indel size is not a multiple of 3, it causes a frameshift, changing the reading frame downstream.
  5. The frameshift typically introduces a premature stop codon, producing a truncated, non-functional protein -- effectively knocking out the gene.
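The design steps above can be illustrated with a short Python sketch that scans one strand for candidate target sites and tests whether an indel shifts the reading frame. The function names and the example sequence are made up for illustration. The scan covers only the strand supplied (not its reverse complement) and makes no attempt to score off-target risk.

```python
import re


def find_target_sites(dna, guide_len=20):
    """Return (start, protospacer, PAM, cut_index) for every NGG PAM on the given strand.

    cut_index is the 0-based position immediately after the expected cut,
    3 bp upstream of the PAM.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len
    return [
        (m.start(), m.group(1), m.group(2), m.start() + guide_len - 3)
        for m in re.finditer(pattern, dna)
    ]


def causes_frameshift(indel_bp):
    """An insertion (+) or deletion (-) shifts the frame unless its size is a multiple of 3."""
    return indel_bp % 3 != 0


exon = "GATTACAGATTACAGATTACAGGTT"  # hypothetical exon fragment
for site in find_target_sites(exon):
    print(site)
print(causes_frameshift(1), causes_frameshift(-2), causes_frameshift(3))
```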

(d) Off-target effects:

  1. Cas9 cutting at genomic sites with similar (but not identical) sequences: the sgRNA may tolerate up to 3--5 mismatches, especially in the $5'$ end of the guide. This can cause DSBs at unintended locations, potentially disrupting other genes or regulatory elements. Minimisation: use bioinformatics to select sgRNAs with minimal similarity to other genomic sites; use Cas9 variants with higher fidelity (e.g., eSpCas9, HiFi Cas9); use paired nickases (two Cas9 nickases that create single-strand breaks on opposite strands, requiring two nearby recognition events).

  2. Large deletions or chromosomal rearrangements: simultaneous DSBs at the target site and an off-target site (or two target sites) can cause deletion of the intervening sequence, inversions, or translocations. Minimisation: use transient Cas9 expression (mRNA or ribonucleoprotein delivery rather than plasmid); validate edits by whole-genome sequencing.

(e) Ethical implications of germline editing:

  • Germline edits are heritable, affecting all future generations without their consent.
  • Unintended off-target mutations in the germline could be passed on, potentially causing disease.
  • There is a risk of creating "designer babies" -- editing for non-therapeutic traits (intelligence, appearance, athletic ability), which could exacerbate social inequality.
  • The technology could widen the gap between wealthy and poor nations if only the wealthy can access it.
  • Proponents argue germline editing could eliminate devastating genetic diseases (e.g., Huntington's disease, cystic fibrosis) from family lines.
  • The 2018 He Jiankui case (CRISPR-edited babies in China) was widely condemned for lack of ethical oversight, inadequate informed consent, and uncertain safety.
Problem 11: Quantitative -- Southern Blot and DNA Fingerprinting

A forensic DNA sample is analysed using a VNTR (variable number tandem repeat) probe. The suspect's DNA produces two bands at $3200\;\mathrm{bp}$ and $5600\;\mathrm{bp}$. A crime scene sample produces two bands at $3200\;\mathrm{bp}$ and $5600\;\mathrm{bp}$. The VNTR allele frequencies in the population are: $3200\;\mathrm{bp} = 0.15$, $5600\;\mathrm{bp} = 0.08$, and the remaining alleles collectively have a combined frequency of $0.77$. (a) Assuming Hardy-Weinberg equilibrium, calculate the probability of this specific genotype in the population. (b) If three independent VNTR loci are tested and all match between the suspect and the crime scene sample, and the match probabilities for the other two loci are $1/120$ and $1/85$, calculate the combined probability of a random match. (c) Explain why this probability does not equal the probability of the suspect's guilt.

Answer 11

(a) The suspect is heterozygous: genotype is 3200/56003200/5600. P(genotype)=2×p3200×p5600=2×0.15×0.08=0.024=1/41.7P(\text{genotype}) = 2 \times p_{3200} \times p_{5600} = 2 \times 0.15 \times 0.08 = 0.024 = 1/41.7.

This is the probability that a randomly selected individual from the population has this specific genotype at this locus.

(b) Combined probability (product rule, assuming loci are independent): $P_{combined} = \frac{1}{41.7} \times \frac{1}{120} \times \frac{1}{85} = \frac{1}{425\,340}$

Approximately $1$ in $425\,000$.
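The arithmetic in parts (a) and (b) can be checked with a short Python sketch. The function names are illustrative and do not come from any forensic software package; the calculation assumes Hardy-Weinberg equilibrium and independent loci, as stated in the problem.

```python
from math import prod

def heterozygote_frequency(p_i: float, p_j: float) -> float:
    """Hardy-Weinberg frequency of a heterozygous genotype (2 * p_i * p_j)."""
    return 2 * p_i * p_j

def combined_match_probability(locus_probs) -> float:
    """Product rule: multiply per-locus match probabilities (independent loci)."""
    return prod(locus_probs)

# Locus 1 uses the allele frequencies from the problem; loci 2 and 3 are given.
p_locus1 = heterozygote_frequency(0.15, 0.08)
p_combined = combined_match_probability([p_locus1, 1 / 120, 1 / 85])

print(f"locus 1: {p_locus1:.3f} (1 in {1 / p_locus1:.1f})")
print(f"combined: 1 in {1 / p_combined:,.0f}")
```

Working with the unrounded value $0.024$ gives $1$ in $425\,000$; the figure $425\,340$ arises from rounding the first locus to $1/41.7$ before multiplying.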

(c) The match probability is NOT the probability of guilt because:

  • It gives the probability that a randomly selected person would match the DNA profile, not the probability that the suspect is the source of the DNA.
  • The correct interpretation requires Bayesian reasoning: $P(\text{guilt} \mid \text{match})$ depends on the prior probability of guilt (based on other evidence).
  • In a city of $10$ million people, $10\,000\,000 / 425\,340 \approx 23.5$ people would be expected to match this profile by chance alone.
  • Other evidence (alibi, eyewitness testimony, motive) must be considered alongside the DNA evidence.
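
The Bayesian point can be made concrete with a minimal sketch. It assumes, purely for illustration, that before the DNA evidence every person in the city is equally likely to be the source (a uniform prior of $1/N$) and that the true source always matches; neither assumption comes from the problem, and the function name is hypothetical.

```python
def posterior_source_probability(population: int, match_prob: float) -> float:
    """P(suspect is the source | profile matches) under a uniform prior.

    Assumptions (illustrative only): prior = 1 / population, and the true
    source matches with probability 1.
    """
    prior = 1 / population
    # Bayes' theorem: P(S|M) = P(M|S)P(S) / [P(M|S)P(S) + P(M|not S)P(not S)]
    numerator = 1.0 * prior
    denominator = numerator + match_prob * (1 - prior)
    return numerator / denominator

population = 10_000_000
match_prob = 1 / 425_340

expected_chance_matches = population * match_prob
posterior = posterior_source_probability(population, match_prob)

print(f"expected chance matches: {expected_chance_matches:.1f}")
print(f"posterior with uniform prior: {posterior:.3f}")
```

Under these assumptions the posterior is only about $0.04$, even though the match probability is about $1$ in $425\,000$. Other evidence changes the prior, and with it the posterior.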
Problem 12: Extended Response -- Comparative Genomics

The human genome contains approximately $20\,000$ protein-coding genes, while E. coli has approximately $4\,300$. However, humans have only about $1.5$ times as many protein domains (functional units) as E. coli. (a) Explain how humans generate proteomic complexity far greater than the number of genes would suggest. (b) Describe the role of alternative splicing, with a specific example. (c) Explain how post-translational modifications expand proteome diversity. (d) The pufferfish (Fugu rubripes) genome is only $365\;\mathrm{Mb}$ (human: $3200\;\mathrm{Mb}$) but contains a similar number of genes. Explain this observation.

Answer 12

(a) Humans generate proteomic complexity through several mechanisms:

  • Alternative splicing: one gene can produce multiple mRNA isoforms, each encoding a different protein variant. Approximately $95\%$ of human multi-exon genes undergo alternative splicing, producing an estimated $5$--$10$ protein isoforms per gene on average.
  • Post-translational modifications (PTMs): phosphorylation, glycosylation, acetylation, ubiquitination, etc., can alter protein activity, localisation, stability, and interactions without changing the amino acid sequence.
  • Combinatorial protein interactions: proteins can form different complexes with different partners, creating functional diversity.
  • RNA editing: nucleotide changes in mRNA after transcription (e.g., A-to-I editing) can produce protein variants not encoded in the genome.
  • Proteolytic processing: proteins can be cleaved into different active fragments (e.g., proinsulin to insulin).

(b) Alternative splicing: the troponin T (TNNT) gene has $18$ exons. Through alternative splicing of $11$ of these exons, TNNT produces $64$ different mRNA isoforms in different muscle types (cardiac vs. skeletal). Each isoform confers a slightly different calcium sensitivity on the contractile apparatus, fine-tuning muscle contraction in different tissues. Another example is the DSG2 (desmoglein-2) gene, which has multiple isoforms with different adhesive properties.

(c) Post-translational modifications:

  • Phosphorylation: adding a phosphate group can activate or deactivate enzymes (e.g., glycogen phosphorylase is activated by phosphorylation). A single protein can be phosphorylated at multiple sites, creating a "phosphocode" with $2^n$ possible states ($n$ = number of phosphorylation sites).
  • Glycosylation: addition of sugar chains can affect protein folding, stability, cell-surface localisation, and cell-cell recognition. Different glycosylation patterns on the same protein create functional variants (glycoforms).
  • Ubiquitination: marking proteins for degradation by the proteasome, or altering their activity or localisation.
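
The $2^n$ count for the phosphocode can be illustrated by enumerating every on/off combination. This is a minimal sketch: the site labels are hypothetical, and it treats each site as independently phosphorylated or not, which real proteins need not satisfy.

```python
from itertools import product

def phosphorylation_states(sites):
    """Return every on/off combination for the given sites (2**n states)."""
    return [
        dict(zip(sites, combo))
        for combo in product([False, True], repeat=len(sites))
    ]

sites = ["S15", "T18", "S20"]  # hypothetical site labels
states = phosphorylation_states(sites)

print(len(states))  # 2**3 = 8
print(states[0])    # the fully unphosphorylated state
```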

(d) The pufferfish genome is compact because:

  • It has much less non-coding DNA (fewer introns, shorter introns, less intergenic sequence, fewer repetitive elements and transposons).
  • Pufferfish genes have shorter $3'$ and $5'$ UTRs.
  • The number and types of protein-coding genes are similar because all vertebrates share a common set of genes. The difference in genome size is due to "junk DNA" (non-coding, mostly repetitive sequences), not to gene number. This is known as the C-value paradox: genome size does not correlate with organismal complexity.
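
A rough calculation shows how much more densely the pufferfish genome is packed. The sketch below assumes, for illustration only, the same count of $20\,000$ genes for both species (the problem states only that the numbers are similar); the variable names are hypothetical.

```python
human_genome_mb = 3200
fugu_genome_mb = 365
assumed_gene_count = 20_000  # assumption: same count used for both species

size_ratio = human_genome_mb / fugu_genome_mb
human_kb_per_gene = human_genome_mb * 1000 / assumed_gene_count
fugu_kb_per_gene = fugu_genome_mb * 1000 / assumed_gene_count

print(f"genome size ratio: {size_ratio:.1f}x")
print(f"human: {human_kb_per_gene:.0f} kb of genome per gene")
print(f"Fugu: {fugu_kb_per_gene:.0f} kb of genome per gene")
```

Under this assumption the human genome carries about $160\;\mathrm{kb}$ of DNA per gene against about $18\;\mathrm{kb}$ in Fugu, so the difference lies in the non-coding DNA rather than in gene number.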

  • DNA structure and mutations: Review ./molecular-biology for base pairing, DNA structure, and mutation types.
  • Cell division and chromosomes: Review ./cell-biology for mitosis, meiosis overview, and chromosome structure.
  • Mendelian genetics and inheritance: Review ./genetics for monohybrid and dihybrid crosses, sex linkage.
  • Evolution and population genetics: Review ./evolution-depth for Hardy-Weinberg equilibrium, selection, and genetic drift.
  • Enzyme kinetics: Review ./metabolism-cell-biology for Michaelis-Menten kinetics applied to restriction enzymes and polymerases.
  • Immunology and antibody diversity: Review ./immunology for V(D)J recombination as a mechanism of generating diversity through DNA rearrangement.

Supplementary: Genetic Engineering Applications (HL Extension)

Genetically Modified Organisms (GMOs) -- Case Studies

Golden Rice:

  • Developed by Potrykus and Beyer (2000) to address vitamin A deficiency (VAD), which causes approximately $500\,000$ cases of childhood blindness annually.
  • Two genes from daffodil (psy, phytoene synthase, and lcy, lycopene beta-cyclase) and one from the bacterium Erwinia uredovora (crtI, phytoene desaturase) were introduced into rice, enabling the endosperm to produce beta-carotene (provitamin A).
  • Golden Rice 2 (2005) used a maize psy gene, achieving much higher beta-carotene levels (up to $37\;\mathrm{\mu g/g}$).
  • Challenges: public opposition to GMOs, regulatory hurdles, acceptance by farmers and consumers, degradation of beta-carotene during storage and cooking.

Bt Cotton:

  • Engineered to express a gene from Bacillus thuringiensis (Bt) that produces Cry protein toxins lethal to lepidopteran pests (cotton bollworm).
  • Benefits: reduced insecticide use ($>50\%$ reduction in some regions), increased yield, reduced production costs.
  • Concerns: resistance evolution in pest populations (managed by refuge planting -- non-Bt crops planted nearby to maintain susceptible pest populations), impact on non-target organisms (minimal for most non-target species but some concern for beneficial insects).

GloFish:

  • Zebrafish genetically modified with a fluorescent protein gene (originally from a jellyfish, GFP, or from coral, RFP). Originally developed for environmental monitoring (fluoresces in the presence of certain pollutants).
  • First GM animal approved for sale as a pet (2003 in the US).
  • Ethical considerations: environmental release concerns (GloFish are sterile, reducing this risk), precedent for GM pet animals.

Gene Therapy

Gene therapy aims to treat or cure genetic diseases by introducing functional copies of genes into a patient's cells.

Types:

  1. Somatic gene therapy: targets body cells (not germ cells). Changes are not heritable. Example: Luxturna (voretigene neparvovec) treats retinal dystrophy caused by RPE65 mutations by delivering a functional RPE65 gene via AAV vector.
  2. Germline gene therapy: targets germ cells or embryos. Changes are heritable. Currently banned in most countries due to ethical concerns (He Jiankui case, 2018).

Vectors:

  • Viral vectors: adenovirus (large capacity, transient expression), AAV (small capacity, long-term expression, low immunogenicity), lentivirus (integrates into genome, stable expression, risk of insertional mutagenesis).
  • Non-viral vectors: liposomes, naked DNA injection, electroporation. Lower efficiency but safer.

Challenges:

  • Immune response to the vector (especially for repeat administrations).
  • Targeting the correct tissue.
  • Achieving sufficient and sustained expression.
  • Risk of insertional mutagenesis (viral integration near oncogenes can cause cancer).
  • Ethical considerations: access, cost (current gene therapies cost \$0.5--2 million per treatment).

Pharmacogenomics

Pharmacogenomics studies how genetic variation affects drug response. Examples:

| Gene | Variant | Drug affected | Effect |
| --- | --- | --- | --- |
| CYP2C19 | Poor metaboliser (*2, *3) | Clopidogrel (antiplatelet) | Reduced activation of prodrug; higher cardiovascular events |
| CYP2D6 | Ultra-rapid metaboliser (*1xN) | Codeine | Faster conversion to morphine; risk of respiratory depression |
| VKORC1 | -1639 G>A | Warfarin (anticoagulant) | Reduced VKORC1 expression; lower warfarin dose needed |
| HLA-B*57:01 | Present | Abacavir (HIV drug) | High risk of severe hypersensitivity reaction |
| TPMT | Low activity variants | Azathioprine (immunosuppressant) | Reduced drug inactivation; higher toxicity risk |

DNA Fingerprinting in Forensics

DNA fingerprinting (DNA profiling) analyses short tandem repeats (STRs) at multiple loci:

Procedure:

  1. Extract DNA from biological evidence (blood, saliva, hair, semen).
  2. PCR amplify STR loci (typically $13$--$20$ loci in the CODIS system -- Combined DNA Index System).
  3. Capillary electrophoresis separates PCR products by size.
  4. Automated fragment analysis determines the number of repeats at each locus.
  5. The DNA profile is compared to:
    • A reference sample from the suspect.
    • The national DNA database (millions of profiles).
    • Population frequency databases (to calculate match probability).
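
Step 4 can be illustrated with a simplified sketch. It assumes an amplicon consists of a fixed-length flanking region plus a whole number of repeat units; the flank length, repeat unit size, and fragment sizes below are hypothetical. Real allele calling compares fragments against an allelic ladder and must also handle partial repeats (microvariants), which this model rejects.

```python
def repeat_count(amplicon_bp: int, flank_bp: int, unit_bp: int = 4) -> int:
    """Estimate the number of tandem repeats from a PCR fragment length.

    Simplifying assumption: amplicon = flanking sequence + n * repeat unit.
    """
    repeat_region = amplicon_bp - flank_bp
    if repeat_region < 0 or repeat_region % unit_bp != 0:
        raise ValueError("fragment does not fit the whole-repeat model")
    return repeat_region // unit_bp

# Hypothetical tetranucleotide locus with 100 bp of flanking sequence.
fragments_bp = [160, 168]
genotype = tuple(repeat_count(size, flank_bp=100) for size in fragments_bp)

print(genotype)  # (15, 17)
```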

Interpretation:

  • If the DNA profile from the evidence matches the suspect, the match probability (probability that a randomly selected person would match) is calculated. For $13$ CODIS loci, the match probability is typically $< 1$ in $10^{13}$ (far rarer than one person in the world population).
  • The random match probability or a likelihood ratio is used to present the strength of evidence in court; the combined paternity index (CPI) is the analogous statistic in paternity testing.

Limitations:

  • Partial profiles (degraded DNA, small or mixed samples) are less discriminating.
  • Contamination can produce false matches.
  • Related individuals share more DNA and have higher match probabilities.
  • Laboratory errors can occur (sample mix-ups, contamination).

Worked Example: STR Analysis and Match Probability

A forensic sample is analysed at 3 STR loci with the following results:

| Locus | Evidence genotype | Suspect genotype | Population allele frequencies |
| --- | --- | --- | --- |
| D3S1358 | 15, 17 | 15, 17 | $p_{15} = 0.25$, $p_{17} = 0.20$ |
| vWA | 14, 16 | 14, 16 | $p_{14} = 0.22$, $p_{16} = 0.15$ |
| FGA | 22, 24 | 22, 24 | $p_{22} = 0.18$, $p_{24} = 0.10$ |

(a) Calculate the match probability at each locus assuming Hardy-Weinberg equilibrium. (b) Calculate the combined match probability. (c) If the suspect's brother is also a potential source of the DNA, explain why the match probability is higher for the brother.

Solution

(a) For each locus, the probability of a random match is the Hardy-Weinberg genotype frequency: $P = p_i^2$ if homozygous, or $P = 2p_ip_j$ if heterozygous. All three genotypes here are heterozygous.

D3S1358 (15,17): $P = 2 \times 0.25 \times 0.20 = 0.100$

vWA (14,16): $P = 2 \times 0.22 \times 0.15 = 0.066$

FGA (22,24): $P = 2 \times 0.18 \times 0.10 = 0.036$

(b) Combined match probability (product rule, assuming loci are independent): $P_{combined} = 0.100 \times 0.066 \times 0.036 = 2.376 \times 10^{-4} \approx 1/4209$

Approximately $1$ in $4200$ people would be expected to match this profile. With more loci ($13$ CODIS loci), the match probability would typically be $< 1$ in $10^{13}$.
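
Parts (a) and (b) can be reproduced with a short Python sketch. The dictionary layout and function name are illustrative; the calculation assumes Hardy-Weinberg equilibrium and independent loci, as in the worked example.

```python
from math import prod

def genotype_frequency(p_i: float, p_j: float, homozygous: bool = False) -> float:
    """Hardy-Weinberg genotype frequency: p_i**2 if homozygous, else 2*p_i*p_j."""
    return p_i ** 2 if homozygous else 2 * p_i * p_j

# Allele frequencies from the table; every genotype here is heterozygous.
profile = {
    "D3S1358": (0.25, 0.20),
    "vWA": (0.22, 0.15),
    "FGA": (0.18, 0.10),
}

per_locus = {locus: genotype_frequency(*freqs) for locus, freqs in profile.items()}
combined = prod(per_locus.values())

for locus, p in per_locus.items():
    print(f"{locus}: {p:.3f}")
print(f"combined: {combined:.3e} (1 in {1 / combined:,.0f})")
```

The unrounded product is $2.376 \times 10^{-4}$, or about $1$ in $4209$; rounding the product to $0.000238$ before inverting shifts the result slightly, which is why figures near $1$ in $4200$ are quoted.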

(c) Full siblings share on average $50\%$ of their alleles by descent, so a brother is far more likely than a random person to carry the same genotype. At any locus, two full siblings inherit both alleles identically by descent (IBD) with probability $1/4$, one allele with probability $1/2$, and neither with probability $1/4$. For a heterozygous genotype $(i, j)$ the sibling match probability is therefore $P_{sib} = \frac{1}{4} + \frac{1}{2} \cdot \frac{p_i + p_j}{2} + \frac{1}{4} \cdot 2p_ip_j = \frac{1 + p_i + p_j + 2p_ip_j}{4}$, which never falls below $0.25$ however rare the alleles are. Here this gives about $0.388$ (D3S1358), $0.359$ (vWA), and $0.329$ (FGA), for a combined probability of about $0.046$ (roughly $1$ in $22$). This is far higher than the random match probability (roughly $1$ in $4200$), illustrating why the suspect's relationship to other potential sources of DNA is relevant in forensic analysis.
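
For comparison, a standard identity-by-descent (IBD) calculation for full siblings can be sketched in Python. It assumes Hardy-Weinberg equilibrium, independent loci, heterozygous genotypes, and the usual full-sibling IBD probabilities ($1/4$ for two shared alleles, $1/2$ for one, $1/4$ for none); the function name is illustrative.

```python
from math import prod

def sibling_match_heterozygote(p_i: float, p_j: float) -> float:
    """P(full sibling has genotype i/j | suspect has genotype i/j), i != j."""
    both_ibd = 0.25 * 1.0               # both alleles inherited IBD: certain match
    one_ibd = 0.5 * (p_i + p_j) / 2     # one allele IBD, the other drawn at random
    none_ibd = 0.25 * 2 * p_i * p_j     # no sharing: ordinary genotype frequency
    return both_ibd + one_ibd + none_ibd

# Allele frequencies from the worked example.
profile = {
    "D3S1358": (0.25, 0.20),
    "vWA": (0.22, 0.15),
    "FGA": (0.18, 0.10),
}

sibling_per_locus = {k: sibling_match_heterozygote(*v) for k, v in profile.items()}
sibling_combined = prod(sibling_per_locus.values())

for locus, p in sibling_per_locus.items():
    print(f"{locus}: {p:.4f}")
print(f"combined sibling match probability: {sibling_combined:.4f}")
```

Under these assumptions the combined sibling match probability is about $0.046$ (roughly $1$ in $22$), orders of magnitude higher than the random match probability for an unrelated person.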