
Computer Organization

Computer Architecture

CPU architecture

Control Unit (CU)

Program Counter (PC)

Stores the address of the next instruction

Memory Address Register (MAR)

Stores the address of the memory location currently being read from or written to

Memory Data Register (MDR)

Two-way register that holds the data or instruction being transferred to or from the memory location addressed by the MAR.

Current Instruction Register (CIR)

Stores the instruction currently being decoded and executed

Arithmetic Logic Unit (ALU)

Cache

Primary Memory

Static RAM (SRAM)

SRAM is a volatile semiconductor memory that uses flip-flops to store each bit of data. It has fast access times and is used for cache memory.

Dynamic RAM (DRAM)

DRAM is a volatile memory that stores each bit as an electric charge in a capacitor within each memory cell. It is denser and cheaper than SRAM but must be refreshed periodically, and it is used for main system memory (RAM modules).

Machine Instruction Cycle

  • Fetch
    • The address in the PC is copied to the MAR
    • The PC is incremented to point to the next instruction
    • The instruction at the memory location held in the MAR is copied to the MDR
    • The instruction in the MDR is copied to the CIR
  • Decode
    • The CU decodes the instruction in the CIR
  • Execute
    • The CU sends signals to the relevant components of the CPU
      • Arithmetic or logical operations: ALU
      • Jump instructions: the jump address is written to the PC
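
To make these steps concrete, here is a minimal sketch in Python of the fetch-decode-execute loop for a hypothetical accumulator machine. The instruction names (LOAD, ADD, JUMP, HALT) and the memory layout are illustrative, not a real instruction set.

# Minimal fetch-decode-execute sketch for a hypothetical accumulator CPU.
# Memory holds (opcode, operand) tuples for instructions and plain integers for data.
memory = {
    0: ("LOAD", 10),   # ACC = memory[10]
    1: ("ADD", 11),    # ACC = ACC + memory[11]
    2: ("JUMP", 4),    # PC = 4
    3: ("ADD", 11),    # skipped because of the jump
    4: ("HALT", None),
    10: 15,
    11: 27,
}

pc, acc = 0, 0
while True:
    mar = pc                  # Fetch: the address in the PC is copied to the MAR
    mdr = memory[mar]         # Fetch: the instruction at that address is copied to the MDR
    cir = mdr                 # Fetch: the instruction in the MDR is copied to the CIR
    pc += 1                   # Fetch: the PC is incremented
    opcode, operand = cir     # Decode: the CU decodes the instruction in the CIR
    if opcode == "LOAD":      # Execute: the CU signals the relevant component
        acc = memory[operand]
    elif opcode == "ADD":
        acc += memory[operand]   # arithmetic is carried out by the ALU
    elif opcode == "JUMP":
        pc = operand             # the jump address overwrites the PC
    elif opcode == "HALT":
        break

print(acc)   # 15 + 27 = 42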

Von Neumann Architecture

The Von Neumann architecture is the dominant design for most general-purpose computers. It is characterized by:

  1. Single memory space: Both data and instructions are stored in the same memory.
  2. Single bus: Data and instructions share the same pathway between memory and the CPU.
  3. Sequential execution: Instructions are processed one at a time in sequence.

Von Neumann vs Harvard Architecture

Feature | Von Neumann | Harvard
Memory | Single unified memory for data and instructions | Separate memory spaces for data and instructions
Bus | Single bus (bottleneck for simultaneous access) | Separate buses for data and instructions
Speed | Slower due to bus contention | Faster; can fetch instruction and data simultaneously
Complexity | Simpler design, cheaper to manufacture | More complex, more expensive
Usage | Most general-purpose computers (PCs, laptops) | Digital signal processors (DSPs), microcontrollers, embedded systems

Exam tip: The IB may ask you to explain why the Von Neumann bottleneck exists. Answer: because a single bus means the CPU cannot read an instruction and read/write data at the same time, creating a performance limitation.

Binary and Number Systems

Representing Data in Binary

All data in a computer is represented using binary digits (bits) — 0s and 1s.

Units of measurement:

Unit | Size
Bit | Single binary digit (0 or 1)
Nibble | 4 bits
Byte | 8 bits
Kilobyte (KB) | 1024 bytes
Megabyte (MB) | 1024 KB
Gigabyte (GB) | 1024 MB
Terabyte (TB) | 1024 GB

Integer Representation

Unsigned integers represent non-negative whole numbers:

Bits | Range
8 | 0 to 255
16 | 0 to 65,535
32 | 0 to 4,294,967,295

Two's complement is used to represent both positive and negative integers:

Bits | Range
8 | -128 to 127
16 | -32,768 to 32,767
32 | -2,147,483,648 to 2,147,483,647
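
A quick way to check these ranges is to compute them from the number of bits; the short Python sketch below (helper names are illustrative) prints the unsigned and two's complement ranges for 8, 16, and 32 bits.

# Ranges representable with a given number of bits.
def unsigned_range(bits):
    return 0, 2**bits - 1

def twos_complement_range(bits):
    return -(2**(bits - 1)), 2**(bits - 1) - 1

for bits in (8, 16, 32):
    print(bits, unsigned_range(bits), twos_complement_range(bits))
# 8  (0, 255)        (-128, 127)
# 16 (0, 65535)      (-32768, 32767)
# 32 (0, 4294967295) (-2147483648, 2147483647)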

Converting to two's complement (e.g., represent -42 in 8-bit):

  1. Write the positive number in binary: 42 = 00101010
  2. Invert all bits (one's complement): 11010101
  3. Add 1: 11010101 + 1 = 11010110
  4. Result: -42 in 8-bit two's complement = 11010110

Verification: 11010110 in two's complement = -(invert + 1) = -(00101001 + 1) = -(00101010) = -42
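
The same three steps can be written as a short Python sketch (the function name is illustrative):

# Represent a signed integer in n-bit two's complement, following the steps above.
def to_twos_complement(value, bits=8):
    if value >= 0:
        return format(value, f"0{bits}b")
    positive = format(-value, f"0{bits}b")                           # step 1: positive value in binary
    inverted = "".join("1" if b == "0" else "0" for b in positive)   # step 2: invert all bits
    return format(int(inverted, 2) + 1, f"0{bits}b")                 # step 3: add 1

print(to_twos_complement(-42))   # 11010110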

Binary Arithmetic

Addition:

  00101011   (43)
+ 00011101   (29)
----------
  01001000   (72)

Addition with a carry out of the most significant bit (the carry beyond the available 8 bits is discarded):

  11010110   (-42 in two's complement)
+ 11100111   (-25 in two's complement)
----------
 110111101   (the ninth carry bit is lost; result = 10111101 = -67)

Worked example: -42 + (-25) = -67. In 8-bit two's complement, 10111101 = -(01000010 + 1) = -(01000011) = -67. Correct. Note that a carry out occurs here but the result is still representable, so this is a carry, not an overflow (see Common Pitfalls, point 5).
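
The worked example can be checked with a small Python sketch that adds two 8-bit patterns, discards the carry out, and applies the overflow test described later under Common Pitfalls (the function name is illustrative):

# Add two 8-bit two's complement patterns, discarding any carry out of bit 7.
def add_8bit(a, b):
    raw = a + b                    # a and b are 8-bit patterns given as integers 0-255
    carry_out = raw > 0xFF         # was a ninth bit generated?
    result = raw & 0xFF            # keep only the lowest 8 bits
    sign = lambda x: (x >> 7) & 1  # most significant bit = sign bit
    # Overflow: operands have the same sign but the result's sign differs
    overflow = sign(a) == sign(b) and sign(result) != sign(a)
    return result, carry_out, overflow

result, carry, overflow = add_8bit(0b11010110, 0b11100111)   # -42 + (-25)
print(format(result, "08b"), carry, overflow)                # 10111101 True False  (-67, carry but no overflow)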

Exam tip: Practice binary addition and two's complement conversion. The IB often asks you to perform these operations and verify the results.

Hexadecimal

Hexadecimal (base 16) is used as a shorthand for binary. Each hex digit represents 4 bits.

Hex | Binary | Hex | Binary
0 | 0000 | 8 | 1000
1 | 0001 | 9 | 1001
2 | 0010 | A | 1010
3 | 0011 | B | 1011
4 | 0100 | C | 1100
5 | 0101 | D | 1101
6 | 0110 | E | 1110
7 | 0111 | F | 1111

Converting binary to hex: Group bits into sets of 4 from right to left.

11010110 → 1101 0110 → D6
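
The same grouping idea as a short Python sketch (the helper name is illustrative):

# Convert a binary string to hexadecimal by grouping bits into nibbles.
def binary_to_hex(bits):
    bits = bits.zfill((len(bits) + 3) // 4 * 4)               # pad on the left to a multiple of 4 bits
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]  # one group per hex digit
    return "".join(format(int(group, 2), "X") for group in groups)

print(binary_to_hex("11010110"))   # D6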

Uses of hexadecimal:

  • Color codes in HTML/CSS (e.g., #FF5733)
  • Memory addresses
  • MAC addresses
  • Error codes
  • Assembly language programming

Logic Gates and Boolean Algebra

Basic Logic Gates

Gate | Symbol (text) | Output behaviour | Expression
AND | A AND B | Output 1 only when both inputs are 1 | A · B
OR | A OR B | Output 1 when at least one input is 1 | A + B
NOT | NOT A | Output is the inverse of the input | ¬A
NAND | NOT (A AND B) | Output 0 only when both inputs are 1 | ¬(A · B)
NOR | NOT (A OR B) | Output 1 only when both inputs are 0 | ¬(A + B)
XOR | A XOR B | Output 1 when inputs are different | A ⊕ B

AND truth table:

A | B | A AND B
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

XOR truth table:

A | B | A XOR B
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0

Boolean Algebra Laws

Law | Expression
Identity | A AND 1 = A; A OR 0 = A
Null | A AND 0 = 0; A OR 1 = 1
Complement | A AND NOT A = 0; A OR NOT A = 1
Idempotent | A AND A = A; A OR A = A
Commutative | A AND B = B AND A; A OR B = B OR A
Associative | (A AND B) AND C = A AND (B AND C)
Distributive | A AND (B OR C) = (A AND B) OR (A AND C)
De Morgan's | NOT(A AND B) = NOT A OR NOT B; NOT(A OR B) = NOT A AND NOT B

Worked example: Simplify NOT(A AND B) AND (NOT A OR NOT B) using De Morgan's Law.

  1. NOT(A AND B) = NOT A OR NOT B (De Morgan's)
  2. Expression becomes: (NOT A OR NOT B) AND (NOT A OR NOT B)
  3. By idempotent law: NOT A OR NOT B
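
Identities like this can be checked exhaustively, since there are only four input combinations; a short Python sketch verifying the simplification above:

from itertools import product

# Check NOT(A AND B) AND (NOT A OR NOT B) == NOT A OR NOT B for every input combination.
for a, b in product([False, True], repeat=2):
    lhs = (not (a and b)) and ((not a) or (not b))
    rhs = (not a) or (not b)
    assert lhs == rhs, (a, b)
print("Simplification holds for all inputs")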

Exam tip: De Morgan's Laws are frequently tested. Memorize them and practice simplifying Boolean expressions step by step.

Worked example: Half adder

A half adder adds two single bits and produces a sum and a carry:

Sum = A XOR B
Carry = A AND B
A | B | Sum (XOR) | Carry (AND)
0 | 0 | 0 | 0
0 | 1 | 1 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 1

A full adder extends this by including a carry-in from the previous stage:

Sum = A XOR B XOR CarryIn
CarryOut = (A AND B) OR (CarryIn AND (A XOR B))
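
Both adders can be written directly from these expressions; a minimal Python sketch (function names are illustrative) that reproduces the truth table above:

# Half adder and full adder built from the Boolean expressions above.
def half_adder(a, b):
    return a ^ b, a & b                         # Sum = A XOR B, Carry = A AND B

def full_adder(a, b, carry_in):
    s = a ^ b ^ carry_in                        # Sum = A XOR B XOR CarryIn
    carry_out = (a & b) | (carry_in & (a ^ b))  # CarryOut = (A AND B) OR (CarryIn AND (A XOR B))
    return s, carry_out

for a in (0, 1):
    for b in (0, 1):
        print(a, b, *half_adder(a, b))          # reproduces the half adder truth table

print(full_adder(1, 1, 1))                      # (1, 1): sum 1, carry out 1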

Secondary Storage

Secondary storage is non-volatile memory that retains data when the computer is powered off.

Magnetic Storage

  • Hard Disk Drive (HDD): Uses spinning magnetic platters and read/write heads. Typical capacity: 500 GB to 20 TB. Access time: 5–10 ms. Advantages: high capacity, low cost per GB. Disadvantages: mechanical parts (fragile), slower than SSDs, higher power consumption.

Solid State Storage

  • Solid State Drive (SSD): Uses NAND flash memory with no moving parts. Typical capacity: 256 GB to 4 TB. Access time: 0.1 ms. Advantages: fast access, durable, low power. Disadvantages: more expensive per GB, limited write cycles.
  • USB flash drive: Portable solid state storage. Typical capacity: 8 GB to 256 GB.
  • SD cards: Used in cameras, phones, and other portable devices.

Optical Storage

  • CD-ROM / CD-R / CD-RW: Capacity ~700 MB. Uses laser to read/write data.
  • DVD-ROM / DVD-R / DVD-RW: Capacity ~4.7 GB (single layer) or ~8.5 GB (dual layer).
  • Blu-ray: Capacity ~25 GB (single layer) or ~50 GB (dual layer). Uses shorter-wavelength blue laser for higher density.

Comparison of Storage Media

Medium | Speed | Capacity | Volatile | Durability | Cost per GB
RAM | Very fast | 8–128 GB | Yes | Data lost on power off | High
SSD | Fast | 256 GB–4 TB | No | No moving parts | Medium
HDD | Moderate | 500 GB–20 TB | No | Mechanical, fragile | Low
USB Flash | Moderate | 8–256 GB | No | Portable, durable | Medium
CD/DVD | Slow | 0.7–8.5 GB | No | Scratch-prone | Low
Blu-ray | Moderate | 25–50 GB | No | Scratch-prone | Low

Exam tip: When comparing storage media, consider access speed, capacity, portability, durability, and cost. The IB may ask you to recommend a storage solution for a specific scenario and justify your choice.

Operating Systems

Functions of an Operating System

  1. Memory management: Allocates memory to processes, implements virtual memory (using secondary storage as an extension of RAM).
  2. Processor management: Schedules CPU time among running processes using scheduling algorithms (round-robin, priority-based, first-come first-served).
  3. Device management: Manages input/output devices through device drivers; handles interrupts.
  4. File management: Organizes files in directories/folders; manages file permissions, creation, deletion, and access.
  5. User interface: Provides a command-line interface (CLI) or graphical user interface (GUI) for user interaction.
  6. Security: Implements user authentication, access control, and firewall functionality.

Types of Operating Systems

Type | Description | Examples
Single-user, single-task | One user, one task at a time | MS-DOS, early embedded systems
Single-user, multi-task | One user, multiple tasks simultaneously | Windows, macOS
Multi-user | Multiple users simultaneously with resource protection | Linux, Unix, mainframe OS
Real-time | Guaranteed response within strict time constraints | Embedded systems, medical devices
Distributed | Multiple computers working together as a single system | Cloud OS, cluster computing
Embedded | Designed for specific hardware with limited resources | IoT devices, routers

Virtual Memory

Virtual memory allows the computer to use more memory than is physically available by using secondary storage as an extension of RAM.

  • Paging: Memory is divided into fixed-size blocks called pages. Pages are swapped between RAM and the hard disk as needed.
  • Page fault: Occurs when the CPU tries to access a page that is not currently in RAM. The OS must fetch it from disk, which is significantly slower.
  • Thrashing: Occurs when the system spends more time swapping pages in and out of RAM than actually executing processes. This severely degrades performance.
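
To make page faults concrete, the sketch below counts page faults for a sequence of page accesses that must share a small number of RAM frames. The three-frame limit and the FIFO replacement policy are illustrative assumptions, not part of the material above.

from collections import deque

# Count page faults for a sequence of page accesses (FIFO replacement assumed).
def count_page_faults(accesses, frames=3):
    ram = deque()                     # pages currently resident in RAM, oldest first
    faults = 0
    for page in accesses:
        if page not in ram:           # page fault: the page must be fetched from disk
            faults += 1
            if len(ram) == frames:
                ram.popleft()         # evict the oldest page (FIFO assumption)
            ram.append(page)
    return faults

print(count_page_faults([1, 2, 3, 1, 4, 1, 2]))   # 6 faults with only 3 frames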

Exam tip: The IB may ask you to explain the difference between RAM and virtual memory. Key point: RAM is physical memory; virtual memory is a technique that uses disk space to simulate additional RAM.

Processor Performance

Factors Affecting Processor Performance

Factor | Description | Impact
Clock speed | Number of cycles per second (measured in GHz) | Higher clock speed = more instructions/sec
Bus width | Number of bits that can be transferred simultaneously | Wider bus = more data transferred per cycle
Word size | Number of bits the CPU can process in one operation (32-bit, 64-bit) | Larger word size = more data processed
Cache size | Amount of fast memory on/near the CPU | Larger cache = fewer slow RAM accesses
Number of cores | Number of independent processing units | More cores = more parallel processing
Instruction set | RISC (fewer, simpler instructions) vs CISC (more, complex instructions) | Affects efficiency for different workloads

RISC vs CISC

Feature | RISC (Reduced Instruction Set Computer) | CISC (Complex Instruction Set Computer)
Instructions | Few, simple, uniform-length | Many, complex, variable-length
Execution | One instruction per cycle (typically) | One instruction may take multiple cycles
Registers | Many general-purpose registers | Fewer registers
Memory access | Load/store architecture (only load/store access memory) | Memory operands allowed in many instructions
Examples | ARM (mobile devices), MIPS, RISC-V | x86 (Intel, AMD)
Use case | Embedded systems, mobile devices, energy-efficient designs | Desktop PCs, servers, applications needing complex ops

Multi-core Processors

Modern CPUs often contain multiple cores, each capable of independent processing:

  • Dual-core: 2 cores
  • Quad-core: 4 cores
  • Octa-core: 8 cores

Advantages: True parallel execution of multiple threads; improved multitasking performance; better performance per watt.

Challenges: Software must be written to utilize multiple cores (parallel programming); shared resources (cache, memory bus) can create bottlenecks; increased complexity in OS scheduling.

Exam tip: The IB may ask about the difference between a multi-core processor and multiple single-core processors. Key point: multi-core processors share the same package and cache, reducing communication overhead, while multiple separate processors communicate through the system bus which is slower.

Peripheral Devices

Input Devices

Device | Function | Use case
Keyboard | Text and command input | General computing
Mouse | Pointing, clicking, dragging | GUI navigation
Scanner | Converts physical documents/images to digital format | Document digitization
Microphone | Audio input | Voice recording, voice commands
Webcam | Video input | Video conferencing
Touchscreen | Direct touch input | Mobile devices, kiosks
Barcode reader | Reads barcode data | Retail, inventory
RFID reader | Reads RFID tags wirelessly | Access control, tracking

Output Devices

Device | Function | Use case
Monitor | Visual display | Primary output
Printer | Hard copy output | Documents, photos
Speakers | Audio output | Multimedia, alerts
Projector | Large-scale visual display | Presentations, education
Actuator | Converts digital signals to physical movement | Robotics, automation

Device Drivers

A device driver is specialized software that allows the operating system to communicate with a hardware device. Without the correct driver, the OS cannot use the device.

Example: When you connect a new printer, the OS needs the printer's driver to understand how to send print jobs, manage paper trays, and handle print settings.

Worked Example: Machine Instruction Cycle Trace

Consider the following simple program stored in memory:

Address | Instruction
100 | LOAD 200
101 | ADD 201
102 | STORE 202

Address | Value
200 | 15
201 | 27
202 | ?

Trace:

Step | Action | PC | MAR | MDR | CIR | ALU/CU Activity
1 | Fetch: PC → MAR, PC + 1 | 101 | 100 | | |
2 | Fetch: Memory[MAR] → MDR | 101 | 100 | LOAD 200 | |
3 | Fetch: MDR → CIR | 101 | 100 | LOAD 200 | LOAD 200 |
4 | Decode: CU decodes CIR | 101 | 100 | LOAD 200 | LOAD 200 | CU identifies LOAD op
5 | Execute: Address 200 → MAR | 101 | 200 | LOAD 200 | LOAD 200 | CU sends address to MAR
6 | Execute: Memory[MAR] → MDR (value = 15) | 101 | 200 | 15 | LOAD 200 |
7 | Execute: MDR → Accumulator | 101 | 200 | 15 | LOAD 200 | ACC = 15
8 | Fetch: PC → MAR, PC + 1 | 102 | 101 | | |
9 | Fetch: Memory[MAR] → MDR | 102 | 101 | ADD 201 | |
10 | Fetch: MDR → CIR | 102 | 101 | ADD 201 | ADD 201 |
11 | Decode: CU decodes CIR | 102 | 101 | ADD 201 | ADD 201 | CU identifies ADD op
12 | Execute: Address 201 → MAR | 102 | 201 | ADD 201 | ADD 201 | CU sends address to MAR
13 | Execute: Memory[MAR] → MDR (value = 27) | 102 | 201 | 27 | ADD 201 |
14 | Execute: ACC + MDR → ACC | 102 | 201 | 27 | ADD 201 | ACC = 15 + 27 = 42
15 | Fetch: PC → MAR, PC + 1 | 103 | 102 | | |
16 | Fetch: Memory[MAR] → MDR | 103 | 102 | STORE 202 | |
17 | Fetch: MDR → CIR | 103 | 102 | STORE 202 | STORE 202 |
18 | Decode: CU decodes CIR | 103 | 102 | STORE 202 | STORE 202 | CU identifies STORE op
19 | Execute: ACC → MDR (value = 42) | 103 | 102 | 42 | STORE 202 |
20 | Execute: Address 202 → MAR | 103 | 202 | 42 | STORE 202 | CU sends address to MAR
21 | Execute: MDR → Memory[MAR] | 103 | 202 | 42 | STORE 202 | Memory[202] = 42

Final state: Memory address 202 contains the value 42 (15 + 27).

Exam tip: You do not need to trace every single micro-step like this in the exam. A simplified trace showing PC, MAR, MDR, CIR, and Accumulator at each instruction cycle (fetch-decode-execute) is usually sufficient. However, understanding the full detail helps you explain what happens at each stage.


Cache Memory: Worked Example

Cache memory is a small, fast memory located between the CPU and main memory (RAM). It stores frequently accessed data and instructions to reduce the average time to access memory.

Cache Hit and Cache Miss

  • Cache hit: The data or instruction the CPU needs is found in the cache. This is fast.
  • Cache miss: The data is not in the cache and must be fetched from main memory. This is slower.
  • Hit rate: The percentage of memory accesses that are cache hits.
  • Miss rate: 1 - hit rate.

Worked Example: Cache Hit Rate Calculation

Problem: A CPU makes 10,000 memory accesses. The cache has a hit time of 2 ns and the main memory has an access time of 50 ns. Out of the 10,000 accesses, 8,500 are cache hits and 1,500 are cache misses. Calculate: a) The hit rate. b) The average memory access time (AMAT). c) The total time for all 10,000 accesses if there were no cache.

Solution:

a) Hit rate:

Hit rate = cache hits / total accesses = 8,500 / 10,000 = 0.85 = 85%

b) Average Memory Access Time (AMAT):

AMAT = (hit rate × hit time) + (miss rate × miss penalty)

The miss penalty is the time to fetch from main memory: 50 ns.

AMAT = (0.85 × 2) + (0.15 × 50) = 1.7 + 7.5 = 9.2 ns

c) Without cache (all accesses from main memory):

Total time = 10,000 × 50 = 500,000 ns

With cache:

Total time = (8,500 × 2) + (1,500 × 50) = 17,000 + 75,000 = 92,000 ns

The cache reduces total access time by approximately 82%.
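
The same calculation as a short Python sketch (variable names are illustrative):

# Cache hit rate, AMAT, and total access time for the worked example above.
accesses, hits = 10_000, 8_500
hit_time_ns, memory_time_ns = 2, 50

hit_rate = hits / accesses                                    # 0.85
miss_rate = 1 - hit_rate                                      # 0.15
amat = hit_rate * hit_time_ns + miss_rate * memory_time_ns    # 9.2 ns

total_with_cache = hits * hit_time_ns + (accesses - hits) * memory_time_ns   # 92,000 ns
total_without_cache = accesses * memory_time_ns                              # 500,000 ns

print(hit_rate, round(amat, 1), total_with_cache, total_without_cache)
print(round(1 - total_with_cache / total_without_cache, 3))   # 0.816, roughly 82% saved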


Pipelining

Pipelining is a technique where multiple instructions are overlapped in execution, similar to an assembly line in a factory. Instead of waiting for one instruction to complete all stages before starting the next, each stage processes a different instruction simultaneously.

The Three-Stage Pipeline

Stage | Description
Fetch | Fetch the next instruction from memory
Decode | Decode the instruction and read operands
Execute | Perform the operation and write back the result

How It Works

Without pipelining, executing 3 instructions takes 9 clock cycles (3 cycles each):

Clock:   1  2  3  4  5  6  7  8  9
Inst 1:  F  D  E
Inst 2:           F  D  E
Inst 3:                    F  D  E

With pipelining, 3 instructions take only 5 clock cycles:

Clock:   1  2  3  4  5
Inst 1:  F  D  E
Inst 2:     F  D  E
Inst 3:        F  D  E

The speedup is approximately equal to the number of pipeline stages (in the ideal case).
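
In general, with k pipeline stages and n instructions, the ideal cycle count is k + (n - 1) with pipelining versus k × n without. A short Python sketch:

# Ideal cycle counts for a k-stage pipeline (no hazards assumed).
def cycles_without_pipeline(instructions, stages=3):
    return stages * instructions

def cycles_with_pipeline(instructions, stages=3):
    return stages + (instructions - 1)   # fill the pipeline once, then one result per cycle

for n in (3, 10, 100):
    print(n, cycles_without_pipeline(n), cycles_with_pipeline(n))
# 3   9    5
# 10  30   12
# 100 300  102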

Pipeline Hazards

A hazard is a situation that prevents the next instruction from executing in its designated clock cycle.

Hazard Type | Cause | Solution
Data hazard | An instruction depends on the result of a previous instruction | Forwarding, stalling (inserting bubbles)
Structural hazard | Two instructions need the same hardware resource simultaneously | Duplicate resources (e.g., separate caches)
Control hazard | A branch/jump changes the instruction flow | Branch prediction, delayed branching

Branch Prediction

When the CPU encounters a conditional branch (e.g., JUMP IF EQUAL), it does not yet know whether the branch will be taken. Branch prediction attempts to guess the outcome:

  • Static prediction: Always predict "not taken" or always predict "taken."
  • Dynamic prediction: Uses history of previous branches to make more accurate predictions. Modern CPUs achieve prediction accuracy above 95%.

If the prediction is wrong, the pipeline must be flushed (partially completed instructions are discarded), which incurs a performance penalty.
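
As a toy illustration of dynamic prediction, the sketch below implements a one-bit predictor that always predicts the same outcome as the previous branch. This specific scheme is an illustrative assumption; real CPUs use more elaborate history tables.

# Toy one-bit dynamic branch predictor: predict "same outcome as last time".
def one_bit_predictor_accuracy(outcomes):
    prediction = False                # start by predicting "not taken"
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1              # correct prediction: no pipeline flush needed
        prediction = taken            # remember only the most recent outcome
    return correct / len(outcomes)

# A loop branch taken 9 times and then falling through is mispredicted only twice.
print(one_bit_predictor_accuracy([True] * 9 + [False]))   # 0.8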


RISC vs CISC: Extended Comparison

Instruction Execution

Aspect | RISC | CISC
Instruction count | More instructions per task (each simpler) | Fewer instructions per task (each more complex)
Clock speed | Generally higher (simpler circuits) | Generally lower (more complex circuits)
Code size | Larger (more instructions needed) | Smaller (each instruction does more)
Compiler design | Simpler (uniform instruction format) | More complex (variable formats)
Power consumption | Lower (simpler design) | Higher (more complex design)

Practical Examples

  • RISC: ARM processors dominate the mobile and embedded market. The ARM architecture is used in virtually all smartphones, tablets, and many IoT devices. Apple's M-series chips (M1, M2) are ARM-based RISC processors.
  • CISC: The x86 architecture (Intel Core, AMD Ryzen) dominates desktop and server markets. Most personal computers run CISC processors.

Common Pitfalls

  1. Confusing registers. The MAR stores an address, while the MDR stores the data or instruction at that address. The CIR stores the current instruction being executed, not the data.

  2. PC increment timing. The PC is incremented during the fetch stage, not after execution. This means if a jump instruction is executed, the PC is overwritten with the jump address, discarding the incremented value.

  3. Cache vs RAM. Cache is smaller, faster, and more expensive per byte than RAM. SRAM is used for cache; DRAM is used for main memory. Do not confuse these.

  4. Von Neumann bottleneck. The bottleneck arises because data and instructions share a single bus, not because of the CPU speed. The Harvard architecture avoids this by using separate buses.

  5. Binary overflow. In two's complement, overflow occurs when adding two positive numbers gives a negative result, or adding two negative numbers gives a positive result. The carry bit alone does not indicate overflow.

  6. Pipelining is not always faster. Pipeline hazards (data, structural, control) can reduce the effective speedup. The theoretical maximum speedup equals the number of stages, but this is never achieved in practice.


Problem Set

Question 1

A CPU uses a three-stage pipeline (Fetch, Decode, Execute). Each stage takes 1 clock cycle. How many clock cycles are required to execute: a) 1 instruction? b) 10 instructions? c) 100 instructions?

Answer 1

a) 1 instruction: 3 cycles (the pipeline must fill before the first instruction completes). b) 10 instructions: 3 + (10 - 1) = 12 cycles; after the first instruction, each additional instruction adds 1 cycle. c) 100 instructions: 3 + (100 - 1) = 102 cycles.

Question 2

A computer has a two-level cache system. The L1 cache has a hit rate of 80% with an access time of 1 ns. The L2 cache has a hit rate of 95% (of the remaining accesses) with an access time of 5 ns. Main memory has an access time of 100 ns. Calculate the average memory access time.

Answer 2

L1 hit: 0.80 × 1 = 0.80 ns. L1 miss rate: 0.20. L2 hit rate among the misses: 0.95. L2 hit contribution: 0.20 × 0.95 × (1 + 5) = 0.19 × 6 = 1.14 ns. L2 miss (goes to main memory): 0.20 × 0.05 × (1 + 5 + 100) = 0.01 × 106 = 1.06 ns.

AMAT = 0.80 + 1.14 + 1.06 = 3.00 ns.
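
A quick Python check of this calculation (variable names are illustrative):

# Average memory access time with two cache levels (times in ns).
l1_hit_rate, l1_time = 0.80, 1
l2_hit_rate, l2_time = 0.95, 5     # hit rate among the accesses that miss L1
memory_time = 100

amat = (l1_hit_rate * l1_time
        + (1 - l1_hit_rate) * l2_hit_rate * (l1_time + l2_time)
        + (1 - l1_hit_rate) * (1 - l2_hit_rate) * (l1_time + l2_time + memory_time))
print(round(amat, 2))   # 3.0 ns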

Question 3

Convert the following 8-bit two's complement binary numbers to decimal: a) 01011010 b) 10110100 c) 11111111

Answer 3

a) 01011010: Positive (MSB = 0). 64 + 16 + 8 + 2 = 90. b) 10110100: Negative (MSB = 1). Invert: 01001011. Add 1: 01001100 = 64 + 8 + 4 = 76. So the value is -76. c) 11111111: Negative (MSB = 1). Invert: 00000000. Add 1: 00000001 = 1. So the value is -1.

Question 4

Explain the difference between the Von Neumann architecture and the Harvard architecture. In your answer, describe the Von Neumann bottleneck and explain how the Harvard architecture addresses it.

Answer 4

The Von Neumann architecture uses a single memory space for both data and instructions, connected to the CPU by a single bus. This means the CPU cannot read an instruction and read/write data simultaneously, creating the Von Neumann bottleneck — the bus becomes a performance limitation because it can only transfer one item at a time.

The Harvard architecture uses separate memory spaces for data and instructions, each with its own bus. This allows the CPU to fetch the next instruction and read/write data at the same time, eliminating the bottleneck. The trade-off is increased hardware complexity and cost. Harvard architecture is commonly used in DSPs and embedded systems where performance is critical.

Question 5

A simplified CPU has the following registers: PC (initially 100), ACC (initially 0), MAR, MDR, and CIR. The following instructions are in memory:

Address | Instruction
100 | LOAD 150
101 | SUB 151
102 | STORE 152

Address | Value
150 | 42
151 | 17
152 | ?

Trace the fetch-decode-execute cycle for each instruction, showing the state of the PC, MAR, MDR, CIR, and ACC after each cycle. What is the final value stored at address 152?

Answer 5

Instruction 1: LOAD 150

Stage | PC | MAR | MDR | CIR | ACC
Fetch | 101 | 100 | LOAD 150 | LOAD 150 | 0
Decode | 101 | 100 | LOAD 150 | LOAD 150 | 0
Execute | 101 | 150 | 42 | LOAD 150 | 42

Instruction 2: SUB 151

Stage | PC | MAR | MDR | CIR | ACC
Fetch | 102 | 101 | SUB 151 | SUB 151 | 42
Decode | 102 | 101 | SUB 151 | SUB 151 | 42
Execute | 102 | 151 | 17 | SUB 151 | 25

Instruction 3: STORE 152

Stage | PC | MAR | MDR | CIR | ACC
Fetch | 103 | 102 | STORE 152 | STORE 152 | 25
Decode | 103 | 102 | STORE 152 | STORE 152 | 25
Execute | 103 | 152 | 25 | STORE 152 | 25

Final value at address 152: 25 (which is 42 - 17).