Lecture 2: Resources and computer architecture
Interaction of computer architecture and the OS, resources and their management
Important questions:
- What are the building blocks of a computer system?
- Which resources are represented by these building blocks?
- How does code (in the OS) interact with hardware resources?
- What are the most relevant developments in computer architecture of the last decades and which problems/benefits are related to these developments?
Computers as they are no more
- The typical von Neumann-style computer
- Addressable unified memory for code and data
- I/O devices in the same or a different address range
- Optional: Interrupts notify CPU of the completion of an I/O operation
- Optional: I/O devices can use DMA (direct memory access) to transfer data to memory without CPU interaction
Asynchronous execution: interrupts
- Access to I/O devices is often slow
- With polling, the CPU sends a command and then repeatedly checks until the device returns data
- With interrupts, the device notifies the program when data is ready (see the sketch after this list)
- This changes the control flow the CPU executes!
- More complex to develop software for
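To make the contrast concrete, here is a minimal C sketch of both styles. The device register addresses, the status bit, and the way the handler is registered are hypothetical, purely for illustration.

```c
#include <stdint.h>

/* Hypothetical memory-mapped device registers; the addresses and the
 * READY bit are made up for illustration. */
#define DEV_STATUS (*(volatile uint32_t *)0x40000000u)
#define DEV_DATA   (*(volatile uint32_t *)0x40000004u)
#define STATUS_READY 0x1u

/* Polling: the CPU busy-waits until the device signals completion. */
uint32_t read_polling(void)
{
    while (!(DEV_STATUS & STATUS_READY))
        ;                       /* CPU does nothing useful here */
    return DEV_DATA;
}

/* Interrupt-driven: the CPU runs other code in the meantime; when the
 * device is ready, it interrupts the CPU, which invokes this handler -
 * the control flow changes asynchronously. Registration with the
 * interrupt controller is platform-specific and omitted here. */
volatile uint32_t latest_data;

void device_interrupt_handler(void)
{
    latest_data = DEV_DATA;     /* fetch the data the device produced */
}
```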
Buses
- Components of the computer are connected by buses
- Address bus: identifies the addressed component
- Data bus: transfers the actual information
- Control bus: carries metainformation (read/write, interrupt)
- CPU has control over the bus
- Exception: DMA
Getting a bit more real
- Simple model of execution only works efficiently if the speed of memory is on par with the speed of the CPU
- This was the case until ca. 1980
- Today: 'memory gap':
- CPU speed ~ 10 000x faster, but memory speed only ~ 10x faster
Introducing a memory hierarchy
- Idea: introduce caches
- Small, but fast intermediate levels of memory
- Caches can only hold a partial copy of the whole memory
- Unified caches vs. separate instruction and data caches
- Expensive to manufacture
- Later: introduction of multiple levels of cache (L1, L2, L3, ...)
- Each one bigger but slower than the previous one
- Caches work efficiently due to two locality principles:
- Temporal locality: a program accessing some part of memory is likely to access the same memory again soon
- Spatial locality: a program accessing some part of memory is likely to access nearby memory next (see the traversal example after this list)
- The further from the CPU:
- Increasing size
- Decreasing speed
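Both principles can be shown with a short C sketch: summing a matrix row by row follows the memory layout, while column-by-column traversal jumps through memory. The matrix size is an arbitrary illustrative choice.

```c
#include <stddef.h>

#define N 1024
static double m[N][N];

/* Good spatial locality: C stores rows contiguously, so consecutive
 * iterations touch neighbouring addresses in the same cache lines.
 * Good temporal locality too: s and the indices are reused constantly. */
double sum_row_major(void)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Poor spatial locality: each access jumps N * sizeof(double) bytes
 * ahead, so most accesses miss the cache. */
double sum_col_major(void)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```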
Memory impact: non-functional properties
- Memory has a large influence on non-functional properties of a system
- Average, best and worst case performance, throughput and latencies
- Power and energy consumption
- Reliability and security
- Non-functional properties depend on many parameters of memory, e.g.:
- Cache architecture
- Memory type
- Alignment and aliasing of data (see the struct layout example below)
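To illustrate the alignment point: in the C sketch below, member order alone changes a struct's size and thus its cache footprint. The sizes in the comments assume a typical 64-bit ABI.

```c
#include <stdio.h>
#include <stdint.h>

/* The compiler inserts padding so each member is naturally aligned;
 * member order therefore affects the size of the struct. */
struct scattered {          /* typically 24 bytes */
    char     a;             /* 1 byte + 7 bytes padding */
    uint64_t b;             /* 8 bytes, needs 8-byte alignment */
    char     c;             /* 1 byte + 7 bytes tail padding */
};

struct reordered {          /* typically 16 bytes */
    uint64_t b;             /* 8 bytes */
    char     a;             /* 1 byte */
    char     c;             /* 1 byte + 6 bytes tail padding */
};

int main(void)
{
    printf("scattered: %zu bytes\n", sizeof(struct scattered));
    printf("reordered: %zu bytes\n", sizeof(struct reordered));
    return 0;
}
```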
When one processor is not enough
- Moore's Law (1965)
- Observation that the number of transistors in a dense integrated circuit (IC) doubles about every two years
- Accordingly, CPU speed increased thanks to ever smaller semiconductor structures
- This development is hitting physical limitations
- CPU frequencies 'stuck' at ~3 GHz
- Energy consumption is additional limiting factor
- What can we do with all these transistors?
- Bigger caches - energy hungry and prone to faults!
- Put more processors on a chip!
- Earlier high-end systems already used multiple separate processor chips
- Old as well as new problems (with multiple cores):
- Memory throughput now has to satisfy demands of n processors
- Software now has to support execution on multiple processors
- Caches need to be coherent so they hold the same copies of main memory data
More processors, more memories
- Problem: Memory throughput now has to satisfy demands of n processors
- Provide each processor with its own main memory!
- NUMA (non-uniform memory access)
- And new problems show up:
- How to access data in another CPU's memory?
- Who decides which CPU is allowed to use the bus?
- Is a common bus still efficient?
On-chip communication
- Use high-speed networks instead of conventional buses
- Using ideas from computer networking
- On-chip network can achieve high throughput and low latencies
Heterogeneous systems: GPGPUs
- In modern computers, not only CPUs can execute code
- GPGPUs (general purpose graphics processing units)
- Massively parallel processors for typical parallel tasks
- 3D graphics, signal processing, machine learning, bitcoin mining etc.
- Few features for protection, security
- Traditionally, only a single program at a time could access the GPU, and only for drawing
- In modern systems, multiple programs want direct access to the GPGPU
- How can the OS multiplex the GPGPU safely and securely?
Security
- There's another important non-functional property!
- Multiple programs running simultaneously
- e.g. an online banking application and a video player
- How can we prevent the video player from accessing memory of the banking app?
- Prevent access to non-permitted memory ranges
- The MMU makes only the memory 'belonging' to the running program visible to it
The MMU
- Idea: intercept 'virtual' addresses generated by the CPU
- MMU checks for 'allowed' addresses
- Translates allowed addresses to 'physical' addresses in main memory using a translation table
- Problem: translation table for each single address would be large
- Split memory into pages of identical size (power of 2)
- Apply the same translation to all addresses in the page: page table
- MMUs were originally separate ICs sitting between CPU and RAM; today they are integrated into the CPU
Page table structure
- Find a compromise page size allowing both flexibility and efficiency
- Smaller pages mean less internal fragmentation but larger page tables; larger pages mean the opposite
The memory translation process
- The MMU splits the 32-bit virtual address coming from the CPU into three parts:
- 10 bits (31-22) page directory entry (PDE) number
- 10 bits (21-12) page table entry (PTE) number
- 12 bits (11-0) page offset inside the referenced page (untranslated)
- Translation process
- Read PDE entry from directory -> address of one page table
- Read PTE entry from table -> physical base address of memory page
- Add the offset from the original virtual address to obtain the complete physical memory address (see the sketch after this list)
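Concretely, the split described above can be written down in a few lines of C; the example address is arbitrary and the table lookups are only indicated in comments.

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t vaddr = 0xC0FFEE42u;              /* arbitrary example address */

    uint32_t pde_idx = (vaddr >> 22) & 0x3FFu; /* bits 31-22: 10 bits */
    uint32_t pte_idx = (vaddr >> 12) & 0x3FFu; /* bits 21-12: 10 bits */
    uint32_t offset  =  vaddr        & 0xFFFu; /* bits 11-0: untranslated */

    printf("PDE index %u, PTE index %u, offset 0x%03x\n",
           (unsigned)pde_idx, (unsigned)pte_idx, (unsigned)offset);

    /* The MMU would now (1) read page_directory[pde_idx] to find the
     * page table, (2) read page_table[pte_idx] to find the physical
     * page base, and (3) compute paddr = base + offset. */
    return 0;
}
```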
Speeding up the translation
- Where is the page table stored?
- Can be several MB in size -> doesn't fit on the CPU
- Page dir and page table are in main memory!
- With address translation, each memory access by a program requires three main memory accesses: one for the PDE, one for the PTE, one for the actual data!
- Solution: use a cache!
- The MMU uses a special cache on the CPU - the translation lookaside buffer (TLB); a small software model follows below
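As a purely conceptual illustration, here is a small software model of a direct-mapped TLB in C; real TLBs are hardware structures and are often fully associative.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64
#define PAGE_SHIFT  12          /* 4 KiB pages */

struct tlb_entry {
    uint32_t vpn;               /* virtual page number */
    uint32_t pfn;               /* physical frame number */
    bool     valid;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a hit and stores the frame number in *pfn.
 * On a miss the MMU walks the page tables in main memory and
 * then caches the result via tlb_insert(). */
bool tlb_lookup(uint32_t vaddr, uint32_t *pfn)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];

    if (e->valid && e->vpn == vpn) {
        *pfn = e->pfn;          /* hit: no main memory access needed */
        return true;
    }
    return false;               /* miss: fall back to the page-table walk */
}

void tlb_insert(uint32_t vpn, uint32_t pfn)
{
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    e->vpn   = vpn;
    e->pfn   = pfn;
    e->valid = true;
}
```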
What about the operating system?
- New hardware capabilities have to be used efficiently
- The OS has to manage and multiplex the related resources
- Has to provide code for all new capabilities
- These often interact with other parts of the system, making the overall OS more complex
- A modern OS also has to ensure adherence to non-functional requirements (security, energy, real-time, etc.)
- OS has to do more bookkeeping and statistics
- Some of the non-functional properties contradict each other
- Finally, the OS itself has to be efficient!