Lecture 4, part 1: Processes

Exam

Details about the process abstraction and its use in Unix

Important questions:

What is the definition of a process and what is the difference to a program?
What is a process hierarchy and why does it exist?
- Parent/child processes, orphans, zombies and PID 1 (init)
How can processes perform I/O, how can it be (re)configured?
- Relation of the I/O concept to the Unix philosophy?
How do processes interact with the OS: system calls
How can processes be created/controlled/terminated?
- Which Unix syscalls are used for process management?
- Pros and cons of the Unix fork/exec model
Optimizations for process creation in Unix: copy-on-write
What are details of the extended process state model?

Review: processes..

are "programs in execution"
- dynamic, not static
- alternating sequences of "CPU bursts" and "I/O bursts"
require resources of the computer
have a state
- READY, RUNNING, BLOCKED
are conceptually considered to be independent, concurrent control flows
are under the control of the operating system
- resource allocation and revocation

Unix (Thompson and Ritchie, 1969)

A system with a long (hi)story
Origin: AT&T Bell Labs
- Developed as an alternative to "Multics"
Version 1 created on a PDP-7
- written in assembler
- 8192 18-bit words of memory
Version 3 implemented in C on a PDP-11
- C was created to enable OS development in a high-level language

Unix variants

Unix processes

Are the primary way to structure activities
- Application as well as system processes
Can create new processes in a fast and easy way
- Parent process -> child process
Form a process hierarchy

Unix shells

A "shell" around the operating system "core"
Text based user interface to start commands (Unix programs)
- Commands can be located anywhere on the file system
- Shell searches in directories given in the $PATH environment variable
Shell prompt example: me@unix:~> which vim
Every executed command is a separate child process
Typically, the shell blocks (waits) until the last command has terminated
It is possible to suspend, continue and terminate commands (job control) and to have commands executing in the background

Unix shells: job control

Standard I/O channels

Usually connected to the terminal of the shell that started the process
- Standard input (stdin): read user input (keyboard)
- Standard output (stdout): text output of the process (terminal window)
- Standard error (stderr): separate channel for error messages (usually also connected to the terminal)
Almost all Unix commands also accept files as input or output channels (instead of the terminal)
Shells provide a simple syntax to redirect the standard I/O channels

Redirecting standard I/O

> redirects standard output
< redirects standard input
| (pipe) symbol tells the shell to connect the standard output of the left process to the standard input of the right process
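
What the shell does for a redirection such as cmd > file can be sketched in C. This is a minimal sketch, not how any particular shell is implemented: the helper name run_with_stdout_to_file is hypothetical, the command is handed to /bin/sh for simplicity, and it relies on the fork/exec/wait syscalls covered in part 2 of this lecture.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` via /bin/sh with its standard output
 * redirected to `path`, roughly what a shell does for `cmd > path`.
 * Returns the exit status of the command, or -1 on error. */
int run_with_stdout_to_file(const char *cmd, const char *path)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: make file descriptor 1 (stdout) refer to the file. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0)
            _exit(126);
        if (fd != STDOUT_FILENO)
            close(fd);
        /* The new program inherits the redirected stdout. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if execl failed */
    }
    /* Parent: wait for the command, like a shell does in the foreground. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_with_stdout_to_file("ls -l", "listing.txt") from main would behave much like typing ls -l > listing.txt. The command itself never learns that its output goes to a file; it simply writes to file descriptor 1.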

The Unix philosophy

Doug McIlroy, the inventor of Unix pipes, summarized the Unix philosophy as follows:

This is the Unix Philosophy:
Write programs that do one thing and do it well.
Write programs to work together.
Write programs to handle text streams,
because that is a universal interface.

This is commonly expressed in a shorter way: "Do one thing, do it well."

Lecture 4, part 2: Processes

Process-OS interaction in Unix (1)

How does an application program request a service from the operating system?
From the point of view of the application, calling an operating system service looks like a regular function call, e.g.: pid = fork();
However, arbitrarily calling code inside the OS kernel is dangerous:
- No checking of permission to execute a function
- No checking for correct parameters
  - Security nightmare!
The transition from code executing in an application to code running in the kernel needs to be protected!

Process-OS interaction in Unix (2)

Many CPUs provide several execution modes:
- "user mode": only restricted functionality is allowed
- "kernel" or "supervisor mode": full access to all hardware resources
Special machine instructions are provided to transition from user to kernel mode:
- int 0x80 (Intel x86), syscall/sysenter (Intel 64/AMD64)
- trap (Motorola 68k), SVC (ARM), ECALL (RISC-V)
Executing such an instruction causes the CPU to switch to kernel mode and jump to a predetermined entry address that the application cannot choose: this is a system call

Process-OS interaction in Unix (3)

Applications can execute a syscall instruction directly, but
- This stops working when the syscall interface changes
In most modern systems, the C library (libc) provides stubs (adapter functions) that call the actual syscall
- The stub function is a regular function linked to the application
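
The difference between a libc stub and a system call issued by number can be illustrated as follows. This is a Linux-specific sketch: the generic syscall() wrapper and the constant SYS_getpid are Linux/glibc facilities, and the function names raw_getpid and stub_getpid are made up for this example. Note that syscall() is itself a small libc function that executes the trap instruction on our behalf.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: request the getpid service by its syscall number.
 * This breaks if the number or the calling convention ever changes. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Portable: call the stub that libc links into the application. */
pid_t stub_getpid(void)
{
    return getpid();
}
```

Both functions should return the same PID; the stub version is the one that keeps working across kernels and architectures.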

Unix process control: syscalls

A first overview of process related system calls (syscalls)

Syscall	Description	Manual section
getpid	returns PID of the calling process	(2)
getppid	returns PID of the parent process	(2)
getuid	returns the UID of the calling process	(2)
fork	creates a new child process	(2)
exit	terminates the calling process	(3)
_exit	terminates the calling process	(2)
wait	waits for the termination of a child process	(2)
execve	loads and starts a program in the context of the calling process	(2)

Read Unix manual pages with man <num> <command>

Unix processes in detail: fork()

System call: pid_t fork(void)

Duplicates the calling process (the standard way to create new processes in Unix!)
The child process inherits..
- Address space (code, data, bss, stack segments)
- User and group ID
- Standard I/O channels
- Process group, signal table
- Open files, current working directory
Not copied are the following:
- Process ID (PID), parent process ID (PPID)
- Pending signals, accounting data, ..
One process calls fork, but two processes return 🤔

Use of fork()
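
A minimal sketch of a typical fork() call, assuming a POSIX system. The function name fork_demo and the variable x are invented for illustration. The return value of fork() tells the two processes apart: 0 in the child, the child's PID in the parent, -1 on error. The child changes its own copy of x, which the parent never sees.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns the parent's value of x after the child
 * has modified its own copy, or -1 if fork failed. */
int fork_demo(void)
{
    int x = 1;
    fflush(stdout); /* avoid duplicating buffered output in the child */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0. */
        x = 42; /* changes only the child's copy of the address space */
        printf("child:  pid=%d ppid=%d x=%d\n",
               (int)getpid(), (int)getppid(), x);
        fflush(stdout);
        _exit(0);
    }
    /* Parent: fork() returned the PID of the child. */
    waitpid(pid, NULL, 0);
    printf("parent: pid=%d child=%d x=%d\n", (int)getpid(), (int)pid, x);
    return x;
}
```

Called from main, the parent line should still report x=1 even though the child printed x=42, because the two processes no longer share writable data after fork().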

Discussion: fast process creation

Copying the address space takes a lot of time
- Especially if the program immediately calls exec..() afterwards -> complete waste of time!
Historic solution: vfork
- The parent process is suspended until the child process calls exec..() or terminates using _exit()
- The child simply uses code and data of its parent (without copying!)
  - The child process must not change any data
  - Sometimes not so simple: e.g., don't call exit(), but _exit()!
Modern solution: copy on write
- Parent and child process share the same code and data pages using the memory management unit (MMU)
- A page is copied only when one of the two processes writes to it
- This is not the case when exec..() is called directly after fork()
- fork() using copy on write is almost as fast as vfork()

Unix processes in detail: _exit()

System call: void _exit(int)

Terminates the calling process and passes an integer argument as "exit status" to the parent process
Releases the resources allocated by the process
- Open files, used memory, ..
Sends a signal SIGCHLD to its parent process
There is also a library function exit which additionally releases resources used by the C library
- Among other things, it writes out (flushes) all data still stored in output buffers!
Normal processes should use exit, not _exit

Discussion: orphaned process

A Unix process is orphaned when its parent process terminates
What happens to our process hierarchy?
- The init process (always PID 1) adopts all orphaned processes.

Unix processes in detail: wait()

System call: pid_t wait(int *)

Blocks the calling process until one of its child processes terminates
The return value is the terminated child's PID
Using the int * parameter, the caller is passed the child's "exit status" (and more)
wait returns immediately if a child process has already terminated (or fails with an error if the caller has no child processes)

Use of wait()
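
A minimal sketch of how a parent can collect the exit status of a child with wait(), assuming a POSIX system. The function name exit_status_of_child is hypothetical. The macros WIFEXITED and WEXITSTATUS decode the integer that wait() fills in, which also carries information beyond the exit status (e.g. termination by a signal).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that terminates with `code` and return
 * the exit status the parent obtains through wait(), or -1 on error. */
int exit_status_of_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code); /* child terminates immediately */

    int status;
    pid_t done = wait(&status); /* blocks until a child has terminated */
    if (done != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status); /* low 8 bits of the child's exit code */
}
```

For example, exit_status_of_child(7) is expected to return 7. Only the low 8 bits of the exit code are passed on to the parent.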

Discussion: zombies

A terminated process is called a "zombie" until its exit status is requested using wait
The resources allocated to such processes can be released, but the OS process management still needs to know about them
Especially the exit status has to be saved
Zombies are marked as <defunct> in the output of ps

Unix processes in detail: execve()

System call: int execve(const char *command, char *const args[], char *const envp[])

Loads and starts the command passed in the "command" parameter
Only returns in case of an error
- e.g. command does not exist, no access, etc.
Replaces the complete address space of the calling process
- but it remains the same process!
- Same PID, PPID, open files, etc.
The C library provides some comfortable support functions that internally call execve: execl, execv, execlp, execvp, ..

Use of exec()
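
A minimal sketch of the common fork/exec/wait pattern, assuming a POSIX system with a shell at /bin/sh. The function name run_shell_command is hypothetical, and the sketch uses the libc helper execl (which internally calls execve) to keep the argument handling short.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` through /bin/sh in a child process and
 * return its exit status, or -1 on error. */
int run_shell_command(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        /* execl only returns if loading the program failed. */
        perror("execl");
        _exit(127);
    }
    /* Parent: same PID as before, waits for the child to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell_command("exit 3") is expected to return 3. The code after execl is reached only in the error case, which is why the child calls _exit there instead of returning into the parent's logic.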

Discussion: why no forkexec()?

The parent process has more control if we separate the calls to fork and execve:
- Execute operations in the context of the child process
- Full access to the parent process's data
Unix shells use this feature to e.g.
- redirect the standard I/O channels
- configure pipes

Unix process states

Conclusion

Process management is an important part of any OS
- Unix has a process hierarchy
- The init process (PID 1) is the root of the hierarchy
Special approach taken in Unix: separate process creation (fork) and program execution (exec)!
- Used by the Unix shell to implement I/O redirection
Small set of basic system calls for process management
- Hardware support required to make fork efficient
Real-world process states are quite complex