Every assignment requires substantial programming effort.

1. Pintos instructional operating system

  • A small Unix-like operating system with very limited capabilities.
  • Implemented in 2005 for use with Stanford’s CS140 OS class.
  • Intended to run on an IA-32/x86 processor emulator (Bochs, QEMU).

The Pintos assignments are available online from cms.caltech.edu.

2. Five assignments

  • Write a basic operating system shell (1 week)
  • Kernel-level threading and thread-scheduling (2 weeks)
  • Implement kernel system calls for user-mode programs (2 weeks)
  • Implement a virtual memory system for Pintos (2 weeks)
  • Implement an ext2-like filesystem for Pintos (2 weeks)

3. Example: Filesystems

3.1. 1. Standardized Interface

There are many kinds of storage media in a typical computer, and different storage technologies require different kinds of maintenance.

3.1.1. Magnetic disks are sensitive to fragmentation

Large files should be stored in contiguous regions of the disk, or disk-seek times will kill access performance.
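A rough back-of-the-envelope illustration, assuming ballpark figures for a consumer hard disk (roughly 10 ms per seek and 100 MB/s sequential transfer; these numbers are assumptions for illustration, not from the course material). Reading a 100 MB file costs approximately:

  contiguous layout:       1 seek × 10 ms + 100 MB ÷ 100 MB/s ≈ 1.01 s
  1,000 fragments:     1,000 seeks × 10 ms + the same transfer ≈ 11 s

The transfer time is identical in both cases; the extra ten seconds are pure seek overhead, which is why filesystems for magnetic disks try hard to keep large files contiguous.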

3.1.2. SSD memory blocks must be erased before they can be rewritten, and the erase-block size is much larger than the read/write page size

SSDs use a type of memory called NAND flash, which is organized into blocks and has an essentially constant access time, so SSDs are not sensitive to fragmentation. SSDs also employ a technique called wear leveling, which distributes writes and erasures evenly across the memory cells to prolong the life of the drive. This process inherently spreads data out across the SSD, which might be seen as a form of fragmentation, but it does not degrade performance the way fragmentation does on HDDs.

The nature of NAND requires that a memory block be erased before new data can be written to it.

  1. This means that SSDs are very inefficient at updating small amounts of data.
    • A page is the smallest unit that can be written to or read from, typically ranging from 4 KB to 16 KB in size.
    • A block is a larger unit consisting of multiple pages, often in the range of 128 KB to 256 KB; a block is the smallest unit that can be erased.

    Say you want to update just 1 KB of data within a 256 KB block. To do this, the SSD's controller must:

    1. Read the entire 256 KB block into memory.
    2. Modify the 1 KB of data in the in-memory copy.
    3. Erase the entire 256 KB block on the SSD.
    4. Write the entire 256 KB block (the modified 1 KB plus the 255 KB of unchanged data) back to the SSD.

    (A toy simulation of this read/modify/erase/write cycle appears after this list.)
  2. To minimize performance and wear issues, the filesystem must interact with SSDs differently than with magnetic disks.

    The design of NAND flash memory, with its block-based erasing and page-based writing, is a result of trade-offs between cost, complexity, performance, and storage density. (The cells in a NAND block are interconnected in such a way that it's not feasible to isolate and reset individual pages without affecting others.) Hence, the entire block must be reset at once. While block-based erasing is less efficient for small, frequent writes, it's more efficient for larger, less frequent writes, which is a common use case for many storage applications. If you're frequently updating small bits of data, you're effectively erasing and rewriting large blocks each time, which can lead to faster wear and reduced lifespan of the SSD.
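To make the cost concrete, below is a toy C simulation of the read/modify/erase/write cycle, under the simplified single-block model used in the example above. Real SSD controllers instead remap updated pages into already-erased blocks and defer erasure to garbage collection, so this is only an illustration of the worst case, not of actual firmware. Updating 1 KB forces 256 KB of physical writes, a write amplification factor of 256.

  #include <stdio.h>
  #include <string.h>

  #define BLOCK_SIZE (256u * 1024u)  /* smallest unit that can be erased */

  static unsigned char flash_block[BLOCK_SIZE];   /* one simulated erase block */
  static unsigned long bytes_physically_written;  /* running wear tally        */

  /* Update `len` bytes at `offset` inside the block, following the
     simplified read / modify / erase / program cycle described above. */
  static void update_block(unsigned offset, const unsigned char *data, unsigned len)
  {
      static unsigned char copy[BLOCK_SIZE];

      memcpy(copy, flash_block, BLOCK_SIZE);   /* 1. read the whole block         */
      memcpy(copy + offset, data, len);        /* 2. modify the in-memory copy    */
      memset(flash_block, 0xFF, BLOCK_SIZE);   /* 3. erase the whole block        */
      memcpy(flash_block, copy, BLOCK_SIZE);   /* 4. program the whole block back */

      bytes_physically_written += BLOCK_SIZE;
  }

  int main(void)
  {
      unsigned char kilobyte[1024] = {0};      /* the 1 KB we actually want to change */

      update_block(0, kilobyte, sizeof kilobyte);

      printf("logical write:  %zu bytes\n", sizeof kilobyte);
      printf("physical write: %lu bytes (write amplification x%lu)\n",
             bytes_physically_written,
             bytes_physically_written / (unsigned long)sizeof kilobyte);
      return 0;
  }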

3.1.3. A Virtual File System (VFS) presents a single unified view of all different disks and files in the computer.

UNIX operating systems provide a simple mechanism for interacting with the storage devices in a computer: all of these devices expose essentially the same interface, and the only real API difference is how each file is opened.
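A minimal C sketch of that uniform interface, assuming a POSIX system; the two paths below (/etc/hostname, an ordinary file, and /dev/urandom, a character device) are illustrative examples only and are not tied to Pintos or the course. Only open() cares what kind of object is behind the name; read() and close() are identical for every file type.

  #include <fcntl.h>   /* open()             */
  #include <stdio.h>   /* printf(), perror() */
  #include <unistd.h>  /* read(), close()    */

  /* Read up to 16 bytes from a path and report how many were returned.
     The same code works for regular files, device nodes, FIFOs, ...    */
  static void peek(const char *path)
  {
      char buf[16];
      int fd = open(path, O_RDONLY);          /* the only step that differs per object */
      if (fd < 0) { perror(path); return; }

      ssize_t n = read(fd, buf, sizeof buf);  /* identical for every file type */
      printf("%s: read %zd bytes\n", path, n);
      close(fd);
  }

  int main(void)
  {
      peek("/etc/hostname");   /* an ordinary file on some filesystem */
      peek("/dev/urandom");    /* a character device */
      return 0;
  }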

3.2. 2. Resource Sharing

In UNIX, multiple processes can manipulate the same file. How should the limited hardware resources be shared among different processes?

3.2.1. Scenario: Delete Shared Files

  • Process A opens file foo.txt to read and write it.
  • Later, process B deletes foo.txt, while A is still using it. (UNIX file deletion is performed using the unlink() system call).

The operating system must coordinate access to these shared resources in a well-defined manner (e.g. to maintain system security, correctness, performance, etc.). Different operating systems give different answers, each suited to its own context:

  1. Windows: Telling process B to "buzz off".

    In Windows, if process B tries to delete a file that process A is still using, the request is refused: the deletion fails until process A closes the file.

  2. Unix: "Temporary Files"

    Process B is allowed to delete foo.txt while A is still using it: the OS simply removes the directory entry, so foo.txt no longer appears in the filesystem, but the file data itself remains until process A is finished working with it. Once A closes the file (or terminates), the OS reclaims the space used by foo.txt.

    1. Interprocess Communication

      This method provides a simple and effective way for processes to exchange data without leaving a permanent file on the filesystem. It enhances security and cleanliness, as the file is automatically removed when the last process holding it open closes it or terminates.

    2. Temporary Files

      They are essential for cases where data needs to be written and read back during process execution, but there is no need to retain the data after the process completes. A process can create a temporary file and immediately unlink() it. This ensures that the file is not accessible from the filesystem, providing privacy and security (a minimal sketch of this open-then-unlink pattern appears at the end of this section).

      "Research" and "development" are two different things. "Development" has clear goals but "research" is goalless, because it's the act of discovering something new. If you are a researcher, you need to continue enjoying your research at hand. Other than Unix, I've had many research topics like voice recognition, language, searching, security, games, and whatever I found intersting. Unix resulted from a research on new thing we were merely interested in, and we are very luckey it turned out to be very fruitful. – Ken L. Thompson

4. Brief History of OS

4.1. 1. Early Mainframes:

Initially, mainframe computers were general-purpose, handling everything from loading programs to printing results. This process was inefficient due to significant waiting times.

4.2. 2. Batch Processing:

To improve efficiency, later mainframes adopted batch processing. This approach allowed output to be handled by simpler, cheaper computers, saving time and better utilizing resources.

4.3. 3. Multiprogramming:

This concept revolutionized computing by allowing multiple programs to reside in memory simultaneously. The OS could switch between programs when one was idle (e.g., waiting for I/O operations), thereby keeping the CPU busy and improving system throughput.

4.4. 4. Memory Partitioning and Process Isolation:

With multiprogramming, mainframe memory was divided into sections for each job. This raised challenges like ensuring process isolation (preventing processes from interfering with each other) and securing the OS's control over memory allocation and management.

4.5. 5. Timesharing and Multitasking:

Timesharing, an extension of multiprogramming, allowed multiple users to interact directly with the mainframe, entering commands and receiving immediate feedback. This led to the concept of multitasking in operating systems, as the system would switch between users, keeping the CPU highly active.

4.6. 6. Microcomputers and GUIs:

With the spread of integrated circuits and cheaper processors, individual users began using their own microcomputers. Graphical User Interfaces (GUIs) were developed to simplify computer usage, even for those not keen on understanding the underlying technology.

4.7. 7. Multiprocessor and Multicore Systems:

As increasing CPU frequency became more challenging, multiprocessor or multicore systems emerged. These systems required OS support for complex tasks like coordinating access to shared data structures and process scheduling, taking into account multiprocessor setups to optimize cache utilization.

4.8. 8. Virtualization and Hypervisors:

Modern computing allows for running an operating system as an application within another OS. This has led to the development of virtualization technologies and hypervisors, creating new layers of complexity and capability in computing.

5. Research & Development

Each stage in this evolution reflects a response to technological advancements and changing user needs, from efficiency and resource utilization in the early days to accessibility, multitasking, and now virtualization. Have you ever thought about the reason behind this? What has made operating systems able to evolve so readily over the years?

Ken Thompson's quote captures the essence of research and development beautifully.

5.1. Natural Needs (Research)

Research is exploratory and often without a predefined goal, driven by curiosity and the desire to uncover new knowledge or solve novel problems. In science and technology, innovation often stems from addressing the most immediate and natural problems/needs. Solutions that seem simple in hindsight are frequently the result of a deep understanding of the fundamental issues at hand.

The history of operating systems reflects a natural progression based on user needs and technological capabilities. Each stage in this evolution, from batch processing to modern virtualized and multicore environments, addresses specific challenges and user requirements in a natural and logical manner.

5.2. Straightforward & Simple Approach (Development)

Development, on the other hand, is more goal-oriented, focused on creating specific products or solutions in a straightforward, simple way. Many of the most successful technologies are grounded in simplicity. This principle is evident in Unix's design philosophy, which emphasizes simplicity, clarity, and modularity. These qualities make it both powerful and versatile, allowing it to adapt to changing needs over time.

Created: 2024-10-28 Mon 19:29
