MIT 6.828 Study Notes 000 (book::c1)

MIT 6.828 is now titled 6.1810. The course number seems to have changed, but the link still says 6.828. This post covers Chapter 1 of the Xv6 book.

Xv6 system calls:

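| System call | Description |
| --- | --- |
| `int fork()` | Create a child process; returns the child's PID (in the parent) |
| `int exit(int status)` | Terminate the current process and report `status` to the parent's `wait`; does not return |
| `int wait(int* status)` | Wait for a child to exit and store its exit status in `*status`; returns the child's PID |
| `int kill(int pid)` | Terminate the process with the given PID |
| `int getpid()` | Return the current process's PID |
| `int sleep(int n)` | Pause for `n` clock ticks |
| `int exec(char* file, char* argv[])` | Load `file` and execute it with arguments `argv`; returns only on error |
| `char* sbrk(int n)` | Grow the current process's memory by `n` bytes; returns the start of the new memory |
| `int open(char* file, int flags)` | Open a file with the given `flags`; returns an `fd` (file descriptor) |
| `int write(int fd, char* buf, int n)` | Write `n` bytes from `buf` to `fd`; returns `n` |
| `int read(int fd, char* buf, int n)` | Read up to `n` bytes from `fd` into `buf`; returns the number of bytes actually read, `0` at end of file |
| `int close(int fd)` | Close `fd` |
| `int dup(int fd)` | Return a new `fd` that refers to the same file |
| `int pipe(int p[])` | Create a pipe; `p[0]` is the read end and `p[1]` the write end |
| `int chdir(char* dir)` | Change the current directory to `dir` |
| `int mkdir(char* dir)` | Create the directory `dir` |
| `int mknod(char* file, int, int)` | Create a device file |
| `int fstat(int fd, struct stat* st)` | Put information about the file `fd` refers to into `st` |
| `int stat(char* file, struct stat* st)` | Put information about `file` into `st` |
| `int link(char* file1, char* file2)` | Create a new name `file2` for `file1` |
| `int unlink(char* file)` | Remove `file` |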

Unless noted otherwise, a return value of 0 means success and -1 means failure.

1. Processes and memory

1.1. `fork`

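The fork system call creates a new process that is an exact copy of the parent, and it returns in both the parent and the child. In the parent, fork returns the child's PID; in the child, it returns 0. (This may be why the child is designed to be identical to the parent: the return value is then the only thing that tells the two apart.)

PID: the ID the kernel assigns to a process.

A simple fork demo. Because the parent and the child each have their own memory space, the value of pid differs between the two.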

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, const char* argv[]) {
    int pid = fork();
    if (pid == 0) {
        // child: fork returned 0
        printf("child: exiting\n");
        exit(0);
    }
    else if (pid > 0) {
        // parent: fork returned the child's PID
        printf("parent: child=%d\n", pid);
        int child_status = 0;
        wait(&child_status);
        // POSIX wait stores an encoded status; WEXITSTATUS extracts the exit code
        printf("parent: child[%d] exited[%d]\n", pid, WEXITSTATUS(child_status));
    }
    else {
        printf("fork error\n");
    }
    return 0;
}

1.2. `exec`

exec replaces the calling process's context with another program and then runs that program's instructions. A simple example:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    int pid = fork();
    if (pid == 0) {
        // child process
        execv(argv[1], &argv[2]);
        // execv returns only on failure
        perror("execv");
        _exit(127);
    }
    int child_status = 0;
    wait(&child_status);
    printf("child process exited with status %d\n", WEXITSTATUS(child_status));
    return 0;
}

The child's argv is taken from the command line. For example, with the code above compiled to a.out:

a.out /usr/bin/echo echo 'hello world'

the execv call in the child becomes execv("/usr/bin/echo", {"echo", "hello world", NULL}), and the output is:

hello world
child process exited with status 0

POSIX execv is close enough to Xv6's exec, so the demo uses execv.

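2. I/O and file descriptors

An fd is usually a small integer. A process obtains an fd from the kernel by opening a file, directory, or device, by creating a pipe, or by duplicating an existing fd with dup. The fd is a layer of abstraction: it can refer to a file, a directory, or a device, and it makes all of them look like streams of bytes. Xv6 uses the fd as an index into a per-process file table, so every process has its own fd numbering starting at 0, and a newly allocated fd is always the lowest unused number. By convention a process has three fds open: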

0: stdin
1: stdout
2: stderr

#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

// NOLINTNEXTLINE(misc-unused-parameters)
int main(int argc, char* argv[]) {
    if (fork() == 0) {
        close(STDIN_FILENO);          // fd 0 is now free
        open("input.txt", O_RDONLY);  // lowest free fd, so this returns 0
        char* cat_argv[] = {"cat", NULL};
        execvp(cat_argv[0], cat_argv);
        _exit(127); // reached only if execvp failed
    }
    wait(NULL);
    return 0;
}

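exec replaces the process's memory but keeps the file table inherited through fork, so by default the fds stay as they were.

The code above emulates the shell command:

cat < input.txt

The close(STDIN_FILENO) call closes the child's stdin (0), and the open call that follows opens input.txt. At that point the lowest unused fd is 0, so this open is guaranteed to return 0. That redirects stdin to input.txt, and exec then runs the cat program.

clang-tidy warns that open should be passed O_CLOEXEC, which closes the fd when exec runs. This example is precisely the exception, because the fd has to stay open across exec. Xv6 does not implement the flag either, so it is left out.

This also explains why exec and fork are two separate system calls: between fork and exec the programmer gets a chance to redirect fds.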

After fork, the parent's and the child's fds share the same offset. For example:

if (fork() == 0) {
    write(STDOUT_FILENO, "hello ", 6);
    exit(0);
}
else {
    wait(NULL);
    write(STDOUT_FILENO, "world\n", 6);
}

This writes hello world\n to stdout. Similarly, after dup the two fds share the same offset:

int f = dup(STDOUT_FILENO);
write(STDOUT_FILENO, "hello ", 6);
write(f, "world\n", 6);

This also writes hello world\n to stdout. The same applies to the shell's:

some_command 2>&1

When some_command runs, its fd 2 was obtained by dup-ing fd 1.

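3. Pipes

The kernel provides a buffer and exposes it to processes as a pair of fds, one read-only and one write-only. Pipes are commonly used for inter-process communication, for example to feed a piece of text straight into wc: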

#include <sys/wait.h> // wait
#include <unistd.h> // fork

// NOLINTNEXTLINE(misc-unused-parameters)
int main(int argc, char* argv[]) {
    int p[2];
    pipe(p);
    if (fork() == 0) {
        close(p[1]);          // the child does not write to the pipe
        close(STDIN_FILENO);
        dup(p[0]);            // lowest free fd is 0: stdin now reads the pipe
        close(p[0]);
        // wc inherits the redirected stdin across exec
        char* wc_argv[] = {"wc", NULL};
        execv("/usr/bin/wc", wc_argv);
        _exit(127); // reached only if execv failed
    }
    close(p[0]);
    write(p[1], "hello world\n", 12);
    close(p[1]);
    wait(NULL);
    return 0;
}

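This is equivalent to the shell command:

echo 'hello world' | wc

The pipe call creates a pipe whose read-only fd is p[0] and whose write-only fd is p[1]. In the child, close(STDIN_FILENO) followed by dup(p[0]) redirects stdin to p[0], by the same principle as the cat < input.txt emulation above. The child then execs wc, so wc reads from the pipe. The parent's write to p[1] puts data into the pipe, and wc receives it. Output: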

1       2      12

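After fork, both the parent and the child hold the pair of fds created by pipe. The child's wc reads from the pipe, and that read blocks until one of two things happens:
- another process writes data to the pipe's write-only fd
- every process holding the pipe's write-only fd has closed it, in which case read returns 0 to signal end of file
This is why the child must close its copy of the write-only fd with close(p[1]). Otherwise, after wc has read everything the parent wrote, its next read would block forever, because the child itself would still hold a write end open.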

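The same result can be had without a pipe:

echo 'hello world' > /tmp/foo; wc < /tmp/foo

But a pipe has several advantages over a temporary file:

- there is no temporary file to clean up by hand
- a pipe can in principle carry an unbounded stream of data, whereas a temporary file needs enough free disk space to hold all of it
- the sender and receiver of a pipe can run at the same time, whereas with a temporary file the receiver cannot start until the sender has finished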

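4. File system

The Xv6 file system has data files and directories. A directory contains named references to data files and to other directories, so the directories form a tree. For example, /etc/nginx/nginx.conf refers to the file named nginx.conf in the directory nginx, inside the directory etc, inside the root directory /.

An inode can have several names, called links. Each link is an entry in a directory, and the entry holds a file name and a reference to an inode.

The inode holds the file's metadata: its type (file, directory, or device), its size, the location of its content on disk, and the number of links that point to it. The three relate roughly like this:

(nginx.conf)  link named nginx.conf
     ↓
  (inode)     the file's inode
     ↓
  (data)      data on disk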

4.1. `chdir`

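Changes the current directory. The two snippets below open the same file (the first one also changes the process's current directory):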


chdir("/etc");
chdir("nginx");
open("nginx.conf", O_RDONLY);

open("/etc/nginx/nginx.conf", O_RDONLY);

4.2. `mkdir`

Creates a directory:

mkdir("/dir");

A file can be created by passing O_CREATE to open:

f = open("/dir/file", O_CREATE | O_WRONLY);
close(f);

4.3. `mknod`

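Creates a special file that refers to a device (a block device or a character device). The second and third arguments are the device's major number and minor number, which together uniquely identify a kernel device.

mknod("/console", 1, 1);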

4.4. `fstat`

The stat family of functions retrieves information about an inode:

struct stat st;
fstat(STDIN_FILENO, &st);

Xv6 defines struct stat in kernel/stat.h:

#define T_DIR     1   // Directory
#define T_FILE    2   // File
#define T_DEVICE  3   // Device

struct stat {
  int dev;     // File system's disk device
  uint ino;    // Inode number
  short type;  // Type of file
  short nlink; // Number of links to file
  uint64 size; // Size of file in bytes
};

4.5. `link`

The link system call creates another name (a hard link) for the same inode:

open("a", O_CREATE | O_WRONLY);
link("a", "b");

4.6. `unlink`

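The unlink system call removes a link (hard link). Once an inode has no links pointing to it and no fd in any process refers to it, the kernel frees the inode, and the disk space that held the file becomes available again:

unlink("a");

This property can reportedly be used to create a nameless temporary file:

f = open("/tmp/foo", O_CREATE | O_RDWR);
unlink("/tmp/foo");

After /tmp/foo is created and its link removed, the name foo no longer shows up in /tmp, but fd f still refers to the file. The kernel does not free the inode right away, so the process can keep reading and writing the file. When fd f is closed, the kernel frees the inode automatically.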

5. Xv6

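POSIX (Portable Operating System Interface) standardizes the Unix system call interface. Xv6 leaves out some system calls (such as lseek), and some of the ones it does have differ from POSIX (its exec, for example), so it is not POSIX compliant and only counts as Unix-like.

Xv6 skips the bells and whistles. It is a minimal operating system that exists for teaching, without the rich set of system calls of a modern operating system and without kernel services such as networking, a window system, user-level threads, or a wide range of device drivers. In Unix terms, all Xv6 processes also run as root.

Exercise

Problem: use a pair of pipes to implement a "ping-pong" between a parent and a child process, one byte at a time, and measure the performance (exchanges per second).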

#include <unistd.h> // fork

// NOLINTNEXTLINE(misc-unused-parameters)
int main(int argc, char* argv[]) {
    int parent_pipe2child[2];
    if (pipe(parent_pipe2child) != 0) {
        write(STDERR_FILENO, "pipe failed\n", 12);
        return -1;
    }
    int child_pipe2parent[2];
    if (pipe(child_pipe2parent) != 0) {
        write(STDERR_FILENO, "pipe failed\n", 12);
        return -1;
    }
    int pid = fork();
    if (pid < 0) {
        write(STDERR_FILENO, "fork failed\n", 12);
        return -1;
    }
    char buf = 'a';
    if (pid > 0) {
        // parent process
        close(parent_pipe2child[0]);
        close(child_pipe2parent[1]);
        while (1) {
            write(parent_pipe2child[1], &buf, 1); // ping
            read(child_pipe2parent[0], &buf, 1);
        }
    }
    else {
        // child process
        close(parent_pipe2child[1]);
        close(child_pipe2parent[0]);
        while (1) {
            read(parent_pipe2child[0], &buf, 1); // pong
            write(child_pipe2parent[1], &buf, 1);
        }
    }
    return 0;
}

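I spent a long time thinking about what the best way to measure this would be and could not settle on one, so I left the measurement out.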

RayAlto's Blog
