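[Linux] A Detailed Walkthrough of Basic I/O and File Descriptors

0. File I/O in C

In C, we have already learned library functions for reading and writing files, such as fopen, fwrite, fread, and fprintf.

The modes for opening a file with fopen in C are: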

        r    Open text file for reading.
             The stream is positioned at the beginning of the file.

        r+   Open for reading and writing.
             The stream is positioned at the beginning of the file.

        w    Truncate file to zero length or create text file for writing.
             The stream is positioned at the beginning of the file.

        w+   Open for reading and writing.
             The file is created if it does not exist, otherwise it is truncated.
             The stream is positioned at the beginning of the file.

        a    Open for appending (writing at end of file).
             The file is created if it does not exist.
             The stream is positioned at the end of the file.

        a+   Open for reading and appending (writing at end of file).
             The file is created if it does not exist. The initial file position
             for reading is at the beginning of the file,
             but output is always appended to the end of the file.

Example:

#include <stdio.h>
#include <string.h>

int main()
{
    FILE *fp = fopen("myfile", "w");
    if (!fp) {
        printf("fopen error!\n");
        return 1;   // nothing to write to, so stop here
    }
    const char *msg = "hello IO!\n";
    int count = 5;
    while (count--) {
        fwrite(msg, strlen(msg), 1, fp);
    }
    fclose(fp);
    return 0;
}

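Since standard output is also a file, there are several ways to print information to the screen:

#include <stdio.h>
#include <string.h>

int main()
{
    const char *msg = "hello fwrite\n";
    fwrite(msg, strlen(msg), 1, stdout);   // write through the stdout stream
    printf("hello printf\n");              // printf writes to stdout by default
    fprintf(stdout, "hello fprintf\n");    // name the stream explicitly
    return 0;
}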

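stdin, stdout, and stderr:

A C program has three standard streams opened for it by default: stdin, stdout, and stderr.

All three have type FILE *, the same type that fopen returns, i.e. a file pointer.

C's file functions are in fact wrappers around system calls. Using Linux as the example, we now turn to the system-level I/O interface.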

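1. System I/O

Using system calls directly, we can write code with the same effect as the fopen example above:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>

int main()
{
    umask(0);   // clear the umask so the file gets exactly mode 0644
    // Open/create the file; like fopen(..., "w"): write-only, create, truncate
    int fd = open("myfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    int count = 5;
    const char *msg = "hello IO!\n";
    int len = strlen(msg);
    while (count--) {
        // Write to the file.
        // fd: file descriptor; msg: start address of the buffer;
        // len: number of bytes we want to write this time.
        // Return value: number of bytes actually written.
        write(fd, msg, len);
    }
    close(fd);   // close the file
    return 0;
}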

Now let's look at these system call interfaces in detail:

1.1 open

Definition:

#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

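pathname: path of the file to open.
flags: controls how the file is opened: read, write, create, and so on.
mode (optional): permission bits for the file; only used when the file is created.

Parameters:

flags: controls the behavior of open. Common flags:
- O_RDONLY: open for reading only.
- O_WRONLY: open for writing only.
- O_RDWR: open for reading and writing.
- O_CREAT: create the file if it does not exist.
- O_TRUNC: if the file already exists and is opened for writing, truncate its length to 0.
- O_APPEND: append mode; every write adds data at the end of the file.
Exactly one of O_RDONLY, O_WRONLY, and O_RDWR must be given; the other flags are combined with it using bitwise OR |, for example O_CREAT | O_WRONLY | O_TRUNC.
mode: the permissions for a newly created file, written in octal. For example, 0644 gives the owner read and write permission, and gives the group and others read-only permission. The permissions actually applied are mode & ~umask.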

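Return value:

On success, open returns a file descriptor, which is a non-negative integer.
On failure, it returns -1 and sets errno to indicate the error (for example, the file does not exist or permission is denied).

Example: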

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main() {
    int fd = open("example.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    // Write data. Use strlen, not sizeof: sizeof(data) is the size of the pointer.
    const char *data = "Hello, Open Function!";
    write(fd, data, strlen(data));
    // Close the file
    close(fd);
    return 0;
}

1.2 write

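Definition:

#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);

fd: the file descriptor returned by open, identifying the file or device to write to.
buf: pointer to the buffer holding the data to write.
count: number of bytes to write.

Return value:

On success, write returns the number of bytes actually written. This is usually equal to count, but it can be smaller in some situations (for example, when the disk is nearly full, or when writing to a pipe or network socket).
On failure, it returns -1 and sets errno to indicate the error (for example, no space left on the device or an invalid file descriptor).

Example: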

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main() {
    // Open or create example.txt
    int fd = open("example.txt", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    // Data to write
    const char *data = "Hello, write function!";
    // Write it; strlen(data) is the string length (sizeof(data) would be the pointer size)
    ssize_t bytes_written = write(fd, data, strlen(data));
    if (bytes_written == -1) {
        perror("write");
        close(fd);
        return 1;
    }
    printf("Wrote %zd bytes to example.txt\n", bytes_written);
    // Close the file descriptor
    close(fd);
    return 0;
}

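Notes:

Amount written: write may write fewer bytes than requested, so check the return value and call write again for the remainder when necessary.
File offset: each write starts at the current file offset and advances it by the number of bytes written (bytes_written).
Buffer size: for better performance, write in larger chunks; many small writes mean many system calls and lower throughput.
Permissions: the descriptor must have been opened with write access (O_WRONLY or O_RDWR); otherwise write fails with EBADF.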

1.3 read

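Definition:

#include <unistd.h>
ssize_t read(int fd, void *buf, size_t count);

fd: the file descriptor returned by open, identifying the file or device to read from.
buf: pointer to the buffer that receives the data.
count: the maximum number of bytes to read.

Return value:

On success, read returns the number of bytes actually read, which may be less than count. A return value of 0 means the end of the file (EOF) has been reached.
On failure, it returns -1 and sets errno to indicate the error (for example, an invalid file descriptor or a descriptor not opened for reading).

Example: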

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    // Open example.txt (assumed to exist and contain some data)
    int fd = open("example.txt", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    // Buffer for the data; leave room for the terminating '\0'
    char buffer[128];
    // Read the data
    ssize_t bytes_read = read(fd, buffer, sizeof(buffer) - 1);
    if (bytes_read == -1) {
        perror("read");
        close(fd);
        return 1;
    }
    // Terminate the string and print what was read
    buffer[bytes_read] = '\0';
    printf("Read %zd bytes: %s\n", bytes_read, buffer);
    // Close the file descriptor
    close(fd);
    return 0;
}

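Notes:

Amount read: read may return fewer bytes than requested, especially when reading from pipes, terminals, or network connections, where the amount of available data is not known in advance.
File offset: after each read, the file offset advances by bytes_read, so the next read continues where this one stopped.
End of file (EOF): at the end of the file read returns 0; this is the condition that ends a read loop.
Buffer size: for large files, a larger buffer reduces the number of system calls and improves performance.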

1.4 lseek

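Definition:

#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);

fd: the file descriptor returned by open, identifying the file to operate on.
offset: the offset relative to the position given by whence; it can be positive or negative.
whence: the reference point for offset, normally one of these three constants:
- SEEK_SET: the beginning of the file.
- SEEK_CUR: the current file offset.
- SEEK_END: the end of the file.

Return value:

On success, lseek returns the new file offset, measured from the beginning of the file.
On failure, it returns -1 and sets errno to indicate the error.

Uses:

Positioning reads and writes: by changing the file offset, you can read or write at any position in the file.
Extending a file: you can move the offset past the end of the file and then write, which increases the file size.
Finding the file size: seek to the end of the file; the returned offset is the file size.

Example: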

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    // Open example.txt
    int fd = open("example.txt", O_RDWR);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    // Move the file offset to 5 bytes from the start of the file
    off_t new_pos = lseek(fd, 5, SEEK_SET);
    if (new_pos == -1) {
        perror("lseek");
        close(fd);
        return 1;
    }
    // Read 10 bytes and print them
    char buffer[11];
    ssize_t bytes_read = read(fd, buffer, 10);
    if (bytes_read == -1) {
        perror("read");
        close(fd);
        return 1;
    }
    buffer[bytes_read] = '\0';  // terminate the string
    printf("Read %zd bytes: %s\n", bytes_read, buffer);
    // Close the file descriptor
    close(fd);
    return 0;
}

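Notes:

Sign of the offset: with SEEK_CUR and SEEK_END the offset may be negative, which moves the position back toward the start of the file.
Range check: the resulting offset must not be negative; in that case lseek fails with EINVAL. Seeking past the end of the file is allowed, but it only makes sense if you intend to extend the file.
Extending the file: if you move the offset beyond the end of the file and then write, the file grows, and the gap reads back as null bytes ('\0').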

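Common usage:

Getting the file size:
- lseek(fd, 0, SEEK_END) returns the file size (and leaves the offset at the end of the file).
Skipping bytes:
- lseek(fd, offset, SEEK_CUR) skips offset bytes from the current position.
Resetting the read/write position:
- lseek(fd, 0, SEEK_SET) moves the file offset back to the beginning.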

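2. File Descriptors (fd)

2.1 Descriptors Opened by Default

By default, a Linux process starts with three open file descriptors: standard input 0, standard output 1, and standard error 2.

So input and output can also be done like this: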

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main()
{
    char buf[1024];
    ssize_t s = read(0, buf, sizeof(buf) - 1);   // read from standard input; keep room for '\0'
    if (s > 0) {
        buf[s] = 0;
        write(1, buf, strlen(buf));   // write to standard output
        write(2, buf, strlen(buf));   // write to standard error
    }
    return 0;
}

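File descriptor	Name	Purpose
0	Standard input (stdin)	Read data from the keyboard or an input stream
1	Standard output (stdout)	Write data to the screen or an output stream
2	Standard error (stderr)	Write error messages to the screen

These descriptors are already open when the program starts, so they can be used without calling open. For example, printf writes to standard output by default, i.e. file descriptor 1.

2.2 The Logical Structure Behind File Descriptors

A file descriptor is really an index into the array struct file *fd_array[] inside the process's files_struct, the table (reached from the process's task_struct) that manages its open files. Each struct file * entry points to an open file, which is why the descriptor alone is enough to find and operate on that file.

2.3 Properties of File Descriptors

Per-process uniqueness: within one process a descriptor number is unique, but different processes can use the same number to refer to different files.
Limited resource: descriptors are a finite system resource, and the operating system sets a per-process limit on how many can be open at once (shown by ulimit -n).
Reference counting: several descriptors can refer to the same open file (for example after redirection or after duplicating a descriptor). The open file keeps a reference count and is actually released only when the last descriptor referring to it is closed.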

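Descriptor allocation rule:

In the files_struct array, the smallest index that is not currently in use becomes the new file descriptor.

This can be verified with the following code: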

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = open("myfile", O_RDONLY);   // myfile was created by the earlier example
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

The program prints fd: 3, the smallest index not already taken by 0, 1, and 2.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    close(0);      // free descriptor 0
    // close(2);   // alternatively, free descriptor 2 instead
    int fd = open("myfile", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

Now fd is 0 (or 2 if you call close(2) instead of close(0)), which confirms the rule above.

3. Redirection

3.1 How Redirection Works

First, consider this command:

ls -al > test.txt

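Here '>' redirects the output of the command on its left into the file on its right. The command on the left normally writes to standard output; after redirection, everything ls prints goes into test.txt instead. If test.txt does not exist it is created, and if it does exist it is truncated first. '>>' is append redirection: output is added to the end of the file instead.

So how is redirection implemented?

Take the command above as an example: before running ls, the shell makes file descriptor 1 of the ls process refer to test.txt instead of the terminal. Functions that write to standard output by default, such as printf, still use file descriptor 1, but the file behind that descriptor is now test.txt.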

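3.2 The dup2 System Call

Redirection can be done with the dup2 system call.

Definition:

#include <unistd.h>
int dup2(int oldfd, int newfd);

oldfd: the existing file descriptor to copy.
newfd: the target file descriptor; after the call it refers to the same open file as oldfd.

Behavior:

dup2 makes newfd refer to the file that oldfd refers to.
If newfd is already open, dup2 closes it first (silently) and then makes it refer to oldfd's file.
If oldfd is a valid descriptor and equal to newfd, dup2 does nothing and simply returns newfd.

Return value:

On success, dup2 returns newfd.
On failure, it returns -1 and sets errno to indicate the error (for example, EBADF if oldfd is not an open file descriptor).

Example: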

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    // Open the target file
    int fd = open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    // Redirect standard output (descriptor 1) to fd
    if (dup2(fd, 1) == -1) {
        perror("dup2");
        close(fd);
        return 1;
    }
    // printf now writes to output.txt instead of the terminal
    printf("This message is redirected to output.txt\n");
    fflush(stdout);   // flush the stdio buffer into descriptor 1 now
    // Close the original descriptor; descriptor 1 still refers to output.txt
    close(fd);
    return 0;
}

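Use cases:

Redirecting standard output or standard error: dup2 can redirect stdout or stderr to a file or device, for example to save a program's output or logs.
Connecting processes: combined with pipes (pipe), dup2 can redirect one process's output into another process's input, enabling more complex inter-process communication.
Input redirection: for example, redirect standard input (stdin, descriptor 0) to a file so that the program reads that file's contents instead of terminal input.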