The fundamental building block of all I/O in Unix is a sequence of bytes. Most programs work with an even simpler abstraction — a stream of bytes or an I/O stream.
A process references byte streams with the help of descriptors, also known as file descriptors. Pipes, TCP connections, files, TTYs, FIFOs, event queues are all examples of streams referenced by a descriptor.
Flags
open(2)
can be called with various flags that affect the behaviour of the actual file descriptor.
Commonly set flags are as follows:
O_APPEND
: The file is opened in append mode. Before each write(2), the file offset is positioned at the end of the file. The modification of the file offset and the write operation are performed as a single atomic step.O_ASYNC
: Enable signal-driven I/O: generate a signal when input or output becomes possible on this file descriptor.O_CLOEXEC
: the file descriptor remains open in the calling process but onexec(2)
the file descriptor is automatically closed in the newly exec’d program.O_CREAT
: If pathname does not exist, create it as a regular file.O_NONBLOCK
: When possible, the file is opened in nonblocking mode. Neither the open() nor any subsequent I/O operations on the file descriptor which is returned will cause the calling process to wait.- By default, read on any descriptor blocks if there’s no data available. The same applies to write or send.
- When a descriptor is set in nonblocking mode, an I/O system call on that descriptor will return immediately, even if that request can’t be immediately completed. The return value can be any of the following:
- an error: when the operation cannot be completed at all
- a partial count: when the input or output operation can be partially completed
- the entire result: when the I/O operation could be fully completed
Fork/Exec Semantics
This also highlights a difference between exec(2)
and fork(2)
:
fork
: The child process is an exact duplicate of the parent process, inheriting most of its attributes, including open file descriptors, environment variables, and process identity.- After the
fork()
call, both the parent and child processes continue execution from the point immediately after thefork()
call.
- After the
exec
: When a process calls one of theexec()
functions, the program specified by the function arguments is loaded into the process’s memory space, replacing the existing program.- The process ID (PID) of the process remains the same, but the program, file descriptors, code, data, heap, and stack are replaced with those of the new program. The new program starts executing from its entry point (
main()
function), and the original program is effectively terminated and no resources are shared.
- The process ID (PID) of the process remains the same, but the program, file descriptors, code, data, heap, and stack are replaced with those of the new program. The new program starts executing from its entry point (
When spawning new programs from existing programs, both tend to be combined as fork-exec
:
- The parent process first calls
fork()
to create a child process, which is an exact duplicate of the parent process. - The child process then calls one of the
exec()
functions to replace its current program with a new program specified by the function arguments.
Readiness
A descriptor is considered ready if a process can perform an I/O operation on the descriptor without blocking.
A descriptor changes into a ready state when an I/O event happens, such as the arrival of new input or the completion of a socket connection or when space is available on a previously full socket send buffer after TCP transmits queued data to the socket peer.
There are two ways to find out about the readiness status of a descriptor: edge-triggered and level-triggered.
- Level-triggered (poll): To determine if a descriptor is ready, the process tries to perform a non blocking I/O operation.
- Edge-triggered (push): The process receives a notification only when the file descriptor is “ready”. This is normally done with
poll(2)
File Entry
Every descriptor points to a data structure called the file entry. The file entry maintains a per descriptor file offset f_pos
in bytes from the beginning of the file entry object.
Each file entry has an array of function pointers const struct file_operations *f_op;
. This array of function pointers translates generic operations on file descriptors to file-type specific implementations.
Lifecycle
Descriptors are either created explicitly by system calls like open
, pipe
, socket
or are inherited from the parent process.
Descriptors are released when:
- the process exits. If the process in question was the last to reference the file entry, the kernel then deallocates the file entry too
- by calling the
close
system call - implicitly after an
exec
when the descriptor is marked as close on exec (viaO_CLOEXEC
).