Linux Kernel System Calls

  • DevynCJohnson

    Many GNU/Linux users have probably heard of system calls. A system call is a special function/command that a program uses to communicate with the kernel of the operating system. The Linux kernel recognizes a variety of system calls, and learning them helps people understand how GNU/Linux works. Even general/mainstream Linux users may find it interesting to know just how complicated the system is, even though the user cannot see the complexity.

    NOTE: "Kernel call" and "syscall" are other names for a system call.

    There are about six kinds of system calls (depending on how you want to classify them). These six are process control, information maintenance, communication, file management, memory management, and device management. "Information maintenance" refers to system time, attributes of files and devices, and many other sets of information. "Communication" refers to networking, data transfer, and attachment/detachment of remote devices.

    When the Linux kernel receives a system call, it executes the command in kernel mode (privileged execution mode). On x86 processors, this privileged mode is commonly called ring 0 (pronounced "ring zero").

    NOTE: Some people get interrupts and system calls mixed up. A system call is a command while an interrupt is an event that causes the CPU to stop the current task and tend to the event. Hardware interrupts are called "interrupts" and software interrupts are called "traps" or "exceptions".

    Some of you may be wondering how an application gets the code for the standard system calls when it is programmed. The system calls themselves are implemented in the kernel, but applications reach them through wrapper functions in the GNU C Library, which is also called glibc. This is the library used for applications that run on systems using the Linux kernel, the Hurd kernel, or any GNU userland. Some derivatives are used in applications running on other kernels. For instance, after some major tweaking, glibc works on the NetBSD, OpenSolaris, and FreeBSD kernels. FreeBSD and NetBSD typically use their own libc called "BSD libc". The modified glibc mentioned is used in the Debian systems that use the FreeBSD and NetBSD kernels (Debian GNU/kFreeBSD and Debian GNU/NetBSD). Some other glibc derivatives and alternatives include:

    • μClibc - This libc is used in mobile devices using the Linux kernel (except Android).
    • Bionic - Used in the Android OS. Bionic is based on BSD libc.
    • dietlibc - This is a lightweight libc for embedded systems.
    • Embedded glibc (EGLIBC) - The libc used in embedded systems is a tweaked/optimized version of the standard glibc.
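    • klibc (Kernel libc) - A minimal libc used by the early-userspace programs that run while the system is starting up (for example, from the initramfs). Once the real root filesystem is mounted, programs use the regular libc such as glibc. However, not all distros use klibc.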
    • Newlib - Used in embedded systems.

    These libraries provide various headers for C/C++ programming. A program gains access to the system calls by including a header, as seen below. The system calls are not all declared in one header, so a program includes only the headers for the calls it needs (although an included header may declare some extra calls that the program does not use).

    #include <HEADER.h>

    One reason why applications compiled for one operating system do not work on another is that each operating system offers different system calls. Wine is a compatibility layer (not an emulator) that allows Windows software to work on GNU/Linux and other Unix and Unix-like systems. This works because the calls a Windows program makes are translated into system calls that Linux recognizes (there are other mechanisms that also help Windows applications work). If all systems used the same system calls, then some applications would be more cross-platform (some or many exceptions would exist). Think about source code. An application can be compiled on Linux, Solaris, and FreeBSD, but each binary would only work on the type of operating system on which it was compiled.

    Winelib is a development toolkit (a set of libraries and tools from the Wine project) used to compile source code that was written only for Windows systems, so that the compiled program works on Unix and Unix-like systems. Beware, Winelib is not perfect and may not work with some programs. Also, Winelib only works with 32-bit software. Usually, to use Winelib, the makefile for the source code needs some tweaking.

    Once a Linux user learns about the different system calls, it becomes clear just how complex Linux can be in completing common tasks. Below, many system calls are listed and explained. Notice the pair of parentheses after each one. These exist because in most programming languages, functions and commands that are to be executed end in "()" with parameters within the parentheses. Thus, all of the system calls are functions that are defined and programmed in the kernel's source code.

    NOTE: Some of the obsolete ones will be listed, but most obsolete syscalls will not be mentioned.

    System Calls

    • accept() - This system call accepts an incoming connection on a listening socket and returns a new file descriptor for the connected socket. The similar system call accept4() supports flags. This syscall supports various protocols such as IPv4/6, AppleTalk, IPX, and others (including sockets for communications between processes).
    • access() - This checks whether the calling process may access a file. This syscall first ensures the specified file exists. If so, then the system call checks if the process/user may read, write, and/or execute the file.
    • acct() - Process accounting is turned on and off using this system call. Process accounting is record keeping for executed commands. This allows admins to be aware of all commands that were executed, who/what executed them, etc.
    • add_key() - This adds or updates a key in the kernel's key management facility.
    • adjtimex() - This system call tunes the kernel's clock using an algorithm by David L. Mills. This system call can also get various information like the number of microseconds between ticks, current time, offset, precision, etc.
    • alarm() - An alarm is set which will send the SIGALRM signal to the calling process after the specified number of seconds.
    • alloc_hugepages() - An old system call (no longer used) that allocated and freed huge pages (large chunks of memory).
    • bind() - Newly created sockets get an address (sometimes called a name) from this system call. Typically, a server uses bind() to give its socket a well-known address, and a client then uses connect() on its side (the initiating side) to reach that address.
    • brk() - Memory can be given to or taken from running processes using this system call. The "program break" is the end of a process's data segment in memory. More memory can be given to the process by raising the program break, and lowering the program break deallocates memory.
    • cacheflush() - Flush the instruction and/or data cache for the specified range of addresses.
    • capget() - Get the capabilities of threads.
    • capset() - Set the capabilities of threads.
    • NOTE: A thread's capabilities refer to its attributes and permissions, such as permissions to access particular network ports, perform tasks normally reserved for Root, etc. A complete list of the capabilities may be found here /usr/include/linux/capability.h
    • chdir() - Yes, changing the current directory, which users regularly do with the cd command, is performed by a system call.
    • chmod() - Surprise! Another commonly used command is a system call. This one changes file permissions.
    • chown() - At this point, you may not be shocked; another system call that changes file ownership.
    • chroot() - This popular shell command is also a system call. This changes the root directory.
    • clock_getres() - This retrieves the clock's resolution. Resolution is another term for precision. This syscall only works on POSIX clocks.
    • clock_nanosleep() - This system call is like the commonly used "sleep" command, but this system call pauses a thread with nanosecond precision and lets the caller choose which clock to measure against.
    • clone() - Like fork(), this creates a child process, but the caller has more control over the result. There are several differences between fork() and clone(); the main one is that clone() accepts flags that choose what the child shares with its parent (for example, the memory space), while fork() always gives the child process its own memory space. Threads are created with clone().
    • close() - After a program is done writing or reading a file, the file should be closed to release memory and a file descriptor for reuse. The close() system call performs the closing of the file. Some documentation may say close() closes a file descriptor. This is also true. Closing a file descriptor just means a file descriptor is freed.
    • connect() - Create a connection to a socket.
    • creat() - This system call creates a file. No, this is not a typo. The system call really is called "creat()" without the second "e".
    • delete_module() - Kernel modules are unloaded by this system call. If the specified module is being used or is needed by other modules or the kernel itself, then the syscall will leave the module alone.
    • dup() - File descriptors can be duplicated with this system call. File descriptors would need to be duplicated when a thread viewing a file forks or when a command's output is redirected.
    • epoll_create() - Create a new file descriptor for a new instance of epoll.
    • epoll_ctl() - This system call is used to perform various tasks on an epoll file descriptor.
    • epoll_wait() - This system call waits until an event is performed on a specified epoll file descriptor. This syscall is important when software should only perform some action after a particular event happens to an epoll file descriptor.
    • eventfd() - The Event-File-Descriptor system call creates a file descriptor that is used to notify software about events or to make some software wait on some event.
    • NOTE: An abbreviation for file descriptor is "fd". So, system calls that end in "fd" may relate to file descriptors.
    • execve() - Have you ever wondered which system call (if any) causes an executable to run? Well, this system call is the one that does so. execve() can also execute scripts that begin with a valid hashpling (also called a shebang, "#!"). If the executable is dynamically linked, the Linux dynamic linker (ld.so) is then run to load and link the required libraries.
    • _exit() - Processes/threads use this system call to close themselves. Yes, there is an underscore at the beginning of this syscall. If a thread calls _exit(), then only that thread closes.
    • FUN FACT: Programs close in one of three ways – kill signal, fatal error, or calling _exit(). Notice that only one out of three is a graceful way to close. In other words, programs either willingly close (_exit()), crash (fatal error), or they are murdered (kill signal). Wow, software has a harsh life (^u^).
    • exit_group() - All of a process's threads and associated threads are closed with this syscall. This is a special form of _exit().
    • faccessat() - The permissions for the specified file are checked, but the file's path is interpreted relative to a directory file descriptor.
    • NOTE: In the simplest terms, a file descriptor is a special number used to access a file.
    • posix_fadvise() - (commonly called fadvise(), although the actual call is posix_fadvise()) This system call is used to optimize data access. Specifically, the program announces ahead of time how it expects to access a file's data (for example, sequentially or randomly) so the kernel can plan how to get the data. This speeds up data access.
    • fallocate() - Disk space of a specified file is manipulated by this kernel call. Obviously, since every filesystem type (XFS, EXT4, NTFS, tmpfs, etc.) is different, this syscall does not work on all filesystems. fallocate() also works on some pseudo/virtual filesystems like tmpfs.
    • fchmod() - This syscall is the same thing as chmod(). The difference lies in the fact chmod() accepts a path name and fchmod() accepts a file descriptor instead.
    • fchmodat() - This system call changes a file's permissions like chmod(), but a relative path is interpreted relative to a directory given by a file descriptor instead of the current working directory. In other words, chmod() accepts a path, fchmod() accepts a file descriptor for the file itself, and fchmodat() accepts a directory file descriptor plus a path.
    • fchown() - Just like chown(), the owner of the specified file is changed. However, fchown() knows the file by its file descriptor, not its path.
    • fchownat() - This syscall is like chown(), but a relative path is interpreted relative to a directory file descriptor.
    • fcntl() - Manipulate the specified file descriptor.
    • fgetxattr() - This syscall gets the value of the specified extended file attribute.
    • finit_module() - Using a file descriptor, an ELF-image is loaded into the kernel space.
    • flistxattr() - Given a file descriptor, this syscall will list the extended attributes owned by the specified file.
    • flock() - Create or remove an advisory lock on the specified file (the file must be open).
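    • fork() - Child processes are commonly created using this kernel call. With this call, the child process gets its own PID and memory space. The child starts as a near copy of its parent, although some attributes (such as pending signals and file locks) are not inherited.
    • free_hugepages() - An old system call (no longer used) that freed huge pages (large chunks of memory).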
    • fremovexattr() - When using a file descriptor, this syscall can remove an extended attribute.
    • fsetxattr() - With a known file descriptor, an extended attribute can be set.
    • fstat() - The status of a file can be read with this syscall when given a file descriptor.
    • fstatat() - With a directory file descriptor, a file's status can be read.
    • fstatfs() - The statistics of a filesystem can be retrieved with this kernel call. This system call is directed to the filesystem in question by using a file descriptor of any given file on that filesystem.
    • ftruncate() - With a given file descriptor, this kernel call will truncate the specified file to a desired length. This may mean cutting the file, thus losing data, or enlarging the file by adding null bytes. Null bytes are designated as a backslash zero (\0).
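    • futex() - Futex stands for Fast User-space muTEX. With this syscall, threads and processes can wait for a shared resource to become available and wake each other up. The fast path (when nobody has to wait) is handled entirely in user space with atomic instructions, so most programs use futexes indirectly through library locks such as pthread mutexes rather than calling futex() themselves.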
    • get_kernel_syms() - All of the exported module and kernel symbols can be read with this system call.
    • NOTE: A symbol is a function or variable.
    • get_mempolicy() - To get the NUMA memory policy for a process, use this syscall.
    • get_robust_list() - The robust-futex list can be retrieved with this kernel call.
    • getcpu() - This syscall tells the calling thread which CPU and NUMA node it is currently running on. This is like the thread asking "Where am I?".
    • getcwd(), getwd(), and get_current_dir_name() - GET Current Working Directory. These three give the same result, but they each function a little differently. Strictly speaking, only getcwd() is a system call; the other two are library functions.
    • getdents() - This system call gets directory entries.
    • getgid() - The syscall retrieves the real GID of the calling process.
    • getegid() - The syscall retrieves the effective GID of the calling process.
    • FUN FACT: getgid() and getegid() are claimed to NEVER fail according to the man pages. Do we agree with that, or has someone found an exception?
    • getitimer() - This syscall gets the current value of one of the timers of a process. Each process has three timers - ITIMER_REAL (decrements in real time), ITIMER_VIRTUAL (decrements only while the process executes in user mode), and ITIMER_PROF (decrements while the process executes and while the system executes on behalf of the process). These three timers are called "interval timers".
    • getpeername() - The name of a connected peer socket can be retrieved using this syscall.
    • getpagesize() - The size of a regular page in memory can be known with this kernel call.
    • NOTE: In simplest terms, a page in memory is analogous to a block on a magnetic hard-drive.
    • getpgid() - This syscall gets the PGID of the specified process by using its PID.
    • getpid() - The PID of the calling process is returned.
    • getppid() - The PID of the calling process's parent is returned.
    • getpriority() - The scheduling priority of the specified program is returned by this syscall.
    • getresuid() - The RUID, EUID, and SUID of the calling process are returned.
    • getresgid() - The RGID, EGID, and SGID of the calling process are returned.
    • NOTE: GID = Group ID. UID = User ID. R = Real. E = Effective. S = Saved.
    • getrlimit() - This kernel call returns the resource limit of a process.
    • getrusage() - The amount of resources used by the specified process is given by this syscall.
    • getsid() - This returns the session ID. "sid" stands for Session ID.
    • getsockname() - The address (name) of the socket is returned by this kernel call.
    • getsockopt() - The options for a specified socket are retrieved by getsockopt().
    • gettid() - The TID of a thread can be seen with this syscall. TID stands for thread identification.
    • gettimeofday() - The current time and timezone can be seen with this call.
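    • getxattr() - Given a file's path and an attribute name, this system call retrieves the value of that extended attribute.
    • NOTE: Extended attributes are name-value pairs attached to a file in addition to the normal attributes (such as owner, permissions, and timestamps). Not every filesystem supports them.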
    • init_module() - Kernel modules are loaded with this syscall. This system call loads the module into the kernel space and then performs other needed tasks to prepare the module for runtime.
    • inotify_add_watch() - Given a file-path, this syscall creates or modifies a watch in an existing inotify instance and returns a watch descriptor (wd).
    • inotify_init() - Before any watches can be added, an inotify instance must be created via inotify_init(), which returns a file descriptor for the instance.
    • inotify_rm_watch() - When an inotify watch is no longer needed, this syscall removes the watch specified by a watch descriptor (wd). The wd is given when the watch is made with inotify_add_watch().
    • io_cancel() - Asynchronous IO tasks are canceled with this syscall.
    • io_destroy() - Instead of canceling asynchronous IO tasks, they can be destroyed, meaning all asynchronous IO tasks associated with the given identifier will be canceled.
    • io_getevents() - The asynchronous IO events listed in the completion queue can be seen with this syscall.
    • io_setup() - Asynchronous IO contexts are made using io_setup().
    • io_submit() - To queue asynchronous IO blocks, use this kernel call.
    • ioctl() - With a file descriptor for a device-file, the device's parameters can be changed.
    • ioperm() - The IO permissions of ports are set with this call.
    • iopl() - This kernel call changes the IO privilege level of the process that executed this call.
    • ioprio_get() - The priority and IO scheduling class of threads can be seen with this call. This call can return this information for one or multiple threads.
    • ioprio_set() - The priority and IO scheduling class of a thread can be set with this call.
    • ipc() - On some architectures, the System V IPC calls are all made through this one system call, which dispatches to the requested IPC operation. Not all architectures support ipc(). For instance, this call is not seen in ARM systems. Obviously, when making cross-platform software, do not use ipc().
    • kcmp() - With two PIDs, this system call can identify what kernel resources (if any) are shared between two processes.
    • kexec_load() - This syscall sets up a kernel to be executed after the next reboot. This is useful for running a diagnostic kernel after a system crash.
    • keyctl() - Changes to the key management facility of the kernel are made via keyctl().
    • kill() - A signal (not only a kill signal) is sent to the specified process. As you may have noticed, many system calls require a pid, file descriptor, or some other low-level (closer to the hardware/inner-workings) identification to know what to work on. kill() is like these other syscalls, which is why it uses a pid. Yes, users running the kill command in a command-line are using a system call to perform a low-level operation on the system.
    • lgetxattr() - This is just like getxattr() except this system call is used on links to get the extended attributes of the link itself.
    • link() - Hard links are created using link().
    • listen() - This is the system call that listens for connections on a socket.
    • lookup_dcookie() - Using a cookie, the full path of a directory entry can be seen. A cookie is a directory entry identifier.
    • lremovexattr() - The extended attributes of a symbolic link can be removed. The extended attributes are untouched in the file to which the link points.
    • lseek() - A file's offset is changed. This syscall identifies the file based on file descriptor (fd).
    • lsetxattr() - This syscall sets extended attributes on links.
    • lstat() - This call provides information about the specified link.
    • madvise() - This system call is used by applications to give the kernel advice on how the application wishes to use memory. The kernel typically maps out memory as it is needed and as the kernel sees fit. The madvise() syscall comes from an application that can probably run more efficiently if its memory usage is managed in a particular way. Notice that the syscall is advice, meaning the kernel may disregard the application's request.
    • mbind() - This syscall sets the NUMA memory policy.
    • migrate_pages() - All of a process's memory pages that reside on the specified NUMA nodes will be moved to other nodes.
    • mincore() - This kernel call checks whether pages of memory (a page of memory is like a block of data on a hard-drive) are currently resident in RAM. If memory is accessed but is not resident, a page fault will result and the kernel must fetch the page from disk, which is slow. Thanks to this syscall, a program can find out ahead of time which accesses would cause such delays.
    • mkdir() - This commonly known shell command is actually a system call that creates a directory. (You probably already knew that)
    • mknod() - This syscall can make a file, device file, or a named pipe.
    • mlock() - A specified portion of a process's virtual address space can be set to remain on the RAM and not go to the swap area. mlockall() is used to lock all of the virtual address space and munlock() and munlockall() can undo those syscalls.
    • mmap() - This maps memory for processes. munmap() unmaps the memory.
    • mount() - Here is another shell command that is actually a syscall. As you all may know, this mounts filesystems whether they are virtual (pseudo filesystems like tmpfs), real (ext4, fat32), network filesystems (NFS), or files (iso files like DVD images).
    • move_pages() - This is another syscall that moves pages, but this syscall moves the memory page-by-page rather than in bulk or whole nodes.
    • mprotect() - This syscall changes the protection of the calling process's memory. In memory, "protection" is analogous (just like/equal) to permissions of files on hard-drives.
    • mq_notify() - Processes can "subscribe" to notifications from a POSIX message queue. This system call allows the calling process to be notified when a message arrives on a previously empty queue.
    • mq_open() - POSIX message queues can be made or opened using this kernel call.
    • mq_send() - Messages are sent to the message queue using this system call.
    • mq_unlink() - Message queues can be deleted using this syscall.
    • mremap() - Memory REMAP is a syscall that remaps a virtual memory address. This means the kernel call gets a section of data and changes the size and location of that data's allocated area in memory.
    • msync() - When a file is mapped into memory with mmap() (for example, so that it can be edited there), changes take place in memory. To save the changes to the hard-drive, msync() synchronizes the mapped file in RAM with the older file on the hard-drive.
    • nanosleep() - Like the sleep command commonly used in shell scripts, this syscall suspends execution of the calling thread. However, this syscall accepts intervals with nanosecond precision.
    • nfsservctl() - This is the interface for the NFS daemon.
    • nice() - That commonly used and known command "nice" is another syscall.
    • uname() - Here is yet another syscall that is sometimes used by the user in a command-line or script.
    • open() - This syscall opens files.
    • NOTE: Sometimes, the calling process is referred to as a local process and the other processes are remote. For instance, if both Firefox and Thunderbird are running on the same machine, Firefox refers to Thunderbird as a remote process as does Thunderbird to Firefox. Each process views themselves as local.
    • pause() - This kernel call makes the calling process pause until a signal arrives. The pause ends in one of two ways: the signal terminates the process (like a kill signal), or the signal is caught by a handler function, after which pause() returns.
    • pciconfig_iobase() - This call is used to get information about IO regions on memory.
    • perf_event_open() - Performance monitoring is set up with this syscall, which returns a file descriptor used to read the performance measurements.
    • personality() - This syscall sets the process's execution domain. In computing, a personality is the way an executable behaves. This refers to the different system calls and application binary interfaces (ABI).
    • perfmonctl() - This kernel call is the interface for the performance monitoring unit (PMU) of IA-64 CPUs.
    • pipe() - This kernel call makes a pipe (|) which is a form of interprocess communication. This sends data from one process to another, and data does not go to the sender.
    • pivot_root() - The root filesystem can be changed using this system call. pivot_root() is commonly used to change the root from initrd.
    • poll() - This syscall watches file descriptors for ones that are ready for IO operations.
    • pread() - With a given offset, pread() reads a file descriptor (fd).
    • pwrite() - With a given offset, pwrite() writes to a file descriptor (fd).
    • preadv() - This system call can read a file descriptor and fill many buffers. preadv() is like pread() and readv() combined.
    • prlimit() - This is getrlimit() and setrlimit() combined into one syscall, so this one call gets and sets a process's resource limits.
    • process_vm_readv() - This kernel call gets data from a specified process (by pid) and gives it to the calling process.
    • process_vm_writev() - The calling process uses this syscall to send data to a remote process.
    • pselect() - This is like poll(), watching many file descriptors for one to be ready for IO operations.
    • ptrace() - A process can control and monitor another process (if permissions permit). The calling process is called the tracer and the process being monitored is called the tracee.
    • pwritev() - This syscall has both the features of writev() and pwrite().
    • query_module() - The information about a module can be received with this syscall.
    • quotactl() - Disk quotas are managed with this syscall.
    • read() - This system call reads up to a requested number of bytes from the specified file descriptor and places them in the buffer.
    • readahead() - Files are placed in the page cache by this syscall.
    • readlink() - Gets the pathname stored in a symbolic link, that is, the path the link points towards (which may be a relative path).
    • reboot() - Obviously, this syscall reboots the system. When CAD is used (Ctrl+Alt+Del), this kernel call is executed.
    • recv(), recvfrom(), recvmsg() - These three syscalls are nearly the same. They all receive messages from connected sockets, but these calls do so in a different way.
    • recvmmsg() - Like the three calls mentioned previously, recvmmsg() gets messages from sockets. However, this syscall can receive multiple messages at once, while the other calls get one at a time. The code used to make recvmmsg() came from recvmsg(). (Notice the number of "m"s)
    • remap_file_pages() - This system call creates a new mapping on memory. Specifically, remap_file_pages() sets up a nonlinear mapping, meaning the pages are not placed in order on memory.
    • removexattr() - The extended attributes of files are removed with this syscall. The needed parameters include the path of the file and the name of the attribute.
    • rename() - This syscall renames a file.
    • request_key() - Keys can be retrieved from the kernel's key-ring by using this system call.
    • restart_syscall() - Sometimes, syscalls are temporarily paused by a stop signal (typically SIGSTOP). The kernel uses restart_syscall() to resume such syscalls once the process continues; applications are not meant to call it directly.
    • rmdir() - Empty directories can be deleted with rmdir().
    • rt_sigqueueinfo(), rt_tgsigqueueinfo() - A signal and data are sent to the specified process or thread using one of these system calls. These calls do the same job, but they differ in the accepted parameters. rt_sigqueueinfo() needs to know the tgid (Thread Group ID) while rt_tgsigqueueinfo() needs to know both the tgid and tid (Thread ID).
    • sigaction() - The way a process responds to a signal may need to be changed. This kernel call allows the calling process to view or change the action taken when it receives a specific signal.
    • sigpending() - This syscall allows the calling process to view pending signals.
    • sigprocmask() - This kernel call allows the calling process to view or change its masked signals.
    • NOTE: Masked signals are signals that are blocked.
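    • sigsuspend() - This syscall is used to pause a process until a signal arrives. While paused, the process's signal mask is temporarily replaced with the given one.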
    • sched_get_priority_max(), sched_get_priority_min() - Every scheduling policy (or scheduling algorithm) has a set priority range. These two syscalls return the maximum and minimum priority numbers (respectively) accepted by a policy.
    • sched_setaffinity(), sched_getaffinity() - The CPU affinity (CPU pinning) of a thread can be set or viewed with these syscalls, respectively. CPU affinity assigns a thread or process to a processor. For instance, on systems with multiple processors, processes and threads may not be processed by many CPU chips at once. Instead, code may stay with one CPU.
    • sched_setparam(), sched_getparam() - These syscalls allow the scheduling parameters to be set and viewed for a process specified by its PID.
    • sched_setscheduler(), sched_getscheduler() - With a given PID, a process's scheduling policy (algorithm) can be set or viewed.
    • sched_yield() - The calling thread gives up the processor and is placed at the end of the task queue for its priority.
    • select() - This is another syscall used to monitor multiple file descriptors so that an IO task can be performed on the next available descriptor.
    • send(), sendto(), and sendmsg() - These syscalls perform nearly the same task, but they each have a slightly different method of functioning. send() is the same as write() except that the system call accepts flags while write() cannot. These send syscalls all use sockets, but different arguments.
    • sendfile() - This syscall copies data from one file descriptor to another. This is a faster way to copy files since this action is performed within the kernel. Most tasks completed in the kernel space complete faster than they do in the userspace.
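    • FUN FACT: Are you wondering how much system call activity is happening on your system right now? On some Unix systems (such as the BSDs), the vmstat command reports the number of system calls made per second system wide in its "sy" column. On Linux, however, the "sy" column of vmstat shows the percentage of CPU time spent running kernel code, which includes time spent servicing system calls. For my system at the time of executing vmstat, the "sy" column showed five.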
    • sendmmsg() - More than one message can be sent down a socket using this kernel call. Most message-sending syscalls can only send one message down a socket at a time.
    • set_mempolicy() - This syscall is used by the calling process to change its NUMA-memory policy.
    • set_thread_area() - This syscall writes an entry on the local storage array of a thread (TLS = Thread Local Storage).
    • set_tid_address() - The kernel stores the pointer given to this kernel call for the calling thread and clears the TID (Thread ID) kept at that address when the thread exits. The call returns the caller's TID.
    • setdomainname() - This syscall sets the domain name and saves it as an array with each character in its own field. This value can be retrieved with getdomainname().
    • setfsgid() - This kernel call changes the FileSystem Group ID (FSGID), which is the GID used for permission checks when accessing files. It was originally added for use by NFS servers.
    • setfsuid() - setfsuid() is a lot like setfsgid() except that setfsuid() changes the FileSystem User ID (FSUID).
    • setgid() - The effective Group ID of the calling process is set.
    • setgroups() - The supplementary Group IDs are set for the calling process.
    • sethostname() - The hostname is set in the form of an array with one character per field.
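    • setns() - The calling thread is moved into an existing namespace (specified by a file descriptor) using this syscall.
    • setpgid() - The Process Group ID (PGID) of a process is set with this kernel call.
    • setpriority() - The scheduling priority (nice value) of a process, process group, or user is set.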
    • setreuid(), setregid() - These syscalls set the real and effective User or Group IDs.
    • setresuid(), setresgid() - These two syscalls are like their equivalent kernel calls above, but with the additional ability to set the Saved-User-ID (SUID) or Saved-Group-ID (SGID).
    • setrlimit() - A resource limit is set with this kernel call.
    • setsid() - The process group ID and session ID of the calling process are set to the PID of the calling process.
    • setsockopt() - Options for a specified socket are set using this syscall.
    • settimeofday() - The timezone and time are set via settimeofday().
    • setuid() - The effective User ID (UID) of the calling process is set with this call.
    • setup() - This deprecated syscall was once used to prepare devices and filesystems on the system and mount the root filesystem.
    • setxattr() - Extended attributes are set using this kernel call.
    • shutdown() - Many of you may think this system call shuts down the system. Well, guess what? It does not. Rather, this system call shuts down all or part of a socket's connection (sending, receiving, or both).
    • NOTE: reboot() is the system call that reboots or powers off the system.
    • sigaction() - The action a process takes when it receives a particular signal can be viewed and changed with this syscall.
    • signalfd() - Signals can be accepted by a process via a file descriptor. However, such a file descriptor must be created first using this syscall.
    • sigpending() - The calling thread can view pending signals coming to it using this syscall.
    • sigprocmask() - A process's blocked signals can be changed via sigprocmask().
    • socket() - The commonly discussed sockets are created with this syscall. A socket is an endpoint for communication, either between processes on the same machine or across a network. Sockets used only for local communication are called Unix Domain Sockets (AF_UNIX), while network sockets use families such as AF_INET.
    • socketpair() - A pair of connected sockets is created using this system call.
    • splice() - Data is moved between two file descriptors without being copied through userspace. One of the two descriptors must be a pipe.
    • stat() - This syscall returns the "status" of a file. The "status" is information such as the number of blocks owned by the file, the IDs of the owning group and owner, the storage device's ID, file size, number of hardlinks, and a few other pieces of information.
    • statfs() - Use this kernel call to get the "status" of a filesystem. Such status includes the number of free and total blocks, filesystem type, and other information pertaining to the filesystem itself.
    • stime() - This syscall sets the system time, given as seconds since January 1st, 1970 (epoch).
    • subpage_prot() - Pages can be divided into subpages on memory. This system call allows permissions to be set to specific subpages, but only on PowerPC processors.
    • NOTE: Remember that blocks are to hard-drives as pages are to memory.
    • swapon() and swapoff() - These kernel calls turn the swap area on and off, respectively. Turning swap off may be done when the admin is changing its size on a live/active system or for various other reasons.
    • symlink() - Symbolic links (also called soft-links or shortcuts) are made using this system call.
    • sync() - When files are changed, the edits are first held in memory. The sync() call forces all pending changes to be written to the hard-drive.
    • syncfs() - Like sync(), syncfs() causes changes to be written to the storage unit. However, only the filesystem containing the file referred to by the given file descriptor is synchronized, while sync() tells all filesystems to write their modifications.
    • sync_file_range() - This call is narrower still than syncfs(). Only a specified byte range of a single file is synchronized with the storage unit.
    • NOTE: When you make changes to a file (like saving a text file in Gedit), the changes are held in the page cache in memory until they are written out.
    • sysfs() - Information about the filesystem types present in the kernel can be viewed with this syscall.
    • sysinfo() - An overview of the system's information (system statistics or sys stats) can be viewed with sysinfo(). Specifically, this data contains various memory space info, time since booting, buffer data, and some other helpful information.
    • syslog() - Kernel messages are viewed by syslog() which is also the system call that gives syslogd (a daemon) the data it places into logs.
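    • tee() - This syscall duplicates the data in one pipe into a second pipe without consuming it, so the data can still be read from the first pipe. It is named after the "tee" command used in shells, which is a separate userspace program.
    • tgkill() - Many users are aware that kill() sends a signal to a process (often to kill it), but few people know that a signal can be sent to an individual thread. Such a task can be done with the syscall tgkill(), which takes a thread group ID and a thread ID.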
    • time() - The current system time will be returned by the syscall in the form of seconds since January 1st, 1970.
    • NOTE: January 1st, 1970 is commonly referred to as "The Epoch". Now, when you read about some software (like a syscall) returning time in seconds since the Epoch, you know that that is what it means.
    • timer_create() - This syscall creates a POSIX per-process timer, which is a timer that belongs to the calling process.
    • timer_delete() - Timers can be deleted via timer_delete().
    • NOTE: Timers are identified by "timerid".
    • timer_getoverrun() - A timer can expire several times before its notification is delivered to the process. This syscall returns the overrun count, which is the number of extra expirations that happened between the time the notification was generated and the time it was delivered.
    • timer_gettime() and timer_settime() - These syscalls allow the time on timers to be set or viewed.
    • times() - The various time data of processes are returned with this call. The "time data" includes user time, system time, children's user time, and the children's system time.
    • truncate() - A file (specified by its path) is resized to a particular size. If the file is larger than the desired size, then the extra data is lost. If the file is smaller than the needed size, then null bytes are added to reach the final file size.
    • umask() - When files are created, permissions must be set on the file. The umask() kernel call sets the calling process's file mode creation mask, which lists the permission bits to remove from any file or directory the process creates. For example, a mask of 022 turns a requested mode of 666 into 644.
    • umount() - This is the kernel call used to unmount filesystems.
    • umount2() - This syscall is the same as umount(). However, this system call accepts flags as arguments.
    • uname() - This syscall is used by the commonly known "uname" command in the command-line. uname() returns various information about the active kernel.
    • unlink() - This syscall deletes a name from the filesystem. It works on regular files, soft links, sockets, pipes, and device files, but not on directories, which need rmdir(). remove() is not a syscall but a C library function that calls unlink() for files and rmdir() for directories. The name disappears immediately, but if processes still have the object open, its data is kept until the last of them closes it. unlinkat() is the same as unlink() except that the path can be relative to a directory file descriptor and that it accepts flags (such as AT_REMOVEDIR to remove a directory).
    • unshare() - Sometimes, processes may share various resources (like virtual memory) after they fork. It may not always be desirable that the processes share so many resources. The unshare() syscall helps separate the processes more by making them get their own resources rather than sharing.
    • uselib() - As many GNU/Linux users know, many programs need various shared libraries (like the ones under /lib/, /lib32/, /lib64/, /libx32/, and elsewhere). The uselib() system call was used by processes to load the needed library by path. This syscall is now obsolete, and modern dynamic loaders map libraries into memory with open() and mmap() instead.
    • ustat() - The calling process can view the statistics of mounted filesystems using this syscall.
    • utime() - This system call is used to change the access and modification times associated with a file. The file is specified by its path, and the times are stored in the file's inode. This is how a program sets the timestamps explicitly, since the kernel updates them on its own when a file is accessed or modified.
    • utimensat() - This kernel call is like utime(), but the difference lies in the precision. utime() is precise only to the second (and the related utimes() to the microsecond) while utimensat() can handle nanoseconds.
    • vfork() - This syscall is like fork(), but with some differences. fork() gives the child process its own copy of the parent's address space (using copy-on-write), while vfork() makes the child borrow the parent's memory without copying the page tables. The parent process is suspended until the child calls execve() or exits.
    • vhangup() - A hang-up is simulated in the terminal. This is needed so users will have a fresh terminal (tty) when they log in after someone else has been logged into the terminal.
    • vm86() - This is a kernel call used to initiate a virtual 8086 mode which is commonly needed to run dosemu. The older (and obsolete) version of this system call has been renamed to vm86old(). If dosemu is able to run on your system, then your kernel has the vm86() system call. If not, then your platform cannot support this platform-specific kernel call.
    • vmsplice() - This syscall splices pages of memory to a pipe.
    • wait() - The calling process will be suspended until one of its child processes exits or is killed.
    • wait3() and wait4() - These syscalls work like waitpid() but also return the resource usage of the child. They are nonstandard, so waitpid() or waitid() is preferred in new programs.
    • waitid() - This syscall makes the parent process pause until one of its child processes has a particular state change (see the NOTE a few lines below). waitid() is more precise than waitpid() since a specific state change can be the trigger for the parent to resume execution.
    • waitpid() - A child process can be specified by PID. Then, the parent process (which is also the calling process of waitpid()) will be paused until the specified child process terminates. With the WUNTRACED and WCONTINUED flags, it also returns when the child is stopped or resumed.
    • NOTE: A "state change" refers to one of the following events - the termination of a process, a process resumes execution, or a process is stopped.
    • write() - This syscall writes data from a buffer to the specified file descriptor.
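
    A few of the file-related calls above (umask(), write(), truncate(), stat(), and unlink()) can be tried out from Python, whose os module offers thin wrappers around them. This is only a minimal sketch under some assumptions: the file name "demo.txt", the mask 0o027, and the variable names are made up for illustration, and a Linux (or other Unix-like) system is assumed.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing else on the system is touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")   # hypothetical file name

# umask(): bits set in the mask are removed from the requested permissions.
old_mask = os.umask(0o027)
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o666)   # request rw-rw-rw-
os.umask(old_mask)                                   # restore the previous mask
mode = os.stat(path).st_mode & 0o777                 # 0o666 & ~0o027

# write(), then truncate() to a larger size: the gap is filled with null bytes.
os.write(fd, b"hello")
os.truncate(path, 8)
size = os.stat(path).st_size                         # stat() reports the new size
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 16)

# unlink(): the name is removed at once, but the data stays reachable
# through the descriptor that is still open.
os.unlink(path)
name_gone = not os.path.exists(path)
os.lseek(fd, 0, os.SEEK_SET)
still_readable = os.read(fd, 5)

os.close(fd)          # the last reference is dropped here, freeing the data
os.rmdir(workdir)

print(oct(mode), size, data, name_gone, still_readable)
```

    The last part is the reason a program can delete a temporary file right after opening it and still keep using it until it closes the descriptor.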

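    The socket and process calls in the list (socketpair(), send(), shutdown(), and waitpid()) fit together in a similar way. The sketch below again uses Python's wrappers and assumes a Linux system where fork() is available. The message b"ping", the exit code 7, and all variable names are arbitrary choices for the example.

```python
import os
import socket

# socketpair(): two already-connected sockets, one end for each process.
parent_end, child_end = socket.socketpair()

pid = os.fork()
if pid == 0:
    # Child process: read until end-of-file, reply in upper case, then exit.
    try:
        parent_end.close()
        request = b""
        while True:
            chunk = child_end.recv(1024)
            if not chunk:      # empty read: the peer shut down its sending half
                break
            request += chunk
        child_end.sendall(request.upper())
        child_end.close()
    finally:
        os._exit(7)            # arbitrary exit code for the parent to collect

# Parent process.
child_end.close()
parent_end.sendall(b"ping")            # sendall() loops over send() as needed
parent_end.shutdown(socket.SHUT_WR)    # shutdown(): close only the sending half

reply = b""
while True:
    chunk = parent_end.recv(1024)      # the receiving half still works
    if not chunk:
        break
    reply += chunk
parent_end.close()

# waitpid(): block until the chosen child changes state (here, terminates).
waited_pid, status = os.waitpid(pid, 0)
exited_normally = os.WIFEXITED(status)
exit_code = os.WEXITSTATUS(status)

print(reply, exited_normally, exit_code)
```

    Note that shutdown() with SHUT_WR is what lets the child see end-of-file while the parent can still read the reply. A plain close() would have ended both directions.
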
    Non-Standard Syscalls

    There is another set of kernel calls called the non-standard syscalls or unimplemented system calls. These kernel calls are not part of the vanilla kernel, or at least not in recent versions.

    • afs_syscall() - This is a seldom used system call for OpenAFS. OpenAFS is an open-source implementation of the Andrew FileSystem, which is a distributed filesystem. This syscall managed the Input/Output of OpenAFS. Most developers now use ioctl() instead.
    • break() - No info available. I cannot find any information about this syscall. If anyone knows anything, please share with us.
    • getpmsg() - This kernel call is the same as getmsg(). However, getpmsg() offered calling processes greater control than getmsg(). Both of these syscalls allowed a process to get a message from a stream.
    • gtty(), stty() - These system calls are mostly obsolete due to ioctl(). They may still be seen for backwards compatibility, but in general, developers may no longer see them. They are (or were) used to control terminal devices through the device files (/dev/tty*). gtty() gets the terminal settings and stty() sets them.
    • idle() - This kernel call makes the CPU idle.
    • lock() - This system call locks or unlocks access to registers.
    • madvise1() - This syscall is not used in the vanilla kernel, but is used in the Android-Linux kernels. madvise1() is an alias to madvise(), both of which advise the kernel on how to allocate and manage memory for the calling process.
    • mpx() - This kernel call creates and manages multiplexed files.
    • phys() - mmap() is now used instead of phys().
    • prof() - This syscall is a profiler which is used to measure the performance of the kernel.
    • putpmsg() - This is equivalent to putmsg(). Both of these syscalls are used to put messages on the message stream.
    • security() - Some Linux Security Modules (LSMs) use (or had used) this kernel call to define system calls for security purposes. Most LSMs now use socketcall().
    • tuxcall() - This call comes from a TUX module and is sent to the kernel. The call asks the kernel to perform some task for the module. A TUX module is basically a server application/daemon in the form of a Linux module. Imagine an Apache server being a kernel module; that is essentially how TUX works.
    • vserver() - This is a virtualization system call that is used by the specialized Linux-VServer kernel (http://linux-vserver.org/Welcome_to_Linux-VServer.org).

    Last Note on System Calls

    This last "syscall" does not belong with the others, but I will mention it here. On some syscall tables (I will explain that in a moment), people may see a ni(), sys_ni(), ni_syscall(), or something of that manner. This is a syscall placeholder. To better understand this, it is important to know that each syscall is assigned a syscall number. For instance, here is a syscall table for Android's v3.10 Linux kernel (https://android.googlesource.com/kernel/common.git/+/android-3.10-adf/arch/m32r/kernel/syscall_table.S). Notice that each syscall has a number associated with it. For instance, write() is syscall "4" in that table. When syscalls are given new numbers or are removed, the ni() syscall is a null kernel call that reserves that syscall number for later or private use. Calling it simply returns the ENOSYS ("function not implemented") error. These syscall tables may also be called "system call maps". The numbers differ between processor architectures. On 32-bit x86, the calling program places the syscall number in the "%eax" register before it traps into the kernel, and the kernel uses that number as an index into the table. These numbers are probably only important to syscall developers or assembly programmers.
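
    To see a syscall number in action, the C library's generic syscall() function can be called with a raw number instead of a name. The sketch below does this from Python through ctypes. It rests on several assumptions: a 64-bit Linux system, a C library that exports syscall(), and the getpid() numbers 39 (x86_64) and 172 (aarch64) as I know them from the kernel's tables. The dictionary and variable names are made up for the example, and on any other platform the call is skipped.

```python
import ctypes
import os
import platform
import sys

# Assumed getpid() numbers for two common 64-bit architectures.
# Other architectures use different numbers, so they are left out on purpose.
GETPID_NUMBERS = {"x86_64": 39, "aarch64": 172}

is_64bit_linux = (
    sys.platform.startswith("linux")
    and ctypes.sizeof(ctypes.c_void_p) == 8
)
number = GETPID_NUMBERS.get(platform.machine()) if is_64bit_linux else None

if number is None:
    result = None                      # unknown platform: do not guess a number
else:
    libc = ctypes.CDLL(None, use_errno=True)   # symbols of the running process
    libc.syscall.restype = ctypes.c_long
    # Invoke the kernel by number, exactly as an assembly programmer would.
    result = libc.syscall(ctypes.c_long(number))

print("syscall number:", number, "result:", result, "os.getpid():", os.getpid())
```

    When the platform is recognized, the value returned by the raw call matches os.getpid(), which shows that the name and the number lead to the same kernel routine.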

