Unix Pipes

Revision as of 08:55, 7 December 2007


Pipes, socketpairs and FIFOs are mechanisms that allow two processes to exchange data as a stream of bytes. Unlike files, however, pipes and friends do not consume disk space; instead they use a (usually circular) buffer inside the kernel. If the buffer fills up (e.g. because the consumer is too slow), the system puts the producer(s) into a waiting state until the consumer has drained some data.
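The "buffer fills up" case can be observed from user space. The following is a minimal sketch, assuming a POSIX system: it switches the write end to non-blocking mode, so that the kernel reports a full buffer with an error instead of putting the writer to sleep. fill_pipe is a hypothetical helper name, and the byte count it returns depends on the system.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write into a pipe that nobody reads until the
 * kernel refuses more data. Returns the number of bytes accepted,
 * or -1 if the pipe could not be set up. */
long fill_pipe(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    /* Non-blocking mode: a full buffer makes write() fail with EAGAIN
     * instead of suspending the caller. */
    int flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char chunk[512] = {0};
    long total = 0;
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;                  /* typically EAGAIN: the buffer is full */
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

With a blocking descriptor (the default), the same loop would simply stop making progress at that point until a reader consumed some data.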

Pipes are usually one-way streams with a producer side and a consumer side. Note that it is perfectly possible to have multiple producers and/or multiple consumers (though multiple consumers tend to make things hard to use). There is a thread in the forum on this subject: Stream-oriented programming.

Usage

Besides the usual shell-scripting use of pipes (where programs such as grep and less are linked together so that grep's output becomes less's input, as in grep kernel | less), several Unix programs use the same technique to pass data between separate, pre-built programs, one of the most notable examples being qmail. GCC is another example: the -pipe flag makes it use pipes instead of temporary files for intermediate compilation results, which can speed up compilation.

Implementation

There are two calls for creating pipes on a Unix system: pipe() and mkfifo(). Both act in similar ways: the first creates an unnamed (anonymous) pipe, the second creates a named pipe. Named pipes appear in the filesystem and exist until explicitly deleted; unnamed pipes do not, and exist only while at least one file descriptor is open on them.

The standard way to create an (unnamed) pipe on Unix is to call pipe(), which "returns" two file descriptors by filling in a two-element int array. The first (index 0) is used for reading from the pipe, the second (index 1) for writing to it.
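As a minimal sketch of the call, the helper below creates a pipe, writes a short message into one end and reads it back from the other, all within one process. pipe_roundtrip is a hypothetical name; the sketch assumes the message is small enough to fit in the pipe's buffer (otherwise the write would block) and that outsz is at least 1.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: push msg through a fresh pipe and read it back.
 * Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) == -1)
        return -1;

    size_t len = strlen(msg);
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);              /* finished writing */

    ssize_t n = read(fds[0], out, outsz - 1);
    close(fds[0]);
    if (n >= 0)
        out[n] = '\0';          /* make the result a C string */
    return n;
}
```

In practice the two ends are almost always used by different processes; a single-process round trip is only useful for showing which descriptor does what.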

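Internally, a pipe is normally a circular buffer (queue) plus two or more semaphores used for locking and for putting readers and writers to sleep. The buffer is small: traditionally a single page (4 KiB). On Linux the default capacity has been 64 KiB since kernel 2.6.11, while PIPE_BUF, the largest write that is guaranteed to be atomic, is 4096 bytes.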
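For a hobby kernel, the buffer side of this could look roughly like the sketch below. It is not taken from any particular kernel: the names ring_pipe, ring_write and ring_read and the 4096-byte capacity are assumptions made for illustration, and the locking and sleeping that the semaphores would provide are left out. Both functions transfer as much as they can and return the count; a real implementation would block the caller when that count is short.

```c
#include <stddef.h>

#define RING_PIPE_SIZE 4096     /* assumed capacity: one page */

/* Hypothetical in-kernel pipe buffer (no locking shown). */
struct ring_pipe {
    unsigned char data[RING_PIPE_SIZE];
    size_t head;                /* index of the next byte to read */
    size_t count;               /* number of bytes currently stored */
};

/* Copy up to len bytes in; returns how many fitted. */
size_t ring_write(struct ring_pipe *p, const void *src, size_t len)
{
    const unsigned char *s = src;
    size_t n = 0;
    while (n < len && p->count < RING_PIPE_SIZE) {
        /* The tail position wraps around the end of the array. */
        p->data[(p->head + p->count) % RING_PIPE_SIZE] = s[n++];
        p->count++;
    }
    return n;                   /* n < len means the buffer is full */
}

/* Copy up to len bytes out; returns how many were available. */
size_t ring_read(struct ring_pipe *p, void *dst, size_t len)
{
    unsigned char *d = dst;
    size_t n = 0;
    while (n < len && p->count > 0) {
        d[n++] = p->data[p->head];
        p->head = (p->head + 1) % RING_PIPE_SIZE;
        p->count--;
    }
    return n;                   /* 0 means the buffer is empty */
}
```

A short return from ring_write is the point where the producer would be put to sleep, and a zero return from ring_read is where the consumer would wait (or see end-of-file if no writers remain).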

Such a pipe exists as long as either end of it is open; the kernel frees it once both ends have been close()d. There are a couple of ways to move file descriptors from one process to another on a typical Unix system; the most common and 'correct' way is for a child to inherit them from its parent across a fork() call.
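The inheritance across fork() and the role of close() can be sketched as follows, assuming a POSIX system. The child writes a message and exits; the parent reads until read() returns 0, which happens only after every copy of the write end has been closed. read_from_child is a hypothetical helper name, and outsz is assumed to be at least 1.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes msg into a pipe and
 * collect it in the parent. Returns bytes read, or -1 on error. */
ssize_t read_from_child(const char *msg, char *out, size_t outsz)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();         /* both descriptors are inherited */
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {             /* child: producer */
        close(fds[0]);          /* the child does not read */
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    /* Parent: consumer. It must close its own copy of the write end,
     * otherwise read() would never report end-of-file. */
    close(fds[1]);

    size_t got = 0;
    ssize_t n;
    while (got < outsz - 1 &&
           (n = read(fds[0], out + got, outsz - 1 - got)) > 0)
        got += (size_t)n;
    out[got] = '\0';

    close(fds[0]);              /* last open end: the pipe is freed */
    waitpid(pid, NULL, 0);
    return (ssize_t)got;
}
```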

When you type a pipeline on the command line, it is the shell that calls pipe(). When it then fork()s the processes to do the work, it does some file descriptor renaming (see dup() and dup2()) to put the right descriptors in the right places before it finally exec()s the programs. Remember that fork()ed children inherit the parent's open file descriptors by default.
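What a shell does for a two-command pipeline can be sketched roughly as below. This is an illustration under simplifying assumptions, not how any particular shell is written: run_pipeline is a hypothetical name, only two commands are supported, and error handling is minimal.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" and return the exit status
 * of the right-hand command, or -1 on failure. Both arguments are
 * NULL-terminated argv arrays. */
int run_pipeline(char *const left[], char *const right[])
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t lpid = fork();
    if (lpid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        /* Left command: its standard output becomes the write end. */
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(126);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);             /* only reached if exec failed */
    }

    pid_t rpid = fork();
    if (rpid == -1) {
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        /* Right command: its standard input becomes the read end. */
        if (dup2(fds[0], STDIN_FILENO) == -1)
            _exit(126);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The shell itself keeps neither end open, so the right-hand
     * command sees end-of-file when the left-hand one finishes. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with the argv arrays {"grep", "kernel", "somefile", NULL} and {"less", NULL} would correspond to grep kernel somefile | less, where somefile is a placeholder.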

Named pipes are similar, except that they sit somewhere in the filesystem, and every process that opens the same name gets the same pipe. Once created, a named pipe remains in the filesystem until it is deleted, like any other file.
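A minimal sketch of a named pipe in use, assuming a POSIX system: the helper creates the FIFO with mkfifo(), lets a child process open it by name and write to it, reads the data in the parent, and finally removes the name with unlink(). fifo_roundtrip and the path passed to it are hypothetical, and outsz is assumed to be at least 1.

```c
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a FIFO created at path.
 * Returns the number of bytes read, or -1 on error. */
ssize_t fifo_roundtrip(const char *path, const char *msg,
                       char *out, size_t outsz)
{
    if (mkfifo(path, 0600) == -1)   /* fails if path already exists */
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        unlink(path);
        return -1;
    }

    if (pid == 0) {                 /* child: writer */
        /* Opening for writing blocks until a reader opens the FIFO. */
        int wfd = open(path, O_WRONLY);
        if (wfd == -1)
            _exit(1);
        if (write(wfd, msg, strlen(msg)) < 0)
            _exit(1);
        close(wfd);
        _exit(0);
    }

    /* Parent: reader. Opening for reading blocks until a writer appears. */
    ssize_t result = -1;
    int rfd = open(path, O_RDONLY);
    if (rfd == -1) {
        kill(pid, SIGKILL);         /* do not leave the writer blocked */
    } else {
        size_t got = 0;
        ssize_t n;
        while (got < outsz - 1 &&
               (n = read(rfd, out + got, outsz - 1 - got)) > 0)
            got += (size_t)n;
        out[got] = '\0';
        close(rfd);
        result = (ssize_t)got;
    }

    waitpid(pid, NULL, 0);
    unlink(path);                   /* the name persists until removed */
    return result;
}
```

The two processes here are related only for convenience; unlike an unnamed pipe, a FIFO can be opened by any process that has permission to access the path.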

See Also