ACC SHELL
subject: Introduction to ksh-93
date: December 21, 1993
Charge Case 311466-6713
File Case 61175
from: David G. Korn
MH 11267
3C-526B x7975
(research!dgk)
TM
MEMORANDUM FOR FILE
1. INTRODUCTION
The term "shell" is used to describe a program that provides
a command language interface. Because the UNIX* system
shell is a user-level program, and not part of the operating
system itself, anyone can write a new shell or modify an
existing one. This has led to evolutionary progress in the
design and implementation of shells, with the better ones
surviving. The most widely available UNIX system shells are
the Bourne shell[7], written by Steve Bourne at AT&T Bell
Laboratories, the C shell[8], written by Bill Joy at the
University of California, Berkeley, and the KornShell
language[9], written by David Korn at AT&T Bell
Laboratories. The Bourne shell is available on almost all
versions of the UNIX system. The C shell is available with
all Berkeley Software Distribution (BSD) UNIX systems and on
many other systems. The KornShell is available on System V
Release 4 systems and on many other systems. The source for
the KornShell language is available from the AT&T Toolchest,
an electronic software distribution system. It runs on all
known versions of the UNIX system and on many UNIX system
look-alikes.
There have been several articles comparing the UNIX system
shells. Jason Levitt[10] highlights some of the new
features introduced by the KornShell language. Rich
Bilancia[11] explains some of the advantages of using the
KornShell language. John Sebes[12] provides a more detailed
comparison of the three shells, both as a command language
and as a programming language.
The KornShell language is a superset of the Bourne shell.
The KornShell language has many of the popular C shell
features, plus additional features of its own. Its initial
popularity stems primarily from its improvements as a
command language. The primary interactive benefit of the
KornShell command language is a visual command line editor
that allows you to make corrections to your current command
line or to earlier command lines without having to retype
them.
____________________
* UNIX is a registered trademark of USL
However, in the long run, the power of the KornShell
language as a high-level programming language, as described
by Dolotta and Mashey[13], may prove to be of greater
significance. ksh-93 provides the programming power of
several other interpretive languages such as awk, FIT, PERL,
and tcl. An application that was originally written in the
C programming language was rewritten in the KornShell
language. More than 20,000 lines of C code were replaced
with KornShell scripts totaling fewer than 700 lines. In
most instances there was no perceptible difference in
performance between the two versions of the code.
The KornShell language has been embedded into windowing
systems allowing graphical user interfaces to be developed
in shell rather than having to build applications that need
to be compiled. The wksh program[14] provides a method of
developing OpenLook or Motif applications as ksh scripts.
This memo is an introduction to ksh-93, the program that
implements an enhanced version of the KornShell language.
It is referred to as ksh in the rest of this memo. The memo
describes the KornShell language based on the features of
the 12/28/93 release of ksh. This memo is not a tutorial,
only an introduction. The second edition of reference [9]
gives a more complete treatment of the KornShell language.
A concerted effort has been made to achieve both System V
Bourne shell compatibility and IEEE POSIX compatibility so
that scripts written for either of these shells can run
without modification with ksh. In addition, ksh-93 attempts
to be compatible with older versions of ksh. When there are
conflicts between versions of the shell, ksh-93 selects the
behavior dictated by the IEEE POSIX standard. The
description of features in this memo assumes that the reader
is already familiar with the Bourne shell.
2. COMMAND LANGUAGE
There is no separate command language. All features of the
language, except job control, can be used both within a
script and interactively from a terminal. However, features
that are more likely to be used while running commands
interactively from a terminal are presented here.
2.1 Setting Options
By convention, UNIX commands consist of a command name
followed by options and other arguments. Options are either
of the form -letter or -letter value. In the former case,
several options may be grouped after a single -. The
argument -- signifies the end of the option list and is only
required when the first non-option argument begins with a -.
When given incorrect arguments, most commands print an error
message that shows which options are permitted. In
addition, the option sequence -? causes most commands to
print a usage message that lists the valid options.
Ordinarily, ksh executes a command by using the command name
to locate a program to run and by running the program as a
separate process. Some commands, referred to as built-ins,
are carried out by ksh itself, without creating a separate
process. The reasons that some commands are built-in are
presented later. In nearly all cases the distinction
between a command that is built-in and one that is not is
invisible to the user. In particular, nearly all built-in
commands follow the same command line conventions.
ksh has several options that can be set by the user as
command line arguments at invocation and as option arguments
to the set command. Most options can be set either with a
single-letter option or by giving the option name after the
-o option. Use set -o to display the current option settings.
Some of these options, such as interactive and monitor (see
Job Control below), are enabled automatically by ksh when
the shell is connected to a terminal device. Other options,
such as noclobber and ignoreeof, are normally placed in a
startup file. The noclobber option causes ksh to print an
error message when you use > to redirect output to a file
that already exists. If you want to redirect to an existing
file, then you have to use >| to override the noclobber
option. The ignoreeof option is used to prevent the
end-of-file character, normally ^D (Control-d), from exiting
the shell and possibly logging you out. You must type exit
to log out. Most of the options are described at the
appropriate places in this memo.
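The noclobber behavior can be tried in a script as well as at a terminal. The fragment below is a minimal sketch: it assumes a mktemp command is available for the scratch directory, and the names tmp, result, and content are illustrative only.

```shell
# Scratch directory for the demonstration (mktemp is an assumption).
tmp=$(mktemp -d)

set -o noclobber                 # same effect as set -C
echo first > "$tmp/out"          # creates the file: allowed
if { echo second > "$tmp/out"; } 2>/dev/null
then result=overwritten
else result=refused              # > onto an existing file is rejected
fi
echo third >| "$tmp/out"         # >| overrides noclobber
set +o noclobber                 # turn the option back off

content=$(cat "$tmp/out")        # the file now holds "third"
rm -rf "$tmp"
```

The failing redirection is wrapped in { ...; } 2>/dev/null so that the shell's error message is discarded; the redirection error itself is what the if tests.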
2.2 Command Aliases
Command aliases provide a mechanism for associating a command
name and arguments with a shorter name. Aliases are defined
with the alias built-in. The form of an alias command
definition is:
alias name=value
As with most other shell assignments, no space is allowed
before or after the =. The characters of an alias name
cannot be characters that are special to the shell. The
replacement string, value, can contain any valid shell
script, including metacharacters such as pipe symbols and
i/o redirection, provided that they are quoted. Unlike csh,
aliases in ksh cannot take arguments. The equivalent
functionality of aliases with arguments can be achieved with
shell functions, described later.
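Because an alias can only add words in front of the arguments that follow it, a command that needs its argument in two places has to be a function. A minimal sketch: mkcd is a hypothetical name, the POSIX name() form is used, and the scratch directory comes from an assumed mktemp command.

```shell
# mkcd: create a directory (and any missing parents) and change into it.
# An alias could not do this, since the argument is needed twice.
mkcd() {
    mkdir -p -- "$1" && cd -- "$1"
}

work=$(mktemp -d)
mkcd "$work/project/src"     # the current shell is now in the new directory
```

Since a function runs in the current shell, the cd inside it changes the caller's working directory, just as an alias would.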
As a command is being read, the command name is checked
against a list of alias names. If it is found, the name is
replaced by the alias value associated with the alias and
then rescanned. When rescanning the value for an alias,
alias substitutions are performed except for an alias that
is currently being processed. This prevents infinite loops
in alias substitutions. For example, with the aliases
alias l=ls 'ls=ls -C', the command name l becomes ls, which
becomes ls -C. Ordinarily, only the command name word is
processed for alias substitution. However, if the value of
an alias ends in a space, then the word following the alias
is also checked for alias substitution. This makes it
possible to define an alias whose first argument is the name
of a command and have alias substitution performed on this
argument, for example alias nohup='nohup '.
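The rescanning rules can be seen in a short script. This is a sketch: ksh expands aliases in scripts, but bash (outside POSIX mode) does so only after shopt -s expand_aliases, so that line is included for readers trying it under bash. The aliases say and greeting are hypothetical. Each alias is defined on a line before the one that uses it, since aliases are processed only when a command is read.

```shell
# Needed only under bash; ksh expands aliases in scripts by default.
if [ -n "${BASH_VERSION-}" ]; then shopt -s expand_aliases; fi

alias l=ls 'ls=ls -C'        # l -> ls -> ls -C; the inner ls is not re-expanded
alias say='echo '            # trailing space: the next word is also alias-checked
alias greeting='hello, world'

out=$(say greeting)          # becomes: echo hello, world
plain=$(echo greeting)       # echo is not an alias, so greeting stays a plain word
```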
Aliases can be used to redefine built-in commands. For
example, the alias
alias test=./test
can be used to look for test in your current working
directory rather than using the built-in test command.
Reserved words such as for and while cannot be changed by
aliasing. The command alias, without arguments, generates a
list of aliases and corresponding alias values. The unalias
command removes the name and text of an alias.
Aliases are used to save typing and to improve readability
of scripts. Several aliases are predefined by ksh. For
example, the predefined alias
alias integer='typeset -i'
allows the integer variables i and j to be declared and
initialized with the command
integer i=0 j=1
While aliases can be defined in scripts, doing so is not
recommended. The location of an alias command can be
important since aliases are only processed when a command is
read. A . procedure (the shell equivalent of an include
file) is read all at once (unlike startup files, which are
read a command at a time), so any aliases defined there
will not affect any commands within that file. Predefined
aliases do not have this problem.
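What the predefined integer alias expands to can be tried directly with typeset -i, which behaves the same whether or not the alias is available. A sketch with illustrative variable names:

```shell
typeset -i i=0 j=1   # what "integer i=0 j=1" becomes after alias substitution
i=j+41               # with the -i attribute, the value is evaluated arithmetically
j=i/5                # integer arithmetic: 42/5 is 8
```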
2.3 Command Re-entry
When run interactively, ksh saves the commands you type at a
terminal in a file. If the variable HISTFILE is set to the
name of a file to which the user has write access, then the
commands are stored in this history file. Otherwise the
file $HOME/.sh_history is checked for write access and if
this fails an unnamed file is used to hold the history
lines. Commands are always appended to this file.
Instances of ksh that run concurrently and use the same
history file name share access to the history file, so that
a command entered in one shell will be available for editing
in another shell. The file may be truncated when ksh
determines that no other shell is using the history file.
The number of commands accessible to the user is determined
by the value of the HISTSIZE variable at the time the shell
is invoked. The default value is 256. Each command may
consist of one or more lines since a compound command is
considered one command. If the character ! is placed
within the primary prompt string, PS1, then it is replaced
by the command number each time the prompt is given.
A built-in command named hist is used to list and/or edit
any of these saved commands. The option -l is used to
specify listing of previous commands. The command can
always be specified with a range of one or more commands.
The range can be specified by giving the command number,
relative or absolute, or by giving the first character or
characters of the command. When given without specifying
the range, the last 16 commands are listed, each preceded by
the command number.
If the listing option is not selected, then the range of
commands specified, or the last command if no range is
given, is passed to an editor program before being
re-executed by ksh. The editor to be used may be specified
with the option -e followed by the editor name.
If this option is not specified, the value of the shell
variable HISTEDIT is used as the name of the editor,
provided that this variable has a non-null value. If this
variable is not set, or is null, and the -e option has not
been selected, then /bin/ed is used. When editing is
complete, the edited text automatically becomes the input
for ksh. As this text is read by ksh, it is echoed onto the
terminal.
The -s option bypasses editing and simply re-executes the
command. In this case only a single command can be
specified as the range, and an optional argument of the
form old=new may be added, which requests a simple string
substitution prior to evaluation. A convenient alias,
alias r='hist -s'
has been predefined so that the single keystroke r can be
used to re-execute the previous command, and the keystroke
sequence r abc=def c can be used to re-execute the last
command that starts with the letter c, with the first
occurrence of the string abc replaced with the string def.
Typing r c > file re-executes the most recent command
starting with the letter c, with standard output redirected
to file.
2.4 In-line editing
Lines typed from a terminal frequently need changes made
before entering them. With the Bourne shell the only method
to fix up commands is by backspacing or killing the whole
line. ksh offers options that allow the user to edit parts
of the current command line before submitting the command.
The in-line edit options make the command line into a single
line screen edit window. When the command is longer than
the width of the terminal, only a portion of the command is
visible. Moving within the line automatically makes that
portion visible. Editing can be performed on this window
until the return key is pressed. The editing modes have
editing directives that access the history file in which
previous commands are saved. A user can copy any of the
most recent HISTSIZE commands from this file into the input
edit window. You can locate commands by searching or by
position.
The in-line editing options do not use the termcap or
terminfo databases. They work on most standard terminals.
They only require that the backspace character moves the
cursor left and the space character overwrites the current
character on the screen and moves the cursor to the right.
Very few terminals or terminal emulators do not have this
behavior.
There is a choice of editor options. The emacs, gmacs, or
vi option is selected by turning on the corresponding option
of the set command. If the value of the EDITOR or VISUAL
variable ends in emacs, gmacs, or vi, the corresponding
option is turned on. A large subset of each of these
editors' features is available within the shell. Additional
functions, such as file name completion, have also been
added.
In the emacs or gmacs mode the user positions the cursor to
the point needing correction and inserts, deletes, or
replaces characters as needed. The only difference between
these two modes is the meaning of the directive ^T. Control
keys and escape sequences are used for cursor positioning
and control functions. The available editing functions are
listed in the manual page.
The vi editing mode starts in insert mode and enters control
mode when the user types ESC (octal 033). The return key,
which submits the current command for processing, can be
entered from either mode. The cursor can be anywhere on the
line. A subset of commonly used vi editing directives is
available. The k and j directives, which normally move up
and down by one line, move up and down one command in the
history file, copying the command into the input edit
window. For reasons of efficiency, the terminal is kept in
canonical mode until an ESC is typed. On some terminals,
and on earlier versions of the UNIX operating system, this
doesn't work correctly. The viraw option, which always uses
raw or cbreak mode, must be used in this case.
Most of the code for the editing options does not rely on
the ksh code and can be used in a stand-alone mode with
almost any command to add in-line edit capability. However,
all versions of the in-line editors have some features that
use shell-specific code. For example, with all edit modes,
the ESC-= directive applied to command words (the first word
on the line, or the first word after a ;, |, (, or &) lists
all aliases, functions, or commands that match the current
word. When applied to other words, this directive prints
the names of files that match the current word. The ESC-*
directive adds the expanded list of matching files to the
command line. A trailing * is added to the word if it
doesn't contain any file pattern matching characters before
the expansion. In emacs and gmacs mode, ESC-ESC indicates
command completion when applied to command names; otherwise
it indicates pathname completion. With command or pathname
completion, the list generated by the ESC-= directive is
examined to find the longest common prefix. With command
completion, only the last component of the pathname is used
to compute the longest command prefix. If the longest
common prefix is a complete match, then the word is replaced
by the pathname, and a / is appended if the pathname is a
directory; otherwise a space is added. In vi mode, \ from
control mode gives the same behavior.
2.5 Key Binding
It is possible to intercept keys as they are entered and
apply new meanings or bindings. A trap named KEYBD is
evaluated each time ksh processes characters entered from
the keyboard, other than those typed while entering a search
string or an argument to an edit directive such as r in
vi mode. The action associated with this trap can change the
value of the entered key to cause the key to perform a
different operation.
When the KEYBD trap is entered, the .sh.edtext variable
contains the contents of the current input line and the
.sh.edcol variable gives the current cursor position within
this line. The .sh.edmode variable contains the ESC
character when the trap is entered from vi insert mode.
Otherwise, this value is null. The .sh.edchar variable
contains the character or escape sequence that caused the
trap. A key sequence is either a single character, ESC
followed by a single character, or ESC[ followed by a single
character. In the vi edit mode, the characters after the
ESC must be entered within half a second after the ESC. The
value of .sh.edchar at the end of the trap will be used as
the input sequence.
Using the associative array facility of ksh described later,
and the function facility of ksh, it is easy to write a
single trap so that keys can be bound dynamically. For
example,
typeset -A Keytable
# On each keystroke, evaluate the entry (if any) stored for that key
trap 'eval "${Keytable[${.sh.edchar}]}"' KEYBD
function keybind # key [action]
{
    # %q quotes the replacement so it can be stored and re-read by eval
    typeset key=$(print -f "%q" "$2")
    case $# in
    2)  # when key is typed, replace it with the action's key sequence,
        # prefixed by ${.sh.edmode} (ESC in vi insert mode)
        Keytable[$1]='.sh.edchar=${.sh.edmode}'"$key"
        ;;
    1)  # with only a key, remove its binding
        unset Keytable[$1]
        ;;
    *)  print -u2 "Usage: $0 key [action]"
        ;;
    esac
}
2.6 Job Control
The job control mechanism is almost identical to the version
introduced in csh of the Berkeley UNIX operating system,
version 4.1 and later. The job control feature allows the
user to stop and restart programs, and to move programs to
and from the foreground and the background. It will only
work on systems that provide support for these features.
However, even systems without job control have a monitor
option which, when enabled, will report the progress of
background jobs and enable the user to kill jobs by job
number or job name.
An interactive shell associates a job with each pipeline
typed in from the terminal and assigns it a small integer
number called the job number. If the job is run
asynchronously, the job number is printed at the terminal.
At any given time, only one job owns the terminal, i.e.,
keyboard signals are only sent to the processes in one job.
When ksh creates a foreground job, it gives it ownership of
the terminal. If you are running a job and wish to stop it,
you type ^Z (control-Z), which sends a terminal stop signal
(TSTP) to all processes in the current job. The shell
receives notification that the processes have stopped and
takes back control of the terminal.
There are commands to continue programs in the foreground
and background. There are several ways to refer to jobs.
The character % introduces a job name. You can refer to
jobs by name or number as described in the manual page. The
built-in command bg allows you to continue a job in the
background, while the built-in command fg allows you to
continue a job in the foreground even though you may have
started it in the background.
A job being run in the background will stop if it tries to
read from the terminal. It is also possible to stop
background jobs that try to write on the terminal by setting
the terminal options appropriately.
There is a built-in command jobs that lists the status of
all running and stopped jobs. In addition, you are informed
of the change of state (running or stopped) of any
background jobs just before each prompt. If you want to be
notified about background job completions as soon as they
occur without waiting for a prompt, then use the notify
option. When you try to exit the shell while jobs are
stopped or running, you will receive a message from ksh. If
you ignore this message and try to exit again, all stopped
processes will be terminated. In addition, for login
shells, the HUP signal will be sent to all background jobs
unless the job has been disowned with the disown command.
A built-in version of kill makes it possible to use job
numbers as targets for signals. Signals can be selected by
number or name. The name of the signal is the name found in
the include file /usr/include/sys/signal.h with the prefix
SIG removed. The -l option of kill provides a means to map
individual signal names to and from signal numbers. In
addition, if no signal name or number is given, kill -l
generates a list of valid signal names.
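A minimal sketch of selecting signals by name and by number. It refers to the background job through the process id in $! rather than a %job name, so it also works in a non-interactive script; the variable names are illustrative, and 15 for TERM is the usual value on common systems rather than a guarantee.

```shell
sleep 60 &                       # a background job
job=$!                           # its process id

num=$(kill -l TERM)              # signal name -> number (15 on common systems)
name=$(kill -l "$num")           # number -> name: TERM

kill -s "$name" "$job"           # send the signal, selected by name
wait "$job" 2>/dev/null || true  # reap the job; its status reflects the signal
```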
2.7 Changing Directories
By default, ksh maintains a logical view of the file system
hierarchy which makes symbolic links transparent. For
systems that have symbolic links, this means that if /bin is
a symbolic link to /usr/bin and you change directory to
/bin, pwd will indicate that you are in /bin, not /usr/bin.
pwd -P generates the physical pathname of the present
working directory by resolving all the symbolic links. By
default, the cd command will take you where you expect to go
even if you cross symbolic links. A subsequent cd .. in the
example above will place you in /, not /usr. On systems
with symbolic links, cd -P causes .. to be treated
physically.
ksh remembers your last directory in the variable OLDPWD.
The cd built-in can be given the argument - to return to
the previous directory and print the name of the directory.
Note that cd - done twice returns you to the starting
directory, not the second previous directory. A directory
stack manager has been written as shell functions to push
and pop directories from the stack.
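The logical and physical views, and the behavior of cd -, can be checked in a scratch hierarchy. This sketch builds usr/bin with a symbolic link bin pointing at it, mirroring the /bin example without touching the real /bin; it assumes mktemp is available and that / and /tmp exist.

```shell
# Scratch hierarchy: usr/bin plus a symbolic link bin -> usr/bin.
base=$(mktemp -d)
real=$(cd "$base" && pwd -P)   # $base with any links of its own resolved
mkdir -p "$base/usr/bin"
ln -s usr/bin "$base/bin"

cd "$base/bin"
logical=$(pwd)                 # $base/bin: the path that was used
physical=$(pwd -P)             # $real/usr/bin: links resolved
cd ..
up=$PWD                        # logical ..: $base, not $base/usr

cd "$base/bin"
cd -P ..
phys_up=$PWD                   # physical ..: $real/usr

# cd - swaps the current directory with $OLDPWD (and prints the new one).
cd /
cd /tmp
cd - > /dev/null
first=$PWD                     # /
cd - > /dev/null
second=$PWD                    # /tmp again, not an earlier directory
rm -rf "$base"
```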
2.8 Prompts
When ksh reads commands from a terminal, it issues a prompt
whenever it is ready to accept more input and then waits for
the user to respond. The TMOUT variable can be set to the
number of seconds that the shell will wait for input before
terminating. A 60-second warning message is printed before
terminating.
The shell uses two prompts. The primary prompt, defined by
the value of the PS1 variable, is issued at the start of
each command. The secondary prompt, defined by the value of
the PS2 variable, is issued when more input is needed to
complete a command.
ksh allows the user to specify a list of files or
directories to check before issuing the PS1 prompt. The
variable MAILPATH is a colon (:) separated list of file
names to be checked for changes periodically. The user is
notified before the next prompt. Each of the names in this
list can be followed by a ? and a message to be given when
a change has been detected in the file. The message will be
evaluated for parameter expansion, command substitution, and
arithmetic expansion, which are described later. The
parameter $_ within a mail message will evaluate to the name
of the file that has changed. The parameter MAILCHECK is
used to specify the minimum interval, in seconds, between
checks for new mail.
In addition to replacing each ! in the prompt with the
command number, ksh expands the value of the PS1 variable
for parameter expansions, arithmetic expansions, and command
substitutions as described below to generate the prompt.
The expansion characters that are to be applied when the
prompt is issued must be quoted to prevent the expansions
from occurring when assigning the value to PS1. For
example, PS1="$PWD" causes PS1 to be set to the value of PWD
at the time of the assignment whereas PS1='$PWD' causes PWD
to be expanded at the time the prompt is issued.
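Prompt-related variables are typically set in a startup file. The fragment below is a sketch only: the mail spool location /var/mail/$LOGNAME and the particular values are assumptions that vary between systems.

```shell
# Primary prompt: ! becomes the command number; single quotes defer $PWD
# so that it is expanded each time the prompt is issued.
PS1='!:$PWD$ '
PS2='> '                  # secondary prompt for continuation lines
# ? separates a file name from its message; \$_ is left for the shell to
# expand to the name of the changed file when the message is printed.
MAILPATH="/var/mail/$LOGNAME?New mail in \$_"
MAILCHECK=60              # check for mail at most once a minute
TMOUT=3600                # exit after an hour without input
```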
Command substitution may require a separate process to
execute and cause the prompt display to be somewhat slow,
especially when the return key is pressed several times in a
row. Therefore, its use within PS1 is discouraged. Some
variables are maintained by ksh so that their values can be
used with PS1. The PWD variable stores the pathname of the
current working directory. The value of the SECONDS variable
is the value of its most recent assignment plus the elapsed
time. By default, the time is measured in milliseconds,
but since SECONDS is a floating point variable, the number
of places after the decimal point in the expanded value can
be specified with typeset -Fn SECONDS, where n is the number
of places. In a roundabout way, this variable can be used to
generate a time stamp in the PS1 prompt without creating a
process at each prompt. The following code shows how you
can do this on System V. On BSD, you need a different
command to initialize the SECONDS variable.
# . this script and use $TIME as part of your PS1 string to
# get the time of day in your prompt
typeset -RZ2 _x1 _x2 _x3   # hours, minutes, seconds: two digits, zero-filled
# set SECONDS to the number of seconds since midnight
(( SECONDS=$(date '+3600*%H+60*%M+%S') ))
# arithmetic expression that sets _x1, _x2, and _x3 from SECONDS and yields 0
_s='_x1=(SECONDS/3600)%24,_x2=(SECONDS/60)%60,_x3=SECONDS%60,0'
# the subscript of the unset array _d evaluates $_s each time the prompt
# is expanded, updating _x1, _x2, and _x3; ${_d[0]} itself is empty
TIME='"${_d[_s]}$_x1:$_x2:$_x3"'
# PS1=${TIME}whatever
2.9 Tilde substitution
The character ~ at the beginning of a word has special
meaning to ksh. If the characters after the ~ up to a /
match a user login name in the password database, then the ~
and the name are replaced by that user's login directory.
If no match is found, the original word is unchanged. A ~
by itself, or in front of a /, is replaced by the value of
the HOME parameter. A ~ followed by a + or - is replaced by
the value of $PWD or $OLDPWD respectively.
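A sketch of the tilde forms, using / and /tmp as example directories (assumed to exist); the variable names are illustrative. Tilde substitution is also performed on the value after the = of an assignment, which is how the results are captured here.

```shell
cd /
cd /tmp
home=~          # $HOME
notes=~/notes   # ~ in front of a /: $HOME/notes
here=~+         # $PWD, here /tmp
prev=~-         # $OLDPWD, here /
plain='~'       # quoted: no substitution
```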
2.10 Output formats
Built-in commands and execution traces quote the values in
their output so that it can be re-input to the shell. This
makes it easy to cut and paste shell output on systems that
use a pointing device such as a mouse. In addition, output
can be saved in a file for reuse.
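One way to see the re-input property is to round-trip a value through printf %q, the same conversion the keybind function in section 2.5 uses through print -f. A sketch: the exact quoted text printf produces may differ between shells and versions, so the only claim checked is that eval reads it back unchanged.

```shell
# A value containing a tab, quotes, a semicolon and a literal $HOME.
s=$'two\twords; it\'s "quoted" $HOME'
q=$(printf '%q' "$s")   # quoted form that the shell can read back
eval "copy=$q"          # re-input the printed form
```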
2.11 The ENV file
When an interactive ksh starts, it evaluates the $ENV
variable to arrive at a file name. If this value is not
null, ksh attempts to read and process commands in a file by
this name. Earlier versions of ksh read the ENV file for
all invocations of the shell, primarily to allow function
definitions to be available for all shell invocations. The
function search path, FPATH, described later, eliminated the
primary need for this capability, and it was removed because
the high performance cost was no longer deemed acceptable.
3. PROGRAMMING LANGUAGE
The KornShell vastly extends the set of applications that
can be implemented efficiently at the shell level. It does
this by providing simple yet powerful mechanisms to perform
arithmetic, pattern matching, substring generation, and
arrays. Users can write applications as separate functions
that can be defined in the same file or in a library of
functions stored in a directory and loaded on demand.
3.1 String Processing
The shell is primarily a string processing language. By
default, variables hold variable-length strings. There are
no limits to the length of strings. Storage management is
handled by the shell automatically. Declarations are not
required. In most programming languages, string constants
are designated by enclosing characters in single quotes or
double quotes. Since most of the words in the language are
strings, the shell requires quotes only when a string
contains characters that are normally processed specially by
the shell but their literal meaning is intended. However,
since the shell is a string processing language, and some
characters can occur both as literals and as language
metacharacters, quoting is an important part of the
language.
There are four quoting mechanisms in ksh. The simplest is
to enclose a sequence of characters inside single quotes.
All characters between a pair of single quotes have their
literal meaning; the single quote itself cannot appear. A $
immediately preceding a single-quoted string causes all the
characters until the matching single quote to be interpreted
as an ANSI C string. Thus, '\n' represents the characters
\ and n, whereas $'\n' represents the newline character.
Double-quoted strings remove the special meaning of all
characters except $, `, and \, so that parameter expansion
and command substitution (defined below) are performed. The
final mechanism for quoting a character is by preceding it
with the escape character \. This mechanism works outside
of quoted strings and for the characters $, `, ", and \ in
double-quoted strings.
A variable name consists of one or more strings of
alphanumeric characters, each beginning with an alphabetic
character, separated by periods (.). Upper and lower case
characters are distinct, so that A and a are the names of
different variables. There is no limit to the length of the
name of a variable. You do not have to declare variables.
You can assign a value to a variable by writing the name of
the variable, followed by an equal sign, followed by a
character string that represents its value. To create a
variable whose name contains a ., the variable whose name
consists of the characters before the last . must already
exist. You reference a variable by putting the name inside
curly braces and preceding the braces with a dollar sign.
The braces may be omitted when the name is alphanumeric. If
x and y are two shell variables, then to define a new
variable, z, whose value is the concatenation of the values
of x and y, you just say z=$x$y. It is that easy.
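The quoting mechanisms and variable references described above can be tried side by side. A sketch with illustrative variable names; compound names containing a . are specific to ksh-93 and are left out here.

```shell
# Quoting
lit='\n'              # single quotes: the two characters \ and n
nl=$'\n'              # ANSI C string: a single newline character
name=world
dq="hello, $name"     # double quotes: $ is still expanded
esc=\$name            # backslash: a literal $ followed by "name"
mixed="cost: \$5, a \"quote\""   # \ quotes $ and " inside double quotes

# Variables
x=foo
y=bar
z=$x$y                # concatenation: foobar
w=${x}_1              # braces mark the end of the name; $x_1 would name x_1
A=upper
a=lower               # A and a are different variables
```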
The $ can be thought of as meaning "value of." You can also
capture the output of any command with the notation
$(command). This is referred to as command substitution.
For example, x=$(date) assigns the output from the date
command to the variable x. Command substitution in the
Bourne shell is denoted by enclosing the command between
backquotes (`command`). This notation suffers from some
complicated quoting rules. Thus, it is hard to write sed
patterns that contain backslashes within command
substitution. Putting the pattern in single quotes is of
little help. ksh accepts the Bourne shell command
substitution syntax for backward compatibility. The
$(command) notation allows the command itself to contain
quoted strings even if the substitution occurs within double
quotes. Command substitutions may also be nested.
The special command substitution of the form [1m$(cat file) [22mcan
be replaced by [1m$(< file)[22m, which is faster because the [1mcat[0m
command doesn't have to run.
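The following sketch exercises these forms of command substitution. It assumes a writable scratch directory given by ${TMPDIR:-/tmp}; the file name and variable names are hypothetical. The sed line shows a backslash pattern in single quotes inside "$( )", the case that is awkward to write with backquotes.

```shell
x=foo y=bar
z=$x$y                                        # concatenation: foobar
year=$(date +%Y)                              # output of date captured in a variable
inner=$(printf '%s' "$(printf '%s' deep)")    # $( ) nests without extra escaping
out="$(printf '%s\n' 'a\b' | sed 's/\\/\//g')"  # quoted sed pattern inside "$( )"
tmp=${TMPDIR:-/tmp}/cmdsub_demo.$$            # hypothetical scratch file
printf '%s\n' 'line one' > "$tmp"
contents=$(< "$tmp")                          # same result as $(cat "$tmp"), but cat is not run
rm -f "$tmp"
printf '%s\n' "$z" "$inner" "$out" "$contents"
```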
3.2 Shell Parameters and Variables
There are three types of parameters used by ksh: special
parameters, positional parameters, and named parameters,
which are called variables. ksh defines the same special
parameters, 0, *, @, #, ?, $, !, and -, as the Bourne
shell.
Positional parameters are set when the shell is invoked, as
arguments to the set built-in, and by calls to functions
(see below) and . procedures. They are named by numbers
starting at 1.
The third type of parameter is a variable. As mentioned
earlier, ksh uses variables whose names consist of one or
more alphanumeric strings separated by a . (dot). There is
no need to specify the type of a variable in the shell
because, by default, variables store strings of arbitrary
length and values will automatically be converted to numbers
when used in an arithmetic context. However, ksh variables
can have one or more attributes that control the internal
representation of the variable, the way the variable is
printed, and its access or scope. In addition, ksh allows
variables to represent arrays of values and references to
other variables. The typeset built-in command of ksh
assigns attributes to variables. Two of the attributes,
readonly and export, are available in the Bourne shell.
Most of the remaining attributes are discussed here. The
complete list of attributes appears in the manual. The
unset built-in of ksh removes values and attributes of
variables. When a variable is exported, some of its
attributes are also exported.
Whenever a value is assigned to a variable, the value is
transformed according to the attributes of the variable.
Changing an attribute of a variable can change its value.
The attributes -L and -R are for left and right field
justification, respectively. They are useful for aligning
columns in a report. For each of these attributes, a width
can be defined explicitly, or else it is defined the first
time an assignment is made to the variable. Each assignment
causes justification of the field, truncating if necessary.
Assignment to fixed-size variables provides one way to
generate a substring consisting of a fixed number of
characters from the beginning or end of a string. Other
methods are discussed later.
The attributes -u and -l are used for upper case and lower
case formatting, respectively. Since it makes no sense to
have both attributes on simultaneously, turning on either of
these attributes turns the other off. The following script,
using read and print, which are described later, provides an
example of the use of shell variables with attributes. This
script reads a file of lines, each consisting of five fields
separated by :, and prints fields 4 and 2 in upper case in
columns 1-15, left-justified, and in columns 20-25,
right-justified, respectively.
    typeset -uL15 f4                  # 15 character left justified
    typeset -uR6 f2                   # 6 character right justified
    IFS=:                             # set field separator to :
    while read -r f1 f2 f3 f4 f5      # read line, split into fields
    do  print -r -- "$f4    $f2"      # fields 4 and 2, four spaces apart
    done
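A minimal sketch of how the -u and -l attributes interact. The variable name is hypothetical, and the value is reassigned after each typeset so that it is clear which attribute was applied.

```shell
typeset -u shout          # upper case attribute
shout='Hello, World'
upper=$shout              # HELLO, WORLD
typeset -l shout          # turning on -l turns -u off
shout='Hello, World'      # reassign so the new attribute is applied
lower=$shout              # hello, world
printf '%s\n' "$upper" "$lower"
```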
The -i, -E, and -F attributes are used to represent
numbers. Each can be followed by a decimal number. The -i
attribute causes the value to be represented as an integer,
and it can be followed by a number representing the numeric
base used when expanding its value. Whenever a value is
assigned to an integer variable, it is evaluated as an
arithmetic expression and then truncated to an integer.
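A sketch of the integer attribute: each assignment to n below is evaluated as an arithmetic expression and truncated to an integer. The variable names are illustrative.

```shell
typeset -i n                 # integer attribute
n='2 + 3 * 4'                # evaluated as an arithmetic expression
sum=$n                       # 14
n='(17 + 4) / 5'             # 21 / 5, truncated to an integer
quotient=$n                  # 4
n=n*3                        # variable names need no $ inside the expression
tripled=$n                   # 12
printf '%s %s %s\n' "$sum" "$quotient" "$tripled"
```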
The -E attribute causes the value to be represented in
scientific notation whenever its value is expanded. The
number following the -E determines the number of significant
figures, and defaults to 6. The -F attribute causes the
value to be represented with a fixed number of places after
the decimal point. Assignments to variables with the -E or
-F attributes cause the right hand side of the assignment to
be evaluated as an arithmetic expression.
ksh allows one-dimensional arrays in addition to simple
variables. There are two types of arrays: associative
arrays and indexed arrays. The subscript for an associative
array is an arbitrary string, whereas the subscript for an
indexed array is an arithmetic expression that is evaluated
to yield an integer index. Any variable can become an
indexed array by referring to it with an integer subscript.
Not all elements of an array need exist. Subscripts for
indexed arrays must evaluate to an integer between 0 and
some maximum value, otherwise an error results. The maximum
value may vary from one machine to another but is at least
4095. Evaluation of arithmetic subscripts is described in
section 3.5. Attributes apply to the whole array.
Assignments to array variables can be made to individual
elements via parameter assignment commands or the typeset
built-in. Additionally, values can be assigned sequentially
with compound assignment, as described below, or by the
-A name option of the set command. Referencing a subscripted
variable requires not only the character $ but also braces
around the array element name. The braces are needed to
avoid conflicts with the file name generation mechanism.
The form of any array element reference is:
    ${name[subscript]}
Subscript values of * and @ can be used to generate all
elements of an array, as they are used for expansion of
positional parameters. The list of currently defined
subscripts for a given variable can be generated with
${!name[@]} or ${!name[*]}.
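A sketch of a sparse indexed array; the name files is hypothetical. Only the elements that are assigned exist, and ${!files[*]} lists just those subscripts.

```shell
files[0]=a.c
files[3]=b.c              # elements 1 and 2 are never created
files[1+1]=c.c            # the subscript is an arithmetic expression: index 2
all="${files[*]}"         # a.c c.c b.c (elements in subscript order)
subs="${!files[*]}"       # 0 2 3
printf '%s\n' "$all" "$subs"
```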
The -n or nameref attribute causes the variable to be
treated as a reference to the variable defined by its value.
Once this attribute is set, all references to this variable
become references to the variable named by the value of this
variable. For example, if foo=bar, then setting the
reference attribute on foo will cause all subsequent
references to foo to behave as if the variable whose name is
$foo had been referenced, which in this case is the variable
bar. Unsetting this attribute breaks the association.
Reference variables are usually used inside functions whose
arguments are the names of shell variables. The names for
reference variables cannot contain a . (dot). Whenever a
shell variable is referenced, the portion of the variable
name up to the first . is checked to see whether it matches
the name of a reference variable. If it does, then the name
of the variable actually used consists of the concatenation
of the name of the variable defined by the reference plus
the remaining portion of the original variable name. For
example, using the predefined alias, alias nameref='typeset -n',
the commands
    .bar.home.bam="hello world"
    nameref foo=.bar.home
    print ${foo.bam}
display
    hello world
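A typical use of a reference variable is a function that updates a variable whose name is passed as an argument. This sketch uses typeset -n directly rather than the nameref alias; the function incr and the variable hits are hypothetical. The local name counter is deliberately different from the caller's variable name, so the reference cannot point back at itself.

```shell
function incr {
    typeset -n counter=$1     # counter now refers to the caller's variable
    (( counter += ${2:-1} ))  # add the optional second argument, default 1
}
hits=5
incr hits                     # hits is now 6
incr hits 10                  # hits is now 16
printf '%s\n' "$hits"
```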
3.3 Compound Assignment
Compound assignments are used to assign values to arrays and
compound data structures. The syntax for a compound
assignment is name=(assignment-list), where name is the name
of the variable to which you want to assign values. No
space is permitted between the variable name and the =, but
space can appear between the = and the open parenthesis.
Newlines can appear between the parentheses.
The assignment-list can be in several different forms
yielding different results. If assignment-list is simply a
list of words, then the words are processed as they are with
the for command and assigned sequentially as an indexed
array. For example,
    foo=( * )
creates an indexed array foo and assigns the file names in
the current directory to each index starting at zero.
The second form for assignment-list is a list of assignments
of the special form [word]=word. No space is permitted
before or after the =. In this case, the variable given by
name becomes an associative array with the given arguments
as subscripts. For example,
    bar=( [color]=red [shape]=box )
creates an associative array named bar whose subscripts are
color and shape.
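A sketch of the first two forms of assignment-list. The typeset -A declaration is an addition here that makes the associative intent explicit; the names colors and bar are illustrative, and the order in which subscripts are listed is not guaranteed.

```shell
colors=( red green yellow )        # list of words: indexed array from 0
typeset -A bar                     # declare bar as an associative array
bar=( [color]=red [shape]=box )    # [word]=word assignments
for key in "${!bar[@]}"            # subscripts, in no particular order
do  printf '%s=%s\n' "$key" "${bar[$key]}"
done
printf '%s\n' "${colors[1]}"       # green
```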
The third form for assignment-list is a list of normal
assignments, including compound assignments. These
assignments cause sub-variables to be assigned corresponding
to the given assignments. The assignment-list can also
contain typeset commands. In addition to creating
sub-variables, the effect of a compound assignment is to
make the value of the original variable be a parenthesized
assignment list of its components. For example, the
assignment
    foo=(
        left=bar
        typeset -i count=3
        point=(
            x=50
            y=60
        )
        colors=( red green yellow )
        right=bam
    )
is equivalent to the assignments
    foo.left=bar
    foo.count=3
    foo.point.x=50
    foo.point.y=60
    foo.colors=( red green yellow )
    foo.right=bam
In addition, the value of "$foo" is
    (
        colors=( red green yellow )
        left=bar
        typeset -i count=3
        point=(
            y=60
            x=50
        )
        right=bam
    )
3.4 Substring Generation
The expansion of a variable or parameter can be modified so
that only a portion of the value results. It is often
necessary to extract a portion of a shell variable or a
portion of an array. There are several parameter expansion
operators that can do this. One method to generate a
substring is with an expansion of the form
${name:offset:length}, where offset is an arithmetic
expression that defines the offset of the first character,
starting from 0, and length is an arithmetic expression that
defines the length of the substring. If :length is omitted,
the rest of the value of name, starting at offset, is used.
The :offset:length operators can also be applied to array
expansions and to the parameters * and @ to generate
portions of an array. For example, the expansion
${name[@]:offset:length} yields up to length elements of
the array name, starting at the element offset.
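A sketch of the :offset:length forms applied to a scalar and to an array; the values are illustrative.

```shell
s=abcdefgh
mid=${s:2:3}                    # cde: start at offset 2, take 3 characters
rest=${s:5}                     # fgh: length omitted, so the rest of the value
last2=${s:${#s}-2}              # gh: the offset is an arithmetic expression
letters=( a b c d e )
slice=( "${letters[@]:1:3}" )   # three elements starting at offset 1: b c d
printf '%s\n' "$mid" "$rest" "$last2" "${slice[*]}"
```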
The other parameter expansion modifiers use shell patterns
to describe portions of the string to modify or delete.
Shell patterns are described in section 3.7 below. When
these modifiers are applied to the special parameters @ and
* or to array parameters given as name[@] or name[*], the
operation is performed on each element. There are four
parameter expansion modifiers that strip off leading and
trailing substrings during parameter expansion by removing
the characters matching a given pattern. An expansion of
the form ${name#pattern} causes the smallest prefix of the
value of name that matches pattern to be removed. The
largest prefix matching pattern is removed by using ##
instead of #. Similarly, an expansion of the form
${name%pattern} causes the smallest suffix of the value of
name that matches pattern to be removed. Again, using %%
instead of % causes the largest matching suffix to be
deleted. For example, if the shell variable file has value
foo.c, then the expression ${file%.c}.o has value foo.o.
The value of an expansion can be changed by specifying a
pattern that matches the part that needs to be changed after
the parameter expansion modifier /. An expansion of the
form ${name/pattern/string} replaces the first match of
pattern in the value of name with string. The second / is
not necessary when string is null. The expansion
${name//pattern/string} changes all occurrences of pattern
into string. The parameter expansion modifiers /# and /%
cause the matching pattern to be anchored to the beginning
and end, respectively.
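A sketch of the pattern-based modifiers on illustrative values; the comment on each line gives the resulting value.

```shell
file=foo.c
obj=${file%.c}.o          # foo.o
path=/usr/local/bin/ksh
base=${path##*/}          # ksh: longest prefix matching */ removed
dir=${path%/*}            # /usr/local/bin: shortest suffix matching /* removed
s=banana
first=${s/an/AN}          # bANana: first match replaced
every=${s//an/AN}         # bANANa: every match replaced
gone=${s//an}             # ba: string omitted, so matches are deleted
front=${s/#b/B}           # Banana: match anchored at the beginning
back=${s/%a/A}            # bananA: match anchored at the end
printf '%s\n' "$obj" "$base" "$dir" "$first" "$every" "$gone" "$front" "$back"
```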
Finally, there are parameter expansion modifiers that yield
the name of the variable, the string length of the value, or
the number of elements of an array. ${!name} yields the
name of the variable, which will be name itself except when
name is a reference variable. In this case it will yield
the name of the variable it refers to. When applied to an
array variable, ${!name[@]} and ${!name[*]} generate the
names of all subscripts. ${#name} will be the length in
bytes of $name. For an array variable, ${#name[*]} gives the
number of elements in the array.
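A short sketch of the two length forms; the names are illustrative. Note that the element count of a sparse array counts only the elements that exist, not the highest subscript.

```shell
word=hello
len=${#word}              # 5
sparse[2]=x
sparse[7]=y
sparse[9]=z
count=${#sparse[*]}       # 3: only elements that exist are counted
printf '%s %s\n' "$len" "$count"
```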
3.5 Arithmetic Evaluation
For the most part, the shell is a string processing
language. However, the need for arithmetic has long been
obvious. Many of the characters that are special to the
Bourne shell are needed as arithmetic operators. To make
arithmetic easy to use, and to maintain compatibility with
the Bourne shell, ksh uses matching (( and )) to delineate
arithmetic expressions. While single parentheses might have
been more desirable, these already mean subshell, so
another notation was required. The arithmetic expression
inside the double parentheses follows the same syntax,
associativity and precedence as the ANSI-C[15] programming
language. The characters between the matching double
parentheses are processed with the same rules used for
double quotes so that spaces can be used to aid readability
without additional quoting.
All arithmetic evaluations are performed using double
precision floating point arithmetic. Floating point
constants follow the same rules as the ANSI-C programming
language. Integer arithmetic constants are written as
    base#number,
where base is a decimal integer between two and sixty-four
and number is a non-negative integer written in that base.
Base ten is used when no base is specified. The digits are
represented by the characters 0-9a-zA-Z_@. For bases less
than or equal to 36, upper and lower case characters can be
used interchangeably to represent the digits from 10 through
35.
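A sketch of base#number constants, using the $(( ... )) expansion described later in this section to capture each value. The bases used stay at or below 36, the range in which upper and lower case letters are interchangeable.

```shell
hex=$(( 16#ff ))          # 255
bin=$(( 2#1010 ))         # 10
oct=$(( 8#17 ))           # 15
lower=$(( 36#z ))         # 35
upper=$(( 36#Z ))         # 35: case is ignored for bases up to 36
printf '%s %s %s %s %s\n' "$hex" "$bin" "$oct" "$lower" "$upper"
```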
Arithmetic expressions are made from constants, variables,
and operators. Parentheses may be used for grouping. The
contents inside the double parentheses are processed with
the same expansions as occur in a double quoted string, so
that all $ expansions are performed before the expression is
evaluated. However, there is usually no need to use the $
to get the value of a variable because the arithmetic
evaluator replaces the name of the variable by its value
within an arithmetic expression. The $ cannot be used when
the variable is the subject of assignment or an increment
operation. As a rule, it is better not to use $ in front of
variables in an arithmetic expression.
An arithmetic command of the form (( ... )) is a command
that evaluates the enclosed arithmetic expression. For
example, the command
    (( x++ ))
can be used to increment the variable x, assuming that x
contains some numerical value. The arithmetic command is
true (return value 0) when the resulting expression is
non-zero, and false (return value 1) when the expression
evaluates to zero. This makes the command easy to use with
the if and while compound commands.
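A sketch of the return-value rule for (( ... )) in if commands; the variable names are illustrative. The last test is false because its expression evaluates to zero.

```shell
x=3
(( x++ ))                # x becomes 4; the expression's value (3) is non-zero, so status 0
if (( x % 2 == 0 ))      # the comparison yields 1, so the command is true
then parity=even
else parity=odd
fi
if (( x - 4 ))           # the expression is 0, so the command is false (status 1)
then zero=no
else zero=yes
fi
printf '%s %s %s\n' "$x" "$parity" "$zero"
```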
The for compound command has been extended for use in
arithmetic contexts. The syntax
    for (( expr1 ; expr2 ; expr3 ))
can be used as the first line of a for loop with the same
semantics as the for statement in the ANSI-C programming
language.
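A minimal sketch of the arithmetic for loop, summing the integers 1 through 10; total and i are illustrative names.

```shell
total=0
for (( i = 1; i <= 10; i++ ))    # initialize, test, and step, as in C
do  (( total += i ))
done
printf '%s\n' "$total"           # 55
```

As with the C statement, the loop variable keeps its final value (11) after the loop ends.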
Arithmetic evaluations can also be performed as part of the
evaluation of a command line. The syntax $(( ... )) expands
to the value of the enclosed arithmetic expression. This
expansion can occur wherever parameter expansion is
performed. For example, using the ksh command print
(described later),
    print $((2+2))
prints the number 4.
The following script prints the first n lines of its
standard input onto its standard output, where n can be
supplied as an optional argument whose default value is 20.
    integer n=${1-20}                      # n defaults to 20
    while (( n-- > 0 )) && read -r line    # at most n lines
    do  print -r -- "$line"
    done
3.6 Shell Expansions
The commands you enter from the terminal or from a script
are divided into words and each word undergoes several
expansions to generate the command name and its arguments.
This is done in two phases. The first phase recognizes
reserved words, spaces and operators to decide where command
boundaries lie. Alias substitutions take place during this
phase. The second phase performs expansions in the
following order:
o Tilde substitution, parameter expansion, arithmetic
  expansion, and command substitution are performed from
  left to right. The -u (nounset) option causes an error
  to occur when any variable that is not set is expanded.
o The characters that result from parameter expansion and
  command substitution above are checked against the
  characters in the IFS variable for possible field
  splitting. (See the description of read below for how
  IFS is used.) Setting IFS to a null value causes field
  splitting to be skipped.
o Pathname generation (as described below) is performed
  on each of the fields. Any field that doesn't match a
  pathname is left alone. The -f (noglob) option disables
  pathname generation.
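A sketch of the field splitting and pathname generation steps, using set -- to count the fields that result. The record value is illustrative; IFS and the noglob option are restored at the end.

```shell
record='root:x:0:0'
IFS=:                     # split on : instead of white space
set -- $record            # unquoted expansion undergoes field splitting
nfields=$# first=$1       # 4 fields, the first is root
IFS=                      # null IFS: field splitting is skipped
set -- $record
unsplit=$#                # 1 field: the whole record
unset IFS                 # back to the default white space splitting
set -f                    # noglob: pathname generation disabled
pat='*'
set -- $pat               # * stays literal instead of listing files
literal=$1
set +f                    # re-enable pathname generation
printf '%s %s %s %s\n' "$nfields" "$first" "$unsplit" "$literal"
```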
3.7 Pattern Matching
The shell is primarily a string processing language and uses
patterns for matching file names as well as for matching
strings. The characters ?, *, and [ are processed specially
by the shell when not quoted. These characters are used to
form patterns that match strings. Patterns are used by the
shell to match pathnames, to specify substrings, and for
case commands. The character ? matches any one character.
The character * matches zero or more characters. The
character sequence [...] defines a character class that
matches any character contained within []. A range of
characters can be specified by putting a - between the first
and last character of the range. An exclamation mark, !,
immediately after the [ means match all characters except
the characters specified. For example, the pattern
a?c*.[!a-z] matches any string beginning with an a, whose
third character is a c, and that ends in . (dot) followed
by any character except the lower case letters a-z. The
sequence [:alpha:] inside a character class matches any
character in the ANSI-C alpha class. Similarly,
[:class:] matches each of the characters in the given class
for all the ANSI-C character classes. For example,
[[:alnum:]_] matches any alphanumeric character or the
character _.
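A sketch applying these patterns in a case command. The function classify and its output labels are hypothetical; the first pattern is the a?c*.[!a-z] example above.

```shell
classify() {
    case $1 in
    a?c*.[!a-z])    printf '%s\n' dot-suffix ;;   # a, any char, c, anything, ., non-lower
    [[:digit:]]*)   printf '%s\n' digit ;;        # starts with a digit
    [[:alnum:]_])   printf '%s\n' word-char ;;    # exactly one alphanumeric or _
    *)              printf '%s\n' none ;;
    esac
}
classify abcde.X      # dot-suffix
classify abc.z        # none: z is a lower case letter
classify 7up          # digit
classify _            # word-char
```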
ksh treats strings of the form (pattern-list), where
pattern-list is a list of one or more patterns separated by
a |, specially when preceded by *, ?, +, @, or !. A ?
preceding (pattern-list) means that the pattern list
enclosed in () is optional. An @(pattern-list) matches any
pattern in the list of patterns enclosed in (). A
*(pattern-list) matches zero or more occurrences of any of
the enclosed patterns, whereas +(pattern-list) requires a
match of one or more occurrences of any of the given
patterns. For instance, the pattern +([0-9])?(.) matches
one or more digits optionally followed by a . (dot). A
!(pattern-list) matches anything except any of the given
patterns. For example, print !(*.o) displays all file names
in the current directory that do not end in .o.
When patterns are used to generate pathnames during command
expansion, several other rules apply. A separate match is
made for each file name component of the pathname. Read
permission is required for any portion of the pathname that
contains any special pattern character. Search permission
is required for every component except possibly the last.
By default, file names in each directory that begin with .
are skipped when performing a match. If the pattern to be
matched starts with a leading ., then only files beginning
with a . are examined when reading each directory to find
matching files. If the FIGNORE variable is set, then only
files that do not match this pattern are considered. This
overrides the special meaning of . in a pattern and in a
file name.
If the markdirs option is set, each matching pathname that
is the name of a directory has a trailing / appended to the
name.
3.8 Conditional Expressions
The Bourne shell uses the test command, or the equivalent [
command, to test files for attributes and to compare strings
or numbers. The problem with test is that the shell has
expanded the words of the test command and split them into
arguments before test begins execution, so test cannot
distinguish between operators and operands. In most cases
test "$1" will test whether argument 1 is non-null.
However, if argument 1 is -f, then test will treat -f as an
operator and yield a syntax error. One of the most frequent
errors with test occurs when its operands are not within
double quotes. In this case, the argument may expand to
more than a single argument or to no argument at all. In
either case this will likely cause a syntax error. What
makes this most insidious is that these errors are
frequently data dependent. A script that appears to run
correctly may abort if given unexpected data.
To get around these problems, ksh has a compound command for
conditional expression testing as part of the language. The
reserved words [[ and ]] delimit the range of the command.
Because they are reserved words, not operator characters,
they require spaces to separate them from arguments. The
words between [[ and ]] are not processed for field
splitting or for pathname generation. In addition, since
ksh determines the operators before parameter expansion,
expansions that yield no argument cause no problem. The
operators within [[...]] are almost the same as those for
the test command. All unary operators are of the form
-letter and are followed by a single operand. Instead of -a
and -o, [[...]] uses && and || to indicate "and" and "or".
Parentheses are used without quoting for grouping.
The right hand side of the string comparison operators ==
and != is a pattern, and the test is whether the left hand
operand matches this pattern. Quoting the pattern results
in a string comparison rather than a pattern match. The
operators < and > within [[...]] designate lexicographical
comparison.
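A sketch contrasting pattern matching, quoted string comparison, an empty operand, and lexicographical ordering inside [[...]]. The variable names are illustrative.

```shell
f=main.c
empty=
if [[ $f == *.c ]]; then kind=pattern-match; fi                   # unquoted: a pattern
if [[ $f == "*.c" ]]; then lit=equal; else lit=different; fi      # quoted: a plain string
if [[ -z $empty ]]; then blank=yes; fi                            # no quotes needed for $empty
if [[ apple < banana ]]; then order=ascending; fi                 # lexicographical comparison
if [[ 10 < 9 ]]; then strcmp=yes; else strcmp=no; fi              # "10" sorts before "9"
printf '%s\n' "$kind" "$lit" "$blank" "$order" "$strcmp"
```

The last test shows that < compares strings, not numbers; numeric comparisons belong in (( ... )).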
In addition, there are several other new comparison
primitives. The binary operators -ot and -nt compare the
modification times of two files to see which file is older
than or newer than the other. The binary operator -ef tests
whether two files have the same device and i-node number,
i.e., whether they are links to the same file.
The unary operator -L returns true if its operand is a
symbolic link. The unary operator -O (-G) returns true if
the owner (or group) of the file operand matches that of the
caller. The unary operator -o returns true when its operand
is the name of an option that is currently on.
The following script illustrates some of the uses of
[[...]]. The reference manual contains the complete list of
operators.
    for i
    do  # execute foo for numeric directory
        if [[ -d $i && $i == +([0-9]) ]]
        then foo
        # otherwise if writable or executable file and not mine
        elif [[ (-w $i || -x $i) && ! -O $i ]]
        then bar
        fi
    done
3.9 Input and Output
ksh has extended I/O capabilities to enhance the use of the
shell as a programming language. As with the Bourne shell,
you use the I/O redirection operator, <, to control where
input comes from, and the I/O redirection operator, >, to
control where output goes to. Each of these operators can
be preceded with a single digit that specifies a file unit
number to associate with the file stream. Ordinarily you
specify these I/O redirection operators with the specific
command to which they apply. However, if you specify I/O
redirections with the exec command, and don't specify
arguments to exec, then the I/O redirections apply to the
current shell. For example, the command exec < foobar
opens file foobar for reading. The exec command is also
used to close files. A file descriptor unit can be opened
as a copy of an existing file descriptor unit by using
either of the <& or >& operators and putting the file
descriptor unit of the original file after the &. Thus,
2>&1 means open standard error (file descriptor 2) as a copy
of standard output (file descriptor 1). A file descriptor
value of - after the & indicates that the file should be
closed. To close file unit 5, specify exec 5<&-. There are
two additional redirection operators with ksh and the POSIX
shell that are not part of the Bourne shell. The >|
operator overrides the effect of the noclobber option
described earlier. The <> operator causes a file to be
opened for both reading and writing.
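A sketch of these redirections applied to the current shell with exec. It assumes a writable scratch directory given by ${TMPDIR:-/tmp}; the file name, unit numbers, and variable names are hypothetical.

```shell
tmp=${TMPDIR:-/tmp}/io_demo.$$            # hypothetical scratch file
exec 3> "$tmp"                            # open unit 3 for writing in the current shell
printf '%s\n' first >&3
printf '%s\n' second >&3
exec 3>&-                                 # close unit 3
exec 4< "$tmp"                            # open unit 4 for reading
read -r line1 <&4                         # each read continues where the last one stopped
read -r line2 <&4
exec 4<&-                                 # close unit 4
err=$( { printf '%s\n' oops >&2; } 2>&1 )  # standard error made a copy of standard output
set -o noclobber
printf '%s\n' replaced >| "$tmp"          # >| overrides noclobber for this redirection
set +o noclobber
contents=$(< "$tmp")
rm -f "$tmp"
printf '%s\n' "$line1" "$line2" "$err" "$contents"
```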
ksh recognizes certain pathnames and treats them specially.
Pathnames of the form /dev/fd/n are treated as equivalent to
the file defined by file descriptor n. These names can be
used as the script argument to ksh and in conditional
testing as described above. On underlying systems that
support /dev/fd in the file system, these names can be
passed to other commands. Pathnames of the form
/dev/tcp/hostid/port and /dev/udp/hostid/port can be used to
create tcp and udp connections to services given by the
hostid number and port number. The hostid cannot use
symbolic values. In practice these numbers are typically
generated by command substitution. For example,
exec 5> /dev/tcp/$(service name) would open file descriptor
5 for sending messages to the hostid and port number defined
by the output of service name.
The Bourne shell has a built-in command read for reading
lines from standard input (file descriptor 0) and splitting
them into fields based on the value of the IFS variable, and
a command echo to write strings to standard output. (On some
systems, echo is not a built-in command and incurs
considerable overhead to use.) Unfortunately, neither of
these commands is able to perform some very basic tasks.
For example, with the Bourne shell, the read built-in
cannot read a single line that ends in \. With ksh, the read
built-in has a -r option to remove the special meaning of \,
which allows it to be treated as a regular character rather
than the line continuation character. With the Bourne
shell, there is no simple way to have more than one file
open at any time for reading. ksh has options on the read
command to specify the file descriptor for the input. The
fields that are read from a line can be stored into an
indexed array with the -A option to read. This allows a
line to be split into an arbitrary number of fields.
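A sketch of read -r together with the -u option of ksh read, which names the file descriptor to read from. It assumes a writable scratch directory given by ${TMPDIR:-/tmp}; the file contents and variable names are hypothetical. The first line ends in \, which only read -r keeps as an ordinary character.

```shell
tmp=${TMPDIR:-/tmp}/read_demo.$$                 # hypothetical scratch file
printf '%s\n' 'C:\new\dir\' 'second line' > "$tmp"
exec 3< "$tmp"                                   # open unit 3 for reading
read -r -u3 winpath                              # -r: \ is an ordinary character
read -r -u3 next                                 # -u3: read from unit 3, not standard input
exec 3<&-
rm -f "$tmp"
printf '%s\n' "$winpath" "$next"
```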
The way the Bourne shell uses the IFS variable to split
lines into fields greatly limits its utility. Often data
files consist of lines that use a character such as : to
delimit fields, with two adjacent delimiters denoting a
null field. The Bourne shell treats adjacent delimiters as
a single field delimiter. With ksh, delimiters that are
considered white space characters have the behavior of the
Bourne shell, but other adjacent delimiters separate null
fields.
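A sketch of the two splitting behaviors, feeding sample lines to read through here-documents. The data and variable names are illustrative, and IFS is set only for the duration of each read.

```shell
IFS=: read -r user pass uid <<EOF
alice::1001
EOF
# adjacent : delimiters: pass is a null field

IFS=' ' read -r first second <<EOF
   spaced     out
EOF
# white space delimiters: leading and repeated blanks are ignored

printf '[%s] [%s] [%s] [%s] [%s]\n' "$user" "$pass" "$uid" "$first" "$second"
```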
The read command is often used in scripts that interact with
the user by prompting the user and then requesting some
input. With the Bourne shell, two commands are needed: one
to prompt the user, the other to read the reply. ksh allows
these two commands to be combined. The first argument of
the read command can be followed by a ? and a prompt string,
which is used whenever the input device is a terminal.
Because the prompt is associated with the read built-in, the
built-in command line editors will be able to re-output the
prompt whenever the line needs to be refreshed when reading
from a terminal device.
With the Bourne shell, there is no way to set a time limit
for waiting for the user's response to read. The -t option
to read takes a floating point argument that gives the time,
in seconds or fractions of a second, that the shell should
wait for a reply.
The version of the echo command in System V treats certain
sequences beginning with \ as control sequences. This makes
it hard to output strings without interpretation. Most BSD
derived systems do not interpret \ control sequences.
Unfortunately, the BSD versions of echo accept a -n option
to prevent a trailing new-line, but have no way to cause the
string -n to be printed. Neither of these versions is
adequate. Also, because they are incompatible, it is very
hard to write portable shell scripts using echo. The ksh
built-in, print, outputs characters to the terminal or to a
file and subsumes the functions of all versions of echo.
Ordinarily, escape sequences in arguments beginning with \
are processed the same as for the System V echo command.
However, print follows the standard conventions for options
and has options that make print very versatile. The -r
option can be used to output the arguments without any
special meaning. The -n option can be used to suppress
the trailing new-line that is ordinarily appended. As with
read, it is possible to specify the file descriptor number
as an option to the command to avoid having to use
redirection operators with each occurrence of the command.
The IEEE POSIX shell and utilities standard committee was
unable to reconcile the differences between the System V and
BSD versions of echo. Instead, it introduced a new command,
printf, which takes an ANSI-C format string and a list of
arguments and outputs the arguments using the ANSI-C
formatting rules. Since ksh is POSIX conforming, it provides
printf. In addition, print has a -f option that specifies a
format string, which processes the arguments the same way
that printf does.
The format processing for print and printf has been extended
slightly. There are three additional formatting directives.
The %b format causes \ escape sequences in the argument to be
expanded as they are by the System V echo command. The %q
format places quotes in the output as required so that it
can be used as shell input. Special characters in the
output of most ksh built-in commands and in the output of
an execution trace are quoted in an equivalent fashion. The
%P format converts an extended regular expression string
into a shell pattern. This is useful for writing shell
applications that have to accept regular expressions as
input. Finally, the escape sequence \E, which expands to the
terminal escape character (octal 033), has been added.
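Here is a small illustration of %q and %b, written with printf. According to the text, print -f accepts the same format strings in ksh. The helper quote_args is hypothetical. The exact quoting that %q chooses can differ between shells, so the point of the sketch is only that the quoted string survives a round trip through eval.

```shell
# quote_args (hypothetical helper): quote each argument with %q so the
# result can be read back as shell input.
quote_args()
{
    printf '%q ' "$@"
}

cmd=$(quote_args 'two words' '$HOME' "it's")
eval "set -- $cmd"                   # the three arguments come back unchanged
printf '%s\n' "$#" "$1" "$2" "$3"    # 3, two words, $HOME, it's

# %b expands backslash escapes in the argument, as System V echo does.
printf '%b\n' 'name:\tvalue'         # name:, a tab, then value
```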
The shell is frequently used as a programming language for
interactive dialogues. The select statement has been added
to the language to make it easier to present menu selection
alternatives to the user and evaluate the reply. The list
of alternatives is numbered and put in columns. A
user-settable prompt, PS3, is issued, and if the answer is a
number corresponding to one of the alternatives, the select
loop variable is set to that alternative. In every case, the
REPLY variable holds the reply that the user entered. The
shell variables LINES and COLUMNS control the layout of
select lists.
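Below is a minimal select loop. It assumes that the menu and the PS3 prompt are written to standard error, as they are in ksh and bash, so only the chosen item appears on standard output. The function pick_fruit and its menu items are made up for illustration. An out-of-range number leaves the loop variable empty, so the loop asks again.

```shell
# pick_fruit (hypothetical): present a numbered menu and act on the choice.
function pick_fruit
{
    typeset fruit PS3='Choose a fruit: '
    select fruit in apple banana cherry
    do
        if [[ -n $fruit ]]
        then
            printf '%s (reply was %s)\n' "$fruit" "$REPLY"
            break
        fi
        printf 'not a valid choice: %s\n' "$REPLY" >&2
    done
}

printf '9\n2\n' | pick_fruit 2>/dev/null   # prints: banana (reply was 2)
```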
3.10 Option Parsing
The getopts built-in command can be used to process command
arguments in the same way that ksh processes the options of
its own built-in commands.
The getopts built-in allows users to specify options as
separate arguments or to group options that do not take
arguments together. An option that requires an argument does
not need a space to separate it from its argument. After
getopts finds an option that takes an argument, the OPTARG
variable holds the value of that argument. The OPTIND
variable holds the index of the next argument to be
processed. After processing options, the arguments should be
shifted by OPTIND-1 so that the remaining arguments become
"$@".
The option description given to getopts can also include
additional information about the options, which is used to
generate usage messages for incorrect arguments and in
response to the -? option. The example in the Appendix uses
getopts to process its arguments.
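The conventions above can be sketched as follows. The function parse_opts and its -v and -n options are hypothetical. OPTIND is reset inside the function, which matters if it is called more than once, and the operands are exposed by shifting OPTIND-1. The sketch uses a plain option string and does not show the usage-message extensions.

```shell
# parse_opts (hypothetical): accept -v and -n COUNT.  Options may be
# grouped, and an option argument may follow its letter directly (-vn3).
function parse_opts
{
    typeset opt verbose=0 count=1
    OPTIND=1
    while getopts 'vn:' opt
    do
        case $opt in
        v)  verbose=1 ;;
        n)  count=$OPTARG ;;
        *)  return 2 ;;      # getopts has already reported the bad option
        esac
    done
    shift $((OPTIND - 1))    # the operands are now "$@"
    printf 'verbose=%s count=%s operands=%s\n' "$verbose" "$count" "$*"
}

parse_opts -vn3 file1 file2   # verbose=1 count=3 operands=file1 file2
parse_opts -n 5 file1         # verbose=0 count=5 operands=file1
```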
3.11 Co-process
ksh can spawn a co-process by adding |& after a command.
This process runs with its standard input and its standard
output connected to the shell. The built-in command print
with the -p option writes into the standard input of this
process, and the built-in command read with the -p option
reads from the output of this process.
In addition, the I/O redirection operators <& and >& can be
used to move the input or output pipe of the co-process to a
numbered file descriptor. Use exec 3>&p to move the input
of the co-process to file descriptor 3. After you have
connected to file descriptor 3, you can direct the output of
any command to the co-process by running command >&3. Also,
by moving the input of the co-process to a numbered
descriptor, it is possible to run a second co-process. The
output of both co-processes then goes to the file descriptor
associated with read -p. You can use exec 4<&p to move
the output of these co-processes to file descriptor 4
of the shell. Once you have moved the pipe to descriptor 4,
it is possible to connect a server to the co-process by
running command 4<&p or to close the co-process pipe with
exec 4<&-.
3.12 Functions
Function definitions are of the form
    function name
    {
        any shell script
    }
A function whose name contains a . is called a discipline
function. The portion of the name after the last . is the
name of the discipline. Discipline functions named get,
set, and unset can be assigned to any variable to intercept
lookups, assignments, and unsetting of the variable whose
name precedes the last . in the function name. Applications
can create additional disciplines for variables that are
created as part of user-defined built-ins. The portion of
the name before the last . must refer to the name of an
existing variable. Thus, if p is a reference to PATH, then
the function names p.get and PATH.get refer to the same
function.
The function is invoked either by specifying name as the
command name, optionally followed by arguments, or by giving
it as an argument to the . built-in command. Positional
parameters are saved before each function call and restored
when the call completes. The arguments that follow the
function name on the calling line become positional
parameters inside the function. The return built-in can be
used to cause the function to return to the statement
following the point of invocation.
Functions can also be defined with the System V notation,
    name()
    {
        any shell script
    }
Functions defined with this syntax cannot be used as the
first argument to the . command. ksh accepts this notation
for compatibility only. There is no need to use this
notation when writing ksh scripts.
Functions defined with the function name syntax and invoked
by name are executed in the current shell environment and
can share named variables with the calling program.
Options, other than execution trace -x, set by the calling
program are passed down to a function. The options are not
shared with the function, so any options set within a
function are restored when the function exits. Traps
ignored by the caller are ignored within the function and
cannot be enabled. Traps caught by the calling program are
reset to their default action within the function. In most
instances, the default action is to cause the function to
terminate. A trap on EXIT defined within a function
executes after the function completes but before the caller
resumes. Therefore, any variable assignments and any
options set as part of a trap action will be effective after
the caller resumes.
By default, variables are inherited by the function and
shared by the calling program. However, for functions
defined with the function name syntax that are invoked by
name, variable assignments preceding the function call
apply only to the scope of the function call. Also,
variables whose names do not contain a . that are defined
with the typeset built-in command are local to the function
in which they are declared. Thus, for the function defined
    function name
    {
        typeset -i x=10
        let z=x+y
        print $z
    }
invoked as y=13 name, x and y are local variables with
respect to the function name, while z is global.
Functions defined with the name() syntax, and functions
invoked as an argument to the . command, share everything
other than positional parameters with the caller.
Assignments that precede the call remain in effect after the
function completes.
Alias and function names are not passed down to shell
scripts or carried across separate invocations of ksh. The
FPATH variable gives a colon-separated list of directories
that is searched for function definitions when trying to
resolve a command name. When a file whose name matches the
command name is found in one of these directories, the
complete file is read and all functions it contains become
defined.
Functions can be recursive. Except for special built-ins,
function names take precedence over built-in names and names
of programs when used as command names. To write a
replacement function that invokes the command that you wish
to replace, you can use the command built-in command. The
arguments to command are the name and arguments of the
program you want to execute. For example, to write a cd
function which changes the directory and prints out the
directory name, you can write
    function cd
    {
        if command cd "$@"
        then print -r -- "$PWD"
        fi
    }
ksh searches the directories listed in FPATH for function
definitions. When ksh encounters an autoload function, it
runs the . command on the script containing the function,
and then executes the function. For interactive shells,
function definitions may also be placed in the ENV file.
However, this makes the shell take longer to start up.
3.13 Process Substitution
This feature is only available on versions of the UNIX
operating system that support the /dev/fd directory for
naming open files. Each command argument of the form
<(list) or >(list) runs the process list asynchronously,
connected to a file in the /dev/fd directory. The name of
this file becomes the argument to the command. If the form
with > is selected, then writing on this file provides input
for list. If < is used, then the file passed as an argument
contains the output of the list process.
For example,
    paste <(cut -f1 file1) <(cut -f2 file2) | tee >(process1) >(process2)
extracts fields 1 and 2 from the files file1 and file2
respectively, places the results side by side, and sends
them to the processes process1 and process2, as well as
writing them to the standard output. Note that the file
passed as an argument to the command is a UNIX system
pipe(2), so programs that expect to lseek(2) on the file
will not work.
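Here is a runnable variant of the first half of the example above. It assumes a system with /dev/fd support. The name join_fields is hypothetical. The function places field 1 of one file beside field 2 of another without creating any temporary files.

```shell
# join_fields FILE1 FILE2 (hypothetical helper): print field 1 of FILE1
# beside field 2 of FILE2.  Each <(list) is replaced by a /dev/fd/N name
# that paste reads like an ordinary, but non-seekable, file.
join_fields()
{
    paste <(cut -f1 "$1") <(cut -f2 "$2")
}

# usage: join_fields file1 file2
```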
3.14 Finding Commands
The addition of aliases, functions, and more built-ins has
made it substantially more difficult to know what a given
command name really means.
Commands that begin with reserved words are an integral part
of the shell language itself and typically define the
control flow of the language. Some control flow commands
are not reserved words in the language but are special
built-ins. Special built-ins are built-ins that are
considered a part of the language rather than user-definable
commands. The best examples of commands that fit this
description are break and continue. Because they are not
reserved words, they can be the result of shell expansions
and are not affected by quoting. These commands have the
following special properties:
o   Assignments that precede them apply to the current
    shell process, not just to the given command.
o   An error in the format of these commands causes a shell
    script or function that contains them to abort.
o   They cannot be overridden by shell functions.
Other commands are built-in because they perform side
effects on the current environment that would be nearly
impossible to implement otherwise. cd and read are examples
of such built-ins. These built-ins are semantically
equivalent to commands that are not built-in, except that no
path search is needed to locate them.
A third reason to have a command built-in is so that it will
be unaffected by the setting of the PATH variable. The
print command fits this category. Scripts that use print
will be portable to all sites that run ksh.
The final reason for making a command a built-in is
performance. On most systems it is more than an order of
magnitude faster to initiate a command that is built-in than
to create a separate process to run the command. Examples
that fit this category are test and pwd.
Given a command name, ksh decides what it means using the
following order:
o   Reserved words define commands that form part of the
    shell grammar. They are not recognized if quoted.
o   Alias substitutions occur first as part of the reading
    of commands. Using quotes in the command name will
    prevent alias substitutions.
o   Special built-ins.
o   Functions.
o   Built-in commands that are not associated with a
    pathname, such as cd and print.
o   If the command name contains a /, the program or script
    corresponding to the given name is executed.
o   A path search locates the pathname corresponding to the
    command. If the pathname where it is found matches the
    pathname associated with a built-in command, the
    built-in command is executed. If the directory where the
    command is found is listed in the FPATH variable, the
    file is read into the shell like a dot script, and a
    function by that name is invoked. Once a pathname is
    found, ksh remembers its location and only checks
    relative directories in PATH the next time the command
    name is used. Assigning a value to PATH causes ksh to
    forget the location of all command names.
o   The FPATH variable is searched and files found are
    treated as described above.
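You can see that functions take precedence over ordinary built-ins, and that command bypasses functions, by using a throwaway function that shadows pwd. The function is illustrative only. pwd is used because it is a regular built-in, not a special one, so a function is allowed to override it.

```shell
# A throwaway function that shadows the pwd built-in (illustration only).
pwd()
{
    printf 'function pwd\n'
}

pwd             # functions come before ordinary built-ins: prints "function pwd"
command pwd     # command skips the function lookup and runs the built-in
unset -f pwd    # with the function gone, pwd is the built-in again
```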
When a name is given as the first argument of the command
built-in, described earlier, the checks for reserved words
and for function definitions are skipped. In all other ways,
command behaves like a built-in that is not associated with
a pathname. As a result, if the first argument of command is
a special built-in, the special properties of this built-in
do not apply. For example, whereas exec 3< foo will cause a
script containing it to abort if the open fails,
command exec 3< foo results in a non-zero exit status but
does not abort the script.
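The command exec idiom described above can be sketched with a hypothetical helper, try_open. When the open fails, the command returns a non-zero status, the if statement tests it, and the script carries on.

```shell
# try_open FILE (hypothetical helper): open FILE for reading on
# descriptor 3.  Because exec runs through command, a failed open only
# produces a non-zero exit status instead of aborting the script.
try_open()
{
    if command exec 3< "$1"
    then
        printf 'opened %s on descriptor 3\n' "$1"
    else
        printf 'cannot open %s; continuing\n' "$1"
    fi
}

try_open /nonexistent/file 2>/dev/null   # prints: cannot open /nonexistent/file; continuing
printf 'script still running\n'
```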
You can get a complete list of the special built-in commands
with builtin -s. In addition, builtin without arguments
gives a list of the current built-ins and the pathnames that
they are associated with. A built-in can be bound to
another pathname by giving the pathname for the built-in.
The basename of this path must be the name of an existing
built-in for this to succeed. Specifying the name of the
built-in without a pathname causes this built-in to be found
before a path search. A built-in can be deleted with the
-d option.
On systems with run-time loading of libraries, built-in
commands can be added with the builtin command. Each
command that is to be built-in must be written as a C
function whose name is of the form b_name, where name is the
name of the built-in that is to be added. The function has
the same argument calling convention as main. The lower
eight bits of the return value become the exit status for
this built-in. Built-ins are added by specifying the
pathname of the library as an argument to the -f option of
builtin.
The built-in command whence, when used with the -v option,
tells how a given command is bound. A line is printed for
each argument to whence telling what would happen if this
argument were used as a command name. It reports on
reserved words, aliases, built-ins, and functions. If the
command is none of the above, it follows the path search
rules and prints the full pathname, if any; otherwise it
prints an error message.
3.15 Symbolic Names
To avoid implementation dependencies, ksh accepts and
generates symbolic names for built-ins that use numerical
values in the Bourne shell. The -S option of the umask
built-in command accepts and displays default file creation
permissions symbolically. It uses the same symbolic
notation as the chmod command.
The trap and kill built-in commands allow the signal names
to be given symbolically. The names of signals and traps
corresponding to signals are the same as the signal name
with the SIG prefix removed. Trap 0 is named EXIT.
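The sketch below uses both kinds of symbolic name: umask with a symbolic mask and -S, and kill with a signal name. The variable names saved_mask and status are illustrative. The sketch also assumes, as holds for ksh and bash, that a job killed by a signal reports an exit status above 128 whose low seven bits are the signal number. kill -l then turns that number back into a name.

```shell
# Set the file creation mask symbolically and read it back with -S.
saved_mask=$(umask)
umask u=rwx,g=rx,o=
umask -S                      # prints: u=rwx,g=rx,o=
umask "$saved_mask"           # restore the original mask

# Name the signal instead of numbering it, then map the job's exit
# status back to a signal name (assumes the low 7 bits hold the number).
sleep 5 &
kill -s TERM "$!"
wait "$!"
status=$?
kill -l "$((status & 127))"   # prints: TERM
```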
3.16 Additional Variables
In addition to the variables discussed earlier, ksh has
other variables that it handles specially. The variable
RANDOM produces a random number in the range 0 to 32767 each
time it is referenced. Assignment to this variable sets the
seed for the random number generator.
The parameter PPID holds the process id of the process
which invoked this shell.
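Assigning to RANDOM reseeds the generator, so a sequence of random numbers can be replayed, which is handy in tests. The seed 42 and the variable names are arbitrary.

```shell
# Reseeding RANDOM with the same value replays the same sequence.
RANDOM=42
first=$RANDOM
RANDOM=42
second=$RANDOM
printf '%s %s\n' "$first" "$second"   # the same number twice, within 0..32767
printf 'parent process id: %s\n' "$PPID"
```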
3.17 Added Traps
A new trap named ERR has been added. This trap is invoked
whenever the shell would exit if the -e option were set.
This trap is used by Fourth Generation Make[16], which runs
ksh as a co-process.
A trap named DEBUG is executed after each command. This
trap can be used for debugging and other purposes.
The KEYBD trap was described earlier.
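Here is a minimal ERR trap. The function err_demo is hypothetical, and its body runs in a subshell so that the trap does not leak into the caller. Because -e is not set, the trap reports the failing command and the script continues.

```shell
# err_demo (hypothetical): the ERR trap fires where set -e would have
# exited; without -e the script keeps going afterward.
err_demo()
(
    trap 'printf "command failed with status %d\n" "$?"' ERR
    false
    printf 'still running\n'
)

err_demo   # prints the trap message, then "still running"
```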
3.18 Debugging
The primary method for debugging Bourne shell scripts is to
use the -x option to enable the execution trace. After all
the expansions have been performed, but before each command
is executed, the trace writes the name and arguments of the
command to standard error, preceded by a +. While the trace
is very useful, there is no way to find out what line of
source a given trace line corresponds to. With ksh, the PS4
variable is evaluated for parameter expansion and is
displayed before each command, instead of the +.
The LINENO variable is set to the current line number
relative to the beginning of the current script or function.
It is most useful as part of the PS4 prompt.
The DEBUG trap can be used to write a break point shell
debugger in ksh. An example of such a debugger is
kshdb[17].
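Including LINENO in PS4 ties each trace line back to its source line. In this sketch, trace_demo is a hypothetical function whose body runs in a subshell, so the PS4 setting and the trace stay local to it. The trace goes to standard error. The exact line numbers depend on where the code sits: ksh counts from the start of the function, and other shells may count from the start of the script.

```shell
# trace_demo (hypothetical): trace a few commands with line numbers.
trace_demo()
(
    PS4='+[$LINENO] '    # single quotes: re-expanded for every trace line
    set -x
    total=$((6 * 7))
    set +x
)

trace_demo   # standard error shows a line of the form "+[N] total=42"
```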
3.19 Timing Commands
Finding the time it takes to execute commands has been a
serious problem with the Bourne shell. Since the time
command is not part of the language, it is necessary to
write a script in order to time a for or while loop. The
extra time spent invoking the shell and processing the
script is accumulated along with the time to execute the
script. More seriously, the Bourne shell does not give
correct times for pipelines, because the times for some
members of a pipeline are not counted. As an extreme
example, running time on the script
    cat < /dev/null | sort -u bigfile | wc
with the Bourne shell will show very little user and system
time no matter how large bigfile is.
To correct these problems, a reserved word time has been
added to replace the time command. Any function, command, or
pipeline can be preceded by this reserved word to obtain
information about the elapsed, user, and system times.
Since I/O redirections bind to the command, not to time,
parentheses should be used to redirect the timing
information, which is normally printed on file descriptor 2.
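Because time is a reserved word, a loop or an entire pipeline can be timed in place. In this sketch the parentheses make the 2>&1 redirection capture the timing report as well. The names loop_report and pipe_report are illustrative. The report format is shell dependent (by default, lines labeled real, user, and sys).

```shell
# Time a loop and a whole pipeline directly, with no separate script.
# The parentheses let 2>&1 capture the report that time writes to fd 2.
loop_report=$( (time for i in 1 2 3; do :; done) 2>&1 )
pipe_report=$( (time printf 'b\na\nb\n' | sort -u > /dev/null) 2>&1 )
printf '%s\n' "$pipe_report"
```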
4. SECURITY
There are several documented problems associated with the
security of shell procedures[18]. These security holes
occur primarily because a user can manipulate the
environment to subvert the intent of a setuid shell
procedure. Sometimes, shell procedures are initiated from
binary programs, without the author's awareness, by library
routines which invoke shells to carry out their tasks. When
the binary program is run setuid, the shell procedure
runs with the permissions afforded to the owner of the
binary file.
In the Bourne shell, the IFS parameter is used to split each
word into separate command arguments. If a user knows that
some setuid program will run sh -c /bin/pwd (or any other
command in /bin), then the user sets and exports IFS=/.
Instead of running /bin/pwd, the shell will run bin with pwd
as an argument. The user puts his or her own bin program
into the current directory. This program can create a copy
of the shell, make this shell setuid, and then run the
/bin/pwd program so that the original program continues to
run successfully. This kind of penetration is not possible
with ksh, since the IFS parameter only splits arguments that
result from command or parameter substitution.
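The IFS rule can be checked directly. In this sketch, split_demo is hypothetical, and its body runs in a subshell so that IFS is restored afterward. With IFS=/, the literal word /bin/pwd stays a single argument, while the same text produced by parameter expansion splits into three fields.

```shell
# split_demo (hypothetical): IFS splits only the results of expansions.
split_demo()
(
    IFS=/
    set -- /bin/pwd        # a literal word is not split
    printf '%s\n' "$#"     # 1
    path=/bin/pwd
    set -- $path           # a parameter expansion is split on /
    printf '%s\n' "$#"     # 3: an empty field, "bin" and "pwd"
)

split_demo
```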
Some setuid programs run programs using system() without
giving the full pathname. If the user sets the PATH
variable so that the desired command will be found in his or
her local bin, then the same technique described above can
be employed to compromise the security of the system. To
close this and other security holes, ksh resets the
effective user id to the real user id and the effective
group id to the real group id unless the privileged option
(-p) is specified at invocation. In privileged mode, the
.profile and ENV files are not processed. Instead, the file
/etc/suid_profile is read and executed. This gives an
administrator control over the environment, for example to
set the PATH variable or to log setuid shell invocations.
Clearly, the security of the system is compromised if /etc
or this file is publicly writable.
Some versions of the UNIX operating system look for the
characters #! as the first two characters of an executable
file. If these characters are found, then the next word on
this line is taken as the interpreter to invoke for this
command, and the interpreter is execed with the name of the
script as argument zero and argument one. If the setuid or
setgid bits are on for this file, then the interpreter is
run with the effective uid and/or gid set accordingly. This
scheme has three major drawbacks. First, putting the
pathname of the interpreter into the script makes the script
less portable, since the interpreter may be installed in a
different directory on another system. Second, using the
#! notation forces an exec of the interpreter even when the
script is invoked from that same interpreter. This is
inefficient, since ksh can handle a failed exec much faster
than starting up again. More importantly, setuid and
setgid procedures provide an easy target for intrusion. By
linking a setuid or setgid procedure to a name beginning
with a -, the interpreter is fooled into thinking that it is
being invoked with a command line option rather than the
name of a file. When the interpreter is the shell, the user
gets a privileged interactive shell. There is code in ksh
to guard against this simple form of intrusion.
A more reliable way to handle setuid and setgid procedures
is provided with ksh. The technique does not require any
changes to the operating system and provides better
security. Another advantage of this method is that it also
allows scripts that have execute permission but no read
permission to run. Taking away read permission makes
scripts more secure.
The method relies on a setuid root program to authenticate
the request and exec the shell with the correct mode bits to
carry out the task. This shell is invoked with the
requested file already open for reading. A script which
cannot be opened for reading or which has its setuid and/or
setgid bits turned on causes this setuid root program to be
execed. For security reasons, this program is given the
full pathname /etc/suid_exec. A description of the
implementation of the /etc/suid_exec program can be found in
a separate paper[19].
5. CODE CHANGES
ksh is written in ANSI-C as a reusable library. The code
can be compiled with C++ and older K&R C as well. The code
uses the IEEE POSIX 1003.1 and ISO 9945-1 standard[20]
wherever possible, so that ksh should be able to run on any
POSIX-compliant system. In addition, it is possible to
compile ksh for older systems.
Unlike earlier versions of the Bourne shell, ksh treats
eight-bit characters transparently without stripping off the
leading bit. There is also a compile-time switch to enable
handling of multi-byte and multi-width character sets.
On systems with dynamic libraries, it is possible to add
built-in commands at run time with the built-in command
builtin described earlier. It is also possible to embed ksh
in applications in a manner analogous to tcl.
6. EXAMPLE
An example of a ksh script is included in the Appendix.
This one-page program is a variant of the UNIX system
grep(1) program. Pattern matching for this version of grep
uses shell patterns rather than regular expressions.
The first half uses the getopts command to find the option
flags. Nearly all options have been implemented. The
second half goes through each line of each file to look for
a pattern match.
This program is not intended to serve as a replacement for
grep, which has been highly tuned for performance. It does
illustrate the programming power of ksh. Note that no
auxiliary processes are spawned by this script. It was
written and debugged in under two hours. While performance
is acceptable for small files, this program runs at only one
tenth the speed of grep for large files.
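The core loop of such a program fits in a few lines. shgrep is a hypothetical, much reduced stand-in and not the Appendix program. It supports no options and simply prints file:line for each line that contains a match for a shell pattern. Like the program described, it spawns no auxiliary processes.

```shell
# shgrep PATTERN FILE... (hypothetical): print each line of each FILE
# that contains text matching the shell pattern PATTERN.
function shgrep
{
    typeset pattern="$1" file line
    shift
    for file
    do
        while IFS= read -r line
        do
            if [[ $line == *$pattern* ]]   # unquoted: a pattern, not a string
            then
                printf '%s:%s\n' "$file" "$line"
            fi
        done < "$file"
    done
}

# usage: shgrep 'a?c' notes.txt
```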
7. PERFORMANCE
ksh executes many scripts faster than the System V Bourne
shell; in some cases more than 10 times as fast. The
primary reason for this is that ksh creates fewer processes.
Executing a built-in command or a function is one or two
orders of magnitude faster than performing a fork() and
exec() to create a separate process. Command
substitution and commands inside parentheses are performed
without creating another process, unless necessary to
preserve correct behavior.
Another reason for improved performance is the use of the
sfio library[21] for I/O. The sfio library buffers all I/O,
and buffers are flushed only when required. The algorithms
used in sfio perform better than traditional versions of
standard I/O, so programs that spend most of their time
formatting output may actually perform better than versions
written in C.
Several of the internal algorithms have been changed so that
the number of subroutine calls has been substantially
reduced. ksh uses variable-sized hash tables for variables,
so scripts that rely heavily on referencing variables
execute faster. More processing is performed while reading
the script, so that execution time is saved while running
loops. These changes are not noticeable for scripts that
fork() and run processes, but they reduce the time that it
takes to interpret commands by more than a factor of two.
Most importantly, ksh provides mechanisms to write
applications that do not require as many processes. The
arithmetic provided by the shell eliminates the need for the
expr command. The pattern matching and substring
capabilities eliminate the need to use sed or awk to process
strings.
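The sketch below shows a few of these replacements side by side. The path is arbitrary. Each line does inside the shell what would otherwise take an expr, basename, dirname, or sed process.

```shell
# Arithmetic and parameter expansion instead of separate processes.
file=/usr/src/cmd/sh.c
printf '%s\n' "$(( (3 + 4) * 6 ))"   # 42 (instead of expr)
printf '%s\n' "${file##*/}"          # sh.c (instead of basename)
printf '%s\n' "${file%/*}"           # /usr/src/cmd (instead of dirname)
printf '%s\n' "${file:5:3}"          # src (a substring)
printf '%s\n' "${file/src/lib}"      # /usr/lib/cmd/sh.c (instead of sed s/src/lib/)
```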
The architecture of ksh makes it easy to make commands
built-ins without changing their semantics at all. Systems
that have run-time binding of libraries allow applications
to be sped up by supplying the critical programs as shell
built-in commands. Implementations on other systems can add
built-in commands at compile time. The procedure for
writing built-in commands that can be loaded at run time is
described in a separate document[22].
8. CONCLUSION
The 1988 version of ksh has tens of thousands of regular
users and is a suitable replacement for the Bourne shell.
The 1993 version of ksh is essentially upward compatible
with both the 1988 version of ksh and the recent IEEE
POSIX and ISO shell standard. The 1993 version offers many
advantages for programming applications, and it has been
rewritten so that it can be used in embedded applications.
It also offers improved performance.
MH-11267-DGK-dgk David G. Korn
APPENDIX
REFERENCES
7. S. R. Bourne, An Introduction to the UNIX Shell, Bell
System Technical Journal, Vol. 57, No. 6, Part 2, pp.
1947-1972, July 1978.
8. W. Joy, An Introduction to the C Shell, UNIX Programmer's
Manual, Berkeley Software Distribution, University of
California, Berkeley, 1980.
9. Morris Bolsky and David Korn, The KornShell Command and
Programming Language, Prentice Hall, 1989.
10. Jason Levitt, The Korn Shell: An Emerging Standard,
UNIX/World, pp. 74-81, September 1986.
11. Rich Bilancia, Proficiency and Power are Yours With the
Korn Shell, UNIX/World, pp. 103-107, September 1987.
12. John Sebes, Comparing UNIX Shells, UNIX Papers, edited
by the Waite Group, Howard W. Sams & Co., 1987.
13. T. A. Dolotta and J. R. Mashey, Using the Shell as a
Primary Programming Tool, Proc. 2nd Int. Conf. on
Software Engineering, pp. 169-176, 1976.
14. J. S. Pendergrast, WKSH - Korn Shell with X-Windows
Support, USL, 1991.
15. American National Standard for Information Systems -
Programming Language - C, ANSI X3.159-1989.
16. G. S. Fowler, The Fourth Generation Make, Proceedings
of the Portland USENIX meeting, pp. 159-174, 1985.
17. Bill Rosenblatt, Debugging Shell Scripts with kshdb,
Unix World, Vol. X, No. 5, 1993.
18. F. T. Grampp and R. H. Morris, UNIX Operating System
Security, AT&T Bell Labs Tech. Journal, Vol. 63, No. 8,
Part 2, pp. 1649-1671, 1984.
19. D. G. Korn, Parlez-vous Kanji?, TM-59554-860602-03, 1986.
20. POSIX - Part 1: System Application Program Interface,
IEEE Std 1003.1-1990, ISO/IEC 9945-1:1990.
21. David Korn and Kiem-Phong Vo, SFIO - A Safe/Fast
String/File I/O, Proceedings of the Summer USENIX, pp.
235-255, 1991.
22. David Korn, Guidelines for writing ksh-93 built-in
commands, to be published, 1994.