[1mGuidelines for writing ksh-93 built-in commands[0m
[4mDavid[24m [4mG.[24m [4mKorn[0m
[4m1.[24m [4mINTRODUCTION[0m
A built-in command is executed without creating a separate
process. Instead, the command is invoked as a C function by
[1mksh[22m. If this function has no side effects in the shell
process, then the behavior of this built-in is identical to
that of the equivalent stand-alone command. The primary
difference in this case is performance. The overhead of
process creation is eliminated. For commands of short
duration, the effect can be dramatic. For example, on SunOS
4.1, [1mwc [22mruns about 50 times faster on a small file of about
1000 bytes when it is a built-in command.
In addition, built-in commands that have side effects on the
shell environment can be written. This is usually done to
extend the application domain for shell programming. For
example, an X-windows extension that makes heavy use of the
shell variable namespace was implemented as a group of
built-in commands that are added at run time. The result is a
windowing shell that can be used to write X-windows
applications.
While there are definite advantages to adding built-in
commands, there are some disadvantages as well. Since the
built-in command and [1mksh [22mshare the same address space, a
coding error in the built-in program may affect the behavior
of [1mksh[22m, perhaps causing it to core dump or hang. Debugging
is also more complex since your code is now a part of a
larger entity. A built-in gives up the isolation of a
separate process, which guarantees that all resources used
by the command are freed when the command completes. Also,
since the address
space of [1mksh [22mwill be larger, this may increase the time it
takes [1mksh [22mto fork() and exec() a non-builtin command. It
makes no sense to add a built-in command that takes a long
time to run or that is run only once, since the performance
benefits will be negligible. Built-ins that have side
effects in the current shell environment have the
disadvantage of increasing the coupling between the built-in
and [1mksh[22m, making the overall system less modular and more
monolithic.
Despite these drawbacks, in many cases extending [1mksh [22mby
adding built-in commands makes sense and allows reuse of the
shell scripting ability in an application-specific domain.
This memo describes how to write [1mksh [22mextensions.
[4m2.[24m [4mWRITING[24m [4mBUILT-IN[24m [4mCOMMANDS[0m
There is a development kit available for writing [1mksh[0m
built-ins. The development kit has three directories,
[1minclude[22m, [1mlib[22m, and [1mbin[22m. The [1minclude [22mdirectory contains a
subdirectory named [1mast [22mthat contains interface prototypes
for functions that you can call from built-ins. The [1mlib[0m
directory contains the [1mast [22mlibrary (see footnote 1) and a
library named [1mlibcmd [22mthat contains versions of several of
the standard POSIX[1] utilities that can be made run-time
built-ins. It
is best to set the value of the environment variable
[1mPACKAGE_ast [22mto the pathname of the directory containing the
development kit. Users of [1mnmake[22m[2] 2.3 and above will then
be able to use the rule
[1m:PACKAGE: ast[0m
in their makefiles and not have to specify any [1m-I [22mswitches
to the compiler.
A built-in command has a calling convention similar to the
[1mmain [22mfunction of a program,
[1mint main(int argc, char *argv[])[22m.
However, instead of [1mmain[22m, you must use the function name
[1mb_[4m[22mname[24m, where [4mname[24m is the name of the built-in you wish to
define. The built-in function takes a third [1mvoid* [22margument
which you can define as [1mNULL[22m. Instead of [1mexit[22m, you need to
use [1mreturn [22mto terminate your command. The return value
becomes the exit status of the command.
The steps necessary to create and add a run time built-in
are illustrated in the following simple example. Suppose
you wish to add a built-in command named [1mhello [22mwhich
requires one argument and prints the word hello followed by
its argument. First, write the following program in the
file [1mhello.c[22m:
[1m#include <stdio.h>[0m
[1mint b_hello(int argc, char *argv[], void *context)[0m
[1m{[0m
    [1mif(argc != 2)[0m
    [1m{[0m
        [1mfprintf(stderr,"Usage: hello arg\n");[0m
        [1mreturn(2);[0m
    [1m}[0m
    [1mprintf("hello %s\n",argv[1]);[0m
    [1mreturn(0);[0m
[1m}[0m
Next, the program needs to be compiled. On some systems it
is necessary to specify a compiler option to produce
position independent code for dynamic linking. If you do
not compile with [1mnmake[22m, it is important to specify a
special include directory when compiling built-ins,
[1mcc -pic -I$PACKAGE_ast/include -c hello.c[0m
since the special version of [1m<stdio.h> [22min the development
kit is required. This command generates [1mhello.o [22min the
current directory.
____________________
1. [1mast [22mstands for Advanced Software Technology
On some systems, you cannot load [1mhello.o [22mdirectly; you must
build a shared library instead. Unfortunately, the method
for generating a shared library differs from one operating
system to another. However, if you are building with the
AT&T [1mnmake [22mprogram, you can use the [1m:LIBRARY: [22mrule to
specify this in a system-independent fashion. In addition,
if you have
several built-ins, it is desirable to build a shared library
that contains them all.
The final step is using the built-in. This can be done with
the [1mksh [22mcommand [1mbuiltin[22m. To load the shared library
[1mhello.so [22mand to add the built-in [1mhello[22m, invoke the command,
[1mbuiltin -f hello hello[0m
The suffix for the shared library can be omitted, in which
case the shell will add an appropriate suffix for the system
that it is loading from. Once this command has been
invoked, you can invoke [1mhello [22mas you do any other command.
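Because a built-in runs inside the address space of the
shell, it can be convenient to exercise the b_ function from
an ordinary program before loading it with builtin -f. The
following sketch repeats b_hello from hello.c and adds a
small harness. The names builtin_fn and run_standalone are
hypothetical and are not part of ksh or libast, and passing
NULL as the third argument is an assumption made for this
test only.

```c
#include <stdio.h>

/* Same calling convention as the hello.c example in the text. */
int b_hello(int argc, char *argv[], void *context)
{
    (void)context;              /* unused in this sketch */
    if (argc != 2)
    {
        fprintf(stderr, "Usage: hello arg\n");
        return 2;               /* return, never exit(), in a built-in */
    }
    printf("hello %s\n", argv[1]);
    return 0;
}

/* Hypothetical type for any function that follows the b_ convention. */
typedef int (*builtin_fn)(int argc, char *argv[], void *context);

/*
 * Hypothetical harness: call a built-in function outside the shell
 * and hand back the value that would become its exit status.
 */
int run_standalone(builtin_fn fn, int argc, char *argv[])
{
    int status = fn(argc, argv, NULL);

    fflush(stdout);             /* make sure the output is visible */
    return status;
}
```

A main() that simply returns run_standalone(b_hello, argc,
argv) turns the same source file into a stand-alone command
that can be run under a debugger.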
It is often desirable to make a command [4mbuilt-in[24m the first
time that it is referenced. The first time [1mhello [22mis
invoked, [1mksh [22mshould load and execute it, whereas for
subsequent invocations [1mksh [22mshould just execute the built-in.
This can be done by creating a file named [1mhello [22mwith the
following contents:
[1mfunction hello[0m
[1m{[0m
    [1munset -f hello[0m
    [1mbuiltin -f hello hello[0m
    [1mhello "$@"[0m
[1m}[0m
This file [1mhello [22mneeds to be placed in a directory that is in
your [1mFPATH [22mvariable. In addition, the full pathname for
[1mhello.so [22mshould be used in this script so that the run time
loader will be able to find this shared library no matter
where the command [1mhello [22mis invoked.
[4m3.[24m [4mCODING[24m [4mREQUIREMENTS[24m [4mAND[24m [4mCONVENTIONS[0m
As mentioned above, the entry point for built-ins must be of
the form [1mb_[4m[22mname[24m. Your built-ins can call functions from the
standard C library, the [1mast [22mlibrary, interface functions
provided by [1mksh[22m, and your own functions. You should avoid
using any global symbols beginning with [1msh_[22m, [1mnv_[22m, or [1med_[0m
since these are used by [1mksh [22mitself. In addition, the [1m#define[0m
constants in [1mksh [22minterface files use symbols beginning with
[1mSH_[22m, so you should avoid using names beginning with [1mSH_[22m.
[4m3.1[24m [4mHeader[24m [4mFiles[0m
The development kit provides a portable interface to the C
library and to libast. The header files in the development
kit are compatible with K&R C[3], ANSI-C[4], and C++[5].
The best thing to do is to include the header file
[1m<shell.h>[22m. This header file causes the [1m<ast.h> [22mheader, the
[1m<error.h> [22mheader and the [1m<stak.h> [22mheader to be included as
well as defining prototypes for functions that you can call
to get shell services for your builtins. The header file
[1m<ast.h> [22mprovides prototypes for many [1mlibast [22mfunctions and
all the symbol and function definitions from the ANSI-C
headers, [1m<stddef.h>[22m, [1m<stdlib.h>[22m, [1m<stdarg.h>[22m, [1m<limits.h>[22m, and
[1m<string.h>[22m. It also provides all the symbols and
definitions for the POSIX[6] headers [1m<sys/types.h>[22m,
[1m<fcntl.h>[22m, and [1m<unistd.h>[22m. You should include [1m<ast.h>[0m
instead of one or more of these headers. The [1m<error.h>[0m
header provides the interface to the error and option
parsing routines described below. The [1m<stak.h> [22mheader
provides the interface to the memory allocation routines
described below.
Programs that want to use the information in [1m<sys/stat.h>[0m
should include the file [1m<ls.h> [22minstead. This provides the
complete POSIX interface to [1mstat() [22mrelated functions even on
non-POSIX systems.
[4m3.2[24m [4mInput/Output[0m
[1mksh [22muses [1msfio[22m, the Safe/Fast I/O library[7], to perform all
I/O operations. The [1msfio [22mlibrary, which is part of [1mlibast[22m,
provides a superset of the functionality provided by the
standard I/O library defined in ANSI-C. If none of the
additional functionality is required, and if you are not
familiar with [1msfio [22mand you do not want to spend the time
learning it, then you can use [1msfio [22mvia the [1mstdio [22mlibrary
interface. The development kit contains the header
[1m<stdio.h> [22mwhich maps [1mstdio [22mcalls to [1msfio [22mcalls. In most
instances the mapping is done by macros or inline functions
so that there is no overhead. The man page for the [1msfio[0m
library is in an Appendix.
However, there are some very nice extensions and performance
improvements in [1msfio [22mand if you plan any major extensions I
recommend that you use it natively.
[4m3.3[24m [4mError[24m [4mHandling[0m
For error messages it is best to use the [1mast [22mlibrary
function [1merrormsg() [22mrather than sending output to [1mstderr [22mor
the equivalent [1msfstderr [22mdirectly. Using [1merrormsg() [22mwill
make error messages appear more uniform to the user.
Furthermore, using [1merrormsg() [22mshould make it easier to do
error message translation for other locales in future
versions of [1mksh[22m.
The first argument to [1merrormsg() [22mspecifies the dictionary in
which the string will be searched for translation. The
second argument to [1merrormsg() [22mcontains the error type and
value. The third argument is a [4mprintf[24m style format and the
remaining arguments are arguments to be printed as part of
the message. A new-line is inserted at the end of each
message and therefore should not appear as part of the
format string. The second argument should be one of the
following:
[1mERROR_exit([4m[22mn[24m[1m)[22m: If [4mn[24m is non-zero, the built-in will exit with
value [4mn[24m after printing the message.
[1mERROR_system([4m[22mn[24m[1m)[22m: Exit builtin with exit value [4mn[24m after
printing the message. The message will display the
message corresponding to [1merrno [22menclosed within [1m[ ] [22mat
the end of the message.
[1mERROR_usage([4m[22mn[24m[1m)[22m: Will generate a usage message and exit. If
[4mn[24m is non-zero, the exit value will be 2. Otherwise the
exit value will be 0.
[1mERROR_debug([4m[22mn[24m[1m)[22m: Will print a level [4mn[24m debugging message and
will then continue.
[1mERROR_warn([4m[22mn[24m[1m)[22m: Prints a warning message. [4mn[24m is ignored.
[4m3.4[24m [4mOption[24m [4mParsing[0m
The first thing that a built-in should do is to check the
arguments for correctness and to print any usage messages on
standard error. For consistency with the rest of [1mksh[22m, it is
best to use the [1mlibast [22mfunctions [1moptget() [22mand [1moptusage() [22mfor
this purpose. The header [1m<error.h> [22mincludes prototypes for
these functions. The [1moptget() [22mfunction is similar to the
System V C library function [1mgetopt()[22m, but provides some
additional capabilities. Built-ins that use [1moptget()[0m
provide a more consistent user interface.
The [1moptget() [22mfunction is invoked as
[1mint optget(char *argv[], const char *optstring)[0m
where [1margv [22mis the argument list and [1moptstring [22mis a string
that specifies the allowable arguments and additional
information that is used to format [4musage[24m messages. In fact
a complete man page in [1mtroff [22mor [1mhtml [22mcan be generated by
passing a usage string as described by the [1mgetopts [22mcommand.
Like [1mgetopt()[22m, single letter options are represented by the
letter itself, and options that take a string argument are
followed by the [1m: [22mcharacter. Option strings have the
following special characters:
[1m: [22mUsed after a letter option to indicate that the option
takes an option argument. The variable [1mopt_info.arg[0m
will point to this value after the given argument is
encountered.
[1m# [22mUsed after a letter option to indicate that the option
can only take a numerical value. The variable
[1mopt_info.num [22mwill contain this value after the given
argument is encountered.
[1m? [22mUsed after a [1m: [22mor [1m# [22m(and after the optional [1m?[22m) to
indicate that the preceding option argument is not
required.
[1m[[22m...[1m] [22mAfter a [1m: [22mor [1m#[22m, the characters contained inside the
brackets are used to identify the option argument when
generating a [4musage[24m message.
[4mspace[24m The remainder of the string will only be used when
generating usage messages.
The [1moptget() [22mfunction returns the matching option letter if
one of the legal options is matched. Otherwise, [1moptget()[0m
returns
[1m':' [22mIf there is an error. In this case the variable
[1mopt_info.arg [22mcontains the error string.
[1m0 [22mIndicates the end of options. The variable
[1mopt_info.index [22mcontains the number of arguments
processed.
[1m'?' [22mA usage message has been requested. You normally call
[1moptusage() [22mto generate and display the usage message.
The following is an example of the option parsing portion of
the [1mwc [22mutility.
[1m#include <shell.h>[0m
[1m/* optget() returns 0 at the end of the options, ending the loop */[0m
[1mwhile((n = optget(argv, "xf:[file]"))) switch(n)[0m
[1m{[0m
    [1mcase 'f':[0m
        [1mfile = opt_info.arg;    /* argument of -f */[0m
        [1mbreak;[0m
    [1mcase ':':[0m
        [1merror(ERROR_exit(0), "%s", opt_info.arg);[0m
        [1mbreak;[0m
    [1mcase '?':[0m
        [1merror(ERROR_usage(2), "%s", opt_info.arg);[0m
        [1mbreak;[0m
[1m}[0m
[1margv += opt_info.index;    /* skip the arguments already processed */[0m
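For readers who already know getopt(), the loop above
corresponds roughly to the following POSIX getopt() loop.
This is only a comparison sketch: parse_opts() and its
parameters are hypothetical names, the code calls the C
library getopt() rather than optget(), and it does none of
the usage message generation that optget() provides.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: parse "-x" and "-f file" with POSIX getopt().
 * Returns the index of the first operand, or -1 on a usage error.
 */
int parse_opts(int argc, char *argv[], const char **file, int *xflag)
{
    int c;

    optind = 1;                 /* restart the scan; getopt() keeps global state */
    opterr = 0;                 /* suppress getopt()'s own diagnostics */
    *file = NULL;
    *xflag = 0;
    while ((c = getopt(argc, argv, "xf:")) != -1)
    {
        switch (c)
        {
        case 'x':
            *xflag = 1;
            break;
        case 'f':
            *file = optarg;     /* plays the role of opt_info.arg */
            break;
        default:                /* '?': unknown option or missing argument */
            return -1;
        }
    }
    return optind;              /* plays the role of opt_info.index */
}
```

The visible differences are that getopt() needs argc and
reports the end of the options with -1, while optget() takes
only argv and the option string and returns 0.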
[4m3.5[24m [4mStorage[24m [4mManagement[0m
It is important that any memory used by your built-in be
returned. Otherwise, if your built-in is called frequently,
[1mksh [22mwill eventually run out of memory. You should avoid
using [1mmalloc() [22mfor memory that must be freed before
returning from your built-in, because by default, [1mksh [22mwill
terminate your built-in in the event of an interrupt and the
memory will not be freed.
The best way to allocate variable sized storage is
through calls to the [1mstak [22mlibrary, which is included in
[1mlibast [22mand which is used extensively by [1mksh [22mitself. Objects
allocated with the [1mstakalloc() [22mfunction are freed when your
function completes or aborts. The [1mstak [22mlibrary provides a
convenient way to build variable length strings and other
objects dynamically. The man page for the [1mstak [22mlibrary is
contained in the Appendix.
Before [1mksh [22mcalls each built-in command, it saves the current
stack location and restores it after it returns. It is not
necessary to save and restore the stack location in the [1mb_[0m
entry function, but you may want to write functions that use
this stack and restore it when leaving the function. The
following coding convention will do this in an efficient
manner:
[4myourfunction[24m[1m()[0m
[1m{[0m
    [1mchar *savebase;[0m
    [1mint saveoffset;[0m
    [1m/* freeze any partially built object so it is not overwritten */[0m
    [1mif(saveoffset=staktell())[0m
        [1msavebase = stakfreeze(0);[0m
    ...
    [1m/* put the stack back the way it was found */[0m
    [1mif(saveoffset)[0m
        [1mstakset(savebase,saveoffset);[0m
    [1melse[0m
        [1mstakseek(0);[0m
[1m}[0m
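The convention above is an instance of the general
mark-and-release discipline of a stack allocator. The
following stand-alone sketch illustrates that idea with a
deliberately tiny allocator. The names pool_tell,
pool_alloc, pool_reset and scratch_length are hypothetical;
this is not the stak interface, which is described in its
man page, and it makes no attempt at alignment or growth.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size stack allocator, for illustration only. */
static unsigned char pool[4096];
static size_t pool_top;             /* offset of the first free byte */

/* Report the current top of the stack (the "mark"). */
size_t pool_tell(void)
{
    return pool_top;
}

/* Allocate n bytes from the top of the stack, or return NULL if full. */
void *pool_alloc(size_t n)
{
    void *p;

    if (n > sizeof pool - pool_top)
        return NULL;
    p = pool + pool_top;
    pool_top += n;
    return p;
}

/* Release everything allocated since the mark was taken. */
void pool_reset(size_t mark)
{
    pool_top = mark;
}

/* A helper that uses scratch space and leaves the stack as it found it. */
size_t scratch_length(const char *a, const char *b)
{
    size_t mark = pool_tell();      /* remember where we started */
    char *tmp = pool_alloc(strlen(a) + strlen(b) + 1);
    size_t result = 0;

    if (tmp)
    {
        strcpy(tmp, a);             /* build a temporary string */
        strcat(tmp, b);
        result = strlen(tmp);
    }
    pool_reset(mark);               /* give the scratch space back */
    return result;
}
```

Since the caller's mark is restored on every path out of
scratch_length(), repeated calls cannot exhaust the pool,
which is the property the stak convention is meant to give.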
[4m4.[24m [4mCALLING[24m [1mksh [4m[22mSERVICES[0m
Some of the more interesting applications are those that
extend the functionality of [1mksh [22min application-specific
directions. A prime example of this is the X-windows
extension which adds builtins to create and delete widgets.
The [1mnval [22mlibrary is used to interface with the shell name
space. The [1mshell [22mlibrary is used to access other shell
services.
[4m4.1[24m [4mThe[24m [4mnval[24m [4mlibrary[0m
A great deal of power is derived from the ability to use
portions of the hierarchical variable namespace provided by
[1mksh-93 [22mand turn these names into active objects.
The [1mnval [22mlibrary is used to interface with shell variables.
A man page for this library is provided in an Appendix. You
need to include the header [1m<nval.h> [22mto access the functions
defined in the [1mnval [22mlibrary. All the functions provided by
the [1mnval [22mlibrary begin with the prefix [1mnv_[22m. Each shell
variable is an object in an associative table that is
referenced by name. The type [1mNamval_t* [22mis a pointer to a
shell variable. To operate on a shell variable, you first
get a handle to the variable with the [1mnv_open() [22mfunction and
then supply the handle returned as the first argument of the
function that provides an operation on the variable. You
must call [1mnv_close() [22mwhen you are finished using this handle
so that the space can be freed once the value is unset. The
two most frequent operations are to get the value of the
variable and to assign a value to the variable. The
[1mnv_getval() [22mfunction returns a pointer to the value of the
variable. In some cases the pointer returned is to a region
that will be overwritten by the next [1mnv_getval() [22mcall, so if
the value isn't used immediately, it should be copied. Many
variables can also generate a numeric value. The
[1mnv_getnum() [22mfunction returns a numeric value for the given
variable pointer, calling the arithmetic evaluator if
necessary.
The [1mnv_putval() [22mfunction is used to assign a new value to a
given variable. The second argument to [1mnv_putval() [22mis the
value to be assigned and the third argument is a [4mflag[24m which
is used in interpreting the second argument.
Each shell variable can have one or more attributes. The
[1mnv_isattr() [22mfunction is used to test for the existence of
one or more
attributes. See the appendix for a complete list of
attributes.
By default, each shell variable passively stores the string
you give with [1mnv_putval()[22m, and returns the value with
[1mnv_getval()[22m. However, it is possible to turn any node into an
active entity by assigning functions to it that will be
called whenever [1mnv_putval() [22mand/or [1mnv_getval() [22mis called.
In fact there are up to five functions that can be associated
with each variable to override the default actions. The
type [1mNamfun_t [22mis used to define these functions. Only those
that are non-[1mNULL [22moverride the default actions. To override
the default actions, you must allocate an instance of
[1mNamfun_t[22m, and then assign the functions that you wish to
override. The [1mputval() [22mfunction is called by the
[1mnv_putval() [22mfunction. A [1mNULL [22mfor the [4mvalue[24m argument
indicates a request to unset the variable. The [4mtype[0m
argument might contain the [1mNV_INTEGER [22mbit so you should be
prepared to do a conversion if necessary. The [1mgetval()[0m
function is called by [1mnv_getval() [22mand must return a
string. The [1mgetnum() [22mfunction is called by the
arithmetic evaluator and must return a double. If [1mgetnum()[0m
is omitted, the value is obtained by calling [1mnv_getval() [22mand
converting the result to a number.
The functionality of a variable can further be increased by
adding discipline functions that can be associated with the
variable. A discipline function allows a script that uses
your variable to define functions whose name is
[4mvarname[24m[1m.[4m[22mdiscname[24m where [4mvarname[24m is the name of the variable,
and [4mdiscname[24m is the name of the discipline. When the user
defines such a function, the [1msettrap() [22mfunction will be
called with the name of the discipline and a pointer to the
parse tree corresponding to the discipline function. The
application determines when these functions are actually
executed. By default, [1mksh [22mdefines [1mget[22m, [1mset[22m, and [1munset [22mas
discipline functions.
In addition, it is possible to provide a data area that will
be passed as an argument to each of these functions whenever
any of these functions are called. To have private data,
you need to define and allocate a structure that looks like
[1mstruct [4m[22myours[0m
[1m{[0m
    [1mNamfun_t fun;[0m
    [4myour_data_fields[24m[1m;[0m
[1m};[0m
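Placing the Namfun_t member first presumably relies on the C
rule that a pointer to a structure can be converted to and
from a pointer to its first member. The stand-alone sketch
below shows that pattern. The type struct base_fun is a
hypothetical stand-in for Namfun_t, and its single callback
and signature are assumptions made for illustration; the
real definitions are in <nval.h> and the nval man page.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Namfun_t; the real type comes from <nval.h>. */
struct base_fun
{
    const char *(*getval)(struct base_fun *self);
};

/* Private data: the base structure is the first member. */
struct counter
{
    struct base_fun fun;
    int hits;                   /* how many times the value was read */
    char text[32];              /* storage for the returned string */
};

/* Called through the base pointer; recovers the private data by a cast. */
static const char *counter_getval(struct base_fun *self)
{
    struct counter *cp = (struct counter *)self;

    cp->hits++;
    snprintf(cp->text, sizeof cp->text, "%d", cp->hits);
    return cp->text;
}

/* Clear the private fields and install the callback. */
void counter_init(struct counter *cp)
{
    memset(cp, 0, sizeof *cp);
    cp->fun.getval = counter_getval;
}
```

Code that knows only about struct base_fun can call getval
through &c.fun, while the callback still reaches hits and
text, which is how the private data area travels with each
call.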
[4m4.2[24m [4mThe[24m [4mshell[24m [4mlibrary[0m
There are several functions that are used by [1mksh [22mitself that
can also be called from built-in commands. The man pages for
these routines are in the Appendix.
The [1msh_addbuiltin() [22mfunction can be used to add or delete
builtin commands. It takes the name of the built-in, the
address of the function that implements the built-in, and a
[1mvoid* [22mpointer that will be passed to this function as the
third argument whenever it is invoked. If the function
address is [1mNULL[22m, the specified built-in will be deleted.
However, special built-in functions cannot be deleted or
modified.
The [1msh_fmtq() [22mfunction takes a string and returns a string
that is quoted as necessary so that it can be used as shell
input. This function is used to implement the [1m%q [22moption of
the shell built-in [1mprintf [22mcommand.
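To make "quoted as necessary" concrete, the following
stand-alone sketch shows one common quoting technique:
wrapping the string in single quotes and rewriting each
embedded single quote as '\''. The function quote_for_shell
is a hypothetical name, and this is not a description of how
sh_fmtq() itself chooses its quoting; it always quotes and
writes into a buffer supplied by the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not sh_fmtq(): write a single-quoted copy of s
 * into buf.  Returns 0 on success, or -1 if buf is too small.
 */
int quote_for_shell(const char *s, char *buf, size_t size)
{
    size_t need = 3;            /* two quotes plus the terminating NUL */
    const char *p;
    char *q = buf;

    for (p = s; *p; p++)
        need += (*p == '\'') ? 4 : 1;   /* ' expands to '\'' */
    if (need > size)
        return -1;
    *q++ = '\'';
    for (p = s; *p; p++)
    {
        if (*p == '\'')
        {
            memcpy(q, "'\\''", 4);      /* close, escaped quote, reopen */
            q += 4;
        }
        else
            *q++ = *p;
    }
    *q++ = '\'';
    *q = '\0';
    return 0;
}
```

With this helper the input it's a test becomes
'it'\''s a test', which a shell reads back as the original
string.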
The [1msh_parse() [22mfunction returns a parse tree corresponding
to a given file stream. The tree can be executed by
supplying it as the first argument to the [1msh_trap() [22mfunction
and giving a value of [1m1 [22mas the second argument.
Alternatively, the [1msh_trap() [22mfunction can parse and execute
a string by passing the string as the first argument and
giving [1m0 [22mas the second argument.
The [1msh_isoption() [22mfunction can be used to see whether
one or more of the option settings is enabled.
[4mREFERENCES[0m
1. [4mPOSIX[24m [4m-[24m [4mPart[24m [4m2:[24m [4mShell[24m [4mand[24m [4mUtilities,[24m IEEE Std
1003.2-1992, ISO/IEC 9945-2:1993.
2. Glenn Fowler, Nmake reference needed
3. Brian W. Kernighan and Dennis M. Ritchie, [4mThe[24m [4mC[0m
[4mProgramming[24m [4mLanguage[24m, Prentice Hall, 1978.
4. American National Standard for Information Systems -
Programming Language - C, ANSI X3.159-1989.
5. Bjarne Stroustrup, [4mC++[24m, Addison Wesley, xxxx
6. [4mPOSIX[24m [4m-[24m [4mPart[24m [4m1:[24m [4mSystem[24m [4mApplication[24m [4mProgram[24m [4mInterface,[0m
IEEE Std 1003.1-1990, ISO/IEC 9945-1:1990.
7. David Korn and Kiem-Phong Vo, [4mSFIO[24m [4m-[24m [4mA[24m [4mSafe/Fast[0m
[4mInput/Output[24m [4mlibrary,[24m Proceedings of the Summer Usenix,
pp. , 1991.