.ds lang COBOL
.ds gcobol GCC\ \*[lang]\ Front-end
.ds isostd ISO/IEC 1989:2023
.Dd \& February 2025
.Dt GCOBOL 1\& "GCC \*[lang] Compiler"
.Os Linux
.Sh NAME
.Nm gcobol
.Nd \*[gcobol]
.Sh SYNOPSIS
.Nm
.Op Fl D Ns Ar name Ns Oo Li = Ns Ar value Oc
.Op Fl E
.Op Fl fdefaultbyte Ns Li = Ns Ar value
.Op Fl fsyntax-only
.Op Fl I Ns Ar copybook-path
.Op Fl fmax-errors Ns Li = Ns Ar nerror
.Oo
.Fl nomain |
.Fl main Ar filename |
.Fl main Ns Li = Ns Ar filename |
.Fl main Ns Li = Ns Ar filename:program-id
.Oc
.Op Fl fcobol-exceptions Ar exception Ns Op Ns \/, Ns Ar exception Ns ...
.Op Fl copyext Ar ext
.Op Fl ffixed-form | Fl ffree-form
.Op Fl findicator-column
.Op Fl finternal-ebcdic
.Op Fl dialect Ar dialect-name
.Op Fl include Ar filename
.Op Fl preprocess Ar preprocess-filter
.Op Fl fflex-debug
.Op Fl fyacc-debug
.Ar filename Op ...
.
.Sh DESCRIPTION
.Nm
compiles \*[lang] source code to object code, and optionally produces an
executable binary or shared object. As a GCC component, it accepts
all options that affect code-generation and linking. Options specific
to \*[lang] are listed below.
.Bl -tag -width "\0\0debug"
.It Fl main Ar filename
.Nm
will generate a
.Fn main
function as an entry point calling the first PROGRAM-ID in
.Ar filename .
.Pp
.Fl main
is the default. When none of
.Fl nomain ,
.Fl c ,
or
.Fl shared
is present, an implicit
.Fl main
is inserted into the command line ahead of the first source file name.
.It Fl main Ns Li = Ns Ar filename
The .o object module for
.Ar filename
will include a
.Fn main
entry point calling the first PROGRAM-ID in
.Ar filename .
.It Fl main Ns Li = Ns Ar filename:program-id
The .o object module for
.Ar filename
will include a
.Fn main
entry point that calls the
.Ar program-id
entry point.
.It Fl nomain
No
.Fn main
entry point will be generated by this
compilation. The
.Fl nomain
option is incompatible with
.Fl main ,
and is implied by
.Fl shared .
It is also implied by
.Fl c
when there is no
.Fl main
present.
.Pp
See below for examples showing the use of
.Fl main
and
.Fl nomain .
.It Fl D Ar name Ns Op Li = Ns Ar expr
Define a CDF name (for use with
.Sy >>IF )
to have the value of
.Ar expr .
.It Fl E
Write the CDF-processed \*[lang] input to standard output in
.Em "free-form reference format" .
Certain non-\*[lang] markers are included in the
output to indicate where copybook files were included. For
line-number consistency with the input, blank lines are retained.
.Pp
Unlike the C compiler, this option does not prevent compilation.
To prevent compilation, also use the option
.D1 Fl fsyntax-only
.It Fl fdefaultbyte Ns Li = Ns Ar value
Use
.Ar value ,
a number between 0 and 255, as the default value for all
WORKING-STORAGE data items that have no VALUE clause. By default,
alphanumeric data items are initialized with blanks, and numeric data
items are initialized to zero. This option overrides the default with
.Ar value .
.It Fl fsyntax-only
Invoke only the parser. Check the code for syntax errors, but don't do
anything beyond that.
.It Fl copyext Ar ext
For the CDF directive
.D1 COPY Ar name
if
.Ar name
is unquoted, several varieties of
.Ar name
are tried, as described below under
.Sx "CDF Text Manipulation" .
The
.Fl copyext
option extends the names searched to include
.Ar ext .
If
.Ar ext
is all uppercase or all lowercase, both forms are tried, with preference given to the one supplied. If
.Ar ext
is mixed-case, only that version is tried.
For example, with
.D1 Fl copyext Ar .abc
given the CDF directive
.D1 COPY name
.Nm
will add to possible names searched
.Ql name.abc
and
.Ql name.ABC
in that order.
.It Fl ffixed-form
Use strict
.Em "fixed-form reference format"
in reading the \*[lang] input:
72-character lines, with a 6-character sequence area, and an indicator
column. Data past column 72 are ignored.
.It Fl ffree-form
Force the \*[lang] input to be interpreted as
.Em "free-form reference format" .
Line breaks are insignificant, except that
.Ql *
at the start of a line acts as a comment marker.
Equivalent to
.Fl findicator-column Ar 0 .
.
.It Fl findicator-column
describes the location of the Indicator Area in a \*[lang] file
in
.Em "Reference Format" ,
where the first 6 columns \(em known as the
.Dq "Sequence Number Area"
\(em are ignored, and the 7th column \(em the Indicator
Area \(em may hold a character of significance to the compiler.
.Pp
Although
.Em "reference format" ,
strictly speaking, ignores data after column 72,
with this option
.Nm
accepts long \*[lang] lines, sometimes known as
.Em "extended source format" .
Text past column 72 is treated as ordinary \*[lang] text. (Line
continuation remains in effect, however,
provided no text appears
.Em past
column 72.)
.Pp
There is no maximum line length. Regardless of source code format,
the entire program could appear on one line.
.Pp
By default,
.Nm
auto-detects the source code format by examining the line that
contains the text "program-id". When there are characters past column 72
on that line, the file is assumed to be in
.Em "extended source format" ,
with the indicator area in column 7.
Otherwise, columns 1-6 are examined. If those characters are all digits
or blanks, the file is assumed to be in
.Em "fixed-form reference format" ,
also with the indicator in column 7.
If not auto-detected as
.Em "fixed-form reference format"
or
.Em "extended source format" ,
the file is assumed to be in
.Em "free-form reference format" .
.Pp
.
.It Fl fcobol-exceptions Ar exception Op Ns , Ns Ar exception Ns ...
By default, no exception condition is enabled (including fatal ones),
and by the ISO standard exception conditions are enabled only via the
CDF
.Sy "TURN"
directive. This option enables one or more exception conditions by
default, as though
.Sy TURN
had appeared at the top of the first source code file.
This option may also appear more than once on the command line.
.Pp
The value of
.Ar exception
is a Level 1, 2, or 3 exception condition name, as described by
\*[isostd].
.Ql EC-ALL
means enable all exceptions.
.Pp
The
.Fl fno-cobol-exceptions
form turns off
.Ar exception ,
just as though
.D1 >>TURN Ar exception CHECKING OFF
had appeared.
.Pp
Not all exception conditions are implemented. Any that are not
produce a warning message.
.
.It Fl fmax-errors Ns Li = Ns Ar nerror
.Ar nerror
represents the number of error messages produced. Without this option,
.Nm
attempts to recover from a syntax error by resuming compilation at the
next statement, continuing until end-of-file. With it,
.Nm
counts the messages as they're produced, and stops when
.Ar nerror
is reached.
.It Fl fstatic-call Ns , Fl fno-static-call
With
.Fl fno-static-call ,
.Nm
never uses static linking for
.D1 Sy CALL Ar program
By default, or with
.Fl fstatic-call ,
if
.Ar program
is an alphanumeric literal,
.Nm
uses static linkage, meaning the compiler produces an external symbol
.Ar program
for the linker to resolve.
(In the future, that will work with
.Sy CONSTANT
data items, too.) With static linkage, if
.Ar program
is not supplied by the source code module or another object file or library
at build time, the linker will produce an
.Dq "unresolved symbol"
error. With
.Fl fno-static-call ,
.Nm
always uses dynamic linking.
.Pp
This option affects the
.Sy CALL
statement for literals only. If
.Ar program
is a non-constant data item, it is always resolved using dynamic
linking, with
.Xr dlsym 3 Ns Li ,
because its value is determined at run time.
.It Fl dialect Ar dialect-name
By default,
.Nm
accepts \*[lang] syntax as defined by \*[isostd], with some
extensions for backward compatibility with COBOL-85. To make the
compiler more generally useful, some additional syntax is supported by
this option.
.Pp
The value of
.Ar dialect-name
may be
.Bl -tag -compact
.It ibm
to indicate IBM COBOL 6.3 syntax, specifically
.D1 STOP <number>.
.It gnu
to indicate GnuCOBOL syntax.
.It mf
to indicate Micro Focus syntax, specifically
.Sy LEVEL 78
constants.
.El
.Pp
Only a few such non-standard constructs are accepted, and
.Nm
makes no claim to emulate other compilers. But to the extent that a
feature is popular but nonstandard, this option provides a way to
support it, or add it.
.
.It Fl include Ar filename
Process
.Ar filename
as if
.D1 COPY Dq Ar filename
appeared as the first line of
the primary source file. If
.Ar filename
is not an absolute path, the directory searched is the current working
directory, not the directory containing the main source file. The
name is used verbatim. No permutations are applied, and no
directories are searched.
.Pp
If multiple
.Fl include
options are given, the files are included in
the order they appear on the command line.
.
.It Fl preprocess Ar preprocess-filter
After all CDF text-manipulation has been applied, and before the
prepared \*[lang] is sent to the
.Sy cobol1
compiler, the input may be
further altered by one or more filters. In the tradition of
.Xr sed 1 ,
each
.Ar preprocess-filter
reads from standard input and writes to standard output.
.Pp
To supply options to
.Ar preprocess-filter ,
use a comma-separated string, similar to how linker options are supplied to
.Fl Wl .
(Do not put spaces after the commas, because the shell treats a space as an argument separator.)
.Nm
replaces each comma with a space when
.Ar preprocess-filter
is invoked. For example,
.D1 Fl preprocess Li tee,output.cbl
invokes
.Xr tee 1
with the output filename argument
.Pa output.cbl ,
causing a copy of the input to be written to the file.
.Pp
.Nm
searches the current working directory and the PATH environment
variable directories for an executable file whose name matches
.Ar preprocess-filter .
The first one found is used. If none is found, an error is reported
and the compiler is not invoked.
.Pp
The
.Fl preprocess
option may appear more than once on the command line. Each
.Ar preprocess-filter
is applied in turn, in order of appearance.
.Pp
The
.Ar preprocess-filter
should return a zero exit status, indicating success. If it returns a
nonzero exit status, an error is reported and the compiler is not
invoked.
.
.It Fl fflex-debug Ns Li , Fl fyacc-debug
produce messages useful for compiler development. The
.Fl fflex-debug
option prints the tokenized input stream. The
.Fl fyacc-debug
option shows the shift and reduce actions taken by the parser.
.El
.
.Sh COMPILATION SCENARIOS
.D1 gcobol Ar xyz.cob
.D1 gcobol -main Ar xyz.cob
.D1 gcobol -main= Ns Ar xyz.cob Ar xyz.cob
These are equivalent. The
.Ar xyz.cob
code is compiled and a
.Fn main
function is
inserted that calls the first PROGRAM-ID in the
.Ar xyz.cob
source file.
.Pp
.D1 gcobol -nomain Ar xyz.cob Ar elsewhere.o
The
.Fl nomain
option prevents a
.Fn main
function from being generated by the gcobol compiler.
A
.Fn main
entry point must be present in the
.Ar elsewhere.o
module; without it the
linker will report a
.Dq "missing main"
error.
.Pp
.D1 gcobol Ar aaa.cob Ar bbb.cob Ar ccc.cob
.D1 gcobol -main Ar aaa.cob Ar bbb.cob Ar ccc.cob
The two commands are equivalent. The three source code modules are compiled and
linked together along with a generated
.Fn main
function that calls the first
PROGRAM-ID in the
.Ar aaa.cob
module.
.Pp
.D1 gcobol Ar aaa.cob Ar bbb.cob Fl main Ar ccc.cob
.D1 gcobol -main Ns = Ns Ar ccc.cob Ar aaa.cob Ar bbb.cob Ar ccc.cob
These two commands have the same result: An
.Ar a.out
executable is created that
starts executing at the first PROGRAM-ID in
.Ar ccc.cob .
.Pp
.D1 gcobol -main Ns = Ns Ar bbb.cob:b-entry Ar aaa.cob Ar bbb.cob Ar ccc.cob
An
.Ar a.out
executable is created that starts executing at the PROGRAM-ID
.Ar "b-entry" .
.Pp
.D1 gcobol -c Ar aaa.cob
.D1 gcobol -c -main Ar bbb.cob
.D1 gcobol -c Ar ccc.cob
.D1 gcobol Ar aaa.o Ar bbb.o Ar ccc.o
The first three commands each create a .o file. The
.Ar bbb.o
file will contain a
.Fn main
entry point that calls the first PROGRAM-ID in
.Ar bbb.cob .
The fourth links the three .o files into an
.Ar a.out .
.
.Sh EBCDIC
The
.Fl finternal-ebcdic
option is useful when working with mainframe \*[lang] programs intended
for EBCDIC-encoded files. With this option, while the \*[lang] text
remains in ASCII, the character literals and field initial values
produce EBCDIC strings in the compiled binary, and any character data
read from a file are interpreted as EBCDIC data. The file data are
not
.Em converted ;
rather, the file is assumed to use EBCDIC representation. String
literals in the \*[lang] text
.Em are
converted, so that they can be compared meaningfully with data in the file.
.Pp
Only file data and character literals are affected. Data read from
and written to the environment, or taken from the command line, are
interpreted according to the
.Xr locale 7
in force during execution. The same is true of
.Sy ACCEPT
and
.Sy DISPLAY .
Names known to the operating system, such as file names and the names
of environment variables, are processed verbatim.
.Pp
At the present time, this is an all-or-nothing setting. Support for
.Sy USAGE
and
.Sy CODESET ,
which would allow conversion between encodings, remains a future goal.
.Pp
See also
.Sx "Feature-set Variables" ,
below.
.
.Sh REDEFINES ... USAGE POINTER
Per ISO, an item that
.Sy REDEFINES
another may not be larger than the item it redefines, unless that item
has LEVEL 01 and is not EXTERNAL. In
.Nm ,
using
.Fl dialect Ar ibm ,
this rule is relaxed for
.Sy REDEFINES
with
.Sy USAGE POINTER
whose redefined member is a 4-byte
.Sy USAGE COMP-5
(usually
.Sy PIC S9(8) Ns ),
or vice-versa.
In that case, the redefined member is re-sized to be 8 bytes, to
accommodate the pointer. This feature allows pointer arithmetic on a
64-bit system with source code targeted at a 32-bit system.
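.Pp
As an illustration only, the sketch below shows the kind of declaration
this relaxation is meant to accept when compiling with
.Fl dialect Ar ibm .
The program name
.Li ptrdemo
and all data names are hypothetical, and the arithmetic is arbitrary.
.Bd -literal
       identification division.
       program-id. ptrdemo.
       data division.
       working-storage section.
       01 record-area.
      *> A 4-byte COMP-5 item redefined by a pointer.
           05 addr-num pic s9(8) comp-5.
           05 addr-ptr redefines addr-num usage pointer.
       procedure division.
           set addr-ptr to address of record-area
      *> Pointer arithmetic by way of the numeric view.
           add 4 to addr-num
           goback.
.Ed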
.Pp
See also
.Sx "Feature-set Variables" ,
below.
.
.Sh IMPLEMENTATION NOTES
.Nm
is a gcc compiler, and follows gcc conventions where applicable.
Sometimes those conventions (and user expectations) conflict with
common Mainframe practice. Unless required of the compiler by the ISO
specification, any such conflicts are resolved in favor of gcc.
.Ss Linking
Unlike C, the \*[lang]
.Sy CALL
statement implies dynamic linking, because for
.D1 Sy CALL Ar program
.Ar program
can be a variable whose value is determined at runtime.
However, the parameter may also be a compile-time constant, either an
alphanumeric literal, or a
.Sy CONSTANT
data item.
.Pp
.Nm
supports static linking where possible, unless defeated by
.Fl fno-static-call .
If the parameter value is known at compile time, the compiler produces
an external reference to be resolved by the linker. The referenced
program is normally supplied via an object module, a static library,
or a shared object. If it is not supplied, the linker will report an
.Dq "unresolved symbol"
error, either at build time or, if using a shared object, when the
program is executed. This feature informs the programmer of the error
at the earliest opportunity.
.Pp
Programs that are expected to execute
correctly in the presence of an unresolved symbol (perhaps because the
program logic won't require that particular
.Sy CALL )
can use the
.Fl fno-static-call
option. That forces all
.Sy CALL
statements to be resolved dynamically, at runtime.
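.Pp
The two kinds of
.Sy CALL
described above can be sketched as follows.
This is a minimal illustration, not a complete build: the program name
.Li calldemo
and the called program
.Li worker
are hypothetical, and
.Li worker
would have to be supplied by another module or library.
.Bd -literal
       identification division.
       program-id. calldemo.
       data division.
       working-storage section.
       01 target-name pic x(16) value "worker".
       procedure division.
      *> Literal: an external symbol for the linker to resolve,
      *> unless -fno-static-call is in effect.
           call "worker"
      *> Data item: the name is looked up at run time.
           call target-name
           goback.
.Ed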
.ig
Programs that are expected to execute
correctly in the presence of an unresolved symbol (perhaps because the
program logic won't require that particular
.Sy CALL )
can use linker options to produce an executable anyway.
.Pp
One corner case yet remains. The
.Sy CALL
statement includes an
.Sy "ON ERROR"
clause whose purpose is to handle errors arising when the called program is not found.
Control is transferred to the
.Sy "ON ERROR"
clause when the
.Sy EC-PROGRAM-NOT-FOUND
exception condition is raised. That exception condition is not raised in
.Nm
when:
.Bl -bullet -compact
.It
the
.Sy CALL
parameter
is known at compile time, i.e., is an alphanumeric literal or
.Sy CONSTANT
data item, and
.It
the executable was generated with the linker option to ignore unresolved symbols.
.El
In that case, the program is terminated with a signal. No recovery with
.Sy "ON ERROR"
is possible.
.Pp
Should your program meet those particular conditions, all is not lost.
There are workarounds, and an option could be added to use dynamic
linking for all
.Sy CALL
statement, regardless of compile-time constants.
..
.
.Ss Implemented Exception Conditions
By default, per ISO, no EC is enabled. Implemented ECs may be enabled
on the command line or via the
.Sy TURN
directive. Any attempt to enable an EC that is not implemented is
treated as an error.
.Pp
An enabled EC not handled by a
.Sy DECLARATIVE
is written to the system log and to standard error. (The authors
intend to make that an option.) A fatal EC not handled with
.Sy RESUME
ends with a call to
.Xr abort 3
and process termination.
.Pp
Not all Exception Conditions are implemented. Any attempt to enable
an EC that is not implemented produces a warning message.
The following are implemented:
.Pp
.Bl -tag -offset 5n -compact
.It EC-FUNCTION-ARGUMENT
for the following functions:
.Bl -item -compact
.It
ACOS
.It
ANNUITY
.It
ASIN
.It
LOG
.It
LOG10
.It
PRESENT-VALUE
.It
SQRT
.El
.It EC-SORT-MERGE-FILE-OPEN
.It EC-BOUND-SUBSCRIPT
subscript not an integer, less than 1, or greater than occurs
.It EC-BOUND-REF-MOD
refmod start not an integer, start less than 1, start greater than
variable size, length not an integer, length less than 1, and
start+length exceeds variable size
.It EC-BOUND-ODO
DEPENDING not an integer, greater than occurs upper limit,
less than occurs lower limit, and subscript greater than DEPENDING for sending item
.It EC-SIZE-ZERO-DIVIDE
for both fixed-point and floating-point division
.It EC-SIZE-TRUNCATION
.It EC-SIZE-EXPONENTIATION
.El
.Pp
As of this writing, no \*[lang] compiler documents a complete
implementation of \*[isostd] Exception Conditions.
.Nm
will give priority to those ECs that the user community deems most
valuable.
.
.Sh EXTENSIONS TO ISO \*[lang]
Standard \*[lang] has no provision for environment variables as defined
by Unix and Windows, or command-line arguments.
.Nm
supports them using syntax similar to that of GnuCOBOL. ISO and IBM
also define incompatible ways to return the program's exit status to
the operating system.
.Nm
supports both the ISO and the IBM syntax.
.
.Ss Environment Variables
To read an environment variable:
.Pp
.D1 ACCEPT Ar target Li FROM ENVIRONMENT Ar envar
.Pp
where
.Ar target
is a data item defined in
.Sy "DATA DIVISION" ,
and
.Ar envar
names an environment variable.
.Ar envar
may be a string literal or alphanumeric data item whose value is the
name of an environment variable. The value of the named environment
variable is moved to
.Ar target .
The rules are the same as for
.Sy MOVE .
.Pp
To write an environment variable:
.Pp
.D1 SET ENVIRONMENT Ar envar Li TO Ar source
.Pp
where
.Ar source
is a data item defined in
.Sy DATA DIVISION ,
and
.Ar envar
names an environment variable.
.Ar envar
again may be a string literal or alphanumeric data item whose value is the
name of an environment variable. The value of the named environment
variable is set to the value of
.Ar source .
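.Pp
As an illustration only, the following sketch reads one environment
variable and sets another, using the two forms shown above.
The program name
.Li envdemo ,
the data names, and the variable
.Ev GCOBOL_DEMO
are hypothetical.
.Bd -literal
       identification division.
       program-id. envdemo.
       data division.
       working-storage section.
       01 home-dir  pic x(256).
       01 demo-text pic x(16) value "demo".
       procedure division.
      *> Read HOME; the value arrives under the rules of MOVE.
           accept home-dir from environment "HOME"
           display "HOME is " function trim(home-dir)
      *> Set a hypothetical variable from a data item.
           set environment "GCOBOL_DEMO" to demo-text
           goback.
.Ed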
.
.Ss Command-line Arguments
To read command-line arguments, use the registers
.Sy COMMAND-LINE
and
.Sy COMMAND-LINE-COUNT
in an
.Sy ACCEPT
statement (only).
Used without a subscript,
.Sy COMMAND-LINE
returns the whole command line as a single string. With a subscript,
.Sy COMMAND-LINE
is a table of command-line arguments. For example, if the
program is invoked as
.sp
.D1 Sy ./program Fl i Ar input Ar output
.sp
then
.sp
.D1 ACCEPT target FROM COMMAND-LINE(3)
.sp
moves
.Ar input
into
.Ar target .
The program name is the first thing in the whole command line and is
found in
.Sy COMMAND-LINE(1) .
.Pp
To discover how many arguments were provided on the command line, use
.sp
.D1 ACCEPT Ar target Li FROM COMMAND-LINE-COUNT
.sp
If
.Sy ACCEPT
refers to a nonexistent environment variable or command-line
argument, the target is set to
.Sy LOW-VALUES .
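.Pp
A minimal sketch of both registers in use follows.
The program name
.Li argdemo
and the data names are hypothetical, and the field sizes are arbitrary
choices made for the example.
.Bd -literal
       identification division.
       program-id. argdemo.
       data division.
       working-storage section.
       01 arg-count pic 9(4).
       01 arg-text  pic x(100).
       procedure division.
           accept arg-count from command-line-count
           display "argument count: " arg-count
      *> The element after the program name, if there is one.
           accept arg-text from command-line(2)
           display "COMMAND-LINE(2): " function trim(arg-text)
           goback.
.Ed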
.Pp
The system command line parameters can also be accessed through the LINKAGE
SECTION in the program where execution starts. The data structure looks like
this:
.Bd -literal
linkage section.
    01 argc pic 999.
    01 argv.
        02 argv-table occurs 1 to 100 times depending on argc.
            03 argv-element pointer.
    01 argv-string pic x(100).
.Ed
and the code to access the third parameter looks like this:
.Bd -literal
procedure division using by value argc by reference argv.
    set address of argv-string to argv-element(3)
    display argv-string
.Ed
.
.Ss #line directive
The parser accepts lines in the form
.D1 #line Ar lineno Dq Ar filename Ns .
The effect is to set the current line number to
.Ar lineno
and the current input filename to
.Ar filename .
Preprocessors may use this directive to control the filename and line
numbers reported in error messages and in the debugger.
.
.Ss SELECT ... ASSIGN TO
In the phrase
.sp
.D1 ASSIGN TO Ar filename
.sp
.Ar filename
may appear in quotes or not. If quoted, it represents a filename as
known to the operating system. If unquoted, it names either a data
element or an environment variable containing the name of a file.
If
.Ar filename
matches the name of a data element, that element is used. If not,
resolution of
.Ar filename
is deferred until runtime, when the name must appear in the program's
environment.
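.Pp
The quoted and unquoted forms can be sketched as below.
This is an illustration under stated assumptions: the program name
.Li assigndemo ,
the file and data names, and the file names
.Pa report.txt
and
.Pa input.txt
are all hypothetical.
An unquoted name that matched no data item would instead be looked up
in the environment at run time.
.Bd -literal
       identification division.
       program-id. assigndemo.
       environment division.
       input-output section.
       file-control.
      *> Quoted: a file name as known to the operating system.
           select report-file assign to "report.txt".
      *> Unquoted, matching a data item that holds the name.
           select input-file assign to input-name.
       data division.
       file section.
       fd report-file.
       01 report-line pic x(80).
       fd input-file.
       01 input-line  pic x(80).
       working-storage section.
       01 input-name  pic x(64) value "input.txt".
       procedure division.
           open output report-file
           move "hello" to report-line
           write report-line
           close report-file
           goback.
.Ed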
.
.Sh ISO \*[lang] Implementation Status
.Ss USAGE Data Types
.Nm
supports the following
.Sy USAGE IS
clauses:
.Bl -tag -compact -width POINTER\0
.It Sy INDEX
for use as an index in a table.
.It Sy POINTER
for variables whose value is the address of an external function,
.Sy PROGRAM-ID ,
or data item. Assignment is via the
.Sy SET
statement.
.It Sy BINARY , Sy COMP , Sy COMPUTATIONAL , Sy COMP-4 , Sy COMPUTATIONAL-4
big-endian integer, 1 to 16 bytes, per PICTURE.
.It Sy COMP-1 , Sy COMPUTATIONAL-1 , Sy FLOAT-BINARY-32
IEEE 754 single-precision (4-byte) floating point, as provided by the
hardware.
.It Sy COMP-2 , Sy COMPUTATIONAL-2 , Sy FLOAT-BINARY-64
IEEE 754 double-precision (8-byte) floating point, as provided by the
hardware.
.It Sy COMP-3 , Sy COMPUTATIONAL-3 , Sy PACKED-DECIMAL
currently unimplemented.
.It Sy COMP-5 , Sy COMPUTATIONAL-5
little-endian integer, 1 to 16 bytes, per
.Sy PICTURE .
.It Sy FLOAT-BINARY-128 , Sy FLOAT-EXTENDED
implements 128-bit floating point, per IEEE 754.
.El
.Pp
.Nm
supports ISO integer
.Sy BINARY-<type>
types, most of which alias
.Sy COMP-5 .
.
.hw unsigned
.sp
.TS
tab(;);
LB LB LB LB
LB LB LB LB
L L L L .
COMP-5;Compatible
Picture;BINARY Type;Bytes;Value
;T{
BINARY-CHAR [UNSIGNED]
T};1;0 \(em 255
S9(1...4);T{
BINARY-CHAR SIGNED
T};1;-128 \(em +127
\09(1...4);T{
BINARY-SHORT [UNSIGNED]
T};2;0 \(em 65535
S9(1...4);T{
BINARY-SHORT SIGNED
T};2;-32768 \(em +32767
\09(5...9);T{
BINARY-LONG [UNSIGNED]
T};4;0 \(em 4,294,967,295
S9(5...9);T{
BINARY-LONG SIGNED
T};4;T{
-2,147,483,648 \(em +2,147,483,647
T}
\09(10...18);T{
BINARY-LONG-LONG [UNSIGNED]
T};8;T{
0 \(em 18,446,744,073,709,551,615
T}
S9(10...18);T{
BINARY-LONG-LONG SIGNED
T};8;T{
-9,223,372,036,854,775,808 \(em +9,223,372,036,854,775,807
T}
.TE
.Pp
These define a size (in bytes) and cannot be
used with a
.Sy PICTURE
clause.
Per the ISO standard,
.Sy SIGNED
is the default for the
.Sy "BINARY-" Ns Ar type
aliases.
.Pp
All computation \(em both integer and floating point \(em is done
using 128-bit intermediate forms.
.
.Ss Environment Names
In
.Nm ,
.sp
.Dl DISPLAY UPON
.sp
maps
.Sy SYSOUT
and
.Sy STDOUT
to standard output, and
.Sy SYSPUNCH ,
.Sy SYSPCH
and
.Sy STDERR
to standard error.
.
.Ss Exit Status
.Nm
supports the ISO syntax for returning an exit status to the operating system,
.Pp
.D1 STOP RUN Oo WITH Oc Bro NORMAL | ERROR Brc Oo STATUS Oc Ar status
.Pp
In addition,
.Nm
supports the IBM syntax for returning an exit status to
the operating system. Use the
.Sy RETURN-CODE
register:
.Bd -literal -offset indent
MOVE ZERO TO RETURN-CODE.
GOBACK.
.Ed
.Pp
The
.Sy RETURN-CODE
register is defined as a 4-byte binary integer.
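.Pp
Both forms can appear in one program, as in this sketch.
The program name
.Li exitdemo ,
the data item
.Li input-ok ,
and the status value 2 are hypothetical choices made for the example.
.Bd -literal
       identification division.
       program-id. exitdemo.
       data division.
       working-storage section.
       01 input-ok pic x value "N".
       procedure division.
           if input-ok = "Y"
      *> IBM form: set the register, then return.
               move zero to return-code
               goback
           end-if
      *> ISO form: report failure with an explicit status.
           stop run with error status 2.
.Ed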
.ig
.Pp
The ISO standard supports an extended form of
.Sy GOBACK :
.Pp
.D1 GOBACK {ERROR | NORMAL} WITH Ar status
.Pp
where
.Ar status
is a numeric data item or literal. This syntax has the same effect as:
.Bd -literal -offset indent
MOVE status TO RETURN-CODE.
GOBACK.
.Ed
The use of
.Sy ERROR
or
.Sy NORMAL
has no effect; the two are interchangeable.
..
.
.Sh COMPILER-DIRECTING FACILITY
The CDF should be used with caution because no comprehensive test
suite has been identified.
.
.Ss CDF Text Manipulation
.Bl -tag -width >>DEFINE
.It Sy COPY Ar copybook Li Oo OF|IN Ar library Oc Oo Sy REPLACING ... Oc
If
.Ar copybook
is a literal, it is treated as a literal filename, which either does or does not exist. If
.Ar copybook
is a \*[lang] word,
.Nm
looks first for an environment variable named
.Va copybook
and, if found, uses the contents of that variable as the name of the
copybook file. If that file does not exist, it continues looking for
a file named one of:
.sp
.Bl -bullet -compact -offset 5n
.It
.Pa copybook
(literally)
.It
.Pa copybook.cpy
.It
.Pa copybook.CPY
.It
.Pa copybook.cbl
.It
.Pa copybook.CBL
.It
.Pa copybook.cob
.It
.Pa copybook.COB
.El
.sp
in that order. It looks first in the same directory as the source
code file, and then in any
.Ar copybook-path
named with the
.Fl I
option.
.
.\" FIXME: need escape mechanism for directories with ':' in the name.
.Ar copybook-path
may (like the shell's
.Ev PATH
variable) be a colon-separated list.
The
.Fl I
option may occur multiple times on the command line. Each successive
.Ar copybook-path
is concatenated to previous ones.
Relative paths (having no leading
.Ql / Ns
\&)
are searched relative to the compiler's current working directory.
.Pp
For example,
.D1 \&
.D1 Fl I Li /usr/local/include:include
.D1 \&
searches first the directory where the \*[lang] program is found, next in
.Pa /usr/local/include ,
and finally in an
.Pa include
subdirectory of the directory from which
.Nm
was invoked.
.Pp
For the
.Sy REPLACING
phrase, both the modern pseudo-text and the \*[lang]/85 forms are
recognized. (The older forms are used in the NIST CCVS/85 test suite.)
.It Sy REPLACE ...
.Nm
supports the full ISO
.Sy REPLACE
syntax.
.El
.
.Ss CDF Directives
.\"Bl -tag -width >>PROPAGATE
.Bl -tag -width >>DEFINE
.It >> Ns Sy DEFINE Ar name Sy AS Bro Ar expression Li | Sy PARAMETER Brc Op Sy OVERRIDE
Define
.Ar name
as a compilation variable to have the value
.Ar expression .
If
.Ar name
was previously defined,
.Sy OVERRIDE
is required, else the directive is invalid.
.Sy AS PARAMETER
is accepted, but has no effect in
.Nm .
.
.It >> Ns Sy DEFINE Ar name Sy AS OFF
releases the definition
.Ar name ,
making it subsequently invalid for use.
.\" ISO requires AS; cdf.y does not.
.
.It >> Ns Sy IF Ar cce Ar text Oo >> Ns Sy ELSE Ar alt-text Oc Li >> Ns Sy END-IF
evaluates
.Ar cce ,
a
.Em "constant conditional expression\/" ,
for conditional compilation.
If a name,
.Ar cce
may be defined with the
.Fl D
command-line parameter. If true, the \*[lang] text
.Ar text
is compiled. If false,
.Ar alt-text ,
if present, is compiled.
.Bo Sy IS Bo Sy NOT Bc Bc Sy DEFINED
is supported. Boolean literals are not supported.
.
.It >> Ns Sy EVALUATE
Not implemented.
.It >> Ns Sy CALL-CONVENTION Ar convention
.Ar convention
may be one of:
.Bl -tag -compact
.It Sy \*[lang]
Use standard \*[lang] case-insensitive symbol-name matching. For
.Sy CALL Dq Ar name ,
.Ar name
is rendered by the compiler in lowercase.
.It Sy C
Use case-sensitive symbol-name matching. The
.Sy CALL
target is not changed in any way; it is used verbatim.
.It Sy VERBATIM
An alias for >>\c
.Sy "CALL-CONVENTION C" .
.El
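.Pp
As a sketch of how the two conventions differ, consider the fragment
below.
The program name
.Li convdemo
and the called names
.Li logMessage
and
.Li WORKER
are hypothetical; both would have to be supplied at link time.
.Bd -literal
       identification division.
       program-id. convdemo.
       procedure division.
       >>CALL-CONVENTION C
      *> Used verbatim: the symbol is "logMessage".
           call "logMessage"
       >>CALL-CONVENTION COBOL
      *> Rendered in lowercase: the symbol is "worker".
           call "WORKER"
           goback.
.Ed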
.It >> Ns Sy COBOL-WORDS EQUATE Ar keyword Sy WITH Ar alias
makes
.Ar alias
a synonym for
.Ar keyword .
.It >> Ns Sy COBOL-WORDS UNDEFINE Ar keyword
.Ar keyword
is removed from the \*[lang] grammar. Use of it in a program will provoke
a syntax error from the compiler.
.It >> Ns Sy COBOL-WORDS SUBSTITUTE Ar keyword Sy BY Ar new-word
.Ar keyword
is deleted as a keyword from the grammar, replaced by
.Ar new-word .
.Ar keyword
may thereafter be used as a user-defined word.
.It >> Ns Sy COBOL-WORDS RESERVE Ar new-word
Treat
.Ar new-word
as a \*[lang] keyword. It cannot be used by the program, either as a
keyword or as a user-defined word.
.
.It >> Ns Sy DISPLAY Ar string ...
Write
.Ar string
to standard error as a warning message.
.It >> Ns Sy SOURCE Ar format
.Ar format
may be one of:
.Bl -tag -compact
.It Sy FIXED
Source conforms to \*[lang]
.Em "fixed-form reference format"
with unlimited line length.
.It Sy FREE
Source conforms to \*[lang]
.Em "free-form reference format" .
.Ql "*"
at the beginning of a line is recognized as a comment.
.El
.El
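.Pp
Conditional compilation with
.Sy >>IF
can be sketched as follows.
The program name
.Li cdfdemo
and the CDF name
.Li TRACE
are hypothetical;
.Li TRACE
is assumed to be defined, or not, on the command line with the
.Fl D
option.
.Bd -literal
       identification division.
       program-id. cdfdemo.
       procedure division.
       >>IF TRACE IS DEFINED
           display "built with TRACE defined"
       >>ELSE
           display "built without TRACE"
       >>END-IF
           goback.
.Ed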
.Pp
.Bl -tag -width >>PROPAGATE -compact
.It >> Ns Sy FLAG-02
Not implemented.
.It >> Ns Sy FLAG-85
Not implemented.
.It >> Ns Sy FLAG-NATIVE-ARITHMETIC
Not implemented.
.It >> Ns Sy LEAP-SECOND
Not implemented.
.It >> Ns Sy LISTING
Not implemented.
.It >> Ns Sy PAGE
Not implemented.
.It >> Ns Sy PROPAGATE
Not implemented.
.It >> Ns Sy PUSH Ar directive
.It >> Ns Sy POP Ar directive
With
.Sy PUSH ,
push CDF state onto a stack.
With
.Sy POP ,
return to the prior pushed state.
.Ar directive
may be one of
.Bl -tag -compact
.It Sy CALL-CONVENTION
.It Sy COBOL-WORDS
.It Sy DEFINE
.It Sy SOURCE FORMAT
.It Sy TURN
.El
.
.It >> Ns Sy TURN Oo
.Ar ec Oo Ar file Li ... Oc ...
.Oc Sy CHECKING Bro Oo Sy ON Oc Oo Oo Sy WITH Oc Sy LOCATION Oc | Sy OFF Brc
Enable (or, with
.Sy OFF ,
disable) exception condition
.Ar ec
optionally associated with the file connectors
.Ar file .
If
.Sy LOCATION
is specified,
.Nm
reports at runtime the source filename and line number of the
statement that triggered the exception condition.
.El
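.Pp
A minimal sketch of
.Sy >>TURN
follows, assuming the directive is placed at the top of the source file.
The program name
.Li turndemo
and the data names are hypothetical.
The sketch shows only how the condition is enabled; what is reported at
run time depends on the handling described under
.Sx "Implemented Exception Conditions" .
.Bd -literal
       >>TURN EC-SIZE-ZERO-DIVIDE CHECKING ON WITH LOCATION
       identification division.
       program-id. turndemo.
       data division.
       working-storage section.
       01 numerator   pic 9(4) value 10.
       01 denominator pic 9(4) value 0.
       01 quotient    pic 9(4).
       procedure division.
      *> Division by zero raises the enabled condition.
           divide numerator by denominator giving quotient
           display "quotient: " quotient
           goback.
.Ed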
.
.Ss Feature-set Variables
Some command-line options affect CDF
.Em "feature-set"
variables that are special to
.Nm .
They can be set and tested using
.Sy >>DEFINE
and
.Sy >>IF ,
and are distinguished by a leading
.Ql \&%
in the name, which is otherwise invalid in a \*[lang] identifier:
.Pp
.Bl -tag -compact
.It Sy %EBCDIC-MODE
is set by
.Fl finternal-ebcdic .
.It Sy %64-BIT-POINTER
is implied by
.Fl "dialect ibm" .
.El
.Pp
To set a feature-set variable, use
.Dl >>SET Ar feature Li [AS] {ON | OFF}
If
.Ar feature
is
.Sy %EBCDIC-MODE ,
the directive must appear before
.Sy PROGRAM-ID .
.Pp
To test a feature-set variable, use
.Dl >>IF Ar feature Li DEFINED
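.Pp
For instance, a program might select text according to whether
.Fl finternal-ebcdic
was given.
This is a sketch only; the program name
.Li featdemo
and the messages are hypothetical.
.Bd -literal
       identification division.
       program-id. featdemo.
       procedure division.
       >>IF %EBCDIC-MODE DEFINED
           display "compiled with -finternal-ebcdic"
       >>ELSE
           display "compiled without -finternal-ebcdic"
       >>END-IF
           goback.
.Ed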
.
.Ss Intrinsic functions
.Nm
implements all intrinsic functions defined by \*[isostd], plus a few
others. They are listed alphabetically below.
.Bl -item -compact
.It
ABS ACOS ANNUITY ASIN ATAN
.It
BASECONVERT BIT-OF BIT-TO-CHAR BOOLEAN-OF-INTEGER BYTE-LENGTH
.It
CHAR CHAR-NATIONAL COMBINED-DATETIME CONCAT CONVERT COS CURRENT-DATE
.It
DATE-OF-INTEGER DATE-TO-YYYYMMDD DAY-OF-INTEGER DAY-TO-YYYYDDD DISPLAY-OF
.It
E EXCEPTION-FILE
EXCEPTION-FILE-N EXCEPTION-LOCATION EXCEPTION-LOCATION-N
EXCEPTION-STATEMENT EXCEPTION-STATUS EXP EXP10
.It
FACTORIAL FIND-STRING
FORMATTED-CURRENT-DATE FORMATTED-DATE FORMATTED-DATETIME
FORMATTED-TIME FRACTION-PART
.It
HEX-OF HEX-TO-CHAR HIGHEST-ALGEBRAIC
.It
INTEGER INTEGER-OF-BOOLEAN INTEGER-OF-DATE INTEGER-OF-DAY
INTEGER-OF-FORMATTED-DATE INTEGER-PART
.It
LENGTH LOCALE-COMPARE
LOCALE-DATE LOCALE-TIME LOCALE-TIME-FROM-SECONDS LOG LOG10 LOWER-CASE
LOWEST-ALGEBRAIC
.It
MAX MEAN MEDIAN MIDRANGE MIN MOD MODULE-NAME
.It
NATIONAL-OF NUMVAL NUMVAL-C NUMVAL-F
.It
ORD ORD-MAX ORD-MIN
.It
PI PRESENT-VALUE
.It
RANDOM RANGE REM REVERSE
.It
SECONDS-FROM-FORMATTED-TIME
SECONDS-PAST-MIDNIGHT SIGN SIN SMALLEST-ALGEBRAIC SQRT
STANDARD-COMPARE STANDARD-DEVIATION SUBSTITUTE SUM
.It
TAN TEST-DATE-YYYYMMDD TEST-DAY-YYYYDDD TEST-FORMATTED-DATETIME
TEST-NUMVAL TEST-NUMVAL-C TEST-NUMVAL-F TRIM
.It
ULENGTH UPOS UPPER-CASE
USUBSTR USUPPLEMENTARY UUID4 UVALID UWIDTH
.It
VARIANCE
.It
WHEN-COMPILED
.It
YEAR-TO-YYYY
.El
.
.Ss Binary floating point DISPLAY
How
.Sy DISPLAY
presents binary floating-point numbers depends on the value.
.Pp
When a value has six or fewer decimal digits to the left of the
decimal point, it is expressed as
.Em 123456.789... .
.Pp
When a value is less than 1 and has no more than three zeroes to the
right of the decimal point, it is expressed as
.Em 0.0001234... .
.Pp
Otherwise, exponential notation is used:
.Em 1.23456E+7 .
.Pp
In all cases, trailing zeroes on the right of the number are removed
from the displayed value.
.Pp
.Bl -tag -compact -width FLOAT-EXTENDED
.It COMP-1
displayed with 9 decimal digits.
.It COMP-2
displayed with 17 decimal digits.
.It FLOAT-EXTENDED
displayed with 36 decimal digits.
.El
.Pp
Those digit counts are consistent with the IEEE 754 requirements for
information interchange. As one example, here is the description of
binary64 values, which correspond to COMP-2 (per Wikipedia):
.Pp
If an IEEE 754 double-precision number is converted to a decimal
string with at least 17 significant digits, and then converted back to
double-precision representation, the final result must match the
original number.
.Pp
Seventeen digits were chosen so that the
.Sy DISPLAY
statement shows the contents
of a COMP-2 variable without hiding any information.
.
.Ss Binary floating point MOVE
During a
.Sy MOVE
statement, a floating-point value may be truncated. It is not
unusual for Numeric Display values to be altered when moved through a
floating-point value.
.Pp
This program:
.Bd -literal
01 PICV999 PIC 9999V999.
01 COMP2 COMP-2.
PROCEDURE DIVISION.
MOVE 1.001 to PICV999
MOVE PICV999 TO COMP2
DISPLAY "The result of MOVE " PICV999 " TO COMP2 is " COMP2
MOVE COMP2 to PICV999
DISPLAY "The result of MOVE COMP2 TO PICV999 is " PICV999
.Ed
.Pp
generates this result:
.Bd -literal
The result of MOVE 0001.001 TO COMP2 is 1.00099999999999989
The result of MOVE COMP2 TO PICV999 is 0001.000
.Ed
.Pp
However, the internal implementation can produce results that might seem surprising:
.Bd -literal
The result of MOVE 0055.110 TO COMP2 is 55.1099999999999994
The result of MOVE COMP2 TO PICV999 is 0055.110
.Ed
.Pp
The source of this inconsistency is the way
.Nm
stores and converts
numbers. Converting the floating-point value to the numeric display
value 0055110 is done by multiplying 55.109999...\& by 1,000 and then
truncating the result to an integer. And it turns out that even
though 55.11 can't be represented in floating-point as an exact value,
the product of the multiplication, 55110, is an exact value.
.Pp
In cases where it is important for conversions to have predictable
results, we need to be able to apply rounding, which can be done with
an arithmetic statement:
.Bd -literal
MOVE 1.001 to PICV999
MOVE PICV999 TO COMP2
DISPLAY "The result of MOVE " PICV999 " TO COMP2 is " COMP2
MOVE COMP2 to PICV999
DISPLAY "The result of MOVE COMP2 TO PICV999 is " PICV999
ADD COMP2 to ZERO GIVING PICV999 ROUNDED
DISPLAY "The result of ADD COMP2 to ZERO GIVING PICV999 ROUNDED is " PICV999
.sp
The result of MOVE 0001.001 TO COMP2 is 1.00099999999999989
The result of MOVE COMP2 TO PICV999 is 0001.000
The result of ADD COMP2 to ZERO GIVING PICV999 ROUNDED is 0001.001
.Ed
.Ss Binary floating point computation
.Nm
attempts to do internal computations using binary integers when
possible. Thus, simple arithmetic between binary values and numeric
display values produces binary intermediate results.
.Pp
If a floating-point value gets included in the mix of variables
specified for a calculation, then the intermediate result becomes a
128-bit floating-point value.
.
.Ss A warning about binary floating point comparison
The cardinal rule when doing comparisons involving floating-point
values: Never, ever, test for equality. It's just not worth the hassle.
.Pp
For example:
.Bd -literal
WORKING-STORAGE SECTION.
01 COMP1 COMP-1 VALUE 555.11.
01 COMP2 COMP-2 VALUE 555.11.
PROCEDURE DIVISION.
DISPLAY "COMPARE " COMP1 " with " COMP2
IF COMP1 EQUAL COMP2 DISPLAY "Equal" ELSE DISPLAY "Not equal" END-IF
.sp
MOVE COMP1 to COMP2
DISPLAY "COMPARE " COMP1 " with " COMP2
IF COMP1 EQUAL COMP2 DISPLAY "Equal" ELSE DISPLAY "Not equal" END-IF
.Ed
.Pp
It produces these results:
.Bd -literal
COMPARE 555.1099854 with 555.110000000000014
Not equal
COMPARE 555.1099854 with 555.1099853515625
Equal
.Ed
.Pp
Why? Again, it has to do with the internals of
.Nm .
When differently sized floating-point values need to be compared, they
are first converted to 128-bit floats. And it turns out that when a
COMP-1 value is moved to a COMP-2 variable, and both are converted to
FLOAT-EXTENDED, the two resulting values are (probably) equal.
.Pp
Avoid testing for equality unless you really know what you are doing
and you really test the code. And then avoid it anyway.
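.Pp
One common alternative is to test whether two values differ by less
than a tolerance. The following is a sketch only: the program name,
the
.Sy ALLOWED-DIFF
item, and its value of 0.0001 are assumptions chosen for illustration,
and a suitable tolerance depends on the application.
.Bd -literal
        IDENTIFICATION DIVISION.
        PROGRAM-ID. fcompare.
        DATA DIVISION.
        WORKING-STORAGE SECTION.
        01 COMP1        COMP-1 VALUE 555.11.
        01 COMP2        COMP-2 VALUE 555.11.
        01 ALLOWED-DIFF COMP-2 VALUE 0.0001.
        PROCEDURE DIVISION.
            *> Compare the size of the difference, not the values.
            IF FUNCTION ABS(COMP1 - COMP2) < ALLOWED-DIFF
                DISPLAY "Close enough"
            ELSE
                DISPLAY "Different"
            END-IF
            GOBACK.
.Ed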
.Pp
Finally, it is observably the case that the
.Nm
implementations of floating-point conversions and comparisons don't
precisely match the behavior of other \*[lang] compilers.
.Pp
You have been warned.
.
.Sh ENVIRONMENT
.Bl -tag -width COBPATH
.It Ev COBPATH
If defined, specifies the directory paths to be used by the
.Nm
runtime library,
.Pa libgcobol.so ,
to locate shared objects.
Like
.Ev LD_LIBRARY_PATH ,
it may contain several directory names separated by a colon
.Pq Ql \&: .
.Ev COBPATH
is searched first, followed by
.Ev LD_LIBRARY_PATH .
.Pp
Each directory is searched for files whose name ends in
.Ql ".so" .
For each such file,
.Xr dlopen 3
is attempted, and, if successful,
.Xr dlsym 3 .
No relationship is defined between the symbol's name and the filename.
.Pp
Without
.Ev COBPATH ,
binaries produced by
.Nm
behave as one might expect of any program compiled with gcc. Any
shared objects needed by the program are mentioned on the command line
with a
.Fl l Ns Ar library
option, and are found by following the executable's
.Pa RPATH
or otherwise per the configuration of the runtime linker,
.Xr ld.so 8 .
.
.It Ev UPSI
\*[lang] defines a User Programmable Status Indicator (UPSI) switch. In
.Nm ,
the settings are denoted
.Sy UPSI-0
through
.Sy UPSI-7 ,
where 0-7 indicates a bit position. The value of the UPSI switches is
taken from the
.Ev UPSI
environment variable, whose value is a string of up to eight 1's and
0's. The first character represents the value of
.Sy UPSI-0 ,
and missing values are assigned 0. For example,
.Sy UPSI=1000011
in the environment sets bits 0, 5, and 6 on, which means that
.Sy UPSI-0 ,
.Sy UPSI-5 ,
and
.Sy UPSI-6
are on.
.It Ev GCOBOL_TEMPDIR
causes any temporary files created during CDF processing to be written
to a file whose name is specified in the value of
.Ev GCOBOL_TEMPDIR .
If the value is just
.Dq / ,
the effect is different: each copybook read is reported on standard
error. This feature is meant to help diagnose mysterious copybook
errors.
.El
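.Pp
As a sketch of how a program might consult the
.Ev UPSI
switches, the fragment below attaches a condition-name to
.Sy UPSI-0
using the standard
.Sy SPECIAL-NAMES
switch syntax, on the assumption that
.Nm
accepts that form. The names
.Sy upsidemo
and
.Sy VERBOSE-MODE
are hypothetical.
.Bd -literal
        IDENTIFICATION DIVISION.
        PROGRAM-ID. upsidemo.
        ENVIRONMENT DIVISION.
        CONFIGURATION SECTION.
        SPECIAL-NAMES.
            UPSI-0 ON STATUS IS VERBOSE-MODE.
        PROCEDURE DIVISION.
            IF VERBOSE-MODE
                DISPLAY "UPSI-0 is on"
            ELSE
                DISPLAY "UPSI-0 is off"
            END-IF
            GOBACK.
.Ed
.Pp
Running the resulting executable with
.Sy UPSI=1
in the environment would be expected to select the first branch.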
.
.Sh FILES
Executables produced by
.Nm
require the runtime support library
.Pa libgcobol ,
which is provided both as a static library and as a shared object.
.
.\" .Sh DIAGNOSTICS
.
.Sh COMPATIBILITY
The ISO standard leaves the default file organization up to the implementation; in
.Nm ,
the default is
.Sy "SEQUENTIAL" .
.
.Ss On-Disk Format
Any ability to use files produced by other \*[lang] compilers, or for
those compilers to use files produced by
.Nm ,
is the product of luck and intuition. Various compilers interpret the
ISO standard differently, and the standard's text is
not always definitive.
.Pp
For
.Sy "ORGANIZATION IS LINE SEQUENTIAL"
files (explicitly or by default),
.Nm ,
absent specific direction, produces an ordinary Linux text file: for
each WRITE, the data are written, followed by an ASCII NL (hex 0A)
character. On READ, the record is read up to the size of the
specified record or NL, whichever comes first. The NL is not included
in the data brought into the record buffer; it serves only as an
on-disk record-termination marker. Consequently,
.Sy SEQUENTIAL
and
.Sy "LINE SEQUENTIAL"
files work the same way: the \*[lang] program never sees the record
terminator.
.Pp
When
.Sy READ
and
.Sy WRITE
are used with
.Sy ADVANCING ,
however, the game changes. If
.Sy ADVANCING
is used with
.Sy "LINE SEQUENTIAL"
files,
it is honored by
.Nm .
.Pp
Other compilers may not do likewise.
According to ISO, in
.Sy WRITE
(14.9.47.3 General rules)
.Sy ADVANCING
is
.Em ignored
for files for which
.Dq "the physical file does not support vertical positioning" .
It further states that, in the absence of
.Sy ADVANCING ,
.Sy WRITE
proceeds
.Dq "as if the user has specified AFTER ADVANCING 1 LINE" .
Some other implementations interpret that to mean that the first
.Sy WRITE
to a
.Sy "LINE SEQUENTIAL"
file results in a leading NL on the first line, and no trailing NL on
the last line. Some furthermore
.Em prohibit
the use of
.Sy ADVANCING
with
.Sy "LINE SEQUENTIAL"
files.
.
.\" .Sh SEE ALSO
.
.Sh STANDARDS
The reference standard for
.Nm
is \*[isostd].
.Bl -bullet -compact
.It
If
.Nm
compiles code consistent with that standard, the resulting program
should execute correctly; any other result is a bug.
.It
If
.Nm
compiles code that does not comply with that standard, but runs correctly according to some other specification, that represents a non-standard extension. One day, the
.Fl pedantic
option will produce diagnostic messages for such code.
.It
If
.Nm
rejects code consistent with that standard, that represents an aspect
of \*[lang] that is (or is not) on the To Do list. If you would like
to see it compile, please get in touch with the developers.
.El
.
.Ss Status of NIST \*[lang] Compiler Verification Suite
.Bl -tag -compact -width "\0\0100% NC"
.It NC 100%
Nucleus
.It SQ 100%
Sequential I/O
.It RL 100%
Relative I/O
.It IX 100%
Indexed I/O
.It IC 100%
Inter-Program Communication
.It ST 100%
Sort-Merge
.It SM 100%
Source Text Manipulation
.It RW
Report Writer
.It CM
Communication
.It DB to do?
Debug
.It SG
Segmentation
.It IF 100%
Intrinsic Function
.El
.Pp
Where
.Nm
passes 100% of the tests in a module, we exclude the (few) tests for
obsolete features. The authors regard features that were obsolete in
1985 as well and truly obsolete today, and did not implement them.
.
.Ss Notable deferred features
CCVS-85 modules not marked above with any status (CM and SG) are on the
.Dq "hard maybe"
list, meaning they await an interested party with real code using the feature.
.Pp
.Nm
does not implement Report Writer or Screen Section.
.
.Ss Beyond COBOL/85
.Nm
increasingly implements \*[isostd]. For example,
.Sy DECLARATIVES
is not tested by CCVS-85, but is implemented by
.Nm Ns .
Similarly, Exception Conditions were not defined in 1985, and
.Nm
contains a growing number of them.
.Pp
The authors are well aware that a complete, pure \*[lang]-85 compiler
won't compile most existing \*[lang] code. Every vendor offered (and
offers) extensions, and most environments rely on a variety of
preprocessors and ancillary systems defined outside the standard. The
express goal of adding an ISO \*[lang] front-end to GCC is to establish a
foundation on which any needed extensions can be built.
.
.Sh HISTORY
\*[lang], the language, may well be older than the reader. To the
authors' knowledge, free \*[lang] compilers first began to appear in 2000.
Around that time an earlier \*[lang] for GCC project
.br
.Lk https://cobolforgcc.sourceforge.net/ cobolforgcc
met with some success, but was never officially merged into GCC.
.Pp
This compiler,
.Nm ,
was begun by
.Lk https://www.cobolworx.com/ COBOLworx
in the fall of 2021. The
project announced a complete implementation of the core language
features in December 2022.
.
.Sh AUTHORS
.Bl -tag -compact
.It "James K. Lowden"
(jklowden@cobolworx.com) is responsible for the parser.
.It "Robert Dubner"
(rdubner@cobolworx.com) is responsible for producing the GIMPLE tree,
which is input to the GCC back-end.
.El
.
.Sh CAVEATS
.Bl -bullet -compact
.It
.Nm
has been tested only on x64 and Apple M1 processors running Linux in
64-bit mode.
.It
The I/O support has not been extensively tested, and does not
implement or emulate many features related to VSAM and other mainframe
subsystems. While LINE-SEQUENTIAL files are ordinary text files that
can be manipulated with standard utilities, INDEXED and RELATIVE files
produced by
.Nm
are not compatible with those of any other \*[lang] compiler. Enhancements
to the I/O support will be readily available to the paying customer.
.El
.
.\" .Sh BUGS