| .. _The_Implementation_of_Standard_I/O: |
| |
| ********************************** |
| The Implementation of Standard I/O |
| ********************************** |
| |
| GNAT implements all the required input-output facilities described in |
| A.6 through A.14. These sections of the Ada Reference Manual describe the |
| required behavior of these packages from the Ada point of view, and if |
| you are writing a portable Ada program that does not need to know the |
| exact manner in which Ada maps to the outside world when it comes to |
| reading or writing external files, then you do not need to read this |
| chapter. As long as your files are all regular files (not pipes or |
| devices), and as long as you write and read the files only from Ada, the |
| description in the Ada Reference Manual is sufficient. |
| |
| However, if you want to do input-output to pipes or other devices, such |
| as the keyboard or screen, or if the files you are dealing with are |
| either generated by some other language, or to be read by some other |
| language, then you need to know more about the details of how the GNAT |
| implementation of these input-output facilities behaves. |
| |
| In this chapter we give a detailed description of exactly how GNAT |
| interfaces to the file system. As always, the sources of the system are |
| available to you for answering questions at an even more detailed level, |
| but for most purposes the information in this chapter will suffice. |
| |
| Another reason that you may need to know more about how input-output is |
| implemented arises when you have a program written in mixed languages |
| where, for example, files are shared between the C and Ada sections of |
| the same program. GNAT provides some additional facilities, in the form |
| of additional child library packages, that facilitate this sharing, and |
| these additional facilities are also described in this chapter. |
| |
| .. _Standard_I/O_Packages: |
| |
| Standard I/O Packages |
| ===================== |
| |
| The following Standard I/O packages, described in Annex A, |
| |
| * |
| Ada.Text_IO |
| * |
| Ada.Text_IO.Complex_IO |
| * |
| Ada.Text_IO.Text_Streams |
| * |
| Ada.Wide_Text_IO |
| * |
| Ada.Wide_Text_IO.Complex_IO |
| * |
| Ada.Wide_Text_IO.Text_Streams |
| * |
| Ada.Wide_Wide_Text_IO |
| * |
| Ada.Wide_Wide_Text_IO.Complex_IO |
| * |
| Ada.Wide_Wide_Text_IO.Text_Streams |
| * |
| Ada.Streams.Stream_IO |
| * |
| Ada.Sequential_IO |
| * |
| Ada.Direct_IO |
| |
| are implemented using the C library streams facility, where: |
| |
| * |
| All files are opened using `fopen`. |
| * |
| All input/output operations use `fread`/`fwrite`. |
| |
| There is no internal buffering of any kind at the Ada library level. The only |
| buffering is that provided at the system level in the implementation of the |
| library routines that support streams. This facilitates shared use of these |
| streams by mixed language programs. Note, though, that system level buffering |
| is explicitly enabled at elaboration of the standard I/O packages, which can |
| affect mixed language programs, in particular those that perform I/O before |
| calling the Ada elaboration routine (e.g., `adainit`). It is recommended to |
| call the Ada elaboration routine before performing any I/O or, when that is |
| impractical, to flush the common I/O streams, in particular Standard_Output, |
| before elaborating the Ada code. |
| |
| .. _FORM_Strings: |
| |
| FORM Strings |
| ============ |
| |
| The format of a FORM string in GNAT is: |
| |
| |
| :: |
| |
| "keyword=value,keyword=value,...,keyword=value" |
| |
| |
| where letters may be in upper or lower case, and there are no spaces |
| between values. The order of the entries is not important. Currently |
| the following keywords are defined: |
| |
| |
| :: |
| |
| TEXT_TRANSLATION=[YES|NO|TEXT|BINARY|U8TEXT|WTEXT|U16TEXT] |
| SHARED=[YES|NO] |
| WCEM=[n|h|u|s|e|8|b] |
| ENCODING=[UTF8|8BITS] |
| |
| |
| The use of these parameters is described later in this chapter. If an |
| unrecognized keyword appears in a form string, it is silently ignored |
| and not considered invalid. |
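| |
| As a minimal sketch of how a form string is passed, the following program |
| creates a text file with two keywords. The file name ``form_demo.txt`` and |
| the procedure name are assumptions chosen for illustration. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; use Ada.Text_IO; |
| |
|    procedure Form_Demo is |
|       F : File_Type; |
|    begin |
|       --  Keywords may be in either case; entries are separated by commas |
|       --  with no spaces. |
|       Create (F, Out_File, "form_demo.txt", |
|               Form => "text_translation=no,shared=no"); |
|       Put_Line (F, "written with an explicit form string"); |
|       Close (F); |
|    end Form_Demo; |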
| |
| .. _Direct_IO: |
| |
| Direct_IO |
| ========= |
| |
| Direct_IO can only be instantiated for definite types. This is a |
| restriction of the Ada language, which means that the records are fixed |
| length (the length being determined by ``type'Size``, rounded |
| up to the next storage unit boundary if necessary). |
| |
| The records of a Direct_IO file are simply written to the file in index |
| sequence, with the first record starting at offset zero, and subsequent |
| records following. There is no control information of any kind. For |
| example, if 32-bit integers are being written, each record takes |
| 4 bytes, so the record at index `K` starts at offset |
| (`K`-1)*4. |
| |
| There is no limit on the size of Direct_IO files; they are expanded as |
| necessary to accommodate whatever records are written to the file. |
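| |
| The record layout described above can be illustrated with a short sketch. It |
| assumes that `Integer` is 32 bits wide (as is typical for GNAT targets); the |
| file name ``direct_demo.bin`` and the procedure name are hypothetical. |
| |
| .. code-block:: ada |
| |
|    with Ada.Direct_IO; |
|    with Ada.Text_IO; |
| |
|    procedure Direct_Demo is |
|       package Int_IO is new Ada.Direct_IO (Integer); |
|       use Int_IO; |
| |
|       F : File_Type; |
|       V : Integer; |
|    begin |
|       Create (F, Inout_File, "direct_demo.bin"); |
|       for K in 1 .. 5 loop |
|          Write (F, K * 10);   --  record K holds the value K * 10 |
|       end loop; |
| |
|       --  Record 3 starts at byte offset (3 - 1) * (Integer'Size / 8). |
|       Read (F, V, From => 3); |
|       Ada.Text_IO.Put_Line (Integer'Image (V)); |
|       Close (F); |
|    end Direct_Demo; |
| |
| Under the stated assumption the file is 20 bytes long and the value read |
| back is 30. |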
| |
| .. _Sequential_IO: |
| |
| Sequential_IO |
| ============= |
| |
| Sequential_IO may be instantiated with either a definite (constrained) |
| or indefinite (unconstrained) type. |
| |
| For the definite type case, the elements written to the file are simply |
| the memory images of the data values with no control information of any |
| kind. The resulting file should be read using the same type; no validity |
| checking is performed on input. |
| |
| For the indefinite type case, the elements written consist of two |
| parts. First is the size of the data item, written as the memory image |
| of an `Interfaces.C.size_t` value, followed by the memory image of |
| the data value. The resulting file can only be read using the same |
| (unconstrained) type. Normal assignment checks are performed on these |
| read operations, and if these checks fail, `Data_Error` is |
| raised. In particular, in the array case, the lengths must match, and in |
| the variant record case, if the variable for a particular read operation |
| is constrained, the discriminants must match. |
| |
| Note that it is not possible to use Sequential_IO to write variable |
| length array items, and then read the data back into different length |
| arrays. For example, the following will raise `Data_Error`: |
| |
| |
| .. code-block:: ada |
| |
|    package IO is new Sequential_IO (String); |
|    F : IO.File_Type; |
|    S : String (1 .. 4); |
|    ... |
|    IO.Create (F); |
|    IO.Write (F, "hello!"); |
|    IO.Reset (F, Mode => IO.In_File); |
|    IO.Read (F, S);   --  raises Data_Error: the lengths do not match |
|    Put_Line (S); |
| |
| |
| |
| On some Ada implementations, this will print `hell`, but the program is |
| clearly incorrect, since there is only one element in the file, and that |
| element is the string `hello!`. |
| |
| In Ada 95 and Ada 2005, this kind of behavior can be legitimately achieved |
| using Stream_IO, and this is the preferred mechanism. In particular, the |
| above program fragment rewritten to use Stream_IO will work correctly. |
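| |
| As a hedged illustration of the Stream_IO approach, the fragment above might |
| be rewritten as follows. The file name ``stream_demo.bin`` and the procedure |
| name are assumptions; the sketch relies on the default `String'Write` and |
| `String'Read` attributes, which transfer only the characters and no bounds. |
| |
| .. code-block:: ada |
| |
|    with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO; |
|    with Ada.Text_IO; |
| |
|    procedure Stream_Demo is |
|       F : File_Type; |
|       S : String (1 .. 4); |
|    begin |
|       Create (F, Out_File, "stream_demo.bin"); |
|       --  'Write emits only the six characters, with no bounds or length. |
|       String'Write (Stream (F), "hello!"); |
|       Reset (F, In_File); |
|       --  'Read consumes exactly S'Length characters from the stream. |
|       String'Read (Stream (F), S); |
|       Ada.Text_IO.Put_Line (S); |
|       Close (F); |
|    end Stream_Demo; |
| |
| Here the read is legitimate and the program should print `hell`, because a |
| stream file is a plain sequence of elements rather than a sequence of |
| length-tagged items. |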
| |
| .. _Text_IO: |
| |
| Text_IO |
| ======= |
| |
| Text_IO files consist of a stream of characters containing the following |
| special control characters: |
| |
| |
| :: |
| |
| LF (line feed, 16#0A#) Line Mark |
| FF (form feed, 16#0C#) Page Mark |
| |
| |
| A canonical Text_IO file is defined as one in which the following |
| conditions are met: |
| |
| * |
| The character `LF` is used only as a line mark, i.e., to mark the end |
| of the line. |
| |
| * |
| The character `FF` is used only as a page mark, i.e., to mark the |
| end of a page and consequently can appear only immediately following a |
| `LF` (line mark) character. |
| |
| * |
| The file ends with either `LF` (line mark) or `LF`-`FF` |
| (line mark, page mark). In the former case, the page mark is implicitly |
| assumed to be present. |
| |
| A file written using Text_IO will be in canonical form provided that no |
| explicit `LF` or `FF` characters are written using `Put` |
| or `Put_Line`. There will be no `FF` character at the end of |
| the file unless an explicit `New_Page` operation was performed |
| before closing the file. |
| |
| A canonical Text_IO file that is a regular file (i.e., not a device or a |
| pipe) can be read using any of the routines in Text_IO. The |
| semantics in this case will be exactly as defined in the Ada Reference |
| Manual, and all the routines in Text_IO are fully implemented. |
| |
| A text file that does not meet the requirements for a canonical Text_IO |
| file has one or more of the following properties: |
| |
| * |
| The file contains `FF` characters not immediately following a |
| `LF` character. |
| |
| * |
| The file contains `LF` or `FF` characters written by |
| `Put` or `Put_Line`, which are not logically considered to be |
| line marks or page marks. |
| |
| * |
| The file ends in a character other than `LF` or `FF`, |
| i.e., there is no explicit line mark or page mark at the end of the file. |
| |
| Text_IO can be used to read such non-standard text files, but subprograms |
| that deal with line or page numbers do not have defined meanings. In |
| particular, a `FF` character that does not follow a `LF` |
| character may or may not be treated as a page mark from the point of |
| view of page and line numbering. Every `LF` character is considered |
| to end a line, and there is an implied `LF` character at the end of |
| the file. |
| |
| .. _Stream_Pointer_Positioning: |
| |
| Stream Pointer Positioning |
| -------------------------- |
| |
| `Ada.Text_IO` has a definition of current position for a file that |
| is being read. No internal buffering occurs in Text_IO, and usually the |
| physical position in the stream used to implement the file corresponds |
| to this logical position defined by Text_IO. There are two exceptions: |
| |
| * |
| After a call to `End_Of_Page` that returns `True`, the stream |
| is positioned past the `LF` (line mark) that precedes the page |
| mark. Text_IO maintains an internal flag so that subsequent read |
| operations properly handle the logical position which is unchanged by |
| the `End_Of_Page` call. |
| |
| * |
| After a call to `End_Of_File` that returns `True`, if the |
| Text_IO file was positioned before the line mark at the end of file |
| before the call, then the logical position is unchanged, but the stream |
| is physically positioned right at the end of file (past the line mark, |
| and past a possible page mark following the line mark). Again Text_IO |
| maintains internal flags so that subsequent read operations properly |
| handle the logical position. |
| |
| These discrepancies have no effect on the observable behavior of |
| Text_IO, but if a single Ada stream is shared between a C program and an |
| Ada program, or shared (using ``shared=yes`` in the form string) |
| between two Ada files, then the difference may be observable in some |
| situations. |
| |
| .. _Reading_and_Writing_Non-Regular_Files: |
| |
| Reading and Writing Non-Regular Files |
| ------------------------------------- |
| |
| A non-regular file is a device (such as a keyboard), or a pipe. Text_IO |
| can be used for reading and writing. Writing is not affected and the |
| sequence of characters output is identical to the normal file case, but |
| for reading, the behavior of Text_IO is modified to avoid undesirable |
| look-ahead as follows: |
| |
| An input file that is not a regular file is considered to have no page |
| marks. Any `Ascii.FF` characters (the character normally used for a |
| page mark) appearing in the file are considered to be data |
| characters. In particular: |
| |
| * |
| `Get_Line` and `Skip_Line` do not test for a page mark |
| following a line mark. If a page mark appears, it will be treated as a |
| data character. This avoids the need to wait for an extra character to |
| be typed or entered from the pipe to complete one of these operations. |
| |
| * |
| `End_Of_Page` always returns `False`. |
| |
| * |
| `End_Of_File` will return `False` if there is a page mark at |
| the end of the file. |
| |
| Output to non-regular files is the same as for regular files. Page marks |
| may be written to non-regular files using `New_Page`, but as noted |
| above they will not be treated as page marks on input if the output is |
| piped to another Ada program. |
| |
| Another important discrepancy when reading non-regular files is that the end |
| of file indication is not 'sticky'. If an end of file is entered, e.g., by |
| pressing the :kbd:`EOT` key, then end of file is signaled once (i.e., the |
| test `End_Of_File` will yield `True`, or a read will raise `End_Error`), but |
| reading can then resume to read data past that end of file indication, until |
| another end of file indication is entered. |
| |
| .. _Get_Immediate: |
| |
| Get_Immediate |
| ------------- |
| |
| .. index:: Get_Immediate |
| |
| Get_Immediate returns the next character (including control characters) |
| from the input file. In particular, Get_Immediate will return LF or FF |
| characters used as line marks or page marks. Such operations leave the |
| file positioned past the control character, and it is thus not treated |
| as having its normal function. This means that page, line and column |
| counts after this kind of Get_Immediate call are set as though the mark |
| did not occur. In the case where a Get_Immediate leaves the file |
| positioned between the line mark and page mark (which is not normally |
| possible), it is undefined whether the FF character will be treated as a |
| page mark. |
| |
| .. _Treating_Text_IO_Files_as_Streams: |
| |
| Treating Text_IO Files as Streams |
| --------------------------------- |
| |
| .. index:: Stream files |
| |
| The package `Text_IO.Text_Streams` allows a Text_IO file to be treated |
| as a stream. Data written to a Text_IO file in this stream mode is |
| binary data. If this binary data contains bytes 16#0A# (`LF`) or |
| 16#0C# (`FF`), the resulting file may have non-standard |
| format. Similarly if read operations are used to read from a Text_IO |
| file treated as a stream, then `LF` and `FF` characters may be |
| skipped and the effect is similar to that described above for |
| `Get_Immediate`. |
| |
| .. _Text_IO_Extensions: |
| |
| Text_IO Extensions |
| ------------------ |
| |
| .. index:: Text_IO extensions |
| |
| A package GNAT.IO_Aux in the GNAT library provides some useful extensions |
| to the standard `Text_IO` package: |
| |
| * function File_Exists (Name : String) return Boolean; |
| Determines if a file of the given name exists. |
| |
| * function Get_Line return String; |
| Reads a string from the standard input file. The value returned is exactly |
| the length of the line that was read. |
| |
| * function Get_Line (File : Ada.Text_IO.File_Type) return String; |
| Similar, except that the parameter File specifies the file from which |
| the string is to be read. |
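| |
| A brief usage sketch of these extensions follows. The file name |
| ``io_aux_demo.txt`` and the procedure name are assumptions made for the |
| example. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; use Ada.Text_IO; |
|    with GNAT.IO_Aux; |
| |
|    procedure IO_Aux_Demo is |
|       Name : constant String := "io_aux_demo.txt"; |
|       F    : File_Type; |
|    begin |
|       --  Create the file only if it is not already present. |
|       if not GNAT.IO_Aux.File_Exists (Name) then |
|          Create (F, Out_File, Name); |
|          Put_Line (F, "first line"); |
|          Close (F); |
|       end if; |
| |
|       Open (F, In_File, Name); |
|       --  The returned string has exactly the length of the line read. |
|       Put_Line (GNAT.IO_Aux.Get_Line (F)); |
|       Close (F); |
|    end IO_Aux_Demo; |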
| |
| |
| .. _Text_IO_Facilities_for_Unbounded_Strings: |
| |
| Text_IO Facilities for Unbounded Strings |
| ---------------------------------------- |
| |
| .. index:: Text_IO for unbounded strings |
| |
| .. index:: Unbounded_String, Text_IO operations |
| |
| The package `Ada.Strings.Unbounded.Text_IO` in library files |
| :file:`a-suteio.ads` and :file:`a-suteio.adb` contains some GNAT-specific |
| subprograms useful for Text_IO operations on unbounded strings: |
| |
| |
| * function Get_Line (File : File_Type) return Unbounded_String; |
| Reads a line from the specified file |
| and returns the result as an unbounded string. |
| |
| * procedure Put (File : File_Type; U : Unbounded_String); |
| Writes the value of the given unbounded string to the specified file. |
| Similar to the effect of |
| `Put (To_String (U))` except that an extra copy is avoided. |
| |
| * procedure Put_Line (File : File_Type; U : Unbounded_String); |
| Writes the value of the given unbounded string to the specified file, |
| followed by a `New_Line`. |
| Similar to the effect of `Put_Line (To_String (U))` except |
| that an extra copy is avoided. |
| |
| In the above subprograms, `File` is of type `Ada.Text_IO.File_Type` |
| and is optional. If the parameter is omitted, then the standard input or |
| output file is referenced as appropriate. |
| |
| The package `Ada.Strings.Wide_Unbounded.Wide_Text_IO` in library |
| files :file:`a-swuwti.ads` and :file:`a-swuwti.adb` provides similar extended |
| `Wide_Text_IO` functionality for unbounded wide strings. |
| |
| The package `Ada.Strings.Wide_Wide_Unbounded.Wide_Wide_Text_IO` in library |
| files :file:`a-szuzti.ads` and :file:`a-szuzti.adb` provides similar extended |
| `Wide_Wide_Text_IO` functionality for unbounded wide wide strings. |
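| |
| As an illustrative sketch of the unbounded string subprograms, the following |
| program copies standard input to standard output line by line, relying on the |
| forms in which the `File` parameter is omitted. The procedure name is |
| hypothetical. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; |
|    with Ada.Strings.Unbounded;         use Ada.Strings.Unbounded; |
|    with Ada.Strings.Unbounded.Text_IO; use Ada.Strings.Unbounded.Text_IO; |
| |
|    procedure Unbounded_Demo is |
|       Line : Unbounded_String; |
|    begin |
|       --  With File omitted, standard input and standard output are used. |
|       while not Ada.Text_IO.End_Of_File loop |
|          Line := Get_Line; |
|          Put_Line (Line); |
|       end loop; |
|    end Unbounded_Demo; |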
| |
| .. _Wide_Text_IO: |
| |
| Wide_Text_IO |
| ============ |
| |
| `Wide_Text_IO` is similar in most respects to Text_IO, except that |
| both input and output files may contain special sequences that represent |
| wide character values. The encoding scheme for a given file may be |
| specified using a FORM parameter: |
| |
| |
| :: |
| |
| WCEM=x |
| |
| |
| as part of the FORM string (WCEM = wide character encoding method), |
| where `x` is one of the following characters: |
| |
| ========== ==================== |
| Character  Encoding |
| ========== ==================== |
| *h*        Hex ESC encoding |
| *u*        Upper half encoding |
| *s*        Shift-JIS encoding |
| *e*        EUC encoding |
| *8*        UTF-8 encoding |
| *b*        Brackets encoding |
| ========== ==================== |
| |
| The encoding methods match those that |
| can be used in a source |
| program, but there is no requirement that the encoding method used for |
| the source program be the same as the encoding method used for files, |
| and different files may use different encoding methods. |
| |
| The default encoding method for the standard files, and for opened files |
| for which no WCEM parameter is given in the FORM string, matches the |
| wide character encoding specified for the main program (the default |
| being brackets encoding if no coding method was specified with ``-gnatW``). |
| |
| |
| |
| *Hex Coding* |
| In this encoding, a wide character is represented by a five character |
| sequence: |
| |
| |
| :: |
| |
| ESC a b c d |
| |
| .. |
| |
| where `a`, `b`, `c`, `d` are the four hexadecimal |
| characters (using upper case letters) of the wide character code. For |
| example, ESC A345 is used to represent the wide character with code |
| 16#A345#. This scheme is compatible with use of the full |
| `Wide_Character` set. |
| |
| |
| *Upper Half Coding* |
| The wide character with encoding 16#abcd#, where the upper bit is on |
| (i.e., a is in the range 8-F) is represented as two bytes 16#ab# and |
| 16#cd#. The second byte may never be a format control character, but is |
| not required to be in the upper half. This method can also be used for |
| shift-JIS or EUC where the internal coding matches the external coding. |
| |
| |
| *Shift JIS Coding* |
| A wide character is represented by a two character sequence 16#ab# and |
| 16#cd#, with the restrictions described above for upper half encoding. |
| The internal character code is the corresponding JIS |
| character according to the standard algorithm for Shift-JIS |
| conversion. Only characters defined in the JIS code set table can be |
| used with this encoding method. |
| |
| |
| *EUC Coding* |
| A wide character is represented by a two character sequence 16#ab# and |
| 16#cd#, with both characters being in the upper half. The internal |
| character code is the corresponding JIS character according to the EUC |
| encoding algorithm. Only characters defined in the JIS code set table |
| can be used with this encoding method. |
| |
| |
| *UTF-8 Coding* |
| A wide character is represented using |
| UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO |
| 10646-1/Am.2. Depending on the character value, the representation |
| is a one, two, or three byte sequence: |
| |
| |
| :: |
| |
| 16#0000#-16#007f#: 2#0xxxxxxx# |
| 16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx# |
| 16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx# |
| |
| .. |
| |
| where the `xxx` bits correspond to the left-padded bits of the |
| 16-bit character value. Note that all lower half ASCII characters |
| are represented as ASCII bytes and all upper half characters and |
| other wide characters are represented as sequences of upper-half |
| characters. (The full UTF-8 scheme allows for encoding 31-bit characters |
| as 6-byte sequences, but in this implementation, all UTF-8 sequences |
| of four or more bytes in length will raise a Constraint_Error, as |
| will all invalid UTF-8 sequences.) |
| |
| |
| *Brackets Coding* |
| In this encoding, a wide character is represented by the following eight |
| character sequence: |
| |
| |
| :: |
| |
| [ " a b c d " ] |
| |
| .. |
| |
| where `a`, `b`, `c`, `d` are the four hexadecimal |
| characters (using uppercase letters) of the wide character code. For |
| example, `["A345"]` is used to represent the wide character with code |
| `16#A345#`. |
| This scheme is compatible with use of the full Wide_Character set. |
| On input, brackets coding can also be used for upper half characters, |
| e.g., `["C1"]` for the character with code `16#C1#`. However, on output, |
| brackets notation is only used for wide characters with a code greater |
| than `16#FF#`. |
| |
| Note that brackets coding is not normally used in the context of |
| Wide_Text_IO or Wide_Wide_Text_IO, since it is really just designed as |
| a portable way of encoding source files. In the context of Wide_Text_IO |
| or Wide_Wide_Text_IO, it can only be used if the file does not contain |
| any instance of the left bracket character other than to encode wide |
| character values using the brackets encoding method. In practice it is |
| expected that some standard wide character encoding method such |
| as UTF-8 will be used for text input-output. |
| |
| If brackets notation is used, then any occurrence of a left bracket |
| in the input file which is not the start of a valid wide character |
| sequence will cause Constraint_Error to be raised. It is possible to |
| encode a left bracket as ["5B"] and Wide_Text_IO and Wide_Wide_Text_IO |
| input will interpret this as a left bracket. |
| |
| However, when a left bracket is output, it will be output as a left bracket |
| and not as ["5B"]. We make this decision because for normal use of |
| Wide_Text_IO for outputting messages, it is unpleasant to clobber left |
| brackets. For example, if we write: |
| |
| |
| .. code-block:: ada |
| |
| Put_Line ("Start of output [first run]"); |
| |
| |
| we really do not want to have the left bracket in this message clobbered so |
| that the output reads: |
| |
| |
| :: |
| |
| Start of output ["5B"]first run] |
| |
| .. |
| |
| In practice brackets encoding is reasonably useful for normal Put_Line use |
| since we won't get confused between left brackets and wide character |
| sequences in the output. But for input, or when files are written out |
| and read back in, it really makes better sense to use one of the standard |
| encoding methods such as UTF-8. |
| |
| |
| For the coding schemes other than UTF-8, Hex, or Brackets encoding, |
| not all wide character values can be represented. An attempt to output |
| a character that cannot be represented using the encoding scheme for |
| the file causes Constraint_Error to be raised. An invalid wide character |
| sequence on input also causes Constraint_Error to be raised. |
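| |
| To illustrate how an encoding method is selected for an individual file, the |
| following sketch requests UTF-8 through the FORM string. The file name |
| ``wcem_demo.txt`` and the procedure name are assumptions for the example. |
| |
| .. code-block:: ada |
| |
|    with Ada.Wide_Text_IO; use Ada.Wide_Text_IO; |
| |
|    procedure WCEM_Demo is |
|       F : File_Type; |
|    begin |
|       --  Request UTF-8 for this file regardless of the source encoding. |
|       Create (F, Out_File, "wcem_demo.txt", Form => "WCEM=8"); |
|       --  16#20AC# lies in 16#0800# .. 16#FFFF#, the three-byte range. |
|       Put (F, Wide_Character'Val (16#20AC#)); |
|       New_Line (F); |
|       Close (F); |
|    end WCEM_Demo; |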
| |
| .. _Stream_Pointer_Positioning_1: |
| |
| Stream Pointer Positioning |
| -------------------------- |
| |
| `Ada.Wide_Text_IO` is similar to `Ada.Text_IO` in its handling |
| of stream pointer positioning (:ref:`Text_IO`). There is one additional |
| case: |
| |
| If `Ada.Wide_Text_IO.Look_Ahead` reads a character outside the |
| normal lower ASCII set, i.e., a character in the range |
| |
| |
| .. code-block:: ada |
| |
| Wide_Character'Val (16#0080#) .. Wide_Character'Val (16#FFFF#) |
| |
| |
| then although the logical position of the file pointer is unchanged by |
| the `Look_Ahead` call, the stream is physically positioned past the |
| wide character sequence. Again this is to avoid the need for buffering |
| or backup, and all `Wide_Text_IO` routines check the internal |
| indication that this situation has occurred so that this is not visible |
| to a normal program using `Wide_Text_IO`. However, this discrepancy |
| can be observed if the wide text file shares a stream with another file. |
| |
| .. _Reading_and_Writing_Non-Regular_Files_1: |
| |
| Reading and Writing Non-Regular Files |
| ------------------------------------- |
| |
| As in the case of Text_IO, when a non-regular file is read, it is |
| assumed that the file contains no page marks (any form feed characters |
| are treated as data characters), and `End_Of_Page` always returns |
| `False`. Similarly, the end of file indication is not sticky, so |
| it is possible to read beyond an end of file. |
| |
| .. _Wide_Wide_Text_IO: |
| |
| Wide_Wide_Text_IO |
| ================= |
| |
| `Wide_Wide_Text_IO` is similar in most respects to Text_IO, except that |
| both input and output files may contain special sequences that represent |
| wide wide character values. The encoding scheme for a given file may be |
| specified using a FORM parameter: |
| |
| |
| :: |
| |
| WCEM=x |
| |
| |
| as part of the FORM string (WCEM = wide character encoding method), |
| where `x` is one of the following characters: |
| |
| ========== ==================== |
| Character  Encoding |
| ========== ==================== |
| *h*        Hex ESC encoding |
| *u*        Upper half encoding |
| *s*        Shift-JIS encoding |
| *e*        EUC encoding |
| *8*        UTF-8 encoding |
| *b*        Brackets encoding |
| ========== ==================== |
| |
| |
| The encoding methods match those that |
| can be used in a source |
| program, but there is no requirement that the encoding method used for |
| the source program be the same as the encoding method used for files, |
| and different files may use different encoding methods. |
| |
| The default encoding method for the standard files, and for opened files |
| for which no WCEM parameter is given in the FORM string, matches the |
| wide character encoding specified for the main program (the default |
| being brackets encoding if no coding method was specified with ``-gnatW``). |
| |
| |
| |
| *UTF-8 Coding* |
| A wide wide character is represented using |
| UCS Transformation Format 8 (UTF-8) as defined in Annex R of ISO |
| 10646-1/Am.2. Depending on the character value, the representation |
| is a one, two, three, or four byte sequence: |
| |
| |
| :: |
| |
| 16#000000#-16#00007f#: 2#0xxxxxxx# |
| 16#000080#-16#0007ff#: 2#110xxxxx# 2#10xxxxxx# |
| 16#000800#-16#00ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx# |
| 16#010000#-16#10ffff#: 2#11110xxx# 2#10xxxxxx# 2#10xxxxxx# 2#10xxxxxx# |
| |
| .. |
| |
| where the `xxx` bits correspond to the left-padded bits of the |
| 21-bit character value. Note that all lower half ASCII characters |
| are represented as ASCII bytes and all upper half characters and |
| other wide characters are represented as sequences of upper-half |
| characters. |
| |
| |
| *Brackets Coding* |
| In this encoding, a wide wide character is represented by the following eight |
| character sequence if it is in the wide character range: |
| |
| |
| :: |
| |
| [ " a b c d " ] |
| |
| .. |
| |
| and by the following ten character sequence if not: |
| |
| |
| :: |
| |
| [ " a b c d e f " ] |
| |
| .. |
| |
| where `a`, `b`, `c`, `d`, `e`, and `f` |
| are the four or six hexadecimal |
| characters (using uppercase letters) of the wide wide character code. For |
| example, `["01A345"]` is used to represent the wide wide character |
| with code `16#01A345#`. |
| |
| This scheme is compatible with use of the full Wide_Wide_Character set. |
| On input, brackets coding can also be used for upper half characters, |
| e.g., `["C1"]` for the character with code `16#C1#`. However, on output, |
| brackets notation is only used for wide characters with a code greater |
| than `16#FF#`. |
| |
| |
| It is also possible to use the other Wide_Character encoding methods, |
| such as Shift-JIS, but the other schemes cannot support the full range |
| of wide wide characters. |
| An attempt to output a character that cannot |
| be represented using the encoding scheme for the file causes |
| Constraint_Error to be raised. An invalid wide character sequence on |
| input also causes Constraint_Error to be raised. |
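| |
| The UTF-8 byte layout tabulated above can be made concrete with a small |
| encoder. This is only a sketch written for illustration, not the code used by |
| the GNAT run-time; it assumes an Ada 2012 compiler, and the names |
| `UTF8_Demo`, `Byte_Array`, `Encode`, and `Cont` are hypothetical. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; use Ada.Text_IO; |
|    with Interfaces;  use Interfaces; |
| |
|    procedure UTF8_Demo is |
|       type Byte_Array is array (Positive range <>) of Unsigned_8; |
| |
|       --  Encode a code in 0 .. 16#10FFFF# following the table above. |
|       function Encode (Code : Unsigned_32) return Byte_Array is |
|          --  Continuation byte carrying the six bits found at Shift. |
|          function Cont (Shift : Natural) return Unsigned_8 is |
|            (Unsigned_8 (16#80# or (Shift_Right (Code, Shift) and 16#3F#))); |
|       begin |
|          if Code <= 16#7F# then |
|             return (1 => Unsigned_8 (Code)); |
|          elsif Code <= 16#7FF# then |
|             return (Unsigned_8 (16#C0# or Shift_Right (Code, 6)), Cont (0)); |
|          elsif Code <= 16#FFFF# then |
|             return (Unsigned_8 (16#E0# or Shift_Right (Code, 12)), |
|                     Cont (6), Cont (0)); |
|          else |
|             return (Unsigned_8 (16#F0# or Shift_Right (Code, 18)), |
|                     Cont (12), Cont (6), Cont (0)); |
|          end if; |
|       end Encode; |
|    begin |
|       --  Print the decimal value of each byte of the encoding. |
|       for B of Encode (16#20AC#) loop |
|          Put (Unsigned_8'Image (B)); |
|       end loop; |
|       New_Line; |
|    end UTF8_Demo; |
| |
| For the code `16#20AC#` the three bytes are 226, 130, and 172, that is, |
| `16#E2#`, `16#82#`, and `16#AC#`. |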
| |
| .. _Stream_Pointer_Positioning_2: |
| |
| Stream Pointer Positioning |
| -------------------------- |
| |
| `Ada.Wide_Wide_Text_IO` is similar to `Ada.Text_IO` in its handling |
| of stream pointer positioning (:ref:`Text_IO`). There is one additional |
| case: |
| |
| If `Ada.Wide_Wide_Text_IO.Look_Ahead` reads a character outside the |
| normal lower ASCII set, i.e., a character in the range |
| |
| |
| .. code-block:: ada |
| |
| Wide_Wide_Character'Val (16#0080#) .. Wide_Wide_Character'Val (16#10FFFF#) |
| |
| |
| then although the logical position of the file pointer is unchanged by |
| the `Look_Ahead` call, the stream is physically positioned past the |
| wide character sequence. Again this is to avoid the need for buffering |
| or backup, and all `Wide_Wide_Text_IO` routines check the internal |
| indication that this situation has occurred so that this is not visible |
| to a normal program using `Wide_Wide_Text_IO`. However, this discrepancy |
| can be observed if the wide wide text file shares a stream with another file. |
| |
| .. _Reading_and_Writing_Non-Regular_Files_2: |
| |
| Reading and Writing Non-Regular Files |
| ------------------------------------- |
| |
| As in the case of Text_IO, when a non-regular file is read, it is |
| assumed that the file contains no page marks (any form feed characters |
| are treated as data characters), and `End_Of_Page` always returns |
| `False`. Similarly, the end of file indication is not sticky, so |
| it is possible to read beyond an end of file. |
| |
| .. _Stream_IO: |
| |
| Stream_IO |
| ========= |
| |
| A stream file is a sequence of bytes, where individual elements are |
| written to the file as described in the Ada Reference Manual. The type |
| `Stream_Element` is simply a byte. There are two ways to read or |
| write a stream file. |
| |
| * |
| The operations `Read` and `Write` directly read or write a |
| sequence of stream elements with no control information. |
| |
| * |
| The stream attributes applied to a stream file transfer data in the |
| manner described for stream attributes. |
| |
| .. _Text_Translation: |
| |
| Text Translation |
| ================ |
| |
| ``Text_Translation=xxx`` may be used as the Form parameter |
| passed to Text_IO.Create and Text_IO.Open. ``Text_Translation=xxx`` |
| has no effect on Unix systems. Possible values are: |
| |
| |
| * |
| ``Yes`` or ``Text`` is the default, which means to |
| translate LF to/from CR/LF on Windows systems. |
| |
| * |
| ``No`` disables this translation; i.e., it |
| uses binary mode. For output files, ``Text_Translation=No`` |
| may be used to create Unix-style files on |
| Windows. |
| |
| * |
| ``wtext`` enables translation in Unicode mode |
| (corresponds to ``_O_WTEXT``). |
| |
| * |
| ``u8text`` enables translation in Unicode UTF-8 mode |
| (corresponds to ``_O_U8TEXT``). |
| |
| * |
| ``u16text`` enables translation in Unicode UTF-16 mode |
| (corresponds to ``_O_U16TEXT``). |
| |
| |
| .. _Shared_Files: |
| |
| Shared Files |
| ============ |
| |
| Section A.14 of the Ada Reference Manual allows implementations to |
| provide a wide variety of behavior if an attempt is made to access the |
| same external file with two or more internal files. |
| |
| To provide a full range of functionality, while at the same time |
| minimizing the problems of portability caused by this implementation |
| dependence, GNAT handles file sharing as follows: |
| |
| * |
| In the absence of a ``shared=xxx`` form parameter, an attempt |
| to open two or more files with the same full name is considered an error |
| and is not supported. The exception `Use_Error` will be |
| raised. Note that a file that is not explicitly closed by the program |
| remains open until the program terminates. |
| |
| * |
| If the form parameter ``shared=no`` appears in the form string, the |
| file can be opened or created with its own separate stream identifier, |
| regardless of whether other files sharing the same external file are |
| opened. The exact effect depends on how the C stream routines handle |
| multiple accesses to the same external files using separate streams. |
| |
| * |
| If the form parameter ``shared=yes`` appears in the form string for |
| each of two or more files opened using the same full name, the same |
| stream is shared between these files, and the semantics are as described |
| in Ada Reference Manual, Section A.14. |
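| |
| The default rule can be observed with a short program. This is a |
| sketch under the assumptions stated in its comments: the procedure |
| name and the file name ``shared_demo.txt`` are hypothetical. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; use Ada.Text_IO; |
| |
|    procedure Shared_Demo is |
|       --  Hypothetical file name, used only for this illustration |
|       Name : constant String := "shared_demo.txt"; |
|       A, B : File_Type; |
|    begin |
|       Create (A, Out_File, Name); |
|       begin |
|          --  Neither file specifies shared=xxx, so according to the |
|          --  rules above this second open should raise Use_Error. |
|          Open (B, In_File, Name); |
|          Close (B); |
|       exception |
|          when Use_Error => |
|             Put_Line ("second open rejected with Use_Error"); |
|       end; |
|       Close (A); |
|    end Shared_Demo; |
| |
| Passing ``Form => "shared=yes"`` to both calls is the documented way to |
| make the second open legal. |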
| |
| When a program that opens multiple files with the same name is ported |
| from another Ada compiler to GNAT, the effect will be that |
| `Use_Error` is raised. |
| |
| The documentation of the original compiler and the documentation of the |
| program should then be examined to determine if file sharing was |
| expected, and ``shared=xxx`` parameters added to `Open` |
| and `Create` calls as required. |
| |
| When a program is ported from GNAT to some other Ada compiler, no |
| special attention is required unless the ``shared=xxx`` form |
| parameter is used in the program. In this case, you must examine the |
| documentation of the new compiler to see if it supports the required |
| file sharing semantics, and modify the form strings appropriately. Of |
| course, the program may not be portable at all if the |
| target compiler does not support the required functionality. The best |
| approach in writing portable code is to avoid file sharing (and hence |
| the use of the ``shared=xxx`` parameter in the form string) |
| completely. |
| |
| One common use of file sharing in Ada 83 is the use of instantiations of |
| Sequential_IO on the same file with different types, to achieve |
| heterogeneous input-output. Although this approach will work in GNAT if |
| ``shared=yes`` is specified, it is preferable in Ada to use Stream_IO |
| for this purpose (using the stream attributes). |
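| |
| A minimal sketch of heterogeneous input-output through Stream_IO |
| follows. A single file holds an `Integer`, a `Float` and a `String`, |
| and no file sharing is needed. The procedure name and the file name |
| ``mixed_demo.bin`` are hypothetical. |
| |
| .. code-block:: ada |
| |
|    with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO; |
|    with Ada.Text_IO; |
| |
|    procedure Heterogeneous_Demo is |
|       --  Hypothetical file name, used only for this illustration |
|       Name  : constant String := "mixed_demo.bin"; |
|       F     : File_Type; |
|       Items : Integer; |
|       Ratio : Float; |
|    begin |
|       Create (F, Out_File, Name); |
|       Integer'Write (Stream (F), 3); |
|       Float'Write (Stream (F), 0.5); |
|       String'Output (Stream (F), "label");  --  writes bounds, then data |
|       Close (F); |
| |
|       --  Values must be read back in the order they were written |
|       Open (F, In_File, Name); |
|       Integer'Read (Stream (F), Items); |
|       Float'Read (Stream (F), Ratio); |
|       declare |
|          Label : constant String := String'Input (Stream (F)); |
|       begin |
|          Ada.Text_IO.Put_Line |
|            (Label & Integer'Image (Items) & Float'Image (Ratio)); |
|       end; |
|       Close (F); |
|    end Heterogeneous_Demo; |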
| |
| .. _Filenames_encoding: |
| |
| Filenames encoding |
| ================== |
| |
| The form parameter ``encoding=xxx`` can be used to specify the |
| filename encoding. |
| |
| * |
| If the form parameter ``encoding=utf8`` appears in the form string, the |
| filename must be encoded in UTF-8. |
| |
| * |
| If the form parameter ``encoding=8bits`` appears in the form |
| string, the filename must be a standard 8-bit string. |
| |
| In the absence of an ``encoding=xxx`` form parameter, the |
| encoding is controlled by the ``GNAT_CODE_PAGE`` environment |
| variable; if it is not set, ``utf8`` is assumed. The possible values of |
| ``GNAT_CODE_PAGE`` are: |
| |
| *CP_ACP* |
| The current system Windows ANSI code page. |
| |
| *CP_UTF8* |
| UTF-8 encoding |
| |
| This encoding form parameter is only supported on the Windows |
| platform. On other operating systems, the run-time supports |
| UTF-8 natively. |
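| |
| The following sketch passes ``encoding=utf8`` together with a file name |
| whose non-ASCII character is spelled out as UTF-8 bytes. The procedure |
| name and the file name are hypothetical, and the example assumes that |
| the form parameter is accepted without effect on non-Windows platforms. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; use Ada.Text_IO; |
| |
|    procedure Utf8_Name_Demo is |
|       --  Hypothetical name: "caf", then e-acute encoded in UTF-8 as |
|       --  the two bytes 16#C3# 16#A9#, then ".txt" |
|       Name : constant String := |
|         "caf" & Character'Val (16#C3#) & Character'Val (16#A9#) & ".txt"; |
|       F    : File_Type; |
|    begin |
|       Create (F, Out_File, Name, Form => "encoding=utf8"); |
|       Put_Line (F, "created with a UTF-8 encoded file name"); |
|       Close (F); |
|    end Utf8_Name_Demo; |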
| |
| .. _File_content_encoding: |
| |
| File content encoding |
| ===================== |
| |
| For text files it is possible to specify the encoding to use. This is |
| controlled by the ``GNAT_CCS_ENCODING`` environment |
| variable; if it is not set, ``TEXT`` is assumed. |
| |
| The possible values are those supported on Windows: |
| |
| |
| |
| *TEXT* |
| Translated text mode |
| |
| *WTEXT* |
| Translated Unicode encoding |
| |
| *U16TEXT* |
| Unicode 16-bit encoding |
| |
| *U8TEXT* |
| Unicode 8-bit encoding |
| |
| This encoding is only supported on the Windows platform. |
| |
| .. _Open_Modes: |
| |
| Open Modes |
| ========== |
| |
| `Open` and `Create` calls result in a call to `fopen` |
| using the mode shown in the following table: |
| |
| +----------------------------+---------------+------------------+ |
| | `Open` and `Create` Call Modes                                | |
| +----------------------------+---------------+------------------+ |
| |                            | **OPEN**      | **CREATE**       | |
| +============================+===============+==================+ |
| | Append_File                | "r+"          | "w+"             | |
| +----------------------------+---------------+------------------+ |
| | In_File                    | "r"           | "w+"             | |
| +----------------------------+---------------+------------------+ |
| | Out_File (Direct_IO)       | "r+"          | "w"              | |
| +----------------------------+---------------+------------------+ |
| | Out_File (all other cases) | "w"           | "w"              | |
| +----------------------------+---------------+------------------+ |
| | Inout_File                 | "r+"          | "w+"             | |
| +----------------------------+---------------+------------------+ |
| |
| |
| If text file translation is required, then either ``b`` (binary) or |
| ``t`` (text) is added to the mode, depending on whether translation |
| is disabled or enabled (see the ``Text_Translation`` form parameter). |
| Text file translation refers to the mapping of CR/LF sequences in an |
| external file to LF characters internally. This mapping only occurs in |
| DOS and DOS-like systems, and is not relevant to other systems. |
| |
| A special case occurs with Stream_IO. As shown in the above table, the |
| file is initially opened in ``r`` or ``w`` mode for the |
| `In_File` and `Out_File` cases. If a `Set_Mode` operation |
| subsequently requires switching from reading to writing or vice versa, |
| then the file is reopened in ``r+`` mode to permit the required operation. |
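| |
| The following sketch exercises this special case. The procedure name |
| and the file name ``set_mode_demo.bin`` are hypothetical. The call to |
| `Set_Index` is needed because `Set_Mode` leaves the file position |
| unchanged unless the new mode is `Append_File`. |
| |
| .. code-block:: ada |
| |
|    with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO; |
|    with Ada.Text_IO; |
| |
|    procedure Set_Mode_Demo is |
|       F : File_Type; |
|       N : Integer; |
|    begin |
|       --  Hypothetical file name; opened for writing only at first |
|       Create (F, Out_File, "set_mode_demo.bin"); |
|       Integer'Write (Stream (F), 7); |
| |
|       --  Switching from writing to reading: according to the text |
|       --  above, the underlying stream is reopened in "r+" mode here. |
|       Set_Mode (F, In_File); |
|       Set_Index (F, 1);  --  go back to the first stream element |
|       Integer'Read (Stream (F), N); |
|       Close (F); |
| |
|       Ada.Text_IO.Put_Line ("Read back:" & Integer'Image (N)); |
|    end Set_Mode_Demo; |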
| |
| .. _Operations_on_C_Streams: |
| |
| Operations on C Streams |
| ======================= |
| |
| The package `Interfaces.C_Streams` provides an Ada program with direct |
| access to the C library functions for operations on C streams: |
| |
| |
| .. code-block:: ada |
| |
| package Interfaces.C_Streams is |
| -- Note: the reason we do not use the types that are in |
| -- Interfaces.C is that we want to avoid dragging in the |
| -- code in this unit if possible. |
| subtype chars is System.Address; |
| -- Pointer to null-terminated array of characters |
| subtype FILEs is System.Address; |
| -- Corresponds to the C type FILE* |
| subtype voids is System.Address; |
| -- Corresponds to the C type void* |
| subtype int is Integer; |
| subtype long is Long_Integer; |
| -- Note: the above types are subtypes deliberately, and it |
| -- is part of this spec that the above correspondences are |
| -- guaranteed. This means that it is legitimate to, for |
| -- example, use Integer instead of int. We provide these |
| -- synonyms for clarity, but in some cases it may be |
| -- convenient to use the underlying types (for example to |
| -- avoid an unnecessary dependency of a spec on the spec |
| -- of this unit). |
| type size_t is mod 2 ** Standard'Address_Size; |
| NULL_Stream : constant FILEs; |
| -- Value returned (NULL in C) to indicate an |
| -- fdopen/fopen/tmpfile error |
| ---------------------------------- |
| -- Constants Defined in stdio.h -- |
| ---------------------------------- |
| EOF : constant int; |
| -- Used by a number of routines to indicate error or |
| -- end of file |
| IOFBF : constant int; |
| IOLBF : constant int; |
| IONBF : constant int; |
| -- Used to indicate buffering mode for setvbuf call |
| SEEK_CUR : constant int; |
| SEEK_END : constant int; |
| SEEK_SET : constant int; |
| -- Used to indicate origin for fseek call |
| function stdin return FILEs; |
| function stdout return FILEs; |
| function stderr return FILEs; |
| -- Streams associated with standard files |
| -------------------------- |
| -- Standard C functions -- |
| -------------------------- |
| -- The functions selected below are ones that are |
| -- available in UNIX (but not necessarily in ANSI C). |
| -- These are very thin interfaces |
| -- which copy exactly the C headers. For more |
| -- documentation on these functions, see the Microsoft C |
| -- "Run-Time Library Reference" (Microsoft Press, 1990, |
| -- ISBN 1-55615-225-6), which includes useful information |
| -- on system compatibility. |
| procedure clearerr (stream : FILEs); |
| function fclose (stream : FILEs) return int; |
| function fdopen (handle : int; mode : chars) return FILEs; |
| function feof (stream : FILEs) return int; |
| function ferror (stream : FILEs) return int; |
| function fflush (stream : FILEs) return int; |
| function fgetc (stream : FILEs) return int; |
| function fgets (strng : chars; n : int; stream : FILEs) |
| return chars; |
| function fileno (stream : FILEs) return int; |
| function fopen (filename : chars; Mode : chars) |
| return FILEs; |
| -- Note: to maintain target independence, use |
| -- text_translation_required, a boolean variable defined in |
| -- a-sysdep.c to deal with the target dependent text |
| -- translation requirement. If this variable is set, |
| -- then b/t should be appended to the standard mode |
| -- argument to set the text translation mode off or on |
| -- as required. |
| function fputc (C : int; stream : FILEs) return int; |
| function fputs (Strng : chars; Stream : FILEs) return int; |
| function fread |
| (buffer : voids; |
| size : size_t; |
| count : size_t; |
| stream : FILEs) |
| return size_t; |
| function freopen |
| (filename : chars; |
| mode : chars; |
| stream : FILEs) |
| return FILEs; |
| function fseek |
| (stream : FILEs; |
| offset : long; |
| origin : int) |
| return int; |
| function ftell (stream : FILEs) return long; |
| function fwrite |
| (buffer : voids; |
| size : size_t; |
| count : size_t; |
| stream : FILEs) |
| return size_t; |
| function isatty (handle : int) return int; |
| procedure mktemp (template : chars); |
| -- The return value (which is just a pointer to template) |
| -- is discarded |
| procedure rewind (stream : FILEs); |
| function rmtmp return int; |
| function setvbuf |
| (stream : FILEs; |
| buffer : chars; |
| mode : int; |
| size : size_t) |
| return int; |
| |
| function tmpfile return FILEs; |
| function ungetc (c : int; stream : FILEs) return int; |
| function unlink (filename : chars) return int; |
| --------------------- |
| -- Extra functions -- |
| --------------------- |
| -- These functions supply slightly thicker bindings than |
| -- those above. They are derived from functions in the |
| -- C Run-Time Library, but may do a bit more work than |
| -- just directly calling one of the Library functions. |
| function is_regular_file (handle : int) return int; |
| -- Tests if given handle is for a regular file (result 1) |
| -- or for a non-regular file (pipe or device, result 0). |
| --------------------------------- |
| -- Control of Text/Binary Mode -- |
| --------------------------------- |
| -- If text_translation_required is true, then the following |
| -- functions may be used to dynamically switch a file from |
| -- binary to text mode or vice versa. These functions have |
| -- no effect if text_translation_required is false (i.e., in |
| -- normal UNIX mode). Use fileno to get a stream handle. |
| procedure set_binary_mode (handle : int); |
| procedure set_text_mode (handle : int); |
| ---------------------------- |
| -- Full Path Name support -- |
| ---------------------------- |
| procedure full_name (nam : chars; buffer : chars); |
| -- Given a NUL terminated string representing a file |
| -- name, returns in buffer a NUL terminated string |
| -- representing the full path name for the file name. |
| -- On systems where it is relevant the drive is also |
| -- part of the full path name. It is the responsibility |
| -- of the caller to pass an actual parameter for buffer |
| -- that is big enough for any full path name. Use |
| -- max_path_len given below as the size of buffer. |
| max_path_len : Integer; |
| -- Maximum length of an allowable full path name on the |
| -- system, including a terminating NUL character. |
| end Interfaces.C_Streams; |
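| |
| Because these are thin bindings, the caller must supply NUL-terminated |
| strings and check the C-style return values. The following is a minimal |
| sketch of direct use; the procedure name and the file name |
| ``c_stream_demo.txt`` are hypothetical. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; |
|    with Interfaces.C_Streams; use Interfaces.C_Streams; |
|    with System; |
| |
|    procedure C_Stream_Demo is |
|       use type System.Address;  --  makes "=" on FILEs visible |
| |
|       --  C expects NUL-terminated strings; the file name is hypothetical |
|       Name : constant String := "c_stream_demo.txt" & ASCII.NUL; |
|       Mode : constant String := "w" & ASCII.NUL; |
|       Text : constant String := |
|         "written via fputs" & ASCII.LF & ASCII.NUL; |
| |
|       Stream : FILEs; |
|    begin |
|       Stream := fopen (Name'Address, Mode'Address); |
|       if Stream = NULL_Stream then |
|          Ada.Text_IO.Put_Line ("fopen failed"); |
|          return; |
|       end if; |
| |
|       if fputs (Text'Address, Stream) = EOF then |
|          Ada.Text_IO.Put_Line ("fputs failed"); |
|       end if; |
| |
|       if fclose (Stream) /= 0 then |
|          Ada.Text_IO.Put_Line ("fclose failed"); |
|       end if; |
|    end C_Stream_Demo; |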
| |
| |
| .. _Interfacing_to_C_Streams: |
| |
| Interfacing to C Streams |
| ======================== |
| |
| The packages in this section permit interfacing Ada files to C Stream |
| operations. |
| |
| |
| .. code-block:: ada |
| |
| with Interfaces.C_Streams; |
| package Ada.Sequential_IO.C_Streams is |
| function C_Stream (F : File_Type) |
| return Interfaces.C_Streams.FILEs; |
| procedure Open |
| (File : in out File_Type; |
| Mode : in File_Mode; |
| C_Stream : in Interfaces.C_Streams.FILEs; |
| Form : in String := ""); |
| end Ada.Sequential_IO.C_Streams; |
| |
| with Interfaces.C_Streams; |
| package Ada.Direct_IO.C_Streams is |
| function C_Stream (F : File_Type) |
| return Interfaces.C_Streams.FILEs; |
| procedure Open |
| (File : in out File_Type; |
| Mode : in File_Mode; |
| C_Stream : in Interfaces.C_Streams.FILEs; |
| Form : in String := ""); |
| end Ada.Direct_IO.C_Streams; |
| |
| with Interfaces.C_Streams; |
| package Ada.Text_IO.C_Streams is |
| function C_Stream (F : File_Type) |
| return Interfaces.C_Streams.FILEs; |
| procedure Open |
| (File : in out File_Type; |
| Mode : in File_Mode; |
| C_Stream : in Interfaces.C_Streams.FILEs; |
| Form : in String := ""); |
| end Ada.Text_IO.C_Streams; |
| |
| with Interfaces.C_Streams; |
| package Ada.Wide_Text_IO.C_Streams is |
| function C_Stream (F : File_Type) |
| return Interfaces.C_Streams.FILEs; |
| procedure Open |
| (File : in out File_Type; |
| Mode : in File_Mode; |
| C_Stream : in Interfaces.C_Streams.FILEs; |
| Form : in String := ""); |
| end Ada.Wide_Text_IO.C_Streams; |
| |
| with Interfaces.C_Streams; |
| package Ada.Wide_Wide_Text_IO.C_Streams is |
| function C_Stream (F : File_Type) |
| return Interfaces.C_Streams.FILEs; |
| procedure Open |
| (File : in out File_Type; |
| Mode : in File_Mode; |
| C_Stream : in Interfaces.C_Streams.FILEs; |
| Form : in String := ""); |
| end Ada.Wide_Wide_Text_IO.C_Streams; |
| |
| with Interfaces.C_Streams; |
| package Ada.Stream_IO.C_Streams is |
| function C_Stream (F : File_Type) |
| return Interfaces.C_Streams.FILEs; |
| procedure Open |
| (File : in out File_Type; |
| Mode : in File_Mode; |
| C_Stream : in Interfaces.C_Streams.FILEs; |
| Form : in String := ""); |
| end Ada.Stream_IO.C_Streams; |
| |
| |
| In each of these six packages, the `C_Stream` function obtains the |
| `FILE` pointer from a currently opened Ada file. It is then |
| possible to use the `Interfaces.C_Streams` package to operate on |
| this stream, or the stream can be passed to a C program which can |
| operate on it directly. Of course the program is responsible for |
| ensuring that only appropriate sequences of operations are executed. |
| |
| One particular use of relevance to an Ada program is that the |
| `setvbuf` function can be used to control the buffering of the |
| stream used by an Ada file. In the absence of such a call the standard |
| default buffering is used. |
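| |
| For example, the following sketch requests unbuffered output for a |
| text file. The procedure name and the file name ``unbuffered_demo.txt`` |
| are hypothetical. As in C, `setvbuf` is called right after the file is |
| opened and before any data is transferred. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO;           use Ada.Text_IO; |
|    with Ada.Text_IO.C_Streams; |
|    with Interfaces.C_Streams; |
|    with System; |
| |
|    procedure Unbuffered_Demo is |
|       package ICS renames Interfaces.C_Streams; |
|       F : File_Type; |
|    begin |
|       Create (F, Out_File, "unbuffered_demo.txt");  --  hypothetical name |
| |
|       --  Ask the C library for unbuffered output on the stream that |
|       --  underlies F; a nonzero result indicates failure. |
|       if ICS.setvbuf |
|            (Ada.Text_IO.C_Streams.C_Stream (F), |
|             System.Null_Address, |
|             ICS.IONBF, |
|             0) /= 0 |
|       then |
|          Put_Line ("setvbuf failed"); |
|       end if; |
| |
|       Put_Line (F, "written without C library buffering"); |
|       Close (F); |
|    end Unbuffered_Demo; |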
| |
| The `Open` procedures in these packages open a file given an |
| existing C stream instead of a file name. Typically this stream is |
| imported from a C program, allowing an Ada file to operate on an |
| existing C file. |
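| |
| The following sketch attaches an Ada text file to an existing stream. |
| To keep the example self-contained, the stream is obtained with |
| `fopen` from `Interfaces.C_Streams` rather than imported from C code. |
| The procedure name and the file name ``adopted_demo.txt`` are |
| hypothetical. |
| |
| .. code-block:: ada |
| |
|    with Ada.Text_IO; |
|    with Ada.Text_IO.C_Streams; |
|    with Interfaces.C_Streams; |
|    with System; |
| |
|    procedure Adopt_Stream_Demo is |
|       use type System.Address;  --  makes "=" on FILEs visible |
|       package ICS renames Interfaces.C_Streams; |
| |
|       --  NUL-terminated strings for the C side; hypothetical file name |
|       Name : constant String := "adopted_demo.txt" & ASCII.NUL; |
|       Mode : constant String := "w" & ASCII.NUL; |
| |
|       Stream : ICS.FILEs; |
|       F      : Ada.Text_IO.File_Type; |
|    begin |
|       --  Stand-in for a stream that would normally come from C code |
|       Stream := ICS.fopen (Name'Address, Mode'Address); |
|       if Stream = ICS.NULL_Stream then |
|          Ada.Text_IO.Put_Line ("fopen failed"); |
|          return; |
|       end if; |
| |
|       --  Associate the Ada file with the existing C stream |
|       Ada.Text_IO.C_Streams.Open (F, Ada.Text_IO.Out_File, Stream); |
|       Ada.Text_IO.Put_Line (F, "written through Ada.Text_IO"); |
|       Ada.Text_IO.Close (F); |
|    end Adopt_Stream_Demo; |
| |
| The mode passed to `Open` should be consistent with the mode in which |
| the C stream was opened; the package does not check this for you. |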
| |