bfd/doc/bfdsumm.texi - binutils-gdb - Git at Google

 @c This summary of BFD is shared by the BFD and LD docs.
 @c Copyright (C) 2012-2021 Free Software Foundation, Inc.

 When an object file is opened, BFD subroutines automatically determine
 the format of the input object file.  They then build a descriptor in
 memory with pointers to routines that will be used to access elements of
 the object file's data structures.

 As different information from the object files is required,
 BFD reads from different sections of the file and processes them.
 For example, a very common operation for the linker is processing symbol
 tables.  Each BFD back end provides a routine for converting
 between the object file's representation of symbols and an internal
 canonical format. When the linker asks for the symbol table of an object
 file, it calls through a memory pointer to the routine from the
 relevant BFD back end which reads and converts the table into a canonical
 form.  The linker then operates upon the canonical form. When the link is
 finished and the linker writes the output file's symbol table,
 another BFD back end routine is called to take the newly
 created symbol table and convert it into the chosen output format.

 @menu
 * BFD information loss::	Information Loss
 * Canonical format::		The BFD	canonical object-file format
 @end menu

 @node BFD information loss
 @subsection Information Loss

 @emph{Information can be lost during output.} The output formats
 supported by BFD do not provide identical facilities, and
 information which can be described in one form has nowhere to go in
 another format. One example of this is alignment information in
 @code{b.out}. There is nowhere in an @code{a.out} format file to store
 alignment information on the contained data, so when a file is linked
 from @code{b.out} and an @code{a.out} image is produced, alignment
 information will not propagate to the output file. (The linker will
 still use the alignment information internally, so the link is performed
 correctly).

 Another example is COFF section names. COFF files may contain an
 unlimited number of sections, each one with a textual section name. If
 the target of the link is a format which does not have many sections (e.g.,
 @code{a.out}) or has sections without names (e.g., the Oasys format), the
 link cannot be done simply. You can circumvent this problem by
 describing the desired input-to-output section mapping with the linker command
 language.

 @emph{Information can be lost during canonicalization.} The BFD
 internal canonical form of the external formats is not exhaustive; there
 are structures in input formats for which there is no direct
 representation internally.  This means that the BFD back ends
 cannot maintain all possible data richness through the transformation
 between external to internal and back to external formats.

 This limitation is only a problem when an application reads one
 format and writes another.  Each BFD back end is responsible for
 maintaining as much data as possible, and the internal BFD
 canonical form has structures which are opaque to the BFD core,
 and exported only to the back ends. When a file is read in one format,
 the canonical form is generated for BFD and the application. At the
 same time, the back end saves away any information which may otherwise
 be lost. If the data is then written back in the same format, the back
 end routine will be able to use the canonical form provided by the
 BFD core as well as the information it prepared earlier.  Since
 there is a great deal of commonality between back ends,
 there is no information lost when
 linking or copying big endian COFF to little endian COFF, or @code{a.out} to
 @code{b.out}.  When a mixture of formats is linked, the information is
 only lost from the files whose format differs from the destination.

 @node Canonical format
 @subsection The BFD canonical object-file format

 The greatest potential for loss of information occurs when there is the least
 overlap between the information provided by the source format, that
 stored by the canonical format, and that needed by the
 destination format. A brief description of the canonical form may help
 you understand which kinds of data you can count on preserving across
 conversions.
 @cindex BFD canonical format
 @cindex internal object-file format

 @table @emph
 @item files
 Information stored on a per-file basis includes target machine
 architecture, particular implementation format type, a demand pageable
 bit, and a write protected bit.  Information like Unix magic numbers is
 not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
 file would have both the demand pageable bit and the write protected
 text bit set.  The byte order of the target is stored on a per-file
 basis, so that big- and little-endian object files may be used with one
 another.

 @item sections
 Each section in the input file contains the name of the section, the
 section's original address in the object file, size and alignment
 information, various flags, and pointers into other BFD data
 structures.

 @item symbols
 Each symbol contains a pointer to the information for the object file
 which originally defined it, its name, its value, and various flag
 bits.  When a BFD back end reads in a symbol table, it relocates all
 symbols to make them relative to the base of the section where they were
 defined.  Doing this ensures that each symbol points to its containing
 section.  Each symbol also has a varying amount of hidden private data
 for the BFD back end.  Since the symbol points to the original file, the
 private data format for that symbol is accessible.  @code{ld} can
 operate on a collection of symbols of wildly different formats without
 problems.

 Normal global and simple local symbols are maintained on output, so an
 output file (no matter its format) will retain symbols pointing to
 functions and to global, static, and common variables.  Some symbol
 information is not worth retaining; in @code{a.out}, type information is
 stored in the symbol table as long symbol names.  This information would
 be useless to most COFF debuggers; the linker has command-line switches
 to allow users to throw it away.

 There is one word of type information within the symbol, so if the
 format supports symbol type information within symbols (for example, COFF,
 Oasys) and the type is simple enough to fit within one word
 (nearly everything but aggregates), the information will be preserved.

 @item relocation level
 Each canonical BFD relocation record contains a pointer to the symbol to
 relocate to, the offset of the data to relocate, the section the data
 is in, and a pointer to a relocation type descriptor. Relocation is
 performed by passing messages through the relocation type
 descriptor and the symbol pointer. Therefore, relocations can be performed
 on output data using a relocation method that is only available in one of the
 input formats. For instance, Oasys provides a byte relocation format.
 A relocation record requesting this relocation type would point
 indirectly to a routine to perform this, so the relocation may be
 performed on a byte being written to a 68k COFF file, even though 68k COFF
 has no such relocation type.

 @item line numbers
 Object formats can contain, for debugging purposes, some form of mapping
 between symbols, source line numbers, and addresses in the output file.
 These addresses have to be relocated along with the symbol information.
 Each symbol with an associated list of line number records points to the
 first record of the list.  The head of a line number list consists of a
 pointer to the symbol, which allows finding out the address of the
 function whose line number is being described. The rest of the list is
 made up of pairs: offsets into the section and line numbers. Any format
 which can simply derive this information can pass it successfully
 between formats.
 @end table
	@c This summary of BFD is shared by the BFD and LD docs.
	@c Copyright (C) 2012-2021 Free Software Foundation, Inc.

	When an object file is opened, BFD subroutines automatically determine
	the format of the input object file. They then build a descriptor in
	memory with pointers to routines that will be used to access elements of
	the object file's data structures.

	As different information from the object files is required,
	BFD reads from different sections of the file and processes them.
	For example, a very common operation for the linker is processing symbol
	tables. Each BFD back end provides a routine for converting
	between the object file's representation of symbols and an internal
	canonical format. When the linker asks for the symbol table of an object
	file, it calls through a memory pointer to the routine from the
	relevant BFD back end which reads and converts the table into a canonical
	form. The linker then operates upon the canonical form. When the link is
	finished and the linker writes the output file's symbol table,
	another BFD back end routine is called to take the newly
	created symbol table and convert it into the chosen output format.

	@menu
	* BFD information loss:: Information Loss
	* Canonical format:: The BFD canonical object-file format
	@end menu

	@node BFD information loss
	@subsection Information Loss

	@emph{Information can be lost during output.} The output formats
	supported by BFD do not provide identical facilities, and
	information which can be described in one form has nowhere to go in
	another format. One example of this is alignment information in
	@code{b.out}. There is nowhere in an @code{a.out} format file to store
	alignment information on the contained data, so when a file is linked
	from @code{b.out} and an @code{a.out} image is produced, alignment
	information will not propagate to the output file. (The linker will
	still use the alignment information internally, so the link is performed
	correctly).

	Another example is COFF section names. COFF files may contain an
	unlimited number of sections, each one with a textual section name. If
	the target of the link is a format which does not have many sections (e.g.,
	@code{a.out}) or has sections without names (e.g., the Oasys format), the
	link cannot be done simply. You can circumvent this problem by
	describing the desired input-to-output section mapping with the linker command
	language.

	@emph{Information can be lost during canonicalization.} The BFD
	internal canonical form of the external formats is not exhaustive; there
	are structures in input formats for which there is no direct
	representation internally. This means that the BFD back ends
	cannot maintain all possible data richness through the transformation
	between external to internal and back to external formats.

	This limitation is only a problem when an application reads one
	format and writes another. Each BFD back end is responsible for
	maintaining as much data as possible, and the internal BFD
	canonical form has structures which are opaque to the BFD core,
	and exported only to the back ends. When a file is read in one format,
	the canonical form is generated for BFD and the application. At the
	same time, the back end saves away any information which may otherwise
	be lost. If the data is then written back in the same format, the back
	end routine will be able to use the canonical form provided by the
	BFD core as well as the information it prepared earlier. Since
	there is a great deal of commonality between back ends,
	there is no information lost when
	linking or copying big endian COFF to little endian COFF, or @code{a.out} to
	@code{b.out}. When a mixture of formats is linked, the information is
	only lost from the files whose format differs from the destination.

	@node Canonical format
	@subsection The BFD canonical object-file format

	The greatest potential for loss of information occurs when there is the least
	overlap between the information provided by the source format, that
	stored by the canonical format, and that needed by the
	destination format. A brief description of the canonical form may help
	you understand which kinds of data you can count on preserving across
	conversions.
	@cindex BFD canonical format
	@cindex internal object-file format

	@table @emph
	@item files
	Information stored on a per-file basis includes target machine
	architecture, particular implementation format type, a demand pageable
	bit, and a write protected bit. Information like Unix magic numbers is
	not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
	file would have both the demand pageable bit and the write protected
	text bit set. The byte order of the target is stored on a per-file
	basis, so that big- and little-endian object files may be used with one
	another.

	@item sections
	Each section in the input file contains the name of the section, the
	section's original address in the object file, size and alignment
	information, various flags, and pointers into other BFD data
	structures.

	@item symbols
	Each symbol contains a pointer to the information for the object file
	which originally defined it, its name, its value, and various flag
	bits. When a BFD back end reads in a symbol table, it relocates all
	symbols to make them relative to the base of the section where they were
	defined. Doing this ensures that each symbol points to its containing
	section. Each symbol also has a varying amount of hidden private data
	for the BFD back end. Since the symbol points to the original file, the
	private data format for that symbol is accessible. @code{ld} can
	operate on a collection of symbols of wildly different formats without
	problems.

	Normal global and simple local symbols are maintained on output, so an
	output file (no matter its format) will retain symbols pointing to
	functions and to global, static, and common variables. Some symbol
	information is not worth retaining; in @code{a.out}, type information is
	stored in the symbol table as long symbol names. This information would
	be useless to most COFF debuggers; the linker has command-line switches
	to allow users to throw it away.

	There is one word of type information within the symbol, so if the
	format supports symbol type information within symbols (for example, COFF,
	Oasys) and the type is simple enough to fit within one word
	(nearly everything but aggregates), the information will be preserved.

	@item relocation level
	Each canonical BFD relocation record contains a pointer to the symbol to
	relocate to, the offset of the data to relocate, the section the data
	is in, and a pointer to a relocation type descriptor. Relocation is
	performed by passing messages through the relocation type
	descriptor and the symbol pointer. Therefore, relocations can be performed
	on output data using a relocation method that is only available in one of the
	input formats. For instance, Oasys provides a byte relocation format.
	A relocation record requesting this relocation type would point
	indirectly to a routine to perform this, so the relocation may be
	performed on a byte being written to a 68k COFF file, even though 68k COFF
	has no such relocation type.

	@item line numbers
	Object formats can contain, for debugging purposes, some form of mapping
	between symbols, source line numbers, and addresses in the output file.
	These addresses have to be relocated along with the symbol information.
	Each symbol with an associated list of line number records points to the
	first record of the list. The head of a line number list consists of a
	pointer to the symbol, which allows finding out the address of the
	function whose line number is being described. The rest of the list is
	made up of pairs: offsets into the section and line numbers. Any format
	which can simply derive this information can pass it successfully
	between formats.
	@end table