| Haifa scheduler (haifa-sched.c, loop.[ch], unroll.[ch], genattrtab.c): |
| (contact law@cygnus.com before starting any serious haifa work) |
| |
| * Fix all the formatting problems. Simple, mindless work. |
| |
| * Fix/add comments throughout the code. Many of the comments are from |
| the old scheduler and are out of date and misleading. Many new hunks |
| of code don't have sufficient comments and documentation. Those which |
| do have comments need to be rewritten to use complete sentences and |
| proper formatting. |
| |
| * Someone needs make one (or more) passes over the scheduler as a whole to |
| just clean it up. Try to move the machine dependent bits into the target |
| files where they belong, avoid re-creating functions where or near |
| equivalents already exist (ie is_conditional_branch and friends), etc etc. |
| |
| * Document the new scheduling options. Remove those options which are |
| not really useful (like reverse scheduling for example). In general |
| the haifa scheduler adds _way_ too many options. I'm definitely of the |
| opinion that gcc already has too many -foptions, and haifa doesn't help |
| that situation. |
| |
| * Testing and benchmarking. Haifa has received little testing inside |
| Cygnus -- it needs to be throughly tested on a wide variety of platforms |
| which benefit from instruction scheduling (sparc, alpha, pa, ppc, mips, x86, |
| i960, m88k, sh, etc). It needs to be benchmarked -- my tests showed |
| haifa was very much a hit or miss in terms of performance improvements. |
| |
| Some benchmarks ran significantly fasters, other significantly slower. |
| We need to work on making haifa generate better overall code. |
| |
| We need to have some kind of docs for how to best describe a machine to |
| the haifa scheduler to get good performance. Some existing ports have |
| been tuned to deal with the old scheduler -- they may need to be tuned |
| to generate good schedules with haifa. |
| |
| |
| |
| |
| ------------- |
| |
| The old PROJECTS file. Stuff I know has been done has been deleted. |
| Stuff in progress has a contact name associated with it. |
| has been |
| |
| 1. Better optimization. |
| |
| * Constants in unused inline functions |
| |
| It would be nice to delay output of string constants so that string |
| constants mentioned in unused inline functions are never generated. |
| Perhaps this would also take care of string constants in dead code. |
| |
| The difficulty is in finding a clean way for the RTL which refers |
| to the constant (currently, only by an assembler symbol name) |
| to point to the constant and cause it to be output. |
| |
| * More cse |
| |
| The techniques for doing full global cse are described in the red |
| dragon book, or (a different version) in Frederick Chow's thesis from |
| Stanford. It is likely to be slow and use a lot of memory, but it |
| might be worth offering as an additional option. Contact dje@cygnus.com |
| before doing any work on CSE. |
| |
| * Optimize a sequence of if statements whose conditions are exclusive. |
| |
| It is possible to optimize |
| |
| if (x == 1) ...; |
| if (x == 2) ...; |
| if (x == 3) ...; |
| |
| into |
| |
| if (x == 1) ...; |
| else if (x == 2) ...; |
| else if (x == 3) ...; |
| |
| provided that x is not altered by the contents of the if statements. |
| |
| It's not certain whether this is worth doing. Perhaps programmers |
| nearly always write the else's themselves, leaving few opportunities |
| to improve anything. |
| |
| * Un-cse. |
| |
| Perhaps we should have an un-cse step right after cse, which tries to |
| replace a reg with its value if the value can be substituted for the |
| reg everywhere, if that looks like an improvement. Which is if the |
| reg is used only a few times. Use rtx_cost to determine if the |
| change is really an improvement. |
| |
| * Clean up how cse works. |
| |
| The scheme is that each value has just one hash entry. The |
| first_same_value and next_same_value chains are no longer needed. |
| |
| For arithmetic, each hash table elt has the following slots: |
| |
| * Operation. This is an rtx code. |
| * Mode. |
| * Operands 0, 1 and 2. These point to other hash table elements. |
| |
| So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we |
| first enter (CONST_INT 104) and find the entry that (REG:SI 30) now |
| points to. Then we put these elts into operands 0 and 1 of a new elt. |
| We put PLUS and SI into the new elt. |
| |
| Registers and mem refs would never be entered into the table as such. |
| However, the values they contain would be entered. There would be a |
| table indexed by regno which points at the hash entry for the value in |
| that reg. |
| |
| The hash entry index now plays the role of a qty number. |
| We still need qty_first_reg, reg_next_eqv, etc. to record which regs |
| share a particular qty. |
| |
| When a reg is used whose contents are unknown, we need to create a |
| hash table entry whose contents say "unknown", as a place holder for |
| whatever the reg contains. If that reg is added to something, then |
| the hash entry for the sum will refer to the "unknown" entry. Use |
| UNKNOWN for the rtx code in this entry. This replaces make_new_qty. |
| |
| For a constant, a unique hash entry would be made based on the |
| value of the constant. |
| |
| What about MEM? Each time a memory address is referenced, we need a |
| qty (a hash table elt) to represent what is in it. (Just as for a |
| register.) If this isn't known, create one, just as for a reg whose |
| contents are unknown. |
| |
| We need a way to find all mem refs that still contain a certain value. |
| Do this with a chain of hash elts (for memory addresses) that point to |
| locations that hold the value. The hash elt for the value itself should |
| point to the start of the chain. It would be good for the hash elt |
| for an address to point to the hash elt for the contents of that address |
| (but this ptr can be null if the contents have never been entered). |
| |
| With this data structure, nothing need ever be invalidated except |
| the lists of which regs or mems hold a particular value. It is easy |
| to see if there is a reg or mem that is equiv to a particular value. |
| If the value is constant, it is always explicitly constant. |
| |
| * Support more general tail-recursion among different functions. |
| |
| This might be possible under certain circumstances, such as when |
| the argument lists of the functions have the same lengths. |
| Perhaps it could be done with a special declaration. |
| |
| You would need to verify in the calling function that it does not |
| use the addresses of any local variables and does not use setjmp. |
| |
| * Put short statics vars at low addresses and use short addressing mode? |
| |
| Useful on the 68000/68020 and perhaps on the 32000 series, |
| provided one has a linker that works with the feature. |
| This is said to make a 15% speedup on the 68000. |
| |
| * Keep global variables in registers. |
| |
| Here is a scheme for doing this. A global variable, or a local variable |
| whose address is taken, can be kept in a register for an entire function |
| if it does not use non-constant memory addresses and (for globals only) |
| does not call other functions. If the entire function does not meet |
| this criterion, a loop may. |
| |
| The VAR_DECL for such a variable would have to have two RTL expressions: |
| the true home in memory, and the pseudo-register used temporarily. |
| It is necessary to emit insns to copy the memory location into the |
| pseudo-register at the beginning of the function or loop, and perhaps |
| back out at the end. These insns should have REG_EQUIV notes so that, |
| if the pseudo-register does not get a hard register, it is spilled into |
| the memory location which exists in any case. |
| |
| The easiest way to set up these insns is to modify the routine |
| put_var_into_stack so that it does not apply to the entire function |
| (sparing any loops which contain nothing dangerous) and to call it at |
| the end of the function regardless of where in the function the |
| address of a local variable is taken. It would be called |
| unconditionally at the end of the function for all relevant global |
| variables. |
| |
| For debugger output, the thing to do is to invent a new binding level |
| around the appropriate loop and define the variable name as a register |
| variable with that scope. |
| |
| * Live-range splitting. |
| |
| Currently a variable is allocated a hard register either for the full |
| extent of its use or not at all. Sometimes it would be good to |
| allocate a variable a hard register for just part of a function; for |
| example, through a particular loop where the variable is mostly used, |
| or outside of a particular loop where the variable is not used. (The |
| latter is nice because it might let the variable be in a register most |
| of the time even though the loop needs all the registers.) |
| |
| Contact meissner@cygnus.com before starting any work on live range |
| splitting. |
| |
| * Detect dead stores into memory? |
| |
| A store into memory is dead if it is followed by another store into |
| the same location; and, in between, there is no reference to anything |
| that might be that location (including no reference to a variable |
| address). |
| |
| * Loop optimization. |
| |
| Strength reduction and iteration variable elimination could be |
| smarter. They should know how to decide which iteration variables are |
| not worth making explicit because they can be computed as part of an |
| address calculation. Based on this information, they should decide |
| when it is desirable to eliminate one iteration variable and create |
| another in its place. |
| |
| It should be possible to compute what the value of an iteration |
| variable will be at the end of the loop, and eliminate the variable |
| within the loop by computing that value at the loop end. |
| |
| When a loop has a simple increment that adds 1, |
| instead of jumping in after the increment, |
| decrement the loop count and jump to the increment. |
| This allows aob insns to be used. |
| |
| * Using constraints on values. |
| |
| Many operations could be simplified based on knowledge of the |
| minimum and maximum possible values of a register at any particular time. |
| These limits could come from the data types in the tree, via rtl generation, |
| or they can be deduced from operations that are performed. For example, |
| the result of an `and' operation one of whose operands is 7 must be in |
| the range 0 to 7. Compare instructions also tell something about the |
| possible values of the operand, in the code beyond the test. |
| |
| Value constraints can be used to determine the results of a further |
| comparison. They can also indicate that certain `and' operations are |
| redundant. Constraints might permit a decrement and branch |
| instruction that checks zeroness to be used when the user has |
| specified to exit if negative. |
| |
| * Smarter reload pass. |
| |
| The reload pass as currently written can reload values only into registers |
| that are reserved for reloading. This means that in order to use a |
| register for reloading it must spill everything out of that register. |
| |
| It would be straightforward, though complicated, for reload1.c to keep |
| track, during its scan, of which hard registers were available at each |
| point in the function, and use for reloading even registers that were |
| free only at the point they were needed. This would avoid much spilling |
| and make better code. |
| |
| * Change the type of a variable. |
| |
| Sometimes a variable is declared as `int', it is assigned only once |
| from a value of type `char', and then it is used only by comparison |
| against constants. On many machines, better code would result if |
| the variable had type `char'. If the compiler could detect this |
| case, it could change the declaration of the variable and change |
| all the places that use it. |
| |
| * Better handling for very sparse switches. |
| |
| There may be cases where it would be better to compile a switch |
| statement to use a fixed hash table rather than the current |
| combination of jump tables and binary search. |
| |
| * Order of subexpressions. |
| |
| It might be possible to make better code by paying attention |
| to the order in which to generate code for subexpressions of an expression. |
| |
| * More code motion. |
| |
| Consider hoisting common code up past conditional branches or |
| tablejumps. |
| |
| * Trace scheduling. |
| |
| This technique is said to be able to figure out which way a jump |
| will usually go, and rearrange the code to make that path the |
| faster one. |
| |
| * Distributive law. |
| |
| The C expression *(X + 4 * (Y + C)) compiles better on certain |
| machines if rewritten as *(X + 4*C + 4*Y) because of known addressing |
| modes. It may be tricky to determine when, and for which machines, to |
| use each alternative. |
| |
| Some work has been done on this, in combine.c. |
| |
| * Can optimize by changing if (x) y; else z; into z; if (x) y; |
| if z and x do not interfere and z has no effects not undone by y. |
| This is desirable if z is faster than jumping. |
| |
| * For a two-insn loop on the 68020, such as |
| foo: movb a2@+,a3@+ |
| jne foo |
| it is better to insert dbeq d0,foo before the jne. |
| d0 can be a junk register. The challenge is to fit this into |
| a portable framework: when can you detect this situation and |
| still be able to allocate a junk register? |
| |
| 2. Simpler porting. |
| |
| Right now, describing the target machine's instructions is done |
| cleanly, but describing its addressing mode is done with several |
| ad-hoc macro definitions. Porting would be much easier if there were |
| an RTL description for addressing modes like that for instructions. |
| Tools analogous to genflags and genrecog would generate macros from |
| this description. |
| |
| There would be one pattern in the address-description file for each |
| kind of addressing, and this pattern would have: |
| |
| * the RTL expression for the address |
| * C code to verify its validity (since that may depend on |
| the exact data). |
| * C code to print the address in assembler language. |
| * C code to convert the address into a valid one, if it is not valid. |
| (This would replace LEGITIMIZE_ADDRESS). |
| * Register constraints for all indeterminates that appear |
| in the RTL expression. |
| |
| 3. Other languages. |
| |
| Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are |
| desirable. |
| |
| Pascal, Modula-2 and Ada require the implementation of functions |
| within functions. Some of the mechanisms for this already exist. |
| |
| 4. More extensions. |
| |
| * Generated unique labels. Have some way of generating distinct labels |
| for use in extended asm statements. I don't know what a good syntax would |
| be. |
| |
| * A way of defining a structure containing a union, in which the choice of |
| union alternative is controlled by a previous structure component. |
| |
| Here is a possible syntax for this. |
| |
| struct foo { |
| enum { INT, DOUBLE } code; |
| auto union { case INT: int i; case DOUBLE: double d;} value : code; |
| }; |
| |
| * Allow constructor expressions as lvalues, like this: |
| |
| (struct foo) {a, b, c} = foo(); |
| |
| This would call foo, which returns a structure, and then store the |
| several components of the structure into the variables a, b, and c. |
| |
| 5. Generalize the machine model. |
| |
| * Some new compiler features may be needed to do a good job on machines |
| where static data needs to be addressed using base registers. |
| |
| * Some machines have two stacks in different areas of memory, one used |
| for scalars and another for large objects. The compiler does not |
| now have a way to understand this. |
| |
| 6. Useful warnings. |
| |
| * Warn about statements that are undefined because the order of |
| evaluation of increment operators makes a big difference. Here is an |
| example: |
| |
| *foo++ = hack (*foo); |
| |
| 7. Better documentation of how GCC works and how to port it. |
| |
| Here is an outline proposed by Allan Adler. |
| |
| I. Overview of this document |
| II. The machines on which GCC is implemented |
| A. Prose description of those characteristics of target machines and |
| their operating systems which are pertinent to the implementation |
| of GCC. |
| i. target machine characteristics |
| ii. comparison of this system of machine characteristics with |
| other systems of machine specification currently in use |
| B. Tables of the characteristics of the target machines on which |
| GCC is implemented. |
| C. A priori restrictions on the values of characteristics of target |
| machines, with special reference to those parts of the source code |
| which entail those restrictions |
| i. restrictions on individual characteristics |
| ii. restrictions involving relations between various characteristics |
| D. The use of GCC as a cross-compiler |
| i. cross-compilation to existing machines |
| ii. cross-compilation to non-existent machines |
| E. Assumptions which are made regarding the target machine |
| i. assumptions regarding the architecture of the target machine |
| ii. assumptions regarding the operating system of the target machine |
| iii. assumptions regarding software resident on the target machine |
| iv. where in the source code these assumptions are in effect made |
| III. A systematic approach to writing the files tm.h and xm.h |
| A. Macros which require special care or skill |
| B. Examples, with special reference to the underlying reasoning |
| IV. A systematic approach to writing the machine description file md |
| A. Minimal viable sets of insn descriptions |
| B. Examples, with special reference to the underlying reasoning |
| V. Uses of the file aux-output.c |
| VI. Specification of what constitutes correct performance of an |
| implementation of GCC |
| A. The components of GCC |
| B. The itinerary of a C program through GCC |
| C. A system of benchmark programs |
| D. What your RTL and assembler should look like with these benchmarks |
| E. Fine tuning for speed and size of compiled code |
| VII. A systematic procedure for debugging an implementation of GCC |
| A. Use of GDB |
| i. the macros in the file .gdbinit for GCC |
| ii. obstacles to the use of GDB |
| a. functions implemented as macros can't be called in GDB |
| B. Debugging without GDB |
| i. How to turn off the normal operation of GCC and access specific |
| parts of GCC |
| C. Debugging tools |
| D. Debugging the parser |
| i. how machine macros and insn definitions affect the parser |
| E. Debugging the recognizer |
| i. how machine macros and insn definitions affect the recognizer |
| |
| ditto for other components |
| |
| VIII. Data types used by GCC, with special reference to restrictions not |
| specified in the formal definition of the data type |
| IX. References to the literature for the algorithms used in GCC |
| |