| This directory contains a mechanism for GCC to have its own internal |
| implementation of wcwidth functionality. (cpp_wcwidth () in libcpp/charset.c). |
| |
| The idea is to produce the necessary lookup table |
| (../../libcpp/generated_cpp_wcwidth.h) in a reproducible way, starting from the |
| following files that are distributed by the Unicode Consortium: |
| |
| ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt |
| ftp://ftp.unicode.org/Public/UNIDATA/EastAsianWidth.txt |
| ftp://ftp.unicode.org/Public/UNIDATA/PropList.txt |
| |
| These three files have been added to source control in this directory; |
| please see unicode-license.txt for the relevant copyright information. |
| |
| In order to keep in sync with glibc's wcwidth as much as possible, it is |
| desirable for the logic that processes the Unicode data to be the same as |
| glibc's. To that end, we also put in this directory, in the from_glibc/ |
| directory, the glibc python code that implements their logic. This code was |
| copied verbatim from glibc, and it can be updated at any time from the glibc |
| source code repository. The files copied from that respository are: |
| |
| localedata/unicode-gen/unicode_utils.py |
| localedata/unicode-gen/utf8_gen.py |
| |
| And the most recent versions added to GCC are from glibc git commit: |
| f6032247061fb37d59565f2e9667e242c8a98e76 |
| |
| Finally, the script gen_wcwidth.py found here contains the GCC-specific code to |
| map glibc's output to the lookup tables we require. This script should not need |
| to change, unless there are structural changes to the Unicode data files or to |
| the glibc code. |
| |
| The procedure to update GCC's wcwidth tables is the following: |
| |
| 1. Update the three Unicode data files from the above URLs. |
| |
| 2. Update the two glibc files in from_glibc/ from glibc's git. Update |
| the commit number above in this README. |
| |
| 3. Run ./gen_wcwidth.py X.Y > ../../libcpp/generated_cpp_wcwidth.h |
| (where X.Y is the version of the Unicode standard corresponding to the |
| Unicode data files being used, most recently, 13.0.0). |
| |
| After that, GCC's wcwidth will match the most recent glibc. |