| ## fast_float number parsing library: 4x faster than strtod |
| |
| The fast_float library provides fast header-only implementations for the C++ from_chars |
| functions for `float` and `double` types. These functions convert ASCII strings representing |
| decimal values (e.g., `1.3e10`) into binary types. We provide exact rounding (including |
| round to even). In our experience, these `fast_float` functions are many times faster than comparable number-parsing functions from existing C++ standard libraries. |
| |
| Specifically, `fast_float` provides the following two functions with a C++17-like syntax (the library itself only requires C++11): |
| |
| ```C++ |
| from_chars_result from_chars(const char* first, const char* last, float& value, ...); |
| from_chars_result from_chars(const char* first, const char* last, double& value, ...); |
| ``` |
| |
| The return type (`from_chars_result`) is defined as the struct: |
| ```C++ |
| struct from_chars_result { |
|     const char* ptr; |
|     std::errc ec; |
| }; |
| ``` |
| |
| `from_chars` parses the character sequence [first,last) for a number. It expects floating-point numbers |
| in a locale-independent format equivalent to that of the C++17 from_chars function. |
| The resulting floating-point value is the closest representable value (of type float or double), |
| using the "round to even" convention for values that would otherwise fall exactly in between two values. |
| That is, we provide exact parsing according to the IEEE standard. |
| |
| |
| Given a successful parse, the pointer (`ptr`) in the returned value is set to point right after the |
| parsed number, and the `value` referenced is set to the parsed value. In case of error, the returned |
| `ec` contains a representative error; otherwise, the default (`std::errc()`) value is stored. |
| |
| The implementation does not throw and does not allocate memory (e.g., with `new` or `malloc`). |
| |
| It will parse infinity and nan values. |
| |
| Example: |
| |
| ```C++ |
| #include "fast_float/fast_float.h" |
| #include <cstdlib> |
| #include <iostream> |
| #include <string> |
| #include <system_error> |
| |
| int main() { |
|     const std::string input = "3.1416 xyz "; |
|     double result; |
|     // Parsing stops at the first character that cannot be part of the number. |
|     auto answer = fast_float::from_chars(input.data(), input.data() + input.size(), result); |
|     if (answer.ec != std::errc()) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; } |
|     std::cout << "parsed the number " << result << std::endl; |
|     return EXIT_SUCCESS; |
| } |
| ``` |
| |
| |
| Like the C++17 standard, the `fast_float::from_chars` functions take an optional last argument of |
| the type `fast_float::chars_format`. It is a bitset value: we check whether |
| `fmt & fast_float::chars_format::fixed` and `fmt & fast_float::chars_format::scientific` are set |
| to determine whether we allow the fixed point and scientific notation respectively. |
| The default is `fast_float::chars_format::general` which allows both `fixed` and `scientific`. |
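
To make the bitset check above concrete, here is a minimal sketch built on a stand-in enum. The names `demo_format`, `allows_fixed`, and `allows_scientific`, as well as the bit values, are assumptions chosen for illustration; they are not the library's own definitions.

```cpp
#include <cstdint>

// Hypothetical stand-in for fast_float::chars_format.
// The bit values below are illustrative only.
enum demo_format : std::uint8_t {
    demo_scientific = 1 << 0,
    demo_fixed = 1 << 1,
    demo_general = demo_fixed | demo_scientific
};

// True when the fixed-point bit is set in fmt.
inline bool allows_fixed(demo_format fmt) {
    return (fmt & demo_fixed) != 0;
}

// True when the scientific-notation bit is set in fmt.
inline bool allows_scientific(demo_format fmt) {
    return (fmt & demo_scientific) != 0;
}
```

With the library itself, the equivalent choice is made by passing, for example, `fast_float::chars_format::fixed` as the optional last argument of `fast_float::from_chars`.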
| |
| The library seeks to follow the C++17 specification (see [20.19.3](http://eel.is/c++draft/charconv.from.chars).(7.1)): |
| * The `from_chars` function does not skip leading white-space characters. |
| * [A leading `+` sign](https://en.cppreference.com/w/cpp/utility/from_chars) is forbidden. |
| * It is generally impossible to represent a decimal value exactly as a binary floating-point number (`float` and `double` types). We seek the nearest value. We round to an even mantissa when we are in-between two binary floating-point numbers. |
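
As an illustration of the round-to-even rule, consider the decimal string `9007199254740993` (that is, 2^53 + 1). It lies exactly halfway between two adjacent `double` values, 2^53 and 2^53 + 2, and the tie goes to 2^53, whose mantissa is even. The sketch below uses `std::strtod` from the standard library as a stand-in parser so that it compiles without any extra header. It assumes a correctly rounding `strtod` (as in glibc) and the default "C" locale. The helper names are hypothetical.

```cpp
#include <cstdlib>

// Stand-in parser based on std::strtod (an assumption for illustration;
// a correctly rounding parser such as fast_float::from_chars is expected
// to agree on these inputs).
inline double parse_with_strtod(const char* text) {
    return std::strtod(text, nullptr);
}

// 2^53 + 1 sits exactly between the doubles 2^53 and 2^53 + 2;
// the neighbor with the even mantissa is 2^53.
inline bool tie_rounds_down_to_even() {
    return parse_with_strtod("9007199254740993") == 9007199254740992.0;
}

// 2^53 + 3 sits exactly between 2^53 + 2 and 2^53 + 4;
// here the neighbor with the even mantissa is the larger one.
inline bool tie_rounds_up_to_even() {
    return parse_with_strtod("9007199254740995") == 9007199254740996.0;
}
```

The two cases together show that a tie is not resolved by truncation: it may go down or up, depending on which neighbor has the even mantissa.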
| |
| Furthermore, we have the following restrictions: |
| * We only support `float` and `double` types at this time. |
| * We only support the decimal format: we do not support hexadecimal strings. |
| * For values that are either very large or very small (e.g., `1e9999`), we represent them using the infinity or negative infinity value. |
| |
| We support Visual Studio, macOS, Linux, and FreeBSD. We support both big-endian and little-endian systems, 32-bit and 64-bit. |
| |
| We assume that the rounding mode is set to nearest (`std::fegetround() == FE_TONEAREST`). |
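
If your program changes the floating-point environment, you may want to verify this assumption before parsing. A minimal sketch using only the standard `<cfenv>` header follows; the helper name `rounding_mode_is_nearest` is ours, not part of the library.

```cpp
#include <cfenv>

// True when the current floating-point rounding mode is round-to-nearest,
// which is the mode the library assumes.
inline bool rounding_mode_is_nearest() {
    return std::fegetround() == FE_TONEAREST;
}
```

A caller could check `rounding_mode_is_nearest()` once at startup, or call `std::fesetround(FE_TONEAREST)` before parsing if another mode may be active.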
| |
| ## Using commas as decimal separator |
| |
| |
| The C++ standard stipulates that `from_chars` has to be locale-independent. In |
| particular, the decimal separator has to be the period (`.`). However, |
| some users still want to use the `fast_float` library in a locale-dependent |
| manner. Using a separate function called `from_chars_advanced`, we allow users |
| to pass a `parse_options` instance which contains a custom decimal separator (e.g., |
| the comma). You may use it as follows. |
| |
| ```C++ |
| #include "fast_float/fast_float.h" |
| #include <cstdlib> |
| #include <iostream> |
| #include <string> |
| #include <system_error> |
| |
| int main() { |
|     const std::string input = "3,1416 xyz "; |
|     double result; |
|     // Use the comma as the decimal separator instead of the period. |
|     fast_float::parse_options options{fast_float::chars_format::general, ','}; |
|     auto answer = fast_float::from_chars_advanced(input.data(), input.data() + input.size(), result, options); |
|     if ((answer.ec != std::errc()) || (result != 3.1416)) { std::cerr << "parsing failure\n"; return EXIT_FAILURE; } |
|     std::cout << "parsed the number " << result << std::endl; |
|     return EXIT_SUCCESS; |
| } |
| ``` |
| |
| You can parse delimited numbers: |
| ```C++ |
| const std::string input = "234532.3426362,7869234.9823,324562.645"; |
| double result; |
| auto answer = fast_float::from_chars(input.data(), input.data() + input.size(), result); |
| if (answer.ec != std::errc()) { |
|     // check error |
| } |
| // we have result == 234532.3426362. |
| if (answer.ptr[0] != ',') { |
|     // unexpected delimiter |
| } |
| answer = fast_float::from_chars(answer.ptr + 1, input.data() + input.size(), result); |
| if (answer.ec != std::errc()) { |
|     // check error |
| } |
| // we have result == 7869234.9823. |
| if (answer.ptr[0] != ',') { |
|     // unexpected delimiter |
| } |
| answer = fast_float::from_chars(answer.ptr + 1, input.data() + input.size(), result); |
| if (answer.ec != std::errc()) { |
|     // check error |
| } |
| // we have result == 324562.645. |
| ``` |
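
The same steps can be folded into a loop. The sketch below is one possible way to do it, not part of the library. The function name `parse_delimited` is hypothetical, and the function is written generically over the parsing callable so that it does not depend on any particular header. It only assumes that `parse(first, last, value)` returns an object with `ptr` and `ec` members, as `from_chars` does.

```cpp
#include <system_error>
#include <vector>

// Parses numbers separated by `delimiter` in [first, last) and appends them
// to `out`. `parse` is any callable shaped like from_chars:
// parse(first, last, value) returns an object with members `ptr` and `ec`.
// Returns true only if the whole range was consumed without error.
template <typename Parser>
bool parse_delimited(const char* first, const char* last, char delimiter,
                     std::vector<double>& out, Parser parse) {
    while (first != last) {
        double value;
        auto answer = parse(first, last, value);
        if (answer.ec != std::errc()) { return false; }  // not a number
        out.push_back(value);
        first = answer.ptr;
        if (first == last) { return true; }              // consumed everything
        if (*first != delimiter) { return false; }       // unexpected delimiter
        ++first;                                         // skip the delimiter
    }
    return false;  // empty input or trailing delimiter
}
```

With the library, the callable could be a lambda such as `[](const char* f, const char* l, double& v) { return fast_float::from_chars(f, l, v); }`. Rejecting a trailing delimiter is a design choice of this sketch; adjust the final `return` if your format allows one.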
| |
| ## Reference |
| |
| - Daniel Lemire, [Number Parsing at a Gigabyte per Second](https://arxiv.org/abs/2101.11408), Software: Practice and Experience 51 (8), 2021. |
| |
| ## Other programming languages |
| |
| - [There is an R binding](https://github.com/eddelbuettel/rcppfastfloat) called `rcppfastfloat`. |
| - [There is a Rust port of the fast_float library](https://github.com/aldanor/fast-float-rust/) called `fast-float-rust`. |
| - [There is a Java port of the fast_float library](https://github.com/wrandelshofer/FastDoubleParser) called `FastDoubleParser`. It is used in important systems such as [Jackson](https://github.com/FasterXML/jackson-core). |
| - [There is a C# port of the fast_float library](https://github.com/CarlVerret/csFastFloat) called `csFastFloat`. |
| |
| |
| ## Relation With Other Work |
| |
| The fast_float library is part of GCC (as of version 12): the `from_chars` function in GCC relies on fast_float. |
| |
| The fast_float algorithm is part of the [LLVM standard libraries](https://github.com/llvm/llvm-project/commit/87c016078ad72c46505461e4ff8bfa04819fe7ba). |
| |
| The fast_float library provides performance similar to that of the [fast_double_parser](https://github.com/lemire/fast_double_parser) library, but it uses an updated algorithm reworked from the ground up and offers an API more in line with the expectations of C++ programmers. The fast_double_parser library is part of the [Microsoft LightGBM machine-learning framework](https://github.com/microsoft/LightGBM). |
| |
| There is a [derived implementation that is part of AdaCore](https://github.com/AdaCore/VSS). |
| |
| ## Users |
| |
| The fast_float library is used by [Apache Arrow](https://github.com/apache/arrow/pull/8494), where it made number parsing two to three times faster. It is also used by [Yandex ClickHouse](https://github.com/ClickHouse/ClickHouse) and by [Google Jsonnet](https://github.com/google/jsonnet). |
| |
| |
| ## How fast is it? |
| |
| It can parse random floating-point numbers at a speed of 1 GB/s on some systems. We find that it is often twice as fast as the best available competitor, and many times faster than many standard-library implementations. |
| |
| <img src="http://lemire.me/blog/wp-content/uploads/2020/11/fastfloat_speed.png" width="400"> |
| |
| ``` |
| $ ./build/benchmarks/benchmark |
| # parsing random integers in the range [0,1) |
| volume = 2.09808 MB |
| netlib : 271.18 MB/s (+/- 1.2 %) 12.93 Mfloat/s |
| doubleconversion : 225.35 MB/s (+/- 1.2 %) 10.74 Mfloat/s |
| strtod : 190.94 MB/s (+/- 1.6 %) 9.10 Mfloat/s |
| abseil : 430.45 MB/s (+/- 2.2 %) 20.52 Mfloat/s |
| fastfloat : 1042.38 MB/s (+/- 9.9 %) 49.68 Mfloat/s |
| ``` |
| |
| See https://github.com/lemire/simple_fastfloat_benchmark for our benchmarking code. |
| |
| |
| ## Video |
| |
| [Watch the video on YouTube](http://www.youtube.com/watch?v=AVXgvlMeIm4) |
| |
| ## Using as a CMake dependency |
| |
| This library is header-only by design. The CMake file provides the `fast_float` target |
| which is merely a pointer to the `include` directory. |
| |
| If you drop the `fast_float` repository in your CMake project, you should be able to use |
| it in this manner: |
| |
| ```cmake |
| add_subdirectory(fast_float) |
| target_link_libraries(myprogram PUBLIC fast_float) |
| ``` |
| |
| Or you may want to retrieve the dependency automatically if you have a sufficiently recent version of CMake (`FetchContent` requires 3.11 or later; `FetchContent_MakeAvailable` requires 3.14 or later): |
| |
| ```cmake |
| include(FetchContent) |
| |
| FetchContent_Declare( |
|   fast_float |
|   GIT_REPOSITORY https://github.com/lemire/fast_float.git |
|   GIT_TAG tags/v1.1.2 |
|   GIT_SHALLOW TRUE) |
| |
| FetchContent_MakeAvailable(fast_float) |
| target_link_libraries(myprogram PUBLIC fast_float) |
| ``` |
| |
| You should change the `GIT_TAG` line so that you recover the version you wish to use. |
| |
| ## Using as single header |
| |
| The script `script/amalgamate.py` may be used to generate a single header |
| version of the library if so desired. |
| Just run the script from the root directory of this repository. |
| You can customize the license type and output file if desired as described in |
| the command line help. |
| |
| You may directly download automatically generated single-header files: |
| |
| https://github.com/fastfloat/fast_float/releases/download/v3.4.0/fast_float.h |
| |
| ## Credit |
| |
| Though this work is inspired by many different people, it benefited especially from exchanges with |
| Michael Eisel, who motivated the original research with his key insights, and with Nigel Tao who provided |
| invaluable feedback. Rémy Oudompheng first implemented a fast path we use in the case of long digits. |
| |
| The library includes code adapted from Google Wuffs (written by Nigel Tao) which was originally published |
| under the Apache 2.0 license. |
| |
| ## License |
| |
| <sup> |
| Licensed under either of <a href="LICENSE-APACHE">Apache License, Version |
| 2.0</a> or <a href="LICENSE-MIT">MIT license</a> at your option. |
| </sup> |
| |
| <br> |
| |
| <sub> |
| Unless you explicitly state otherwise, any contribution intentionally submitted |
| for inclusion in this repository by you, as defined in the Apache-2.0 license, |
| shall be dual licensed as above, without any additional terms or conditions. |
| </sub> |