What Is a Negative Float?
A floating-point number is one in which the position of the decimal point is not fixed but can float. The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) has been the most widely used floating-point standard since the 1980s and is implemented by many CPUs and floating-point units. The standard defines the formats for representing floating-point numbers (including negative zero, −0, and denormal numbers), some special values (infinity and not-a-number, NaN), and the floating-point operations on these values; it also specifies four rounding rules and five exceptions (including when and how each exception occurs).
- In the development of computer systems, several methods have been proposed for representing real numbers, but by far the most widely used is floating-point representation. Compared with fixed-point numbers, floating-point numbers use an exponent to let the position of the decimal point float as needed, so that a wider range of real numbers can be expressed flexibly. Floating-point representation is a form of scientific notation for real numbers [1].
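- A floating-point number (Value) can be expressed like this: Value = (−1)^sign × fraction × 2^exponent
- That is, the actual value of a floating-point number is given by the sign bit, times a power of two set by the exponent (which is stored offset by an exponent bias), times the fraction. The sign bit is 0 for positive values and 1 for negative values.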
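The sign, exponent, and fraction described above can be read directly out of the bits of a number. The following is a minimal sketch, assuming the 64-bit double-precision format (1 sign bit, 11 exponent bits with a bias of 1023, 52 fraction bits); `decode_double` and `rebuild_normal` are hypothetical helper names, and the rebuild step only covers normalized numbers.

```python
import struct

def decode_double(x: float):
    """Split a 64-bit double into its sign, exponent field, and fraction field."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                         # 1 bit
    exponent_field = (bits >> 52) & 0x7FF     # 11 bits, stored with a bias of 1023
    fraction_field = bits & ((1 << 52) - 1)   # 52 bits
    return sign, exponent_field, fraction_field

def rebuild_normal(sign: int, exponent_field: int, fraction_field: int) -> float:
    """Recompute the value of a normalized double from its three fields."""
    significand = 1 + fraction_field / 2**52  # implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent_field - 1023)

sign, exp_field, frac_field = decode_double(-6.5)   # -6.5 = -1.625 × 2^2
print(sign, exp_field - 1023, hex(frac_field))      # 1 2 0xa000000000000
print(rebuild_normal(sign, exp_field, frac_field))  # -6.5
```

The only difference between -6.5 and 6.5 in this encoding is the single sign bit; the exponent and fraction fields are identical.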
- The following is the IEEE 754 description of the floating-point format.
- Bit convention
- The W bits of a datum are numbered from 0 to W−1, from the low end to the high end of the memory address. The bit at the low end is usually written on the far right and is called the least significant bit (LSB): it is the bit of least weight, the one whose change affects the overall value the least. This needs stating because the x86 architecture stores data in little-endian order. A decimal integer N is written as N₁₀ when necessary, to distinguish it from a binary number.
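As a small illustration of the byte-order remark, the sketch below packs the same double in both byte orders using Python's standard `struct` module. The value -1.0 is an arbitrary choice for the example.

```python
import struct

value = -1.0  # bit pattern 0xBFF0000000000000

big = struct.pack(">d", value)     # most significant byte first
little = struct.pack("<d", value)  # least significant byte first, as stored on x86

print(big.hex())     # bff0000000000000
print(little.hex())  # 000000000000f0bf

# In little-endian order the sign bit sits in the last byte (highest address).
sign_bit = little[-1] >> 7
print(sign_bit)      # 1
```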
- Floating-point numbers can essentially be compared lexicographically, in the order of sign bit, exponent field, and mantissa field. All positive numbers are greater than all negative numbers; when two numbers have the same sign, the one with the larger exponent field has the larger magnitude, so it is the larger value if both are positive and the smaller value if both are negative.
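The lexicographic comparison can be sketched by turning each double into an integer key. This is an illustrative sketch, not part of the standard: `bits_of` and `sort_key` are hypothetical names, NaN is not handled, and the key places −0.0 just below 0.0 even though the two compare equal as floating-point values.

```python
import struct

def bits_of(x: float) -> int:
    """Return the 64-bit pattern of a double as an unsigned integer."""
    (b,) = struct.unpack(">Q", struct.pack(">d", x))
    return b

def sort_key(x: float) -> int:
    """Map a non-NaN double to an integer whose order matches the float order."""
    b = bits_of(x)
    if b >> 63:
        # Negative: flip every bit, so a larger magnitude gives a smaller key.
        return b ^ 0xFFFFFFFFFFFFFFFF
    # Non-negative: set the top bit, so the key sorts above every negative key.
    return b | (1 << 63)

values = [3.5, -0.25, -1e300, 0.0, 1e-310, -2.0, float("inf"), float("-inf")]
print(sorted(values, key=sort_key) == sorted(values))  # True

# Among negative numbers the raw bit pattern grows as the value shrinks.
print(bits_of(-2.0) > bits_of(-1.0))  # True
```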
- Rounding of floating-point numbers
- The result of an operation on significands is usually held in a longer register. When the result is put back into floating-point format, the extra bits must be discarded. There are several ways to perform this rounding; the IEEE standard lists four:
- Round to nearest, ties to even (the default rounding mode): the result is rounded to the nearest representable value; when two representable values are equally near, the even one (the one whose binary representation ends in 0) is chosen.
- Round toward +∞: rounds the result toward positive infinity.
- Round toward −∞: rounds the result toward negative infinity.
- Round toward 0: rounds the result toward zero.
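The four rounding rules can be imitated in software for a single conversion. Python does not expose the hardware rounding mode, so the sketch below is an emulation under stated assumptions: Python 3.9 or later (for `math.nextafter`), CPython's correctly rounded integer division, and no overflow to infinity. `round_to_double` and `MODES` are hypothetical names.

```python
import math
from fractions import Fraction

MODES = ("nearest_even", "toward_positive", "toward_negative", "toward_zero")

def round_to_double(x: Fraction, mode: str) -> float:
    """Round an exact rational x to a double under one of the four modes."""
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode: {mode}")
    nearest = float(x)         # rounds to nearest, ties to even
    exact = Fraction(nearest)  # exact value of that double
    if exact == x or mode == "nearest_even":
        return nearest
    # Find the two neighbouring doubles that bracket x.
    if exact < x:
        lower, upper = nearest, math.nextafter(nearest, math.inf)
    else:
        lower, upper = math.nextafter(nearest, -math.inf), nearest
    if mode == "toward_positive":
        return upper
    if mode == "toward_negative":
        return lower
    return lower if x > 0 else upper  # toward_zero

x = Fraction(-1, 10)  # -0.1 has no exact binary representation
for mode in MODES:
    print(mode, repr(round_to_double(x, mode)))

# Ties to even: 2**53 + 1 and 2**53 + 3 both lie halfway between two doubles.
print(float(2**53 + 1) == 2.0**53)      # True
print(float(2**53 + 3) == 2.0**53 + 4)  # True
```

For a negative input such as −1/10, rounding toward zero gives the same result as rounding toward +∞, while for a positive input it matches rounding toward −∞.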
- Operations and functions on floating-point numbers
- The following functions must be provided:
- Add, subtract, multiply, divide. Negative zero is equal to zero in addition and subtraction: −0.0 = 0.0.
- Square root. The square root of negative zero is negative zero: sqrt(−0.0) = −0.0.
- Round to the nearest integer, round(x). If x lies exactly halfway between two adjacent integers, it is rounded to the even one.
- Comparison. −Inf < negative normalized numbers < negative denormalized numbers < −0.0 = 0.0 < positive denormalized numbers < positive normalized numbers < Inf.
- Special comparisons: −Inf = −Inf and Inf = Inf; comparing NaN for equality or order with any floating-point number (including itself) yields false, that is, (NaN = x) = false.
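The required operations listed above can be checked from Python, whose `float` is an IEEE 754 double on common platforms (an assumption of this sketch). The variable names are illustrative only.

```python
import math
import sys

nan, inf = float("nan"), float("inf")

# Negative zero compares equal to zero, but its sign is still stored.
print(-0.0 == 0.0)               # True
print(math.copysign(1.0, -0.0))  # -1.0

# The square root of negative zero is negative zero.
print(math.sqrt(-0.0))           # -0.0

# Python's round() sends halfway cases to the even integer.
print([round(v) for v in (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)])  # [-2, -2, 0, 0, 2, 2]

# The comparison chain, from -Inf up to Inf.
smallest_normal = sys.float_info.min  # smallest positive normalized double
smallest_denormal = 5e-324            # smallest positive denormalized double
chain = [-inf, -smallest_normal, -smallest_denormal, -0.0, 0.0,
         smallest_denormal, smallest_normal, inf]
print(chain == sorted(chain))    # True

# NaN is neither equal to, less than, nor greater than anything.
print(nan == nan, nan < 1.0, nan > 1.0, nan != nan)  # False False False True
```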
- Suggested functions and predicates
- copysign(x, y): returns a value made up of the magnitude of x and the sign of y, so abs(x) equals copysign(x, 1.0). copysign operates correctly on NaN, one of the few functions that can treat NaN like an ordinary operand. C99 adds the copysign function.
- −x: flips the sign of x. When x is ±0 or NaN, the result may differ from 0 − x.
- scalb(y, N): computes y × 2^N (N an integer) without computing 2^N itself. The corresponding C99 function is scalbn.
- logb(x): computes n in x = 1.a × 2^n (x ≠ 0, a ∈ [0, 1)). C99 adds the logb and ilogb functions.
- nextafter(x, y): returns the representable floating-point number next to x in the direction of y. For example, nextafter(0, 1) gives the smallest representable positive number. C99 adds the nextafter function.
- finite(x): determines whether x is finite, that is, −Inf < x < Inf. C99 adds the isfinite function.
- isnan(x): determines whether x is a NaN, which is equivalent to x ≠ x. C99 adds the isnan function.
- x <> y: true only if x < y or x > y. Note that this differs from x ≠ y, which means NOT (x = y) and is also true when x or y is a NaN.
- unordered(x, y): true when x and y cannot be compared, for example when x or y is a NaN. The corresponding C99 function is isunordered.
- class(x): reports which class of floating-point number x belongs to: signaling NaN, quiet NaN, −Inf, negative normalized number, negative denormalized number, −0.0, 0.0, positive denormalized number, positive normalized number, Inf.
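Most of the functions and predicates in this list have close counterparts in Python's `math` module, which makes them easy to try on negative inputs. The sketch below assumes Python 3.9 or later (for `math.nextafter`). The helpers `logb`, `less_or_greater`, `unordered`, and `classify` are hypothetical stand-ins written for this example, and `classify` cannot tell a signaling NaN from a quiet one.

```python
import math
import sys

nan, inf = float("nan"), float("inf")

def logb(x: float) -> int:
    """Exponent n in x = ±1.a × 2^n, for finite nonzero x."""
    _, e = math.frexp(x)  # frexp gives x = m × 2^e with 0.5 <= |m| < 1
    return e - 1

def less_or_greater(x: float, y: float) -> bool:
    """The x <> y predicate: false whenever either operand is a NaN."""
    return x < y or x > y

def unordered(x: float, y: float) -> bool:
    return math.isnan(x) or math.isnan(y)

def classify(x: float) -> str:
    if math.isnan(x):
        return "NaN"
    sign = "negative" if math.copysign(1.0, x) < 0 else "positive"
    if math.isinf(x):
        kind = "infinity"
    elif x == 0:
        kind = "zero"
    elif abs(x) < sys.float_info.min:
        kind = "denormalized"
    else:
        kind = "normalized"
    return f"{sign} {kind}"

print(math.copysign(3.0, -0.0))   # -3.0: the sign is taken from negative zero
zero = 0.0
print(math.copysign(1.0, -zero), math.copysign(1.0, 0.0 - zero))  # -1.0 1.0
print(math.ldexp(-3.0, 4))        # -48.0, i.e. -3 × 2^4 (the scalb operation)
print(logb(-48.0))                # 5, since -48 = -1.5 × 2^5
print(math.nextafter(0.0, -1.0))  # -5e-324, the negative number closest to zero
print(math.isfinite(-inf), math.isnan(nan), nan != nan)  # False True True
print(less_or_greater(nan, 1.0), nan != 1.0)             # False True
print(unordered(nan, 1.0), unordered(-1.0, 1.0))         # True False
print(classify(-5e-324), "|", classify(-0.0), "|", classify(-inf))
```

The second line of output shows why negation is listed separately from subtraction: negating 0.0 yields −0.0, whereas 0.0 − 0.0 yields +0.0 under the default rounding mode.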