Double-precision floating-point format

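Double-precision floating-point is a data type used by computers. Whereas a single-precision floating-point number occupies only 32 bits (4 bytes), a double-precision number uses 64 bits (8 bytes) to store one floating-point value^[1]. It carries 53 binary significant digits: 52 are stored explicitly and one leading bit is implicit. The magnitude of a normal nonzero value lies in $[2^{-1022},2^{1024})$ , and subnormal values extend the lower end down to $2^{-1074}$ .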
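The precision and range stated above can be checked with a short Python sketch. It assumes that Python's `float` is an IEEE 754 double, which holds on common platforms but is not guaranteed by the language itself.

```python
import sys

# Number of significant binary digits (52 stored + 1 implicit).
print(sys.float_info.mant_dig)            # 53

# 2**53 + 1 needs 54 significant bits, so it rounds back to 2**53.
print(2.0**53 + 1 == 2.0**53)             # True

# Smallest positive normal value and largest finite value.
print(sys.float_info.min == 2.0**-1022)   # True
print(sys.float_info.max)                 # 1.7976931348623157e+308
```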

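Format

Sign bit (1 bit): indicates whether the number is positive or negative
Exponent (11 bits): encodes the power of two
Mantissa (52 bits): determines the precision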
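As an illustration of the three fields, the following sketch splits a Python `float` into its sign, stored exponent, and mantissa. The helper name `split_fields` is hypothetical, and the code assumes Python's `float` is a 64-bit IEEE 754 double.

```python
import struct

def split_fields(x):
    """Return (sign, stored_exponent, mantissa) of a double as integers."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # top 1 bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits (still biased)
    mantissa = bits & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, mantissa

print(split_fields(1.0))    # (0, 1023, 0)
print(split_fields(-2.0))   # (1, 1024, 0)
```

The exponent returned here is the stored (biased) value, not the actual power of two.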

Sign

0 means the value is positive; 1 means the value is negative.

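Exponent

The exponent field has 11 bits and uses a biased representation (exponent bias). The bias is 1023, so a stored value of 1023 represents an actual exponent of 0. Two stored patterns are exceptions:

all 11 bits are 0
all 11 bits are 1

Excluding these two patterns, the stored values 1 through 2046 give an actual exponent range of −1022 to +1023.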

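The exponents 000₁₆ and 7ff₁₆ have special meanings:

00000000000₂ = 000₁₆: if the mantissa is 0 the value is ±0; if the mantissa is nonzero the value is a subnormal (denormalized) floating-point number.

11111111111₂ = 7ff₁₆: if the mantissa is 0 the value is ±∞; if the mantissa is nonzero the value is NaN.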
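The rules for the two reserved exponent patterns can be sketched as a small classifier over a 64-bit pattern. The function name `classify` and its return strings are illustrative choices, not part of any standard API.

```python
def classify(bits):
    """Classify a 64-bit double pattern by its exponent and mantissa fields."""
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    if exponent == 0x000:
        # All exponent bits 0: signed zero or a subnormal number.
        return "zero" if mantissa == 0 else "subnormal"
    if exponent == 0x7FF:
        # All exponent bits 1: signed infinity or NaN.
        return "infinity" if mantissa == 0 else "NaN"
    return "normal"

print(classify(0x8000000000000000))  # zero
print(classify(0x0000000000000001))  # subnormal
print(classify(0x7FF0000000000000))  # infinity
print(classify(0x7FF8000000000001))  # NaN
print(classify(0x3FF0000000000000))  # normal
```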

Mantissa

In binary scientific notation, a number is written as:

${\text{1.mantissa}}\times {\text{2}}^{\text{exponent}}$

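In binary scientific notation (a×2ⁿ), a satisfies 1 ≤ a < 2, so its leading digit is always 1 and does not need to be stored. For example (exponents are written in binary):

The binary number ${\text{11.101}}\times {\text{2}}^{\text{1001}}$ normalizes to ${\text{1.1101}}\times {\text{2}}^{\text{1010}}$ , so only 1101 needs to be stored as the mantissa.
The binary number ${\text{0.00110011}}\times {\text{2}}^{-1001}$ normalizes to ${\text{1.10011}}\times {\text{2}}^{-1100}$ , so only 10011 needs to be stored as the mantissa.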
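The two normalization examples can be reproduced with Python's built-in `float.hex()`, which prints a value in the normalized form 1.f × 2ⁿ with a hexadecimal fraction and a decimal exponent. This is a sketch that assumes Python's `float` is an IEEE 754 double.

```python
# 11.101 (binary) x 2^9: 0b11101 / 2**3 places the binary point.
a = 0b11101 / 2**3 * 2**9
print(a.hex())   # 0x1.d000000000000p+10  (0xd = 1101, exponent 10 = 1010 in binary)

# 0.00110011 (binary) x 2^-9: 0b110011 / 2**8 places the binary point.
b = 0b110011 / 2**8 * 2**-9
print(b.hex())   # 0x1.9800000000000p-12  (0x98 = 1001 1000, exponent -12 = -1100 in binary)
```

The hexadecimal fraction digits are the stored mantissa bits, padded on the right with zeros to 52 bits.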

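Summary

From the description above, the value represented by a normal double-precision floating-point number is given below, where exponent is the actual exponent, that is, the stored 11-bit field minus 1023:

$(-1)^{\text{sign}}\times 2^{\text{exponent}}\times 1.{\text{mantissa}}$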
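A minimal sketch of this formula in Python follows. The function name `decode` is hypothetical. The subnormal branch is not part of the formula above; it follows the exponent rules described earlier (no implicit leading 1, actual exponent fixed at −1022). Infinity and NaN are rejected rather than decoded.

```python
import math

def decode(bits):
    """Compute the value of a 64-bit double pattern from its three fields."""
    sign = bits >> 63
    e = (bits >> 52) & 0x7FF             # stored (biased) exponent
    m = bits & ((1 << 52) - 1)           # 52 mantissa bits
    if e == 0x7FF:
        raise ValueError("infinity and NaN are not covered by the formula")
    if e == 0:
        # Zero or subnormal: 0.mantissa x 2^-1022
        magnitude = math.ldexp(m, -1022 - 52)
    else:
        # Normal: 1.mantissa x 2^(e - 1023); the implicit 1 is bit 52.
        magnitude = math.ldexp((1 << 52) | m, e - 1023 - 52)
    return -magnitude if sign else magnitude

print(decode(0x3FF0000000000000))  # 1.0
print(decode(0xC000000000000000))  # -2.0
print(decode(0x4037000000000000))  # 23.0
```

Scaling the integer significand with `math.ldexp` keeps every step exact, because the significand fits in 53 bits.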

Examples

0 01111111111 0000000000000000000000000000000000000000000000000000₂ ≙ 3FF0 0000 0000 0000₁₆ ≙ +2⁰ × 1 = 1

0 01111111111 0000000000000000000000000000000000000000000000000001₂ ≙ 3FF0 0000 0000 0001₁₆ ≙ +2⁰ × (1 + 2⁻⁵²) ≈ 1.0000000000000002, the smallest number > 1

0 01111111111 0000000000000000000000000000000000000000000000000010₂ ≙ 3FF0 0000 0000 0002₁₆ ≙ +2⁰ × (1 + 2⁻⁵¹) ≈ 1.0000000000000004

0 10000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 4000 0000 0000 0000₁₆ ≙ +2¹ × 1 = 2

1 10000000000 0000000000000000000000000000000000000000000000000000₂ ≙ C000 0000 0000 0000₁₆ ≙ −2¹ × 1 = −2

0 10000000000 1000000000000000000000000000000000000000000000000000₂ ≙ 4008 0000 0000 0000₁₆ ≙ +2¹ × 1.1₂ = 11₂ = 3

0 10000000001 0000000000000000000000000000000000000000000000000000₂ ≙ 4010 0000 0000 0000₁₆ ≙ +2² × 1 = 100₂ = 4

0 10000000001 0100000000000000000000000000000000000000000000000000₂ ≙ 4014 0000 0000 0000₁₆ ≙ +2² × 1.01₂ = 101₂ = 5

0 10000000001 1000000000000000000000000000000000000000000000000000₂ ≙ 4018 0000 0000 0000₁₆ ≙ +2² × 1.1₂ = 110₂ = 6

0 10000000011 0111000000000000000000000000000000000000000000000000₂ ≙ 4037 0000 0000 0000₁₆ ≙ +2⁴ × 1.0111₂ = 10111₂ = 23

0 01111111000 1000000000000000000000000000000000000000000000000000₂ ≙ 3F88 0000 0000 0000₁₆ ≙ +2⁻⁷ × 1.1₂ = 0.00000011₂ = 0.01171875 (3/256)

0 00000000000 0000000000000000000000000000000000000000000000000001₂ ≙ 0000 0000 0000 0001₁₆ ≙ +2⁻¹⁰²² × 2⁻⁵² = 2⁻¹⁰⁷⁴
≈ 4.9406564584124654 × 10⁻³²⁴ (Min. subnormal positive double)

0 00000000000 1111111111111111111111111111111111111111111111111111₂ ≙ 000F FFFF FFFF FFFF₁₆ ≙ +2⁻¹⁰²² × (1 − 2⁻⁵²)
≈ 2.2250738585072009 × 10⁻³⁰⁸ (Max. subnormal double)

0 00000000001 0000000000000000000000000000000000000000000000000000₂ ≙ 0010 0000 0000 0000₁₆ ≙ +2⁻¹⁰²² × 1
≈ 2.2250738585072014 × 10⁻³⁰⁸ (Min. normal positive double)

0 11111111110 1111111111111111111111111111111111111111111111111111₂ ≙ 7FEF FFFF FFFF FFFF₁₆ ≙ +2¹⁰²³ × (1 + (1 − 2⁻⁵²))
≈ 1.7976931348623157 × 10³⁰⁸ (Max. Double)

0 00000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 0000 0000 0000 0000₁₆ ≙ +0

1 00000000000 0000000000000000000000000000000000000000000000000000₂ ≙ 8000 0000 0000 0000₁₆ ≙ −0

0 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ 7FF0 0000 0000 0000₁₆ ≙ +∞ (positive infinity)

1 11111111111 0000000000000000000000000000000000000000000000000000₂ ≙ FFF0 0000 0000 0000₁₆ ≙ −∞ (negative infinity)

0 11111111111 0000000000000000000000000000000000000000000000000001₂ ≙ 7FF0 0000 0000 0001₁₆ ≙ NaN (sNaN on most processors, such as x86 and ARM)

0 11111111111 1000000000000000000000000000000000000000000000000001₂ ≙ 7FF8 0000 0000 0001₁₆ ≙ NaN (qNaN on most processors, such as x86 and ARM)

0 11111111111 1111111111111111111111111111111111111111111111111111₂ ≙ 7FFF FFFF FFFF FFFF₁₆ ≙ NaN (an alternative encoding of NaN)

0 01111111101 0101010101010101010101010101010101010101010101010101₂
= 3fd5 5555 5555 5555₁₆ ≙ +2⁻² × (1 + 2⁻² + 2⁻⁴ + ... + 2⁻⁵²)
≈ ¹/₃

0 10000000000 1001001000011111101101010100010001000010110100011000₂
= 4009 21fb 5444 2d18₁₆ ≈ π
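Several of the hexadecimal patterns listed above can be checked by reinterpreting the bytes as a double. The helper `from_hex` is a hypothetical name, and the sketch assumes Python's `float` is an IEEE 754 double.

```python
import math
import struct

def from_hex(pattern):
    """Interpret 16 hex digits (spaces allowed) as a big-endian double."""
    return struct.unpack(">d", bytes.fromhex(pattern.replace(" ", "")))[0]

print(from_hex("3FF0 0000 0000 0001"))               # 1.0000000000000002
print(from_hex("4037 0000 0000 0000"))               # 23.0
print(from_hex("3F88 0000 0000 0000"))               # 0.01171875
print(from_hex("0000 0000 0000 0001"))               # 5e-324
print(from_hex("7FEF FFFF FFFF FFFF"))               # 1.7976931348623157e+308
print(from_hex("3FD5 5555 5555 5555") == 1 / 3)      # True
print(from_hex("4009 21FB 5444 2D18") == math.pi)    # True
print(math.isnan(from_hex("7FF8 0000 0000 0001")))   # True
```

Python prints the shortest decimal string that round-trips to the same double, which is why the minimum subnormal appears as 5e-324 rather than with all 17 significant digits.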

References

[1] Stanley B. Lippman, Josée Lajoie, Barbara E. Moo. C++ Primer, Fifth Edition (Chinese edition). 碁峰資訊. 2020: p. 33. ISBN 978-986-502-172-6.
