From 05373d2adb2f3024a2ccbec9b249ea477407e0af Mon Sep 17 00:00:00 2001
From: David Rotermund <54365609+davrot@users.noreply.github.com>
Date: Fri, 1 Dec 2023 15:04:45 +0100
Subject: [PATCH] Update README.md

Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
---
 matlab/2/README.md | 71 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)

diff --git a/matlab/2/README.md b/matlab/2/README.md
index 7cd0511..31991c8 100644
--- a/matlab/2/README.md
+++ b/matlab/2/README.md
@@ -45,3 +45,74 @@ Two further examples for 8-bit numbers:
 10101010 $=0 \star 2^0+1 \star 2^1+0 \star 2^2+1 \star 2^3 +0 \star 2^4+1 \star 2^5+0 \star 2^6+1 \star 2^7 = 170$

 To represent negative numbers, a small trick is necessary: one specific bit codes for the sign of the number. In an $n$-bit number, one would for example reserve the $n$-th bit for the sign, and the remaining $n-1$ bits code, as usual, a binary number $z$: If the $n$-th bit is not set, the result is $+z$. If the $n$-th bit is set, the result is $-(2^{n-1})+z$. For illustration, again a small table:
+
+|dual system| decimal system|
+| ------------- |:-------------:|
+|01111111| +127|
+|01111110| +126|
+|$\vdots$| $\vdots$|
+|00000010| +2|
+|00000001| +1|
+|00000000| +0|
+|11111111| -1|
+|11111110| -2|
+|$\vdots$| $\vdots$|
+|10000010| -126|
+|10000001| -127|
+|10000000| -128|
+
+Certain bit lengths have names:
+
+|name| size|
+| ------------- |:-------------:|
+|1 Byte| 8 Bit|
+|1 Word| 16 Bit|
+|1 Kilobyte| 1024 Byte|
+|1 Megabyte| 1024 Kilobyte|
+|1 Gigabyte| 1024 Megabyte|
+|1 Terabyte| 1024 Gigabyte|
+
+## Representation of Real Numbers and Numerical Errors
+Real numbers are, by their nature, analogue quantities. Hence we cannot expect the handling of these numbers on digital computers to be completely problem-free. Present-day digital computers usually represent real numbers as floating-point numbers:
+
+$\mbox{floating-point number} = \mbox{mantissa} \cdot \mbox{base}^{\mbox{exponent}} $
+
+The precision with which a real number can be represented is determined by the number of available bits. "Single precision" requires 4 Bytes, "double precision" requires 8 Bytes. The latter is the default in Matlab. The IEEE format for double precision uses 53 bits for the mantissa (one sign bit plus 52 fraction bits) and 11 bits for the exponent; the base is 2. The exponent is stored with an offset (bias) of 1023 rather than with a separate sign bit, and for normalized numbers it ranges from $-1022$ to $+1023$. In the IEEE notation, the mantissa always represents a value in the interval $[1, 2)$. The 52 fraction bits are used to add up negative powers of 2. The value of the mantissa is mantissa $=1+\sum_{i=1}^{52} b_i 2^{-i}$, with $b_i=1$ if the $i$-th bit of the mantissa is set and $b_i=0$ otherwise.
+
+## Range Error
+The maximal range of floating-point numbers is determined by the number of bits used to code the exponent. A typical range for single precision is
+
+$2^{\pm 127} \approx 10^{\pm 38}$
+
+and for double precision
+
+$2^{\pm 1023} \approx 10^{\pm 308} $
+
+Arithmetic operations on these numbers can exceed this range. The resulting error is called a range error. As an example, we consider the Bohr radius in SI units
+
+$a_0 = \frac{4\pi\varepsilon_0\hbar^2}{m_ee^2}\approx 5.3\times 10^{-11}\,\mbox{m} $
+
+The quantity $\hbar$ is Planck's quantum of action divided by $2\pi$. The Bohr radius lies within the range of single precision floating-point numbers. However, the same does not hold for the numerator $4\pi\varepsilon_0\hbar^2 \approx 1.24\times 10^{-78}\,\mbox{kg}\,\mbox{C}^2\,\mbox{m}$ and the denominator $m_ee^2 \approx 2.34\times 10^{-68}\,\mbox{kg}\,\mbox{C}^2$, i.e. neither the numerator nor the denominator can be represented as a single precision floating-point number.
+Hence, the calculation of the Bohr radius by the given formula can be problematic. A simple solution to this problem is to use natural units, such as the Bohr radius for distances.
+
+An even bigger problem can be illustrated by the calculation of the factorial. The factorial is defined as
+
+$n! = n\cdot(n-1)\cdot(n-2)\cdot\ldots\cdot 3\cdot 2\cdot 1 $
+
+In Matlab, it can easily be verified with the function factorial(n) that the factorial for $n>170$ cannot be represented, even with double precision numbers. A way out is provided by the use of logarithms, since the logarithm of a big number is still moderately small, e.g. $\log_{10}(10^{100}) = 100$. It follows that
+
+$\ln(n!) = \ln(n) + \ln(n-1) + \ldots + \ln(3) + \ln(2) + \ln(1) $
+
+For bigger $n$, the evaluation of this expression is, however, too laborious. If $n>30$, one is advised to use Stirling's formula
+
+$\ln(n!) = n\ln(n)-n+\frac{1}{2}\ln(2\pi n)+\ln\left(1+\frac{1}{12n}+\frac{1}{288n^2}+\ldots\right) $
+
+The factorial $n!$ can then be written as
+
+$n! = \mbox{mantissa}\times 10^{\mbox{exponent}} $
+
+To get the mantissa and the exponent, we take the logarithm to base 10 (reminder: $\log_{10}(x) = \ln(x)/\ln(10)$)
+
+$\log_{10}(n!) = \log_{10}(\mbox{mantissa})+{\mbox{exponent}} $
+
+We now identify the integer part of $\log_{10}(n!)$ with the exponent. The fractional part determines the mantissa, i.e. mantissa = $10^a$ with $a = \log_{10}(n!)-\mbox{exponent}$.
+
+From these examples we learn that range errors can usually be circumvented with a little creativity.
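
The logarithm-based procedure for the factorial described above can be sketched in Matlab as follows. This is a minimal illustration, not a reference implementation: the value n = 200 and the variable names lnfac_sum, lnfac_stirling, expo10 and mant10 are assumptions chosen for this example.

```matlab
% Sketch: write n! as mantissa * 10^exponent without ever forming n! itself.
% The value of n and all variable names are illustrative assumptions.
n = 200;                       % factorial(200) returns Inf in double precision

% Direct sum of logarithms: ln(n!) = ln(1) + ln(2) + ... + ln(n)
lnfac_sum = sum(log(1:n));

% Stirling's formula with the first two correction terms (suggested for n > 30)
lnfac_stirling = n*log(n) - n + 0.5*log(2*pi*n) ...
    + log(1 + 1/(12*n) + 1/(288*n^2));

% Convert to base 10, then split into integer part and fractional part
log10fac = lnfac_sum / log(10);
expo10   = floor(log10fac);            % integer part -> exponent
mant10   = 10^(log10fac - expo10);     % fractional part -> mantissa in [1, 10)

fprintf('%d! = %.4f x 10^%d\n', n, mant10, expo10);
fprintf('ln(n!): sum of logs = %.8f, Stirling = %.8f\n', lnfac_sum, lnfac_stirling);
```

For n = 200 the two values of ln(n!) should agree to better than 1e-8, and the result should come out close to 7.8866 x 10^374, although factorial(200) itself overflows. For very large n, only the Stirling expression remains practical, so lnfac_stirling would take the place of lnfac_sum in the conversion step.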