Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
Python -- Type annotations and static type checking for Python
Goal
We want to use static type checking and type annotations in our Python code to detect the errors we make. We will use the Mypy extension in VS Code for that.
a: int = 0
b: float = 0.0
a = b  # Incompatible types in assignment (expression has type "float", variable has type "int")
Questions to David Rotermund
Why Type hints?
Why do we have type hints? According to PEP 484 -- Type Hints (29-Sep-2014):
This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and (perhaps, in some contexts) code generation utilizing type information.
Of these goals, static analysis is the most important. This includes support for off-line type checkers such as mypy, as well as providing a standard notation that can be used by IDEs for code completion and refactoring.
[...]
It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.
I would redefine this list a bit:
- They are part of your automatic documentation (like meaningful variable names). If another person gets your source code, they will understand it more easily.
- Your editor might thank you. Due to some new features in Python 3.10, modern editors that do syntax highlighting and error checking have a harder time inferring what you mean. The more the editor needs to think about what you mean, the slower it might get, or it might even fail to show you syntax highlighting.
- Static code analysis is really helpful. It has shown me many problems ahead of time that I would otherwise have figured out the hard way.
- Packages like the just-in-time compiler numba can produce better results if you tell them what the variables are.
How do we do it?
A variable is given its type the first time it is used, or the type can be declared even before the variable is used:
a: int
b: int = 0
You are allowed to annotate a variable with a type once and only once. If you annotate the same variable a second time, you will get an error and have to remove the second annotation.
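A minimal sketch of this rule, using the hypothetical names total and count (they are not part of the original examples). The second annotation is left commented out because it is exactly the line a type checker is expected to reject:

```python
total: int            # declaration only: no value is bound yet
count: int = 0        # declaration plus initial value

count = 5             # re-assigning a value needs no second annotation
total = count * 2     # first assignment to the already declared name

# A second annotation for the same name is what the type checker
# is expected to complain about, so it stays commented out here:
# count: float = 1.5

print(count, total)   # prints: 5 10
```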
For functions it looks a bit different because we have to handle the type of the return value with the -> construct:
def this_is_a_function() -> None:
    pass

def this_is_a_function() -> int:
    return 5

def this_is_a_function(a: int) -> int:
    return a

def this_is_a_function(a: int, b: int = 8) -> int:
    return a + b

def this_is_a_function(a: int, b: int = 8) -> tuple[int, int]:
    return a, b
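As a small sketch of what happens to these annotations at runtime: Python stores them on the function object but does not enforce them, which matches the PEP 484 statement that Python remains dynamically typed. The function name add is a hypothetical example, and the trailing comment on the last call only silences the static checker for that line.

```python
from typing import get_type_hints


def add(a: int, b: int = 8) -> int:
    return a + b


# The annotations are stored on the function and can be inspected.
hints = get_type_hints(add)
print(hints)  # {'a': <class 'int'>, 'b': <class 'int'>, 'return': <class 'int'>}

# At runtime nothing stops us from passing a float. Only the static
# type checker would report this call, hence the ignore comment.
result = add(1.5)  # type: ignore[arg-type]
print(result)  # 9.5
```

This is why the static checker matters: the interpreter alone will not catch the mismatch.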
Please note that type annotations worked differently in older Python versions. I will cover only Python 3.10 and newer. The official documentation can be found here.
Ignore a line
You can force MyPy to ignore a line if you add the comment # type: ignore to that line.
Please check Common issues and solutions if you run into strange problems.
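A short sketch of a narrower variant: mypy also accepts an error code in square brackets after the ignore comment, so only that kind of error is silenced on the line. The code assignment used here is the one mypy reports for incompatible assignments.

```python
a: int = 0
b: float = 0.5

# Only the "assignment" error is ignored on the next line;
# any other problem on that line would still be reported.
a = b  # type: ignore[assignment]

print(a)  # 0.5
```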
MyPy (command line)
pip install mypy
If you don't use VS Code, you can use the mypy command line tool:
mypy test2.py
test2.py:3: error: Incompatible types in assignment (expression has type "float", variable has type "int")
Found 1 error in 1 file (checked 1 source file)
MyPy under VS Code
In VS Code, go to Extensions and install the Mypy Type Checker from Microsoft.
Built-in types
- If the type name starts with an uppercase letter, you might have to import it from the typing module, for example:
from typing import Any
- If you have no clue what type something has, use type():
import numpy as np
import torch
def func() -> None:
    return
a = 0
b = np.zeros((10,))
c = torch.zeros((10, 1))
d = func
print(type(a))
print(type(b))
print(type(c))
print(type(d))
Output:
<class 'int'>
<class 'numpy.ndarray'>
<class 'torch.Tensor'>
<class 'function'>
The correct typing would have been:
import numpy as np
import torch
from typing import Callable
def func() -> None:
    return
a: int = 0
b: np.ndarray = np.zeros((10,))
c: torch.Tensor = torch.zeros((10, 1))
d: Callable = func
As you can see, we had to change b a bit because we didn't use import numpy but used import numpy as np. Thus we had to use np.ndarray instead of numpy.ndarray.
Concerning <class 'function'>, this is a special case. It requires an import from the typing module via from typing import Callable. More about that later.
Simple types
Here are examples of some common built-in types:
Type | Description |
---|---|
int | integer |
float | floating point number |
bool | boolean value (subclass of int) |
str | text, sequence of unicode codepoints |
bytes | 8-bit string, sequence of byte values |
object | an arbitrary object (object is the common base class) |
a: int = 0
b: float = 0.0
c: bool = True
d: str = "LaLa"
Any type
Special type indicating an unconstrained type.
from typing import Any
a: Any = 0
b: float = 0.0
a = b
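To contrast Any with object from the table of simple types, here is a hedged sketch with two hypothetical functions. With Any a type checker generally accepts every operation on the value, while with object the value has to be narrowed (here with isinstance) before type-specific methods can be used.

```python
from typing import Any


def describe_any(value: Any) -> str:
    # With Any the checker accepts this call, even though it
    # fails at runtime for values without an upper() method.
    return value.upper()


def describe_object(value: object) -> str:
    # With object only operations valid for every object are allowed,
    # so we narrow the type before calling a str method.
    if isinstance(value, str):
        return value.upper()
    return repr(value)


print(describe_any("abc"))     # ABC
print(describe_object("abc"))  # ABC
print(describe_object(3))      # 3
```

Calling describe_any(3) would pass the type check and then raise an AttributeError at runtime, which is the price of using Any.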
Generic types
Type | Description |
---|---|
list[str] | list of str objects |
tuple[int, int] | tuple of two int objects (tuple[()] is the empty tuple) |
tuple[int, ...] | tuple of an arbitrary number of int objects |
dict[str, int] | dictionary from str keys to int values |
Iterable[int] | iterable object containing ints |
Sequence[bool] | sequence of booleans (read-only) |
Mapping[str, int] | mapping from str keys to int values (read-only) |
type[C] | type object of C (C is a class/type variable/union of types) |
Examples:
la: list = ["a", 1, 3.3]
ta: tuple = ("a", 1, 3.3)
tb: tuple[str, int, float] = ("a", 1, 3.3)
Wrong:
la: list[str, int, float] = ["a", 1, 3.3]
Correct:
la: list[str | int | float] = ["a", 1, 3.3]
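The read-only types from the table (Iterable, Sequence, Mapping) are mostly useful for function parameters. The following sketch uses hypothetical function names and imports the types from collections.abc, which is one possible choice; the typing module offers the same names.

```python
from collections.abc import Iterable, Mapping, Sequence


def total(values: Iterable[int]) -> int:
    # Accepts anything we can loop over: list, tuple, generator, ...
    return sum(values)


def first_flag(flags: Sequence[bool]) -> bool:
    # Sequence allows indexing, but promises not to modify the argument.
    return flags[0]


def lookup(table: Mapping[str, int], key: str) -> int:
    # Mapping allows read access; 0 is returned for unknown keys.
    return table.get(key, 0)


print(total([1, 2, 3]))                # 6
print(total(n * n for n in range(4)))  # 14
print(first_flag((True, False)))       # True
print(lookup({"a": 1}, "b"))           # 0
```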
|
Sometimes you expect a variable to hold different types over its lifetime. Let us say you initialize it with None and later want to store an integer in it:
a: None | int = None
Another example is this:
import torch
import numpy as np
a: np.ndarray | torch.Tensor = torch.zeros((100,))
This is called a Union. The Union with None is called Optional. But nowadays you just need to remember |.
In the real world I encountered this problem:
a: int | None = 1
b: int
b = a # Incompatible types in assignment (expression has type "int | None", variable has type "int")
The solution is to use assert:
a: int | None = 1
b: int
assert a is not None
b = a
TypeAlias
You can create an alias for more complicated types:
from typing import TypeAlias
Numbis: TypeAlias = int | float
a: Numbis
a = 1
a = 1.1
a = "Hello" # Incompatible types in assignment (expression has type "str", variable has type "int | float")
Tuple
a: tuple[int, str, int] = (5, "Hello", 6)
a = (
"Hello",
4,
4,
) # Incompatible types in assignment (expression has type "Tuple[str, int, int]", variable has type "Tuple[int, str, int]")
Or, if you don't care about what is in the tuple:
a: tuple = (5, "Hello", 6)
a = ("Hello", 4, 4)
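Between the fully specified and the fully generic tuple there is also tuple[int, ...] for a tuple of arbitrary length with one element type. The sketch below uses a hypothetical function min_max that takes such a tuple and returns a fixed-size one, which can then be unpacked.

```python
def min_max(values: tuple[int, ...]) -> tuple[int, int]:
    # Any number of ints goes in, exactly two ints come out.
    return min(values), max(values)


lo, hi = min_max((3, 1, 4, 1, 5))
print(lo, hi)  # 1 5
```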
List
A generic list:
mylist: list = []
mylist.append(1)
mylist.append(2)
mylist.append("Hello")
Or defining more details about the list:
mylist: list[int] = []
mylist.append(1)
mylist.append(2)
mylist.append(
"Hello"
) # Argument 1 to "append" of "list" has incompatible type "str"; expected "int"
print(mylist) # -> [1, 2, 'Hello']
Dict
Generic dictionary:
mydict: dict = {"A": 1, "B": 3.14, "C": "Hello"}
We can give it more information. However, we have to be careful to include the types correctly.
This is wrong:
mydict_a: dict[str, int] = {"A": 1, "B": 3.14, "C": "Hello", 1: 1}
We get these errors:
Dict entry 1 has incompatible type "str": "float"; expected "str": "int"
Dict entry 2 has incompatible type "str": "str"; expected "str": "int"
Dict entry 3 has incompatible type "int": "int"; expected "str": "int"
These are correct ways to handle it:
mydict_a: dict[str | int, str | int | float] = {"A": 1, "B": 3.14, "C": "Hello", 1: 1}
mydict_b: dict[str, str | int | float] = {"A": 1, "B": 3.14, "C": "Hello", "1": 1}
Numpy
Generic:
import numpy as np
a: np.ndarray = np.zeros((10, 1))
Protecting against a wrong dtype:
import numpy as np
from typing import Any
a: np.ndarray[Any, np.dtype[np.uint64]]
a = np.zeros((10, 1), dtype=np.uint64)
a = np.zeros((10, 1)) # -> Incompatible types in assignment (expression has type "ndarray[Any, dtype[floating[_64Bit]]]", variable has type "ndarray[Any, dtype[unsignedinteger[_64Bit]]]")
PyTorch
Please note the capital T!
import torch
a: torch.Tensor = torch.zeros((10, 1))
Callable
Callable means "function". As you know, you can pass function objects around and also use them as arguments of other functions. It is helpful to make sure that the function you get has the properties you expect.
Callable[[Arg1Type, Arg2Type], ReturnType]
from typing import Callable
def function_a(x: int) -> int:
    return x + 1

def function_a_bad(x: int, y: int) -> int:
    return x + y

def function_b(x, other_function: Callable[[int], int]) -> int:
    return other_function(x) ** 2

print(function_b(1, function_a)) # -> 4
print(function_b(1, function_b)) # -> Argument 2 to "function_b" has incompatible type "Callable[[Any, Callable[[int], int]], int]"; expected "Callable[[int], int]"
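A further sketch of the Callable[[Arg1Type, Arg2Type], ReturnType] pattern with one and two arguments. All function names here are hypothetical and not part of the original example.

```python
from typing import Callable


def increment(x: int) -> int:
    return x + 1


def apply_twice(func: Callable[[int], int], x: int) -> int:
    # func must take exactly one int and return an int.
    return func(func(x))


def combine(op: Callable[[int, int], int], a: int, b: int) -> int:
    # op must take two ints and return an int.
    return op(a, b)


print(apply_twice(increment, 1))          # 3
print(combine(lambda a, b: a * b, 3, 4))  # 12
```

Passing increment to combine would be flagged by the type checker, because increment takes one argument instead of two.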
NewType, Generics, User-defined generic types
This is an optional topic!
Well... this exists... never used it.
Topic | Description |
---|---|
NewType | Use the NewType helper to create distinct types |
Generics | Since type information about objects kept in containers cannot be statically inferred in a generic way, many container classes in the standard library support subscription to denote the expected types of container elements. |
User-defined generic types | A user-defined class can be defined as a generic class. |
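For orientation, here is a minimal sketch that touches each of the three topics once. The names UserId, show_user, first and Box are hypothetical, and the TypeVar style shown is just one way to write generics.

```python
from collections.abc import Sequence
from typing import Generic, NewType, TypeVar

# NewType: a distinct type for the checker, a plain int at runtime.
UserId = NewType("UserId", int)


def show_user(user_id: UserId) -> str:
    return f"user #{user_id}"


# Generics: T stands for "whatever element type the caller passes in".
T = TypeVar("T")


def first(items: Sequence[T]) -> T:
    return items[0]


# User-defined generic type: a container that remembers its content type.
class Box(Generic[T]):
    def __init__(self, content: T) -> None:
        self.content = content

    def get(self) -> T:
        return self.content


print(show_user(UserId(42)))  # user #42
print(first(["a", "b"]))      # a
print(Box(3.5).get())         # 3.5
```

Calling show_user(42) with a plain int would run fine, but the type checker is expected to reject it because 42 is not a UserId.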