pytutorial/python_basics/python_typing
David Rotermund b15b453f5f
Update README.md
Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
2023-12-13 11:51:06 +01:00
..
image0.png Rename python_typing/image0.png to python_basics/python_typing/image0.png 2023-12-13 11:48:23 +01:00
README.md Update README.md 2023-12-13 11:51:06 +01:00

Python -- Type annotations and static type checking for Python

Goal

We want to use static type checking and type annotations in our Python code for detecting errors we made. We will use the mypy extension in VS code for that.

a: int = 0
b: float = 0.0
a = b     Incompatible types in assignment (expression has type "float", variable has type "int")

Questions to David Rotermund

Why Type hints?

Why we got type hints according PEP 484 -- Type Hints (29-Sep-2014):

Rationale and Goals

This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and (perhaps, in some contexts) code generation utilizing type information.

Of these goals, static analysis is the most important. This includes support for off-line type checkers such as mypy, as well as providing a standard notation that can be used by IDEs for code completion and refactoring.

[...]

It should also be emphasized that Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.

I would redefine this list a bit:

  • It is a part of your automatic documentation (like with meaningful variable names). If another person gets your source code they understand it easier.
  • You editor might thank you. Do to some new features in Python 3.10, the modern editors that do syntax highlighting and error checking have a harder time to infer what you mean. The more it need to think about what you mean, the slower your editor might get or even fail to show you syntax highlighting.
  • Static code analysis is really helpful. It showed me any problems ahead that I would have figured out the hard way otherwise.
  • Packages like the just-in-time compiler numba can produce better results if you can tell it what the variables are.

How do we do it?

Variables are assigned to a type the first time when used or can be defined even before use:

a: int
b: int = 0

You are allowed to connect a variable once and only once to a type. If you assign a type a second time to a variable then you will get an error and have to remove the second assignment.

For functions it looks a bit different because we have to handle the type of the return value with the -> construct:

def this_is_a_function() -> None:
    pass

def this_is_a_function() -> int:
    return 5

def this_is_a_function(a: int) -> int:
    return a

def this_is_a_function(a: int, b: int = 8) -> int:
    return a + b

def this_is_a_function(a: int, b: int = 8) -> tuple[int, int]:
    return a, b

Please note, that there is a difference how type annotations worked for older version. I will cover only Python 3.10 and newer. The official documentation can be found here.

Ignore a line

You can force MyPy to ignore a line if you add the comment # type: ignore to that line.

Please check Common issues and solutions if you run in strange problems.

MyPy (command line)

pip install mypy

If you don't vs code then you can use the mypy command line tool:

mypy test2.py 
test2.py:3: error: Incompatible types in assignment (expression has type "float", variable has type "int")
Found 1 error in 1 file (checked 1 source file)

MyPy under VS Code

Go under Extensions and install the Mypy Type Checker from Microsoft.

figure 1

Built-in types

  • If the type starts with an upper letter then you might import it from the typing module like
from typing import Any
  • If you have no clue what type something has, well use type():
import numpy as np
import torch


def func() -> None:
    return


a = 0
b = np.zeros((10,))
c = torch.zeros((10, 1))
d = func

print(type(a))
print(type(b))
print(type(c))
print(type(d))

Output:

<class 'int'>
<class 'numpy.ndarray'>
<class 'torch.Tensor'>
<class 'function'>

The correct typing would have been:

import numpy as np
import torch
from typing import Callable


def func() -> None:
    return


a: int = 0
b: np.ndarray = np.zeros((10,))
c: torch.Tensor = torch.zeros((10, 1))
d: Callable = func

As you can see, we had to change b a bit because we didn't use import numpy but used import numpy as np. Thus we had to use np.ndarray instead of numpy.ndarray.

Concerning <class 'function'>, this is a specical case. And requires an import from the typing module via from typing import Callable. More about that later.

Simple types

Here are examples of some common built-in types:

Type Description
int integer
float floating point number
bool boolean value (subclass of int)
str text, sequence of unicode codepoints
bytes 8-bit string, sequence of byte values
object an arbitrary object (object is the common base class)
a: int = 0
b: float = 0.0
c: bool = True
d: str = "LaLa"

Any type

Special type indicating an unconstrained type.

from typing import Any
a: Any = 0
b: float = 0.0
a = b

Generic types

Type Description
list[str] list of str objects
tuple[int, int] tuple of two int objects (tuple[()] is the empty tuple)
tuple[int, ...] tuple of an arbitrary number of int objects
dict[str, int] dictionary from str keys to int values
Iterable[int] iterable object containing ints
Sequence[bool] sequence of booleans (read-only)
Mapping[str, int] mapping from str keys to int values (read-only)
type[C] type object of C (C is a class/type variable/union of types)

Examples:

la: list = ["a", 1, 3.3]

ta: tuple = ("a", 1, 3.3)
tb: tuple[str, int, float] = ("a", 1, 3.3)

Wrong:

la: list[str, int, float] = ["a", 1, 3.3]

Correct:

la: list[str | int | float] = ["a", 1, 3.3]

|

In the case you expect a variable that can have differnt types over it's lifetime. Let us say you initialize it with None and later want to store integer in it:

a: None | int = None

An other example is this:

import torch
import numpy as np
a: np.ndarray  | torch.Tensor = torch.zeros((100,))

This is called a Union. The Union with None is called Optional. But nowadays you just need to remember |.

In the real world I encountered this problem:

a: int | None = 1
b: int

b = a  # Incompatible types in assignment (expression has type "int | None", variable has type "int")

The solution is to use assert:

a: int | None = 1
b: int

assert a is not None

b = a

TypeAlias

You can create an alias for more complicated types

from typing import TypeAlias

Numbis: TypeAlias = int | float
a: Numbis

a = 1
a = 1.1
a = "Hello" # Incompatible types in assignment (expression has type "str", variable has type "int | float")

Tuple

a: tuple[int, str, int] = (5, "Hello", 6)
a = (
    "Hello",
    4,
    4,
)  # Incompatible types in assignment (expression has type "Tuple[str, int, int]", variable has type "Tuple[int, str, int]")

Or if you don't care about what is in the tuple

a: tuple = (5, "Hello", 6)
a = ("Hello", 4, 4)

List

A generic list:

mylist: list = []

mylist.append(1)
mylist.append(2)
mylist.append("Hello")

Or defining more details about the list:

mylist: list[int] = []

mylist.append(1)
mylist.append(2)
mylist.append(
    "Hello"
)  # Argument 1 to "append" of "list" has incompatible type "str"; expected "int"

print(mylist) # -> [1, 2, 'Hello']

Dict

Generic dictionary:

mydict: dict = {"A": 1, "B": 3.14, "C": "Hello"}

We can give it more information. However, we have to be careful to include the types correctly.

This is wrong:

mydict_a: dict[str, int] = {"A": 1, "B": 3.14, "C": "Hello", 1: 1}

We get these errors:

Dict entry 1 has incompatible type "str": "float"; expected "str": "int"
Dict entry 2 has incompatible type "str": "str"; expected "str": "int"
Dict entry 3 has incompatible type "int": "int"; expected "str": "int"

These are correct ways to handle it:

mydict_a: dict[str | int, str | int | float] = {"A": 1, "B": 3.14, "C": "Hello", 1: 1}
mydict_b: dict[str, str | int | float] = {"A": 1, "B": 3.14, "C": "Hello", "1": 1}

Numpy

Generic:

import numpy as np
a: np.ndarray = np.zeros((10, 1))

Protecting against a wrong dtype:

import numpy as np
from typing import Any

a: np.ndarray[Any, np.dtype[np.uint64]]

a = np.zeros((10, 1), dtype=np.uint64)
a = np.zeros((10, 1)) # -> Incompatible types in assignment (expression has type "ndarray[Any, dtype[floating[_64Bit]]]", variable has type "ndarray[Any, dtype[unsignedinteger[_64Bit]]]")

PyTorch

Please note the big T!

import torch

a: torch.Tensor = torch.zeros((10, 1))

Callable

Callable means "function". As you know, you can shove functions objects around and also use then as function arguments of other function. It is helpful to make sure that the function you get has the properties you expect.

Callable[[Arg1Type, Arg2Type], ReturnType]
from typing import Callable


def function_a(x: int) -> int:
    return x + 1


def function_a_bad(x: int, y: int) -> int:
    return x + y


def function_b(x, other_function: Callable[[int], int]) -> int:
    return other_function(x) ** 2


print(function_b(1, function_a))  # -> 4

print(function_b(1, function_b)) # -> Argument 2 to "function_b" has incompatible type "Callable[[Any, Callable[[int], int]], int]"; expected "Callable[[int], int]"

NewType, Generics, User-defined generic types

{: .topic-optional} This is an optional topic!

Well... this exists... never used it.

NewType Use the NewType helper to create distinct types
Generics Since type information about objects kept in containers cannot be statically inferred in a generic way, many container classes in the standard library support subscription to denote the expected types of container elements.
User-defined generic types A user-defined class can be defined as a generic class.

References