# Data structures: [dataclass](https://docs.python.org/3/library/dataclasses.html)
{:.no_toc}
<nav markdown="1" class="toc-class">
* TOC
{:toc}
</nav>
## The goal
Python has a built-in [dataclass](https://docs.python.org/3/library/dataclasses.html) decorator (in the dataclasses module, available since Python 3.7) which is highly interesting for data scientists. Obviously it creates a class for storing your data. Who would have guessed...
Questions to [David Rotermund](mailto:davrot@uni-bremen.de)
**Type annotations required!!!**
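A dataclass finds its attributes (fields) through their type annotations; an attribute without an annotation is not treated as a field. Python does not check the annotated types at runtime.
[@dataclass](https://docs.python.org/3/library/dataclasses.html) is a decorator that tells Python that this class is a dataclass. Compared to a normal class, a dataclass automatically gets methods such as \_\_init\_\_, \_\_repr\_\_ and \_\_eq\_\_, generated from its annotated attributes.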
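As a minimal sketch of what the decorator does, consider the following class. The class name Recording and the attribute comment are made up for this illustration and are not part of the original examples. It shows the generated \_\_init\_\_ and \_\_repr\_\_ and that an attribute without annotation does not become a field:

```python
from dataclasses import dataclass, fields


@dataclass
class Recording:  # hypothetical example class
    name: str
    sample_rate_in_hz: float = 1000.0  # plain default value
    comment = "no annotation"  # not annotated: ordinary class attribute, not a field


# __init__ was generated from the annotated attributes
rec = Recording("Dataset A")

# __repr__ was generated as well
print(rec)  # Recording(name='Dataset A', sample_rate_in_hz=1000.0)

# only annotated attributes are fields
print([f.name for f in fields(rec)])  # ['name', 'sample_rate_in_hz']
```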
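A default value can be written directly after the annotation, e.g. sample_rate_in_hz: float = 1000.0. An alternative is to use field with the default argument: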
```python
from dataclasses import dataclass, field


@dataclass
class TestClassA:
    name: str
    number_of_electrodes: int
    dt: float
    sample_rate_in_hz: float = field(default=1000.0)


data_1 = TestClassA("Dataset A", 100, 1 / 1000)
print(data_1)
```
## Default factory
We can use the field's default_factory to put a suitable default into an attribute. It takes a callable without arguments, which is called for every new instance. default and default_factory cannot be used together.
Why should we use a default_factory? Well, please see the problem with [mutable](https://docs.python.org/3/glossary.html#term-mutable) objects in the [official Python documentation](https://docs.python.org/3/tutorial/classes.html#class-and-instance-variables).
Or in other words: Using = [] as default will cause you pain. A dataclass even refuses such a default with a ValueError.
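The pain can be sketched as follows. The class names BrokenDataset and WorkingDataset and the attribute channels are invented for this illustration. The first class shows that a list as default is rejected, the second that default_factory=list gives every instance its own list:

```python
from dataclasses import dataclass, field

# A mutable default such as [] is rejected while the class is being created.
try:

    @dataclass
    class BrokenDataset:  # hypothetical example class
        channels: list = []

except ValueError as error:
    print(f"ValueError: {error}")


@dataclass
class WorkingDataset:  # hypothetical example class
    # list() is called once per instance, so nothing is shared between instances
    channels: list = field(default_factory=list)


data_a = WorkingDataset()
data_b = WorkingDataset()
data_a.channels.append(1)
print(data_a.channels)  # [1]
print(data_b.channels)  # []
```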
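```python
from dataclasses import dataclass, field


@dataclass
class TestClassA:
    # str() and int() are called without arguments for every new instance
    name: str = field(default_factory=str)
    number_of_electrodes: int = field(default_factory=int)


data_1 = TestClassA()
print(data_1)  # TestClassA(name='', number_of_electrodes=0)
```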
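We can mark attributes as keyword-only (kw_only=True, available since Python 3.10). Keyword-only parameters normally belong at the end of a parameter list. As attributes of a dataclass, however, we can mix them in between, because the generated \_\_init\_\_ moves them behind the other parameters: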
```python
from dataclasses import dataclass, field


@dataclass
class TestClassA:
    name: str
    # keyword-only: can only be passed as number_of_electrodes=...
    number_of_electrodes: int = field(kw_only=True, default=42)
    # not a parameter of __init__; computed in __post_init__
    dt: float = field(init=False)
    sample_rate_in_hz: float = 1000.0

    def __post_init__(self) -> None:
        self.dt = 1.0 / self.sample_rate_in_hz

    def __str__(self) -> str:
        output: str = (
            f"Name: {self.name}"
            "\n"
            f"Number of electrodes: {self.number_of_electrodes}"
            "\n"
            f"dt: {self.dt:.4f}s"
            "\n"
            f"Sample Rate: {self.sample_rate_in_hz:.2f}Hz"
        )
        return output


data_1 = TestClassA("Dataset A", number_of_electrodes=100, sample_rate_in_hz=500)
print(data_1)
```
Output:
```python
Name: Dataset A
Number of electrodes: 100
dt: 0.0020s
Sample Rate: 500.00Hz
```
## Read Only data
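We can protect the data from being modified later by declaring the class with @dataclass(frozen=True). Note: If we need to set data in e.g. the \_\_post\_init\_\_ function, then we need to use object.\_\_setattr\_\_, because normal assignment is blocked there as well.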
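```python
from dataclasses import dataclass, field


# Class header reconstructed from the previous example (assumption);
# frozen=True makes the instances read-only.
@dataclass(frozen=True)
class TestClassA:
    name: str
    number_of_electrodes: int
    dt: float = field(init=False)
    sample_rate_in_hz: float = 1000.0

    def __post_init__(self) -> None:
        # self.dt = ... is blocked on a frozen instance
        object.__setattr__(self, "dt", 1.0 / self.sample_rate_in_hz)

    def __str__(self) -> str:
        output: str = (
            f"Name: {self.name}"
            "\n"
            f"Number of electrodes: {self.number_of_electrodes}"
            "\n"
            f"dt: {self.dt:.4f}s"
            "\n"
            f"Sample Rate: {self.sample_rate_in_hz:.2f}Hz"
        )
        return output


data_1 = TestClassA("Dataset A", 100, 500)
data_1.name = "New Name"  # -> FrozenInstanceError: cannot assign to field 'name'
```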
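If the program should continue after such an attempt, the error can be caught. The following is only a sketch with a made-up class name (FrozenRecording). It also uses dataclasses.replace, which is not mentioned above, as one possible way to get a modified copy instead of changing the frozen instance:

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class FrozenRecording:  # hypothetical example class
    name: str
    sample_rate_in_hz: float = 1000.0
    dt: float = field(init=False)

    def __post_init__(self) -> None:
        # normal assignment (self.dt = ...) would raise FrozenInstanceError here
        object.__setattr__(self, "dt", 1.0 / self.sample_rate_in_hz)


recording = FrozenRecording("Dataset A", 500.0)

try:
    recording.name = "New Name"
except FrozenInstanceError as error:
    print(f"FrozenInstanceError: {error}")

# replace() builds a new instance; the original stays untouched
renamed = replace(recording, name="New Name")
print(recording.name)  # Dataset A
print(renamed.name)  # New Name
```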
## Inheritance
```python
from dataclasses import dataclass


@dataclass
class BasicDataset:
    x: int = 1
    y: int = 2


@dataclass
class NewDataSet(BasicDataset):
    a: int = 3
    x: int = 4  # overrides the default of the inherited attribute x


data_1 = BasicDataset()
print(data_1)

data_2 = NewDataSet()
print(data_2)
```
Output:
```python
BasicDataset(x=1, y=2)
NewDataSet(x=4, y=2, a=3)
```
## Why should we want to use a dataclass?
### Comparing datasets
We can now compare datasets. Two instances are equal if all their attributes are equal:
```python
from dataclasses import dataclass


@dataclass
class MyDataset:
    x: int
    y: int


data_1a = MyDataset(x=1, y=1)
data_1b = MyDataset(x=1, y=1)
print(data_1a == data_1b)

data_2 = MyDataset(x=1, y=2)
print(data_1a == data_2)
```
Output:
```python
True
False
```
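For contrast, here is a small sketch of the same comparison with an ordinary class. PlainDataset is a made-up name for this illustration. Without a custom \_\_eq\_\_, two separate instances are never equal, even with identical content:

```python
from dataclasses import dataclass


class PlainDataset:  # hypothetical ordinary class for comparison
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


@dataclass
class MyDataset:
    x: int
    y: int


print(PlainDataset(1, 1) == PlainDataset(1, 1))  # False: compared by identity
print(MyDataset(1, 1) == MyDataset(1, 1))  # True: compared attribute by attribute
```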
We can remove attributes from the comparison with field(compare=False):
```python
from dataclasses import dataclass, field


@dataclass
class MyDataset:
    x: int
    y: int = field(compare=False)  # y is ignored by ==


data_1a = MyDataset(x=1, y=1)
data_1b = MyDataset(x=1, y=1)
print(data_1a == data_1b)

data_2 = MyDataset(x=1, y=2)
print(data_1a == data_2)
```
Output:
```python
True
True
```
### Sorting datasets
We can add a custom sort_index attribute, which we can also hide with [repr=False](https://docs.python.org/3/library/dataclasses.html#dataclasses.field):