In case we cannot load the full dataset into memory, the **torch.utils.data.Dataset** class is very helpful.
```python
class torch.utils.data.Dataset(*args, **kwds)
```
> An abstract class representing a Dataset.
>
> All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite **\_\_getitem\_\_()**, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite **\_\_len\_\_()**, which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader. Subclasses could also optionally implement **\_\_getitems\_\_()**, for speeding up batched sample loading. This method accepts a list of sample indices for a batch and returns a list of samples.
We need to create a new class derived from **torch.utils.data.Dataset**. We can do whatever we want in this class as long as we provide the functions
* **\_\_len\_\_()**: gives us the number of patterns in the dataset
* **\_\_getitem\_\_(index)**: gives us the information about ONE pattern at position index in the dataset. In the following example, I return the image as a 3D torch.Tensor and the corresponding class for that pattern (for which I use an int).
We have a lot of freedom in our own design, e.g.:
* The argument **train: bool** of the constructor was introduced by me.
* **\_\_getitem\_\_(index)** doesn't need to return the data for that pattern in exactly this way (i.e., the order, types, and number of the returned variables are up to you), as sketched below.
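Before we look at the real data, here is a minimal sketch of such a class. The name `MyDataset`, the random in-memory tensors, the dataset sizes, and the image shape `(1, 28, 28)` are just assumptions for illustration; the actual example below reads the data files described next.

```python
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, train: bool = True):
        # 'train' is a user-defined constructor argument that selects
        # between the (hypothetical) training and test portions.
        self.train = train
        # For illustration we create random data in memory; a real
        # implementation would open the data files here (or only
        # record their paths and read lazily in __getitem__).
        n = 600 if train else 100
        self.images = torch.rand(n, 1, 28, 28)    # n patterns, 3D each
        self.labels = torch.randint(0, 10, (n,))  # one class per pattern

    def __len__(self) -> int:
        # Number of patterns in the dataset.
        return self.labels.shape[0]

    def __getitem__(self, index: int):
        # ONE pattern: the image as a 3D tensor and its class as int.
        return self.images[index], int(self.labels[index])

    def __getitems__(self, indices):
        # Optional batched fetch: a list of indices in, a list of samples out.
        return [self[i] for i in indices]


dataset = MyDataset(train=True)
print(len(dataset))        # 600
image, label = dataset[0]
print(image.shape, label)  # torch.Size([1, 28, 28]) and an int class label
```

Such an object can be indexed directly, as shown, or handed to a **torch.utils.data.DataLoader**, which relies on **\_\_len\_\_()** for its samplers and on **\_\_getitem\_\_()** for fetching the individual samples.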
We assume that the data is stored in the following four files: