Create README.md

Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
2025-07-05 08:00:01 +02:00 · 2023-12-22 14:31:58 +01:00 · 2023-12-22 14:31:58 +01:00 · 526158aac1
commit 526158aac1
parent 7facc0e075
1 changed files with 77 additions and 0 deletions
--- a/pytorch/interfacing_data/README.md
+++ b/pytorch/interfacing_data/README.md
@@ -0,0 +1,77 @@
+# Interfacing data
+{:.no_toc}
+
+<nav markdown="1" class="toc-class">
+* TOC
+{:toc}
+</nav>
+
+## The goal
+
+We need to handle our data and make it accessible to PyTorch.
+
+Questions to [David Rotermund](mailto:davrot@uni-bremen.de)
+
+There are several options to interface your data. This page covers **torch.utils.data.Dataset**.
+
+
+
+## torch.utils.data.Dataset
+
+When we cannot load the full dataset into memory, **torch.utils.data.Dataset** is very helpful.
+
+We need to create a new class that is derived from **torch.utils.data.Dataset**. We can do whatever we want in this class as long as we provide the methods
+* **\_\_len\_\_()** : gives us the number of patterns in the dataset
+* **\_\_getitem\_\_(index)** : gives us the information about ONE pattern at position index in the dataset. In the following example, I return the image as a 3D torch.Tensor and the corresponding class for that pattern (for which I use int).
+
+We have a lot of freedom in our own design, e.g.:
+* The argument **train:bool** of the constructor was introduced by me.
+* **\_\_getitem\_\_(index)** doesn't need to return the data for that pattern in exactly this way (that is, the order, types, and number of returned variables can differ).
+
+We assume that the data is stored in the following four files:
+* train_pattern_storage.npy
+* train_label_storage.npy
+* test_pattern_storage.npy
+* test_label_storage.npy
+  
+
+```python
+import numpy as np
+import torch
+
+
+class MyDataset(torch.utils.data.Dataset):
+
+    # Load the train or test split and normalize the patterns
+    def __init__(self, train: bool = False) -> None:
+        super().__init__()
+
+        if train is True:
+            self.pattern_storage: np.ndarray = np.load("train_pattern_storage.npy")
+            self.label_storage: np.ndarray = np.load("train_label_storage.npy")
+        else:
+            self.pattern_storage = np.load("test_pattern_storage.npy")
+            self.label_storage = np.load("test_label_storage.npy")
+
+        # Convert to float32 and scale by the global maximum of this split
+        self.pattern_storage = self.pattern_storage.astype(np.float32)
+        self.pattern_storage /= np.max(self.pattern_storage)
+
+        # How many patterns are there?
+        self.number_of_pattern: int = self.label_storage.shape[0]
+
+    def __len__(self) -> int:
+        return self.number_of_pattern
+
+    # Get one pattern at position index
+    def __getitem__(self, index: int) -> tuple[torch.Tensor, int]:
+
+        # Add a leading channel axis: (height, width) -> (1, height, width)
+        image = torch.tensor(self.pattern_storage[index, np.newaxis, :, :])
+        target = int(self.label_storage[index])
+
+        return image, target
+
+
+if __name__ == "__main__":
+    pass
+
+```
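
A minimal usage sketch for the class above, not part of the commit itself. It assumes the pattern files hold arrays of shape (number of patterns, height, width) and the label files hold one integer per pattern. The `root` argument, the helper names `make_dummy_files` and `first_batch_shapes`, and all array sizes are hypothetical and introduced only for illustration. The class is repeated in condensed form so that the block runs on its own, and random placeholder data stands in for the real files.

```python
import os
import tempfile

import numpy as np
import torch


class MyDataset(torch.utils.data.Dataset):
    """Condensed copy of the class above; `root` is a hypothetical extra argument."""

    def __init__(self, train: bool = False, root: str = ".") -> None:
        super().__init__()
        prefix = "train" if train else "test"
        pattern = np.load(os.path.join(root, f"{prefix}_pattern_storage.npy"))
        self.label_storage = np.load(os.path.join(root, f"{prefix}_label_storage.npy"))
        # Convert to float32 and scale by the global maximum of this split
        self.pattern_storage = pattern.astype(np.float32)
        self.pattern_storage /= np.max(self.pattern_storage)

    def __len__(self) -> int:
        return self.label_storage.shape[0]

    def __getitem__(self, index: int) -> tuple[torch.Tensor, int]:
        # Add a leading channel axis: (height, width) -> (1, height, width)
        image = torch.tensor(self.pattern_storage[index, np.newaxis, :, :])
        return image, int(self.label_storage[index])


def make_dummy_files(root: str, n_train: int = 20, n_test: int = 10,
                     height: int = 28, width: int = 28) -> None:
    """Write random placeholder data using the four file names from the text."""
    rng = np.random.default_rng(0)
    for prefix, n in (("train", n_train), ("test", n_test)):
        patterns = rng.integers(0, 256, size=(n, height, width), dtype=np.uint8)
        labels = rng.integers(0, 10, size=(n,), dtype=np.int64)
        np.save(os.path.join(root, f"{prefix}_pattern_storage.npy"), patterns)
        np.save(os.path.join(root, f"{prefix}_label_storage.npy"), labels)


def first_batch_shapes(batch_size: int = 8):
    """Build the training dataset, wrap it in a DataLoader, fetch one batch."""
    with tempfile.TemporaryDirectory() as root:
        make_dummy_files(root)
        dataset = MyDataset(train=True, root=root)
        loader = torch.utils.data.DataLoader(
            dataset, batch_size=batch_size, shuffle=True
        )
        # The default collate function stacks the images and turns the
        # int targets into one integer tensor per batch.
        images, targets = next(iter(loader))
        return len(dataset), tuple(images.shape), tuple(targets.shape), images.dtype


if __name__ == "__main__":
    print(first_batch_shapes())
```

With the assumed sizes, one batch of images has shape (8, 1, 28, 28) and the targets have shape (8,). **torch.utils.data.DataLoader** only relies on **\_\_len\_\_()** and **\_\_getitem\_\_(index)**, which is why these two methods are enough.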