Update README.md

Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
2025-07-02 17:00:03 +02:00 · 2024-01-03 16:43:35 +01:00 · 2024-01-03 16:43:35 +01:00 · 7fe786a990
commit 7fe786a990
parent 0809a99c97
1 changed files with 82 additions and 35 deletions
--- a/PyBind11/direct/README.md
+++ b/PyBind11/direct/README.md
@@ -16,27 +16,84 @@ Questions to [David Rotermund](mailto:davrot@uni-bremen.de)

 It is the job of Python (NumPy or PyTorch) to provide the tensors from which we read and into which we write. In the C++ domain, we use these tensors as the interface to Python. We are not allowed to change the sizes of these tensors; we are only allowed to change their content. In addition, we need to make sure that the tensors are C_CONTIGUOUS.

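+Don't forget that C-contiguous is just a complicated way of saying row-major memory layout (see [Row- and column-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order)). For a 4d tensor, the element at index $(a,b,c,d)$ sits at the following position in the flat memory: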
+
+$$M[a,b,c,d] = M[\eta_a \cdot a + \eta_b \cdot  b + \eta_c \cdot  c + d]$$
+
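+where $n_b$, $n_c$ and $n_d$ are the sizes of dimensions $b$, $c$ and $d$, and the strides (counted in elements) are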
+
+$$\eta_c = n_d$$
+
+$$\eta_b = \eta_c \cdot n_c$$
+
+$$\eta_a = \eta_b \cdot n_b$$
+
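The formula above can be checked directly in NumPy. This is only a minimal sketch: the shape `(2, 3, 4, 5)` and the helper `flat_index` are made up for illustration and are not part of the original code.

```python
import numpy as np

# Hypothetical example sizes (n_a, n_b, n_c, n_d); any positive sizes work.
n_a, n_b, n_c, n_d = 2, 3, 4, 5
M = np.arange(n_a * n_b * n_c * n_d, dtype=np.float32).reshape(n_a, n_b, n_c, n_d)
assert M.flags["C_CONTIGUOUS"]

# Strides counted in elements, exactly as in the formulas above.
eta_c = n_d
eta_b = eta_c * n_c
eta_a = eta_b * n_b


def flat_index(a: int, b: int, c: int, d: int) -> int:
    """Hypothetical helper: position of M[a, b, c, d] in the flat memory."""
    return eta_a * a + eta_b * b + eta_c * c + d


# For a C-contiguous array, ravel() is a flat view on the same memory.
flat = M.ravel()
assert flat[flat_index(1, 2, 3, 4)] == M[1, 2, 3, 4]

# NumPy reports strides in bytes, so divide by the size of one element.
strides_in_elements = tuple(s // M.itemsize for s in M.strides)
assert strides_in_elements == (eta_a, eta_b, eta_c, 1)
```

The last assertion shows that NumPy's own bookkeeping agrees with the hand-computed strides, as long as the array is C-contiguous.
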
 ## On the Python side

+### PyTorch (CPU)
+
 ```python
-# If it is a torch tensor then make a "view" to its numpy core
-np_input: np.ndarray = input.contiguous().detach().numpy()
+import torch

-# We need to make sure that the numpy ndarray is C_CONTIGUOUS. 
-# If not then use numpy.ascontiguousarray() to make it so
-assert np_input.flags["C_CONTIGUOUS"] is True
+a = torch.zeros((10, 10, 10, 10))

-# Input is a 4d ndarray. And I will make sure that this is really the case
-assert np_input.ndim == 4
+assert a.is_contiguous()
+assert a.is_cuda is False
+
+assert a.ndim == 4

 # Now I extract the pointer to the data memory of the tensor
-np_input_pointer, _ = np_input.__array_interface__["data"]
+input_pointer = a.data_ptr()

 # Also I need the shape information for the C++ program.
-np_input_dim_0: int = np_input.shape[0]
-np_input_dim_1: int = np_input.shape[1]
-np_input_dim_2: int = np_input.shape[2]
-np_input_dim_3: int = np_input.shape[3]
+input_dim_0: int = a.shape[0]
+input_dim_1: int = a.shape[1]
+input_dim_2: int = a.shape[2]
+input_dim_3: int = a.shape[3]
+```
+
+### PyTorch (GPU)
+
+```python
+import torch
+
+a = torch.zeros((10, 10, 10, 10), device=torch.device("cuda:0"))
+
+assert a.is_contiguous()
+assert a.is_cuda
+
+assert a.ndim == 4
+
+# Now I extract the pointer to the data memory of the tensor
+input_pointer = a.data_ptr()
+
+# Also I need the shape information for the C++ program.
+input_dim_0: int = a.shape[0]
+input_dim_1: int = a.shape[1]
+input_dim_2: int = a.shape[2]
+input_dim_3: int = a.shape[3]
+```
+
+### NumPy
+
+```python
+import numpy as np
+
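+# np.zeros defaults to float64. We ask for float32 so that the dtype
+# matches the `float *` used on the C++ side.
+a = np.zeros((10, 10, 10, 10), dtype=np.float32)
+
+assert a.flags["C_CONTIGUOUS"]
+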
+assert a.ndim == 4
+
+# Now I extract the pointer to the data memory of the ndarray
+input_pointer, _ = a.__array_interface__["data"]
+
+# Also I need the shape information for the C++ program.
+input_dim_0: int = a.shape[0]
+input_dim_1: int = a.shape[1]
+input_dim_2: int = a.shape[2]
+input_dim_3: int = a.shape[3]
 ```
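
If the assertion on `C_CONTIGUOUS` fails, `numpy.ascontiguousarray()` can repair the layout. The sketch below (the transposed view is just an illustrative way to get a non-contiguous array) shows the catch: for a non-contiguous input you get a copy, so anything C++ writes lands in the copy and not in the original array.

```python
import numpy as np

a = np.zeros((10, 10, 10, 10), dtype=np.float32)

# A transposed view shares memory with `a` but is no longer row-major.
b = a.transpose(0, 1, 3, 2)
assert not b.flags["C_CONTIGUOUS"]

# ascontiguousarray() makes a row-major copy when the layout is wrong ...
c = np.ascontiguousarray(b)
assert c.flags["C_CONTIGUOUS"]
assert not np.shares_memory(a, c)

# ... and hands back the same memory when the layout is already fine.
d = np.ascontiguousarray(a)
assert np.shares_memory(a, d)

# Writing into the copy (as the C++ code would) leaves `a` untouched.
c[0, 0, 1, 2] = 1.0
assert a.sum() == 0.0
```

So if C++ is supposed to write results, keep a reference to the contiguous array whose pointer you actually passed and read the results from there.
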

 ## On the C++ side
@@ -44,11 +101,11 @@ np_input_dim_3: int = np_input.shape[3]
 Your C++ method needs to accept these arguments

 ```cpp
-int64_t np_input_pointer_addr, 
-int64_t np_input_dim_0,
-int64_t np_input_dim_1, 
-int64_t np_input_dim_2, 
-int64_t np_input_dim_3,
+int64_t input_pointer_addr, 
+int64_t input_dim_0,
+int64_t input_dim_1, 
+int64_t input_dim_2, 
+int64_t input_dim_3,
 ```

 Inside your C++ method you convert the address into a pointer. **BE WARNED:** Make absolutely sure that the dtype of the np.ndarray is correctly reflected in the pointer type
@@ -62,24 +119,14 @@ Inside your C++ method you convert the address into a pointer. **BE WARNED:** Ma
 **If you fuck this up then this will end in tears!**
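
One way to guard against this on the Python side is to check the dtype before handing over the pointer. The following is a minimal sketch, assuming the C++ side uses `float *` (32 bit on common platforms); the helper `require_float32` is hypothetical and not part of the original code.

```python
import numpy as np


def require_float32(array: np.ndarray) -> np.ndarray:
    """Hypothetical helper: refuse anything that does not match `float *`."""
    if array.dtype != np.float32:
        raise TypeError(f"expected float32, got {array.dtype}")
    return array


# NumPy's default dtype is float64 (8 bytes per element), not float32.
a = np.zeros((10, 10, 10, 10))
assert a.itemsize == 8

try:
    require_float32(a)
    rejected = False
except TypeError:
    rejected = True
assert rejected

# astype() creates a new float32 array (4 bytes per element).
a32 = a.astype(np.float32)
assert require_float32(a32).itemsize == 4
```

Note that `astype()` returns a new array, so the pointer and the results written by C++ belong to `a32`, not to `a`.
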

 ```cpp
-float *np_input_pointer = (float *)np_input_pointer_addr;
+float *input_pointer = (float *)input_pointer_addr;

 // Input
-assert((np_input_pointer != nullptr));
-assert((np_input_dim_0 > 0));
-assert((np_input_dim_1 > 0));
-assert((np_input_dim_2 > 0));
-assert((np_input_dim_3 > 0));
+assert((input_pointer != nullptr));
+assert((input_dim_0 > 0));
+assert((input_dim_1 > 0));
+assert((input_dim_2 > 0));
+assert((input_dim_3 > 0));
 ```
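
To see what the pointer cast means without compiling anything, the same steps can be imitated in Python with `ctypes`. This is only a stand-in for the C++ code, under the assumption of a float32, C-contiguous array; the shape and the written value are made up for illustration.

```python
import ctypes

import numpy as np

a = np.zeros((2, 3, 4, 5), dtype=np.float32)
assert a.flags["C_CONTIGUOUS"]

# Same extraction as on the Python side of this README.
input_pointer_addr, _ = a.__array_interface__["data"]

# Python stand-in for: float *input_pointer = (float *)input_pointer_addr;
input_pointer = ctypes.cast(input_pointer_addr, ctypes.POINTER(ctypes.c_float))

# Row-major offset of element [1, 2, 3, 4], using the shape information.
n_b, n_c, n_d = a.shape[1], a.shape[2], a.shape[3]
offset = 1 * (n_b * n_c * n_d) + 2 * (n_c * n_d) + 3 * n_d + 4

# Writing through the raw pointer changes the content of the array in place.
input_pointer[offset] = 42.0

assert a[1, 2, 3, 4] == 42.0
assert a.sum() == 42.0
```

With a wrong pointer type (for example `c_double` on a float32 array) the same write would hit the wrong bytes, which is exactly the failure the warning above is about.
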

-Don't forget that C Contiguous is just a complicated way of saying Row-major order memory layout [Row- and column-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order).

-$$M[a,b,c,d] = M[\eta_a \cdot a + \eta_b \cdot  b + \eta_c \cdot  c + d]$$
-
-with
-
-$$\eta_c = n_d$$
-
-$$\eta_b = \eta_c \cdot n_c$$
-
-$$\eta_a = \eta_b \cdot n_b$$