# Expanding Python with C++ modules
{:.no_toc}

## Top

A minimal introduction to using PyBind11. PyBind11 lets you extend Python with modules written in C++11 or newer.

Questions to [David Rotermund](mailto:davrot@uni-bremen.de)

## A very simple example

What do we need in the most minimal scenario?

* [Makefile](Makefile) plus a .env file
* Module wrapper ([PyMyModuleCPU.cpp](PyMyModuleCPU.cpp))
* The module ([MyModuleCPU.cpp](MyModuleCPU.cpp) and [MyModuleCPU.h](MyModuleCPU.h))
* Some test code ([test.py](test.py))

## [Makefile](Makefile)

If you are programming in C++ and don't know how to use a Makefile, then you really should [look it up](https://www.gnu.org/software/make/manual/html_node/Simple-Makefile.html).

I am working under Linux and my Makefile looks like [this](Makefile)...

MyModuleCPU.cpp and PyMyModuleCPU.cpp are compiled into .o files and then linked into PyMyModuleCPU. However, Python needs a special filename ending which depends on the Python version. The Makefile therefore asks `python3-config --extension-suffix` for this ending and appends it to the name of the shared library.

## The wrapper file ([PyMyModuleCPU.cpp](PyMyModuleCPU.cpp))

The wrapper file is the connection point between Python and C++. It tells Python what it can call. In this example we export the class MyModule with three methods into the Python space:

* PutStuffIn , which is connected to MyModule::PutStuffIn
* DoStuff , which is connected to MyModule::DoStuff
* GetStuffOut , which is connected to MyModule::GetStuffOut

These methods of the class MyModule are declared in MyModuleCPU.h and do what their names suggest they will do...

## The class definition of MyModule ([MyModuleCPU.h](MyModuleCPU.h))

The exported methods are public. The rest is a collection of private methods that handle the data exchange between C++ and Python in a safe way.

The exported methods do the following:

* PutStuffIn : A NumPy array arrives. GetShape extracts the shape of the numpy.ndarray and stores it in `std::vector<size_t> Data_Shape`. Then Converter copies the content of the numpy.ndarray into `std::vector<double> Data_Data`.
* DoStuff : Python passes double Factor to the method. The method multiplies the data in Data_Data (which came from the numpy.ndarray) by this number. This is done in SIMD (single instruction multiple data) fashion using OpenMP.
* GetStuffOut : It takes Data_Data and Data_Shape, makes a new numpy.ndarray out of them and hands it to Python.

```cpp
int MyModule::PutStuffIn(py::array & Arg_Input){
  if (GetShape(Arg_Input, Data_Shape) == false){
    return false;
  }
  if (MyModule::Converter(Arg_Input, Data_Data) == false){
    return false;
  }
  return true;
}

int MyModule::DoStuff(double Factor){
  size_t Counter;

#pragma omp simd
  for (Counter = 0; Counter < Data_Data.size(); Counter++){
    Data_Data[Counter] *= Factor;
  }
  return true;
}

py::array MyModule::GetStuffOut(void){
  return Converter(Data_Data, Data_Shape);
}
```

## The safe (and slow) way to communicate ([MyModuleCPU.cpp](MyModuleCPU.cpp))

Please see this as just a set of examples. I focused on double (float64) in this example.
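To make the data flow concrete without compiling anything, here is a rough pure-Python sketch of the same copy-in / copy-out pattern. It is only an analogy and not part of the module: the class `MyModuleSketch` and its method names are made up for illustration, and the standard library's `array` and `memoryview` stand in for `std::vector<double>` and `py::array`.

```python
from array import array


class MyModuleSketch:
    """Hypothetical pure-Python stand-in for the C++ class MyModule."""

    def __init__(self):
        self.data = array("d")  # plays the role of Data_Data (flat float64 storage)
        self.shape = []         # plays the role of Data_Shape

    def put_stuff_in(self, buffer):
        view = memoryview(buffer)
        # Similar checks to the C++ Converter: float64, C-contiguous, not empty.
        if view.format != "d" or not view.c_contiguous or view.nbytes == 0:
            return False
        self.shape = list(view.shape)
        # Copy the bytes, so the object owns its data (safe, but slow).
        self.data = array("d")
        self.data.frombytes(view.tobytes())
        return True

    def do_stuff(self, factor):
        # Element-wise multiplication on the flat storage.
        for i in range(len(self.data)):
            self.data[i] *= factor
        return True

    def get_stuff_out(self):
        # Hand out a fresh copy that carries the stored shape.
        copy = array("d", self.data)
        return memoryview(copy).cast("B").cast("d", shape=self.shape)


# Usage: a 2x3 "matrix" built from a flat array of doubles.
flat = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x = memoryview(flat).cast("B").cast("d", shape=[2, 3])

sketch = MyModuleSketch()
sketch.put_stuff_in(x)
sketch.do_stuff(5.0)
z = sketch.get_stuff_out()
print(z.shape, z.tolist())
```

The point of the design is the same as in the C++ code: every transfer is a copy, so the input is never modified and the object never depends on memory owned by the caller. The price is the extra copy in both directions.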
### C++ in and Python out

* Put vector of vector<> in and get a py::list out : `py::list MakeList(std::vector<std::vector<double>> &Arg_Data, std::vector<std::vector<size_t>> &Arg_Shape);`
* Put vector<> in and get py::array out : `py::array Converter(std::vector<double> &Arg_Data, std::vector<size_t> &Arg_Shape);`
* Put a value in and get a py::array out : `py::array Converter(double &Arg_Data);`

### Python in and C++ out

* Put py::array in and get vector<> out : `bool Converter(py::array &Arg_In, std::vector<double> &Arg_Data);`
* Put py::list in and get vector<vector<>> out : `bool ConvertList(py::list &Arg_List, std::vector<std::vector<double>> &Arg_Data, std::vector<std::vector<size_t>> &Arg_Shape);`
* Put a py::array in and get a vector<> with the dimensions out : `bool GetShape(py::array &Arg_Input, std::vector<size_t> &Arg_Shape);`
* Put a py::list in and get a vector<vector<>> with the dimensions out : `int GetShape(py::list &Arg_List, std::vector<std::vector<size_t>> &Arg_Shape);`

### Helper functions

* Put a py::list in and get a vector<vector<>> of the data out : `int CopyData(py::list &Arg_List, std::vector<std::vector<double>> &Arg_Data, std::vector<std::vector<size_t>> &Arg_Shape);`
* Check the properties of a list : `bool CheckList(py::list &Arg_List, int Check_NumberOfDimensions, size_t dType);`

## The test program ([test.py](test.py))

I think the mathematical operation performed by the test code needs no additional explanation: a random matrix is multiplied by 5.0. The output looks like this:

```python
X
[[0.43861361 0.34633103 0.30473636 0.25559892 0.61136669 0.61763177]
 [0.58565176 0.04562993 0.89141907 0.17663681 0.94354389 0.08857159]
 [0.40814404 0.58116521 0.76818518 0.11430939 0.90513926 0.38985626]
 [0.07986693 0.41520487 0.11921055 0.12390022 0.64135749 0.04744072]
 [0.44492385 0.94347543 0.01514797 0.74471067 0.34624101 0.91923338]]
X-Y:
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
X*5-Z:
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
```

## Source code

### .env file

**Change the directories and parameters according to your system.**

```Makefile
PYBIN=~/P3.11/bin/
CC=/usr/lib64/ccache/clang++
NVCC=/usr/local/cuda-12/bin/nvcc -allow-unsupported-compiler

PARAMETERS_O_CPU = -O3 -std=c++14 -fPIC -Wall -fopenmp=libomp
PARAMETERS_Linker_CPU = -shared -lm -lomp -lstdc++ -Wall

PARAMETERS_O_GPU= -O3 -std=c++14 -ccbin=$(CC) \
    -Xcompiler "-fPIC -Wall -fopenmp=libomp"
PARAMETERS_Linker_GPU=-Xcompiler "-shared -lm -lomp -lstdc++ -Wall"

O_DIRS = o/
```

### Makefile

[Makefile](Makefile)

```Makefile
include .env
export

name = MyModule
type = CPU

PYPOSTFIX := $(shell $(PYBIN)python3-config --extension-suffix)
PYBIND11INCLUDE := $(shell $(PYBIN)python3 -m pybind11 --includes)
PARAMETERS_O = $(PARAMETERS_O_CPU) $(PYBIND11INCLUDE)
PARAMETERS_Linker = $(PARAMETERS_Linker_CPU)

so_file = Py$(name)$(type)$(PYPOSTFIX)
pyi_file = Py$(name)$(type).pyi
all: $(so_file)

$(O_DIRS)$(name)$(type).o: $(name)$(type).h $(name)$(type).cpp
	mkdir -p $(O_DIRS)
	$(CC) $(PARAMETERS_O) -c $(name)$(type).cpp -o $(O_DIRS)$(name)$(type).o

$(O_DIRS)Py$(name)$(type).o: $(name)$(type).h Py$(name)$(type).cpp
	mkdir -p $(O_DIRS)
	$(CC) $(PARAMETERS_O) -c Py$(name)$(type).cpp -o $(O_DIRS)Py$(name)$(type).o

$(so_file): $(O_DIRS)$(name)$(type).o $(O_DIRS)Py$(name)$(type).o
	$(CC) $(PARAMETERS_Linker) -o $(so_file) $(O_DIRS)$(name)$(type).o $(O_DIRS)Py$(name)$(type).o

#######################
clean:
	rm -rf $(O_DIRS)
	rm -f $(so_file)
	rm -f $(pyi_file)
```

### PyMyModuleCPU.cpp

[PyMyModuleCPU.cpp](PyMyModuleCPU.cpp)

```cpp
#include <pybind11/pybind11.h>

#include "MyModuleCPU.h"

namespace py = pybind11;

PYBIND11_MODULE(PyMyModuleCPU, m) {
    m.doc() = "Example Module";
    py::class_<MyModule>(m, "MyModule")
        .def(py::init<>())
        .def("PutStuffIn", &MyModule::PutStuffIn)
        .def("DoStuff", &MyModule::DoStuff)
        .def("GetStuffOut", &MyModule::GetStuffOut);
}
```

### MyModuleCPU.h

[MyModuleCPU.h](MyModuleCPU.h)

```cpp
#ifndef MYMODULECPU
#define MYMODULECPU

#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>

#include <vector>

namespace py = pybind11;

class MyModule {
public:
    MyModule();
    ~MyModule();

    // The functionality of the module
    int PutStuffIn(py::array& Arg_Input);
    int DoStuff(double Factor);
    py::array GetStuffOut(void);

private:
    // Example data:
    std::vector<double> Data_Data;
    std::vector<size_t> Data_Shape;

    // Private functions:
    // ==================

    // Put vector of vector<> in and get a py::list out
    py::list MakeList(std::vector<std::vector<double>>& Arg_Data,
                      std::vector<std::vector<size_t>>& Arg_Shape);

    // Put vector<> in and get py::array out
    py::array Converter(std::vector<double>& Arg_Data,
                        std::vector<size_t>& Arg_Shape);

    // Put a value in and get a py::array out
    py::array Converter(double& Arg_Data);

    // Put py::array in and get vector<> out
    bool Converter(py::array& Arg_In, std::vector<double>& Arg_Data);

    // Put py::list in and get vector<vector<>> out
    bool ConvertList(py::list& Arg_List,
                     std::vector<std::vector<double>>& Arg_Data,
                     std::vector<std::vector<size_t>>& Arg_Shape);

    // Put a py::array in and get a vector<> with the dimensions out
    bool GetShape(py::array& Arg_Input, std::vector<size_t>& Arg_Shape);

    // Put a py::list in and get a vector<vector<>> with the dimensions out
    int GetShape(py::list& Arg_List,
                 std::vector<std::vector<size_t>>& Arg_Shape);

    // Put a py::list in and get a vector<vector<>> of the data out
    int CopyData(py::list& Arg_List,
                 std::vector<std::vector<double>>& Arg_Data,
                 std::vector<std::vector<size_t>>& Arg_Shape);

    // Check the properties of a list
    // 0: single
    // 1: double
    // 2: uint32_t
    // 3: uint64_t
    bool CheckList(py::list& Arg_List, int Check_NumberOfDimensions,
                   size_t dType);
};

#endif /* MYMODULECPU */
```

### MyModuleCPU.cpp

[MyModuleCPU.cpp](MyModuleCPU.cpp)

```cpp
#include "MyModuleCPU.h"

#include <cstring>   // memcpy
#include <iostream>  // std::cout
#include <string>    // std::string

MyModule::MyModule() {}

MyModule::~MyModule() {}

int MyModule::PutStuffIn(py::array& Arg_Input) {
    if (GetShape(Arg_Input, Data_Shape) == false) {
        return false;
    }
    if (MyModule::Converter(Arg_Input, Data_Data) == false) {
        return false;
    }
    return true;
}

int MyModule::DoStuff(double Factor) {
    size_t Counter;

#pragma omp simd
    for (Counter = 0; Counter < Data_Data.size(); Counter++) {
        Data_Data[Counter] *= Factor;
    }
    return true;
}

py::array
MyModule::GetStuffOut(void) {
    return Converter(Data_Data, Data_Shape);
}

// ------------------------------------------------

py::list MyModule::MakeList(std::vector<std::vector<double>>& Arg_Data,
                            std::vector<std::vector<size_t>>& Arg_Shape) {
    py::list ReturnValue;

    if (Arg_Data.size() != Arg_Shape.size()) {
        std::cout << "MyModule::MakeList => The sizes of the two vectors are different.\n";
        return ReturnValue;
    }

    size_t List_Pos = 0;
    for (List_Pos = 0; List_Pos < Arg_Shape.size(); List_Pos++) {
        std::vector<py::ssize_t> ShapeVector;
        ShapeVector.resize(Arg_Shape[List_Pos].size());

        size_t Counter = 0;
        for (Counter = 0; Counter < Arg_Shape[List_Pos].size(); Counter++) {
            ShapeVector[Counter] = Arg_Shape[List_Pos].at(Counter);
        }

        // Without a base object, array_t copies the data into a new array
        auto Temp = py::array_t<double>(ShapeVector, Arg_Data[List_Pos].data());
        ReturnValue.append(Temp);
    }
    return ReturnValue;
}

py::array MyModule::Converter(std::vector<double>& Arg_Data,
                              std::vector<size_t>& Arg_Shape) {
    std::vector<py::ssize_t> ShapeVector;
    ShapeVector.resize(Arg_Shape.size());

    size_t Counter = 0;
    for (Counter = 0; Counter < Arg_Shape.size(); Counter++) {
        ShapeVector[Counter] = Arg_Shape.at(Counter);
    }

    auto Temp = py::array_t<double>(ShapeVector, Arg_Data.data());
    return Temp;
}

bool MyModule::Converter(py::array& Arg_In, std::vector<double>& Arg_Data) {
    if ((Arg_In.flags() & pybind11::detail::npy_api::NPY_ARRAY_C_CONTIGUOUS_) !=
        pybind11::detail::npy_api::NPY_ARRAY_C_CONTIGUOUS_) {
        std::cout << "MyModule::Converter => Array is not c_style.\n";
        return false;
    }

    size_t Size = Arg_In.nbytes();
    if (Size == 0) {
        std::cout << "MyModule::Converter => Array is empty.\n";
        return false;
    }

    auto Temp_Array = Arg_In.request();

    if (py::isinstance<py::array_t<double>>(Arg_In) == false) {
        std::cout << "MyModule::Converter => Wrong type.\n";
        return false;
    }

    double* MyPtr = (double*)Temp_Array.ptr;
    if (MyPtr == nullptr) {
        std::cout << "MyModule::Converter => Pointer is null.\n";
        return false;
    }

    Arg_Data.resize(Size / sizeof(double));
    memcpy(Arg_Data.data(), MyPtr, Size);
    return true;
}

bool MyModule::ConvertList(py::list& Arg_List,
                           std::vector<std::vector<double>>& Arg_Data,
                           std::vector<std::vector<size_t>>& Arg_Shape) {
    Arg_Data.resize(0);
    Arg_Shape.resize(0);

    // Get the shapes of all the matrices
    if (GetShape(Arg_List, Arg_Shape) != 0) {
        return false;
    }

    // Get the data from the list
    if (CopyData(Arg_List, Arg_Data, Arg_Shape) != 0) {
        return false;
    }
    return true;
}

int MyModule::GetShape(py::list& Arg_List,
                       std::vector<std::vector<size_t>>& Arg_Shape) {
    Arg_Shape.resize(0);

    size_t List_Length = Arg_List.size();
    Arg_Shape.resize(List_Length);

    size_t Counter_List;
    size_t Counter_Dims;
    py::array Temp_Array;

    for (Counter_List = 0; Counter_List < List_Length; Counter_List++) {
        Arg_Shape[Counter_List].resize(0);
        Temp_Array = Arg_List[Counter_List];
        Arg_Shape[Counter_List].resize(Temp_Array.ndim());

        for (Counter_Dims = 0; Counter_Dims < Temp_Array.ndim(); Counter_Dims++) {
            Arg_Shape[Counter_List][Counter_Dims] = Temp_Array.shape(Counter_Dims);
        }
    }
    return 0;
}

bool MyModule::GetShape(py::array& Arg_Input, std::vector<size_t>& Arg_Shape) {
    Arg_Shape.resize(Arg_Input.ndim());

    size_t Counter_Dims;
    for (Counter_Dims = 0; Counter_Dims < Arg_Input.ndim(); Counter_Dims++) {
        Arg_Shape[Counter_Dims] = Arg_Input.shape(Counter_Dims);
    }
    return true;
}

int MyModule::CopyData(py::list& Arg_List,
                       std::vector<std::vector<double>>& Arg_Data,
                       std::vector<std::vector<size_t>>& Arg_Shape) {
    Arg_Data.resize(0);

    size_t List_Length = Arg_List.size();
    size_t List_Pos = List_Length;
    double* MyPtr = nullptr;
    py::array Temp_Array;

    Arg_Data.resize(List_Length);

    for (List_Pos = 0; List_Pos < List_Length; List_Pos++) {
        MyPtr = nullptr;
        Temp_Array = Arg_List[List_Pos];

        // Number of elements = product of all dimensions
        size_t Counter = 0;
        size_t ElementsOfArray = 0;
        for (Counter = 0; Counter < Arg_Shape[List_Pos].size(); Counter++) {
            if (Counter == 0) {
                ElementsOfArray = Arg_Shape[List_Pos][Counter];
            } else {
                ElementsOfArray *= Arg_Shape[List_Pos][Counter];
            }
        }

        size_t SizeOfArray_Bytes = ElementsOfArray * sizeof(double);
        if (SizeOfArray_Bytes != Temp_Array.nbytes()) {
            // Report the list position, not the dimension counter
            std::cout << "MyModule::CopyData => "
                      << "List element: " << List_Pos
                      << " is not the right amount of data.\n";
            return -1;
        }

        auto Temp_Array_f = Temp_Array.request();
        MyPtr = (double*)Temp_Array_f.ptr;
        if (MyPtr == nullptr) {
            std::cout << "MyModule::CopyData => "
                      << "Pointer is null.\n";
            return -1;
        }

        Arg_Data[List_Pos].resize(ElementsOfArray);
        memcpy((void*)Arg_Data[List_Pos].data(), (void*)MyPtr, SizeOfArray_Bytes);
    }
    return 0;
}

py::array MyModule::Converter(double& Arg_Data) {
    std::vector<py::ssize_t> ShapeVector;
    ShapeVector.resize(1);
    ShapeVector[0] = 1;
    return py::array_t<double>(ShapeVector, &Arg_Data);
}

bool MyModule::CheckList(py::list& Arg_List, int Check_NumberOfDimensions,
                         size_t dType) {
    // Is it a list?
    py::handle type = Arg_List.get_type();
    py::object type_name = type.attr("__name__");
    std::string Correct_List = std::string("list");
    if (Correct_List.compare(py::cast<std::string>(type_name)) != 0) {
        std::cout << "MyModule => Not a list.\n";
        return false;
    }

    // Is there something in the list?
    size_t List_Length = Arg_List.size();
    if (List_Length <= 0) {
        std::cout << "MyModule => List is empty.\n";
        return false;
    }

    // Are the list elements numpy arrays?
    size_t Counter = 0;
    std::string Correct_NDArray = std::string("ndarray");
    for (Counter = 0; Counter < List_Length; Counter++) {
        type = Arg_List[Counter].get_type();
        type_name = type.attr("__name__");
        if (Correct_NDArray.compare(py::cast<std::string>(type_name)) != 0) {
            std::cout << "MyModule => List element: " << Counter
                      << " not a numpy array.\n";
            return false;
        }
    }

    // Has every array the right dimension?
    py::array Temp_Array;
    for (Counter = 0; Counter < List_Length; Counter++) {
        Temp_Array = Arg_List[Counter];
        if (Temp_Array.ndim() != Check_NumberOfDimensions) {
            std::cout << "MyModule => List element: " << Counter
                      << " has not the necessary " << Check_NumberOfDimensions
                      << " dimensions (found: " << Temp_Array.ndim() << ").\n";
            return false;
        }
    }

    // Are all the numpy arrays c_style?
    for (Counter = 0; Counter < List_Length; Counter++) {
        Temp_Array = Arg_List[Counter];
        if ((Temp_Array.flags() & pybind11::detail::npy_api::NPY_ARRAY_C_CONTIGUOUS_) !=
            pybind11::detail::npy_api::NPY_ARRAY_C_CONTIGUOUS_) {
            std::cout << "MyModule => List element: " << Counter
                      << " is not c_style.\n";
            return false;
        }
    }

    // 0: single
    // 1: double
    // 2: uint32_t
    // 3: uint64_t
    for (Counter = 0; Counter < List_Length; Counter++) {
        Temp_Array = Arg_List[Counter];

        // Float
        if (dType == 0) {
            if (py::isinstance<py::array_t<float>>(Temp_Array) == false) {
                std::cout << "MyModule => List element: " << Counter
                          << " is not a float.\n";
                // This function returns bool; -1 would convert to true
                return false;
            }
        }

        // Double
        if (dType == 1) {
            if (py::isinstance<py::array_t<double>>(Temp_Array) == false) {
                std::cout << "MyModule => List element: " << Counter
                          << " is not a double.\n";
                return false;
            }
        }

        // uint32_t
        if (dType == 2) {
            if (py::isinstance<py::array_t<uint32_t>>(Temp_Array) == false) {
                std::cout << "MyModule => List element: " << Counter
                          << " is not a uint32.\n";
                return false;
            }
        }

        // uint64_t
        if (dType == 3) {
            if (py::isinstance<py::array_t<uint64_t>>(Temp_Array) == false) {
                std::cout << "MyModule => List element: " << Counter
                          << " is not a uint64.\n";
                return false;
            }
        }
    }
    return true;
}
```

### test.py

[test.py](test.py)

```python
from PyMyModuleCPU import MyModule
import numpy as np

MyCExtension = MyModule()

X = np.random.random((5, 6))
print("X")
print(X)

# The C++ methods return int (0 or 1), so test truthiness instead of "is False"
if not MyCExtension.PutStuffIn(X):
    print("Error (1)\n")
    exit()

Y = MyCExtension.GetStuffOut()
print("X-Y:")
print(X - Y)

if not MyCExtension.DoStuff(5.0):
    print("Error (2)\n")
    exit()

Z = MyCExtension.GetStuffOut()
print("X*5-Z:")
print(X * 5.0 - Z)
```

## [OpenMP](https://bisqwit.iki.fi/story/howto/openmp/)

### [SIMD (Single Instruction Multiple Data)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data)

Make absolutely sure that the memory areas you read from and write to don't overlap. Also make absolutely sure that you don't write to the same positions.
(I mean stuff like s[i] = v[i+j]; )

```cpp
#pragma omp simd
for(...){}
```

```cpp
#pragma omp simd reduction(+ : SOME_VARIABLE_NAME)
for(...){
  SOME_VARIABLE_NAME += ...
}
```

Parallel loop (on multiple cores)

```cpp
omp_set_num_threads(number_of_cpu_processes);
```

```cpp
#pragma omp parallel for
for(...){}
```

For the parallel loop you need to add the parameters -fopenmp=libomp -lomp into the Makefile.

## Reference

* [PyBind11](https://pybind11.readthedocs.io/en/stable/)