pytutorial/numpy/JSON/README.md
David Rotermund 174449eaea
Create README.md
Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
2023-12-29 14:53:28 +01:00

141 lines
3.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Numpy <-> JSON over Pandas
{:.no_toc}
<nav markdown="1" class="toc-class">
* TOC
{:toc}
</nav>
## The goal
Normally we can not use JSON with Numpy. However, if we use [Pandas](https://pandas.pydata.org/) as an intermediary then we can do it.
Questions to [David Rotermund](mailto:davrot@uni-bremen.de)
**Note: Pandas can also be used for many other formats beside JSON.**
## Writing JSON [pandas.DataFrame.to_json](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html#pandas-dataframe-to-json)
```python
DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=None, indent=None, storage_options=None, mode='w')[source]
```
> Convert the object to a JSON string.
>
> Note NaNs and None will be converted to null and datetime objects will be converted to UNIX timestamps.
```python
import numpy as np
import pandas as pd
rng = np.random.default_rng()
some_data = rng.random((11, 3))
df = pd.DataFrame(some_data)
# As file
filename = "mynumpydata.json"
df.to_json(filename, orient="index")
# As string
output = df.to_json(orient="index")
print(output)
```
Output (reformated):
```python
{
"0": {
"0": 0.3145859169,
"1": 0.2517001569,
"2": 0.6685086575
},
"1": {
"0": 0.7324177066,
"1": 0.6750562092,
"2": 0.0086333192
},
"2": {
"0": 0.7529914827,
"1": 0.3597052352,
"2": 0.2780062722
},
"3": {
"0": 0.2847410336,
"1": 0.5572451873,
"2": 0.5591149362
},
"4": {
"0": 0.4507115703,
"1": 0.9623511422,
"2": 0.7180796014
},
"5": {
"0": 0.5406601852,
"1": 0.9315847158,
"2": 0.2456480951
},
"6": {
"0": 0.3441382077,
"1": 0.4714817658,
"2": 0.1777388975
},
"7": {
"0": 0.6994839505,
"1": 0.6520935819,
"2": 0.9870686976
},
"8": {
"0": 0.187576403,
"1": 0.7466669157,
"2": 0.2952841542
},
"9": {
"0": 0.9140410582,
"1": 0.6828387334,
"2": 0.165762789
},
"10": {
"0": 0.644055269,
"1": 0.6122094952,
"2": 0.9695111468
}
}
```
## Read JSON [pandas.read_json](https://pandas.pydata.org/docs/reference/api/pandas.read_json.html#pandas-read-json)
```python
pandas.read_json(path_or_buf, *, orient=None, typ='frame', dtype=None, convert_axes=None, convert_dates=True, keep_default_dates=True, precise_float=False, date_unit=None, encoding=None, encoding_errors='strict', lines=False, chunksize=None, compression='infer', nrows=None, storage_options=None, dtype_backend=_NoDefault.no_default, engine='ujson')[source]
```
> Convert a JSON string to pandas object.
```python
import numpy as np
import pandas as pd
filename = "mynumpydata.json"
df = pd.read_json(filename, orient="index")
output_np = df.to_numpy()
print(type(output_np)) # -> <class 'numpy.ndarray'>
print(output_np)
```
Output:
```python
[[0.31458592 0.25170016 0.66850866]
[0.73241771 0.67505621 0.00863332]
[0.75299148 0.35970524 0.27800627]
[0.28474103 0.55724519 0.55911494]
[0.45071157 0.96235114 0.7180796 ]
[0.54066019 0.93158472 0.2456481 ]
[0.34413821 0.47148177 0.1777389 ]
[0.69948395 0.65209358 0.9870687 ]
[0.1875764 0.74666692 0.29528415]
[0.91404106 0.68283873 0.16576279]
[0.64405527 0.6122095 0.96951115]]
```