Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com> |
Pandas
{:.no_toc}
* TOC {:toc}
The goal
Questions to David Rotermund
pip install pandas
Pandas
The two most important data types of Pandas are:
- Series
- DataFrame
“Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.”
It is the basis for, or is commonly used together with, the following statistics tools:
- scipy.stats: “This module contains a large number of probability distributions, summary and frequency statistics, correlation functions and statistical tests, masked statistics, kernel density estimation, quasi-Monte Carlo functionality, and more.”
- Pingouin: “Pingouin is an open-source statistical package written in Python 3 and based mostly on Pandas and NumPy.”
- rpy2: “rpy2 is an interface to R running embedded in a Python process.”
pandas.Series
class pandas.Series(data=None, index=None, dtype=None, name=None, copy=None, fastpath=False)
One-dimensional ndarray with axis labels (including time series).
Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
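The following is a minimal sketch of the points above: label-based and position-based access, a duplicated label, and statistics that skip NaN. The labels and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the label "b" is deliberately used twice,
# and one value is missing (NaN).
s = pd.Series([1.0, np.nan, 3.0, 5.0], index=["a", "b", "b", "c"])

by_label = s.loc["a"]     # label-based access
by_position = s.iloc[3]   # position-based access (fourth element)
duplicates = s.loc["b"]   # a duplicated label returns a Series with both rows

total = s.sum()      # NaN is skipped: 1.0 + 3.0 + 5.0 = 9.0
average = s.mean()   # mean over the three non-missing values = 3.0

print(by_label, by_position)
print(duplicates)
print(total, average)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of indexing explicit, which helps when the labels are themselves integers.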
Operations between Series (+, -, /, *, **) align values based on their associated index values; the two Series need not be the same length. The result index will be the sorted union of the two indexes.
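A small sketch of this alignment, using two made-up Series of different lengths that share only the label "z". Labels that exist in only one operand produce NaN in the result; `Series.add` with `fill_value` is one way to treat the missing side as a number instead.

```python
import pandas as pd

# Hypothetical example data with partly overlapping labels.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["z", "w"])

# Values are matched by label, not by position.
result = a + b
print(result)  # only "z" exists in both, so only "z" gets a number (13.0)

# Treat a missing operand as 0 instead of producing NaN.
filled = a.add(b, fill_value=0)
print(filled)
```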
Example 1:
import pandas as pd
example = pd.Series(["Bambu", "Tree", "Sleep"])
print(example)
Output:
0 Bambu
1 Tree
2 Sleep
dtype: object
Example 2:
import pandas as pd
example = pd.Series([99, 88, 32])  # integers only, so pandas infers dtype int64
print(example)
Output:
0 99
1 88
2 32
dtype: int64
Example 3:
import numpy as np
import pandas as pd
rng = np.random.default_rng()
a = rng.random(5)  # five uniform samples from [0, 1); values differ on every run
example = pd.Series(a)
print(example)
Output:
0 0.305920
1 0.633360
2 0.219094
3 0.005722
4 0.006673
dtype: float64
Example 4:
import pandas as pd
example = pd.Series(["Bambu", 3, "Sleep"])  # mixed str and int values are stored as dtype object
print(example)
Output:
0 Bambu
1 3
2 Sleep
dtype: object
index and values
import pandas as pd
example = pd.Series(["Bambu", "Tree", "Sleep"])
print(example.index)
print()
print(example.values)
Output:
RangeIndex(start=0, stop=3, step=1)
['Bambu' 'Tree' 'Sleep']
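The index does not have to be the default RangeIndex. As a sketch, the same Series can be given explicit labels (the label names here are invented for illustration), which then work for lookups:

```python
import pandas as pd

# "first", "second", "third" are hypothetical labels.
example = pd.Series(
    ["Bambu", "Tree", "Sleep"],
    index=["first", "second", "third"],
)

print(example.index)       # an Index of labels instead of a RangeIndex
print(example.to_numpy())  # the data as a NumPy array, like .values
print(example["second"])   # label-based lookup
```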
DataFrame
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.
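To illustrate the "dict-like container for Series" view, here is a minimal sketch that builds a DataFrame from two Series. The column names and values are made up; note that the rows are aligned by label, so a label missing from one Series shows up as NaN.

```python
import pandas as pd

# Hypothetical measurements; "shrub" has no height entry.
height = pd.Series([1.2, 15.0], index=["bamboo", "tree"])
age = pd.Series([3, 40, 7], index=["bamboo", "tree", "shrub"])

# Each dict key becomes a column; rows are the union of the Series labels.
df = pd.DataFrame({"height_m": height, "age_years": age})
print(df)

column = df["age_years"]  # selecting a column gives back a Series
row = df.loc["tree"]      # selecting a row by label also gives a Series
print(column)
print(row)
```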
pandas.concat
pandas.concat(objs, *, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=None)
Concatenate pandas objects along a particular axis.
Allows optional set logic along the other axes.
Can also add a layer of hierarchical indexing on the concatenation axis, which may be useful if the labels are the same (or overlapping) on the passed axis number.
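A short sketch of the three behaviours described above, using two small made-up DataFrames that share only the column "a": plain concatenation along axis 0, set logic on the other axis via `join`, and a hierarchical index via `keys`.

```python
import pandas as pd

# Hypothetical frames: both have column "a", but "b" and "c" differ.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})

# Default join='outer': keep all columns, fill the gaps with NaN.
stacked = pd.concat([df1, df2], ignore_index=True)

# join='inner': keep only the columns present in every frame.
common = pd.concat([df1, df2], join="inner", ignore_index=True)

# keys adds an outer index level that records where each row came from.
keyed = pd.concat([df1, df2], keys=["first", "second"])

print(stacked)
print(common)
print(keyed.loc["second"])
```

`ignore_index=True` is used in the first two calls because both inputs carry the row labels 0 and 1, which would otherwise be repeated in the result.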
I/O operations
- Pickling
- Flat file
- Clipboard
- Excel
- JSON
- HTML
- XML
- LaTeX
- HDFStore: PyTables (HDF5)
- Feather
- Parquet
- ORC
- SAS
- SPSS
- SQL
- Google BigQuery
- STATA
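Many of these formats come with a writer method on the DataFrame and a matching `pandas.read_...` function. As a minimal sketch for two of them (flat file and JSON), the example below round-trips a small made-up DataFrame through in-memory text buffers, so no files are created.

```python
import io

import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"name": ["Bambu", "Tree"], "count": [99, 88]})

# Flat file (CSV): write to a string, then read it back.
csv_text = df.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# JSON: one object per row, then read it back.
json_text = df.to_json(orient="records")
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

In real use a file path would normally be passed instead of the `io.StringIO` buffer.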