Create README.md

Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
2025-07-08 17:00:02 +02:00 · 2023-12-16 14:29:12 +01:00 · 2023-12-16 14:29:12 +01:00 · d18ed4d586
commit d18ed4d586
parent 496a8053c4
1 changed files with 478 additions and 0 deletions
--- a/scikit-learn/overview/README.md
+++ b/scikit-learn/overview/README.md
@@ -0,0 +1,478 @@
+# Overview
+{:.no_toc}
+
+<nav markdown="1" class="toc-class">
+* TOC
+{:toc}
+</nav>
+
+## The goal
+
+[scikit-learn](https://scikit-learn.org/stable/index.html) is a machine learning toolkit for data analysis.
+
+Questions to [David Rotermund](mailto:davrot@uni-bremen.de)
+
+
+```shell
+pip install scikit-learn
+```
+
+> * Simple and efficient tools for predictive data analysis
+> * Accessible to everybody, and reusable in various contexts
+> * Built on NumPy, SciPy, and matplotlib
+
+
+**I will keep this short and mark the most relevant tools in bold.**
+
+
+## [sklearn.base: Base classes and utility functions](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.base)
+
+see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.base)
+
+
+## [sklearn.calibration: Probability Calibration](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.calibration)
+
+
+calibration.CalibratedClassifierCV([...])
+Probability calibration with isotonic regression or logistic regression.
+
+calibration.calibration_curve(y_true, y_prob, *)
+Compute true and predicted probabilities for a calibration curve.
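
A minimal sketch of how the two calibration tools listed above might be combined. The synthetic dataset, the `GaussianNB` base classifier, and the choices `method="isotonic"`, `cv=5` and `n_bins=5` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification problem (illustrative data only).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap a classifier; the calibrator is fitted with 5-fold cross-validation.
calibrated = CalibratedClassifierCV(GaussianNB(), method="isotonic", cv=5)
calibrated.fit(X_train, y_train)

# Calibrated probability of the positive class.
y_prob = calibrated.predict_proba(X_test)[:, 1]

# Per bin: observed fraction of positives and mean predicted probability.
prob_true, prob_pred = calibration_curve(y_test, y_prob, n_bins=5)
```

For a well-calibrated model, `prob_true` and `prob_pred` should lie close to the diagonal when plotted against each other.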
+
+## [sklearn.cluster: Clustering](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster)
+
+see more [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster)
+
+cluster.AffinityPropagation(*[, damping, ...])
+Perform Affinity Propagation Clustering of data.
+
+cluster.AgglomerativeClustering([...])
+Agglomerative Clustering.
+
+cluster.Birch(*[, threshold, ...])
+Implements the BIRCH clustering algorithm.
+
+cluster.DBSCAN([eps, min_samples, metric, ...])
+Perform DBSCAN clustering from vector array or distance matrix.
+
+cluster.HDBSCAN([min_cluster_size, ...])
+Cluster data using hierarchical density-based clustering.
+
+cluster.FeatureAgglomeration([n_clusters, ...])
+Agglomerate features.
+
+**cluster.KMeans([n_clusters, init, n_init, ...])**
+**K-Means clustering.**
+
+cluster.BisectingKMeans([n_clusters, init, ...])
+Bisecting K-Means clustering.
+
+**cluster.MiniBatchKMeans([n_clusters, init, ...])**
+**Mini-Batch K-Means clustering.**
+
+cluster.MeanShift(*[, bandwidth, seeds, ...])
+Mean shift clustering using a flat kernel.
+
+cluster.OPTICS(*[, min_samples, max_eps, ...])
+Estimate clustering structure from vector array.
+
+cluster.SpectralClustering([n_clusters, ...])
+Apply clustering to a projection of the normalized Laplacian.
+
+cluster.SpectralBiclustering([n_clusters, ...])
+Spectral biclustering (Kluger, 2003).
+
+cluster.SpectralCoclustering([n_clusters, ...])
+Spectral Co-Clustering algorithm (Dhillon, 2001).
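
As a sketch of the two tools marked in bold above, the following clusters two synthetic blobs with `KMeans` and `MiniBatchKMeans`. The data, the number of clusters, and the `n_init` and `batch_size` values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs (synthetic data, for illustration only).
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(100, 2)),
])

# Full-batch K-Means: every iteration uses all samples.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_            # cluster index of each sample
centers = kmeans.cluster_centers_  # array of shape (n_clusters, n_features)

# Mini-batch variant: updates the centers from small random batches,
# which tends to be faster on large datasets at some cost in accuracy.
mbk = MiniBatchKMeans(n_clusters=2, n_init=3, batch_size=64, random_state=0).fit(X)
mbk_labels = mbk.predict(X)
```

The cluster indices are arbitrary, so only the grouping carries meaning, not which blob is called 0 or 1.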
+
+
+## [sklearn.compose: Composite Estimators](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.compose)
+
+compose.ColumnTransformer(transformers, *[, ...])
+Applies transformers to columns of an array or pandas DataFrame.
+
+compose.TransformedTargetRegressor([...])
+Meta-estimator to regress on a transformed target.
+
+compose.make_column_transformer(*transformers)
+Construct a ColumnTransformer from the given transformers.
+
+compose.make_column_selector([pattern, ...])
+Create a callable to select columns to be used with ColumnTransformer.
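
A small sketch of `ColumnTransformer`, assuming a toy array whose first two columns are numeric and whose third column holds category codes. The transformer names `"num"` and `"cat"` are arbitrary labels, and `sparse_threshold=0` is set here only to get a dense result.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Columns 0 and 1 are numeric; column 2 is a category code (toy data).
X = np.array([
    [1.0, 10.0, 0],
    [2.0, 20.0, 1],
    [3.0, 30.0, 2],
    [4.0, 40.0, 1],
])

ct = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), [0, 1]),  # standardize the numeric columns
        ("cat", OneHotEncoder(), [2]),      # one-hot encode the category column
    ],
    sparse_threshold=0,  # always return a dense array
)

# Result: 2 scaled columns followed by 3 one-hot columns.
Xt = ct.fit_transform(X)
```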
+
+## [sklearn.covariance: Covariance Estimators](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.covariance)
+
+covariance.EmpiricalCovariance(*[, ...])
+Maximum likelihood covariance estimator.
+
+covariance.EllipticEnvelope(*[, ...])
+An object for detecting outliers in a Gaussian distributed dataset.
+
+covariance.GraphicalLasso([alpha, mode, ...])
+Sparse inverse covariance estimation with an l1-penalized estimator.
+
+covariance.GraphicalLassoCV(*[, alphas, ...])
+Sparse inverse covariance w/ cross-validated choice of the l1 penalty.
+
+covariance.LedoitWolf(*[, store_precision, ...])
+LedoitWolf Estimator.
+
+covariance.MinCovDet(*[, store_precision, ...])
+Minimum Covariance Determinant (MCD): robust estimator of covariance.
+
+covariance.OAS(*[, store_precision, ...])
+Oracle Approximating Shrinkage Estimator.
+
+covariance.ShrunkCovariance(*[, ...])
+Covariance estimator with shrinkage.
+
+covariance.empirical_covariance(X, *[, ...])
+Compute the Maximum likelihood covariance estimator.
+
+covariance.graphical_lasso(emp_cov, alpha, *)
+L1-penalized covariance estimator.
+
+covariance.ledoit_wolf(X, *[, ...])
+Estimate the shrunk Ledoit-Wolf covariance matrix.
+
+covariance.ledoit_wolf_shrinkage(X[, ...])
+Estimate the shrinkage coefficient of the Ledoit-Wolf covariance estimate.
+
+covariance.oas(X, *[, assume_centered])
+Estimate covariance with the Oracle Approximating Shrinkage.
+
+covariance.shrunk_covariance(emp_cov[, ...])
+Calculate a covariance matrix shrunk on the diagonal.
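
To illustrate the difference between the plain and the shrunk estimators listed above, here is a sketch that fits `EmpiricalCovariance` and `LedoitWolf` to the same data. The random data and its shape (few samples relative to the number of features) are assumptions for the example.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf

rng = np.random.default_rng(0)
# 30 samples in 10 dimensions: a regime where shrinkage is often useful.
X = rng.normal(size=(30, 10))

emp = EmpiricalCovariance().fit(X)
lw = LedoitWolf().fit(X)

emp_cov = emp.covariance_   # maximum likelihood estimate
lw_cov = lw.covariance_     # shrunk towards a scaled identity matrix
shrinkage = lw.shrinkage_   # estimated shrinkage coefficient in [0, 1]
```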
+
+
+## [sklearn.cross_decomposition: Cross decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cross_decomposition)
+
+cross_decomposition.CCA([n_components, ...])
+Canonical Correlation Analysis, also known as "Mode B" PLS.
+
+cross_decomposition.PLSCanonical([...])
+Partial Least Squares transformer and regressor.
+
+cross_decomposition.PLSRegression([...])
+PLS regression.
+
+cross_decomposition.PLSSVD([n_components, ...])
+Partial Least Square SVD.
+
+
+## [sklearn.datasets: Datasets](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets)
+
+see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets)
+
+
+## [sklearn.decomposition: Matrix Decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)
+
+
+decomposition.DictionaryLearning([...])
+Dictionary learning.
+
+decomposition.FactorAnalysis([n_components, ...])
+Factor Analysis (FA).
+
+**decomposition.FastICA([n_components, ...])**
+**FastICA: a fast algorithm for Independent Component Analysis.**
+
+decomposition.IncrementalPCA([n_components, ...])
+Incremental principal components analysis (IPCA).
+
+decomposition.KernelPCA([n_components, ...])
+Kernel Principal component analysis (KPCA).
+
+decomposition.LatentDirichletAllocation([...])
+Latent Dirichlet Allocation with online variational Bayes algorithm.
+
+decomposition.MiniBatchDictionaryLearning([...])
+Mini-batch dictionary learning.
+
+decomposition.MiniBatchSparsePCA([...])
+Mini-batch Sparse Principal Components Analysis.
+
+decomposition.NMF([n_components, init, ...])
+Non-Negative Matrix Factorization (NMF).
+
+decomposition.MiniBatchNMF([n_components, ...])
+Mini-Batch Non-Negative Matrix Factorization (NMF).
+
+**decomposition.PCA([n_components, copy, ...])**
+**Principal component analysis (PCA).**
+
+decomposition.SparsePCA([n_components, ...])
+Sparse Principal Components Analysis (SparsePCA).
+
+decomposition.SparseCoder(dictionary, *[, ...])
+Sparse coding.
+
+decomposition.TruncatedSVD([n_components, ...])
+Dimensionality reduction using truncated SVD (aka LSA).
+
+decomposition.dict_learning(X, n_components, ...)
+Solve a dictionary learning matrix factorization problem.
+
+decomposition.dict_learning_online(X[, ...])
+Solve a dictionary learning matrix factorization problem online.
+
+decomposition.fastica(X[, n_components, ...])
+Perform Fast Independent Component Analysis.
+
+decomposition.non_negative_factorization(X)
+Compute Non-negative Matrix Factorization (NMF).
+
+decomposition.sparse_encode(X, dictionary, *)
+Sparse coding.
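
The two decompositions marked in bold above can be sketched on a toy source-separation problem. The sources, the mixing matrix `A`, and the `max_iter` value are assumptions made for this example.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
n = 2000
# Two independent, non-Gaussian sources (toy signals).
S = np.column_stack([
    np.sign(np.sin(np.linspace(0, 40, n))),  # square wave
    rng.uniform(-1.0, 1.0, size=n),          # uniform noise
])
A = np.array([[1.0, 0.5], [0.5, 2.0]])  # arbitrary mixing matrix
X = S @ A.T                             # observed mixtures

# PCA: orthogonal directions ordered by explained variance.
pca = PCA(n_components=2).fit(X)
X_pca = pca.transform(X)
explained = pca.explained_variance_ratio_

# FastICA: estimates statistically independent components.
# The recovered sources are only defined up to order, sign and scale.
ica = FastICA(n_components=2, random_state=0, max_iter=1000)
S_est = ica.fit_transform(X)
```

PCA only decorrelates the data, while ICA additionally looks for independence, which is why ICA is the usual choice for unmixing signals like these.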
+
+## [sklearn.discriminant_analysis: Discriminant Analysis](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.discriminant_analysis)
+
+discriminant_analysis.LinearDiscriminantAnalysis([...])
+Linear Discriminant Analysis.
+
+discriminant_analysis.QuadraticDiscriminantAnalysis(*)
+Quadratic Discriminant Analysis.
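
A brief sketch of `LinearDiscriminantAnalysis` used both as a classifier and as a supervised projection. The iris dataset and `n_components=2` are choices made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# With 3 classes, at most 2 discriminant directions exist.
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)  # project onto the discriminant axes

# Accuracy on the training data (optimistic; use a held-out set in practice).
acc = lda.score(X, y)
```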
+
+
+## [sklearn.dummy: Dummy estimators](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.dummy)
+
+dummy.DummyClassifier(*[, strategy, ...])
+DummyClassifier makes predictions that ignore the input features.
+
+dummy.DummyRegressor(*[, strategy, ...])
+Regressor that makes predictions using simple rules.
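
A dummy estimator is mainly useful as a baseline that a real model has to beat. The sketch below assumes a toy label vector with a 70/30 class split and the `"most_frequent"` strategy.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((10, 3))  # the features are ignored by the dummy model
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

# Always predict the most frequent class seen during fit.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)
acc = baseline.score(X, y)  # equals the share of the majority class
```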
+
+## [sklearn.ensemble: Ensemble Methods](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble)
+
+ensemble.AdaBoostClassifier([estimator, ...])
+An AdaBoost classifier.
+
+ensemble.AdaBoostRegressor([estimator, ...])
+An AdaBoost regressor.
+
+ensemble.BaggingClassifier([estimator, ...])
+A Bagging classifier.
+
+ensemble.BaggingRegressor([estimator, ...])
+A Bagging regressor.
+
+ensemble.ExtraTreesClassifier([...])
+An extra-trees classifier.
+
+ensemble.ExtraTreesRegressor([n_estimators, ...])
+An extra-trees regressor.
+
+ensemble.GradientBoostingClassifier(*[, ...])
+Gradient Boosting for classification.
+
+ensemble.GradientBoostingRegressor(*[, ...])
+Gradient Boosting for regression.
+
+ensemble.IsolationForest(*[, n_estimators, ...])
+Isolation Forest Algorithm.
+
+ensemble.RandomForestClassifier([...])
+A random forest classifier.
+
+ensemble.RandomForestRegressor([...])
+A random forest regressor.
+
+ensemble.RandomTreesEmbedding([...])
+An ensemble of totally random trees.
+
+ensemble.StackingClassifier(estimators[, ...])
+Stack of estimators with a final classifier.
+
+ensemble.StackingRegressor(estimators[, ...])
+Stack of estimators with a final regressor.
+
+ensemble.VotingClassifier(estimators, *[, ...])
+Soft Voting/Majority Rule classifier for unfitted estimators.
+
+ensemble.VotingRegressor(estimators, *[, ...])
+Prediction voting regressor for unfitted estimators.
+
+ensemble.HistGradientBoostingRegressor([...])
+Histogram-based Gradient Boosting Regression Tree.
+
+ensemble.HistGradientBoostingClassifier([...])
+Histogram-based Gradient Boosting Classification Tree.
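
As one example of the ensemble estimators above, this sketch trains a `RandomForestClassifier` and evaluates it on held-out data. The iris dataset, the 70/30 split, and `n_estimators=100` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Each tree is trained on a bootstrap sample with random feature subsets.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

acc = forest.score(X_test, y_test)          # accuracy on held-out data
importances = forest.feature_importances_   # impurity-based, sums to 1
```

The other classifiers and regressors in this module follow the same `fit` / `predict` / `score` pattern.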
+
+## [sklearn.exceptions: Exceptions and warnings](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.exceptions)
+
+see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.exceptions)
+
+## [sklearn.experimental: Experimental](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.experimental)
+
+see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.experimental)
+
+
+## [sklearn.preprocessing: Preprocessing and Normalization](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing)
+
+preprocessing.Binarizer(*[, threshold, copy])
+Binarize data (set feature values to 0 or 1) according to a threshold.
+
+preprocessing.FunctionTransformer([func, ...])
+Constructs a transformer from an arbitrary callable.
+
+preprocessing.KBinsDiscretizer([n_bins, ...])
+Bin continuous data into intervals.
+
+preprocessing.KernelCenterer()
+Center an arbitrary kernel matrix.
+
+preprocessing.LabelBinarizer(*[, neg_label, ...])
+Binarize labels in a one-vs-all fashion.
+
+preprocessing.LabelEncoder()
+Encode target labels with value between 0 and n_classes-1.
+
+preprocessing.MultiLabelBinarizer(*[, ...])
+Transform between iterable of iterables and a multilabel format.
+
+preprocessing.MaxAbsScaler(*[, copy])
+Scale each feature by its maximum absolute value.
+
+preprocessing.MinMaxScaler([feature_range, ...])
+Transform features by scaling each feature to a given range.
+
+preprocessing.Normalizer([norm, copy])
+Normalize samples individually to unit norm.
+
+preprocessing.OneHotEncoder(*[, categories, ...])
+Encode categorical features as a one-hot numeric array.
+
+preprocessing.OrdinalEncoder(*[, ...])
+Encode categorical features as an integer array.
+
+preprocessing.PolynomialFeatures([degree, ...])
+Generate polynomial and interaction features.
+
+preprocessing.PowerTransformer([method, ...])
+Apply a power transform featurewise to make data more Gaussian-like.
+
+preprocessing.QuantileTransformer(*[, ...])
+Transform features using quantiles information.
+
+preprocessing.RobustScaler(*[, ...])
+Scale features using statistics that are robust to outliers.
+
+preprocessing.SplineTransformer([n_knots, ...])
+Generate univariate B-spline bases for features.
+
+preprocessing.StandardScaler(*[, copy, ...])
+Standardize features by removing the mean and scaling to unit variance.
+
+preprocessing.TargetEncoder([categories, ...])
+Target Encoder for regression and classification targets.
+
+preprocessing.add_dummy_feature(X[, value])
+Augment dataset with an additional dummy feature.
+
+preprocessing.binarize(X, *[, threshold, copy])
+Boolean thresholding of array-like or scipy.sparse matrix.
+
+preprocessing.label_binarize(y, *, classes)
+Binarize labels in a one-vs-all fashion.
+
+preprocessing.maxabs_scale(X, *[, axis, copy])
+Scale each feature to the [-1, 1] range without breaking the sparsity.
+
+preprocessing.minmax_scale(X[, ...])
+Transform features by scaling each feature to a given range.
+
+preprocessing.normalize(X[, norm, axis, ...])
+Scale input vectors individually to unit norm (vector length).
+
+preprocessing.quantile_transform(X, *[, ...])
+Transform features using quantiles information.
+
+preprocessing.robust_scale(X, *[, axis, ...])
+Standardize a dataset along any axis, using the median and the interquartile range.
+
+preprocessing.scale(X, *[, axis, with_mean, ...])
+Standardize a dataset along any axis.
+
+preprocessing.power_transform(X[, method, ...])
+Parametric, monotonic transformation to make data more Gaussian-like.
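
The scalers above share the usual `fit` / `transform` / `inverse_transform` interface. A minimal sketch with `StandardScaler` and `MinMaxScaler` on a toy array (the data is made up for the example):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Two features on very different scales (toy data).
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])

# Zero mean and unit variance per feature.
scaler = StandardScaler().fit(X)
X_std = scaler.transform(X)
X_back = scaler.inverse_transform(X_std)  # undo the scaling

# Rescale each feature to the default range [0, 1].
X_01 = MinMaxScaler().fit_transform(X)
```

In practice the scaler should be fitted on the training data only and then applied unchanged to the test data.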
+
+
+
+## [sklearn.random_projection: Random projection](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.random_projection)
+
+random_projection.GaussianRandomProjection([...])
+Reduce dimensionality through Gaussian random projection.
+
+random_projection.SparseRandomProjection([...])
+Reduce dimensionality through sparse random projection.
+
+random_projection.johnson_lindenstrauss_min_dim(...)
+Find a 'safe' number of components to randomly project to.
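
A sketch of how `johnson_lindenstrauss_min_dim` might be used to pick the target dimension for a random projection. The sample count, the distortion `eps=0.5`, and the random data are assumptions for the example.

```python
import numpy as np
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

# Minimum number of components that keeps pairwise distances
# within a relative distortion of eps for 100 samples.
min_dim = int(johnson_lindenstrauss_min_dim(n_samples=100, eps=0.5))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5000))  # high-dimensional toy data

proj = GaussianRandomProjection(n_components=min_dim, random_state=0)
X_small = proj.fit_transform(X)   # shape (100, min_dim)
```

Note that the bound depends on the number of samples and on `eps`, not on the original number of features.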
+
+## [sklearn.semi_supervised: Semi-Supervised Learning](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.semi_supervised)
+
+semi_supervised.LabelPropagation([kernel, ...])
+Label Propagation classifier.
+
+semi_supervised.LabelSpreading([kernel, ...])
+LabelSpreading model for semi-supervised learning.
+
+semi_supervised.SelfTrainingClassifier(...)
+Self-training classifier.
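
In the semi-supervised estimators, unlabeled samples are marked with the label `-1`. The sketch below assumes two synthetic blobs with a single labeled sample each and uses `LabelSpreading` with its RBF kernel.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
# Two well-separated blobs (synthetic data, for illustration only).
X = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(3.0, 0.3, size=(50, 2)),
])
y_true = np.array([0] * 50 + [1] * 50)

# Only one sample per class is labeled; -1 means "unlabeled".
y_partial = np.full(100, -1)
y_partial[0] = 0
y_partial[50] = 1

model = LabelSpreading(kernel="rbf").fit(X, y_partial)
y_inferred = model.transduction_  # labels assigned to all training samples
```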
+
+
+
+## [sklearn.svm: Support Vector Machines](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.svm)
+
+svm.LinearSVC([penalty, loss, dual, tol, C, ...])
+Linear Support Vector Classification.
+
+svm.LinearSVR(*[, epsilon, tol, C, loss, ...])
+Linear Support Vector Regression.
+
+svm.NuSVC(*[, nu, kernel, degree, gamma, ...])
+Nu-Support Vector Classification.
+
+svm.NuSVR(*[, nu, C, kernel, degree, gamma, ...])
+Nu Support Vector Regression.
+
+svm.OneClassSVM(*[, kernel, degree, gamma, ...])
+Unsupervised Outlier Detection.
+
+svm.SVC(*[, C, kernel, degree, gamma, ...])
+C-Support Vector Classification.
+
+svm.SVR(*[, kernel, degree, gamma, coef0, ...])
+Epsilon-Support Vector Regression.
+
+svm.l1_min_c(X, y, *[, loss, fit_intercept, ...])
+Return the lowest bound for C.
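
Support vector machines are sensitive to feature scales, so a common pattern is to combine `SVC` with a scaler in a pipeline. The following is a sketch under assumed settings (iris data, RBF kernel, `C=1.0`).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The scaler is fitted on the training data only, then reused for prediction.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)

acc = clf.score(X_test, y_test)  # accuracy on held-out data
```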
+
+
+
+## [sklearn.tree: Decision Trees](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.tree)
+
+tree.DecisionTreeClassifier(*[, criterion, ...])
+A decision tree classifier.
+
+tree.DecisionTreeRegressor(*[, criterion, ...])
+A decision tree regressor.
+
+tree.ExtraTreeClassifier(*[, criterion, ...])
+An extremely randomized tree classifier.
+
+tree.ExtraTreeRegressor(*[, criterion, ...])
+An extremely randomized tree regressor.
+
+tree.export_graphviz(decision_tree[, ...])
+Export a decision tree in DOT format.
+
+tree.export_text(decision_tree, *[, ...])
+Build a text report showing the rules of a decision tree.
+
+tree.plot_tree(decision_tree, *[, ...])
+Plot a decision tree.
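
A short sketch combining `DecisionTreeClassifier` with `export_text` to inspect the learned rules. The iris dataset and `max_depth=2` are assumptions chosen to keep the printed tree small.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree is easier to read and less prone to overfitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Human-readable if/else rules, one line per node.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

`tree.plot_tree` shows the same structure as a matplotlib figure, and `tree.export_graphviz` writes it in DOT format.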
+
+## [sklearn.utils: Utilities](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.utils)
+
+see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.utils)