# Overview
{:.no_toc}

## The goal

[scikit-learn](https://scikit-learn.org/stable/index.html) is a machine learning toolkit for data analysis.

Questions to [David Rotermund](mailto:davrot@uni-bremen.de)

```shell
pip install scikit-learn
```

> * Simple and efficient tools for predictive data analysis
> * Accessible to everybody, and reusable in various contexts
> * Built on NumPy, SciPy, and matplotlib

**I will keep it short and I will mark the most relevant tools in bold**

## [sklearn.base: Base classes and utility functions](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.base)

see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.base)

## [sklearn.calibration: Probability Calibration](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.calibration)

* `calibration.CalibratedClassifierCV([...])`: Probability calibration with isotonic regression or logistic regression.
* `calibration.calibration_curve(y_true, y_prob, *)`: Compute true and predicted probabilities for a calibration curve.

## [sklearn.cluster: Clustering](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster)

see more [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster)

* `cluster.AffinityPropagation(*[, damping, ...])`: Perform Affinity Propagation Clustering of data.
* `cluster.AgglomerativeClustering([...])`: Agglomerative Clustering.
* `cluster.Birch(*[, threshold, ...])`: Implements the BIRCH clustering algorithm.
* `cluster.DBSCAN([eps, min_samples, metric, ...])`: Perform DBSCAN clustering from vector array or distance matrix.
* `cluster.HDBSCAN([min_cluster_size, ...])`: Cluster data using hierarchical density-based clustering.
* `cluster.FeatureAgglomeration([n_clusters, ...])`: Agglomerate features.
* **`cluster.KMeans([n_clusters, init, n_init, ...])`**: **K-Means clustering.**
* `cluster.BisectingKMeans([n_clusters, init, ...])`: Bisecting K-Means clustering.
* **`cluster.MiniBatchKMeans([n_clusters, init, ...])`**: **Mini-Batch K-Means clustering.**
* `cluster.MeanShift(*[, bandwidth, seeds, ...])`: Mean shift clustering using a flat kernel.
* `cluster.OPTICS(*[, min_samples, max_eps, ...])`: Estimate clustering structure from vector array.
* `cluster.SpectralClustering([n_clusters, ...])`: Apply clustering to a projection of the normalized Laplacian.
* `cluster.SpectralBiclustering([n_clusters, ...])`: Spectral biclustering (Kluger, 2003).
* `cluster.SpectralCoclustering([n_clusters, ...])`: Spectral Co-Clustering algorithm (Dhillon, 2001).

## [sklearn.compose: Composite Estimators](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.compose)

* `compose.ColumnTransformer(transformers, *[, ...])`: Applies transformers to columns of an array or pandas DataFrame.
* `compose.TransformedTargetRegressor([...])`: Meta-estimator to regress on a transformed target.
* `compose.make_column_transformer(*transformers)`: Construct a ColumnTransformer from the given transformers.
* `compose.make_column_selector([pattern, ...])`: Create a callable to select columns to be used with ColumnTransformer.

## [sklearn.covariance: Covariance Estimators](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.covariance)

* `covariance.EmpiricalCovariance(*[, ...])`: Maximum likelihood covariance estimator.
* `covariance.EllipticEnvelope(*[, ...])`: An object for detecting outliers in a Gaussian distributed dataset.
* `covariance.GraphicalLasso([alpha, mode, ...])`: Sparse inverse covariance estimation with an l1-penalized estimator.
* `covariance.GraphicalLassoCV(*[, alphas, ...])`: Sparse inverse covariance with cross-validated choice of the l1 penalty.
* `covariance.LedoitWolf(*[, store_precision, ...])`: LedoitWolf Estimator.
* `covariance.MinCovDet(*[, store_precision, ...])`: Minimum Covariance Determinant (MCD): robust estimator of covariance.
* `covariance.OAS(*[, store_precision, ...])`: Oracle Approximating Shrinkage Estimator.
* `covariance.ShrunkCovariance(*[, ...])`: Covariance estimator with shrinkage.
* `covariance.empirical_covariance(X, *[, ...])`: Compute the Maximum likelihood covariance estimator.
* `covariance.graphical_lasso(emp_cov, alpha, *)`: L1-penalized covariance estimator.
* `covariance.ledoit_wolf(X, *[, ...])`: Estimate the shrunk Ledoit-Wolf covariance matrix.
* `covariance.ledoit_wolf_shrinkage(X[, ...])`: Estimate the shrunk Ledoit-Wolf covariance matrix.
* `covariance.oas(X, *[, assume_centered])`: Estimate covariance with the Oracle Approximating Shrinkage.
* `covariance.shrunk_covariance(emp_cov[, ...])`: Calculate a covariance matrix shrunk on the diagonal.

## [sklearn.cross_decomposition: Cross decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cross_decomposition)

* `cross_decomposition.CCA([n_components, ...])`: Canonical Correlation Analysis, also known as "Mode B" PLS.
* `cross_decomposition.PLSCanonical([...])`: Partial Least Squares transformer and regressor.
* `cross_decomposition.PLSRegression([...])`: PLS regression.
* `cross_decomposition.PLSSVD([n_components, ...])`: Partial Least Square SVD.

## [sklearn.datasets: Datasets](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets)

see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets)

## [sklearn.decomposition: Matrix Decomposition](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)

* `decomposition.DictionaryLearning([...])`: Dictionary learning.
* `decomposition.FactorAnalysis([n_components, ...])`: Factor Analysis (FA).
* **`decomposition.FastICA([n_components, ...])`**: **FastICA: a fast algorithm for Independent Component Analysis.**
* `decomposition.IncrementalPCA([n_components, ...])`: Incremental principal components analysis (IPCA).
* `decomposition.KernelPCA([n_components, ...])`: Kernel Principal component analysis (KPCA).
* `decomposition.LatentDirichletAllocation([...])`: Latent Dirichlet Allocation with online variational Bayes algorithm.
* `decomposition.MiniBatchDictionaryLearning([...])`: Mini-batch dictionary learning.
* `decomposition.MiniBatchSparsePCA([...])`: Mini-batch Sparse Principal Components Analysis.
* `decomposition.NMF([n_components, init, ...])`: Non-Negative Matrix Factorization (NMF).
* `decomposition.MiniBatchNMF([n_components, ...])`: Mini-Batch Non-Negative Matrix Factorization (NMF).
* **`decomposition.PCA([n_components, copy, ...])`**: **Principal component analysis (PCA).**
* `decomposition.SparsePCA([n_components, ...])`: Sparse Principal Components Analysis (SparsePCA).
* `decomposition.SparseCoder(dictionary, *[, ...])`: Sparse coding.
* `decomposition.TruncatedSVD([n_components, ...])`: Dimensionality reduction using truncated SVD (aka LSA).
* `decomposition.dict_learning(X, n_components, ...)`: Solve a dictionary learning matrix factorization problem.
* `decomposition.dict_learning_online(X[, ...])`: Solve a dictionary learning matrix factorization problem online.
* `decomposition.fastica(X[, n_components, ...])`: Perform Fast Independent Component Analysis.
* `decomposition.non_negative_factorization(X)`: Compute Non-negative Matrix Factorization (NMF).
* `decomposition.sparse_encode(X, dictionary, *)`: Sparse coding.

## [sklearn.discriminant_analysis: Discriminant Analysis](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.discriminant_analysis)

* `discriminant_analysis.LinearDiscriminantAnalysis([...])`: Linear Discriminant Analysis.
* `discriminant_analysis.QuadraticDiscriminantAnalysis(*)`: Quadratic Discriminant Analysis.

## [sklearn.dummy: Dummy estimators](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.dummy)

* `dummy.DummyClassifier(*[, strategy, ...])`: DummyClassifier makes predictions that ignore the input features.
* `dummy.DummyRegressor(*[, strategy, ...])`: Regressor that makes predictions using simple rules.
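
Of the tools listed so far, `cluster.KMeans` and `decomposition.PCA` are among those marked in bold. The following is a minimal sketch of how the two might be combined. The synthetic data (three Gaussian blobs in five dimensions) and all parameter values are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic data (assumption): three well-separated Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 5)) for c in centers])

# Reduce the data to its first two principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Cluster the reduced data. n_init is set explicitly because its
# default value differs between scikit-learn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Cluster sizes:", np.bincount(labels))
```

Both classes follow the usual estimator pattern: `fit_transform` learns the projection and applies it in one step, and `fit_predict` learns the cluster centers and returns one label per sample.
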

## [sklearn.ensemble: Ensemble Methods](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble)

* `ensemble.AdaBoostClassifier([estimator, ...])`: An AdaBoost classifier.
* `ensemble.AdaBoostRegressor([estimator, ...])`: An AdaBoost regressor.
* `ensemble.BaggingClassifier([estimator, ...])`: A Bagging classifier.
* `ensemble.BaggingRegressor([estimator, ...])`: A Bagging regressor.
* `ensemble.ExtraTreesClassifier([...])`: An extra-trees classifier.
* `ensemble.ExtraTreesRegressor([n_estimators, ...])`: An extra-trees regressor.
* `ensemble.GradientBoostingClassifier(*[, ...])`: Gradient Boosting for classification.
* `ensemble.GradientBoostingRegressor(*[, ...])`: Gradient Boosting for regression.
* `ensemble.IsolationForest(*[, n_estimators, ...])`: Isolation Forest Algorithm.
* `ensemble.RandomForestClassifier([...])`: A random forest classifier.
* `ensemble.RandomForestRegressor([...])`: A random forest regressor.
* `ensemble.RandomTreesEmbedding([...])`: An ensemble of totally random trees.
* `ensemble.StackingClassifier(estimators[, ...])`: Stack of estimators with a final classifier.
* `ensemble.StackingRegressor(estimators[, ...])`: Stack of estimators with a final regressor.
* `ensemble.VotingClassifier(estimators, *[, ...])`: Soft Voting/Majority Rule classifier for unfitted estimators.
* `ensemble.VotingRegressor(estimators, *[, ...])`: Prediction voting regressor for unfitted estimators.
* `ensemble.HistGradientBoostingRegressor([...])`: Histogram-based Gradient Boosting Regression Tree.
* `ensemble.HistGradientBoostingClassifier([...])`: Histogram-based Gradient Boosting Classification Tree.
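
To make the ensemble listing a bit more concrete, here is a small sketch that compares `ensemble.RandomForestClassifier` against a `dummy.DummyClassifier` baseline. The synthetic data set from `datasets.make_classification`, the split via `model_selection.train_test_split`, and every parameter value are assumptions for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class problem (assumption): 500 samples, 10 features, 5 informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline that ignores the input features and always predicts the most frequent class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Random forest with 100 trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# score() returns the mean accuracy on the given test data.
baseline_acc = baseline.score(X_test, y_test)
forest_acc = forest.score(X_test, y_test)
print(f"Baseline accuracy: {baseline_acc:.3f}")
print(f"Random forest accuracy: {forest_acc:.3f}")
```

A dummy baseline like this is a cheap sanity check: a real model should clearly beat it before its accuracy is taken seriously.
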

## [sklearn.exceptions: Exceptions and warnings](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.exceptions)

see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.exceptions)

## [sklearn.experimental: Experimental](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.experimental)

see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.experimental)

## [sklearn.preprocessing: Preprocessing and Normalization](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing)

* `preprocessing.Binarizer(*[, threshold, copy])`: Binarize data (set feature values to 0 or 1) according to a threshold.
* `preprocessing.FunctionTransformer([func, ...])`: Constructs a transformer from an arbitrary callable.
* `preprocessing.KBinsDiscretizer([n_bins, ...])`: Bin continuous data into intervals.
* `preprocessing.KernelCenterer()`: Center an arbitrary kernel matrix.
* `preprocessing.LabelBinarizer(*[, neg_label, ...])`: Binarize labels in a one-vs-all fashion.
* `preprocessing.LabelEncoder()`: Encode target labels with value between 0 and n_classes-1.
* `preprocessing.MultiLabelBinarizer(*[, ...])`: Transform between iterable of iterables and a multilabel format.
* `preprocessing.MaxAbsScaler(*[, copy])`: Scale each feature by its maximum absolute value.
* `preprocessing.MinMaxScaler([feature_range, ...])`: Transform features by scaling each feature to a given range.
* `preprocessing.Normalizer([norm, copy])`: Normalize samples individually to unit norm.
* `preprocessing.OneHotEncoder(*[, categories, ...])`: Encode categorical features as a one-hot numeric array.
* `preprocessing.OrdinalEncoder(*[, ...])`: Encode categorical features as an integer array.
* `preprocessing.PolynomialFeatures([degree, ...])`: Generate polynomial and interaction features.
* `preprocessing.PowerTransformer([method, ...])`: Apply a power transform featurewise to make data more Gaussian-like.
* `preprocessing.QuantileTransformer(*[, ...])`: Transform features using quantiles information.
* `preprocessing.RobustScaler(*[, ...])`: Scale features using statistics that are robust to outliers.
* `preprocessing.SplineTransformer([n_knots, ...])`: Generate univariate B-spline bases for features.
* `preprocessing.StandardScaler(*[, copy, ...])`: Standardize features by removing the mean and scaling to unit variance.
* `preprocessing.TargetEncoder([categories, ...])`: Target Encoder for regression and classification targets.
* `preprocessing.add_dummy_feature(X[, value])`: Augment dataset with an additional dummy feature.
* `preprocessing.binarize(X, *[, threshold, copy])`: Boolean thresholding of array-like or scipy.sparse matrix.
* `preprocessing.label_binarize(y, *, classes)`: Binarize labels in a one-vs-all fashion.
* `preprocessing.maxabs_scale(X, *[, axis, copy])`: Scale each feature to the [-1, 1] range without breaking the sparsity.
* `preprocessing.minmax_scale(X[, ...])`: Transform features by scaling each feature to a given range.
* `preprocessing.normalize(X[, norm, axis, ...])`: Scale input vectors individually to unit norm (vector length).
* `preprocessing.quantile_transform(X, *[, ...])`: Transform features using quantiles information.
* `preprocessing.robust_scale(X, *[, axis, ...])`: Standardize a dataset along any axis.
* `preprocessing.scale(X, *[, axis, with_mean, ...])`: Standardize a dataset along any axis.
* `preprocessing.power_transform(X[, method, ...])`: Parametric, monotonic transformation to make data more Gaussian-like.

## [sklearn.random_projection: Random projection](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.random_projection)

* `random_projection.GaussianRandomProjection([...])`: Reduce dimensionality through Gaussian random projection.
* `random_projection.SparseRandomProjection([...])`: Reduce dimensionality through sparse random projection.
* `random_projection.johnson_lindenstrauss_min_dim(...)`: Find a 'safe' number of components to randomly project to.
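
As a rough illustration of the preprocessing and random projection entries above, the sketch below standardizes a matrix with `StandardScaler`, rescales it with `MinMaxScaler`, and projects it to fewer dimensions with `GaussianRandomProjection`. The random input data and the target dimension of 10 are assumptions made only for this example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.random_projection import GaussianRandomProjection

# Random input (assumption): 100 samples with 50 features, mean 5 and std 3.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 50))

# Remove the mean of each feature and scale it to unit variance.
X_std = StandardScaler().fit_transform(X)

# Scale each feature to the range [0, 1].
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Project the standardized data from 50 down to 10 dimensions.
projector = GaussianRandomProjection(n_components=10, random_state=0)
X_proj = projector.fit_transform(X_std)

print("Per-feature mean after StandardScaler:", X_std.mean(axis=0)[:3])
print("Per-feature std after StandardScaler:", X_std.std(axis=0)[:3])
print("Range after MinMaxScaler:", X_minmax.min(), X_minmax.max())
print("Shape after projection:", X_proj.shape)
```

In practice a scaler is fitted on the training data only and then applied to new data with `transform`; calling `fit_transform` on everything, as done here, is a simplification.
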

## [sklearn.semi_supervised: Semi-Supervised Learning](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.semi_supervised)

* `semi_supervised.LabelPropagation([kernel, ...])`: Label Propagation classifier.
* `semi_supervised.LabelSpreading([kernel, ...])`: LabelSpreading model for semi-supervised learning.
* `semi_supervised.SelfTrainingClassifier(...)`: Self-training classifier.

## [sklearn.svm: Support Vector Machines](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.svm)

* `svm.LinearSVC([penalty, loss, dual, tol, C, ...])`: Linear Support Vector Classification.
* `svm.LinearSVR(*[, epsilon, tol, C, loss, ...])`: Linear Support Vector Regression.
* `svm.NuSVC(*[, nu, kernel, degree, gamma, ...])`: Nu-Support Vector Classification.
* `svm.NuSVR(*[, nu, C, kernel, degree, gamma, ...])`: Nu Support Vector Regression.
* `svm.OneClassSVM(*[, kernel, degree, gamma, ...])`: Unsupervised Outlier Detection.
* `svm.SVC(*[, C, kernel, degree, gamma, ...])`: C-Support Vector Classification.
* `svm.SVR(*[, kernel, degree, gamma, coef0, ...])`: Epsilon-Support Vector Regression.
* `svm.l1_min_c(X, y, *[, loss, fit_intercept, ...])`: Return the lowest bound for C.

## [sklearn.tree: Decision Trees](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.tree)

* `tree.DecisionTreeClassifier(*[, criterion, ...])`: A decision tree classifier.
* `tree.DecisionTreeRegressor(*[, criterion, ...])`: A decision tree regressor.
* `tree.ExtraTreeClassifier(*[, criterion, ...])`: An extremely randomized tree classifier.
* `tree.ExtraTreeRegressor(*[, criterion, ...])`: An extremely randomized tree regressor.
* `tree.export_graphviz(decision_tree[, ...])`: Export a decision tree in DOT format.
* `tree.export_text(decision_tree, *[, ...])`: Build a text report showing the rules of a decision tree.
* `tree.plot_tree(decision_tree, *[, ...])`: Plot a decision tree.
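
The classifiers in `sklearn.svm` and `sklearn.tree` share the same `fit` / `score` interface. The sketch below fits `svm.SVC` and `tree.DecisionTreeClassifier` on the iris data bundled with scikit-learn and prints the tree rules with `tree.export_text`. The choice of data set, the kernel, and the tree depth are assumptions for illustration, and the accuracies are measured on the training data only, so they say nothing about generalization.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris data set shipped with scikit-learn (no download needed).
iris = load_iris()
X, y = iris.data, iris.target

# C-Support Vector Classification with an RBF kernel.
svc = SVC(kernel="rbf", C=1.0).fit(X, y)

# A shallow decision tree, limited to depth 2 so the rules stay readable.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Training accuracy only; a held-out test set would be needed for a fair estimate.
svc_train_acc = svc.score(X, y)
tree_train_acc = tree.score(X, y)
print(f"SVC training accuracy: {svc_train_acc:.3f}")
print(f"Tree training accuracy: {tree_train_acc:.3f}")

# Text report of the decision rules learned by the tree.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```

`export_text` is convenient for quick inspection in a terminal; `tree.plot_tree` gives a graphical view of the same structure when matplotlib is available.
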

## [sklearn.utils: Utilities](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.utils)

see [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.utils)