pytutorial/scikit-learn/overview
David Rotermund ef4f69b54c
Update README.md
Signed-off-by: David Rotermund <54365609+davrot@users.noreply.github.com>
2023-12-16 14:52:33 +01:00
..
README.md Update README.md 2023-12-16 14:52:33 +01:00

Overview

{:.no_toc}

* TOC {:toc}

The goal

scikit-learn is a machine learning tool kit for data analysis.

Questions to David Rotermund

pip install scikit-learn
  • Simple and efficient tools for predictive data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib

I will keep it short and I will mark the most relevant tools in bold

sklearn.base: Base classes and utility functions

see here

sklearn.calibration: Probability Calibration

calibration.CalibratedClassifierCV([...]) Probability calibration with isotonic regression or logistic regression.

calibration.calibration_curve(y_true, y_prob, *) Compute true and predicted probabilities for a calibration curve.

sklearn.cluster: Clustering

Classes

cluster.AffinityPropagation(*[, damping, ...]) Perform Affinity Propagation Clustering of data.

cluster.AgglomerativeClustering([...]) Agglomerative Clustering.

cluster.Birch(*[, threshold, ...]) Implements the BIRCH clustering algorithm.

cluster.DBSCAN([eps, min_samples, metric, ...]) Perform DBSCAN clustering from vector array or distance matrix.

cluster.HDBSCAN([min_cluster_size, ...]) Cluster data using hierarchical density-based clustering.

cluster.FeatureAgglomeration([n_clusters, ...]) Agglomerate features.

cluster.KMeans([n_clusters, init, n_init, ...]) K-Means clustering.

cluster.BisectingKMeans([n_clusters, init, ...]) Bisecting K-Means clustering.

cluster.MiniBatchKMeans([n_clusters, init, ...]) Mini-Batch K-Means clustering.

cluster.MeanShift(*[, bandwidth, seeds, ...]) Mean shift clustering using a flat kernel.

cluster.OPTICS(*[, min_samples, max_eps, ...]) Estimate clustering structure from vector array.

cluster.SpectralClustering([n_clusters, ...]) Apply clustering to a projection of the normalized Laplacian.

cluster.SpectralBiclustering([n_clusters, ...]) Spectral biclustering (Kluger, 2003).

cluster.SpectralCoclustering([n_clusters, ...]) Spectral Co-Clustering algorithm (Dhillon, 2001).

Functions

cluster.affinity_propagation(S, *[, ...]) Perform Affinity Propagation Clustering of data.

cluster.cluster_optics_dbscan(*, ...) Perform DBSCAN extraction for an arbitrary epsilon.

cluster.cluster_optics_xi(*, reachability, ...) Automatically extract clusters according to the Xi-steep method.

cluster.compute_optics_graph(X, *, ...) Compute the OPTICS reachability graph.

cluster.dbscan(X[, eps, min_samples, ...]) Perform DBSCAN clustering from vector array or distance matrix.

cluster.estimate_bandwidth(X, *[, quantile, ...]) Estimate the bandwidth to use with the mean-shift algorithm.

cluster.k_means(X, n_clusters, *[, ...]) Perform K-means clustering algorithm.

cluster.kmeans_plusplus(X, n_clusters, *[, ...]) Init n_clusters seeds according to k-means++.

cluster.mean_shift(X, *[, bandwidth, seeds, ...]) Perform mean shift clustering of data using a flat kernel.

cluster.spectral_clustering(affinity, *[, ...]) Apply clustering to a projection of the normalized Laplacian.

cluster.ward_tree(X, *[, connectivity, ...]) Ward clustering based on a Feature matrix.

sklearn.compose: Composite Estimators

compose.ColumnTransformer(transformers, *[, ...]) Applies transformers to columns of an array or pandas DataFrame.

compose.TransformedTargetRegressor([...]) Meta-estimator to regress on a transformed target.

compose.make_column_transformer(*transformers) Construct a ColumnTransformer from the given transformers.

compose.make_column_selector([pattern, ...]) Create a callable to select columns to be used with ColumnTransformer.

sklearn.covariance: Covariance Estimators

covariance.EmpiricalCovariance(*[, ...]) Maximum likelihood covariance estimator.

covariance.EllipticEnvelope(*[, ...]) An object for detecting outliers in a Gaussian distributed dataset.

covariance.GraphicalLasso([alpha, mode, ...]) Sparse inverse covariance estimation with an l1-penalized estimator.

covariance.GraphicalLassoCV(*[, alphas, ...]) Sparse inverse covariance w/ cross-validated choice of the l1 penalty.

covariance.LedoitWolf(*[, store_precision, ...]) LedoitWolf Estimator.

covariance.MinCovDet(*[, store_precision, ...]) Minimum Covariance Determinant (MCD): robust estimator of covariance.

covariance.OAS(*[, store_precision, ...]) Oracle Approximating Shrinkage Estimator as proposed in [R69773891e6a6-1].

covariance.ShrunkCovariance(*[, ...]) Covariance estimator with shrinkage.

covariance.empirical_covariance(X, *[, ...]) Compute the Maximum likelihood covariance estimator.

covariance.graphical_lasso(emp_cov, alpha, *) L1-penalized covariance estimator.

covariance.ledoit_wolf(X, *[, ...]) Estimate the shrunk Ledoit-Wolf covariance matrix.

covariance.ledoit_wolf_shrinkage(X[, ...]) Estimate the shrunk Ledoit-Wolf covariance matrix.

covariance.oas(X, *[, assume_centered]) Estimate covariance with the Oracle Approximating Shrinkage as proposed in [Rca3a42e5ec35-1].

covariance.shrunk_covariance(emp_cov[, ...]) Calculate a covariance matrix shrunk on the diagonal.

sklearn.cross_decomposition: Cross decomposition

cross_decomposition.CCA([n_components, ...]) Canonical Correlation Analysis, also known as "Mode B" PLS.

cross_decomposition.PLSCanonical([...]) Partial Least Squares transformer and regressor.

cross_decomposition.PLSRegression([...]) PLS regression.

cross_decomposition.PLSSVD([n_components, ...]) Partial Least Square SVD.

sklearn.datasets: Datasets

see here

sklearn.decomposition: Matrix Decomposition

decomposition.DictionaryLearning([...]) Dictionary learning.

decomposition.FactorAnalysis([n_components, ...]) Factor Analysis (FA).

decomposition.FastICA([n_components, ...]) FastICA: a fast algorithm for Independent Component Analysis.

decomposition.IncrementalPCA([n_components, ...]) Incremental principal components analysis (IPCA).

decomposition.KernelPCA([n_components, ...]) Kernel Principal component analysis (KPCA) [R396fc7d924b8-1].

decomposition.LatentDirichletAllocation([...]) Latent Dirichlet Allocation with online variational Bayes algorithm.

decomposition.MiniBatchDictionaryLearning([...]) Mini-batch dictionary learning.

decomposition.MiniBatchSparsePCA([...]) Mini-batch Sparse Principal Components Analysis.

decomposition.NMF([n_components, init, ...]) Non-Negative Matrix Factorization (NMF).

decomposition.MiniBatchNMF([n_components, ...]) Mini-Batch Non-Negative Matrix Factorization (NMF).

decomposition.PCA([n_components, copy, ...]) Principal component analysis (PCA).

decomposition.SparsePCA([n_components, ...]) Sparse Principal Components Analysis (SparsePCA).

decomposition.SparseCoder(dictionary, *[, ...]) Sparse coding.

decomposition.TruncatedSVD([n_components, ...]) Dimensionality reduction using truncated SVD (aka LSA).

decomposition.dict_learning(X, n_components, ...) Solve a dictionary learning matrix factorization problem.

decomposition.dict_learning_online(X[, ...]) Solve a dictionary learning matrix factorization problem online.

decomposition.fastica(X[, n_components, ...]) Perform Fast Independent Component Analysis.

decomposition.non_negative_factorization(X) Compute Non-negative Matrix Factorization (NMF).

decomposition.sparse_encode(X, dictionary, *) Sparse coding.

sklearn.discriminant_analysis: Discriminant Analysis

discriminant_analysis.LinearDiscriminantAnalysis([...]) Linear Discriminant Analysis.

discriminant_analysis.QuadraticDiscriminantAnalysis(*) Quadratic Discriminant Analysis.

sklearn.dummy: Dummy estimators

dummy.DummyClassifier(*[, strategy, ...]) DummyClassifier makes predictions that ignore the input features.

dummy.DummyRegressor(*[, strategy, ...]) Regressor that makes predictions using simple rules.

sklearn.ensemble: Ensemble Methods

ensemble.AdaBoostClassifier([estimator, ...]) An AdaBoost classifier.

ensemble.AdaBoostRegressor([estimator, ...]) An AdaBoost regressor.

ensemble.BaggingClassifier([estimator, ...]) A Bagging classifier.

ensemble.BaggingRegressor([estimator, ...]) A Bagging regressor.

ensemble.ExtraTreesClassifier([...]) An extra-trees classifier.

ensemble.ExtraTreesRegressor([n_estimators, ...]) An extra-trees regressor.

ensemble.GradientBoostingClassifier(*[, ...]) Gradient Boosting for classification.

ensemble.GradientBoostingRegressor(*[, ...]) Gradient Boosting for regression.

ensemble.IsolationForest(*[, n_estimators, ...]) Isolation Forest Algorithm.

ensemble.RandomForestClassifier([...]) A random forest classifier.

ensemble.RandomForestRegressor([...]) A random forest regressor.

ensemble.RandomTreesEmbedding([...]) An ensemble of totally random trees.

ensemble.StackingClassifier(estimators[, ...]) Stack of estimators with a final classifier.

ensemble.StackingRegressor(estimators[, ...]) Stack of estimators with a final regressor.

ensemble.VotingClassifier(estimators, *[, ...]) Soft Voting/Majority Rule classifier for unfitted estimators.

ensemble.VotingRegressor(estimators, *[, ...]) Prediction voting regressor for unfitted estimators.

ensemble.HistGradientBoostingRegressor([...]) Histogram-based Gradient Boosting Regression Tree.

ensemble.HistGradientBoostingClassifier([...]) Histogram-based Gradient Boosting Classification Tree.

sklearn.exceptions: Exceptions and warnings

see here

sklearn.experimental: Experimental

see here

sklearn.feature_extraction: Feature Extraction

feature_extraction.DictVectorizer(*[, ...]) Transforms lists of feature-value mappings to vectors.

feature_extraction.FeatureHasher([...]) Implements feature hashing, aka the hashing trick.

From images

feature_extraction.image.extract_patches_2d(...) Reshape a 2D image into a collection of patches.

feature_extraction.image.grid_to_graph(n_x, n_y) Graph of the pixel-to-pixel connections.

feature_extraction.image.img_to_graph(img, *) Graph of the pixel-to-pixel gradient connections.

feature_extraction.image.reconstruct_from_patches_2d(...) Reconstruct the image from all of its patches.

feature_extraction.image.PatchExtractor(*[, ...]) Extracts patches from a collection of images.

From text

feature_extraction.text.CountVectorizer(*[, ...]) Convert a collection of text documents to a matrix of token counts.

feature_extraction.text.HashingVectorizer(*) Convert a collection of text documents to a matrix of token occurrences.

feature_extraction.text.TfidfTransformer(*) Transform a count matrix to a normalized tf or tf-idf representation.

feature_extraction.text.TfidfVectorizer(*[, ...]) Convert a collection of raw documents to a matrix of TF-IDF features.

sklearn.feature_selection: Feature Selection

feature_selection.GenericUnivariateSelect([...]) Univariate feature selector with configurable strategy.

feature_selection.SelectPercentile([...]) Select features according to a percentile of the highest scores.

feature_selection.SelectKBest([score_func, k]) Select features according to the k highest scores.

feature_selection.SelectFpr([score_func, alpha]) Filter: Select the pvalues below alpha based on a FPR test.

feature_selection.SelectFdr([score_func, alpha]) Filter: Select the p-values for an estimated false discovery rate.

feature_selection.SelectFromModel(estimator, *) Meta-transformer for selecting features based on importance weights.

feature_selection.SelectFwe([score_func, alpha]) Filter: Select the p-values corresponding to Family-wise error rate.

feature_selection.SequentialFeatureSelector(...) Transformer that performs Sequential Feature Selection.

feature_selection.RFE(estimator, *[, ...]) Feature ranking with recursive feature elimination.

feature_selection.RFECV(estimator, *[, ...]) Recursive feature elimination with cross-validation to select features.

feature_selection.VarianceThreshold([threshold]) Feature selector that removes all low-variance features.

feature_selection.chi2(X, y) Compute chi-squared stats between each non-negative feature and class.

feature_selection.f_classif(X, y) Compute the ANOVA F-value for the provided sample.

feature_selection.f_regression(X, y, *[, ...]) Univariate linear regression tests returning F-statistic and p-values.

feature_selection.r_regression(X, y, *[, ...]) Compute Pearson's r for each features and the target.

feature_selection.mutual_info_classif(X, y, *) Estimate mutual information for a discrete target variable.

feature_selection.mutual_info_regression(X, y, *) Estimate mutual information for a continuous target variable.

sklearn.gaussian_process: Gaussian Processes

gaussian_process.GaussianProcessClassifier([...]) Gaussian process classification (GPC) based on Laplace approximation.

gaussian_process.GaussianProcessRegressor([...]) Gaussian process regression (GPR).

Kernels

gaussian_process.kernels.CompoundKernel(kernels) Kernel which is composed of a set of other kernels.

gaussian_process.kernels.ConstantKernel([...]) Constant kernel.

gaussian_process.kernels.DotProduct([...]) Dot-Product kernel.

gaussian_process.kernels.ExpSineSquared([...]) Exp-Sine-Squared kernel (aka periodic kernel).

gaussian_process.kernels.Exponentiation(...) The Exponentiation kernel takes one base kernel and a scalar parameter and combines them via

gaussian_process.kernels.Hyperparameter(...) A kernel hyperparameter's specification in form of a namedtuple.

gaussian_process.kernels.Kernel() Base class for all kernels.

gaussian_process.kernels.Matern([...]) Matern kernel.

gaussian_process.kernels.PairwiseKernel([...]) Wrapper for kernels in sklearn.metrics.pairwise.

gaussian_process.kernels.Product(k1, k2) The Product kernel takes two kernels k1 and k2 and combines them via

gaussian_process.kernels.RBF([length_scale, ...]) Radial basis function kernel (aka squared-exponential kernel).

gaussian_process.kernels.RationalQuadratic([...]) Rational Quadratic kernel.

gaussian_process.kernels.Sum(k1, k2) The Sum kernel takes two kernels k1 and k2 and combines them via

gaussian_process.kernels.WhiteKernel([...]) White kernel.

sklearn.impute: Impute

impute.SimpleImputer(*[, missing_values, ...]) Univariate imputer for completing missing values with simple strategies.

impute.IterativeImputer([estimator, ...]) Multivariate imputer that estimates each feature from all the others.

impute.MissingIndicator(*[, missing_values, ...]) Binary indicators for missing values.

impute.KNNImputer(*[, missing_values, ...]) Imputation for completing missing values using k-Nearest Neighbors.

sklearn.inspection: Inspection

inspection.partial_dependence(estimator, X, ...) Partial dependence of features.

inspection.permutation_importance(estimator, ...) Permutation importance for feature evaluation [Rd9e56ef97513-BRE].

Plotting

inspection.DecisionBoundaryDisplay(*, xx0, ...) Decisions boundary visualization.

inspection.PartialDependenceDisplay(...[, ...]) Partial Dependence Plot (PDP).

sklearn.isotonic: Isotonic regression

isotonic.IsotonicRegression(*[, y_min, ...]) Isotonic regression model.

isotonic.check_increasing(x, y) Determine whether y is monotonically correlated with x.

isotonic.isotonic_regression(y, *[, ...]) Solve the isotonic regression model.

sklearn.kernel_approximation: Kernel Approximation

kernel_approximation.AdditiveChi2Sampler(*) Approximate feature map for additive chi2 kernel.

kernel_approximation.Nystroem([kernel, ...]) Approximate a kernel map using a subset of the training data.

kernel_approximation.PolynomialCountSketch(*) Polynomial kernel approximation via Tensor Sketch.

kernel_approximation.RBFSampler(*[, gamma, ...]) Approximate a RBF kernel feature map using random Fourier features.

kernel_approximation.SkewedChi2Sampler(*[, ...]) Approximate feature map for "skewed chi-squared" kernel.

---------------

sklearn.metrics: Metrics

Model Selection Interface

metrics.check_scoring(estimator[, scoring, ...]) Determine scorer from user options.

metrics.get_scorer(scoring) Get a scorer from string.

metrics.get_scorer_names() Get the names of all available scorers.

metrics.make_scorer(score_func, *[, ...]) Make a scorer from a performance metric or loss function.

Classification metrics

metrics.accuracy_score(y_true, y_pred, *[, ...]) Accuracy classification score.

metrics.auc(x, y) Compute Area Under the Curve (AUC) using the trapezoidal rule.

metrics.average_precision_score(y_true, ...) Compute average precision (AP) from prediction scores.

metrics.balanced_accuracy_score(y_true, ...) Compute the balanced accuracy.

metrics.brier_score_loss(y_true, y_prob, *) Compute the Brier score loss.

metrics.class_likelihood_ratios(y_true, ...) Compute binary classification positive and negative likelihood ratios.

metrics.classification_report(y_true, y_pred, *) Build a text report showing the main classification metrics.

metrics.cohen_kappa_score(y1, y2, *[, ...]) Compute Cohen's kappa: a statistic that measures inter-annotator agreement.

metrics.confusion_matrix(y_true, y_pred, *) Compute confusion matrix to evaluate the accuracy of a classification.

metrics.dcg_score(y_true, y_score, *[, k, ...]) Compute Discounted Cumulative Gain.

metrics.det_curve(y_true, y_score[, ...]) Compute error rates for different probability thresholds.

metrics.f1_score(y_true, y_pred, *[, ...]) Compute the F1 score, also known as balanced F-score or F-measure.

metrics.fbeta_score(y_true, y_pred, *, beta) Compute the F-beta score.

metrics.hamming_loss(y_true, y_pred, *[, ...]) Compute the average Hamming loss.

metrics.hinge_loss(y_true, pred_decision, *) Average hinge loss (non-regularized).

metrics.jaccard_score(y_true, y_pred, *[, ...]) Jaccard similarity coefficient score.

metrics.log_loss(y_true, y_pred, *[, eps, ...]) Log loss, aka logistic loss or cross-entropy loss.

metrics.matthews_corrcoef(y_true, y_pred, *) Compute the Matthews correlation coefficient (MCC).

metrics.multilabel_confusion_matrix(y_true, ...) Compute a confusion matrix for each class or sample.

metrics.ndcg_score(y_true, y_score, *[, k, ...]) Compute Normalized Discounted Cumulative Gain.

metrics.precision_recall_curve(y_true, ...) Compute precision-recall pairs for different probability thresholds.

metrics.precision_recall_fscore_support(...) Compute precision, recall, F-measure and support for each class.

metrics.precision_score(y_true, y_pred, *[, ...]) Compute the precision.

metrics.recall_score(y_true, y_pred, *[, ...]) Compute the recall.

metrics.roc_auc_score(y_true, y_score, *[, ...]) Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.

metrics.roc_curve(y_true, y_score, *[, ...]) Compute Receiver operating characteristic (ROC).

metrics.top_k_accuracy_score(y_true, y_score, *) Top-k Accuracy classification score.

metrics.zero_one_loss(y_true, y_pred, *[, ...]) Zero-one classification loss.

Regression metrics

metrics.explained_variance_score(y_true, ...) Explained variance regression score function.

metrics.max_error(y_true, y_pred) The max_error metric calculates the maximum residual error.

metrics.mean_absolute_error(y_true, y_pred, *) Mean absolute error regression loss.

metrics.mean_squared_error(y_true, y_pred, *) Mean squared error regression loss.

metrics.mean_squared_log_error(y_true, y_pred, *) Mean squared logarithmic error regression loss.

metrics.median_absolute_error(y_true, y_pred, *) Median absolute error regression loss.

metrics.mean_absolute_percentage_error(...) Mean absolute percentage error (MAPE) regression loss.

metrics.r2_score(y_true, y_pred, *[, ...]) R^2 (coefficient of determination) regression score function.

metrics.mean_poisson_deviance(y_true, y_pred, *) Mean Poisson deviance regression loss.

metrics.mean_gamma_deviance(y_true, y_pred, *) Mean Gamma deviance regression loss.

metrics.mean_tweedie_deviance(y_true, y_pred, *) Mean Tweedie deviance regression loss.

metrics.d2_tweedie_score(y_true, y_pred, *) D^2 regression score function, fraction of Tweedie deviance explained.

metrics.mean_pinball_loss(y_true, y_pred, *) Pinball loss for quantile regression.

metrics.d2_pinball_score(y_true, y_pred, *) D^2 regression score function, fraction of pinball loss explained.

metrics.d2_absolute_error_score(y_true, ...) D^2 regression score function, fraction of absolute error explained.

Multilabel ranking metrics

metrics.coverage_error(y_true, y_score, *[, ...]) Coverage error measure.

metrics.label_ranking_average_precision_score(...) Compute ranking-based average precision.

metrics.label_ranking_loss(y_true, y_score, *) Compute Ranking loss measure.

Clustering metrics

metrics.adjusted_mutual_info_score(...[, ...]) Adjusted Mutual Information between two clusterings.

metrics.adjusted_rand_score(labels_true, ...) Rand index adjusted for chance.

metrics.calinski_harabasz_score(X, labels) Compute the Calinski and Harabasz score.

metrics.davies_bouldin_score(X, labels) Compute the Davies-Bouldin score.

metrics.completeness_score(labels_true, ...) Compute completeness metric of a cluster labeling given a ground truth.

metrics.cluster.contingency_matrix(...[, ...]) Build a contingency matrix describing the relationship between labels.

metrics.cluster.pair_confusion_matrix(...) Pair confusion matrix arising from two clusterings [R9ca8fd06d29a-1].

metrics.fowlkes_mallows_score(labels_true, ...) Measure the similarity of two clusterings of a set of points.

metrics.homogeneity_completeness_v_measure(...) Compute the homogeneity and completeness and V-Measure scores at once.

metrics.homogeneity_score(labels_true, ...) Homogeneity metric of a cluster labeling given a ground truth.

metrics.mutual_info_score(labels_true, ...) Mutual Information between two clusterings.

metrics.normalized_mutual_info_score(...[, ...]) Normalized Mutual Information between two clusterings.

metrics.rand_score(labels_true, labels_pred) Rand index.

metrics.silhouette_score(X, labels, *[, ...]) Compute the mean Silhouette Coefficient of all samples.

metrics.silhouette_samples(X, labels, *[, ...]) Compute the Silhouette Coefficient for each sample.

metrics.v_measure_score(labels_true, ...[, beta]) V-measure cluster labeling given a ground truth.

Biclustering metrics

metrics.consensus_score(a, b, *[, similarity]) The similarity of two sets of biclusters.

Distance metrics

metrics.DistanceMetric Uniform interface for fast distance metric functions.

Pairwise metrics

metrics.pairwise.additive_chi2_kernel(X[, Y]) Compute the additive chi-squared kernel between observations in X and Y.

metrics.pairwise.chi2_kernel(X[, Y, gamma]) Compute the exponential chi-squared kernel between X and Y.

metrics.pairwise.cosine_similarity(X[, Y, ...]) Compute cosine similarity between samples in X and Y.

metrics.pairwise.cosine_distances(X[, Y]) Compute cosine distance between samples in X and Y.

metrics.pairwise.distance_metrics() Valid metrics for pairwise_distances.

metrics.pairwise.euclidean_distances(X[, Y, ...]) Compute the distance matrix between each pair from a vector array X and Y.

metrics.pairwise.haversine_distances(X[, Y]) Compute the Haversine distance between samples in X and Y.

metrics.pairwise.kernel_metrics() Valid metrics for pairwise_kernels.

metrics.pairwise.laplacian_kernel(X[, Y, gamma]) Compute the laplacian kernel between X and Y.

metrics.pairwise.linear_kernel(X[, Y, ...]) Compute the linear kernel between X and Y.

metrics.pairwise.manhattan_distances(X[, Y, ...]) Compute the L1 distances between the vectors in X and Y.

metrics.pairwise.nan_euclidean_distances(X) Calculate the euclidean distances in the presence of missing values.

metrics.pairwise.pairwise_kernels(X[, Y, ...]) Compute the kernel between arrays X and optional array Y.

metrics.pairwise.polynomial_kernel(X[, Y, ...]) Compute the polynomial kernel between X and Y.

metrics.pairwise.rbf_kernel(X[, Y, gamma]) Compute the rbf (gaussian) kernel between X and Y.

metrics.pairwise.sigmoid_kernel(X[, Y, ...]) Compute the sigmoid kernel between X and Y.

metrics.pairwise.paired_euclidean_distances(X, Y) Compute the paired euclidean distances between X and Y.

metrics.pairwise.paired_manhattan_distances(X, Y) Compute the paired L1 distances between X and Y.

metrics.pairwise.paired_cosine_distances(X, Y) Compute the paired cosine distances between X and Y.

metrics.pairwise.paired_distances(X, Y, *[, ...]) Compute the paired distances between X and Y.

metrics.pairwise_distances(X[, Y, metric, ...]) Compute the distance matrix from a vector array X and optional Y.

metrics.pairwise_distances_argmin(X, Y, *[, ...]) Compute minimum distances between one point and a set of points.

metrics.pairwise_distances_argmin_min(X, Y, *) Compute minimum distances between one point and a set of points.

metrics.pairwise_distances_chunked(X[, Y, ...]) Generate a distance matrix chunk by chunk with optional reduction.

Plotting

metrics.ConfusionMatrixDisplay(...[, ...]) Confusion Matrix visualization.

metrics.DetCurveDisplay(*, fpr, fnr[, ...]) DET curve visualization.

metrics.PrecisionRecallDisplay(precision, ...) Precision Recall visualization.

metrics.PredictionErrorDisplay(*, y_true, y_pred) Visualization of the prediction error of a regression model.

metrics.RocCurveDisplay(*, fpr, tpr[, ...]) ROC Curve visualization.

calibration.CalibrationDisplay(prob_true, ...) Calibration curve (also known as reliability diagram) visualization.

sklearn.mixture: Gaussian Mixture Models

mixture.BayesianGaussianMixture(*[, ...]) Variational Bayesian estimation of a Gaussian mixture.

mixture.GaussianMixture([n_components, ...]) Gaussian Mixture.

sklearn.model_selection: Model Selection

Splitter Classes

model_selection.GroupKFold([n_splits]) K-fold iterator variant with non-overlapping groups.

model_selection.GroupShuffleSplit([...]) Shuffle-Group(s)-Out cross-validation iterator

model_selection.KFold([n_splits, shuffle, ...]) K-Folds cross-validator

model_selection.LeaveOneGroupOut() Leave One Group Out cross-validator

model_selection.LeavePGroupsOut(n_groups) Leave P Group(s) Out cross-validator

model_selection.LeaveOneOut() Leave-One-Out cross-validator

model_selection.LeavePOut(p) Leave-P-Out cross-validator

model_selection.PredefinedSplit(test_fold) Predefined split cross-validator

model_selection.RepeatedKFold(*[, n_splits, ...]) Repeated K-Fold cross validator.

model_selection.RepeatedStratifiedKFold(*[, ...]) Repeated Stratified K-Fold cross validator.

model_selection.ShuffleSplit([n_splits, ...]) Random permutation cross-validator

model_selection.StratifiedKFold([n_splits, ...]) Stratified K-Folds cross-validator.

model_selection.StratifiedShuffleSplit([...]) Stratified ShuffleSplit cross-validator

model_selection.StratifiedGroupKFold([...]) Stratified K-Folds iterator variant with non-overlapping groups.

model_selection.TimeSeriesSplit([n_splits, ...]) Time Series cross-validator

Splitter Functions

model_selection.check_cv([cv, y, classifier]) Input checker utility for building a cross-validator.

model_selection.train_test_split(*arrays[, ...]) Split arrays or matrices into random train and test subsets.

##ä Hyper-parameter optimizers

model_selection.GridSearchCV(estimator, ...) Exhaustive search over specified parameter values for an estimator.

model_selection.HalvingGridSearchCV(...[, ...]) Search over specified parameter values with successive halving.

model_selection.ParameterGrid(param_grid) Grid of parameters with a discrete number of values for each.

model_selection.ParameterSampler(...[, ...]) Generator on parameters sampled from given distributions.

model_selection.RandomizedSearchCV(...[, ...]) Randomized search on hyper parameters.

model_selection.HalvingRandomSearchCV(...[, ...]) Randomized search on hyper parameters.

Model validation

model_selection.cross_validate(estimator, X) Evaluate metric(s) by cross-validation and also record fit/score times.

model_selection.cross_val_predict(estimator, X) Generate cross-validated estimates for each input data point.

model_selection.cross_val_score(estimator, X) Evaluate a score by cross-validation.

model_selection.learning_curve(estimator, X, ...) Learning curve.

model_selection.permutation_test_score(...) Evaluate the significance of a cross-validated score with permutations.

model_selection.validation_curve(estimator, ...) Validation curve.

Visualization

model_selection.LearningCurveDisplay(*, ...) Learning Curve visualization.

model_selection.ValidationCurveDisplay(*, ...) Validation Curve visualization.

sklearn.multiclass: Multiclass classification

multiclass.OneVsRestClassifier(estimator, *) One-vs-the-rest (OvR) multiclass strategy.

multiclass.OneVsOneClassifier(estimator, *) One-vs-one multiclass strategy.

multiclass.OutputCodeClassifier(estimator, *) (Error-Correcting) Output-Code multiclass strategy.

sklearn.multioutput: Multioutput regression and classification

multioutput.ClassifierChain(base_estimator, *) A multi-label model that arranges binary classifiers into a chain.

multioutput.MultiOutputRegressor(estimator, *) Multi target regression.

multioutput.MultiOutputClassifier(estimator, *) Multi target classification.

multioutput.RegressorChain(base_estimator, *) A multi-label model that arranges regressions into a chain.

sklearn.naive_bayes: Naive Bayes

naive_bayes.BernoulliNB(*[, alpha, ...]) Naive Bayes classifier for multivariate Bernoulli models.

naive_bayes.CategoricalNB(*[, alpha, ...]) Naive Bayes classifier for categorical features.

naive_bayes.ComplementNB(*[, alpha, ...]) The Complement Naive Bayes classifier described in Rennie et al. (2003).

naive_bayes.GaussianNB(*[, priors, ...]) Gaussian Naive Bayes (GaussianNB).

naive_bayes.MultinomialNB(*[, alpha, ...]) Naive Bayes classifier for multinomial models.

sklearn.neighbors: Nearest Neighbors

neighbors.BallTree(X[, leaf_size, metric]) BallTree for fast generalized N-point problems

neighbors.KDTree(X[, leaf_size, metric]) KDTree for fast generalized N-point problems

neighbors.KernelDensity(*[, bandwidth, ...]) Kernel Density Estimation.

neighbors.KNeighborsClassifier([...]) Classifier implementing the k-nearest neighbors vote.

neighbors.KNeighborsRegressor([n_neighbors, ...]) Regression based on k-nearest neighbors.

neighbors.KNeighborsTransformer(*[, mode, ...]) Transform X into a (weighted) graph of k nearest neighbors.

neighbors.LocalOutlierFactor([n_neighbors, ...]) Unsupervised Outlier Detection using the Local Outlier Factor (LOF).

neighbors.RadiusNeighborsClassifier([...]) Classifier implementing a vote among neighbors within a given radius.

neighbors.RadiusNeighborsRegressor([radius, ...]) Regression based on neighbors within a fixed radius.

neighbors.RadiusNeighborsTransformer(*[, ...]) Transform X into a (weighted) graph of neighbors nearer than a radius.

neighbors.NearestCentroid([metric, ...]) Nearest centroid classifier.

neighbors.NearestNeighbors(*[, n_neighbors, ...]) Unsupervised learner for implementing neighbor searches.

neighbors.NeighborhoodComponentsAnalysis([...]) Neighborhood Components Analysis.

neighbors.kneighbors_graph(X, n_neighbors, *) Compute the (weighted) graph of k-Neighbors for points in X.

neighbors.radius_neighbors_graph(X, radius, *) Compute the (weighted) graph of Neighbors for points in X.

neighbors.sort_graph_by_row_values(graph[, ...]) Sort a sparse graph such that each row is stored with increasing values.

sklearn.neural_network: Neural network models

pipeline.FeatureUnion(transformer_list, *[, ...]) Concatenates results of multiple transformer objects.

pipeline.Pipeline(steps, *[, memory, verbose]) Pipeline of transforms with a final estimator.

pipeline.make_pipeline(*steps[, memory, verbose]) Construct a Pipeline from the given estimators.

pipeline.make_union(*transformers[, n_jobs, ...]) Construct a FeatureUnion from the given transformers.

sklearn.pipeline: Pipeline

see here

sklearn.preprocessing: Preprocessing and Normalization

preprocessing.Binarizer(*[, threshold, copy]) Binarize data (set feature values to 0 or 1) according to a threshold.

preprocessing.FunctionTransformer([func, ...]) Constructs a transformer from an arbitrary callable.

preprocessing.KBinsDiscretizer([n_bins, ...]) Bin continuous data into intervals.

preprocessing.KernelCenterer() Center an arbitrary kernel matrix

preprocessing.LabelBinarizer(*[, neg_label, ...]) Binarize labels in a one-vs-all fashion.

preprocessing.LabelEncoder() Encode target labels with value between 0 and n_classes-1.

preprocessing.MultiLabelBinarizer(*[, ...]) Transform between iterable of iterables and a multilabel format.

preprocessing.MaxAbsScaler(*[, copy]) Scale each feature by its maximum absolute value.

preprocessing.MinMaxScaler([feature_range, ...]) Transform features by scaling each feature to a given range.

preprocessing.Normalizer([norm, copy]) Normalize samples individually to unit norm.

preprocessing.OneHotEncoder(*[, categories, ...]) Encode categorical features as a one-hot numeric array.

preprocessing.OrdinalEncoder(*[, ...]) Encode categorical features as an integer array.

preprocessing.PolynomialFeatures([degree, ...]) Generate polynomial and interaction features.

preprocessing.PowerTransformer([method, ...]) Apply a power transform featurewise to make data more Gaussian-like.

preprocessing.QuantileTransformer(*[, ...]) Transform features using quantiles information.

preprocessing.RobustScaler(*[, ...]) Scale features using statistics that are robust to outliers.

preprocessing.SplineTransformer([n_knots, ...]) Generate univariate B-spline bases for features.

preprocessing.StandardScaler(*[, copy, ...]) Standardize features by removing the mean and scaling to unit variance.

preprocessing.TargetEncoder([categories, ...]) Target Encoder for regression and classification targets.

preprocessing.add_dummy_feature(X[, value]) Augment dataset with an additional dummy feature.

preprocessing.binarize(X, *[, threshold, copy]) Boolean thresholding of array-like or scipy.sparse matrix.

preprocessing.label_binarize(y, *, classes) Binarize labels in a one-vs-all fashion.

preprocessing.maxabs_scale(X, *[, axis, copy]) Scale each feature to the [-1, 1] range without breaking the sparsity.

preprocessing.minmax_scale(X[, ...]) Transform features by scaling each feature to a given range.

preprocessing.normalize(X[, norm, axis, ...]) Scale input vectors individually to unit norm (vector length).

preprocessing.quantile_transform(X, *[, ...]) Transform features using quantiles information.

preprocessing.robust_scale(X, *[, axis, ...]) Standardize a dataset along any axis.

preprocessing.scale(X, *[, axis, with_mean, ...]) Standardize a dataset along any axis.

preprocessing.power_transform(X[, method, ...]) Parametric, monotonic transformation to make data more Gaussian-like.

sklearn.random_projection: Random projection

random_projection.GaussianRandomProjection([...]) Reduce dimensionality through Gaussian random projection.

random_projection.SparseRandomProjection([...]) Reduce dimensionality through sparse random projection.

random_projection.johnson_lindenstrauss_min_dim(...) Find a 'safe' number of components to randomly project to.

sklearn.semi_supervised: Semi-Supervised Learning

semi_supervised.LabelPropagation([kernel, ...]) Label Propagation classifier.

semi_supervised.LabelSpreading([kernel, ...]) LabelSpreading model for semi-supervised learning.

semi_supervised.SelfTrainingClassifier(...) Self-training classifier.

sklearn.svm: Support Vector Machines

svm.LinearSVC([penalty, loss, dual, tol, C, ...]) Linear Support Vector Classification.

svm.LinearSVR(*[, epsilon, tol, C, loss, ...]) Linear Support Vector Regression.

svm.NuSVC(*[, nu, kernel, degree, gamma, ...]) Nu-Support Vector Classification.

svm.NuSVR(*[, nu, C, kernel, degree, gamma, ...]) Nu Support Vector Regression.

svm.OneClassSVM(*[, kernel, degree, gamma, ...]) Unsupervised Outlier Detection.

svm.SVC(*[, C, kernel, degree, gamma, ...]) C-Support Vector Classification.

svm.SVR(*[, kernel, degree, gamma, coef0, ...]) Epsilon-Support Vector Regression.

svm.l1_min_c(X, y, *[, loss, fit_intercept, ...]) Return the lowest bound for C.

sklearn.tree: Decision Trees

tree.DecisionTreeClassifier(*[, criterion, ...]) A decision tree classifier.

tree.DecisionTreeRegressor(*[, criterion, ...]) A decision tree regressor.

tree.ExtraTreeClassifier(*[, criterion, ...]) An extremely randomized tree classifier.

tree.ExtraTreeRegressor(*[, criterion, ...]) An extremely randomized tree regressor.

tree.export_graphviz(decision_tree[, ...]) Export a decision tree in DOT format.

tree.export_text(decision_tree, *[, ...]) Build a text report showing the rules of a decision tree.

tree.plot_tree(decision_tree, *[, ...]) Plot a decision tree.

sklearn.utils: Utilities

see here