diff --git a/scikit-learn/kmeans/README.md b/scikit-learn/kmeans/README.md
index 680453b..59c661d 100644
--- a/scikit-learn/kmeans/README.md
+++ b/scikit-learn/kmeans/README.md
@@ -17,6 +17,8 @@ Questions to [David Rotermund](mailto:davrot@uni-bremen.de)
 import numpy as np
 import matplotlib.pyplot as plt
 
+rng = np.random.default_rng(1)
+
 rng = np.random.default_rng()
 
 a_x = rng.normal(1.5, 1.0, size=(1000))
@@ -31,3 +33,61 @@ plt.show()
 ```
 
 ![image0](image0.png)
+
+## [sklearn.cluster.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html) and its [fit](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans.fit)
+
+```python
+class sklearn.cluster.KMeans(n_clusters=8, *, init='k-means++', n_init='warn', max_iter=300, tol=0.0001, verbose=0, random_state=None, copy_x=True, algorithm='lloyd')
+```
+
+> K-Means clustering.
+
+
+Attribute:
+> **cluster_centers_** : ndarray of shape (n_clusters, n_features)
+> Coordinates of cluster centers. If the algorithm stops before fully converging (see tol and max_iter), these will not be consistent with labels_.
+
+Method:
+```python
+fit(X, y=None, sample_weight=None)
+```
+
+> Compute k-means clustering
+> **X**: {array-like, sparse matrix} of shape (n_samples, n_features)
+> Training instances to cluster. It must be noted that the data will be converted to C ordering, which will cause a memory copy if the given data is not C-contiguous. If a sparse matrix is passed, a copy will be made if it’s not in CSR format.
+ +```python +import numpy as np +import matplotlib.pyplot as plt +from sklearn.cluster import KMeans + +rng = np.random.default_rng(1) + +a_x = rng.normal(1.5, 1.0, size=(1000))[:, np.newaxis] +a_y = rng.normal(3.0, 1.0, size=(1000))[:, np.newaxis] +data_a = np.concatenate((a_x, a_y), axis=1) + +b_x = rng.normal(0.0, 1.0, size=(1000))[:, np.newaxis] +b_y = rng.normal(0.0, 1.0, size=(1000))[:, np.newaxis] +data_b = np.concatenate((b_x, b_y), axis=1) + +data = np.concatenate((data_a, data_b), axis=0) + +kmeans = KMeans(n_clusters=2) +kmeans.fit(data) + + +plt.plot(a_x, a_y, "c.") +plt.plot(b_x, b_y, "m.") +plt.plot( + kmeans.cluster_centers_[0, 0], kmeans.cluster_centers_[0, 1], "k*", markersize=12 +) +plt.plot( + kmeans.cluster_centers_[1, 0], kmeans.cluster_centers_[1, 1], "k*", markersize=12 +) + +plt.show() +``` + +![image1](image1.png) + +
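The quoted `fit` documentation only says that it computes k-means clustering. With `algorithm='lloyd'`, as in the signature above, this amounts to alternating two steps until the centers stop moving or `max_iter` is reached: assign every sample to its nearest center, then move each center to the mean of its assigned samples. The NumPy-only sketch below illustrates that loop on the same two-blob data as the README. It is not scikit-learn's implementation: the helper name `lloyd_kmeans` is hypothetical, it performs a single run (no `n_init` restarts), it uses a simple farthest-point initialisation instead of `k-means++`, and its `tol` is an absolute threshold on the squared center shift rather than scikit-learn's variance-scaled tolerance.

```python
import numpy as np


def lloyd_kmeans(data, n_clusters, max_iter=300, tol=1e-4, seed=0):
    """Hypothetical helper: a single run of Lloyd's k-means iteration."""
    rng = np.random.default_rng(seed)

    def assign(centers):
        # Squared Euclidean distances, shape (n_samples, n_clusters).
        d2 = np.sum((data[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2, axis=2)
        return np.argmin(d2, axis=1)

    # Farthest-point initialisation (a simpler stand-in for k-means++):
    # one random sample, then repeatedly the sample farthest from all chosen centers.
    centers = [data[rng.integers(len(data))]]
    for _ in range(1, n_clusters):
        d2 = np.min([np.sum((data - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(data[np.argmax(d2)])
    centers = np.array(centers, dtype=float)

    for _ in range(max_iter):
        labels = assign(centers)
        # Move each center to the mean of its samples (keep it if the cluster is empty).
        new_centers = np.array(
            [
                data[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                for k in range(n_clusters)
            ]
        )
        # Absolute squared shift, unlike scikit-learn's variance-scaled tol.
        shift = np.sum((new_centers - centers) ** 2)
        centers = new_centers
        if shift <= tol:
            break

    # Final assignment so that the labels match the returned centers.
    return centers, assign(centers)


# Same two-blob data as in the README example.
rng = np.random.default_rng(1)

a_x = rng.normal(1.5, 1.0, size=(1000))[:, np.newaxis]
a_y = rng.normal(3.0, 1.0, size=(1000))[:, np.newaxis]
data_a = np.concatenate((a_x, a_y), axis=1)

b_x = rng.normal(0.0, 1.0, size=(1000))[:, np.newaxis]
b_y = rng.normal(0.0, 1.0, size=(1000))[:, np.newaxis]
data_b = np.concatenate((b_x, b_y), axis=1)

data = np.concatenate((data_a, data_b), axis=0)

centers, labels = lloyd_kmeans(data, n_clusters=2)

# Cluster indices are arbitrary; sort by y so the order is predictable.
centers = centers[np.argsort(centers[:, 1])]
print(centers)
```

Because the two blobs are drawn around (0.0, 0.0) and (1.5, 3.0), the two returned centers should land close to those means, which is also roughly where the black stars in the README plot are expected to sit. Sorting by the y coordinate is only there because k-means assigns cluster indices in no particular order.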