Data Science/AI
인공지능 #6-2 | 비지도학습 - K Means 알고리즘, 센트로이드, 엘보우 방법
토마토.
2023. 3. 7. 15:23
6. 비지도 학습¶
6-2. k-means¶
K-means 알고리즘¶
- 무작위로 k개 클러스터 중심을 고르기
- 각 샘플에서 가장 가까운 클러스터 중심을 찾아 해당 클러스터로 할당시키기
- 클러스터의 중심을 클러스터에 속한 샘플들의 평균값으로 변경하기
- 클러스터 중심이 수렴할 때까지 2~3 반복하기
In [16]:
# 데이터 가져오기
import numpy as np
fruits = np.load("fruits_300.npy")
fruits_2d = fruits.reshape(-1, 100*100)
In [17]:
# K-means 알고리즘
from sklearn.cluster import KMeans
km = KMeans(n_clusters=3, random_state=42)
km.fit(fruits_2d)
c:\Users\.venv\lib\site-packages\sklearn\cluster\_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning warnings.warn(
Out[17]:
KMeans(n_clusters=3, random_state=42)In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
KMeans(n_clusters=3, random_state=42)
In [19]:
print(km.labels_)
print(np.unique(km.labels_, return_counts=True))
[2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 0 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 0 0 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1] (array([0, 1, 2]), array([111, 98, 91], dtype=int64))
In [20]:
# 이미지 출력을 위한 유틸 함수
import matplotlib.pyplot as plt
def draw_fruits(arr, ratio=1):
n = len(arr)
rows = int(np.ceil(n/10))
cols = n if rows < 2 else 10
fig, axs = plt.subplots(
rows,
cols,
figsize=(cols*ratio, rows*ratio),
squeeze=False
)
for i in range(rows):
for j in range(cols):
if i * 10 + j < n:
axs[i, j].imshow(arr[i*10+j], cmap="gray_r")
axs[i, j].axis("off")
plt.show()
In [21]:
draw_fruits(fruits[km.labels_==0])
draw_fruits(fruits[km.labels_==1])
draw_fruits(fruits[km.labels_==2])
In [ ]:
# 클러스터의 중심
draw_fruits(km.cluster_centers_.reshape(-1, 100, 100), ratio=3)
최적의 k 찾기¶
- 사전에 클러스터 개수를 지정해야 한다
- inertia 이너셔: 클러스터 중심과 샘플 사이의 거리의 제곱 합
- elbow 방법: 클러스터 개수를 늘려가면서 이너셔의 변화를 관찰하여 최적의 클러스터 개수를 찾는 방법
In [22]:
inertia = []
for k in range(2, 7):
km = KMeans(n_clusters=k, random_state=42)
km.fit(fruits_2d)
inertia.append(km.inertia_)
plt.plot(range(2, 7), inertia)
plt.xlabel("k")
plt.ylabel("inertia")
plt.show()
c:\Users\.venv\lib\site-packages\sklearn\cluster\_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning warnings.warn( c:\Users\.venv\lib\site-packages\sklearn\cluster\_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning warnings.warn( c:\Users\.venv\lib\site-packages\sklearn\cluster\_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning warnings.warn( c:\Users\.venv\lib\site-packages\sklearn\cluster\_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning warnings.warn( c:\Users\.venv\lib\site-packages\sklearn\cluster\_kmeans.py:870: FutureWarning: The default value of `n_init` will change from 10 to 'auto' in 1.4. Set the value of `n_init` explicitly to suppress the warning warnings.warn(