
How Do I Obtain Individual Centroids Of K-Means Clusters Using NLTK (Python)

I have used NLTK to perform k-means clustering because I would like to change the distance metric to cosine distance. However, how do I obtain the centroids of all the clusters? kcluste
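For reference, here is a minimal sketch of what `nltk.cluster.util.cosine_distance` (the metric passed to `KMeansClusterer` in the solution) computes. It compares NLTK's function with 1 minus the cosine similarity computed directly in NumPy. The vectors `a`, `b` and `c` are made-up examples, not data from the question.

```python
import numpy as np
from nltk.cluster.util import cosine_distance

a = np.array([1.0, 2.0, 3.0])
b = 2 * a                       # same direction as a, twice as long
c = np.array([3.0, 2.0, 1.0])   # a different direction

# cosine_distance(u, v) is 1 minus the cosine of the angle between u and v,
# so scaling a vector does not change the distance
print(cosine_distance(a, b))    # close to 0.0
print(cosine_distance(a, c))

# The same quantity for a and c, computed directly with NumPy
manual = 1 - np.dot(a, c) / (np.linalg.norm(a) * np.linalg.norm(c))
print(np.isclose(cosine_distance(a, c), manual))  # True
```

Because only the direction of each vector matters, two rows whose features are proportional count as identical under this metric, even if their magnitudes differ a lot.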

Solution 1:

import nltk
import numpy as np
import pandas as pd
from nltk.cluster import KMeansClusterer

# Create a dummy DataFrame with 3 features
df = pd.DataFrame([[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]],
                  columns=['feature1', 'feature2', 'feature3'])
print(df)


obj = KMeansClusterer(2, distance=nltk.cluster.util.cosine_distance)  # 2 clusters, cosine distance
vectors = [np.array(f) for f in df.values]

df['predicted_cluster'] = obj.cluster(vectors, assign_clusters=True)
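`KMeansClusterer` picks its initial means at random, so the cluster ids (and the order of `means()`) can change from one run to the next. A minimal sketch, assuming it is acceptable for your use to pass a seeded `random.Random` through the constructor's `rng` argument and to set `repeats=5`; the seed 42 and the vector `new_point` are hypothetical choices for illustration. It also uses `classify()`, which returns the index of the nearest mean, to assign an unseen vector to one of the learned centroids.

```python
import random

import numpy as np
import pandas as pd
from nltk.cluster import KMeansClusterer
from nltk.cluster.util import cosine_distance

df = pd.DataFrame([[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]],
                  columns=['feature1', 'feature2', 'feature3'])
vectors = [np.array(row) for row in df.values]

# A seeded RNG makes the random initial means repeatable; repeats=5 runs
# k-means from several random starts and NLTK keeps one set of means
clusterer = KMeansClusterer(2, distance=cosine_distance,
                            rng=random.Random(42), repeats=5)
df['predicted_cluster'] = clusterer.cluster(vectors, assign_clusters=True)
print(df)

# classify() returns the index of the mean nearest to the given vector
new_point = np.array([100.0, 102.0, 104.0])  # hypothetical unseen vector
new_cluster = clusterer.classify(new_point)
print(new_cluster, clusterer.means()[new_cluster])
```

The index returned by `classify()` is the same kind of index used for `predicted_cluster`, so it can be used directly to look up the centroid in `means()`.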


print(obj.means())
# Output (cluster order can differ between runs, because the initial means are chosen at random):
# [array([50.055, 52.39 , 52.   ]), array([1.5 , 4.  , 5.75])]
# i.e. the mean of the three features for each of the 2 clusters, since num_means=2 was passed

# To store each row's cluster centre in the pandas DataFrame:
df['centroid'] = df['predicted_cluster'].apply(lambda x: obj.means()[x])
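If you would rather have the centroids as a table than as a list of arrays, a sketch like the following builds a DataFrame from `means()` and checks it against a plain pandas `groupby` average of each cluster's rows (with the default `normalise=False`, each NLTK mean is the arithmetic mean of its member vectors). The seed passed via `rng` is an arbitrary assumption, and `DataFrame.join` is shown here as one alternative to the row-wise `apply` for attaching each row's centroid.

```python
import random

import numpy as np
import pandas as pd
from nltk.cluster import KMeansClusterer
from nltk.cluster.util import cosine_distance

features = ['feature1', 'feature2', 'feature3']
df = pd.DataFrame([[1, 2, 3], [50, 51, 52], [2.0, 6.0, 8.5], [50.11, 53.78, 52]],
                  columns=features)
vectors = [np.array(row) for row in df[features].values]

# Seeded RNG only to make the run repeatable; the seed value is arbitrary
clusterer = KMeansClusterer(2, distance=cosine_distance, rng=random.Random(0))
df['predicted_cluster'] = clusterer.cluster(vectors, assign_clusters=True)

# One row per cluster id (0, 1), one column per feature
centroids = pd.DataFrame(clusterer.means(), columns=features)

# Cross-check: average the member rows of each cluster with pandas
by_group = df.groupby('predicted_cluster')[features].mean()
print(centroids)
print(np.allclose(centroids.loc[by_group.index].values, by_group.values))

# Attach each row's centroid as extra columns (feature1_centroid, ...)
with_centroids = df.join(centroids, on='predicted_cluster', rsuffix='_centroid')
print(with_centroids)
```

Building the centroid table once avoids calling `obj.means()` for every row, and the join keeps each centroid coordinate in its own numeric column instead of storing arrays in a single object column.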

