K-Means is a popular algorithm in data science for discovering patterns by looking for ‘K’ clusters in a dataset. A cluster refers to a collection of data points aggregated together because of certain similarities (Dr. Michael J., towardsdatascience.com).
I was intrigued by this subject thanks to Sean Fruin from Sigma-AEC, who demonstrated possible use cases for this algorithm. Afterwards, I had a go at writing one in Python. The results were pretty, though I’m aware that it probably isn’t a complete implementation. But check it out anyway!
I’m not going to elaborate on the algorithm as there are plenty of good explanations online already. I’m just going to put mine out on display, and you can download the files below if you’re interested.

K-meansDemo.zip
The two key inputs to the Python script that follows are:
- ‘K’, the number of clusters
- The dataset of points to find clusters in
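Both inputs come straight from the graph, so it can be worth sanity-checking them before any clustering happens. The snippet below is a minimal sketch in plain Python, not part of the script itself: `validate_inputs` is a hypothetical helper, and the rule that ‘K’ should not exceed the number of points is an assumption made for illustration.

```python
def validate_inputs(k, points):
    """Hypothetical helper: sanity-check the two inputs before clustering."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(k, bool) or not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive whole number")
    points = list(points)
    if len(points) == 0:
        raise ValueError("points must not be empty")
    if k > len(points):
        # Assumption: asking for more clusters than points is a mistake
        raise ValueError("k cannot be larger than the number of points")
    return k, points
```

Calling `validate_inputs(k, points)` first turns a confusing failure further down (for example, taking the `min` of an empty list) into a readable error message.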
# Enable Python support and load DesignScript library
import clr
clr.AddReference('ProtoGeometry')
from Autodesk.DesignScript.Geometry import *
clr.AddReference('DSCoreNodes')
from DSCore import *

def ClusterCentroid(points, centroid):
    # Average position of a cluster; an empty cluster keeps its old centroid
    numOfPoints = len(points)
    if numOfPoints == 0:
        return centroid
    else:
        xAvg = sum([p.X for p in points]) / numOfPoints
        yAvg = sum([p.Y for p in points]) / numOfPoints
        zAvg = sum([p.Z for p in points]) / numOfPoints
        return Point.ByCoordinates(xAvg, yAvg, zAvg)

def RandomCentroids(k, points):
    # Pick k random centres inside the bounding box of the points
    xLst = [p.X for p in points]
    minX = min(xLst)
    maxX = max(xLst)
    yLst = [p.Y for p in points]
    minY = min(yLst)
    maxY = max(yLst)
    zLst = [p.Z for p in points]
    minZ = min(zLst)
    maxZ = max(zLst)
    centers = []
    for i in range(k):
        randX = Math.Random(minX, maxX)
        randY = Math.Random(minY, maxY)
        randZ = Math.Random(minZ, maxZ)
        center = Point.ByCoordinates(randX, randY, randZ)
        centers.append(center)
    return centers

def ClusterPoints(points, centroids):
    # Build one group per centroid, holding the points closest to it
    groups = []
    for centroid in centroids:
        group = []
        for point in points:
            nearestCentroid = min(centroids, key=lambda c: point.DistanceTo(c))
            if centroid is nearestCentroid:
                group.append(point)
        groups.append(group)
    return groups

# INPUTS
k = IN[0]       # number of clusters
points = IN[1]  # dataset of points
epochs = 5

# MAIN
# 1. Init centroids
centroids = RandomCentroids(k, points)
# 2. Assign points to nearest centroids
clusters = ClusterPoints(points, centroids)
# 3. Move centers to assigned points
for i in range(epochs):
    centroids = [ClusterCentroid(cluster, centroid) for cluster, centroid in zip(clusters, centroids)]
    clusters = ClusterPoints(points, centroids)

OUT = clusters, centroids
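For anyone who wants to try the same three steps outside Dynamo, here is a minimal sketch in plain Python. It is an illustration under a few assumptions rather than a drop-in replacement: points are plain `(x, y, z)` tuples instead of Dynamo `Point` objects, the standard `random` module stands in for `Math.Random`, and the function names and the `seed` parameter are hypothetical. It also walks through the points once per pass instead of once per centroid, which should give the same groups with less work.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def random_centroids(k, points, rng):
    """Pick k random centres inside the bounding box of the points."""
    lows = [min(axis) for axis in zip(*points)]
    highs = [max(axis) for axis in zip(*points)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
            for _ in range(k)]


def cluster_points(points, centroids):
    """Put each point into the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: distance(point, centroids[i]))
        groups[nearest].append(point)
    return groups


def cluster_centroid(group, fallback):
    """Mean of a group; an empty group keeps its previous centroid."""
    if not group:
        return fallback
    return tuple(sum(axis) / float(len(group)) for axis in zip(*group))


def k_means(k, points, epochs=5, seed=0):
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    # 1. Init centroids
    centroids = random_centroids(k, points, rng)
    # 2. Assign points to nearest centroids
    clusters = cluster_points(points, centroids)
    # 3. Move centers to assigned points, then reassign
    for _ in range(epochs):
        centroids = [cluster_centroid(g, c) for g, c in zip(clusters, centroids)]
        clusters = cluster_points(points, centroids)
    return clusters, centroids


# Usage: two loose blobs of points, split into k = 2 clusters
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 0), (11, 10, 0), (10, 11, 0)]
clusters, centroids = k_means(2, points)
```

Because the starting centroids are random, different seeds can lead to different groupings, and a centroid that starts far from every point may end up with an empty cluster.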
I hope you’ve enjoyed this post as much as I enjoyed building it. If you encounter any bugs using the script, do flag them to me; I expect I’ll have to refine it eventually. If you’re enjoying the content, do show your support through our Patreon page! Otherwise, happy coding!