We applied SALR clustering to a set of ~2500 images of nuclei clumps (examples shown above) and compared the results with those of several of the leading and popular methods: MPV3, SPVQi1, SPVXu4, SBF5, gLoG6, MSER7, the distance transform local maximum. The comparison metric used is the F1 score, F1 = 2TP / (2TP + FN + FP), where TP, FN, and FP are the number of true positives, false negatives, and false positives.
The F1 score (F1=1: good, F1=0: bad) depends on how close a calculated seed-point must be to a true nuclei center before it is considered a TP point. For this reason, the F1 score is computed over a range, starting with the requirement that the seed-point and true center be very close to each other (δr=3 pixels), and going to the requirement that they need not be very close to each other (δr=10 pixels). Note, the average nuclei radius is 19 pixels.
The results of the comparison are in the figure above; SALR clustering is 8.2% better than the next best method at δr=3, and 3.0% better than the next best average F1 score (across δr).
The steps of applying SALR clustering to locating the centers of cell nuclei are briefly described below and a graphical animation of the process can be seen to the right. For details, please see the manuscript8.
1. An initial segmentation of the image is made with the goal of creating a binary mask of the objects, but not trying to separate the overlapping objects.9
2. The confining potential is created using V(r)=1/dt(~BW(r)), where dt() is the distance transform and ~ represents the not operator. This confining potential can be improved to be scale invariant by first scaling the distance transform to a range of [1, λmax] before computing the confining potential.
3. The particles are initialized. There are two important quantities worth mentioning, the particle charge q and the particle’s initial positions. The charge is set to q=N-β, where N is the number of particle used in the simulation and β=1/3; this helps to normalize the interaction strength of large particle clusters. There are several methods of initializing the particle positions. The method we use first computes a set of possible positions using centers of curvature of each boundary vertex from the binary mask. We then overlay a hexagonal lattice over the binary mask and select one of the possible positions from each hexagon. The size of the hexagons, defined by the particles Wigner-Seitz radius rs, controls the number of particles in the simulation.
4. The particles dynamics are modeled and the centers of each cluster of particles are used as the final seed-points.
In unsupervised machine learning and data mining, data often comes in the form of scatter point data where each point is an observation and each dimension is a different measured quantity. The easiest way (but not the only way) to apply SALR clustering to scatter point data is to bin the data: we overlay a grid on the data and then count the number of points in each bin (grid element). The result of this can be thought of as a (multi-dimensional) image.
We applied SALR clustering to a 5D data set that represents healthy and damaged cells. To the left, the count isosurfaces of a projection of the data to 3D are shown. The data set has one dense region (which are the healthy cells) with three primary low density regions extending from it. K-means does not work well and produces cluster centers too biased towards the high density region, but »SALR clustering is able to locate all regions well.«
The method of applying SALR clustering to scatter-point data is very similar to locating the center of nuclei above10, but there are a few differences.
X. Qi, F. Xing, D. J. Foran, and L. Yang, IEEE Transactions on Biomedical Engineering 59, 754 (2012). ↩ ↩2
X. Zhang, H. Su, L. Yang, and S. Zhang, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 07-12-June, 5361 (2015). ↩
B. Parvin, Q. Yang, J. Han, H. Chang, B. Rydberg, and M. H. Barcellos-Hoff, IEEE Transactions on Image Processing 16, 615 (2007). ↩
H. Xu, C. Lu, and M. Mandal, IEEE Journal of Biomedical and Health Informatics 18, 1729 (2014). ↩
P. Quelhas, M. Marcuzzo, A. M. Mendon¸ ca, and A. Campilho, IEEE Transactions on Medical Imaging 29, 1463 (2010). ↩
H. Xu, C. Lu, R. Berendt, N. Jha, M. Mandal, and S. Member, Journal of Biomedical and Health Informatics XX, 1 (2016). ↩
J. Matas, O. Chum, M. Urban, and T. Pajdla, In British Machine Vision Conference , 384 (2002). ↩
J. Kapaldo, X. Han, and D. Mary, (submitted). ↩
SALR clustering does not handle this step as it can be very different depending on the type of image. However, the binary mask used in creating the results were generated using an adaptive Otsu threshold12, and the manuscript8 shows that excellent results can also be obtained using the binary mask created by gLoG filtering followed by thresholding. ↩
Abstractly, the problems are almost the same: locate the center of partially overlapping convex regions. ↩
The distance transform will only work when the locally convex regions are about the same size in all directions. See Supplemental Note 3 of the manuscript8. ↩
N. Otsu, IEEE transactions on systems, man, and cybernetics 9, 62 (1979). ↩