Breast cancer diagnosis is currently an extensively researched topic. Histopathological images are an effective means of diagnosing breast cancer, but extracting discriminative features from them is challenging. We therefore propose a method that uses phylogenetic diversity indexes to characterize these images and builds a model that classifies histopathological breast images into four classes: invasive carcinoma, in situ carcinoma, normal tissue, and benign lesion. We used four classifiers reported in the literature as among the most robust: XGBoost, random forest, multilayer perceptron, and support vector machine. In addition, we performed content-based image retrieval (CBIR) to corroborate the classification results and to suggest a ranking for sets of unlabeled images. The results were robust and indicate that the method is effective as a component of a computer-aided diagnosis (CADx) system to support specialists in large medical centers.
Keywords: Breast cancer; Computer-aided diagnosis; Content-based image retrieval; Histopathological images; Medical images
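The abstract does not say how the retrieval step ranks images or how it corroborates a classification. One common approach is to rank stored images by the distance between feature vectors and take a majority vote over the labels of the top-k results. The sketch below assumes this approach. It assumes each image is already described by a numeric feature vector (for example, its diversity indexes) and that similarity is Euclidean distance. The function names and the choice of k are hypothetical and are not taken from the paper.

```python
from collections import Counter

import numpy as np


def rank_by_similarity(query, database):
    """Return database row indices ordered from most to least similar to query.

    Assumption: query is a 1-D feature vector and database holds one
    feature vector per row; similarity is Euclidean distance.
    """
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    # Euclidean distance from the query to every stored feature vector
    distances = np.linalg.norm(database - query, axis=1)
    # A stable sort keeps tied images in their original order
    return np.argsort(distances, kind="stable")


def retrieve_label(query, database, labels, k=3):
    """Hypothetical check: majority label among the k most similar images."""
    top_k = rank_by_similarity(query, database)[:k]
    return Counter(labels[i] for i in top_k).most_common(1)[0][0]


features = np.array([[0.10, 0.20], [0.90, 0.80], [0.15, 0.25]])
labels = ["normal", "invasive", "normal"]
print(rank_by_similarity([0.12, 0.22], features))
print(retrieve_label([0.12, 0.22], features, labels, k=3))
```

With the toy data above, the ranking is [0 2 1] and the retrieved label is "normal". For unlabeled image sets, the same ranking can be used without the vote.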