Evaluation and comparison of open source software suites for data mining and knowledge discovery

Altalhi, Abdulrahman H.; Luna, J. M.; Vallejo, M. A.; Ventura, S.

Publication: WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
2017
Volume 7
Abstract
The growing interest in extracting useful knowledge from data to benefit the data owner is giving rise to multiple data mining tools. The research community is especially aware of the importance of open source data mining software in ensuring and easing the dissemination of novel data mining algorithms. The availability of these tools at no cost, together with the chance to better understand the approaches by examining their source code, gives the research community an opportunity to tune and improve the algorithms. Documentation, updates, variety of algorithms, extensibility, and interoperability, among others, can be major factors in motivating users to opt for a specific open source data mining tool. The aim of this paper is to evaluate 19 open source data mining tools and to provide the research community with an extensive study based on a wide set of features that any tool should satisfy. The evaluation is carried out by following two methodologies. The first is based on scores provided by experts, producing a subjective judgment of each tool. The second performs an objective analysis of which features are satisfied by each tool. The ultimate aim of this work is to provide the research community with an extensive study of the different features included in any data mining tool, from both a subjective and an objective point of view. Results reveal that RapidMiner, Konstanz Information Miner, and Waikato Environment for Knowledge Analysis are the tools that include the highest percentage of these features. © 2017 John Wiley & Sons, Ltd
