Data Classification Based on Compression

Authors

Avid Roman Gonzalez

DOI:

https://doi.org/10.33017/RevECIPeru2012.0012/

Keywords:

classification, NCD, data compression, similarity metric

Abstract

The volume of data in the digital age is enormous, and analyzing, processing, identifying, and classifying it in order to build a good data mining system, one that can index the information regardless of the amount and type of data, is no easy task. It is therefore becoming increasingly necessary to develop more effective methods that carry out these tasks automatically. This paper presents an overview of work from around the world that uses data compression techniques as the basis for a classification method. These techniques are based on Kolmogorov complexity and use it to implement a similarity metric between data. The main contribution of these methods is that they require no feature extraction process for classification, which makes them parameter-free, so they can be applied to any type of data: text, images, audio, etc.
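
To make the idea of a compression-based similarity metric concrete, the following is a minimal sketch of the Normalized Compression Distance (NCD) named in the keywords, in its commonly cited form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size. It is an illustration under stated assumptions, not the method of any specific work surveyed in the paper: zlib stands in for the compressor that approximates Kolmogorov complexity, and the names `compressed_size`, `ncd`, and `classify` are hypothetical helpers introduced here.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (assumed stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both inputs
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(sample: bytes, labeled: list) -> str:
    """Return the label of the reference item with the smallest NCD to sample.

    `labeled` is a list of (data, label) pairs. This is a plain
    nearest-neighbor rule, chosen here only for illustration.
    """
    nearest = min(labeled, key=lambda item: ncd(sample, item[0]))
    return nearest[1]


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    binary = bytes(range(256)) * 4
    sample = b"the lazy dog jumps over the quick brown fox " * 20
    print(classify(sample, [(text, "text"), (binary, "binary")]))
```

No features are extracted at any point: the only inputs are raw bytes, which is why the same sketch applies unchanged to text, images, or audio. A real compressor only approximates Kolmogorov complexity, so values can fall slightly outside the interval [0, 1], and the choice of compressor affects the results.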