Mislabeled Data

User 1273 | 6/4/2015, 2:13:25 PM

Hello, if I have a small percentage of mislabeled data in a training set, what is the best strategy to deal with this in order to minimize errors during classification?

Comments

User 1592 | 6/8/2015, 4:52:26 AM

Hi Vince, ideally you should clean the data and fix the wrong labels before training. If that is not possible, you will have to train on the labels as they are, errors included. The problem is harder when the data set is skewed (imbalanced) and also contains mislabels. If the data set is balanced and only a small percentage of the labels are wrong, it is not so bad.
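
A rough back-of-the-envelope sketch of why the skewed case is harder. It assumes two classes and that every label is flipped independently with the same probability, which is a simplification of my own and may not match how your mislabels arise. The function name and the example counts are made up for illustration. It computes, in expectation, what fraction of the examples labeled "positive" are truly positive.

```python
def positive_label_purity(n_pos, n_neg, flip_rate):
    """Expected fraction of examples labeled positive that are truly positive.

    Assumption (not from the thread): each label is flipped to the other
    class independently with probability `flip_rate`, regardless of class.
    """
    kept_positives = n_pos * (1.0 - flip_rate)   # true positives still labeled positive
    flipped_negatives = n_neg * flip_rate        # true negatives mislabeled as positive
    return kept_positives / (kept_positives + flipped_negatives)


# Hypothetical data sets of 1000 examples each, both with 2% of labels flipped.
balanced = positive_label_purity(n_pos=500, n_neg=500, flip_rate=0.02)
skewed = positive_label_purity(n_pos=10, n_neg=990, flip_rate=0.02)

print(f"balanced: {balanced:.3f}")  # balanced: 0.980
print(f"skewed:   {skewed:.3f}")    # skewed:   0.331
```

Under these assumptions, the same 2% error rate leaves the balanced positive class 98% clean, while in the skewed set only about a third of the examples labeled positive really are positive, because the few true positives are outnumbered by flipped negatives.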