POSITIVE LABEL FREQUENCY THRESHOLD ALGORITHM FOR IMBALANCED CLASS DISTRIBUTION
Author: M. Kiruthiga
Volume 03 Issue 01 Year 2016 ISSN No: 2349-252X Page no: 18-21
Abstract:
Class imbalance is one of the major issues in classification: it degrades the performance of data mining, and it frequently arises when non-experts label the objects. Online outsourcing systems, such as Amazon’s Mechanical Turk, allow many users to label the same objects, often with poor quality. To handle the problem of imbalanced noisy labeling, an agnostic algorithm, Positive LAbel frequency Threshold (PLAT), is proposed. Its main objective is to integrate the multiple labels of each example and generate the training dataset. The method resolves the minority-sample issue and is also able to deal with imbalanced multiple noisy labeling. The algorithm is applied to imbalanced datasets collected from the UCI repository, and the obtained results show that PLAT performs better than other methods.
Keywords:
repeated labeling, majority voting, positive and negative labels
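As a rough illustration of the label-integration idea described in the abstract, the sketch below integrates multiple noisy labels per example by thresholding the positive label frequency instead of always using a strict majority vote. This is a simplified, hypothetical sketch of the general technique, not the paper's exact PLAT algorithm; in particular, the fixed `threshold` parameter stands in for PLAT's adaptively chosen cut point.

```python
# Sketch of positive-label-frequency thresholding for integrating
# multiple noisy labels. NOTE: this is an illustrative simplification;
# the published PLAT algorithm derives its threshold from the data
# rather than taking it as a fixed parameter.

def positive_frequency(labels):
    """Fraction of labelers that voted positive (1) for one example."""
    return sum(labels) / len(labels)

def integrate_labels(label_sets, threshold=0.5):
    """Assign the positive class to each example whose positive label
    frequency meets the threshold, otherwise the negative class.
    threshold=0.5 reduces to plain majority voting."""
    return [1 if positive_frequency(ls) >= threshold else 0
            for ls in label_sets]

# Three labelers per example; lowering the threshold recovers minority
# positives that strict majority voting would integrate as negative.
noisy = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
print(integrate_labels(noisy, threshold=0.5))   # majority voting
print(integrate_labels(noisy, threshold=1 / 3))  # more permissive cut
```

With an imbalanced class distribution, a minority-class example often receives only a minority of positive votes from noisy labelers, which is why shifting the integration threshold below 0.5 can matter.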
References:
- C. L. Blake and C. J. Merz, UCI repository of machine learning databases [Online]. Available: http://archive.ics.uci.edu/ml/, 1998.
- N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.
- P. Donmez, J. G. Carbonell, and J. Schneider, “Efficiently learning the accuracy of labeling sources for selective sampling,” in Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2009, pp. 259–268.
- A. Estabrooks, T. Jo, and N. Japkowicz, “A multiple resampling method for learning from imbalanced data sets,” Comput. Intell., vol. 20, no. 1, pp. 18–36, 2004.
- H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263–1284, Sep. 2009.
- H. Kajino, Y. Tsuboi, and H. Kashima, “A convex formulation for learning from crowds,” in Proc. 26th AAAI Conf. Artif. Intell., 2012, pp. 73–79.
- A. Kumar and M. Lease, “Modeling annotator accuracies for supervised learning,” in Proc. 4th ACM WSDM Workshop Crowdsourcing Search Data Mining, 2011, pp. 19–22.
- X. Y. Liu, J. Wu, and Z. H. Zhou, “Exploratory under-sampling for class imbalance learning,” in Proc. IEEE 6th Int. Conf. Data Mining, 2006, pp. 965–969.
- H. Y. Lo, J. C. Wang, H. M. Wang, and S. D. Lin, “Cost-sensitive multi-label learning for audio tag annotation and retrieval,” IEEE Trans. Multimedia, vol. 13, no. 3, pp. 518–529, Jun. 2011.
- C. Parker, “On measuring the performance of binary classifiers,” Knowl. Inform. Syst., vol. 35, no. 1, pp. 131–152, 2013.
- V. S. Sheng, “Simple multiple noisy label utilization strategies,” in Proc. IEEE 11th Int. Conf. Data Mining, 2011, pp. 635–644.
- V. S. Sheng, F. Provost, and P. Ipeirotis, “Get another label? Improving data quality and data mining using multiple, noisy labelers,” in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2008, pp. 614–622.
- P. Smyth, M. C. Burl, U. M. Fayyad, P. Perona, and P. Baldi, “Inferring ground truth from subjective labeling of Venus images,” Adv. Neural Inform. Process. Syst., vol. 8, pp. 1085–1092, 1995.
- R. Snow, B. O’Connor, D. Jurafsky, and A. Ng, “Cheap and fast— But is it good?” in Proc. Conf. Empirical Methods Natural Lang. Process., 2008, pp. 254–263.
- C. Strapparava and R. Mihalcea, “SemEval-2007 Task 14: Affective text,” in Proc. 4th Int. Workshop Semantic Eval., 2007, pp. 70–74.
- P. Welinder and P. Perona, “Online crowdsourcing: Rating annotators and obtaining cost-effective labels,” in Proc. Workshop Adv. Comput. Vis. Humans Loop, 2010, pp. 25–32.
- J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan, “Whose vote should count more: Optimal integration of labels from labelers of unknown expertise,” in Proc. Adv. Neural Info. Process. Syst. 22, 2009, pp. 2035–2043.
- J. Zhang, X. Wu, and V. S. Sheng, “Imbalanced multiple noisy labeling,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 2, Feb. 2015.