1. The complete title of one (or more) paper(s) published in the open literature describing the work that the author claims describes a human-competitive result:

A Multiobjective Genetic Programming-Based Ensemble for Simultaneous Feature Selection and Classification

-------------------------------------------------------------------------------

2. The name, complete physical mailing address, e-mail address, and phone number of EACH author of EACH paper(s):

Kaustuv Nag
Chandpur, PO - Vidyanagar, Burdwan, West Bengal, India - 741319
kaustuv.nag@gmail.com
091-9232682253

Nikhil R. Pal
Electronics and Communication Sciences Unit, Indian Statistical Institute, 203 B. T. Road, Calcutta, India - 700108
nrpal59@gmail.com
091-9433905237

-------------------------------------------------------------------------------

3. The name of the corresponding author (i.e., the author to whom notices will be sent concerning the competition):

Kaustuv Nag
kaustuv.nag@gmail.com

-------------------------------------------------------------------------------

4. The abstract of the paper(s):

We present an integrated algorithm for simultaneous feature selection (FS) and design of diverse classifiers using a steady-state multiobjective genetic programming (GP), which minimizes three objectives: 1) false positives (FPs); 2) false negatives (FNs); and 3) the number of leaf nodes in the tree. Our method divides a c-class problem into c binary classification problems and evolves c sets of genetic programs to create c ensembles. During the mutation operation, our method exploits the fitness as well as the unfitness of features, which change dynamically with generations, with a view to using a set of highly relevant features with low redundancy. The classifiers of the ith class determine the net belongingness of an unknown data point to the ith class using a weighted voting scheme, which makes use of the FP and FN mistakes made on the training data. We test our method on eight microarray and 11 text data sets with diverse numbers of classes (from 2 to 44), large numbers of features (from 2000 to 49151), and high feature-to-sample ratios (from 1.03 to 273.1). We compare our method with a bi-objective GP scheme that does not use any FS or rule-size reduction strategy; this comparison demonstrates the effectiveness of the proposed FS and rule-size reduction schemes. Furthermore, we compare our method with four classification methods in conjunction with six feature selection algorithms and the full feature set. Our scheme performs the best for 380 out of 474 combinations of data set, algorithm, and FS method.

An illustrative sketch of the weighted voting scheme described above is given after item 10 below.

-------------------------------------------------------------------------------
5. A list containing one or more of the eight letters (A, B, C, D, E, F, G, or H) that correspond to the criteria (see above) that the author claims that the work satisfies:

B

-------------------------------------------------------------------------------

6. A statement stating why the result satisfies the criteria that the contestant claims (see examples of statements of human-competitiveness as a guide to aid in constructing this part of the submission):

In January 2013, Qinbao Song, Jingjie Ni, and Guangtao Wang published a paper in volume 25, issue 1 of IEEE Transactions on Knowledge and Data Engineering, titled "A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data". In that work, they reported results for 28 state-of-the-art pairs of feature selection algorithms (including the full feature set) and classification algorithms on several data sets. In our paper, we compared the results obtained by our method with the results reported by them and found that in 27 of these cases the improvement in performance of our method over the corresponding state-of-the-art pair of feature selection algorithm and classification algorithm was statistically significant. For the remaining pair, we found our result to be comparable. Therefore, we claim that the following criterion is satisfied: (B) The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal.

-------------------------------------------------------------------------------

7. A full citation of the paper (that is, author names; publication date; name of journal, conference, technical report, thesis, book, or book chapter; name of editors, if applicable, of the journal or edited book; publisher name; publisher city; page numbers, if applicable):

Title: A Multiobjective Genetic Programming-Based Ensemble for Simultaneous Feature Selection and Classification
Authors: Kaustuv Nag and Nikhil R. Pal
Date of acceptance: February 5, 2015
Date of publication: March 6, 2015
Date of current version: January 13, 2016
Issue Date: Feb. 2016
Journal: IEEE Transactions on Cybernetics
ISSN: 2168-2267
Editor-in-Chief: Jun Wang
Recommending Associate Editor: S. Mostaghim
Publisher: IEEE
Publisher City: New York
Volume: 46
Issue (No.): 2
Pages: 499-510
DOI: 10.1109/TCYB.2015.2404806
Link to publisher's website: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7055929

Note: We have permission from IEEE to post the published version of the paper on a website, subject to winning the Humies 2016 award. However, the posting must include the IEEE copyright notice (© 2016 IEEE) and the DOI, 10.1109/TCYB.2015.2404806, as part of the citation.

-------------------------------------------------------------------------------

8. A statement either that "any prize money, if any, is to be divided equally among the co-authors" OR a specific percentage breakdown as to how the prize money, if any, is to be divided among the co-authors:

"Any prize money, if any, is to be divided equally among the co-authors."

-------------------------------------------------------------------------------

9. A statement stating why the authors expect that their entry would be the "best":

Classification is one of the most frequently addressed and important problems in machine learning and pattern recognition. Increases in the number of features, the number of classes, and the feature-to-sample ratio make a classification task harder. Our experimental results empirically demonstrate that the proposed method can satisfactorily solve 19 classification tasks with diverse numbers of classes (from 2 to 44), large numbers of features (from 2000 to 49151), and high feature-to-sample ratios (from 1.03 to 273.1). For these problems, the improvement in performance of the proposed method over 27 pairs of state-of-the-art feature selection algorithms (including the full feature set) and classification algorithms is statistically significant, and its performance is comparable for the remaining pair.
Moreover, it outperformed another GP-based classification technique. Several embedded methods for feature selection and classification using evolutionary computing exist, but, to the best of our knowledge, none of them has been empirically shown to be as effective on problems with such large numbers of classes, large numbers of features, and high feature-to-sample ratios. Another notable point is that, unlike most GP-based systems, the proposed method in some cases found very small classification rules even for difficult problems. Thus, this work is a major step in machine learning and pattern recognition using evolutionary computation techniques.

-------------------------------------------------------------------------------

10. An indication of the general type of genetic or evolutionary computation used, such as GA (genetic algorithms), GP (genetic programming), ES (evolution strategies), EP (evolutionary programming), LCS (learning classifier systems), GE (grammatical evolution), GEP (gene expression programming), DE (differential evolution), etc.:

GP (Genetic Programming)
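-------------------------------------------------------------------------------

Illustrative sketch (referenced from item 4): The abstract describes a one-vs-all decomposition in which the classifiers of the ith class compute the net belongingness of an unknown point via a weighted vote derived from the FP and FN mistakes made on the training data. The minimal Python sketch below shows the general shape of such a scheme only. All names (BinaryRule, net_belongingness, classify) are hypothetical, and the weight formula used here, 1 - (FP + FN)/N (i.e., training accuracy), is an assumption for illustration; the paper defines its own weighting from the training FP and FN counts.

    from dataclasses import dataclass
    from typing import Callable, List, Sequence

    @dataclass
    class BinaryRule:
        """One evolved GP tree for a single class, with its training mistakes."""
        predict: Callable[[Sequence[float]], bool]  # True = "belongs to this class"
        fp: int  # false positives on training data
        fn: int  # false negatives on training data
        n: int   # number of training samples

        @property
        def weight(self) -> float:
            # Assumed weight: penalize training FP and FN mistakes.
            return max(0.0, 1.0 - (self.fp + self.fn) / self.n)

    def net_belongingness(x: Sequence[float], ensemble: List[BinaryRule]) -> float:
        """Weighted vote of one class's ensemble for x (higher = stronger claim)."""
        total = sum(r.weight for r in ensemble)
        if total == 0.0:
            return 0.0
        votes = sum(r.weight * (1.0 if r.predict(x) else -1.0) for r in ensemble)
        return votes / total

    def classify(x: Sequence[float], ensembles: List[List[BinaryRule]]) -> int:
        """Assign x to the class whose ensemble claims it most strongly."""
        return max(range(len(ensembles)),
                   key=lambda i: net_belongingness(x, ensembles[i]))

For a c-class problem, ensembles would hold c lists of rules, one per binary subproblem, mirroring the c ensembles evolved by the method.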