Ford Classification Challenge

 

Background

This challenge problem was motivated by a potential automotive application. Abstractly, the problem is the classification of finite data sequences, in contrast to the more commonly encountered problem of classification based on feature vectors. The length of the sequences reflects the time available for making the classification decision. Presumably the task would be easier with longer sequences, but that would violate the requirements of the application. The number of available examples is somewhat limited, which reflects the nature and cost of the data acquisition process. The problem does not appear to have a simple solution that emerges from visual inspection of the sequences. This distinguishes it from other problems in our experience where, at least in some range of operation, examples from opposite classes are readily differentiated.
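To make "classification of finite data sequences" concrete, the sketch below assigns a query sequence the label of its nearest training sequence under Euclidean distance (a 1-nearest-neighbor rule). This is a generic baseline chosen for illustration, not the organizers' method; the function names are hypothetical, and it assumes each pattern is a fixed-length list of numbers labeled +1 or -1.

```python
import math
from typing import List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def predict_1nn(train_x: List[Sequence[float]], train_y: List[int],
                query: Sequence[float]) -> int:
    """Return the label (+1 or -1) of the training sequence closest to query."""
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("need one label per training sequence")
    best_label, best_dist = train_y[0], math.inf
    for seq, label in zip(train_x, train_y):
        dist = euclidean(seq, query)
        if dist < best_dist:  # strict '<': ties keep the earlier sequence
            best_label, best_dist = label, dist
    return best_label


# Toy usage with 3-sample sequences (real patterns are much longer).
train_x = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
train_y = [-1, 1]
print(predict_1nn(train_x, train_y, [0.9, 1.0, 1.1]))  # 1
```

Z-normalizing each sequence or using a shift-tolerant distance are common refinements of this baseline; neither is implied by the challenge description.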


Challenge
 

Data samples from an automotive subsystem were collected in batches of 500 samples per diagnostic session. The objective is a classifier that determines, after examining the samples, whether a certain symptom exists. To this end, batches of 500 samples were collected both when the symptom exists and when it does not. Some facts you may want to know about this data:

 

Fig 1: Example of a pattern when the symptom does not exist

Fig 2: Example of a pattern when the symptom exists

 

 

Two data sets are provided in this challenge.

  1. Ford_A

Data samples of known and hidden classification were collected in typical operating conditions, with minimal noise contamination.

  2. Ford_B

Data samples of known classification were collected in typical operating conditions, while data samples of hidden classification were collected under noisy conditions.

Note: the two data sets are not related to each other and should not be mixed.

Download Ford_A (6.55 MB)
Download Ford_B (7.60 MB)

Download Ford_A_valid_labels (1 KB)

Download Ford_B_valid_labels (1 KB)

Download Ford_A_test_labels (1 KB)

Download Ford_B_test_labels (1 KB)

Number of patterns per data set:

Name      Training   Validation   Testing
Ford_A    3271       330          1320
Ford_B    3306       330          810



Evaluation
 

At the start of the challenge (development phase), participants will have access only to labeled training data (where +1 indicates that the symptom exists and -1 indicates that it does not) and to unlabeled validation and test data. During this phase, feedback on your submissions will be based on your classifier's performance on the validation data only. The validation labels will be made available one month before the end of the challenge. The final ranking will be based only on your classifier's performance on the test data, whose labels will be revealed only when the challenge is over. Data for both sets will be provided in the following ASCII format:

 

 

Evaluation Criteria

Classifiers are evaluated on their accuracy; in case of a tie, the false positive rate of the classifier is also used.

Definitions:

Confusion Matrix

                     Predicted Negative   Predicted Positive
Negative Examples    a                    b
Positive Examples    c                    d

  

Accuracy = (a + d) / (a + b + c + d)     (1)

False positive rate = b / (a + b)     (2)
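Equations (1) and (2) can be computed from two label lists as in the sketch below. It assumes labels are the integers +1 and -1 used throughout this challenge; the function names are hypothetical helpers, not part of any official scoring script.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) laid out as in the confusion matrix.

    a: negatives predicted negative   b: negatives predicted positive
    c: positives predicted negative   d: positives predicted positive
    Labels: +1 = symptom exists, -1 = symptom does not exist.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    a = b = c = d = 0
    for t, p in zip(y_true, y_pred):
        if t not in (1, -1) or p not in (1, -1):
            raise ValueError("labels must be +1 or -1")
        if t == -1:
            if p == -1:
                a += 1
            else:
                b += 1
        else:
            if p == -1:
                c += 1
            else:
                d += 1
    return a, b, c, d


def accuracy(a: int, b: int, c: int, d: int) -> float:
    """Equation (1)."""
    return (a + d) / (a + b + c + d)


def false_positive_rate(a: int, b: int) -> float:
    """Equation (2); raises ZeroDivisionError if there are no negative examples."""
    return b / (a + b)


counts = confusion_counts([-1, -1, -1, 1, 1], [-1, 1, -1, 1, -1])
print(counts)                            # (2, 1, 1, 1)
print(accuracy(*counts))                 # 0.6
print(false_positive_rate(*counts[:2]))  # 0.3333333333333333
```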

 

Submissions

Email the results to nn_classification@comcast.net

During the development phase (December 2007 to February 1, 2008), the validation set labels will not be revealed, but you will receive feedback on your classifier's performance based on the [dataname]_valid.res file that you email. On February 2, 2008, the validation set labels will be revealed. The ranking of the contestants will be based only on their classifier performance on the test set.
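The ranking rule (test-set accuracy, with the false positive rate as a tie-breaker) can be sketched as a two-level sort. This assumes that higher accuracy ranks first and that, among equal accuracies, the lower false positive rate ranks first; the rules state only that the false positive rate "will also be used". Participant names and counts below are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical layout: participant name -> test-set counts (a, b, c, d).
Counts = Tuple[int, int, int, int]


def rank_participants(results: Dict[str, Counts]) -> List[str]:
    """Best first: accuracy descending, then false positive rate ascending.

    Fractions keep accuracy ties exact instead of relying on float equality.
    Participants still tied keep their input order (sorted() is stable).
    """
    def sort_key(name: str) -> Tuple[Fraction, Fraction]:
        a, b, c, d = results[name]
        acc = Fraction(a + d, a + b + c + d)  # Equation (1)
        fpr = Fraction(b, a + b)              # Equation (2)
        return (-acc, fpr)

    return sorted(results, key=sort_key)


results = {
    "team_x": (40, 10, 5, 45),   # accuracy 0.85, FPR 0.2
    "team_y": (45, 5, 10, 40),   # accuracy 0.85, FPR 0.1
    "team_z": (30, 20, 20, 30),  # accuracy 0.60, FPR 0.4
}
print(rank_participants(results))  # ['team_y', 'team_x', 'team_z']
```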

The results on each dataset should be formatted in ASCII files according to the following table:

Filename               Development   Challenge    Description                                 File Format
[dataname]_train.res   optional      compulsory   Classifier outputs for training examples    +1 / -1 indicating classifications
[dataname]_valid.res   compulsory    optional     Classifier outputs for validation examples  +1 / -1 indicating classifications
[dataname]_test.res    optional      compulsory   Classifier outputs for test examples        +1 / -1 indicating classifications
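The table gives the file names and label values but not the exact line layout, so the sketch below assumes one label per line, written with an explicit sign (+1 or -1), in the same order as the patterns in the corresponding data file. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Union


def format_res(labels: Iterable[int]) -> str:
    """One classification per line as '+1' or '-1' (assumed layout)."""
    lines = []
    for label in labels:
        if label not in (1, -1):
            raise ValueError(f"invalid label {label!r}; expected +1 or -1")
        lines.append(f"{int(label):+d}")  # '+d' forces an explicit sign
    return "".join(line + "\n" for line in lines)


def parse_res(text: str) -> List[int]:
    """Inverse of format_res; int() accepts both '+1' and '-1'."""
    return [int(token) for token in text.split()]


def write_res(path: Union[str, Path], labels: Iterable[int]) -> None:
    """Save labels to a file such as Ford_A_test.res."""
    Path(path).write_text(format_res(labels))


predictions = [1, -1, 1]  # hypothetical classifier outputs
print(format_res(predictions), end="")
# +1
# -1
# +1
```

Calling write_res("Ford_A_test.res", predictions) would save the same text to disk.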

Submitted files must be packaged in a single .zip archive, for example:

zip your_name_results.zip *.res

 


 

Participation

Participation is open to everyone starting in December 2007. The deadline for submitting your classification results and documentation is March 8, 2008.

 

Plan For Result Dissemination

The classification results, as well as the ranking of the participants, will be disclosed at the competition workshop at WCCI 2008.

 

Schedule

December 2007    Competition begins.

March 8, 2008    Deadline for submitting final results.

 

Contact information


Dr. Mahmoud Abou-Nasr

Dr. Lee Feldkamp

 

Results

Download Results

I had the pleasure of meeting some of you personally at WCCI 2008. I would like to extend the organizers' thanks to all who participated in the Ford Classification Challenge, wishing you all the best in all your future endeavors.

Congratulations to the competition winners:

Ford_A:            D'yakonov Alexander G (Russia)

Ford_B:            Gavin Cawley (UK)

Ford_A & Ford_B:   Lv Jun (China)