Ford Classification Challenge
Background
This challenge problem was motivated by a potential automotive application. Abstractly, this problem amounts to classification of finite data sequences, in contrast to the more commonly encountered problem of classification based on feature vectors. The length of the sequences reflects the time available for making the classification decision. Presumably, the task would be easier if the sequence length were increased, but this would violate the requirements of the application. The number of examples available is somewhat limited; this reflects the nature and cost of the data acquisition process. This problem does not appear to have a simple solution that emerges from visual inspection of the sequences. This distinguishes it from others in our experience where, at least in some range of operation, examples from opposite classes are readily differentiated.
Challenge
Data samples from an automotive subsystem were collected in batches of 500 samples per diagnostic session. The objective is a classifier that will determine whether a certain symptom exists or does not exist after examining the samples. To this end, batches of 500 samples were collected when the symptom exists and batches of 500 samples were collected when the symptom does not exist. Some facts you may want to know about this data:

Fig 1: Example of a pattern when the symptom does not exist

Fig 2: Example of a pattern when the symptom exists
Two data sets are provided in this challenge.
Data samples of known and hidden classification were collected in typical operating conditions, with minimal noise contamination.
Data samples of known classification were collected in typical operating conditions, while data samples of hidden classification were collected under noisy conditions.
Note: the two data sets are not related to each other and should not be mixed.
Download
Ford_A (6.55 MB)
Download Ford_B (7.60 MB)
Download Ford_A_valid_labels (1 KB)
Download Ford_B_valid_labels (1 KB)
Download Ford_A_test_labels (1 KB)
Download Ford_B_test_labels (1 KB)
| Name | Number of Training Patterns | Number of Validation Patterns | Number of Testing Patterns |
| Ford_A | 3271 | 330 | 1320 |
| Ford_B | 3306 | 330 | 810 |
Evaluation
Evaluation Criteria
Classifiers are evaluated on accuracy. In case of a tie, the false positive rate of the classifier is also used.
Definitions:
Confusion Matrix
| | Predicted Negative | Predicted Positive |
| Negative Examples | a | b |
| Positive Examples | c | d |
Accuracy = (a + d) / (a + b + c + d) (1)
False positive rate = b / (a + b) (2)
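As an illustration of definitions (1) and (2), the following Python sketch computes both quantities from true and predicted labels encoded as +1 and -1. The function names and the choice to return 0.0 for the false positive rate when there are no negative examples are assumptions made for this example; they are not part of the challenge specification.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) using the cell names of the confusion matrix."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    a = b = c = d = 0
    for t, p in zip(y_true, y_pred):
        if t == -1 and p == -1:
            a += 1  # negative example, predicted negative
        elif t == -1 and p == 1:
            b += 1  # negative example, predicted positive
        elif t == 1 and p == -1:
            c += 1  # positive example, predicted negative
        elif t == 1 and p == 1:
            d += 1  # positive example, predicted positive
        else:
            raise ValueError(f"labels must be +1 or -1, got {t!r} and {p!r}")
    return a, b, c, d


def accuracy(a: int, b: int, c: int, d: int) -> float:
    """Equation (1): (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)


def false_positive_rate(a: int, b: int, c: int, d: int) -> float:
    """Equation (2): b / (a + b)."""
    negatives = a + b
    # Assumption: report 0.0 when the data contain no negative examples.
    return b / negatives if negatives else 0.0


if __name__ == "__main__":
    y_true = [-1, -1, -1, 1, 1, 1, 1, -1]
    y_pred = [-1, 1, -1, 1, -1, 1, 1, -1]
    counts = confusion_counts(y_true, y_pred)
    print(counts)                        # (3, 1, 1, 3)
    print(accuracy(*counts))             # 0.75
    print(false_positive_rate(*counts))  # 0.25
```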
Submissions
Email the results to nn_classification@comcast.net
During the development phase (December 2007 to February 1, 2008), validation set labels will not be revealed, but you will receive feedback on your classifier's performance based on the [dataname]_valid.res file that you email. On February 2, the validation set results will be revealed. The ranking of the contestants will be based on their classifier performance on the testing set only.
The results on each dataset should be formatted in ASCII files according to the following table:
| Filename | Development | Challenge | Description | File Format |
| [dataname]_train.res | optional | compulsory | Classifier outputs for training examples | + / - 1 indicating classifications |
| [dataname]_valid.res | compulsory | optional | Classifier outputs for validation examples | + / - 1 indicating classifications |
| [dataname]_test.res | optional | compulsory | Classifier outputs for test examples | + / - 1 indicating classifications |
Submitted files must be in a .zip archive format.
zip your_name_results.zip *.res
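The packaging step can also be scripted. The sketch below writes a list of +1 / -1 classifications to an ASCII .res file and bundles the .res files into a single archive with Python's standard zipfile module. The one-label-per-line layout and the helper names write_res and make_archive are assumptions for illustration; the table above only specifies ASCII files holding + / - 1 classifications.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Iterable, Sequence


def write_res(path: Path, labels: Sequence[int]) -> None:
    """Write classifications as ASCII text, one +1 or -1 per line.

    The one-per-line layout is an assumption, not a stated requirement.
    """
    for label in labels:
        if label not in (1, -1):
            raise ValueError(f"labels must be +1 or -1, got {label!r}")
    # The "+d" format always prints the sign, giving "+1" or "-1".
    text = "".join(f"{label:+d}\n" for label in labels)
    path.write_text(text, encoding="ascii")


def make_archive(archive: Path, res_files: Iterable[Path]) -> None:
    """Store the given files at the top level of a .zip archive."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for res_file in res_files:
            zf.write(res_file, arcname=res_file.name)


if __name__ == "__main__":
    # Demonstration in a temporary directory with made-up predictions.
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        res_path = out_dir / "Ford_A_test.res"
        write_res(res_path, [1, -1, -1, 1])
        archive_path = out_dir / "your_name_results.zip"
        make_archive(archive_path, sorted(out_dir.glob("*.res")))
        with zipfile.ZipFile(archive_path) as zf:
            print(zf.namelist())
```

Storing each file under its bare name keeps the archive flat, which mirrors what the zip command above produces when run inside the results directory.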
In addition to your classifications for the Ford_A and Ford_B sets, you are required to submit a 2-6 page document in IEEE format that describes the methodology and the software you used to classify the data. Only documents prepared in PDF format will be accepted.
Paper Size: US Letter format (8.5" x 11").
Paper Length: Maximum 6 pages, including figures, tables, and references.
Paper Formatting: Double column, single spaced, 10pt font.
Margins: Left, right, and bottom: 0.75" (19 mm). The top margin must be 0.75" (19 mm), except on the title page, where it must be 1" (25 mm).
File Size Limitation: 5.0MB.
Do not number your manuscript pages.
Note: Violations of any of the above paper specifications may result in disqualification.
Each participant is allowed only one final submission.
Participation
Participation is open to everyone from December 2007. The deadline for submitting your classification results and documentation is March 8, 2008.
Plan For Result Dissemination
The classification results and the ranking of the participants will be disclosed at the WCCI 2008 competition workshop.
Schedule
December 2007 Competition begins.
March 08, 2008 Deadline for submitting final results.
Contact information
Results
Download Results
I had the pleasure of meeting some of you personally at WCCI 2008. I would like to extend the organizers' thanks to all who participated in the Ford Classification Challenge, and to wish you all the best in all your future endeavors.
Congratulations to the competition winners:
Ford_A: D'yakonov Alexander G (Russia)
Ford_B: Gavin Cawley (UK)
Ford_A & Ford_B: Lv Jun (China)