Winners’ notes. CNSlab team on music instruments recognition

By Robert Coleman and Daniel Schoonover (CNSlab) from Cognitive NeuroSystems Lab, Department of Cognitive Science, UC Irvine, USA – 3rd in Music Instruments track of ISMIS 2011 Contest: Music Information Retrieval.

Two training datasets were provided, one larger one containing data taken from single instruments, and one smaller one with data from combinations of exactly two instruments. These two datasets contained both similar as well as unique labels. Overall, there were 32 literally distinct classes contained in the two training sets. Since the problem was one of multi-way classification, the first approach was the multi-layer perceptron. With 35 hidden neurons, the MLP was trained using Levenberg-Marquadt updating. The MLP was then used to evaluate the test set, and the top 2 activations (of the 32 output nodes) were assigned as labels to that sample point. This model performed with 38% accuracy. These results led us to believe that further investigation should be done on the test data, as MLP’s should perform significantly better than the nearest neighbor approach. Also, many inconsistencies existed between the two training set labels i.e. ‘alto sax’ and ‘saxophone’.  To investigate the distribution of the test samples, 32 ‘dummy’ scripts were submitted, each of which containing only one instrument class for both instruments and for every test sample. The resulting classification accuracy was collected for all the classes and represented the distribution of the preliminary test samples. Additionally, it was known that the preliminary and final test set was randomly chosen from the entire test set. Using this knowledge, the resulting distribution was used as priors on the 32 classes. Upon scrutinizing the returned test distribution, it was noticed that many of the classes which had similar names i.e. ‘clarinet’ vs. ‘B-flat clarinet’ only appeared as one class in the preliminary test set. With this knowledge, the classes which did not appear at all in the preliminary test set were either deleted, or their data combined with the classes which had similar names.

During initial investigation of the training data a traditional random forest (RF) classifier was used to test the baseline classifiability of the single instrument training dataset (details of the algorithm can be found in L. Breiman 2001).  A forest of 1000 decision stumps, each maximally ten splits deep, was trained.  Initial performance of this classifier was very good with error > 0.9%.  However, the traditional RF classifier is designed to handle discrete, scalar target values.  For this problem, training on the mixed interment data, with each datum belonging to two classes, would normally not have been feasible.  However, our group devised a method to train this algorithm using both the single instrument and mixed instrument training data.  We did so by generating new training sets, with one instance of the single instrument training data, and randomly sampling the mixed training data, with repeats and a non-uniform distribution that matched the prior information about the final test set that was gained from the dummy scripts, and labeling each repeated with one or the other of the two class labels provided by the training data.  This allowed the RF algorithm to be trained in a bootstrap-like method seeing the same datum several times, and seeing them with both labels attached to that datum.  Out-of-bag training error was optimal for the RF at roughly 300 trees, again each maximally ten splits deep.  Probability outputs for each class were obtained by the proportion of votes for that class to the total number of trees.

Initial leaderboard submissions determined classification success of the test data for this RF was 54.66% overall.  Next, a submission was made to the leaderboard by mirroring just the most probable RF class for each entry e.g. “cello,cello; violin,violin;…”.  Results from this submission had a leader board determined classification success of 46.02%, informing us that this RF algorithm was correctly selecting one of the two instruments in the test data 92% of the time, and the addition of the second most probable instrument correctly selecting the second instrument for roughly 16% of the entries.

The final model used a voting scheme to decide on the two instrument labels for each test sample. The first label was chosen from the highest RF vote. To decide instrument two, the two independently best performing MLPs were used with the RF probabilities. The output activations from the MLP’s and RF’s were weighted by each other, and by the prior distribution. Discarding the selection from the RF for label one, the highest vote from this ensemble was used to create the second label.

Special thanks to Dr Max Welling, Eli Bowen. All analysis was done in MATLAB, using the Neural Network and Randomforest-MATLAB toolboxes.

— Robert Coleman, Daniel Schoonover

%d bloggers like this: