## IEEE ICDM Contest – Overview of Top Solutions, part 2

November 5, 2010 1 Comment

**As previously announced, we publish further descriptions of top solutions of ****the IEEE ICDM Contest: TomTom Traffic Prediction for Intelligent GPS Navigation.**

**This time you may read**

**reports**

**on**

**Task 2 “Jams” and Task 3 “GPS”. Solutions of Task 1 can be found in Part 1.**

**If you want to ask the authors any questions feel free to comment below the post.**

### Task 2 “Jams”

By Łukasz Romaszko (*lukasz21212121*),** ****the winner, from the ****University of Warsaw****, Poland.**** **

The algorithm focuses on computing ranks for each street (the higher rank means greater probability of traffic jam) and the number of streets to be jammed. The highest ranked streets are given as the output. In particular it uses an adaptation of the k-Nearest Neighbors algorithm. The following description of the algorithm is simplified. The scheme presenting the algorithm idea is given below.

Computing street ranks and the number* p *of jammed streets consists of several steps. From the training data set there are chosen two ordered sets: *Similarity* and *LengthSimilarity*.* Similarity* is used to compute street ranks, while *LengthSimilarity* to determinate number *p*. On the basis of *Similarity* set and special functions algorithm generates an array RankingList. In the next step *RankingList* will be slightly modified by taking into consideration the locations of streets. The top *p* streets are given as the output.

*Algorithm*

Let us denote by *T* the test data: the set of identifiers of the 5 excluded segments followed by a sequence of major roads where the first jams occurred during initial 20 minutes of the simulation.

*Generating Similarity and LengthSimilarity ordered sets*

At the beginning the algorithm compares sequences from Training data to* T*. Two different measures were used to compute *Similarity *and* LengthSimilarity*. Sequences are compared based on the indices positions of roads which were jammed in both sequences. Let Δ be the difference between length of current sequence D and length of T. |T| denotes the number of jammed roads in T. The measure used to generate *Similarity* assumes that sequences of similar Δ length are more reliable. It is worth emphasizing that the metric used to generate *Similarity *set has the greatest influence on the result.

Sequences evaluation in *LengthSimilarity* was similar to that in* Similarity*, but took into consideration only sequences of Δ <=10. The algorithm for computing values in *Similarity* is described below: Read more of this post