November 5, 2010 1 Comment
As previously announced, we publish further descriptions of top solutions of the IEEE ICDM Contest: TomTom Traffic Prediction for Intelligent GPS Navigation. This time you may read reports on Task 2 “Jams” and Task 3 “GPS”. Solutions of Task 1 can be found in Part 1.
If you want to ask the authors any questions feel free to comment below the post.
By Łukasz Romaszko (lukasz21212121), the winner, from the University of Warsaw, Poland.
The algorithm focuses on computing ranks for each street (the higher rank means greater probability of traffic jam) and the number of streets to be jammed. The highest ranked streets are given as the output. In particular it uses an adaptation of the k-Nearest Neighbors algorithm. The following description of the algorithm is simplified. The scheme presenting the algorithm idea is given below.
Computing street ranks and the number p of jammed streets consists of several steps. From the training data set there are chosen two ordered sets: Similarity and LengthSimilarity. Similarity is used to compute street ranks, while LengthSimilarity to determinate number p. On the basis of Similarity set and special functions algorithm generates an array RankingList. In the next step RankingList will be slightly modified by taking into consideration the locations of streets. The top p streets are given as the output.
Let us denote by T the test data: the set of identifiers of the 5 excluded segments followed by a sequence of major roads where the first jams occurred during initial 20 minutes of the simulation.
Generating Similarity and LengthSimilarity ordered sets
At the beginning the algorithm compares sequences from Training data to T. Two different measures were used to compute Similarity and LengthSimilarity. Sequences are compared based on the indices positions of roads which were jammed in both sequences. Let Δ be the difference between length of current sequence D and length of T. |T| denotes the number of jammed roads in T. The measure used to generate Similarity assumes that sequences of similar Δ length are more reliable. It is worth emphasizing that the metric used to generate Similarity set has the greatest influence on the result.
Sequences evaluation in LengthSimilarity was similar to that in Similarity, but took into consideration only sequences of Δ <=10. The algorithm for computing values in Similarity is described below: Read more of this post