http://stevehanov.ca/blog... Finding Bieber: On removing duplicates from a set of documents | Steve Hanov