Matt McCutchen's Web SiteStreaming-model clustering  (Top, Clusterer code, Simplified outliers proof, Bottom).  Email me about this page.

Streaming-model clustering

Status: dormant; 2007-2008 (supersedes any conflicting remarks left on this page; see the home page for definitions)

I have been working with Samir Khuller on streaming algorithms for variations of the k-center clustering problem.  Our paper titled "Streaming algorithms for k-center clustering with outliers and with anonymity" appeared in the APPROX2008 conference.  Here is its SpringerLink page; you can download the published paper there if you have a subscription.  In addition, the authors' version is available here; I may make corrections and updates to it in the future.

Note: Springer holds the copyright to this paper.  I am offering it here pursuant to the copyright transfer agreement, but further distribution may require copyright permission from Springer.

Last update: On 2008-08-28, I updated the publication notice and added a reference to this Web page.

FileSizeModification time
clustering.pdf2232462008-08-28 18:42:38 +0000

Here are my slides from the conference, which include nice demonstrations of the parallelized Scaling Algorithm and the streaming algorithm for outliers but do not describe the streaming algorithm for anonymity (see the paper for that).

FileSizeModification time
clustering-slides.pdf8020142008-08-27 03:22:02 +0000

Clusterer code[Top]

Here is a package of supplementary code in Haskell.  Currently it contains an implementation of the essential parts of the streaming algorithm for clustering with outliers and a simple demonstration.

Last update: On 2008-06-18, initial posting.

FileSizeModification time
clusterer.tar.bz234852008-06-18 20:08:43 +0000

Simplified outliers proof[Top]

Here is a simplified proof of correctness of the offline algorithm for k-center clustering with outliers.  (The algorithm is described in this paper.)  I found this proof on 2008-08-16.  I may add it to the main clustering paper at some point.

FileSizeModification time
outliers-proof.pdf577232010-12-08 16:16:18 +0000

Matt McCutchen's Web SiteStreaming-model clustering  (Top, Clusterer code, Simplified outliers proof, Bottom).  Email me about this page.
Modification time of this page's main source file: 2010-12-08 16:15:45 +0000
Except where otherwise noted, Matt McCutchen waives his copyright to the content of this site.  This site comes with absolutely no warranty.  Why?