- Finish revising the paper.

[match/match.git] / paper / paper.tex
diff --git a/paper/paper.tex b/paper/paper.tex

index 2882ac9..7ce6770 100644 (file)
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -25,7 +25,9 @@
  \usepackage{pstricks}
  \usepackage{delarray}
  
-\title{Assigning Reviewers to Proposals}
+\def\code#1{\texttt{#1}}
+
+\title{Assigning Papers to Reviewers}
  
  \author{
  Samir Khuller\thanks{Research currently supported by CCF-0728839.
@@ -74,7 +76,7 @@ score indicating his/her level of interest in reviewing the paper and an
  (Some organizations may choose to use a single set of scores for both
  desirability and expertise. We believe that making this distinction may better
  model the real-world objective.)
-A reviewer may also report a conflict of interest with a particular paper,
+A reviewer may also declare a conflict of interest with a particular paper,
  meaning that he/she is forbidden to review the paper.
  
  We do not consider stable marriage type preference lists,
@@ -83,16 +85,16 @@ to produce. In this scheme, the papers are essentially grouped
  into a few categories.
  
  Let $N$ be the number of papers and $P$ be the number of reviewers.
-Suppose that each paper needs $r$ reviews, so a total of $rN$
+Suppose that each paper needs $q$ reviews, so a total of $qN$
  reviews need to be generated.
  Ideally, from the perspective of the papers, we would like to
-assign each paper the $r$ most qualified reviewers for the paper.
+assign each paper the $q$ most qualified reviewers for the paper.
  Of course, this could lead to a load imbalanced solution where
  the load on some program committee members is very high, and the
  load on others is low. On the other hand, we could insist 
  on a perfectly load balanced solution in which the number
  of papers assigned to each program committee member does not
-exceed $L= \lceil rN/P  \rceil$. However, this may lead to a solution
+exceed $L= \lceil qN/P  \rceil$. However, this may lead to a solution
  which is not optimal from the perspective of the papers.
  
  One of our goals is to study precisely this tradeoff, and allow each
@@ -107,11 +109,21 @@ have most liked to review.
  We would like to spread the load of the stinkers as evenly as possible.
   
  
-\section{Formulation as a Min-Cost Flow Problem}
+\section{Formulation as a Min-Cost Max-Flow Problem}
+
+\begin{figure*}
+\begin{center}
+\def\eg#1{\fbox{#1}}
+\centerline{\includexfig{flow}}
+\caption{Flow Construction.}
+\label{flow-fig}
+\end{center}
+\end{figure*}
  
-Our main approach is to formulate this as a min-cost flow problem.
+We formulate the assignment problem as a min-cost max-flow problem,
+where each unit of flow represents one review.
  The construction is somewhat involved in order to incorporate all
-the appropriate incentives.  It makes
+the desired incentives.  In several places, it makes
  use of sets of ``parallel edges'' of different costs connecting a
  single pair of nodes $(x, y)$ to allow flow to be sent from $x$ to $y$
  at a cost that grows faster than linear in the amount of the flow.  For
@@ -119,11 +131,12 @@ example, if there are five unit-capacity edges from $x$ to $y$ of costs
  1, 3, 5, 7, and 9, then any integer amount $f \le 5$ of flow can be
  sent from $x$ to $y$ at a cost of $f^2$.
  
-The construction is done as follows: we have a source $s$ and a sink $t$.
+An example of the graph is shown in Figure~\ref{flow-fig}.
+We have a source $s$ and a sink $t$.
  For each paper $j$ we create a set of nodes $p^1_j, p^2_j,p^3_j$, and for each
  reviewer $i$ we create a set of nodes $r^1_i, r^2_i, r^3_i$.  (The rationale
-for these sets is discussed below.)  See Figure~\ref{flow-fig} for an
-example.  Flow can pass from $s$ through one or more of the nodes $r^t_i$ and
+for these sets is discussed below.)
+Flow can pass from $s$ through one or more of the nodes $r^t_i$ and
  one or more of the nodes $p^t_j$ to the sink to represent a review
  by reviewer $i$ of paper $j$.
  
@@ -148,33 +161,41 @@ unless the load imbalance in the second solution is outweighed by
  other benefits.
  
  For each reviewer $i$ and proposal $j$, there is a unit-capacity edge from $i$
-to $j$ allowing that assignment to be made, unless the reviewer declared a
-conflict of interest, in which case the edge is not present.  The edge has
-cost $(10 + p_{ij})^2$, where $p_{ij}$ is the preference value
-expressed by reviewer $i$ for proposal $j$.
-We assume $p_{ij}$ is a value between 1 and 40 (as used by NSF).
-The cost function was chosen to provide an incentive to avoid really bad
-assignments without completely masking the difference between a good assignment
-and an excellent assignment.
-
-These purely additive assignment costs make the algorithm prefer better
-assignments in general but do nothing to promote fairness among reviewers or
-among papers.  To do that, we introduce additional reviewer-side costs and
-paper-side bonuses.  With respect to a
-reviewer $i$, we classify papers as interesting (preference value 1 to 15), boring
-(16 to 25), or very boring (26 to 40).  The edge for reviewer $i$ and paper
+to $j$ allowing that pair to be assigned, unless the reviewer declared a
+conflict of interest, in which case the edge is not present.  The edge cost is
+based on the desirability value $d_{ij}$ stated by reviewer $i$ for proposal
+$j$.  For values on the NSF scale of 1 (best) to 40 (worst), we chose the cost
+function $(10 + d_{ij})^2$, in an attempt to provide an incentive to avoid
+really bad matched pairs without completely masking the difference between a
+good matched pair and an excellent one.  This choice seeks only to achieve a
+natural relationship between a linear preference scale as normally interpreted
+and the costs to be used in the optimization.  We realize that strategic
+reviewers will take the cost function into account in choosing what desirability
+values to submit, in which case its form matters little.
+
+Alongside these purely additive per-review costs,
+we want to avoid an individual reviewer
+getting too many papers he/she does not like.
+With respect to a reviewer $i$, we classify papers as ``interesting'',
+``boring'', or ``very boring'' based on their desirability values;
+the thresholds for these classes are currently
+the same for all reviewers.  The edge for reviewer $i$ and paper
  $j$ leaves from $r^1_i$ if $j$ is interesting, $r^2_i$ if $j$ is boring, or
  $r^3_i$ if $j$ is very boring.  Since all edges from the source enter $r^1_i$,
-flow for boring and very boring assignments is forced to pass through a set of
-parallel edges from $r^1_i$ to $r^2_i$, and flow for very boring assignments
+flow for boring and very boring papers is forced to pass through a set of
+parallel edges from $r^1_i$ to $r^2_i$, and flow for very boring papers
  must pass through an additional set of parallel edges from $r^2_i$ to $r^3_i$.
  In each of these sets, we make the cost of the $l$th edge an increasing
  function of $l$ to provide an incentive to balance the load of boring
-assignments in the same way as for overload.
-
-Similarly, we wish to guarantee each paper at least one or two good reviews.
-With respect to a paper $j$, we classify reviewers as expert (preference value
-1 to 10), knowledgeable (11 to 20), or general (21 to 40).  Edges representing
+papers in the same way as for overload.
+
+Finally, we would like at least one or two of each paper's reviews to be well
+qualified, if possible.  The method is the same as that for reviewer fairness.
+With respect to a paper $j$, we classify reviewers as ``expert'',
+``knowledgeable'', or ``general'' by comparing the expertise values to uniform
+thresholds.  (Since this is the only place the expertise values are used, we
+effectively assume expertise takes on only these three values.)
+Edges representing
  expert reviews enter $p^1_j$, edges for knowledgeable reviews enter $p^2_j$,
  and edges for general reviews enter $p^3_j$; the edge to the sink leaves
  $p^3_j$.  A paper's first knowledgeable (or expert) review scores a bonus $c_2$
@@ -187,33 +208,75 @@ from $p^1_j$ to $p^2_j$ and from $p^2_j$ to $p^3_j$ in order to reach the sink.
  The choice to offer bonuses for two reviews was based on the value $r = 3$;
  this would be easy to change for other values of $r$.
  
-The cost of a flow is the sum of its reviewer overload costs,
-assignment costs, and reviewer boring / very boring load costs,
-minus paper bonuses.  Any one of those components can be traded off against
-the others.  We attempted to assign reasonable weights to each component,
-but it is difficult to know without testing the algorithm on real data.
-In any event, all the parameters are easy to tune to realize the priorities
-of a particular application.
-
-\begin{figure*}
-\begin{center}
-\centerline{\includexfig{flow}}
-\caption{Flow Construction.}
-\label{flow-fig}
-\end{center}
-\end{figure*}
-
-
-\section{Experimental Results} 
-Waiting for data from NSF. 
-
-Synthetic Data.
-
-\section{Conclusions}
-
-The source code for the current version of the proposal matcher may be
-browsed or downloaded at:
-\[\hbox{\url{TODO}}\]
+In the example in Figure~\ref{flow-fig},
+paper 1 is interesting to reviewer 1 and boring to reviewers 2 and 3.
+Reviewer 2 is expert on paper 1, with reviewers 1 and 3 merely knowledgeable.
+(Reviewer edges for paper 2 are not shown.)
+This illustrates how, in principle,
+the desirability and expertise relations might differ.
+Each is taken into account at a different stage of the construction.
+
+The cost of a flow (assignment) is the sum of its reviewer overload costs,
+per-review costs, and reviewer boring / very boring load costs,
+minus paper bonuses.  Any one of these components can be traded off against
+the others.
+
+\section{Evaluation}
+
+We have implemented the matching algorithm, based on the construction above, in
+Haskell.
+The construction is intended to illustrate
+ways to model real concerns that we find reasonable a priori.
+We do not have enough experience with real-world
+instances to be confident that each part of the construction serves its intended
+purpose or that the parameter values we have chosen are suitable.
+Fortunately, the parameter values are easy to change in the source code of our
+implementation, and even substantive changes to the graph structure are not too
+hard to make.  At a minimum, we believe that min-cost max-flow provides a
+reasonable framework for global optimization of an assignment.
+
+We have worked with Michael Hicks, program chair of POPL 2012, to apply our
+method to the paper assignment problem for that conference.
+The set of POPL 2012 reviewers consisted of the
+program committee (PC) and an external review committee (ERC),
+where the ERC served two purposes:
+\begin{itemize}
+\item To provide up to one knowledgeable (or expert) review per paper if
+prior knowledge of the topic was hard to find on the PC.
+\item To provide all reviews of papers with an author on the PC (``PC papers''),
+which were considered to
+pose an implicit conflict of interest to all PC members.
+\end{itemize}
+The special policy for ERC reviews of non-PC papers was realized via a more
+complex paper-side gadget, not described here.
+Based on an evaluation tool he wrote as well as manual inspection,
+Dr.~Hicks was satisfied that the decisions made by the matching tool
+closely reflected the best he could have done manually.
+
+We are looking for additional opportunities to apply the matching tool.
+Anyone interested is invited to contact us so we can help adapt it
+to the scenario and document the experience gained.
+
+%Waiting for data from NSF. 
+
+%Synthetic Data.
+
+\section{Getting the Tool}
+
+A distribution containing the source code for the matching tool as well as this
+document may be browsed or downloaded at (NOT YET):
+\[\hbox{\url{https://mattmccutchen.net/match/}}\]
+There are currently two branches:
+\begin{itemize}
+\item \code{master} has the tool as originally designed for NSF, with no
+distinction between desirability and expertise.
+\item \code{popl2012} is the basis of the version used for POPL 2012.  The main
+differences are that it has separate desirability and expertise, support for
+``fixing'' previously chosen reviewer-paper pairs (buggy, however),
+and the special ERC gadget.
+\end{itemize}
+We regret that we do not currently have a single well-maintained version of the
+tool to recommend.
  
  \begin{thebibliography}{99}