X-Git-Url: https://mattmccutchen.net/match/match.git/blobdiff_plain/f3084293a1cf94d98fb44a7d427f012361c28afc..17e0995d8f3673797f5280189e28e727db454f7f:/paper/paper.tex diff --git a/paper/paper.tex b/paper/paper.tex index 2882ac9..7ce6770 100644 --- a/paper/paper.tex +++ b/paper/paper.tex @@ -25,7 +25,9 @@ \usepackage{pstricks} \usepackage{delarray} -\title{Assigning Reviewers to Proposals} +\def\code#1{\texttt{#1}} + +\title{Assigning Papers to Reviewers} \author{ Samir Khuller\thanks{Research currently supported by CCF-0728839. @@ -74,7 +76,7 @@ score indicating his/her level of interest in reviewing the paper and an (Some organizations may choose to use a single set of scores for both desirability and expertise. We believe that making this distinction may better model the real-world objective.) -A reviewer may also report a conflict of interest with a particular paper, +A reviewer may also declare a conflict of interest with a particular paper, meaning that he/she is forbidden to review the paper. We do not consider stable marriage type preference lists, @@ -83,16 +85,16 @@ to produce. In this scheme, the papers are essentially grouped into a few categories. Let $N$ be the number of papers and $P$ be the number of reviewers. -Suppose that each paper needs $r$ reviews, so a total of $rN$ +Suppose that each paper needs $q$ reviews, so a total of $qN$ reviews need to be generated. Ideally, from the perspective of the papers, we would like to -assign each paper the $r$ most qualified reviewers for the paper. +assign each paper the $q$ most qualified reviewers for the paper. Of course, this could lead to a load imbalanced solution where the load on some program committee members is very high, and the load on others is low. On the other hand, we could insist on a perfectly load balanced solution in which the number of papers assigned to each program committee member does not -exceed $L= \lceil rN/P \rceil$. However, this may lead to a solution +exceed $L= \lceil qN/P \rceil$. However, this may lead to a solution which is not optimal from the perspective of the papers. One of our goals is to study precisely this tradeoff, and allow each @@ -107,11 +109,21 @@ have most liked to review. We would like to spread the load of the stinkers as evenly as possible. -\section{Formulation as a Min-Cost Flow Problem} +\section{Formulation as a Min-Cost Max-Flow Problem} + +\begin{figure*} +\begin{center} +\def\eg#1{\fbox{#1}} +\centerline{\includexfig{flow}} +\caption{Flow Construction.} +\label{flow-fig} +\end{center} +\end{figure*} -Our main approach is to formulate this as a min-cost flow problem. +We formulate the assignment problem as a min-cost max-flow problem, +where each unit of flow represents one review. The construction is somewhat involved in order to incorporate all -the appropriate incentives. It makes +the desired incentives. In several places, it makes use of sets of ``parallel edges'' of different costs connecting a single pair of nodes $(x, y)$ to allow flow to be sent from $x$ to $y$ at a cost that grows faster than linear in the amount of the flow. For @@ -119,11 +131,12 @@ example, if there are five unit-capacity edges from $x$ to $y$ of costs 1, 3, 5, 7, and 9, then any integer amount $f \le 5$ of flow can be sent from $x$ to $y$ at a cost of $f^2$. -The construction is done as follows: we have a source $s$ and a sink $t$. +An example of the graph is shown in Figure~\ref{flow-fig}. +We have a source $s$ and a sink $t$. 
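+(To spell out the arithmetic behind the $f^2$ example above, an aside added
+purely for exposition: a minimum-cost solution never uses a costlier parallel
+edge while a cheaper one between the same pair is unused, so sending $f$ units
+across unit-capacity edges of costs $1, 3, \dots, 2f-1$ costs
+\[ 1 + 3 + \cdots + (2f-1) = \sum_{l=1}^{f} (2l-1) = f^2. \]
+More generally, any cost curve with non-decreasing marginal costs can be
+realized by such a set of parallel edges.)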
For each paper $j$ we create a set of nodes $p^1_j, p^2_j,p^3_j$, and for each reviewer $i$ we create a set of nodes $r^1_i, r^2_i, r^3_i$. (The rationale -for these sets is discussed below.) See Figure~\ref{flow-fig} for an -example. Flow can pass from $s$ through one or more of the nodes $r^t_i$ and +for these sets is discussed below.) +Flow can pass from $s$ through one or more of the nodes $r^t_i$ and one or more of the nodes $p^t_j$ to the sink to represent a review by reviewer $i$ of paper $j$. @@ -148,33 +161,41 @@ unless the load imbalance in the second solution is outweighed by other benefits. For each reviewer $i$ and proposal $j$, there is a unit-capacity edge from $i$ -to $j$ allowing that assignment to be made, unless the reviewer declared a -conflict of interest, in which case the edge is not present. The edge has -cost $(10 + p_{ij})^2$, where $p_{ij}$ is the preference value -expressed by reviewer $i$ for proposal $j$. -We assume $p_{ij}$ is a value between 1 and 40 (as used by NSF). -The cost function was chosen to provide an incentive to avoid really bad -assignments without completely masking the difference between a good assignment -and an excellent assignment. - -These purely additive assignment costs make the algorithm prefer better -assignments in general but do nothing to promote fairness among reviewers or -among papers. To do that, we introduce additional reviewer-side costs and -paper-side bonuses. With respect to a -reviewer $i$, we classify papers as interesting (preference value 1 to 15), boring -(16 to 25), or very boring (26 to 40). The edge for reviewer $i$ and paper +to $j$ allowing that pair to be assigned, unless the reviewer declared a +conflict of interest, in which case the edge is not present. The edge cost is +based on the desirability value $d_{ij}$ stated by reviewer $i$ for proposal +$j$. For values on the NSF scale of 1 (best) to 40 (worst), we chose the cost +function $(10 + d_{ij})^2$, in an attempt to provide an incentive to avoid +really bad matched pairs without completely masking the difference between a +good matched pair and an excellent one. This choice seeks only to achieve a +natural relationship between a linear preference scale as normally interpreted +and the costs to be used in the optimization. We realize that strategic +reviewers will take the cost function into account in choosing what desirability +values to submit, in which case its form matters little. + +Alongside these purely additive per-review costs, +we want to avoid an individual reviewer +getting too many papers he/she does not like. +With respect to a reviewer $i$, we classify papers as ``interesting'', +``boring'', or ``very boring'' based on their desirability values; +the thresholds for these classes are currently +the same for all reviewers. The edge for reviewer $i$ and paper $j$ leaves from $r^1_i$ if $j$ is interesting, $r^2_i$ if $j$ is boring, or $r^3_i$ if $j$ is very boring. Since all edges from the source enter $r^1_i$, -flow for boring and very boring assignments is forced to pass through a set of -parallel edges from $r^1_i$ to $r^2_i$, and flow for very boring assignments +flow for boring and very boring papers is forced to pass through a set of +parallel edges from $r^1_i$ to $r^2_i$, and flow for very boring papers must pass through an additional set of parallel edges from $r^2_i$ to $r^3_i$. 
 In each of these sets, we make
 the cost of the $l$th edge an increasing function of $l$ to provide an
 incentive to balance the load of boring
-assignments in the same way as for overload.
-
-Similarly, we wish to guarantee each paper at least one or two good reviews.
-With respect to a paper $j$, we classify reviewers as expert (preference value
-1 to 10), knowledgeable (11 to 20), or general (21 to 40).  Edges representing
+papers in the same way as for overload.
+
+Finally, we would like at least one or two of each paper's reviews to be well
+qualified, if possible.  The method is the same as that for reviewer fairness.
+With respect to a paper $j$, we classify reviewers as ``expert'',
+``knowledgeable'', or ``general'' by comparing the expertise values to uniform
+thresholds.  (Since this is the only place the expertise values are used, we
+effectively assume expertise takes on only these three values.)
+Edges representing
 expert reviews enter $p^1_j$, edges for knowledgeable reviews enter $p^2_j$,
 and edges for general reviews enter $p^3_j$; the edge to the sink leaves
 $p^3_j$.  A paper's first knowledgeable (or expert) review scores a bonus $c_2$
@@ -187,33 +208,75 @@ from $p^1_j$ to $p^2_j$ and from $p^2_j$ to $p^3_j$ in order to reach the sink.
-The choice to offer bonuses for two reviews was based on the value $r = 3$;
-this would be easy to change for other values of $r$.
+The choice to offer bonuses for two reviews was based on the value $q = 3$;
+this would be easy to change for other values of $q$.
 
-The cost of a flow is the sum of its reviewer overload costs,
-assignment costs, and reviewer boring / very boring load costs,
-minus paper bonuses.  Any one of those components can be traded off against
-the others.  We attempted to assign reasonable weights to each component,
-but it is difficult to know without testing the algorithm on real data.
-In any event, all the parameters are easy to tune to realize the priorities
-of a particular application.
-
-\begin{figure*}
-\begin{center}
-\centerline{\includexfig{flow}}
-\caption{Flow Construction.}
-\label{flow-fig}
-\end{center}
-\end{figure*}
-
-
-\section{Experimental Results}
-Waiting for data from NSF.
-
-Synthetic Data.
-
-\section{Conclusions}
-
-The source code for the current version of the proposal matcher may be
-browsed or downloaded at:
-\[\hbox{\url{TODO}}\]
+In the example in Figure~\ref{flow-fig},
+paper 1 is interesting to reviewer 1 and boring to reviewers 2 and 3.
+Reviewer 2 is expert on paper 1, with reviewers 1 and 3 merely knowledgeable.
+(Reviewer edges for paper 2 are not shown.)
+This illustrates how, in principle,
+the desirability and expertise relations might differ.
+Each is taken into account at a different stage of the construction.
+
+The cost of a flow (assignment) is the sum of its reviewer overload costs,
+per-review costs, and reviewer boring / very boring load costs,
+minus paper bonuses.  Any one of these components can be traded off against
+the others.
+
+\section{Evaluation}
+
+We have implemented the matching algorithm, based on the construction above, in
+Haskell.
+The construction is intended to illustrate
+ways to model real concerns that we find reasonable a priori.
+We do not have enough experience with real-world
+instances to be confident that each part of the construction serves its intended
+purpose or that the parameter values we have chosen are suitable.
+Fortunately, the parameter values are easy to change in the source code of our
+implementation, and even substantive changes to the graph structure are not too
+hard to make.
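+For illustration, the following sketch (illustrative only; it is not the
+actual source of the tool) builds the per-pair review edges of the
+construction, with the source, sink, fairness, and bonus gadgets assembled
+similarly.  The tier thresholds shown are the ones from the earlier
+NSF-oriented version of this section (desirability 1--15 / 16--25 / 26--40,
+expertise 1--10 / 11--20 / 21--40); together with the per-review cost
+function, they are exactly the kind of parameters meant above.
+\begin{verbatim}
+-- Illustrative sketch, not the tool's actual source: one unit-capacity
+-- edge per non-conflicting (reviewer, paper) pair.  Desirability picks
+-- the reviewer-side tier node, expertise picks the paper-side tier node.
+data Node = Source | Sink
+          | R Int Int   -- R i t: tier node t of reviewer i (t = 1, 2, 3)
+          | P Int Int   -- P j t: tier node t of paper j
+  deriving (Eq, Show)
+
+data Edge = Edge { from, to :: Node, cap, cost :: Int } deriving Show
+
+-- scores i j = Nothing on a conflict of interest, otherwise
+-- Just (desirability, expertise), both on the 1 (best) .. 40 (worst) scale.
+reviewEdges :: [Int] -> [Int] -> (Int -> Int -> Maybe (Int, Int)) -> [Edge]
+reviewEdges reviewers papers scores =
+  [ Edge (R i (interestTier d)) (P j (expertiseTier e)) 1 ((10 + d) ^ 2)
+  | i <- reviewers, j <- papers, Just (d, e) <- [scores i j] ]
+  where
+    interestTier  d | d <= 15 = 1 | d <= 25 = 2 | otherwise = 3
+    expertiseTier e | e <= 10 = 1 | e <= 20 = 2 | otherwise = 3
+\end{verbatim}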
At a minimum, we believe that min-cost max-flow provides a +reasonable framework for global optimization of an assignment. + +We have worked with Michael Hicks, program chair of POPL 2012, to apply our +method to the paper assignment problem for that conference. +The set of POPL 2012 reviewers consisted of the +program committee (PC) and an external review committee (ERC), +where the ERC served two purposes: +\begin{itemize} +\item To provide up to one knowledgeable (or expert) review per paper if +prior knowledge of the topic was hard to find on the PC. +\item To provide all reviews of papers with an author on the PC (``PC papers''), +which were considered to +pose an implicit conflict of interest to all PC members. +\end{itemize} +The special policy for ERC reviews of non-PC papers was realized via a more +complex paper-side gadget, not described here. +Based on an evaluation tool he wrote as well as manual inspection, +Dr.~Hicks was satisfied that the decisions made by the matching tool +closely reflected the best he could have done manually. + +We are looking for additional opportunities to apply the matching tool. +Anyone interested is invited to contact us so we can help adapt it +to the scenario and document the experience gained. + +%Waiting for data from NSF. + +%Synthetic Data. + +\section{Getting the Tool} + +A distribution containing the source code for the matching tool as well as this +document may be browsed or downloaded at (NOT YET): +\[\hbox{\url{https://mattmccutchen.net/match/}}\] +There are currently two branches: +\begin{itemize} +\item \code{master} has the tool as originally designed for NSF, with no +distinction between desirability and expertise. +\item \code{popl2012} is the basis of the version used for POPL 2012. The main +differences are that it has separate desirability and expertise, support for +``fixing'' previously chosen reviewer-paper pairs (buggy, however), +and the special ERC gadget. +\end{itemize} +We regret that we do not currently have a single well-maintained version of the +tool to recommend. \begin{thebibliography}{99}