\documentclass[11pt]{article}
\usepackage{url}
\usepackage[square,comma,numbers]{natbib}
\usepackage{color}

% Figure stuff. To add a new figure named `foo.fig', put
% `\includexfig{foo}' in the appropriate place and add `foo' to the
% figures line at the top of the Makefile.
\usepackage{ifpdf}
\ifpdf
  \usepackage[pdftex]{graphicx}
  \def\includexfig#1{\input{#1.pdf_t}}
\else
  \usepackage[dvips]{graphicx}
  \def\includexfig#1{\input{#1.ps_t}}
\fi

\usepackage{amssymb, amsmath, amsthm}
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}{Lemma}

\usepackage[letterpaper,left=2.6cm,right=2.6cm,top=2.6cm,bottom=2.6cm]{geometry}

\usepackage{floatflt}
\usepackage{pstricks}
\usepackage{delarray}
\title{Assigning Reviewers to Proposals}

\author{
Samir Khuller\thanks{Research currently supported by CCF-0728839.
Email: {\tt samir@cs.umd.edu}.}\\
Dept.\ of Computer Science\\
University of Maryland, College Park, MD 20742.
\and
Richard Matthew McCutchen\thanks{
The bulk of this work was done while Matt was at the Dept.\ of Computer Science,
University of Maryland, College Park, and supported by an REU Supplement to
CCF-0430650. Current email: {\tt matt@mattmccutchen.net}.}\\
%Department of Computer Science\\
%University of Maryland, College Park, MD 20742.
}

\date{}
\begin{document}

\maketitle

%\begin{abstract}
%
%\end{abstract}
\section{Introduction}
Assignment problems arise in a variety of settings.
At funding agencies such as the NSF, program directors who coordinate
panels face the major challenge of assigning proposals to reviewers.
It is important that each proposal receive
sufficient review by qualified experts; at the same time, we would like to
roughly balance the workload across reviewers and to honor the reviewers'
preferences for which proposals they would like to read. The same
issue arises for a program committee chair, who may have to assign
hundreds of papers to a program committee of
thirty to forty members.

%{\em What does CMT use? What does Easychair use?}

From now on, we focus on the problem of assigning papers to
reviewers.
We assume that each reviewer is given access to the
list of papers to be reviewed and gives each paper both a ``desirability''
score indicating his/her level of interest in reviewing the paper and an
``expertise'' score indicating how qualified he/she is to evaluate the paper.
(Some organizations may choose to use a single set of scores for both
desirability and expertise. We believe that making this distinction may better
model the real-world objective.)
A reviewer may also report a conflict of interest with a particular paper,
meaning that he/she is forbidden to review the paper.

We do not consider stable-marriage-style preference lists,
because a strict ranking of the papers would be rather tedious
to produce. In our scheme, the papers are instead essentially grouped
into a few categories.

Let $N$ be the number of papers and $P$ be the number of reviewers.
Suppose that each paper needs $r$ reviews, so a total of $rN$
reviews need to be generated.
Ideally, from the perspective of the papers, we would like to
assign each paper the $r$ reviewers most qualified to review it.
Of course, this could lead to a load-imbalanced solution in which
the load on some program committee members is very high and the
load on others is low. On the other hand, we could insist
on a perfectly load-balanced solution in which the number
of papers assigned to each program committee member does not
exceed $L = \lceil rN/P \rceil$. However, this may lead to a solution
that is not optimal from the perspective of the papers.
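(For instance, with $N = 100$ papers, $r = 3$ reviews per paper, and $P = 30$
reviewers, perfect balance corresponds to $L = \lceil 300/30 \rceil = 10$
papers per reviewer.)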

One of our goals is to study precisely this tradeoff: we allow each
reviewer to be assigned up to $L + C$ papers,
where $C$ is the {\em load tolerance}. We consider the question:
is it possible to obtain a high-quality
assignment with a fairly low value of $C$? One can also ask whether,
in such an assignment, the reviewers receive the papers that they would
have most liked to review.

{\em Stinkers} are papers that essentially no one wants to review.
We would like to spread the load of reviewing the stinkers as evenly as
possible across the reviewers.

\section{Formulation as a Min-Cost Flow Problem}

Our main approach is to formulate this as a min-cost flow
problem~\cite{Flow}.
The construction is somewhat involved in order to incorporate all
the appropriate incentives. It makes
use of sets of ``parallel edges'' of different costs connecting a
single pair of nodes $(x, y)$ to allow flow to be sent from $x$ to $y$
at a cost that grows faster than linearly in the amount of flow. For
example, if there are five unit-capacity edges from $x$ to $y$ of costs
1, 3, 5, 7, and 9, then any integer amount $f \le 5$ of flow can be
sent from $x$ to $y$ at a cost of $1 + 3 + \cdots + (2f-1) = f^2$,
since a min-cost flow uses the cheapest of these edges first.
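
In the illustrative Python sketches below (which are only a sketch of the
construction, not the actual implementation; all variable and function names
are our own), the graph is represented as a list of
\texttt{(tail, head, capacity, cost)} tuples that would be handed to any
integer min-cost max-flow solver. The following helper builds such a set of
parallel edges:
\begin{verbatim}
def add_parallel_edges(edges, x, y, count, cost_of):
    # Add `count' unit-capacity edges from x to y, the l-th of cost
    # cost_of(l).  A min-cost flow uses the cheapest edges first, so
    # sending f units from x to y costs cost_of(1) + ... + cost_of(f).
    for l in range(1, count + 1):
        edges.append((x, y, 1, cost_of(l)))

# Example: costs 1, 3, 5, 7, 9, so f units cost 1+3+...+(2f-1) = f^2.
demo = []
add_parallel_edges(demo, "x", "y", 5, lambda l: 2 * l - 1)
\end{verbatim}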

The construction is done as follows: we have a source $s$ and a sink $t$.
For each paper $j$ we create a set of nodes $p^1_j, p^2_j, p^3_j$, and for each
reviewer $i$ we create a set of nodes $r^1_i, r^2_i, r^3_i$. (The rationale
for these sets is discussed below.) See Figure~\ref{flow-fig} for an
example. Flow can pass from $s$ through one or more of the nodes $r^k_i$ and
one or more of the nodes $p^k_j$ to the sink to represent a review
by reviewer $i$ of paper $j$.

Each paper has an edge of capacity $r$ to
the sink, indicating that it needs $r$ reviews. In general, these
edges will constitute the minimum cut, so any maximum flow will saturate them
and thereby provide all the required reviews. We take a minimum-cost
maximum flow in order to provide the reviews in the ``best'' possible way.

Each reviewer has a zero-cost edge of capacity $L$ from the source so that
he/she can be assigned up to $L$ papers. If that were all, we would get a
perfectly load-balanced solution, but we may be able to improve the quality of
the assignments by allowing some imbalance.
Therefore, we allow each reviewer to be overloaded by up to $C$ papers
($C$ is the load tolerance) via a set
of $C$ additional unit-capacity edges from the source. We make the cost of the
$l$th edge an increasing function $f(l)$ to provide an incentive to
balance load across reviewers. Since $f$ is increasing, $2f(1) < f(1) + f(2)$,
so a solution that loads two reviewers each by $L+1$ will be preferred
to a solution that loads one reviewer by $L$ and the other by $L+2$,
unless the load imbalance in the second solution is outweighed by
other benefits.
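
Continuing the sketch (with illustrative values for the parameters and an
illustrative choice of the increasing overload cost $f$), the
source-to-reviewer edges could be built as follows:
\begin{verbatim}
import math

N, P, r = 100, 30, 3            # papers, reviewers, reviews per paper
L = math.ceil(r * N / P)        # base load per reviewer
C = 3                           # load tolerance (maximum overload)
reviewers, papers = range(P), range(N)

edges = []                      # (tail, head, capacity, cost) tuples

def overload_cost(l):
    return 100 * l              # illustrative increasing f(l)

for i in reviewers:
    # The first L papers assigned to reviewer i incur no load cost ...
    edges.append(("s", ("r1", i), L, 0))
    # ... and up to C more are allowed at increasing cost, which
    # encourages spreading any overload across reviewers.
    add_parallel_edges(edges, "s", ("r1", i), C, overload_cost)
\end{verbatim}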

For each reviewer $i$ and paper $j$, there is a unit-capacity edge from $i$
to $j$ allowing that assignment to be made, unless the reviewer declared a
conflict of interest, in which case the edge is not present. The edge has
cost $(10 + p_{ij})^2$, where $p_{ij}$ is the preference value
expressed by reviewer $i$ for paper $j$.
We assume $p_{ij}$ is a value between 1 and 40, with lower values indicating
stronger preference (the scale used by NSF).
The cost function was chosen to provide an incentive to avoid really bad
assignments without completely masking the difference between a good assignment
and an excellent assignment.
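
Continuing the sketch, the assignment edges could be generated as follows;
the helpers \texttt{reviewer\_node} and \texttt{paper\_node}, which choose the
endpoints among the nodes $r^k_i$ and $p^k_j$, are defined below where the two
classifications are introduced, so the loop is wrapped in a function to be
called after them. The data \texttt{pref} and \texttt{conflicts} are
illustrative stand-ins for the real input.
\begin{verbatim}
import random

random.seed(0)                  # toy preference data for illustration
pref = [[random.randint(1, 40) for j in papers] for i in reviewers]
conflicts = set()               # (reviewer, paper) pairs in conflict

def add_assignment_edges():
    for i in reviewers:
        for j in papers:
            if (i, j) in conflicts:
                continue        # conflict of interest: no edge
            p = pref[i][j]      # preference value, 1 (best) to 40
            edges.append((reviewer_node(i, p), paper_node(j, p),
                          1, (10 + p) ** 2))
\end{verbatim}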

These purely additive assignment costs make the algorithm prefer better
assignments in general but do nothing to promote fairness among reviewers or
among papers. To do that, we introduce additional reviewer-side costs and
paper-side bonuses. With respect to a
reviewer $i$, we classify papers as interesting (preference value 1 to 15), boring
(16 to 25), or very boring (26 to 40). The edge for reviewer $i$ and paper
$j$ leaves from $r^1_i$ if $j$ is interesting, $r^2_i$ if $j$ is boring, or
$r^3_i$ if $j$ is very boring. Since all edges from the source enter $r^1_i$,
flow for boring and very boring assignments is forced to pass through a set of
parallel edges from $r^1_i$ to $r^2_i$, and flow for very boring assignments
must pass through an additional set of parallel edges from $r^2_i$ to $r^3_i$.
In each of these sets, we make the cost of the $l$th edge an increasing
function of $l$ to provide an incentive to balance the load of boring
assignments in the same way as for overload.
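
The reviewer-side structure just described might look as follows in the
sketch (the increasing cost on the boring edges, like $f$, is a tunable
parameter; \texttt{boring\_cost} is an illustrative choice):
\begin{verbatim}
def reviewer_node(i, p):
    # Tail of an assignment edge, by reviewer i's interest level.
    if p <= 15: return ("r1", i)      # interesting
    if p <= 25: return ("r2", i)      # boring
    return ("r3", i)                  # very boring

def boring_cost(l):
    return 50 * l                     # illustrative increasing cost

for i in reviewers:
    # Boring assignments must cross r1 -> r2, and very boring ones
    # must also cross r2 -> r3; increasing costs on these parallel
    # edges spread such assignments across reviewers.  At most L + C
    # assignments pass through a reviewer, so L + C edges suffice.
    add_parallel_edges(edges, ("r1", i), ("r2", i), L + C, boring_cost)
    add_parallel_edges(edges, ("r2", i), ("r3", i), L + C, boring_cost)
\end{verbatim}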

Similarly, we wish to guarantee each paper at least one or two good reviews.
With respect to a paper $j$, we classify reviewers as expert (preference value
1 to 10), knowledgeable (11 to 20), or general (21 to 40). Edges representing
expert reviews enter $p^1_j$, edges for knowledgeable reviews enter $p^2_j$,
and edges for general reviews enter $p^3_j$; the edge to the sink leaves
$p^3_j$. A paper's first knowledgeable (or expert) review scores a bonus $c_2$
by traversing a unit-capacity edge of cost $-c_2$ from $p^2_j$ to $p^3_j$,
and an additional expert review scores another bonus $c_1$ by traversing a
unit-capacity edge of cost $-c_1$ from $p^1_j$ to $p^3_j$.
In addition to the bonus edges,
there are edges of zero cost and unlimited capacity that reviews can follow
from $p^1_j$ to $p^2_j$ and from $p^2_j$ to $p^3_j$ in order to reach the sink.
The choice to offer bonuses for two reviews was based on the value $r = 3$;
this would be easy to change for other values of $r$.
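
The paper-side structure could be sketched correspondingly, with illustrative
values for the bonuses $c_1$ and $c_2$:
\begin{verbatim}
c2 = 600    # bonus for a paper's first knowledgeable-or-better review
c1 = 400    # bonus for an additional expert review (both illustrative)

def paper_node(j, p):
    # Head of an assignment edge, by the reviewer's expertise level.
    if p <= 10: return ("p1", j)      # expert
    if p <= 20: return ("p2", j)      # knowledgeable
    return ("p3", j)                  # general

for j in papers:
    # Bonus edges with negative cost reward the first strong reviews.
    edges.append((("p2", j), ("p3", j), 1, -c2))
    edges.append((("p1", j), ("p3", j), 1, -c1))
    # Zero-cost edges let the remaining reviews through; capacity r
    # suffices since at most r reviews of paper j reach the sink.
    edges.append((("p1", j), ("p2", j), r, 0))
    edges.append((("p2", j), ("p3", j), r, 0))
    # Each paper needs r reviews.
    edges.append((("p3", j), "t", r, 0))

add_assignment_edges()
\end{verbatim}
Since every edge points forward from $s$ toward $t$, the graph is acyclic, so
the negative-cost bonus edges cannot create negative cycles, and the completed
edge list can be passed to any standard integer min-cost max-flow routine.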

The cost of a flow is the sum of its reviewer overload costs,
assignment costs, and reviewer boring/very-boring load costs,
minus the paper bonuses. Any one of these components can be traded off against
the others. We attempted to assign reasonable weights to each component,
but it is difficult to know whether the weights are right without testing the
algorithm on real data.
In any event, all the parameters are easy to tune to realize the priorities
of a particular application.

\begin{figure*}
\begin{center}
\centerline{\includexfig{flow}}
\caption{Flow Construction.}
\label{flow-fig}
\end{center}
\end{figure*}


\section{Experimental Results}
Waiting for data from NSF.

Synthetic Data.

\section{Conclusions}

The source code for the current version of the proposal matcher may be
browsed or downloaded at:
\[\hbox{\url{TODO}}\]

\begin{thebibliography}{99}

\bibitem{Flow}
R. K. Ahuja, T. L. Magnanti, and J. B. Orlin.
{\em Network Flows: Theory, Algorithms, and Applications}.
Prentice Hall, 1993.

\end{thebibliography}

\end{document}