Commit | Line | Data |
---|---|---|
f3084293 MM |
1 | \documentclass[11pt]{article} |
2 | \usepackage{url} | |
3 | \usepackage[square,comma,numbers]{natbib} | |
4 | \usepackage{color} | |
5 | ||
6 | % Figure stuff. To add a new figure named `foo.fig', put | |
7 | % `\includexfig{foo}' in the appropriate place and add `foo' to the | |
8 | % figures line at the top of the Makefile. | |
9 | \usepackage{ifpdf} | |
10 | \ifpdf | |
11 | \usepackage[pdftex]{graphicx} | |
12 | \def\includexfig#1{\input{#1.pdf_t}} | |
13 | \else | |
14 | \usepackage[dvips]{graphicx} | |
15 | \def\includexfig#1{\input{#1.ps_t}} | |
16 | \fi | |
17 | ||
18 | \usepackage{amssymb, amsmath, amsthm} | |
19 | \newtheorem{theorem}{Theorem} | |
20 | \newtheorem{lemma}{Lemma} | |
21 | ||
22 | \usepackage[letterpaper,left=2.6cm,right=2.6cm,top=2.6cm,bottom=2.6cm]{geometry} | |
23 | ||
24 | \usepackage{floatflt} | |
25 | \usepackage{pstricks} | |
26 | \usepackage{delarray} | |
27 | ||
28 | \title{Assigning Reviewers to Proposals} | |
29 | ||
30 | \author{ | |
31 | Samir Khuller\thanks{Research currently supported by CCF-0728839. | |
32 | Email:{\tt samir@cs.umd.edu}.}\\ | |
33 | Dept.\ of Computer Science\\ | |
34 | University of Maryland, College Park MD 20742. | |
35 | \and | |
36 | Richard Matthew McCutchen\thanks{ | |
37 | The bulk of this work was done while Matt was at the Dept.\ of Computer Science, | |
38 | University of Maryland, College Park, and supported by REU Supplement to | |
39 | CCF-0430650. Current email: {\tt matt@mattmccutchen.net}.}\\ | |
40 | %Department of Computer Science\\ | |
41 | %University of Maryland, College Park, MD 20742. | |
42 | } | |
43 | ||
44 | \date{} | |
45 | ||
46 | \begin{document} | |
47 | ||
48 | \maketitle | |
49 | ||
50 | %\begin{abstract} | |
51 | % | |
52 | %\end{abstract} | |
53 | ||
54 | \section{Introduction} | |
55 | Assignment problems arise in a variety of settings. | |
56 | For funding agencies such as NSF program directors | |
57 | that co-ordinate panels, assigning proposals to | |
58 | reviewers is a major challenge. It is important that each proposal receive | |
59 | sufficient review by qualified experts, and at the same time we would like to | |
60 | roughly balance the workload across reviewers and to honor the reviewers' | |
61 | preferences for which proposals they would like to read. The same | |
62 | issue arises for a program committee chair, who may have to assign | |
63 | literally hundreds of papers to a program committee consisting of | |
64 | thirty to forty program committee members. | |
65 | ||
66 | %{\em What does CMT use? What does Easychair use?} | |
67 | ||
68 | From now on we will focus on the problem of assigning papers to | |
69 | reviewers. | |
70 | We assume that each reviewer is given access to the | |
71 | list of papers to be reviewed, and gives each paper both a ``desirability'' | |
72 | score indicating his/her level of interest in reviewing the paper and an | |
73 | ``expertise'' score indicating how qualified he/she is to evaluate the paper. | |
74 | (Some organizations may choose to use a single set of scores for both | |
75 | desirability and expertise. We believe that making this distinction may better | |
76 | model the real-world objective.) | |
77 | A reviewer may also report a conflict of interest with a particular paper, | |
78 | meaning that he/she is forbidden to review the paper. | |
79 | ||
80 | We do not consider stable marriage type preference lists, | |
81 | because a strict ranking of papers would be rather tedious | |
82 | to produce. In this scheme, the papers are essentially grouped | |
83 | into a few categories. | |
84 | ||
85 | Let $N$ be the number of papers and $P$ be the number of reviewers. | |
86 | Suppose that each paper needs $r$ reviews, so a total of $rN$ | |
87 | reviews need to be generated. | |
88 | Ideally, from the perspective of the papers, we would like to | |
89 | assign each paper the $r$ most qualified reviewers for the paper. | |
90 | Of course, this could lead to a load imbalanced solution where | |
91 | the load on some program committee members is very high, and the | |
92 | load on others is low. On the other hand, we could insist | |
93 | on a perfectly load balanced solution in which the number | |
94 | of papers assigned to each program committee member does not | |
95 | exceed $L= \lceil rN/P \rceil$. However, this may lead to a solution | |
96 | which is not optimal from the perspective of the papers. | |
97 | ||
98 | One of our goals is to study precisely this tradeoff, and allow each | |
99 | reviewer to be assigned up to $L + C$ papers, | |
100 | where $C$ is the {\em load tolerance}. We consider the question: | |
101 | is it possible to obtain a high quality | |
102 | assignment with a fairly low value of $C$? One can also ask whether, | |
103 | in such an assignment, the reviewers receive the papers that they would | |
104 | have most liked to review. | |
105 | ||
106 | {\em Stinkers} are papers that pretty much no-one wanted to review. | |
107 | We would like to spread the load of the stinkers as evenly as possible. | |
108 | ||
109 | ||
110 | \section{Formulation as a Min-Cost Flow Problem} | |
111 | ||
112 | Our main approach is to formulate this as a min-cost flow problem. | |
113 | The construction is somewhat involved in order to incorporate all | |
114 | the appropriate incentives. It makes | |
115 | use of sets of ``parallel edges'' of different costs connecting a | |
116 | single pair of nodes $(x, y)$ to allow flow to be sent from $x$ to $y$ | |
117 | at a cost that grows faster than linear in the amount of the flow. For | |
118 | example, if there are five unit-capacity edges from $x$ to $y$ of costs | |
119 | 1, 3, 5, 7, and 9, then any integer amount $f \le 5$ of flow can be | |
120 | sent from $x$ to $y$ at a cost of $f^2$. | |
121 | ||
122 | The construction is done as follows: we have a source $s$ and a sink $t$. | |
123 | For each paper $j$ we create a set of nodes $p^1_j, p^2_j,p^3_j$, and for each | |
124 | reviewer $i$ we create a set of nodes $r^1_i, r^2_i, r^3_i$. (The rationale | |
125 | for these sets is discussed below.) See Figure~\ref{flow-fig} for an | |
126 | example. Flow can pass from $s$ through one or more of the nodes $r^t_i$ and | |
127 | one or more of the nodes $p^t_j$ to the sink to represent a review | |
128 | by reviewer $i$ of paper $j$. | |
129 | ||
130 | Each paper has an edge of capacity $r$ to | |
131 | the sink, indicating that it needs $r$ reviews. In general, these | |
132 | edges will constitute the min cut, so any max flow will saturate them | |
133 | and thereby provide all the required reviews. We take the min-cost | |
134 | max flow in order to provide the reviews in the ``best'' possible way. | |
135 | ||
136 | Each reviewer has a zero-cost edge of capacity $L$ from the source so that | |
137 | he/she can be assigned $L$ papers. If that were all, we would get a perfectly | |
138 | load-balanced solution, but we may be able to improve the quality of the | |
139 | assignments by allowing some imbalance. | |
140 | Therefore, we allow each reviewer to be overloaded by up to $C$ papers | |
141 | ($C$ is the load imbalance parameter) via a set | |
142 | of $C$ additional unit-capacity edges from the source. We make the cost of the | |
143 | $l$th edge an increasing function $f(l)$ to provide an incentive to | |
144 | balance load across reviewers. Since $2f(1) < f(1) + f(2)$, | |
145 | a solution that loads two reviewers each by $L+1$ will be preferred | |
146 | to a solution that loads one reviewer by $L$ and the other by $L+2$ | |
147 | unless the load imbalance in the second solution is outweighed by | |
148 | other benefits. | |
149 | ||
150 | For each reviewer $i$ and proposal $j$, there is a unit-capacity edge from $i$ | |
151 | to $j$ allowing that assignment to be made, unless the reviewer declared a | |
152 | conflict of interest, in which case the edge is not present. The edge has | |
153 | cost $(10 + p_{ij})^2$, where $p_{ij}$ is the preference value | |
154 | expressed by reviewer $i$ for proposal $j$. | |
155 | We assume $p_{ij}$ is a value between 1 and 40 (as used by NSF). | |
156 | The cost function was chosen to provide an incentive to avoid really bad | |
157 | assignments without completely masking the difference between a good assignment | |
158 | and an excellent assignment. | |
159 | ||
160 | These purely additive assignment costs make the algorithm prefer better | |
161 | assignments in general but do nothing to promote fairness among reviewers or | |
162 | among papers. To do that, we introduce additional reviewer-side costs and | |
163 | paper-side bonuses. With respect to a | |
164 | reviewer $i$, we classify papers as interesting (preference value 1 to 15), boring | |
165 | (16 to 25), or very boring (26 to 40). The edge for reviewer $i$ and paper | |
166 | $j$ leaves from $r^1_i$ if $j$ is interesting, $r^2_i$ if $j$ is boring, or | |
167 | $r^3_i$ if $j$ is very boring. Since all edges from the source enter $r^1_i$, | |
168 | flow for boring and very boring assignments is forced to pass through a set of | |
169 | parallel edges from $r^1_i$ to $r^2_i$, and flow for very boring assignments | |
170 | must pass through an additional set of parallel edges from $r^2_i$ to $r^3_i$. | |
171 | In each of these sets, we make the cost of the $l$th edge an increasing | |
172 | function of $l$ to provide an incentive to balance the load of boring | |
173 | assignments in the same way as for overload. | |
174 | ||
175 | Similarly, we wish to guarantee each paper at least one or two good reviews. | |
176 | With respect to a paper $j$, we classify reviewers as expert (preference value | |
177 | 1 to 10), knowledgeable (11 to 20), or general (21 to 40). Edges representing | |
178 | expert reviews enter $p^1_j$, edges for knowledgeable reviews enter $p^2_j$, | |
179 | and edges for general reviews enter $p^3_j$; the edge to the sink leaves | |
180 | $p^3_j$. A paper's first knowledgeable (or expert) review scores a bonus $c_2$ | |
181 | by traversing a unit-capacity edge of cost $-c_2$ from $p^2_j$ to $p^3_j$, | |
182 | and an additional expert review scores another bonus $c_1$ by traversing a | |
183 | unit-capacity edge of cost $-c_1$ from $p^1_j$ to $p^3_j$. | |
184 | In addition to the bonus edges, | |
185 | there are edges of zero cost and unlimited capacity that reviews can follow | |
186 | from $p^1_j$ to $p^2_j$ and from $p^2_j$ to $p^3_j$ in order to reach the sink. | |
187 | The choice to offer bonuses for two reviews was based on the value $r = 3$; | |
188 | this would be easy to change for other values of $r$. | |
189 | ||
190 | The cost of a flow is the sum of its reviewer overload costs, | |
191 | assignment costs, and reviewer boring / very boring load costs, | |
192 | minus paper bonuses. Any one of those components can be traded off against | |
193 | the others. We attempted to assign reasonable weights to each component, | |
194 | but it is difficult to know without testing the algorithm on real data. | |
195 | In any event, all the parameters are easy to tune to realize the priorities | |
196 | of a particular application. | |
197 | ||
198 | \begin{figure*} | |
199 | \begin{center} | |
200 | \centerline{\includexfig{flow}} | |
201 | \caption{Flow Construction.} | |
202 | \label{flow-fig} | |
203 | \end{center} | |
204 | \end{figure*} | |
205 | ||
206 | ||
207 | \section{Experimental Results} | |
208 | Waiting for data from NSF. | |
209 | ||
210 | Synthetic Data. | |
211 | ||
212 | \section{Conclusions} | |
213 | ||
214 | The source code for the current version of the proposal matcher may be | |
215 | browsed or downloaded at: | |
216 | \[\hbox{\url{TODO}}\] | |
217 | ||
218 | \begin{thebibliography}{99} | |
219 | ||
220 | \bibitem{Flow} | |
221 | R. Ahuja, T. Magnanti and J. Orlin. | |
222 | Network Flows: Theory and Applications. | |
223 | {\em Prentice Hall}. | |
224 | ||
225 | ||
226 | ||
227 | \end{thebibliography} | |
228 | ||
229 | \end{document} |