X-Git-Url: https://mattmccutchen.net/match/match.git/blobdiff_plain/256efdfa511d8d7d820ac565b5125271bc58ccf6..f3084293a1cf94d98fb44a7d427f012361c28afc:/paper/paper.tex diff --git a/paper/paper.tex b/paper/paper.tex new file mode 100644 index 0000000..2882ac9 --- /dev/null +++ b/paper/paper.tex @@ -0,0 +1,229 @@ +\documentclass[11pt]{article} +\usepackage{url} +\usepackage[square,comma,numbers]{natbib} +\usepackage{color} + +% Figure stuff. To add a new figure named `foo.fig', put +% `\includexfig{foo}' in the appropriate place and add `foo' to the +% figures line at the top of the Makefile. +\usepackage{ifpdf} +\ifpdf + \usepackage[pdftex]{graphicx} + \def\includexfig#1{\input{#1.pdf_t}} +\else + \usepackage[dvips]{graphicx} + \def\includexfig#1{\input{#1.ps_t}} +\fi + +\usepackage{amssymb, amsmath, amsthm} +\newtheorem{theorem}{Theorem} +\newtheorem{lemma}{Lemma} + +\usepackage[letterpaper,left=2.6cm,right=2.6cm,top=2.6cm,bottom=2.6cm]{geometry} + +\usepackage{floatflt} +\usepackage{pstricks} +\usepackage{delarray} + +\title{Assigning Reviewers to Proposals} + +\author{ +Samir Khuller\thanks{Research currently supported by CCF-0728839. +Email:{\tt samir@cs.umd.edu}.}\\ +Dept.\ of Computer Science\\ +University of Maryland, College Park MD 20742. +\and +Richard Matthew McCutchen\thanks{ +The bulk of this work was done while Matt was at the Dept.\ of Computer Science, +University of Maryland, College Park, and supported by REU Supplement to +CCF-0430650. Current email: {\tt matt@mattmccutchen.net}.}\\ +%Department of Computer Science\\ +%University of Maryland, College Park, MD 20742. +} + +\date{} + +\begin{document} + +\maketitle + +%\begin{abstract} +% +%\end{abstract} + +\section{Introduction} +Assignment problems arise in a variety of settings. +For funding agencies such as NSF program directors +that co-ordinate panels, assigning proposals to +reviewers is a major challenge. It is important that each proposal receive +sufficient review by qualified experts, and at the same time we would like to +roughly balance the workload across reviewers and to honor the reviewers' +preferences for which proposals they would like to read. The same +issue arises for a program committee chair, who may have to assign +literally hundreds of papers to a program committee consisting of +thirty to forty program committee members. + +%{\em What does CMT use? What does Easychair use?} + +From now on we will focus on the problem of assigning papers to +reviewers. +We assume that each reviewer is given access to the +list of papers to be reviewed, and gives each paper both a ``desirability'' +score indicating his/her level of interest in reviewing the paper and an +``expertise'' score indicating how qualified he/she is to evaluate the paper. +(Some organizations may choose to use a single set of scores for both +desirability and expertise. We believe that making this distinction may better +model the real-world objective.) +A reviewer may also report a conflict of interest with a particular paper, +meaning that he/she is forbidden to review the paper. + +We do not consider stable marriage type preference lists, +because a strict ranking of papers would be rather tedious +to produce. In this scheme, the papers are essentially grouped +into a few categories. + +Let $N$ be the number of papers and $P$ be the number of reviewers. +Suppose that each paper needs $r$ reviews, so a total of $rN$ +reviews need to be generated. +Ideally, from the perspective of the papers, we would like to +assign each paper the $r$ most qualified reviewers for the paper. +Of course, this could lead to a load imbalanced solution where +the load on some program committee members is very high, and the +load on others is low. On the other hand, we could insist +on a perfectly load balanced solution in which the number +of papers assigned to each program committee member does not +exceed $L= \lceil rN/P \rceil$. However, this may lead to a solution +which is not optimal from the perspective of the papers. + +One of our goals is to study precisely this tradeoff, and allow each +reviewer to be assigned up to $L + C$ papers, +where $C$ is the {\em load tolerance}. We consider the question: +is it possible to obtain a high quality +assignment with a fairly low value of $C$? One can also ask whether, +in such an assignment, the reviewers receive the papers that they would +have most liked to review. + +{\em Stinkers} are papers that pretty much no-one wanted to review. +We would like to spread the load of the stinkers as evenly as possible. + + +\section{Formulation as a Min-Cost Flow Problem} + +Our main approach is to formulate this as a min-cost flow problem. +The construction is somewhat involved in order to incorporate all +the appropriate incentives. It makes +use of sets of ``parallel edges'' of different costs connecting a +single pair of nodes $(x, y)$ to allow flow to be sent from $x$ to $y$ +at a cost that grows faster than linear in the amount of the flow. For +example, if there are five unit-capacity edges from $x$ to $y$ of costs +1, 3, 5, 7, and 9, then any integer amount $f \le 5$ of flow can be +sent from $x$ to $y$ at a cost of $f^2$. + +The construction is done as follows: we have a source $s$ and a sink $t$. +For each paper $j$ we create a set of nodes $p^1_j, p^2_j,p^3_j$, and for each +reviewer $i$ we create a set of nodes $r^1_i, r^2_i, r^3_i$. (The rationale +for these sets is discussed below.) See Figure~\ref{flow-fig} for an +example. Flow can pass from $s$ through one or more of the nodes $r^t_i$ and +one or more of the nodes $p^t_j$ to the sink to represent a review +by reviewer $i$ of paper $j$. + +Each paper has an edge of capacity $r$ to +the sink, indicating that it needs $r$ reviews. In general, these +edges will constitute the min cut, so any max flow will saturate them +and thereby provide all the required reviews. We take the min-cost +max flow in order to provide the reviews in the ``best'' possible way. + +Each reviewer has a zero-cost edge of capacity $L$ from the source so that +he/she can be assigned $L$ papers. If that were all, we would get a perfectly +load-balanced solution, but we may be able to improve the quality of the +assignments by allowing some imbalance. +Therefore, we allow each reviewer to be overloaded by up to $C$ papers +($C$ is the load imbalance parameter) via a set +of $C$ additional unit-capacity edges from the source. We make the cost of the +$l$th edge an increasing function $f(l)$ to provide an incentive to +balance load across reviewers. Since $2f(1) < f(1) + f(2)$, +a solution that loads two reviewers each by $L+1$ will be preferred +to a solution that loads one reviewer by $L$ and the other by $L+2$ +unless the load imbalance in the second solution is outweighed by +other benefits. + +For each reviewer $i$ and proposal $j$, there is a unit-capacity edge from $i$ +to $j$ allowing that assignment to be made, unless the reviewer declared a +conflict of interest, in which case the edge is not present. The edge has +cost $(10 + p_{ij})^2$, where $p_{ij}$ is the preference value +expressed by reviewer $i$ for proposal $j$. +We assume $p_{ij}$ is a value between 1 and 40 (as used by NSF). +The cost function was chosen to provide an incentive to avoid really bad +assignments without completely masking the difference between a good assignment +and an excellent assignment. + +These purely additive assignment costs make the algorithm prefer better +assignments in general but do nothing to promote fairness among reviewers or +among papers. To do that, we introduce additional reviewer-side costs and +paper-side bonuses. With respect to a +reviewer $i$, we classify papers as interesting (preference value 1 to 15), boring +(16 to 25), or very boring (26 to 40). The edge for reviewer $i$ and paper +$j$ leaves from $r^1_i$ if $j$ is interesting, $r^2_i$ if $j$ is boring, or +$r^3_i$ if $j$ is very boring. Since all edges from the source enter $r^1_i$, +flow for boring and very boring assignments is forced to pass through a set of +parallel edges from $r^1_i$ to $r^2_i$, and flow for very boring assignments +must pass through an additional set of parallel edges from $r^2_i$ to $r^3_i$. +In each of these sets, we make the cost of the $l$th edge an increasing +function of $l$ to provide an incentive to balance the load of boring +assignments in the same way as for overload. + +Similarly, we wish to guarantee each paper at least one or two good reviews. +With respect to a paper $j$, we classify reviewers as expert (preference value +1 to 10), knowledgeable (11 to 20), or general (21 to 40). Edges representing +expert reviews enter $p^1_j$, edges for knowledgeable reviews enter $p^2_j$, +and edges for general reviews enter $p^3_j$; the edge to the sink leaves +$p^3_j$. A paper's first knowledgeable (or expert) review scores a bonus $c_2$ +by traversing a unit-capacity edge of cost $-c_2$ from $p^2_j$ to $p^3_j$, +and an additional expert review scores another bonus $c_1$ by traversing a +unit-capacity edge of cost $-c_1$ from $p^1_j$ to $p^3_j$. +In addition to the bonus edges, +there are edges of zero cost and unlimited capacity that reviews can follow +from $p^1_j$ to $p^2_j$ and from $p^2_j$ to $p^3_j$ in order to reach the sink. +The choice to offer bonuses for two reviews was based on the value $r = 3$; +this would be easy to change for other values of $r$. + +The cost of a flow is the sum of its reviewer overload costs, +assignment costs, and reviewer boring / very boring load costs, +minus paper bonuses. Any one of those components can be traded off against +the others. We attempted to assign reasonable weights to each component, +but it is difficult to know without testing the algorithm on real data. +In any event, all the parameters are easy to tune to realize the priorities +of a particular application. + +\begin{figure*} +\begin{center} +\centerline{\includexfig{flow}} +\caption{Flow Construction.} +\label{flow-fig} +\end{center} +\end{figure*} + + +\section{Experimental Results} +Waiting for data from NSF. + +Synthetic Data. + +\section{Conclusions} + +The source code for the current version of the proposal matcher may be +browsed or downloaded at: +\[\hbox{\url{TODO}}\] + +\begin{thebibliography}{99} + +\bibitem{Flow} +R. Ahuja, T. Magnanti and J. Orlin. +Network Flows: Theory and Applications. +{\em Prentice Hall}. + + + +\end{thebibliography} + +\end{document}