A Crowdsourcing Engine for Mechanized Labor

Microtask crowdsourcing implies decomposing a difficult problem into smaller pieces. For that purpose, a specialized human-computer platform like CrowdFlower or Amazon Mechanical Turk is used to submit tasks to human workers, who are motivated by either micropayments or altruism to solve them. Examples of successful crowdsourcing applications are food nutrition estimation, natural language processing, criminal activity detection, and other so-called "AI-hard" problems. However, these platforms are proprietary and require additional software for maintaining the output quality. This paper presents the design, architecture and implementation details of an open source engine for executing the microtask-based crowdsourcing annotation stages. The engine controls the entire crowdsourcing process, including such elements as task allocation, worker ranking, answer aggregation, agreement assessment, and other means of quality control. The present version of the software is implemented as a three-tier system composed of the application level for the end-user worker interface, the engine level for the Web service controlling the annotation process, and the database level for data persistence. A RESTful API is used for interacting with the engine. The methods for controlling the annotation are implemented as processors that are initialized using the dependency injection mechanism to achieve loose coupling. The functionality of the engine has been evaluated both with unit tests and by replicating a semantic similarity assessment experiment.


Introduction
Nowadays, crowdsourcing is a popular and very practical approach for producing and analyzing data, solving complex problems that can be split into many simple and verifiable tasks, and so on. Amazon's MTurk, a well-known online labor marketplace, promotes crowdsourcing as "artificial artificial intelligence". In the mechanized labor genre of crowdsourcing, a requester submits a set of tasks that are solved by the crowd workers on a specialized platform. Usually, the workers receive micropayments for their performance; hence, it is of high interest to strike a happy medium between cost and quality. This paper presents an engine for controlling a crowdsourcing process.

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 defines the problem of the lack of control software for crowdsourcing. Section 4 presents a two-layer approach to crowdsourcing applications that separates the engine from the end-user application. Section 5 describes the implementation of such an engine. Section 6 briefly evaluates the present system. Section 7 concludes with final remarks and directions for future work.

Related Work
There are several approaches to controlling the entire crowdsourcing process. Whitehill et al. proposed the GLAD model that, for the first time, connected such variables as task difficulty, worker expertise and answer reliability for image annotation [1]. Bernstein et al. created the Soylent word processor, which automatically submits text formatting and rewriting tasks to the crowd on MTurk [2]. The same paper introduced the Find-Fix-Verify workflow, which has strongly influenced many other researchers in this field of study. Demartini, Difallah & Cudré-Mauroux developed ZenCrowd, another popular approach to controlling crowdsourcing, originally designed for mapping natural language entities to Linked Open Data [3]. ZenCrowd is based on the EM algorithm and deploys its tasks to MTurk.

The idea of providing an integrated framework for a crowdsourcing process is not novel and has been addressed by many authors both in academia and in industry, e.g. WebAnno [4], OpenCorpora [5] and Yet Another RussNet [6]. However, the mentioned products are problem-specific, and using them for crowdsourcing different tasks may be non-trivial. Moreover, such software often forces the only possible approach to controlling the crowdsourcing process, which in some cases may result in suboptimal performance.

Task Allocation
Lee, Park & Park created a dynamic programming method for allocating tasks among workers, showing that taking worker expertise into account increases the output quality [7].
Yuen, King & Leung used probabilistic matrix factorization to allocate tasks in a manner similar to recommender systems [8]. Karger, Oh & Shah proposed a budget-optimal task allocation algorithm inspired by belief propagation and low-rank matrix approximation that is suitable for inferring the correct answers from those submitted by the workers [9].

Worker Ranking
Welinder & Perona presented an online algorithm for estimating annotator parameters that requires expert annotations to assess the performance of the workers [10]. Difallah, Demartini & Cudré-Mauroux used social network profiles to determine worker interests and preferences in order to personalize task allocation [11]. Daltayanni, de Alfaro & Papadimitriou developed the WorkerRank algorithm, which estimates the probability of getting a job on the oDesk online labor marketplace using implicit employer judgements [12].

Answer Aggregation
The answers are often aggregated with majority voting, which is highly efficient for a small number of annotators per question [9]. Some works use a fixed number of answers to aggregate [5]. Sheshadri & Lease released SQUARE, a Java library containing implementations of various consensus methods for crowdsourcing [13], e.g. such methods as ZenCrowd [3] and majority voting. Meyer et al. developed DKPro Statistics, which implements various popular statistical agreement, correlation and significance analysis methods that can be used internally by answer aggregation methods [14].

Cost Optimization
Satzger et al. presented an auction-based approach to crowdsourcing that allows workers to place bids on relevant tasks and receive payments for their completion [15]. Gao & Parameswaran proposed algorithms that set and vary task completion rewards over time in order to meet budget constraints, using Markov decision processes [16]. Tran-Thanh et al. developed the Budgeteer algorithm for crowdsourcing complex workflows involving inter-dependent micro-tasks under budget constraints [17].

Problem
Hosseini et al. define the four pillars of crowdsourcing, making it possible to represent a crowdsourcing system C as the following quadruple [18]:

C = (W, R, T, P).   (1)

Here, W is the set of workers who benefit from their participation in the process C, R is the task requester who benefits from the crowd work deliverables, T is the set of human intelligence tasks provided by the requester R, and P is the crowdsourcing platform that connects these elements. Unfortunately, there is no open and customizable software for controlling C. This problem is highly topical, since using MTurk, the largest crowdsourcing platform, is not possible outside the U.S., which makes it attractive to develop an independent substitute that can be self-hosted.
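For illustration, the quadruple in Equation (1) can be written down as plain Java types. This is a minimal sketch only; all class and field names here are assumptions made for the example, not part of the engine.

```java
// A minimal sketch of the quadruple C = (W, R, T, P); names are hypothetical.
import java.util.Set;

record Worker(String id) {}
record Requester(String id) {}
record Task(String id) {}
record Platform(String name) {}

record CrowdsourcingSystem(
        Set<Worker> workers,   // W: the set of workers
        Requester requester,   // R: the task requester
        Set<Task> tasks,       // T: the human intelligence tasks
        Platform platform) {}  // P: the crowdsourcing platform
```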

Approach
The reference model of a typical mechanized labor crowdsourcing process is presented in Fig. 1 and consists of the following steps, repeated until either convergence is achieved or the requester stops the process:

1. a worker requests a task from the system,
2. the system allocates a task to that worker,
3. the worker submits an answer to that task,
4. the system receives and aggregates the answer,
5. the system updates the worker and task parameters.

A schematic sketch of this loop is given below.
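The following sketch merely mirrors the five steps above in code; the Engine and Worker interfaces and all method names are hypothetical, not taken from the engine.

```java
// A schematic sketch of the mechanized labor loop from Fig. 1 (hypothetical names).
interface Task {}
interface Answer {}

interface Worker {
    Answer solve(Task task); // step 3: the worker submits an answer
}

interface Engine {
    boolean converged();
    boolean stoppedByRequester();
    Task allocate(Worker worker);             // steps 1-2: request and allocate a task
    void aggregate(Task task, Answer answer); // step 4: receive and aggregate the answer
    void update(Worker worker, Task task);    // step 5: update worker and task parameters
}

final class AnnotationLoop {
    static void run(Engine engine, Worker worker) {
        // Repeat until convergence or until the requester stops the process.
        while (!engine.converged() && !engine.stoppedByRequester()) {
            Task task = engine.allocate(worker);
            Answer answer = worker.solve(task);
            engine.aggregate(task, answer);
            engine.update(worker, task);
        }
    }
}
```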

Use Case Diagram
Modern recommender systems like PredictionIO and metric optimization tools like MOE separate the application layer from the engine layer to simplify integration into existing systems. In crowdsourcing, it is possible to separate the worker annotation interface (the application) from the crowdsourcing control system (the engine) for the same reason. The use case diagram presented in Fig. 2 shows two actors, the requester and the application, interacting with the engine. The application works with the engine through a specialized application programming interface (API), while the requester works with the engine through a specialized graphical user interface (GUI).

Sequence Diagram
The sequence diagram in Fig. 3 shows the interaction between these elements: a worker uses the end-user application, which is connected to the engine that actually controls the process and provides the application with the appropriate data.

Implementation
The proposed system is implemented in the Java programming language as a RESTful Web service using such APIs as JAX-RS within the Dropwizard framework. The primary data storage is PostgreSQL, a popular open source object-relational database.
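To give a flavor of the stack, below is a minimal JAX-RS resource of the kind Dropwizard serves. The resource path, class names and the stubbed lookup are assumptions for illustration, not the engine's actual API.

```java
// A minimal JAX-RS resource sketch (hypothetical path and types).
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/processes/{process}/tasks")
@Produces(MediaType.APPLICATION_JSON)
public class TaskResource {
    public static class TaskView {
        public String process; // public fields are serialized to JSON by Jackson
        public int id;
        TaskView(String process, int id) { this.process = process; this.id = id; }
    }

    @GET
    @Path("/{id}")
    public TaskView getTask(@PathParam("process") String process,
                            @PathParam("id") int id) {
        // In the real engine this would consult the data access layer; stubbed here.
        return new TaskView(process, id);
    }
}
```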

Class Diagram
The class diagram in Fig. 4 represents the crowdsourcing system according to Equation (1). The Process class defines a system C and specifies how its elements W, T and A should be processed by the corresponding implementations of the processor interfaces. In particular, an actual processor inherits the abstract processor class and implements one or more of the following interfaces: WorkerRanker, TaskAllocator, AnswerAggregator (see the sketch below). The reason for this design is the dependency uncertainty of each particular processor implementation, which has been addressed with the dependency injection mechanism.
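The three processor interfaces might look as follows. Only the interface names and the aggregate method returning an AnswerAggregation for a given Task are taken from the text; the remaining signatures and the stubbed entity classes are assumptions.

```java
// Stub entities for the sketch; the real engine defines its own domain classes.
class Worker {}
class Task {}
class WorkerRanking {}
class AnswerAggregation {}

interface WorkerRanker { WorkerRanking rank(Worker worker); }
interface TaskAllocator { Task allocate(Worker worker); }
interface AnswerAggregator { AnswerAggregation aggregate(Task task); }
```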

Fig. 4. UML Class Diagram
For example, an implementation of the majority voting technique, a popular approach to answer aggregation, should implement the AnswerAggregator interface and provide an aggregate method that returns an AnswerAggregation instance representing the aggregated answer for the given Task instance. In order to access the answers stored in the database, the corresponding data access object, AnswerDAO, should be injected. Since the answers cannot be fetched without the correct process identifier, the corresponding Process instance should be injected as well. However, directly injecting Process into AnswerAggregator and vice versa causes a circular dependency. The cycle is broken by injecting a lazily initialized Process provider instead of the actual instance.

On startup, the application configures itself with the provided configuration files, setting up the top-level Guice dependency injector. After establishing a database connection, a database-aware child injector is created, since this cannot be done during the framework bootstrapping stage. Then, for each defined process, the application initializes a child injector containing process-specific bindings, and that injector inherits from the database-aware one. Finally, the application exposes these processes via the RESTful API. Both the majority voting aggregator and the lazy provider trick are sketched below.
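The sketch below combines the two points above: majority voting over the answers fetched through AnswerDAO, and a lazily initialized Provider&lt;Process&gt; injected to break the circular dependency. All names other than AnswerAggregator, AnswerAggregation, AnswerDAO, Process and Task, which appear in the text, are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.inject.Inject;
import javax.inject.Provider;

// Stubs for the surrounding types; the real engine defines these elsewhere.
class Task { int getId() { return 0; } }
class Process { int getId() { return 0; } }
class Answer {
    private final String value;
    Answer(String value) { this.value = value; }
    String getValue() { return value; }
}
class AnswerAggregation {
    final Task task; final String answer;
    AnswerAggregation(Task task, String answer) { this.task = task; this.answer = answer; }
}
interface AnswerAggregator { AnswerAggregation aggregate(Task task); }
interface AnswerDAO { List<Answer> listForTask(int processId, int taskId); }

class MajorityVoting implements AnswerAggregator {
    private final AnswerDAO answerDAO;
    // A lazy Provider<Process> instead of Process itself breaks the dependency cycle.
    private final Provider<Process> process;

    @Inject
    MajorityVoting(AnswerDAO answerDAO, Provider<Process> process) {
        this.answerDAO = answerDAO;
        this.process = process;
    }

    @Override
    public AnswerAggregation aggregate(Task task) {
        // Count the occurrences of each distinct answer value for the task.
        Map<String, Long> votes = answerDAO
            .listForTask(process.get().getId(), task.getId()).stream()
            .collect(Collectors.groupingBy(Answer::getValue, Collectors.counting()));
        // The majority vote is the most frequent value, or null when unanswered.
        String winner = votes.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey)
            .orElse(null);
        return new AnswerAggregation(task, winner);
    }
}
```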

Package Diagram
The system is composed of several packages, each responsible for a part of its functionality. Since the Dropwizard framework is used, most of the boilerplate code is already provided by the framework. However, the sophisticated initialization described above requires additional middleware, resulting in the package hierarchy represented in Fig. 5 and detailed in Table 1.

Table 1. The packages of the system

mtsar.processors   Actual implementations of the methods for controlling workers, tasks, and answers.
mtsar.resources    Resources exposed by the RESTful API.
mtsar.views        View models used by the GUI.

Evaluation
The system functionality is tested using JUnit. At present, only the classes contained in the mtsar.processors and mtsar.resources packages are provided with the appropriate unit tests. The continuous integration practice is followed by triggering a build on Travis CI for each change to ensure that all the unit tests pass. In order to make sure the system works, the RUSSE crowdsourced dataset has been used (see [19] for details). The russe process has been configured to use the zero worker ranker, which simply assigns every worker a rank of zero, the inverse count task allocator, which allocates the task with the lowest number of available answers, and the majority voting answer aggregator (Fig. 6). Then, the workers, tasks and answers stored in this dataset were submitted to the system via the RESTful API. The experiment showed that no data were lost during this activity and that the engine allocates tasks and aggregates answers correctly w.r.t. the chosen processors.
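As an illustration of the testing approach, a hypothetical JUnit 4 test for the majority voting sketch from Section 5 might look as follows; it stubs AnswerDAO with a lambda and checks that the most frequent answer wins.

```java
// A hypothetical unit test in the spirit of the mtsar.processors tests.
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import org.junit.Test;

public class MajorityVotingTest {
    @Test
    public void picksTheMostFrequentAnswer() {
        // Stub the DAO to return two "similar" votes and one "unrelated" vote.
        AnswerDAO dao = (processId, taskId) -> Arrays.asList(
            new Answer("similar"), new Answer("similar"), new Answer("unrelated"));
        MajorityVoting voting = new MajorityVoting(dao, Process::new);
        assertEquals("similar", voting.aggregate(new Task()).answer);
    }
}
```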