Crowd-Voc is a vocabulary for describing data that has been produced through crowdsourcing. Crowdsourcing is understood broadly, as any type of collective activity to solve a problem or achieve a goal. Data refers to the contributions of the crowd in their consolidated form, which are released in form of a 'dataset' which can be used in other contexts. The first goal of the vocabulary is to allow crowdsourcing projects to describe what the data is about and how it came about to facilitate reuse and sharing across projects. The vocabulary should also help repeat and reproduce research that involves crowdsourcing by capturing the key design parameters of a crowdsourcing task and the way crowd answers have been validated, aggregated and otherwise processed before their release as a dataset. Finally, the vocabulary sets the ground for acknowledging contributions by the crowd, towards a fairer model for licensing crowdsourced data.

This is version 0.1

Introduction

Namespace declarations

Table 1: Namespaces used in the document
crowd-voc<http://qrowd-project.eu/def/crowd-voc#>
owl<http://www.w3.org/2002/07/owl#>
rdfs<http://www.w3.org/2000/01/rdf-schema#>
xsd<http://www.w3.org/2001/XMLSchema#>
dcterms<http://purl.org/dc/terms/#>
org<http://www.w3.org/ns/org#>
s<http://schema.org/>
skos<http://www.w3.org/2004/02/skos/core#>
foaf<http://xmlns.com/foaf/0.1/>
ex<http://example.com/>

Overview

The core of the vocabulary is shown in the figure below. We reuse as much as possible existing vocabularies. The main entity is Task, a subclass of prov:Activity that describes the crowdsourcing task, ultimately generating a dcat:Dataset. An instance of Task is linked to a Crowd, which is a subclass of prov:Agent. Crowds consists of instances of Crowdworker, a type of prov:Person. A crowdworker may be part of many crowds, assembled to run different tasks. To establish the link between crowds and crowdworkers, we reuse the Organization ontology, making Crowd an instance of org:Organization, allowing the usage of the org:hasMember property. Crowdworkers are identified by ids, depending on the platform where the crowdsourcing task is being run. The aim is not to store personal data about contributors, which is not accessible when running a crowdsourcing task, but to be able to distinguish between multiple contributors - this is needed when validating and aggregating the data submitted by the crowd. The crowd of a task refers to the set of contributors to the task rather than all users registered on the crowdsourcing platform.

A task is comprised by an rdf:list of Questions that models the order on which questions are asked to crowdworkers. Each question has a hasText property, and a rdf:seeAlso that links to a more complex representation of the question, if available.

contract classification
Example of Task
    ex:aTask
        a prov:Activity;
        a crowd-voc:Task;
        dcterms:description "My crowdsourcing task";
        prov:wasAssociatedWith ex:Crowd;
        crowd-voc:hasQuestionList (ex:question1,ex:question2) .

    ex:Crowd
    	a org:Organization;
        org:hasMember ex:a_crowdworker

    ex:a_crowdworker
    	a prov:Person;


    ex:question1
    	a crowd-voc:Question;
        crowd-voc:hasText "Identify cars in images".
    ex:question2
    	a crowd-voc:Question;
        crowd-voc:hasText "Now identify cars at traffic lights in images".

    ex:aCrowd
    	a crowd-voc:Crowd;
        a prov:Agent;
        vocab-org:hasMember ex:Crowdworker.

    ex:Crowdworker
    	a prov:Person.

    ex:generatedDataset 
    	a dcat:dataset;
        dcterms:title "Dataset generated by the task";
        prov:wasGeneratedBy ex:aTask .

 	

A task is a collection of different activities that ultimately results in a dataset. It can consist of subtasks of the same or different types. Our aim is to describe the design of the crowdsourcing effort, including standard classes of tasks, which are approached in a particular way in crowdsourcing (e.g., classification, sentiment analysis, transcription, review etc.) and activities the crowd was asked to carry out (e.g. answer questions, check a website, translate from one language to another etc.). The reader is provided with a broad overview of the workflow used, rather than a fully specified data or control flow, as in a workflow language. Each task (and all its subtasks) is linked to an input dataset (e.g. a collection of images) and an output dataset, which the crowd has helped produce. The output dataset is a consolidated version of the answers provided by the crowd, including validation, aggregation etc. A task can contain several questions, which refer to the exact wording used to seek contributions (e.g. "Identify all people, places, and organisations in the following paragraph").

The below figure describes the subdivision of a task in subtasks, to model more complex situations on which different tasks (possibly with different crowds) are applied. The subtask list is modeled as a rdf:list of Tasks. Each subtask takes as input the dataset generated by the previous task.

Task and subtask
Task division in subtasks
    ex:aTask
        a prov:Activity;
        dcterms:description "My crowdsourcing task";
        prov:wasAssociatedWith ex:aCrowd;
        prov:used ex:InputDataset;
        prov:generated ex:OutputDataset;
        crowd-voc:hasSubTasks (ex:question1,ex:question2) .

    ex:subTask1
    	a crowd-voc:Task;
        dcterms:description "Brexit tweets";
        a crowd-voc:ClassificationTask .
        crowd-voc:hasAggregation ex:s1_aggregation
        crowd-voc:hasQCMechanism ex:s1_qc
        crowd-voc:hasQuestion ex:s1_q1


    ex:subTask2
    	a crowd-voc:Task;
        dcterms:description "Sentiment analysis on tweets";
        a crowd-voc:SentimentAnalysisTask .
        crowd-voc:hasAggregation ex:s2_aggregation
        crowd-voc:hasQCMechanism ex:s2_qc
        crowd-voc:hasQuestion ex:s2_q1

    ex:s1_aggregation
    	a crowd-voc:AggregationMethod
	
    ex:s2_aggregation
    	a crowd-voc:AggregationMethod

    ex:s1_qc
    	a crowd-voc:QCMechanism
	
    ex:s2_qc
    	a crowd-voc:QCMechanism
	
    ex:s1_q1
    	a crowd-voc:Question;
    	crowd_voc:question_text "Does the tweet mention Brexit?"

    ex:s2_q1
    	a crowd-voc:Question;
    	crowd_voc:question_text "Is this a happy tweet?"


 	

Vocabulary description

Classes

  • Task
  • ContentVerificationTask
  • ClassificationTask
  • MetadataFindingTask
  • MediaTranscriptionTask
  • RankingTask
  • ModerationTask
  • FeedbackTask
  • SentimentAnalysisTask
  • ContentCreationTask
  • Question
  • QCMechanism
  • AggregationMethod

Task

c back to ToC or Class ToC

IRI: TaskIRI

A crowdsourcing task

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

ContentVerificationTask

c back to ToC or Class ToC

IRI: TaskIRI

A task where the crowdworkers are asked to check predefined properties of an item by carrying out specific checks or consulting external sources

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

ClassificationTask

c back to ToC or Class ToC

IRI: TaskIRI

A generic classification task on text or other media

characteristics (is disjoint, in range of, hassubclasses, etc)
class or property name c

MetadataFindingTask

c back to ToC or Class ToC

IRI: TaskIRI

A task of finding a set of predefined characteristics of an item, for instance the author or year of a book

characteristics (is disjoint, in range of, hassubclasses, etc)
class or property name c

MediaTranscriptionTask

c back to ToC or Class ToC

IRI: TaskIRI

A generic task of transcription for different types of media

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

RankingTask

c back to ToC or Class ToC

IRI: TaskIRI

A task ranking different items

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

ModerationTask

c back to ToC or Class ToC

IRI: TaskIRI

A task moderating content from other tasks

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

ContentCreationTask

c back to ToC or Class ToC

IRI: TaskIRI

A task producing new content, for instance free labels or descriptions for items, shortened versions of text, media etc.

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

SentimentAnalysisTask

c back to ToC or Class ToC

IRI: TaskIRI

A sentiment analysis task on text or other media

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

SubTask

c back to ToC or Class ToC

IRI: TaskIRI

A SubTask that is part of CrowdSourcing Task, for example, an image labelling task might consists of one subTask about identifying a certain entity in the image, and another about cropping the image to center the

characteristics (is disjoint, in range of, has subclasses, etc)
class or property name c

Question

c back to ToC or Class ToC

IRI: QuestionIRI

A question that is shown to the Crowdworker

characteristics (is disjoint, in range of, hassubclasses, etc)
class or property name c

Quality control mechanism

c back to ToC or Class ToC

IRI: QualityControlIRI

Quality control mechanism applied to a task: verifiable/control question, pre-screening/test question, honeypot(sample-based filtering), agreement-based/consensus, none

characteristics (is disjoint, in range of, hassubclasses, etc)
class or property name c

Properties

Reused properties

  • dcterms:hasPart
  • prov:used
  • prov:generated
  • prov:wasAssociatedWith
  • vocab-org:hasMember

Own properties

  • hasAggregationMethod
  • hasQCMechanism
  • hasQuestionList
  • hasSubTasks
  • hasIncentive
  • hasMotivation

hasAggregationMethod dp

IRI:

has characteristics: functional

has domain
Task c
has range
AggregationMethod c

hasQCMechanism dp

IRI:

has characteristics: functional

has domain
Task c
has range
AggregationMethod c

hasQuestion dp

A subTask has a question IRI:

has characteristics: functional

has domain
Task c
has range
rdf:list of Question c

hasSubTasks dp

List of subtasks of a task IRI:

has characteristics: functional

has domain
Task c
has range
rdf:list of Task c

hasMotivation dp

Motivation of the task. Possible values are: monetary rewards, fun, personal achievement, social belonging, social status, altruism. IRI:

has characteristics: functional

has domain
Task c
has range
Literal c

hasWorkflow dp

Workflow of the task: Find-Fix-Verify, Iterative Improvement, Crowd-guided workflow, Threshold-based workflow, Sequential interdependent, Parallel IRI:

has characteristics: functional

has domain
SubTask c
has range
Literal c

Acknowledgements

Crowd-Voc was set up by the QROWD project (http://qrowd-project.eu/), which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 723088. We would also like to thank the participants of the project networking workshop help at HCOMP2018 in Zurich, Switzerland (https://hcompnetworking.wordpress.com/).