Data Curation Example

In this example, you create a project to classify cars based on their size and curate other car attributes.

Project Overview


To create this project, you must perform the following tasks:

  1. List out your project requirements.

  2. Identify the following:

    1. sample input data that you can use for labeling.

    2. sample taxonomic data that you can use for classification.

    3. sample lookup data that you can use for curation.

  3. Configure project metadata.

  4. Manage your project input and output fields.

  5. Create the workflow that you want to implement in your project.

  6. Add users to your project and assign them project roles.

  7. Add a dataset to the project.

  8. Start labeling input text.

  9. View batch status reports.

This document explains how you can perform each of the tasks listed above. Specific sections in this document also contain sample data that you can use to easily create and implement this project in Taskmonk.

Listing Project Requirements

In this project, you want to:

  • Classify cars based on their size.

  • Curate car attributes.

You can download the sample files used for this project below.

Sample Input Data

For the purposes of this example, you use open-source images of various cars available on Wikipedia.

  1. You select the details of the cars that you want to use as input for this project and capture this data under the following column headings:

    • Car, the brand name of the car.

    • Image, link to the image of the car.

    • Size, the size classification of the car.

    • Color, the color of the car exterior.

    • Doors, the number of doors in the car.

    • SeatType, the type of seats installed in the car.

  2. You save the Microsoft Excel sheet as Curation_Input.xlsx.

     

     

  3. You create a spreadsheet to capture the size details of the car as a classification scheme. To learn more about taxonomy files, see Understanding Taxonomy Files. You save the Microsoft Excel sheet as Taxonomy_Car_Size.xlsx.

     

     

  4. You create a spreadsheet to capture the various attributes associated with cars and their possible attribute values. To learn more about curation files, see Understanding Curation Files. You save the Microsoft Excel sheet as Curation_Cars.xlsx on your hard drive.
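If you prefer to prepare the sample input data programmatically, the following minimal Python sketch writes the six column headings described in step 1, plus one placeholder row, and checks a header row for missing headings. It is an illustration under stated assumptions: it writes CSV with the standard library instead of the .xlsx format used in this example (writing Excel files needs a third-party package), and the row values and the function names `write_input_template` and `missing_headings` are hypothetical.

```python
import csv
import io

# Column headings described in step 1 above.
INPUT_HEADINGS = ["Car", "Image", "Size", "Color", "Doors", "SeatType"]


def write_input_template(stream, rows):
    """Write the header row, then one CSV row per dict keyed by heading."""
    writer = csv.DictWriter(stream, fieldnames=INPUT_HEADINGS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def missing_headings(header_row):
    """Return the expected headings that do not appear in header_row."""
    present = {heading.strip() for heading in header_row}
    return [h for h in INPUT_HEADINGS if h not in present]


# Hypothetical placeholder row; replace it with your own car details.
buffer = io.StringIO()
write_input_template(buffer, [{
    "Car": "ExampleCar",
    "Image": "https://example.org/car.jpg",
    "Size": "",  # left blank in this placeholder
    "Color": "Red",
    "Doors": 4,
    "SeatType": "Fabric",
}])
print(buffer.getvalue())
print(missing_headings(["Car", "Image", "Size"]))  # ['Color', 'Doors', 'SeatType']
```

You would still save the final data as Curation_Input.xlsx in Microsoft Excel before uploading it to Taskmonk.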

     

     

Download Source File

You can follow the steps listed above to create your Microsoft Excel sheets, or you can download the ready-made files listed under Downloadable Sample Files at the end of this document and use them in your project.

Each downloadable file is available as a ZIP file. To use it, download the file and unzip its contents.
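The following minimal Python sketch unzips the three sample archives with the standard `zipfile` module. It assumes that you have downloaded the archives into a single folder and that each archive contains the matching .xlsx file; the `unzip_samples` function name and the folder paths are hypothetical.

```python
import zipfile
from pathlib import Path

# Archive names as listed under Downloadable Sample Files.
SAMPLE_ARCHIVES = [
    "Curation_Input.xlsx.zip",
    "Curation_Cars.xlsx.zip",
    "Taxonomy_Car_Size.xlsx.zip",
]


def unzip_samples(download_dir, target_dir):
    """Extract every sample archive found in download_dir into target_dir.

    Returns the names of the extracted files. Archives that have not been
    downloaded yet are skipped.
    """
    download_dir, target_dir = Path(download_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in SAMPLE_ARCHIVES:
        archive = download_dir / name
        if not archive.exists():
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
            extracted.extend(zf.namelist())
    return extracted
```

For example, `unzip_samples("Downloads", "taskmonk_samples")` would extract whichever of the three archives are present in a hypothetical Downloads folder.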

Configuring Project Metadata

Project Metadata is the first tab that appears when you create a project. The Project Metadata tab enables you to provide basic information, such as the name, process, and project type, associated with your project. You can also upload any documentation that you may want to add to your project.

  1. To create the project, click the Create Project floating button on the left side of the Projects page.


    The Project Metadata tab associated with your new project appears.

     

     

  2. Enter Data Curation - Vehicle as the Project Name and Taxonomy and Curation as the Process.

  3. Click Project Type > Text-Based.

  4. Select Lookup > Curation.

  5. Enable Project Pipeline is set to False by default. Do not change this setting for the current project.

  6. Click Next.
    The Documents sub-tab appears.

  7. You can upload documents associated with the project if required. This is an optional step, and you can skip it for now.
    Click Next. The Project Lookup Files sub-tab appears.

     

     

  8. Click Choose File, navigate to and select the sample taxonomy file, Taxonomy_Car_Size.xlsx.

  9. Click Submit to upload the file into the system.

  10. Click Next. The Curation sub-tab appears.

     

     

  11. Click the Choose File button associated with the Upload File button, navigate to and select the sample curation file, Curation_Cars.xlsx.

  12. Click Upload File to upload the file into the system.

  13. Click Next.

The Task Design tab appears. Use this tab to manage your project input and output fields.

Managing Project Input and Output Fields

Taskmonk adds input and output fields to a project as required by the project type that you specify. You can modify these fields later. Because you selected Text-Based as the Project Type, Taskmonk does not add any fields to the Task Design sub-tabs. Instead, you can create input fields by uploading a file in the input format and importing its column headers.

Creating Input Field Details

  1. Click Browse Input File, navigate to the sample input file (Curation_Input.xlsx), and upload it. Taskmonk adds the following input fields:

    • Field Name: car. Field Type: Text

    • Field Name: image. Field Type: Text

    • Field Name: color. Field Type: Text

    • Field Name: size. Field Type: Text

    • Field Name: door. Field Type: Text

    • Field Name: seatType. Field Type: Text

  2. Change the Field Type for image to Image.
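To make the result of these two steps concrete, the following Python sketch models the imported input fields as plain data: every imported column starts with the Text field type, and you then change image to Image. This is a hypothetical model for illustration only; `InputField` and `set_field_type` are not part of any Taskmonk API.

```python
from dataclasses import dataclass


@dataclass
class InputField:
    name: str
    field_type: str = "Text"  # each imported column starts as Text


def set_field_type(fields, name, new_type):
    """Change the type of the field called `name`; raise if it is absent."""
    for f in fields:
        if f.name == name:
            f.field_type = new_type
            return
    raise KeyError(name)


# Field names as listed in step 1 above.
fields = [InputField(n) for n in ["car", "image", "color", "size", "door", "seatType"]]
set_field_type(fields, "image", "Image")  # step 2
```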

     

Creating Output Field Details

  1. Click the Output Field tab to display the Output Field UI.

     

     

  2. Click Create Output Field. The Create Output Field modal appears.

     

     

  3. Enter Taxonomy_Name as the Name.

  4. Select Taxonomy as the Set Data Type.

  5. Select the All Levels check box to indicate that this field must be available to users at all execution levels, such as labeling, quality analysis, and so on.

  6. Click Create to save the new output field and return to the Task Design > Output Fields tab.

  7. Click Create Output Field again to create the next output field. The Create Output Field modal appears.

  8. Enter Taxonomy_Path as the Name.

  9. Select Taxonomy as the Set Data Type.

  10. Select the All Levels check box to indicate that this field must be available to users at all execution levels, such as labeling, quality analysis, and so on.

  11. Click Create to save the new output field and return to the Task Design > Output Fields tab.

  12. Click Create Output Field again to create the next output field. The Create Output Field modal appears.

  13. Enter Vehicle_Type as the Name.

  14. Select Curation_PT as the Set Data Type.

  15. Enter Car, Truck as the Possible Values.

  16. Select the All Levels check box to indicate that this field must be available to users at all execution levels, such as labeling, quality analysis, and so on.

  17. Click Create to save the new output field and return to the Task Design > Output Fields tab.

  18. Click Create Output Field again to create the next output field. The Create Output Field modal appears.

  19. Enter New as the Name.

  20. Select Curation as the Set Data Type.

  21. Select Attribute Values as the Field Type.

  22. Select the All Levels check box to indicate that this field must be available to users at all execution levels, such as labeling, quality analysis, and so on.

  23. Click Create to save the new output field and return to the Task Design > Output Fields tab.

  24. Click Next twice to move to the next step, Quality Workflows.
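The four output fields you just created can be summarized as data. In the following hypothetical Python sketch, `OutputField`, `parse_possible_values`, and `is_allowed` are illustrative names, not Taskmonk APIs. The sketch assumes that the Possible Values entry is a comma-separated list and that a field with Possible Values accepts only those values; fields without Possible Values are treated as unrestricted.

```python
from dataclasses import dataclass, field


@dataclass
class OutputField:
    name: str
    data_type: str                 # value chosen for Set Data Type
    all_levels: bool = True        # the All Levels check box
    field_type: str = ""           # e.g. "Attribute Values" for Curation
    possible_values: list = field(default_factory=list)


def parse_possible_values(text):
    """Split a comma-separated Possible Values entry such as "Car, Truck"."""
    return [value.strip() for value in text.split(",") if value.strip()]


def is_allowed(output_field, value):
    """Assumed check: a field with Possible Values accepts only those values."""
    if not output_field.possible_values:
        return True  # no Possible Values recorded, so no restriction here
    return value in output_field.possible_values


OUTPUT_FIELDS = [
    OutputField("Taxonomy_Name", "Taxonomy"),
    OutputField("Taxonomy_Path", "Taxonomy"),
    OutputField("Vehicle_Type", "Curation_PT",
                possible_values=parse_possible_values("Car, Truck")),
    OutputField("New", "Curation", field_type="Attribute Values"),
]
```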

Creating Quality Workflows

The Quality Workflow tab enables you to specify how you want to ensure output quality. It also lets you create the execution levels required for your quality workflows. For this project, you want to create the following levels:

  • Analyst

  • QA

  • Delivery Head

  1. In this instance, you want to enable a Maker-Editor workflow. Click the Execution Method field and select Maker-Editor from the drop-down list that appears.

  2. By default, Taskmonk creates the Analyst role for you. Do not make any changes to this.

  3. Click the Add Execution Level button on the right side of the page. The Add Execution Level modal appears.

     

     

  4. Enter QA in the Execution Level Name field.

  5. You can skip all other fields here. Click Add.
    Taskmonk adds the new execution level, closes the modal, and displays the updated Quality Workflow tab. Repeat steps 3 to 5, entering the appropriate name, to add each remaining execution level.

     

     

  6. Click Next to move to the next step, Process Logic.

Creating Process Logic

The Quality Workflow > Process Logic tab lets you define the logic that determines how datasets move from one execution level to the next. For the current project, you want to enable the following rules:

  • 85% of the project datasets move from level 1 (Analyst) to level 2 (QA).

  • Datasets where Vehicle_Type is Car move from level 2 (QA) to level 3 (Delivery Head).
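The following Python sketch illustrates how the two rules could select datasets. It is not Taskmonk's implementation: it assumes that the PercentageRule picks a random sample and rounds the count down, and that each task's output is available as a dictionary keyed by output field name (Vehicle_Type here, rather than the L1_Vehicle Type label shown in the UI). The function names and task IDs are hypothetical.

```python
import random


def percentage_rule(task_ids, percentage, seed=None):
    """Randomly pick percentage% of task_ids (rounded down) to move up a level."""
    count = len(task_ids) * percentage // 100
    return set(random.Random(seed).sample(list(task_ids), count))


def result_value_rule(results, field_name, match_value):
    """Return the IDs of tasks whose output for field_name equals match_value."""
    return {task_id for task_id, output in results.items()
            if output.get(field_name) == match_value}


# Rule 1: L1 [Analyst] -> L2 [QA], 85% of the project's tasks.
level1_tasks = [f"task-{i}" for i in range(20)]
to_qa = percentage_rule(level1_tasks, 85, seed=7)

# Rule 2: L2 [QA] -> L3 [Delivery Head], only where Vehicle_Type is Car.
qa_results = {"task-0": {"Vehicle_Type": "Car"},
              "task-1": {"Vehicle_Type": "Truck"}}
to_delivery_head = result_value_rule(qa_results, "Vehicle_Type", "Car")
print(len(to_qa), to_delivery_head)  # 17 {'task-0'}
```

With 20 tasks at level 1, for example, 85% rounds down to 17 tasks moving to QA under this sketch's assumptions.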

To do so:

  1. Click the Add Process Logic button on the right-hand side of the page. The Add Process Logic page appears.

     

     

  2. Click the From Level dropdown and select L1 [Analyst].

  3. Click the To Level dropdown and select L2 [QA].

  4. Click the Add Rule icon to load the add rule UI.

  5. Click the Add New Rule icon. The Rule Type dropdown appears.

  6. Click the Rule Type dropdown and select PercentageRule.

  7. Click the Percentage dropdown and select 85%. Percentage Scope is Project by default. Do not change this.

  8. Click Submit to save the rule.

     

     

  9. Click Back to return to the Quality Workflow tab.

  10. Click the Add Process Logic button on the Process Logic subtab to add the next rule.

  11. Click the From Level dropdown and select L2 [QA].

  12. Click the To Level dropdown and select L3 [Delivery Head].

  13. Click the Add Rule icon to load the add rule UI.

  14. Click the Add New Rule icon. The Rule Type dropdown appears.

  15. Click the Rule Type dropdown and select ResultValueRule.

  16. Click the Field Name dropdown and select L1_Vehicle Type.

  17. Click the Operator dropdown and select Equals. Set Match Value to Car.

  18. Click Submit to save the rule.

     

     

  19. Click Next to move to the next step, Managing Users and Roles.

Managing Users and Roles

You must now add users to your project and assign the execution levels you just created to them.

  1. Click the Users tab just above the Quality Workflow tab.
    The Users > Manage Users tab appears.

  2. Click the Add button in the top-right section of the tab.
    The Select Users modal appears.

  3. For each execution level, click the Select Users field and select the desired user from the drop-down list that appears.

     

     

  4. Click Add to add the selected users to the project.

  5. Close the modal.
    The Manage Users tab reloads to display the updated user details.

Managing Project Datasets

Your project is now configured. Congratulations!

Before you can start labeling, you must upload the input files containing the raw data to be labeled.

  1. Click the Datasets tab. The Datasets page appears. Use this page to manage datasets for your project.

  2. Taskmonk organizes datasets into batches to simplify management and tracking. To add a new dataset, click Add Batch. The Add Batch modal appears.

  3. In the Add New Batch field, enter Batch 1 as the name of the batch that you want to import. You can ignore the other fields.

     

     

  4. Click Submit. This creates a new batch of data for your project and adds it to the Pending tab of the Datasets page. You can now upload datasets into the batch, as required.

  5. To add a dataset to the batch, click the Import button under the Tasks(Import/Export) column. The Import Task modal appears.

  6. Click Choose Files, select the sample input file (Curation_Input.xlsx) from your computer and click Import.

  7. Once the dataset is imported, click Close to exit the modal.

     

 

Labeling Text Using Taskmonk

Your project is now ready for work.

  1. Log in as an Analyst and click the My Tasks icon at the top of the page. The My Tasks page appears.

     

     

  2. Click the Get Tasks button adjacent to the Data Curation - Vehicle project. The labeling UI associated with this project appears.

     


    You can see the following project details in the labeling UI:

    • Batch Name (Batch 1) in the top-left section of the page.

    • Input fields (Image) on the top half of the page.

    • Output fields, including the Taxonomy and Curation fields, on the bottom half of the page.

      For detailed information on working with a typical labeling UI, see Labeling Data.

Viewing Batch Status Report

  1. Go to the Projects page and click Reports > View for the Data Curation - Vehicle project. The Reports page appears.

  2. Click Dataset Progress Reports to view the batch status report. This report shows the total number of pending and completed tasks at each level for all batches.
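The batch status report is essentially a count of tasks grouped by batch, execution level, and status. The following Python sketch shows that aggregation on hypothetical task records with `batch`, `level`, and `status` keys; Taskmonk computes the real report for you, and its internal data model is not described in this document.

```python
from collections import defaultdict


def batch_status(tasks):
    """Count pending and completed tasks per batch and execution level.

    Each task is a dict with hypothetical keys "batch", "level", and
    "status", where status is "pending" or "completed" (any other status
    raises KeyError).
    """
    report = defaultdict(dict)
    for task in tasks:
        counts = report[task["batch"]].setdefault(
            task["level"], {"pending": 0, "completed": 0})
        counts[task["status"]] += 1
    return dict(report)


sample_tasks = [
    {"batch": "Batch 1", "level": "L1", "status": "completed"},
    {"batch": "Batch 1", "level": "L1", "status": "pending"},
    {"batch": "Batch 1", "level": "L2", "status": "pending"},
]
print(batch_status(sample_tasks))
```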

     

     

Downloadable Sample Files

  • Curation_Input.xlsx.zip (ZIP archive), modified Jun 26, 2021 by Kumar Luv

  • Curation_Cars.xlsx.zip (ZIP archive), modified Jun 26, 2021 by Kumar Luv

  • Taxonomy_Car_Size.xlsx.zip (ZIP archive), modified Jun 26, 2021 by Kumar Luv

© 2020 Taskmonk Technology Pvt. Ltd. All Rights Reserved.