Managing Project Datasets

Manage datasets containing the project input and output data.

Use the Datasets tab to manage data that you feed into your project.

You need the Create/Edit Batches privilege to use this feature.

The Datasets page organizes the datasets that you have added to the project into the following tabs.

  • Active: Datasets on which work is currently progressing.

  • Pending: Datasets that have been imported, but on which work is yet to start.

  • Completed: Datasets on which work is completed.

  • Archived: Datasets on which work is completed and which have been archived for long-term storage. You can archive a pending batch, but not an active one. To archive an active batch, first cancel it and then archive it.

  • Paused: Datasets on which work started but was paused.

  • Cancelled: Datasets that were imported and then cancelled.
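
The archiving rule described for the Archived tab can be sketched as a small lookup. This is an illustrative model only, not Taskmonk code: `BatchStatus` and `can_archive` are hypothetical names. The text does not say whether a paused batch can be archived, so the sketch declines to answer for that status.

```python
from enum import Enum


class BatchStatus(Enum):
    """Hypothetical status names mirroring the tabs on the Datasets page."""
    ACTIVE = "Active"
    PENDING = "Pending"
    COMPLETED = "Completed"
    ARCHIVED = "Archived"
    PAUSED = "Paused"
    CANCELLED = "Cancelled"


# Archivability, only as far as the tab descriptions state it.
_ARCHIVABLE = {
    BatchStatus.PENDING: True,     # a pending batch can be archived
    BatchStatus.COMPLETED: True,   # archived datasets are completed ones
    BatchStatus.CANCELLED: True,   # cancel an active batch, then archive it
    BatchStatus.ACTIVE: False,     # must be cancelled first
}


def can_archive(status: BatchStatus) -> bool:
    """Return whether a batch in this status can be archived directly."""
    if status not in _ARCHIVABLE:
        raise ValueError(f"archiving from {status.value} is not documented")
    return _ARCHIVABLE[status]


print(can_archive(BatchStatus.PENDING))  # prints True
print(can_archive(BatchStatus.ACTIVE))   # prints False
```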

Adding Dataset Batches

Any newly added batch automatically appears in the Pending tab.

To add a dataset to your project:

  1. Click the Add Batch button in the top-right corner of the Datasets tab. The Add Batch modal appears.

     

  2. Enter the name of the batch in the Batch Name field.

  3. Click to select the Priority of the batch (optional). The higher the priority value of a batch, the earlier it gets routed to labelers for processing.

  4. Enter a Message that you want to appear when labelers start work on this batch (optional).

  5. Enter the Email ID that must be notified when the tasks are imported and when work on the batch is completed (optional).

  6. Click the ETA field and select the date by which work on this batch must be completed (optional). This will be saved as the target date of this batch.

  7. Click to specify whether this batch is a Golden Batch (optional).

  8. Click Submit to add the dataset batch to the project.
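
The fields in the Add Batch modal, and the effect of Priority on routing, can be pictured with the sketch below. It is a model for illustration, not a Taskmonk API: `BatchRequest`, `routing_order`, the field names, the numeric priority type, and the default of 0 are all assumptions, and the batch names are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class BatchRequest:
    """Hypothetical record of the Add Batch form; only the name is required."""
    name: str
    priority: int = 0                   # assumed numeric; higher routes earlier
    message: Optional[str] = None       # shown to labelers when they start
    notify_email: Optional[str] = None  # notified on import and completion
    eta: Optional[date] = None          # saved as the batch's target date
    golden: bool = False                # whether this is a Golden Batch


def routing_order(batches: List[BatchRequest]) -> List[BatchRequest]:
    """Order batches so that higher priority values come first."""
    return sorted(batches, key=lambda b: b.priority, reverse=True)


batches = [
    BatchRequest("invoices-jan", priority=1),
    BatchRequest("invoices-urgent", priority=5, eta=date(2020, 3, 31)),
    BatchRequest("invoices-feb"),
]
print([b.name for b in routing_order(batches)])
# prints ['invoices-urgent', 'invoices-jan', 'invoices-feb']
```

How batches with equal priority are ordered is not stated in this guide; the sketch simply keeps their original order.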

Export Project Output

To export the output created for a project:

  1. Click the Ellipsis (three vertical dots) icon in the top-right corner of the Datasets table and select Export Project Output. The Export Project modal appears.

  2. For Select Batch, All is selected by default.

  3. For Select format, select CSV from the list.

  4. For Select fields, select the field(s) you want to export. By default, all fields are exported.

  5. For Select duration, set the Start Date and End Date to export only the outputs on which work was done between those dates (optional). If you set no dates, the export covers the entire lifetime of the project.

  6. Select the Completed tasks only checkbox to download only the output of completed tasks in your project. If it is not selected, all tasks are exported.

  7. Click Download to trigger the export. Once the export completes, the modal notifies you and offers a link to download the exported content.

  8. Click Download File to download the exported output.

  9. Click Close to return to the Datasets tab.
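
Once downloaded, the CSV can be post-processed with any CSV reader. The sketch below mirrors the duration and Completed tasks only options on a made-up sample. The column names (`task_id`, `batch`, `status`, `completed_on`), the date format, and the inclusive date range are assumptions for illustration; the actual columns depend on the fields you selected for export.

```python
import csv
import io
from datetime import date

# Made-up sample standing in for an exported file; real column names
# depend on the fields selected in the Export Project modal.
SAMPLE_EXPORT = """task_id,batch,status,completed_on
1,invoices-jan,completed,2020-01-15
2,invoices-jan,pending,
3,invoices-feb,completed,2020-02-10
"""


def completed_between(rows, start, end):
    """Keep completed rows whose completion date lies in [start, end]."""
    kept = []
    for row in rows:
        if row["status"] != "completed" or not row["completed_on"]:
            continue
        done = date.fromisoformat(row["completed_on"])
        if start <= done <= end:
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
january = completed_between(rows, date(2020, 1, 1), date(2020, 1, 31))
print([row["task_id"] for row in january])  # prints ['1']
```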

Configure Cloud Storage

To configure cloud storage options for your project:

  1. Click the Ellipsis icon in the top-right corner of the Datasets table and select Configure Cloud Storage. The Cloud Storage Configuration modal appears.


    The Cloud Storage Configuration modal enables you to configure the connection details that are required for you to connect with specific cloud service providers. Taskmonk enables you to store your projects with the following service providers:

    • Amazon S3

    • Microsoft Azure

    • Google Cloud

  2. Click to select the service provider that you want to use to upload your project data.

    The fields displayed change depending on the provider you select.

Configuring Amazon S3 Cloud Storage in Taskmonk

  1. Enter the Bucket Name in which you want to store your project data.

  2. Enter the Region where you created the bucket. For the list of regions available in Amazon S3, see the Amazon documentation.

  3. Enter the Access Key and Secret Key associated with your account.

  4. Click Submit to save your changes and return to the Datasets tab.

Configuring Microsoft Azure Cloud Storage in Taskmonk

  1. Enter the Account Name and Account Key provided by your service provider in the fields provided.

  2. Click Submit to save your changes and return to the Datasets tab.

Configuring Google Cloud Storage in Taskmonk

  1. Enter the Bucket Name and Auth String in the fields provided.

  2. Click Submit to save your changes and return to the Datasets tab.
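
The connection details requested for each provider can be summarized as three small records. This is a sketch for keeping track of what to gather before opening the modal, not Taskmonk's internal format: the class names, field names, and the `missing_fields` helper are hypothetical, and the values shown are placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class S3Config:
    bucket_name: str
    region: str
    access_key: str
    secret_key: str


@dataclass
class AzureConfig:
    account_name: str
    account_key: str


@dataclass
class GoogleCloudConfig:
    bucket_name: str
    auth_string: str


def missing_fields(config) -> list:
    """Return the names of fields that are empty or whitespace-only."""
    return [
        f.name
        for f in fields(config)
        if not str(getattr(config, f.name)).strip()
    ]


# Placeholder values only; the region was left blank on purpose.
config = S3Config(
    bucket_name="my-project-data",
    region="",
    access_key="example-access-key",
    secret_key="example-secret-key",
)
print(missing_fields(config))  # prints ['region']
```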

Dataset Status Tabs

Paused Datasets

The Paused tab displays all the datasets for which work is paused and enables you to perform the following tasks:

Key Batch Tasks

 

© 2020 Taskmonk Technology Pvt. Ltd. All Rights Reserved.