NER Annotation Example
In this example, you create an NER annotation project to identify various entities in a resume.
Project Overview
You want to create a project that enables you to identify various entities in a resume, along with the relationships between them.
To create this project, you must complete the tasks described in the sections that follow. Some sections also contain sample data that you can use to create and implement this project in Taskmonk.
Listing Project Requirements
In this project, you want to:
Identify various entities in a resume.
Identify the relationship between the entities as required.
Sample Input Data
You select sample resumes that you wish to annotate. You paste the text of the resume into a Microsoft Excel sheet under a column labeled Resume. You save the Microsoft Excel sheet as Resume_Summarization_Input.xlsx.
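If you prefer to build the sheet programmatically, the layout described above can be sketched in Python. This is a minimal sketch, not a Taskmonk tool: the function names build_rows and save_xlsx are hypothetical, and writing the file assumes that the third-party openpyxl package is installed.

```python
from typing import List


def build_rows(resumes: List[str]) -> List[List[str]]:
    """Return sheet rows: a 'Resume' header followed by one resume per row."""
    rows = [["Resume"]]
    for text in resumes:
        cleaned = text.strip()
        if cleaned:  # skip empty entries so the sheet has no blank rows
            rows.append([cleaned])
    return rows


def save_xlsx(rows: List[List[str]], path: str) -> None:
    """Write the rows to an .xlsx file (assumes openpyxl is installed)."""
    from openpyxl import Workbook  # imported lazily; third-party dependency

    workbook = Workbook()
    sheet = workbook.active
    for row in rows:
        sheet.append(row)
    workbook.save(path)
```

For example, calling save_xlsx(build_rows(resumes), "Resume_Summarization_Input.xlsx") with your own list of resume texts would produce a single-column sheet headed Resume.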
Download Source File
You can follow the steps listed above to create your Microsoft Excel sheet; you can also download and use this file in your project: Resume_Summarization_Input.xlsx.zip.
Each downloadable file is available as a ZIP file. To use it, download the file and unzip its contents.
Configuring Project Metadata
Project Metadata is the first tab that appears when you create a project. The Project Metadata tab enables you to provide basic information, such as the name, process, and project type, associated with your project. You can also upload any documentation that you may want to add to your project.
For detailed information on working with the Basic Info section of the Project Metadata tab, see Add Basic Project Information.
For detailed information on working with the Documents section of the Project Metadata tab, see Manage Project Documentation.
To create the project, click the Create Project floating button on the left side of the Projects page.
The Project Metadata tab associated with your new project appears.
Enter Resume Summarization as the Project Name and NER as the Process.
Click Project Type > Annotation.
Click Next.
The Documents sub-tab appears.
You can upload documents associated with the project if required. This is an optional step, and you can skip it for now.
Click Next.
The Task Design tab appears. Use this tab to manage your project input and output fields.
Managing Project Input and Output Fields
Project input and output fields are key elements that determine what happens in your project. The input fields that you specify here appear as the available input options in your project. Similarly, the output fields that you configure here appear as output options in your project execution UI. In other words, your project can only accept and output data associated with the input and output fields that you create here.
Taskmonk uses the project type that you specify to add input and/or output fields to projects as required. You can modify these later. In this instance, you selected Annotation as the Project Type, and Taskmonk adds the following fields to the Task Design tab:
Input Field
Field Name: MediaUrl, Field Type: Image
Output Field
Field Name: Annotations, Field Type: Annotation, Mandatory: False, Disabled: False, Customer Visible: True
Field Name: Classes, Field Type: Class, Mandatory: False, Disabled: False, Customer Visible: True
Updating Input Field Details
Click the Input Field tab to display the Input Field UI.
By default, Taskmonk sets the field type to Image. To edit this:
Click the Edit Field Name icon next to MediaUrl. The Name textbox appears.
Enter Resume as Name.
Click the Update Field Name icon to save the changes.
Click the Field Type dropdown and set it to Document.
Creating Output Field Details
Click the Output Field tab to display the Output Field UI.
You want to add the following possible values to the Classes field:
Name
College Name
Organization
Designation
Location
E-mail Address
Skills
To do so, click the Possible Values > edit icon. The Manage Possible Values page appears.
Enter Name as the Class.
Select Color as required.
Click Save and Add Another to add the value.
Enter College Name as the Class.
Select Color as required.
Click Add Attributes.
Enter Tier as the Field Name.
Click Field Type and set it as Dropdown.
Enter 1, 2, 3 as Possible Values.
Set 1 as Default Value.
Click Save and Add Another to add the value.
Repeat the above steps for all the classes.
Click Save & Close to save changes to the field and return to the Output Field tab.
Click Next to move to the next step, Quality Workflows.
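The Classes field configured above can be summarized as a plain data structure. The sketch below is for illustration only: the ClassAttribute and AnnotationClass types and the validate function are hypothetical and do not reflect Taskmonk's internal schema or any export format. Only College Name carries the Tier attribute here, because that is the only attribute the steps above define.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassAttribute:
    name: str
    field_type: str
    possible_values: List[str]
    default_value: Optional[str] = None


@dataclass
class AnnotationClass:
    name: str
    attributes: List[ClassAttribute] = field(default_factory=list)


# Tier attribute from the steps above: a dropdown with values 1, 2, 3 and default 1.
tier = ClassAttribute("Tier", "Dropdown", ["1", "2", "3"], default_value="1")

CLASSES = [
    AnnotationClass("Name"),
    AnnotationClass("College Name", attributes=[tier]),
    AnnotationClass("Organization"),
    AnnotationClass("Designation"),
    AnnotationClass("Location"),
    AnnotationClass("E-mail Address"),
    AnnotationClass("Skills"),
]


def validate(classes: List[AnnotationClass]) -> None:
    """Raise ValueError on duplicate class names or defaults outside the allowed values."""
    seen = set()
    for cls in classes:
        if cls.name in seen:
            raise ValueError(f"duplicate class: {cls.name}")
        seen.add(cls.name)
        for attr in cls.attributes:
            if attr.default_value is not None and attr.default_value not in attr.possible_values:
                raise ValueError(f"default for {attr.name} is not a possible value")


validate(CLASSES)
```

Writing the configuration down this way makes it easy to check for duplicate class names or an invalid default before you enter the values in the Manage Possible Values page.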
Creating Quality Workflows
The Quality Workflow tab enables you to specify how you want to ensure output quality. It also helps you create the execution levels required for your quality workflows. For example, you want to create the following levels for this project:
Annotator.
Quality Analyst.
Delivery Lead.
In this instance, you want to enable a Maker-Checker workflow. Click the Execution Method field and select Maker-Checker from the drop-down list that appears.
By default, Taskmonk creates the Analyst role for you. Click the Update Execution Level icon. The Edit Execution Level modal appears.
Enter Annotator as the Execution Level Name.
You can skip all other fields here. Click Update to return to the Quality Workflow tab.
Click the Add Execution Level button on the right side of the page. The Add Execution Level modal appears.
Enter Quality Analyst in the Execution Level Name field.
You can skip all other fields here. Click Add.
Taskmonk adds the new execution level, closes the modal, and displays the updated Quality Workflow tab. Repeat the process to add all required execution levels.
Click Next to move to the next step, Process Logic.
Creating Process Logic
The Quality Workflow > Process Logic tab lets you define the logic that determines which datasets move from one execution level to the next. For the current project, you want to enable the following rules:
80% of the project datasets move from level 1 to level 2.
65% of the project datasets move from level 2 to level 3.
To do so:
Click the Add Process Logic button on the right-hand side of the page. The Add Process Logic page appears.
Click the From Level dropdown and select L1 [Annotator].
Click the To Level dropdown and select L2 [Quality Analyst].
Click the Add Rule icon to load the add rule UI.
Click the Add New Rule icon. The Rule Type dropdown appears.
Click the Rule Type dropdown and select PercentageRule.
Click the Percentage dropdown and select 80%.
Click Submit to save the rule.
Repeat the above steps to add a rule for each level change.
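To see how the two percentage rules combine, consider the sketch below. It assumes that each rule applies to the tasks present at its From Level (so 65% is taken of the tasks that reached level 2) and that tasks are picked at random. Both are assumptions made for illustration, not a description of how Taskmonk selects tasks, and the function name apply_percentage_rule is hypothetical.

```python
import random
from typing import List


def apply_percentage_rule(task_ids: List[int], percentage: int, rng: random.Random) -> List[int]:
    """Pick `percentage` percent of the tasks (rounded down) to move to the next level."""
    count = (len(task_ids) * percentage) // 100
    return sorted(rng.sample(task_ids, count))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
level_1 = list(range(1, 101))                      # 100 tasks completed at level 1
level_2 = apply_percentage_rule(level_1, 80, rng)  # L1 [Annotator] -> L2 [Quality Analyst]
level_3 = apply_percentage_rule(level_2, 65, rng)  # L2 [Quality Analyst] -> level 3

print(len(level_1), len(level_2), len(level_3))  # prints: 100 80 52
```

Under these assumptions, 52 of every 100 annotated tasks reach the third level, because 65% of the 80 tasks at level 2 is 52.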
Navigate to Advanced > Project > Project Settings.
Expand the Annotation Settings panel.
Ensure that NER (Enabled) is set to True.
Set NER (Autocomplete Selection) to True to ensure that only complete words are selected for annotations. This is set to True by default.
Enter at, from, to in NER Relations. NER Relations capture the relationship between various named entities.
For example, consider the following sentence: John Doe is a Bank Teller at A&G Bank. Here, the named entities 'Bank Teller' and 'A&G Bank' have a relationship denoted by 'at'.
Click Next to save the changes.
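One way to picture what an annotation with a relation captures, using the example sentence above, is sketched below. The record layout (Entity, Relation, character offsets) and the helper functions are hypothetical, chosen for illustration, and are not Taskmonk's output format. The labels Designation and Organization are taken from the Classes field values listed in this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str
    text: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass(frozen=True)
class Relation:
    name: str
    source: Entity
    target: Entity


ALLOWED_RELATIONS = {"at", "from", "to"}  # the values entered in NER Relations


def find_entity(sentence: str, text: str, label: str) -> Entity:
    """Locate the first occurrence of `text` and wrap it as an Entity."""
    start = sentence.index(text)  # raises ValueError if the text is absent
    return Entity(label, text, start, start + len(text))


def relate(name: str, source: Entity, target: Entity) -> Relation:
    """Create a relation, rejecting names that were not configured."""
    if name not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown relation: {name}")
    return Relation(name, source, target)


sentence = "John Doe is a Bank Teller at A&G Bank."
designation = find_entity(sentence, "Bank Teller", "Designation")
organization = find_entity(sentence, "A&G Bank", "Organization")
works_at = relate("at", designation, organization)
```

Restricting relation names to the configured set mirrors the idea that annotators can only choose from the relations entered in the project settings.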
Managing Users and Roles
You must now add users to your project and assign the execution levels you just created to them.
Click the Users tab just above the Quality Workflow tab. The Users > Manage Users tab appears.
Click the Add button in the top-right section of the tab. The Select Users panel appears.
Enter the user names as required. Click Add to add users to the project.
Close the panel to return to the Manage Users tab.
The Manage Users tab reloads to display the updated user details.
Managing Project Datasets
Your project is now configured. Congratulations!
Before you can start annotation, you must upload the input data containing the resumes to be annotated.
Click the Datasets tab. The Datasets page appears. Use this page to manage datasets for your project.
Taskmonk organizes datasets into batches to simplify management and tracking. To add a new dataset, click Add Batch on the right side of the page. The Add Batch modal appears.
In the Add New Batch field, enter Batch 1 as the name of the batch that you want to import. You can ignore the other fields.
Click Submit. This creates a new batch of data for your project and adds it to the Pending tab of the Datasets page. You can now upload datasets into the batch, as required.
To add a dataset to the batch, click the Import button under the Tasks(Import/Export) column. The Import Task modal appears.
Click Choose Files, select the sample input file from your computer and click Import.
Once the dataset is imported, click Close to exit the modal.
Annotating NER Using Taskmonk
Your project is now ready for work.
Log in as an Annotator and click the My Tasks icon at the top of the page. The Tasks page appears.
Click the Get Tasks button adjacent to the Resume Summarization project. The labeling UI associated with this project appears.
You can see the following project details in the labeling UI:
Batch Name (Batch 1) in the top-left section of the page.
The text to be annotated.
Annotation labels and classes to the right of the text. Attributes associated with a class are displayed at the bottom-right of the page.
For detailed information on working with a typical labeling UI, see Working with the NER-Annotation UI.
Viewing Batch Status Report
Go to the Projects page and click Reports > View for the Resume Summarization project. The Reports page appears.
Click on Dataset Progress Reports to view the batch status report. This shows the total number of tasks pending and completed at each level for all batches.
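The figures in such a report amount to a tally of tasks by batch, level, and status. The sketch below shows that tally on made-up task records. The record fields and the summarize function are hypothetical and do not correspond to a Taskmonk export or API.

```python
from collections import Counter
from typing import Dict, Iterable

Task = Dict[str, str]


def summarize(tasks: Iterable[Task]) -> Counter:
    """Count tasks per (batch, level, status) combination."""
    return Counter((t["batch"], t["level"], t["status"]) for t in tasks)


# Illustrative records only, not real project data.
tasks = [
    {"batch": "Batch 1", "level": "Annotator", "status": "completed"},
    {"batch": "Batch 1", "level": "Annotator", "status": "pending"},
    {"batch": "Batch 1", "level": "Quality Analyst", "status": "pending"},
    {"batch": "Batch 1", "level": "Annotator", "status": "completed"},
]

report = summarize(tasks)
for (batch, level, status), count in sorted(report.items()):
    print(f"{batch} | {level} | {status}: {count}")
```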
© 2020 Taskmonk Technology Pvt. Ltd. All Rights Reserved.