Extract images from PDF import for Digitization

You should already be familiar with the basic steps to set up a project.

This feature allows PDF files to be imported as task files and is used for Digitization projects. A pre-processor is configured to extract images from the PDF file; these images are then displayed as a task in the labelling UI associated with the project.

Steps:

1. Navigate to Project Settings > Task Design > Input Field. Click the CREATE INPUT FIELD button and add an input field of type ‘Text’. Add another field named ‘annotation’, also of type ‘Text’, to store annotations. ImageUrl is a default field of type ‘Image’.

 

Now that the required fields are created, follow the second step to add the pre-processors.

2. Navigate to Advanced settings > Project Task > Task Properties > Task Pre-processing. Click on ADD, select Optical Character Recognition, and click on Next. Then click on ‘Convert PDF to images’.

Configure the fields as below and click on ADD.

Add another pre-processor for OCR.

Click on ADD, select Optical Character Recognition, and click on Next. Then click on ‘Optical Character Recognition’.

Configure the fields as below and click on ADD.

 

3. Importing the PDF file

Navigate to Datasets and select or create the batch into which the tasks will be imported. Click on Import, select the ‘Media’ option, and upload the PDF file.

Once these steps are complete, log in as an Analyst and click the My Tasks icon at the top of the page. The Tasks page appears, with the images from the PDF file displayed on the left side of the page.
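The setup described in steps 1 and 2 can be summarised as a short checklist. The sketch below is not part of Taskmonk and does not call any Taskmonk API; it is a minimal, in-memory model in plain Python, written only to illustrate what should be in place before importing a PDF. The names ProjectSetup and find_setup_problems are hypothetical, as is the field name extracted_text (the guide does not name the first Text field). The rule that ‘Convert PDF to images’ should come before ‘Optical Character Recognition’ is an assumption based on the order in which step 2 adds them.

```python
from dataclasses import dataclass, field

# Pre-processor labels copied from step 2 of this guide.
PDF_TO_IMAGES = "Convert PDF to images"
OCR = "Optical Character Recognition"


@dataclass
class ProjectSetup:
    """Hypothetical in-memory model of the settings described in this guide."""
    # Mapping of input field name -> field type, e.g. {"annotation": "Text"}.
    input_fields: dict = field(default_factory=dict)
    # Pre-processor labels in the order they were added.
    pre_processors: list = field(default_factory=list)


def find_setup_problems(setup):
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []

    # Step 1: ImageUrl (Image), annotation (Text) and one more Text field.
    if setup.input_fields.get("ImageUrl") != "Image":
        problems.append("ImageUrl field of type Image is missing")
    if setup.input_fields.get("annotation") != "Text":
        problems.append("annotation field of type Text is missing")
    text_fields = [n for n, t in setup.input_fields.items() if t == "Text"]
    if len(text_fields) < 2:
        problems.append("a second input field of type Text is missing")

    # Step 2: both pre-processors must have been added.
    for label in (PDF_TO_IMAGES, OCR):
        if label not in setup.pre_processors:
            problems.append(f"pre-processor not added: {label}")

    # Assumption: PDF conversion is added before OCR, as in step 2.
    if PDF_TO_IMAGES in setup.pre_processors and OCR in setup.pre_processors:
        if setup.pre_processors.index(PDF_TO_IMAGES) > setup.pre_processors.index(OCR):
            problems.append(f"'{PDF_TO_IMAGES}' should be added before '{OCR}'")

    return problems


if __name__ == "__main__":
    setup = ProjectSetup(
        input_fields={
            "ImageUrl": "Image",
            "annotation": "Text",
            "extracted_text": "Text",  # placeholder name, not from the guide
        },
        pre_processors=[PDF_TO_IMAGES, OCR],
    )
    for problem in find_setup_problems(setup):
        print(problem)
```

With the configuration shown in the example, the function returns an empty list, so nothing is printed. Removing a field or pre-processor from the example makes the corresponding message appear.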

 

To learn more about project setup, click on the links below.

Creating Projects

Working with the Digitization UI

 

 

© 2020 Taskmonk Technology Pvt. Ltd. All Rights Reserved.