This page describes how to use a Content Delivery Network (CDN) integrated with the various cloud platforms.

Overview

The loading time of images depends on the geographical location of the cloud storage and on where it is accessed from. Keeping the content closer to the access point can significantly reduce the load time.

This can be seen in the access times for the same file from US and India buckets on S3. Both downloads are made from India.

From the US bucket it takes about 4 seconds:

wget https://tmusa.s3.us-west-2.amazonaws.com/sample.png
sample.png - 37.20M 8.60MB/s in 4.3s

From the India bucket it takes about 1 second:

wget https://tmindia.s3.ap-south-1.amazonaws.com/sample.png
sample.png - 37.20M 37.3MB/s in 1.0s

This difference in load time becomes significant when the network bandwidth is low, and it could affect productivity.
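The comparison above can be reproduced from the labellers' location with a small script. The following is a minimal sketch, not the method used for the measurements above: `time_download` is a hypothetical helper name, and its `opener` and `clock` parameters exist only so the function can be exercised without network access.

```python
import time
from urllib.request import urlopen


def time_download(url, opener=urlopen, clock=time.perf_counter):
    """Download url fully and return (bytes_read, seconds, MB_per_second).

    MB here means 10**6 bytes, so the rate may differ slightly from the
    figure that wget prints.
    """
    start = clock()
    with opener(url) as response:
        size = len(response.read())
    elapsed = clock() - start
    rate = size / elapsed / 1e6 if elapsed > 0 else float("inf")
    return size, elapsed, rate
```

Calling `time_download` once with each of the two bucket URLs shown above, from the same machine, gives a like-for-like comparison of the two regions.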

There are a few possible solutions for this:

1. Reduce the size of the images used for labelling. This will cause quality issues and is generally not acceptable.
2. Increase the network bandwidth of the labellers to ensure faster loading time. This is expensive and not always possible.
3. Move the media to a bucket closer to the location of the labellers. This is also not always possible, since the buckets may belong to a customer and in some cases the labellers may be located across different geographies.
4. Set up a CDN that will cache the images across different geographies. This does not change any of the target buckets but provides a layer in front of the buckets.

AWS provides a CDN layer called CloudFront, which provides cached access to S3 buckets. This document details how to set up CloudFront and how to configure Taskmonk to use CloudFront for serving images from S3.

Setting up CloudFront

1. Set up the public key. CloudFront uses a key pair for signing the URLs for safe access. The public key is uploaded to AWS, while the private key is used by the client to generate the signed URL.

a. The procedure to create the key pair is detailed at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-trusted-signers.html#private-content-creating-cloudfront-key-pairs

b. Use the public key created to create a CloudFront key

c. Give a name and paste the public key in the screen below

d. Create a key group which can be associated with the distribution

2. Go to the Amazon AWS Console and select CloudFront from Services. Click on Create Distribution

3. Click on Origin and select the S3 bucket for which the CDN needs to be set up

4. Leave Origin Path blank. This setting is only needed to serve files from a particular directory.

5. Add a user-friendly name to identify the CDN

6. Select “use OAI” and create a new OAI. Select “Yes, update the bucket policy”. This is needed to allow CloudFront to access the S3 bucket. Without this, the S3 bucket would need public access. Updating the policy automatically is recommended; otherwise the S3 access policy will need to be manually updated with the CloudFront access information

7. Set the following Cache Options

8. Disallow public access to the CloudFront distribution for security. Select the key group created in the first step

9. Set up header forwarding to prevent CORS errors

10. Keep the rest of the settings at their defaults and create the distribution
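Once the distribution is restricted to the key group, every image request has to carry a signed URL. The following is a minimal sketch of how a signed URL with a canned policy is assembled, following the format in the AWS CloudFront documentation. It is not Taskmonk's implementation. The function names are hypothetical, and because the Python standard library has no RSA support, the signature is delegated to a caller-supplied `rsa_sha1_sign` function (assumed to sign with the private key using RSA, SHA-1 and PKCS#1 v1.5 padding).

```python
import base64
import json


def canned_policy(url, expires_epoch):
    """Return a CloudFront canned policy as compact JSON (no whitespace)."""
    policy = {
        "Statement": [
            {
                "Resource": url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":"))


def cloudfront_b64(data):
    """Base64-encode, then replace the characters that are unsafe in a URL."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def signed_url(url, key_pair_id, expires_epoch, rsa_sha1_sign):
    """Append Expires, Signature and Key-Pair-Id query parameters to url.

    rsa_sha1_sign is supplied by the caller (assumption): it takes the
    policy bytes and returns the raw signature bytes.
    """
    signature = rsa_sha1_sign(canned_policy(url, expires_epoch).encode("utf-8"))
    separator = "&" if "?" in url else "?"
    return (
        f"{url}{separator}Expires={expires_epoch}"
        f"&Signature={cloudfront_b64(signature)}"
        f"&Key-Pair-Id={key_pair_id}"
    )
```

In practice an existing library is usually a safer choice than hand-rolled signing; for example, botocore ships a `CloudFrontSigner` class for this purpose.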

Setting up Taskmonk for CloudFront

The Taskmonk project that has been set up with S3 needs to be configured to use the CloudFront CDN. This is done through an automated processor that changes the image URL for the task on allocation. The URL is changed only at this point so that the task results and annotations stay associated with the original URL and nothing changes in the import or export process.
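The URL change itself amounts to swapping the S3 host for the CloudFront distribution domain while keeping the object key. The following is a minimal sketch of that idea, not the processor's actual code: `to_cdn_url` is a hypothetical name, the distribution domain in the usage note is a placeholder, and the sketch assumes virtual-hosted-style S3 URLs (bucket name in the host) like the ones in the Overview.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cdn_url(s3_url, distribution_domain):
    """Swap the S3 host for the CloudFront domain, keeping the object key.

    Assumes the bucket name is part of the host, so the URL path is
    exactly the object key. Any query string on the S3 URL is dropped.
    """
    parts = urlsplit(s3_url)
    return urlunsplit(("https", distribution_domain, parts.path, "", ""))
```

For example, `to_cdn_url("https://tmindia.s3.ap-south-1.amazonaws.com/sample.png", "d111111abcdef8.cloudfront.net")` returns `https://d111111abcdef8.cloudfront.net/sample.png`. With restricted access enabled on the distribution, the resulting URL still has to be signed before it can be loaded.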

1. Create an input field to save the original URL. Since the UI needs the URL to load in a specific field, the automated processor will overwrite the image URL field. The original URL will be saved to a different field, which will then be written back to the original field on completion.
2. Create the automated processor as shown below

Set the settings as shown below:

a. The field that was created in step 1 to save the original URL

b. The distribution domain that was setup in CloudFront

c. The private key that was created and used to setup the distribution domain. Paste the contents of the private key without the “-----BEGIN RSA PRIVATE KEY-----” and the “-----END RSA PRIVATE KEY-----” lines.

d. The ID of the key pair that was created when setting up CloudFront

3. Another automated processor needs to be set to copy the original URL back to the image url field.

The settings will select the temporary field as the source and the image url field as the destination.
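Taken together, the two processors save the original URL, replace it for the duration of the task, and restore it on completion. The following is a minimal sketch of that flow under stated assumptions: tasks are modelled as plain dictionaries, the function names and the field names in the usage note are hypothetical, and `rewrite` stands in for whatever produces the CloudFront URL. Taskmonk's processors are configured through the UI, so this only illustrates the logic.

```python
def on_allocation(task, image_field, backup_field, rewrite):
    """First processor: keep the original URL, then point the UI at the CDN."""
    task[backup_field] = task[image_field]
    task[image_field] = rewrite(task[image_field])
    return task


def on_completion(task, image_field, backup_field):
    """Second processor: copy the original URL back to the image field."""
    task[image_field] = task[backup_field]
    return task
```

With a task such as `{"image_url": "https://tmindia.s3.ap-south-1.amazonaws.com/sample.png"}`, calling `on_allocation(task, "image_url", "original_url", rewrite)` and later `on_completion(task, "image_url", "original_url")` leaves `image_url` with its imported value, so exports match the original data.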
