Learn how to set up the Amazon CloudFront to work with Taskmonk, and how to configure Taskmonk to use the service.

Overview

While working with cloud-based applications, the loading time for the application content depends on the geographical separation between the cloud storage and the location at which the images are being accessed. Storing the content close to the access point can significantly decrease the load time.

For example, consider a sample image file whose copies are hosted both in the US and the India buckets using the Amazon Simple Storage Service (S3). When these files are accessed from India, the load times are:

From the US bucket: 4.3 seconds.

Code Block
wget https://tmusa.s3.us-west-2.amazonaws.com/sample.png
...
sample.png -   37.20M  8.60MB/s    in 4.3s

From the India bucket: about 1 second.

Code Block
wget https://tmusa.s3.us-west-2.amazonaws.com/sample.png
sample.png -   37.20M  8.60MB/s    in 4.3s
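To repeat this comparison from a labeller's location, the download time can also be measured with curl instead of wget. The following is a minimal sketch, assuming curl is installed. `measure_download` is a hypothetical helper name. Substitute the URLs of your own buckets.

```shell
#!/usr/bin/env bash
# measure_download: hypothetical helper that fetches a URL, discards the body,
# and prints "<bytes downloaded> <total seconds> <average bytes per second>".
measure_download() {
  local url="$1"
  # -s hides the progress meter, -o /dev/null discards the body,
  # -w prints curl's transfer statistics after the download finishes.
  curl -s -o /dev/null -w '%{size_download} %{time_total} %{speed_download}\n' "$url"
}

# Usage: run the same command against each bucket from the labellers' location.
# measure_download https://tmusa.s3.us-west-2.amazonaws.com/sample.png
```

Running it several times and comparing the second column gives a rough view of the latency difference between buckets.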

The difference in load times can have a significant effect on user productivity, especially when the network bandwidth is limited.

Some of the ways you can reduce the disparity in the load times include:

  • Reduce the size of the images used for labelling. This can result in quality issues and should generally be avoided.

  • Increase the network bandwidth available to the labellers. This may not always be cost-effective, and higher bandwidths may not be available at all locations.

  • Move the content to a bucket closer to the location of the labellers. This will not be effective if the team is spread across various geographies. There could also be regulatory restrictions on where the content can be stored.

  • Set up a Content Delivery Network (CDN) that caches the images across different geographies. This adds a layer between the buckets and the access point, allowing for lower load times.

Amazon Web Services (AWS) provides a CDN layer called Amazon CloudFront, which provides cached access to S3 buckets. This document describes how to set up the Amazon CloudFront to work with Taskmonk, and how to configure Taskmonk to use CloudFront for serving images from S3.
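Whether a response was served from the CloudFront cache can be checked in the `X-Cache` response header. CloudFront sets this header to values such as `Hit from cloudfront` or `Miss from cloudfront`. The following is a minimal sketch, assuming curl is available and the caller is allowed to fetch the URL. Once viewer access is restricted, this must be a signed URL. `cache_status` is a hypothetical helper, and `<distribution-domain>` is a placeholder.

```shell
#!/usr/bin/env bash
# cache_status: hypothetical helper that reads HTTP response headers on stdin
# and prints the value of the first X-Cache header, if present.
cache_status() {
  # Header names are case-insensitive (HTTP/2 sends them in lowercase),
  # so match with -i, then drop the header name and the trailing CR.
  grep -i '^x-cache:' | head -n 1 | cut -d' ' -f2- | tr -d '\r'
}

# Usage: request the same object twice. Once the edge location has cached it,
# the second request should usually report "Hit from cloudfront".
# curl -s -o /dev/null -D - "https://<distribution-domain>/sample.png" | cache_status
```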

Setting up the Amazon CloudFront

Info

The Amazon CloudFront uses a public-private key pair to provide safe access to URLs. The public key is uploaded to the AWS server, and the private key is used by the client to generate the signed URL.
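For reference, the AWS CloudFront documentation creates this key pair with OpenSSL. The following is a minimal sketch of that step, assuming OpenSSL is installed. The `make_cloudfront_keys` helper and the file names are illustrative; neither Taskmonk nor AWS requires them. The public key file is what you upload to CloudFront. Keep the private key file secret.

```shell
#!/usr/bin/env bash
# make_cloudfront_keys: hypothetical helper that creates an RSA key pair
# in the given directory (default: current directory).
make_cloudfront_keys() {
  local dir="${1:-.}"
  # 2048-bit RSA private key, used later to sign URLs.
  openssl genrsa -out "$dir/private_key.pem" 2048 2>/dev/null
  # Extract the matching public key in PEM format for upload to CloudFront.
  openssl rsa -pubout -in "$dir/private_key.pem" -out "$dir/public_key.pem" 2>/dev/null
}

# Usage:
# make_cloudfront_keys ~/cloudfront-keys
# cat ~/cloudfront-keys/public_key.pem   # contents to paste when creating the public key
```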

  1. Create a public-private key pair. The procedure is described at https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-trusted-signers.html#private-content-creating-cloudfront-key-pairs and is beyond the scope of this document.

  2. Navigate to CloudFront > Public keys. Click Create public key. The Create Public Key modal appears.

  3. Enter a name to identify your public key, and paste the public key created in step 1 into the fields provided. Click Create Public Key.

  4. Navigate to CloudFront > Key groups. Click Create key group. The Create Key Group modal appears.

  5. Enter a name to identify the key group, and select the public key created in step 3 in the fields provided. Click Create key group.

  6. Go to the AWS console and click Services > CloudFront. The Distributions page appears.

  7. Click Create distribution. The Create Distribution modal appears.

  8. Click Origin and select the S3 bucket for which the CDN needs to be set up.

  9. Leave Origin Path blank. This field is used to serve files from a particular directory.

  10. Enter a name to identify the CDN in the field provided.

  11. Select S3 bucket access > Yes use OAI (bucket can restrict access to only CloudFront). Click Create New OAI, and then select Bucket Policy > Yes, update the bucket policy. The origin access identity (OAI) allows CloudFront to read from the S3 bucket without the bucket needing public access. Updating the bucket policy automatically is recommended; otherwise, you must manually add the CloudFront access information to the S3 bucket policy.

  12. Set the cache behavior as below:

    • Path pattern: Default (*).

    • Compress objects automatically: Yes.

    • Viewer > Viewer protocol policy: HTTP and HTTPS.

    • Viewer > Allowed HTTP methods: GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE.

  13. Disallow public access to the CloudFront distribution for security:

    1. Set Restrict viewer access to Yes.

    2. Set Trusted authorization type to Trusted key groups (recommended), and select the key group you created earlier.

  14. Set up header forwarding to prevent CORS errors:

    1. Select Cache key and origin requests > Legacy cache settings.

    2. Set Headers to Include the following headers.

    3. Select the following headers from the dropdown provided:

      • Access-Control-Request-Method.

      • Access-Control-Request-Headers.

      • Origin.

  15. Leave the rest of the settings as is and create the distribution.
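CloudFront can update the bucket policy for the OAI automatically when the distribution is created. If you update the policy manually instead, a policy that grants the OAI read access has roughly the shape shown below. The following is a minimal sketch, assuming the AWS CLI is configured. `oai_bucket_policy` is a hypothetical helper. The bucket name and the OAI ID are placeholders; the OAI ID is shown in the CloudFront console.

```shell
#!/usr/bin/env bash
# oai_bucket_policy: hypothetical helper that prints an S3 bucket policy
# letting a CloudFront origin access identity (OAI) read every object.
#   $1  bucket name       $2  OAI ID
oai_bucket_policy() {
  local bucket="$1" oai_id="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCloudFrontOAIRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity ${oai_id}"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Usage (placeholders): write the policy, then apply it with the AWS CLI.
# put-bucket-policy replaces any existing policy, so merge existing statements first.
# oai_bucket_policy my-bucket E2EXAMPLE1234 > policy.json
# aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json
```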


Setting up Taskmonk for the Amazon CloudFront

You must configure the Taskmonk project that has been set up with S3 to use the Amazon CloudFront. To do this, you must create an automated processor that changes the URL of the input file when a task is allocated. The original URL is restored when the task is completed, so that the task results and annotations remain associated with the original URL and nothing changes in the import or export process. To set up Taskmonk to work with the Amazon CloudFront:

  1. Create an input field to temporarily hold the original URL of the input file. Because the UI loads the image from a specific field, the automated processor overwrites the image URL field with the CloudFront URL. The original URL is saved to this temporary field and written back to the image URL field when the task is completed.

  2. Navigate to Advanced > Project Task > Task Properties > Task Pre-processing.

  3. Click the Add button on the right side of the page. A list of processors appears.

  4. Click On Task Allocation Format. Select Redirect Cloud Front and click Next. The settings for the automated processor appear.

    1. Set Select field to store temporary values to the field created in step 1.

    2. Set Distribution Domain to the distribution domain set up in the Amazon CloudFront.

    3. Set Private Key to the private key of the key pair created when setting up the Amazon CloudFront. Paste the contents of the private key without the “-----BEGIN RSA PRIVATE KEY-----” and “-----END RSA PRIVATE KEY-----” lines.

    4. Set Key Pair Id to the ID of the key pair created when setting up the Amazon CloudFront. With trusted key groups, this is the ID of the public key shown under CloudFront > Public keys.

  5. Navigate to Advanced > Project Task > Task Properties > Task Post-processing.

  6. Click the Add button on the right side of the page. A list of processors appears.

  7. Click On Task Completion. Select Copy Field Value and click Next. The settings for the automated processor appear.

  8. Set the processor to copy the value from the temporary field (the source) back to the image URL field (the destination).
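How the Redirect Cloud Front processor builds its URLs is not described here. Assuming it produces a standard CloudFront signed URL with a canned policy, the sketch below reproduces that result with OpenSSL, following the Linux/OpenSSL approach in the AWS CloudFront documentation. It can help you check a distribution domain, private key, and key pair ID outside Taskmonk. It is not Taskmonk's implementation. The following are all illustrative:

- the `cloudfront_signed_url` helper;
- the one-hour default lifetime;
- the assumption that S3 URLs are virtual-hosted style (`https://<bucket>.s3.<region>.amazonaws.com/<key>`).

```shell
#!/usr/bin/env bash
# cloudfront_signed_url: hypothetical helper that rewrites an S3 object URL to
# the CloudFront distribution domain and signs it with a canned policy.
#   $1  original S3 URL, assumed https://<bucket>.s3.<region>.amazonaws.com/<key>
#   $2  distribution domain, e.g. d111111abcdef8.cloudfront.net (placeholder)
#   $3  path to the PEM private key of the CloudFront key pair
#   $4  key pair ID (the CloudFront public key ID)
#   $5  lifetime of the URL in seconds (default 3600, an illustrative value)
cloudfront_signed_url() {
  local s3_url="$1" domain="$2" key_file="$3" key_pair_id="$4" ttl="${5:-3600}"
  # Keep only the object path: drop the scheme, then everything up to the first "/".
  local rest="${s3_url#https://}"
  local object_path="${rest#*/}"
  local resource="https://${domain}/${object_path}"
  local expires=$(( $(date +%s) + ttl ))
  # Canned policy: compact JSON with no whitespace.
  local policy
  policy=$(printf '{"Statement":[{"Resource":"%s","Condition":{"DateLessThan":{"AWS:EpochTime":%d}}}]}' \
    "$resource" "$expires")
  # RSA-SHA1 signature, base64-encoded, then made URL-safe as CloudFront expects:
  # "+" -> "-", "=" -> "_", "/" -> "~".
  local signature
  signature=$(printf '%s' "$policy" | openssl dgst -sha1 -sign "$key_file" \
    | openssl base64 -A | tr -- '+=/' '-_~')
  printf '%s?Expires=%d&Signature=%s&Key-Pair-Id=%s\n' \
    "$resource" "$expires" "$signature" "$key_pair_id"
}

# Usage (placeholders):
# cloudfront_signed_url "https://tmusa.s3.us-west-2.amazonaws.com/sample.png" \
#   d111111abcdef8.cloudfront.net private_key.pem K2JCJMDEHXQW5F 3600
# Print the key body without the BEGIN/END lines, for pasting into a form field:
# grep -v -- '-----' private_key.pem
```

Newer OpenSSL releases write private keys with a `-----BEGIN PRIVATE KEY-----` (PKCS#8) header rather than `-----BEGIN RSA PRIVATE KEY-----`. Check which format the processor accepts before pasting the key.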