Working with Amazon CloudFront

Learn how to set up Amazon CloudFront to work with Taskmonk, and how to configure Taskmonk to use the service.

Overview

In cloud-based applications, the time taken to load content depends on the geographical distance between the cloud storage and the location from which the content is accessed. Storing the content close to the access point can significantly decrease load times.

For example, consider a sample image file with copies hosted in both a US bucket and an India bucket on Amazon Simple Storage Service (S3). When these files are accessed from India, the load times are:

From the US bucket: 4.3 seconds.

wget https://tmusa.s3.us-west-2.amazonaws.com/sample.png
sample.png - 37.20M 8.60MB/s in 4.3s

From the India bucket: 1 second.

wget https://tmin.s3.india.amazonaws.com/sample.png
sample.png - 37.20M 23.60MB/s in 1s

The difference in the load times can have a significant effect on user productivity, especially when the network bandwidth is limited.
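The same comparison can be scripted instead of run by hand with wget. The following is a minimal sketch using only the Python standard library; the function name time_download and its opener parameter are illustrative and not part of Taskmonk or AWS tooling.

```python
import time
from urllib.request import urlopen


def time_download(url, opener=urlopen, chunk_size=1 << 16):
    """Download url in chunks and return (bytes_read, seconds_elapsed).

    opener is injectable so the function can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    start = time.perf_counter()
    total = 0
    with opener(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


# Example usage (requires network access to the buckets named above):
# size, seconds = time_download("https://tmusa.s3.us-west-2.amazonaws.com/sample.png")
# print(f"{size / 1e6:.2f} MB in {seconds:.1f} s")
```

Running the function against each bucket from the labellers' location gives a like-for-like measurement of how much the storage region contributes to the load time.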

Some of the ways you can reduce the disparity in the load times include:

  • Reduce the size of the images used for labelling. This can result in quality issues and should generally be avoided.

  • Increase the network bandwidth available to the labellers. This may not always be cost-effective, and higher bandwidths may not be available at all locations.

  • Move the content to a bucket closer to the location of the labellers. This will not be effective if the team is spread across various geographies. There could also be regulatory restrictions on where the content can be stored.

  • Set up a content delivery network (CDN) that caches the images across different geographies. This adds a layer between the buckets and the access point, allowing for lower load times.

 

Amazon Web Services (AWS) provides a CDN called Amazon CloudFront, which provides cached access to S3 buckets. The following sections describe how to set up Amazon CloudFront to work with Taskmonk and how to configure Taskmonk to use the service.

Setting up Amazon CloudFront

Amazon CloudFront uses a public-private key pair to restrict access to content through signed URLs. The public key is uploaded to AWS, and the client uses the private key to generate each signed URL.
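To make the role of the key pair concrete, the following is a minimal sketch of how a client could assemble a signed URL using a CloudFront canned policy. It uses only the Python standard library, so the RSA signing step is passed in as a callable named sign (an assumption of this sketch); in practice that callable would sign the policy bytes with the private key using RSA with SHA-1, for example through a third-party cryptography library. The function names, the example domain, and the key ID are hypothetical, and this is not necessarily how Taskmonk generates its URLs.

```python
import base64
import json


def canned_policy(url, expires_epoch):
    """Build the compact canned-policy JSON that is signed for a CloudFront URL."""
    policy = {
        "Statement": [
            {
                "Resource": url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": int(expires_epoch)}},
            }
        ]
    }
    # CloudFront expects the policy without any whitespace.
    return json.dumps(policy, separators=(",", ":")).encode("utf-8")


def cloudfront_b64(data):
    """Base64-encode data, then replace the characters that are invalid in a URL query."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def build_signed_url(url, key_pair_id, expires_epoch, sign):
    """Return url with Expires, Signature, and Key-Pair-Id query parameters.

    sign is a caller-supplied function (an assumption of this sketch) that takes
    the policy bytes and returns the RSA-SHA1 signature made with the private key.
    """
    signature = sign(canned_policy(url, expires_epoch))
    separator = "&" if "?" in url else "?"
    return (
        f"{url}{separator}Expires={int(expires_epoch)}"
        f"&Signature={cloudfront_b64(signature)}"
        f"&Key-Pair-Id={key_pair_id}"
    )
```

CloudFront verifies the signature with the uploaded public key, which is why only the holder of the private key can produce URLs that the distribution accepts.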

  1. Create a public-private key pair. The steps for creating the key pair are beyond the scope of this document; see the Amazon CloudFront documentation on signed URLs.

  2. Navigate to CloudFront > Public keys. Click Create public key. The Public Key modal appears.

     

  3. Enter a name to identify your public key and paste the public key created in step 1 in the fields provided. Click Create Public Key.

  4. Navigate to CloudFront > Key groups. Click Create key group. The Create Key Group modal appears.

     

  5. Enter a name to identify the key group and the appropriate public key in the fields provided.

  6. Go to the AWS console and click Services > CloudFront. The Distributions page appears.

     

  7. Click Create distribution. The Create Distribution modal appears.

     

  8. Click Origin and select the S3 bucket for which the CDN needs to be set up.

  9. Leave Origin Path blank. This field is only needed to serve files from a particular directory in the bucket.

  10. Enter a name to identify the CDN in the field provided.

  11. Select S3 bucket access > Yes use OAI (bucket can restrict access to only CloudFront). Click Create New OAI to create an origin access identity (OAI).
    Select Bucket Policy > Yes, update the bucket policy.

     

  12. Set the cache behavior as below:

    • Path pattern: Default (*).

    • Compress objects automatically: Yes.

    • Viewer > Viewer protocol policy: HTTP and HTTPS.

    • Viewer > Allowed HTTP methods: GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE.

       

  13. Restrict public access to the CloudFront distribution for security:

    1. Set Restrict viewer access to Yes.

    2. Set Trusted authorization type to Trusted key groups (recommended).

       

  14. Set up header forwarding to prevent CORS errors:

    1. Select Cache key and origin requests > Legacy cache settings.

    2. Set Headers to Include the following headers.

    3. Select the following headers from the dropdown provided:

      • Access-Control-Request-Method.

      • Access-Control-Request-Headers.

      • Origin.

         

  15. Leave the rest of the settings as is and create the distribution.
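When the bucket policy is updated in step 11, the bucket gains a statement that lets the origin access identity read its objects. The following sketch builds a policy of that general shape so it can be compared with what appears on the bucket. The bucket name tmusa is taken from the example URL earlier in this document; the OAI ID, the Sid value, and the function name are placeholders, and the exact policy written by the console may differ.

```python
import json


def oai_bucket_policy(bucket_name, oai_id):
    """Return an S3 bucket policy that allows a CloudFront OAI to read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",  # placeholder statement ID
                "Effect": "Allow",
                "Principal": {
                    "AWS": (
                        "arn:aws:iam::cloudfront:user/"
                        f"CloudFront Origin Access Identity {oai_id}"
                    )
                },
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# Print the policy as JSON for comparison with the bucket's Permissions tab.
print(json.dumps(oai_bucket_policy("tmusa", "E2EXAMPLEOAIID"), indent=2))
```

If the bucket policy contains no statement of this kind after the distribution is created, requests through CloudFront are likely to be denied by S3 even though the distribution itself is configured correctly.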

Setting up Taskmonk for Amazon CloudFront

A Taskmonk project that is set up with S3 must be configured to use Amazon CloudFront. To do this, create an automated processor that changes the URL of the input file when a task is allocated. To set up Taskmonk to work with Amazon CloudFront:

  1. Create an input field to temporarily hold the original URL of the input file.

  2. Navigate to Advanced > Project Task > Task Properties > Task Pre-processing.

  3. Click the Add button on the right side of the page. A list of processors appears.

  4. Click On Task Allocation Format. Select Redirect Cloud Front and click Next. The settings for the automated processor appear.

    1. Set Select field to store temporary values to the field created in step 1.

    2. Set Distribution Domain to the domain name of the distribution created in Amazon CloudFront.

    3. Set Private Key to the private key of the key pair created when setting up Amazon CloudFront.

    4. Set Key Pair Id to the ID of the key pair, that is, the ID that CloudFront assigned to the public key uploaded during setup.

  5. Navigate to Advanced > Project Task > Task Properties > Task Post-processing.

  6. Click the Add button on the right side of the page. A list of processors appears.

  7. Click On Task Completion. Select Copy Field Value and click Next. The settings for the automated processor appear.

  8. Set the processor to copy the value from the temporary field to the field containing the original URL.
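The effect of the two processors can be pictured as a swap and a restore on the task's URL field. The following is a minimal sketch of that idea, not Taskmonk's actual implementation: the function names, the dictionary-based task, and the field names are hypothetical. It assumes the S3 URL carries the bucket name in the host (as in the example URLs earlier), that Origin Path was left blank, and it omits the signing that the real processor would apply to the rewritten URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cloudfront_url(s3_url, distribution_domain):
    """Point an S3 object URL at the CloudFront distribution, keeping the object path."""
    parts = urlsplit(s3_url)
    return urlunsplit(
        ("https", distribution_domain, parts.path, parts.query, parts.fragment)
    )


def on_task_allocation(task, url_field, temp_field, distribution_domain):
    """Save the original URL in the temporary field, then serve via CloudFront."""
    task[temp_field] = task[url_field]
    task[url_field] = to_cloudfront_url(task[url_field], distribution_domain)


def on_task_completion(task, url_field, temp_field):
    """Restore the original S3 URL from the temporary field."""
    task[url_field] = task[temp_field]
```

Restoring the original value on completion means the stored task output keeps referring to the S3 object rather than to a CloudFront URL that is only valid for a limited time.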

© 2020 Taskmonk Technology Pvt. Ltd. All Rights Reserved.