Amazon EMR: Create Cluster


First: Sample Cluster Prerequisites

Before you set up your EMR cluster, make sure you have completed all of the following prerequisites.


Signing Up for an AWS Account:

If you don't have an AWS account yet, follow these steps to create one.

To sign up for a new account:

  1. Open https://portal.aws.amazon.com/billing/signup.
  2. Follow the online instructions.

Creating an S3 Bucket:

Your bucket and folder names must meet the following requirements:

  • They can contain only letters, numbers, periods (.), and hyphens (-).
  • They cannot end in a number.
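If you script bucket or folder creation, a small check like the following can catch names that break the two rules above before you create anything. This is a minimal sketch: the function name `is_valid_emr_name` is hypothetical, and it checks only the two rules listed here, not S3's separate bucket-naming rules (for example, bucket names must also be lowercase and 3 to 63 characters long).

```python
import re

# Only letters, digits, periods, and hyphens are allowed (rule 1 above).
_ALLOWED = re.compile(r"[A-Za-z0-9.-]+")


def is_valid_emr_name(name: str) -> bool:
    """Return True if `name` satisfies the two restrictions listed above.

    Hypothetical helper: it does not check S3's general bucket-naming rules.
    """
    if not _ALLOWED.fullmatch(name):
        return False  # empty, or contains a disallowed character
    # Rule 2: the name must not end in a number.
    return not name[-1].isdigit()
```

For example, `my-emr-logs` passes, while `my-emr-logs-2024` (ends in a number) and `my_emr_logs` (contains underscores) are rejected.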

After you create the bucket, select it and choose Create folder. Replace the default name New folder with a name that identifies the folder, then choose Save.
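S3 has no real directories; the console's Create folder action stores a zero-byte object whose key ends in a slash. The sketch below does the same with boto3 (the AWS SDK for Python), assuming boto3 is installed, your credentials are configured, and the bucket already exists. The helper names `folder_key` and `create_folder` are hypothetical.

```python
def folder_key(folder_name: str) -> str:
    """Return the S3 key the console uses for a 'folder': the name plus a trailing '/'."""
    return folder_name.strip("/") + "/"


def create_folder(bucket: str, folder_name: str) -> str:
    """Create an empty 'folder' object in an existing bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so folder_key() can be used without boto3 installed

    key = folder_key(folder_name)
    # A zero-byte object whose key ends in '/' is shown as a folder in the S3 console.
    boto3.client("s3").put_object(Bucket=bucket, Key=key)
    return key
```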

Creating an EC2 Key Pair:

You need an EC2 key pair to connect securely to the nodes in the cluster over SSH. Follow one of the procedures below, depending on your operating system.

  • Windows: Create a key pair using Amazon EC2 (Amazon EC2 User Guide for Windows Instances).
  • Linux and macOS: Create a key pair using Amazon EC2 (Amazon EC2 User Guide for Linux Instances).
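As an alternative to the console procedures, the following sketch creates a key pair with boto3's EC2 `create_key_pair` call and saves the returned private key (`KeyMaterial`) with owner-only read permission, which SSH clients expect. It assumes boto3 is installed and credentials are configured; the helper names and the file path you pass are hypothetical. On Windows, `os.chmod` only toggles the read-only flag, so the permission step reflects Linux and macOS behavior.

```python
import os
import stat


def save_private_key(key_material: str, path: str) -> None:
    """Write a PEM private key so that only the owner can read it (like `chmod 400`)."""
    # O_EXCL makes this fail instead of silently overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key_material)
    os.chmod(path, stat.S_IRUSR)  # 0o400: read-only for the owner


def create_key_pair(key_name: str, path: str) -> None:
    """Create an EC2 key pair and save its private key (needs boto3 and AWS credentials)."""
    import boto3  # lazy import: save_private_key() works without boto3

    response = boto3.client("ec2").create_key_pair(KeyName=key_name)
    save_private_key(response["KeyMaterial"], path)
```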


Second: Launching a Sample EMR Cluster

Launching the Sample Cluster:

To launch a sample EMR cluster:

  1. Sign in to the AWS Management Console and open the EMR console at https://console.aws.amazon.com/elasticmapreduce/.
  2. Choose Create cluster.
  3. On the Create Cluster – Quick Options page, accept the default values except for the following fields:
     • For Cluster name, enter a unique name that helps you identify your cluster, such as First Sample EMR Cluster.
     • Under Security and access, choose the EC2 key pair that you created as a prerequisite.
  4. Choose Create cluster.
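The console steps above correspond roughly to a single EMR `RunJobFlow` API call. The following sketch builds the boto3 `run_job_flow` arguments from the Quick Options defaults and submits them. The application list is our assumption of what the Core Hadoop bundle contains (check the console for your release), the default IAM roles `EMR_DefaultRole` and `EMR_EC2_DefaultRole` must already exist in your account (for example, created by an earlier console launch or by `aws emr create-default-roles`), and the cluster name, key pair name, and log URI you pass are placeholders.

```python
def build_cluster_request(name: str, key_name: str, log_uri: str) -> dict:
    """Keyword arguments for boto3's EMR run_job_flow, mirroring the Quick Options defaults."""
    return {
        "Name": name,
        "LogUri": log_uri,  # Logging enabled: log data is written to this S3 folder
        "ReleaseLabel": "emr-5.30.0",
        # Assumed contents of the "Core Hadoop" bundle; verify for your release.
        "Applications": [{"Name": app} for app in ("Hadoop", "Hive", "Hue", "Mahout", "Pig", "Tez")],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,  # 1 master node + 2 core nodes
            "Ec2KeyName": key_name,
            # Launch mode "Cluster": keep running until you terminate it.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Permissions "Default": the default EMR service and EC2 instance roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_cluster(request: dict) -> str:
    """Submit the request and return the new cluster ID (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so build_cluster_request() runs without boto3

    response = boto3.client("emr").run_job_flow(**request)
    return response["JobFlowId"]
```

For example, `launch_cluster(build_cluster_request("First Sample EMR Cluster", "my-key-pair", "s3://my-emr-logs/logs/"))` would submit the cluster and return its ID; separating request construction from submission lets you inspect the arguments before anything is billed.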

This takes you to the cluster status page, which shows the cluster Summary. Use this page to monitor the progress of cluster creation and to view details about the cluster status. As cluster creation tasks finish, the items on the status page update automatically. If they don't, choose the refresh icon on the right or refresh your browser.

In the Network and hardware section, find the Master and Core instance status. The status changes from Provisioning to Bootstrapping to Waiting as the cluster is created.

When links appear for Security groups for Master and Security Groups for Core & Task, you can move on to the next step. However, be patient: the cluster still needs time to start successfully and reach the Waiting state.
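The console's status page refreshes the cluster state for you. If you launch from a script, you can poll the state yourself; the sketch below waits until the cluster reaches `WAITING` and stops early if it enters a terminating state. Note that the API reports cluster-level states (such as `STARTING`, `BOOTSTRAPPING`, `RUNNING`, and `WAITING`) rather than the console's instance status labels. The helper names and the polling interval and timeout defaults are arbitrary assumptions.

```python
import time
from typing import Callable

# Cluster states from which the cluster will never reach WAITING.
FAILED_STATES = {"TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_waiting(get_state: Callable[[], str], poll_seconds: float = 30.0,
                     timeout_seconds: float = 1800.0) -> str:
    """Poll get_state() until it returns WAITING; raise if the cluster terminates or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state == "WAITING":
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"cluster stopped in state {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {state} after {timeout_seconds} s")
        time.sleep(poll_seconds)


def emr_state_getter(cluster_id: str) -> Callable[[], str]:
    """Build a get_state() that reads the live cluster state via boto3 (needs AWS credentials)."""
    import boto3  # lazy import so wait_for_waiting() can be tested without boto3

    emr = boto3.client("emr")
    return lambda: emr.describe_cluster(ClusterId=cluster_id)["Cluster"]["Status"]["State"]
```

Passing the state lookup in as a function keeps the polling logic testable without AWS access. boto3 also provides an EMR waiter, `cluster_running`, which treats both `RUNNING` and `WAITING` as success and may be preferable if you don't need custom handling.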

Quick Options (Summary)

The following table shows the fields and default values that the EMR console uses when you launch a cluster with the Quick Options configuration.

| Console field | Default value | Description |
|---|---|---|
| Cluster name | My cluster | (Optional) A descriptive name for the cluster. It does not need to be unique. |
| Logging | Enable | When enabled, detailed log data is written to the specified S3 folder. |
| S3 folder | s3://aws-logs-account_number-region/elasticmapreduce/ | The path to the folder in an S3 bucket where log data is written. |
| Launch mode | Cluster | Specifies whether to launch a long-running cluster or a cluster that terminates after running any steps that you specify. With the Cluster option, the cluster keeps running until you terminate it. |
| Release | emr-5.30.0 | The EMR release version used to create the cluster. |
| Applications | Core Hadoop | The open-source applications from the big data ecosystem to install on the cluster. |
| Instance type | m5.xlarge | The EC2 instance type used for the instances that run in the cluster. |
| Number of instances | 3 | The number of EC2 instances to launch. Each instance corresponds to a node in the EMR cluster. A minimum of one node is required, and it serves as the master node. |
| EC2 key pair | Choose an option | The EC2 key pair used to connect to the cluster nodes over SSH. It is best to create and specify your own EC2 key pair so that it is unique and known to you. |
| Permissions | Default | The AWS Identity and Access Management (IAM) roles that the cluster uses. You can select Custom to specify your own roles, but the default roles are a good starting point. |
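Two of these defaults follow simple rules that are easy to reproduce in a script: the default log folder is built from your account number and Region, and with a single instance type, one instance becomes the master node while the remaining instances become core nodes. The sketch below encodes both; the function names and the sample account ID in the usage note are hypothetical.

```python
def default_log_uri(account_id: str, region: str) -> str:
    """Default EMR log folder, following the S3 folder pattern in the table."""
    return f"s3://aws-logs-{account_id}-{region}/elasticmapreduce/"


def node_counts(instance_count: int) -> dict:
    """Split the instance count into one master node and the remaining core nodes."""
    if instance_count < 1:
        raise ValueError("at least one instance (the master node) is required")
    return {"master": 1, "core": instance_count - 1}
```

With the default of 3 instances, `node_counts(3)` returns `{"master": 1, "core": 2}`, and `default_log_uri("123456789012", "us-east-1")` returns `s3://aws-logs-123456789012-us-east-1/elasticmapreduce/`.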
