This article gives an overview of AWS Glue tags and explains how to get started with them.


What is an AWS Glue Tag?

A tag is a label that you assign to an AWS resource. Each tag consists of a key and an optional value, both of which you define. You can assign your own tags to certain AWS Glue resources to help you manage them.



AWS Glue Tags Benefits:
  • Organizing and identifying resources.
  • Creating cost accounting reports.
  • Restricting access to specific resources.

Identity and Access Management (IAM): You can control which users have permission to create, edit, or delete tags.

The following resources can be tagged:

  • Machine learning transforms
  • Development endpoints
  • Crawlers
  • Jobs
  • Triggers

Additional Details:

When you create a policy that allows users to tag AWS Glue resources, remember to include the glue:TagResource action.
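As a minimal sketch of such a policy, the Python snippet below builds an IAM policy document that allows tagging. The function name `build_tagging_policy` is hypothetical. Including `glue:UntagResource` and `glue:GetTags` next to `glue:TagResource` is an assumption about what a typical tagging policy needs, and the `"*"` resource is a placeholder that you would normally narrow.

```python
import json


def build_tagging_policy(resource="*"):
    """Return an IAM policy document (as a dict) that permits Glue tagging.

    `resource` defaults to "*" as a placeholder; narrow it where possible.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # glue:TagResource is needed to add tags. The other two
                # actions are assumed here for removing and reading tags.
                "Action": [
                    "glue:TagResource",
                    "glue:UntagResource",
                    "glue:GetTags",
                ],
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console.
    print(json.dumps(build_tagging_policy(), indent=2))
```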

Consider the following when tagging:

  • A maximum of 50 tags is allowed per resource.
  • Tags are specified as key-value pairs in the format {"string": "string" ...}
  • The tag key is required when you create a tag on an object; the tag value is optional.
  • Tag keys and tag values are case sensitive.
  • Tag keys and tag values must not use the prefix "aws".
  • The maximum tag key length is 128 Unicode characters in UTF-8. Tag keys must not be null or empty.
  • The maximum tag value length is 256 Unicode characters in UTF-8. Because the value is optional, it may be empty or null.
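The count, length, and prefix rules above can be checked locally before calling AWS. The sketch below is only an illustration: `validate_tags` is a hypothetical helper, not part of any AWS SDK. It assumes lengths are counted in Unicode characters and that the "aws" prefix check is case-insensitive. It deliberately leaves empty values unchecked.

```python
MAX_TAGS = 50
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256
RESERVED_PREFIX = "aws"


def validate_tags(tags):
    """Return a list of human-readable problems found in a {key: value} dict."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} (limit {MAX_TAGS})")
    for key, value in tags.items():
        if not key:
            problems.append("tag key must not be null or empty")
            continue
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"tag key longer than {MAX_KEY_LENGTH} characters")
        # Assumption: the reserved prefix is matched case-insensitively.
        if key.lower().startswith(RESERVED_PREFIX):
            problems.append(f"tag key uses reserved prefix: {key}")
        text = value or ""  # treat a missing value as an empty string
        if len(text) > MAX_VALUE_LENGTH:
            problems.append(f"value for '{key}' longer than {MAX_VALUE_LENGTH} characters")
        if text.lower().startswith(RESERVED_PREFIX):
            problems.append(f"value for '{key}' uses reserved prefix")
    return problems


if __name__ == "__main__":
    print(validate_tags({"key1": "value1", "aws:owner": "team-a"}))
```

An empty list means none of the checked rules was violated. AWS Glue remains the final authority when the tags are actually applied.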

What AWS Glue Tags Look Like

The following examples create a job with tags assigned to it.

  • CLI-based approach
aws glue create-job --name job-test-tags --role MyJobRole --command Name=glueetl,ScriptLocation=s3://aws-glue-scripts//prod-job1 --tags '{"key1": "value1", "key2": "value2"}'
  • CloudFormation JSON
{
  "Description": "AWS Glue Job Test Tags",
  "Resources": {
    "MyJobRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "glue.amazonaws.com"
                ]
              },
              "Action": [
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "Path": "/",
        "Policies": [
          {
            "PolicyName": "root",
            "PolicyDocument": {
              "Version": "2012-10-17",
              "Statement": [
                {
                  "Effect": "Allow",
                  "Action": "*",
                  "Resource": "*"
                }
              ]
            }
          }
        ]
      }
    },
    "MyJob": {
      "Type": "AWS::Glue::Job",
      "Properties": {
        "Command": {
          "Name": "glueetl",
          "ScriptLocation": "s3://aws-glue-scripts//prod-job1"
        },
        "DefaultArguments": {
          "--job-bookmark-option": "job-bookmark-enable"
        },
        "ExecutionProperty": {
          "MaxConcurrentRuns": 2
        },
        "MaxRetries": 0,
        "Name": "cf-job1",
        "Role": {
          "Ref": "MyJobRole"
        },
        "Tags": {
          "key1": "value1",
          "key2": "value2"
        }
      }
    }
  }
}
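
The same job could also be created from Python. The sketch below assumes the boto3 Glue client, whose `create_job` call accepts `Name`, `Role`, `Command`, and `Tags`. The helpers `build_create_job_request` and `create_tagged_job` are hypothetical names introduced for illustration. The client is passed in as a parameter, so the request can be inspected without AWS credentials.

```python
def build_create_job_request(name, role, script_location, tags):
    """Assemble keyword arguments for the Glue CreateJob API call."""
    return {
        "Name": name,
        "Role": role,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        # Tags are passed as a plain {key: value} mapping.
        "Tags": dict(tags),
    }


def create_tagged_job(glue_client, name, role, script_location, tags):
    """Create the job through any client object exposing create_job()."""
    request = build_create_job_request(name, role, script_location, tags)
    return glue_client.create_job(**request)


# Usage with boto3 (assumes boto3 is installed and credentials are configured):
#   import boto3
#   create_tagged_job(
#       boto3.client("glue"),
#       name="job-test-tags",
#       role="MyJobRole",
#       script_location="s3://aws-glue-scripts//prod-job1",
#       tags={"key1": "value1", "key2": "value2"},
#   )
```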

AWS Glue Tag on IAM Policies

You can also use AWS tags to control access to certain types of AWS Glue resources. Access is allowed or denied based on the tags attached to development endpoints, jobs, triggers, and crawlers.

To do this, an IAM policy uses:

  • The Condition element.
  • The glue:resourceTag context key.

Example:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "glue:*",
    "Resource": "*",
    "Condition": {
      "StringEquals": {
        "glue:resourceTag/Name": "CloudySave"
      }
    }
  }]
}
Additional Details:

Only jobs, development endpoints, crawlers and triggers can have condition context keys.
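
A policy like the example above can also be generated for any tag key and value. This is a minimal sketch: `build_tag_condition_policy` is a hypothetical helper, and the broad `glue:*` action and `"*"` resource are copied from the example rather than recommended.

```python
import json


def build_tag_condition_policy(tag_key, tag_value, effect="Allow"):
    """Build an IAM policy that applies only to resources carrying a given tag."""
    # The context key has the form glue:resourceTag/<tag key>.
    condition_key = f"glue:resourceTag/{tag_key}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": effect,
                "Action": "glue:*",
                "Resource": "*",
                "Condition": {"StringEquals": {condition_key: tag_value}},
            }
        ],
    }


if __name__ == "__main__":
    # Reproduces the example policy for the tag Name=CloudySave.
    print(json.dumps(build_tag_condition_policy("Name", "CloudySave"), indent=2))
```

Passing `effect="Deny"` produces the matching deny statement for resources that carry the tag.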


Resource-Level Permissions for Specific Objects

  • Grant least-privilege access wherever possible.
  • When writing a user's IAM policy, keep in mind that some API operations support ARNs in the "Resource" element, while others do not.
  • The example below shows an IAM policy that allows the "GetClassifier" and "GetJobRun" operations.
  • For those operations, "Resource" is set to "*", because AWS Glue does not support ARNs for classifiers and job runs.
  • Some operations, such as "GetDatabase" and "GetTable", do support ARNs. You can list specific ARNs in the second statement of the policy.
{
  "Version": "2012-10-17",
  "Statement": [{
      "Effect": "Allow",
      "Action": [
        "glue:GetClassifier*",
        "glue:GetJobRun*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:Get*"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:123456789012:catalog",
        "arn:aws:glue:us-east-1:123456789012:database/default",
        "arn:aws:glue:us-east-1:123456789012:table/default/e*1*",
        "arn:aws:glue:us-east-1:123456789012:connection/connection2"
      ]
    }
  ]
}
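
The ARNs in the second statement follow the pattern `arn:aws:glue:<region>:<account-id>:<resource>`. The sketch below assembles them from their parts. `glue_arn` and `catalog_resource_arns` are hypothetical helpers, and the region, account ID, and names are the placeholder values from the example.

```python
def glue_arn(region, account_id, resource):
    """Build an AWS Glue ARN such as arn:aws:glue:us-east-1:123456789012:catalog."""
    return f"arn:aws:glue:{region}:{account_id}:{resource}"


def catalog_resource_arns(region, account_id, database, table_pattern):
    """Return the catalog, database, and table ARNs for one database."""
    return [
        glue_arn(region, account_id, "catalog"),
        glue_arn(region, account_id, f"database/{database}"),
        # Table ARNs include the database name; wildcards are allowed.
        glue_arn(region, account_id, f"table/{database}/{table_pattern}"),
    ]


if __name__ == "__main__":
    for arn in catalog_resource_arns("us-east-1", "123456789012", "default", "e*1*"):
        print(arn)
```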

AWS Glue Console Permissions

To work with the AWS Glue console, a user needs the necessary AWS Glue permissions. Permissions for the following services are also required:

  • Displaying logs: CloudWatch Logs permissions.
  • Listing and passing roles: IAM permissions.
  • Working with stacks: CloudFormation permissions.
  • Listing VPCs, subnets, security groups, instances, and other objects: EC2 permissions.
  • Listing buckets and objects, and saving and retrieving scripts: S3 permissions.
  • Working with clusters: Redshift permissions.
  • Listing instances: RDS permissions.

