How to use Boto3 to get the definitions of all Glue jobs at once?

AWS Glue is a managed ETL service that helps you prepare data for analytics. Using the boto3 library, you can programmatically retrieve the complete definitions of all Glue jobs in a given Region of your AWS account, including their configurations, roles, and parameters.

Understanding get_jobs() vs list_jobs()

There are two key methods for working with Glue jobs:

list_jobs() − Returns only job names
get_jobs() − Returns complete job definitions with all configurations
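
As a rough illustration of the difference, the sketch below calls each method once and returns the part of the response that matters. The helper names (default_glue_client, first_page_job_names, first_page_job_definitions) are hypothetical and not part of boto3. Each call returns only one page of results.

```python
def default_glue_client():
    """Create a Glue client from the default credentials and Region."""
    import boto3  # imported lazily so the helpers below can be tried with a stub client
    return boto3.client("glue")


def first_page_job_names(glue_client):
    """Return the job names from a single list_jobs() call (names only)."""
    response = glue_client.list_jobs()
    return response["JobNames"]


def first_page_job_definitions(glue_client):
    """Return the full job definitions from a single get_jobs() call."""
    response = glue_client.get_jobs()
    return response["Jobs"]


# Example usage (requires AWS credentials and Glue permissions):
# client = default_glue_client()
# print(first_page_job_names(client))
# print([job["Role"] for job in first_page_job_definitions(client)])
```

list_jobs() is the lighter call when you only need names, while get_jobs() avoids a follow-up lookup per job when you need the configuration as well.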

Prerequisites

Before running the code, ensure you have:

AWS credentials configured (via AWS CLI, environment variables, or IAM roles)
boto3 library installed: pip install boto3
Appropriate IAM permissions for Glue operations (for example, glue:GetJobs for get_jobs() and glue:ListJobs for list_jobs())

Getting All Glue Job Definitions

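The following code calls get_jobs() once and returns the response with complete job definitions. A single call returns one page of results, so accounts with many jobs also need the approach described under Handling Pagination: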

import boto3
from botocore.exceptions import ClientError

def get_definition_of_glue_jobs():
    # Uses the default credentials and Region from your environment
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        # One call returns a single page of job definitions
        response = glue_client.get_jobs()
        return response
    except ClientError as e:
        # Chain the original exception so the AWS error details are preserved
        raise Exception(f"boto3 client error in get_definition_of_glue_jobs: {e}") from e
    except Exception as e:
        raise Exception(f"Unexpected error in get_definition_of_glue_jobs: {e}") from e

# Get the first page of job definitions
jobs_response = get_definition_of_glue_jobs()
print(f"Found {len(jobs_response['Jobs'])} jobs")

# Print job names and types
for job in jobs_response['Jobs']:
    print(f"Job: {job['Name']}, Type: {job['Command']['Name']}")

Output Structure

The response contains a Jobs array with detailed information for each job:

{
  'Jobs': [
    {
      'Name': '01_PythonShellTest1',
      'Role': 'arn:aws:iam::123456789012:role/glue-execution-role',
      'CreatedOn': datetime.datetime(2021, 1, 6, 19, 59, 19),
      'Command': {
        'Name': 'pythonshell',
        'ScriptLocation': 's3://my-bucket/scripts/test.py',
        'PythonVersion': '3'
      },
      'DefaultArguments': {
        '--job-bookmark-option': 'job-bookmark-disable'
      },
      'MaxRetries': 0,
      'Timeout': 2880,
      'GlueVersion': '2.0'
    }
  ],
  'NextToken': 'pagination-token-if-more-results'
}
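
Because fields such as CreatedOn are Python datetime objects, a job definition cannot be passed to json.dumps() directly. The following is a minimal sketch of one way to export a definition as JSON. The job_to_json helper is a hypothetical name, and the sample job is a trimmed copy of the structure shown above.

```python
import datetime
import json


def job_to_json(job):
    """Serialize one job definition to a JSON string.

    datetime values such as CreatedOn are not JSON serializable by default,
    so they are converted to ISO 8601 strings.
    """
    def encode(value):
        if isinstance(value, (datetime.datetime, datetime.date)):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(job, default=encode, indent=2, sort_keys=True)


# Trimmed sample job based on the response structure shown above
sample_job = {
    "Name": "01_PythonShellTest1",
    "CreatedOn": datetime.datetime(2021, 1, 6, 19, 59, 19),
    "Command": {"Name": "pythonshell", "PythonVersion": "3"},
    "MaxRetries": 0,
}

print(job_to_json(sample_job))
```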

Handling Pagination

For accounts with many jobs, get_jobs() returns the results in pages. Pass the NextToken from each response to the next call until no token is returned:

def get_all_glue_jobs():
    session = boto3.session.Session()
    glue_client = session.client('glue')
    
    all_jobs = []
    next_token = None
    
    try:
        while True:
            # Pass NextToken only when a previous page supplied one
            kwargs = {'NextToken': next_token} if next_token else {}
            response = glue_client.get_jobs(**kwargs)

            all_jobs.extend(response['Jobs'])

            # Stop when the response carries no token for a further page
            next_token = response.get('NextToken')
            if not next_token:
                break
            
        return all_jobs
        
    except ClientError as e:
        print(f"Error retrieving Glue jobs: {e}")
        return []

# Get all jobs with pagination
all_jobs = get_all_glue_jobs()
print(f"Total jobs retrieved: {len(all_jobs)}")
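
As an alternative to tracking NextToken by hand, boto3 clients provide paginators through get_paginator(). The sketch below assumes you pass in an existing Glue client. The function name get_all_glue_jobs_with_paginator is hypothetical.

```python
def get_all_glue_jobs_with_paginator(glue_client):
    """Collect every job definition using boto3's built-in paginator."""
    paginator = glue_client.get_paginator("get_jobs")
    all_jobs = []
    # paginate() yields one response page at a time and handles NextToken itself
    for page in paginator.paginate():
        all_jobs.extend(page["Jobs"])
    return all_jobs


# Example usage (requires AWS credentials and Glue permissions):
# import boto3
# jobs = get_all_glue_jobs_with_paginator(boto3.client("glue"))
# print(f"Total jobs retrieved: {len(jobs)}")
```

This version lets ClientError propagate to the caller instead of returning an empty list, so a failure partway through is not mistaken for an account with no jobs.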

Key Job Properties

Property	Description	Example
`Name`	Job identifier	'data-processing-job'
`Role`	IAM role ARN	'arn:aws:iam::123:role/glue-role'
`Command`	Job type and script location	{'Name': 'glueetl', 'ScriptLocation': 's3://...'}
`MaxRetries`	Retry attempts on failure	3
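
To pull just these properties out of a job definition, something like the following sketch may be enough. The summarize_job helper is a hypothetical name. It uses dict.get() on the assumption that some keys may be absent from a given job.

```python
def summarize_job(job):
    """Reduce a full job definition to the key properties listed above."""
    command = job.get("Command", {})
    return {
        "Name": job.get("Name"),
        "Role": job.get("Role"),
        "Type": command.get("Name"),
        "ScriptLocation": command.get("ScriptLocation"),
        "MaxRetries": job.get("MaxRetries"),
    }


# Sample job using the example values from the table above
sample = {
    "Name": "data-processing-job",
    "Role": "arn:aws:iam::123:role/glue-role",
    "Command": {"Name": "glueetl", "ScriptLocation": "s3://..."},
    "MaxRetries": 3,
}

print(summarize_job(sample))
```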

Conclusion

Use get_jobs() to retrieve complete Glue job definitions including configurations, roles, and parameters. Handle pagination with NextToken for large job lists, and implement proper error handling for production applications.

Ashish Anand

Updated on: 2026-03-25T18:19:16+05:30
