How to use Boto3 to start a crawler in AWS Glue Data Catalog
In this article, we will see how to start a crawler in AWS Glue Data Catalog using Python's boto3 library. A crawler automatically discovers and catalogs metadata about your data sources.
Problem Statement
Use the boto3 library in Python to programmatically start an AWS Glue crawler.
Algorithm to Solve This Problem
Step 1: Import boto3 and botocore exceptions to handle errors
Step 2: Define a function that accepts crawler_name as parameter
Step 3: Create an AWS session using boto3. Ensure region_name is configured in your default profile
Step 4: Create an AWS client for glue service
Step 5: Use the start_crawler() function with the crawler name
Step 6: Handle CrawlerRunningException if the crawler is already running
Step 7: Handle generic exceptions for other potential errors
Example Code
The following code demonstrates how to start a crawler in the AWS Glue Data Catalog:
import boto3
from botocore.exceptions import ClientError

def start_a_crawler(crawler_name):
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        response = glue_client.start_crawler(Name=crawler_name)
        return response
    except ClientError as e:
        raise Exception("boto3 client error in start_a_crawler: " + e.__str__())
    except Exception as e:
        raise Exception("Unexpected error in start_a_crawler: " + e.__str__())

# First time - start the crawler
print(start_a_crawler("Data Dimension"))

# Second time - try to start while the crawler is running
print(start_a_crawler("Data Dimension"))
Output
# First time - start the crawler
{'ResponseMetadata': {'RequestId': '73e50130-*****************8e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 28 Mar 2021 07:26:55 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '73e50130-***************8e'}, 'RetryAttempts': 0}}
# Second time - try to start while crawler is running
Exception: boto3 client error in start_a_crawler: An error occurred (CrawlerRunningException) when calling the StartCrawler operation: Crawler with name Data Dimension has already started
Key Points
The start_crawler() method starts a crawler immediately, regardless of its schedule
If a crawler is already running, AWS throws a CrawlerRunningException
Ensure your AWS credentials are properly configured before running the code
The response contains metadata about the request including HTTP status code and request ID
Conclusion
Using boto3's start_crawler() method allows you to programmatically trigger AWS Glue crawlers. Always handle the CrawlerRunningException to manage cases where the crawler is already active.
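If a script must eventually start the crawler but it may already be busy, one common pattern is to wait and retry with increasing delays. Below is a minimal sketch of such a backoff schedule; the helper name `backoff_delays` and the 5-second/60-second values are arbitrary assumptions for illustration, not anything prescribed by boto3.

```python
def backoff_delays(initial=5.0, cap=60.0, attempts=5):
    """Yield retry delays in seconds, doubling each time up to a cap."""
    delay = initial
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2

# A caller would sleep for each delay, then call start_crawler() again,
# stopping as soon as CrawlerRunningException no longer occurs.
print(list(backoff_delays()))  # [5.0, 10.0, 20.0, 40.0, 60.0]
```

Capping the delay prevents the wait from growing unboundedly when a long crawl is in progress.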
