How to use Boto3 to start a crawler in AWS Glue Data Catalog
In this article, we will see how to start a crawler in AWS Glue Data Catalog using Python's boto3 library. A crawler automatically discovers and catalogs metadata about your data sources.
Problem Statement
Use the boto3 library in Python to programmatically start an AWS Glue crawler.
Algorithm to Solve This Problem
Step 1: Import boto3 and botocore exceptions to handle errors
Step 2: Define a function that accepts crawler_name as parameter
Step 3: Create an AWS session using boto3. Ensure region_name is configured in your default profile
Step 4: Create an AWS client for glue service
Step 5: Use the start_crawler() function with the crawler name
Step 6: Handle CrawlerRunningException if the crawler is already running
Step 7: Handle generic exceptions for other potential errors
Example Code
The following code demonstrates how to start a crawler in the AWS Glue Data Catalog:
import boto3
from botocore.exceptions import ClientError

def start_a_crawler(crawler_name):
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        response = glue_client.start_crawler(Name=crawler_name)
        return response
    except ClientError as e:
        raise Exception("boto3 client error in start_a_crawler: " + e.__str__())
    except Exception as e:
        raise Exception("Unexpected error in start_a_crawler: " + e.__str__())

# First time - start the crawler
print(start_a_crawler("Data Dimension"))

# Second time - try to start while the crawler is running
print(start_a_crawler("Data Dimension"))
Output
# First time - start the crawler
{'ResponseMetadata': {'RequestId': '73e50130-*****************8e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 28 Mar 2021 07:26:55 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '73e50130-***************8e'}, 'RetryAttempts': 0}}
# Second time - try to start while crawler is running
Exception: boto3 client error in start_a_crawler: An error occurred (CrawlerRunningException) when calling the StartCrawler operation: Crawler with name Data Dimension has already started
Key Points
The start_crawler() method starts a crawler immediately, regardless of its schedule
If a crawler is already running, AWS throws a CrawlerRunningException
Ensure your AWS credentials are properly configured before running the code
The response contains metadata about the request including HTTP status code and request ID
Conclusion
Using boto3's start_crawler() method allows you to programmatically trigger AWS Glue crawlers. Always handle the CrawlerRunningException to manage cases where the crawler is already active.
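If a script must eventually start the crawler but it may already be busy, one common pattern is to wait and retry with increasing delays. Below is a minimal sketch of such a backoff schedule; the helper name `backoff_delays` and the 5-second/60-second values are arbitrary assumptions for illustration, not anything prescribed by boto3.

```python
def backoff_delays(initial=5.0, cap=60.0, attempts=5):
    """Yield retry delays in seconds, doubling each time up to a cap."""
    delay = initial
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2

# A caller would sleep for each delay, then call start_crawler() again,
# stopping as soon as CrawlerRunningException no longer occurs.
print(list(backoff_delays()))  # [5.0, 10.0, 20.0, 40.0, 60.0]
```

Capping the delay prevents the wait from growing unboundedly when a long crawl is in progress.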
