How to List Contents of s3 Bucket Using Boto3 Python?
S3 is a storage service from AWS. You can store any type of file in it, such as CSV or text files. You may need to retrieve the list of files to perform operations on them. In this tutorial, you'll learn how to list the contents of an S3 bucket.
You can list the contents of an S3 bucket by iterating over the collection returned by the my_bucket.objects.all() method.
If You're in a Hurry...
You can use the below code snippet to list the contents of the S3 Bucket using boto3.
Snippet
import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
s3 = session.resource('s3')
my_bucket = s3.Bucket('stackvidhya')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
Output
csv_files/
csv_files/IRIS.csv
df.csv
dfdd.csv
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/
text_files/testfile.txt
If You Want to Understand Details, Read on…
In this tutorial, you'll learn the different methods to list contents from an S3 bucket using boto3.
You'll use the boto3 resource and the boto3 client to list the contents, and you'll use the filtering methods to list specific file types and to list files from a specific directory of the S3 bucket.
If you've not installed boto3 yet, you can install it by using the below snippet.
Snippet
%pip install boto3
Boto3 will be installed successfully.
Now, you can use it to access AWS resources.
List Contents of S3 Bucket Using Boto3 Resource
In this section, you'll use the Boto3 resource to list the contents of an S3 bucket.
- Create a Boto3 session using the boto3.Session() method.
- Create the S3 resource using session.resource('s3').
- Create a bucket object using resource.Bucket(<bucket_name>).
- Invoke the objects.all() method on the bucket object and iterate over the returned collection to get each object's details, printing each object's name using the key attribute.
Note: In addition to listing objects present in the Bucket, it'll also list the sub-directories and the objects inside the sub-directories.
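The trailing-slash entries such as csv_files/ in the output are the zero-byte placeholder objects that S3 creates for folders. If you only want real files, you can skip keys that end with a slash. A small sketch using plain Python on key names from the output above (no AWS call needed):

```python
# Keys as returned by my_bucket.objects.all() (sample from the output above)
keys = [
    "csv_files/",
    "csv_files/IRIS.csv",
    "df.csv",
    "text_files/",
    "text_files/testfile.txt",
]

# Folder placeholders end with "/"; keep only real objects
files_only = [k for k in keys if not k.endswith("/")]
print(files_only)
```

The same endswith("/") check can be used inside the objects.all() loop to print only file keys.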
Use the below snippet to list objects of an S3 bucket.
Snippet
import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
# Then use the session to get the resource
s3 = session.resource('s3')
my_bucket = s3.Bucket('stackvidhya')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
You'll see the list of objects present in the Bucket as below in alphabetical order.
Output
csv_files/
csv_files/IRIS.csv
df.csv
dfdd.csv
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/
text_files/testfile.txt
This is how you can use the Boto3 resource to list objects in an S3 bucket.
List Contents of S3 Bucket Using Boto3 Client
In this section, you'll use the boto3 client to list the contents of an S3 bucket.
- Create the boto3 S3 client using the boto3.client('s3') method, passing your credentials.
- Invoke the list_objects_v2() method with the bucket name to list the objects in the S3 bucket. It returns a dictionary containing the object details.
- Iterate over the 'Contents' list in the returned dictionary and display each object name using obj['Key'].
Note: Similar to the Boto3 resource methods, the Boto3 client also returns the objects in the sub-directories.
Use the below snippet to list objects of an S3 bucket.
Snippet
import boto3
s3_client = boto3.client(
    's3',
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
objects = s3_client.list_objects_v2(Bucket='stackvidhya')
for obj in objects['Contents']:
    print(obj['Key'])
You'll see the objects in the S3 Bucket listed below.
Output
csv_files/
csv_files/IRIS.csv
df.csv
dfdd.csv
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/
text_files/testfile.txt
This is how you can list keys in the S3 Bucket using the boto3 client.
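One caveat with the client approach: list_objects_v2() returns at most 1,000 keys per call. For larger buckets, boto3 provides a paginator that follows the continuation tokens for you. Below is a sketch; keys_from_pages is a hypothetical convenience helper, and the live call is commented out since it needs your credentials (it uses the same stackvidhya bucket as the tutorial):

```python
def keys_from_pages(pages):
    # Flatten the 'Contents' entries of list_objects_v2 response pages
    # into a single list of keys. Pages without a 'Contents' key
    # (e.g. an empty result page) are skipped.
    return [obj["Key"] for page in pages for obj in page.get("Contents", [])]

# With a real client (requires credentials):
# import boto3
# s3_client = boto3.client('s3')
# paginator = s3_client.get_paginator('list_objects_v2')
# all_keys = keys_from_pages(paginator.paginate(Bucket='stackvidhya'))

# The helper works on any iterable of response-shaped dictionaries:
pages = [
    {"Contents": [{"Key": "df.csv"}, {"Key": "dfdd.csv"}]},
    {"Contents": [{"Key": "text_files/testfile.txt"}]},
    {},  # a page with no 'Contents'
]
print(keys_from_pages(pages))
```

For the small example bucket in this tutorial a single call is enough, but the paginator form is the safe default for buckets of unknown size.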
List Contents of a Specific Directory
In this section, you'll learn how to list the contents of a subdirectory in an S3 bucket. This is useful when your bucket has multiple subdirectories and you need to know the contents of a specific one.
You can use the filter() method on the bucket's objects collection, with the Prefix parameter denoting the name of the subdirectory. filter() and Prefix are also helpful when you want to select only specific objects from the bucket.
Use the below snippet to select content from a specific directory called csv_files from the Bucket called stackvidhya.
Snippet
import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
# Then use the session to get the resource
s3 = session.resource('s3')
my_bucket = s3.Bucket('stackvidhya')
for obj in my_bucket.objects.filter(Prefix="csv_files/"):
    print(obj.key)
You'll see the list of objects present in the sub-directory csv_files in alphabetical order.
Output
csv_files/
csv_files/IRIS.csv
This is how you can list files in the folder or select objects from a specific directory of an S3 bucket.
List Specific File Types From S3 Bucket
In this section, you'll learn how to list specific file types from an S3 bucket.
This may be useful when you want to know all the files of a specific type. To achieve this, first, you need to select all objects from the Bucket and check if the object name ends with the particular type. If it ends with your desired type, then you can list the object.
It'll list the files of that specific type from the bucket, including all subdirectories.
Use the below snippet to list specific file types from an S3 bucket.
Snippet
import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
s3 = session.resource('s3')
my_bucket = s3.Bucket('stackvidhya')
for obj in my_bucket.objects.all():
    if obj.key.endswith('.txt'):
        print(obj.key)
You'll see all the text files available in the S3 Bucket in alphabetical order.
Output
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
filename_by_client_put_object.txt
text_files/testfile.txt
This is how you can list files of a specific type from an S3 bucket.
List Files Using Regular Expressions
Boto3 currently doesn't support server-side filtering of objects using regular expressions.
However, you can fetch all objects using the objects.all() method and filter them with a regular expression in an if condition.
For example, if you want to list files containing a number in their names, you can use the below snippet. For more advanced pattern matching, you can refer to the regex cheat sheet.
Snippet
import re
import boto3
session = boto3.Session(
    aws_access_key_id='<your_access_key_id>',
    aws_secret_access_key='<your_secret_access_key>'
)
s3 = session.resource('s3')
my_bucket = s3.Bucket('stackvidhya')
pattern = r"\d"  # raw string so the backslash isn't treated as an escape
for obj in my_bucket.objects.all():
    if re.search(pattern, obj.key):
        print(obj.key)
You'll see the file names with numbers listed below.
Output
file2_uploaded_by_boto3.txt
file3_uploaded_by_boto3.txt
file_uploaded_by_boto3.txt
This is how you can filter the contents of an S3 bucket using a regular expression.
To summarize, you've learned how to list the contents of an S3 bucket using the boto3 resource and the boto3 client. You've also learned to filter the results to list objects from a specific directory and to filter results using a regular expression.
If you have any questions, comment below.