Applicable to Syntasa platforms installed in an Amazon AWS environment, this article provides examples of how to download files from S3 to your notebook. Downloading files from sources other than S3 can be done in a similar fashion.
Installing AWS CLI pre-requisites
!pip3 install awscli --upgrade
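To confirm the CLI installed successfully, a quick sanity check is to print its version (the exact output will vary by environment):
!aws --version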
How to download files from S3 using the AWS CLI
import os
import shutil
import subprocess
bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
object_key = '20220701.export.CSV'
obj_in_s3_full_path = f's3://{bucket_name}/{object_prefix}/{object_key}'
local_des_path = '/tmp/my_files'
# First, let's delete any files that already exist locally
shutil.rmtree(local_des_path, ignore_errors=True)
# Let's create a temporary folder locally to hold our files
os.makedirs(local_des_path, exist_ok=True)
# Now download the file from S3
command = subprocess.check_output(f'aws s3 cp {obj_in_s3_full_path} {local_des_path}/', shell=True)
# Let's validate the file was downloaded by printing the contents of the folder
!ls -lah {local_des_path}
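The AWS CLI can also copy every object under a prefix in a single command using the --recursive flag. Here's a minimal sketch that reuses the bucket and prefix variables from the cell above:
# Download every object under the prefix into the local folder
command = subprocess.check_output(f'aws s3 cp s3://{bucket_name}/{object_prefix}/ {local_des_path}/ --recursive', shell=True)
!ls -lah {local_des_path}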
How to download a file using Boto3 -- Installing pre-requisites (upgrading pip and installing boto3 and cloudpathlib)
# First we'll install all the dependencies we'll use after this cell
!pip3 install --upgrade pip  # let's upgrade pip first
!pip3 install boto3
!pip3 install cloudpathlib
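If you want to confirm the installs succeeded, a quick sanity check is to print the installed versions using the standard library's importlib.metadata:
# Print the installed versions of the packages we just installed
from importlib.metadata import version
print(version('boto3'))
print(version('cloudpathlib'))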
How to download a single file to your local notebook environment using Boto3
# Define your imports and your object location (bucket, object, destination path)
import os
import boto3
import shutil
bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
object_key = '20220701.export.CSV'
local_des_path = '/tmp/my_files'
# First, let's delete any files that already exist locally
shutil.rmtree(local_des_path, ignore_errors=True)
# Let's create a temporary folder locally to hold our files
os.makedirs(local_des_path, exist_ok=True)
# Create a boto3 client and download the specified object
s3_client = boto3.client('s3')
s3_client.download_file(bucket_name, f'{object_prefix}/{object_key}', f'{local_des_path}/{object_key}')
# Now let's validate that the file exists in the local path we specified above
!ls -lah {local_des_path}
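If the object key doesn't exist or your role lacks permission, download_file raises a botocore ClientError. The snippet below is a minimal sketch of catching it, reusing the client and variables from the cell above; how you handle the error will depend on your workflow:
from botocore.exceptions import ClientError
try:
    s3_client.download_file(bucket_name, f'{object_prefix}/{object_key}', f'{local_des_path}/{object_key}')
except ClientError as e:
    # e.response carries the S3 error code, e.g. '404' for a missing object or '403' for access denied
    print(f"Download failed: {e.response['Error']['Code']} - {e.response['Error']['Message']}")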
How to download an entire folder to your local notebook environment using cloudpathlib
- cloudpathlib is a library that provides convenience functions on top of boto3 to help download and upload files easily
import os
import shutil
from cloudpathlib import CloudPath
bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
local_des_path = '/tmp/my_files/my_downloaded_folder/'
# First, let's delete any files that already exist locally
shutil.rmtree(local_des_path, ignore_errors=True)
# Let's create a local folder where we will download all the files from our s3 bucket
os.makedirs(local_des_path, exist_ok=True)
# Point a CloudPath at the s3 folder so we can download everything under it (please note that
# with a very large number of files or objects, you may need multi-threading or paginators)
cloud_path = CloudPath(f's3://{bucket_name}/{object_prefix}/')
cloud_path.download_to(local_des_path)
# Now let's validate that the folder and the files were downloaded
!ls -lah {local_des_path}
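For the very large folders mentioned in the comment above, boto3's paginator will list every object under a prefix, since a single list call returns at most 1000 keys. Here's a minimal sketch that downloads each listed object, reusing the bucket and path variables from the cell above:
import boto3
s3_client = boto3.client('s3')
# The paginator transparently follows continuation tokens across list_objects_v2 calls
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name, Prefix=f'{object_prefix}/'):
    for obj in page.get('Contents', []):
        # Download each object into the local folder, keeping only the file name
        s3_client.download_file(bucket_name, obj['Key'], f"{local_des_path}/{os.path.basename(obj['Key'])}")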
Uploading Files to an S3 bucket (For Amazon AWS Environments)
In this section, we'll show examples of how to upload files from your local notebook environment to an S3 bucket. Uploading files from other sources can be done in a similar fashion.
Let's create a dummy CSV File so that we can upload it to Amazon S3.
import csv
import os
import shutil
local_folder_path = '/tmp/my_files'
csv_file_name = 'my_sample_file.csv'
# First, let's delete any files that already exist locally
shutil.rmtree(local_folder_path, ignore_errors=True)
# Let's create a local folder to hold the file we're about to create
os.makedirs(local_folder_path, exist_ok=True)
header = ['name', 'area', 'country_code2', 'country_code3']
data = ['Mexico', 758400, 'MX', 'MEX']
with open(f'{local_folder_path}/{csv_file_name}', 'w', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)
# Verify file exists and that we can see the contents of the file
print(f'Files in path {local_folder_path}')
!ls -lah {local_folder_path}
print(f'\nContents of File :: {csv_file_name}')
!cat {local_folder_path}/{csv_file_name}
Uploading a file to Amazon S3 using AWS CLI
import subprocess
remote_bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
csv_file_name = 'my_sample_file.csv'
obj_in_s3_full_path = f's3://{remote_bucket_name}/{object_prefix}/{csv_file_name}'
local_folder_path = '/tmp/my_files'
# Now upload the file to s3
command = subprocess.check_output(f'aws s3 cp {local_folder_path}/{csv_file_name} {obj_in_s3_full_path}', shell=True)
# Let's validate the file was uploaded to s3
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/ | grep {csv_file_name}
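To upload an entire local folder instead of a single file, the AWS CLI's sync command copies everything that is new or changed between the source folder and the destination prefix. A minimal sketch reusing the variables above:
# Sync the whole local folder up to the s3 prefix
command = subprocess.check_output(f'aws s3 sync {local_folder_path} s3://{remote_bucket_name}/{object_prefix}/', shell=True)
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/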
Uploading a file to Amazon S3 using Boto3
import os
import boto3
remote_bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
csv_file_name = 'my_sample_file.csv'
local_folder_path = '/tmp/my_files'
# Create a boto3 client and upload the specified file
s3_client = boto3.client('s3')
s3_client.upload_file(f'{local_folder_path}/{csv_file_name}', remote_bucket_name, f'{object_prefix}/{csv_file_name}')
# Let's validate the file was uploaded to s3
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/ | grep {csv_file_name}
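upload_file also accepts an optional ExtraArgs dictionary for setting object metadata during the upload. Here's a minimal sketch that sets the content type; the metadata value itself is just an illustration:
# Upload with an explicit content type so s3 serves the object as CSV
s3_client.upload_file(f'{local_folder_path}/{csv_file_name}', remote_bucket_name, f'{object_prefix}/{csv_file_name}', ExtraArgs={'ContentType': 'text/csv'})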
Uploading a file to Amazon S3 using cloudpathlib
from cloudpathlib import CloudPath
remote_bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
csv_file_name = 'my_sample_file.csv'
local_folder_path = '/tmp/my_files'
# Create a cloud path and upload from the local file
cloud_path = CloudPath(f's3://{remote_bucket_name}/{object_prefix}/{csv_file_name}')
cloud_path.upload_from(f'{local_folder_path}/{csv_file_name}', force_overwrite_to_cloud=True)
# Let's validate the file was uploaded to s3
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/ | grep {csv_file_name}
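cloudpathlib can also upload a whole folder in one call: point the CloudPath at the destination prefix and pass the local directory to upload_from, which accepts files and directories alike. A minimal sketch, assuming the same variables as above:
# Upload the entire local folder to the s3 prefix
folder_cloud_path = CloudPath(f's3://{remote_bucket_name}/{object_prefix}/')
folder_cloud_path.upload_from(local_folder_path, force_overwrite_to_cloud=True)
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/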
The best way to understand and learn these operations is through hands-on experience. Follow the steps below to create the sample notebook in your Syntasa environment:
- Download the sample notebook .ipynb file from this article.
- Create a new notebook in your Syntasa environment using the import notebook option.