Applicable to Syntasa platforms installed in an Amazon AWS environment, this article provides examples of how to download files from S3 to your notebook. Downloading files from sources other than S3 can be done in a similar fashion.
Installing AWS CLI pre-requisites
!pip3 install awscli --upgrade
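To confirm the CLI installed successfully, a quick sanity check is to print its version (the exact output will vary by environment):
!aws --version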
How to download files from S3 using the AWS CLI
import os
import shutil
import subprocess
bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
object_key = '20220701.export.CSV'
obj_in_s3_full_path = f's3://{bucket_name}/{object_prefix}/{object_key}'
local_des_path = '/tmp/my_files'
# First, let's delete any files that already exist locally
shutil.rmtree(local_des_path, ignore_errors=True)
# Let's create a temporary folder locally to hold our files
os.makedirs(local_des_path, exist_ok=True)
# Now download the file from S3
command = subprocess.check_output(f'aws s3 cp {obj_in_s3_full_path} {local_des_path}/', shell=True)
# Let's validate the file was downloaded by printing the contents of the folder
!ls -lah {local_des_path}
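The AWS CLI can also copy every object under a prefix in a single command using the --recursive flag. Here's a minimal sketch that reuses the bucket and prefix variables from the cell above:
# Download every object under the prefix into the local folder
command = subprocess.check_output(f'aws s3 cp s3://{bucket_name}/{object_prefix}/ {local_des_path}/ --recursive', shell=True)
!ls -lah {local_des_path}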
How to download a file using Boto3 -- Installing pre-requisites (upgrading pip and installing boto3 and cloudpathlib)
# First we'll install all the dependencies we'll use after this cell
!pip3 install --upgrade pip  # let's upgrade pip first
!pip3 install boto3
!pip3 install cloudpathlib
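If you want to confirm the installs succeeded, a quick sanity check is to print the installed versions using the standard library's importlib.metadata:
# Print the installed versions of the packages we just installed
from importlib.metadata import version
print(version('boto3'))
print(version('cloudpathlib'))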
How to download a single file to your local notebook environment using Boto3
# Define your imports and your object location (bucket, object, destination path)
import os
import boto3
import shutil
bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
object_key = '20220701.export.CSV'
local_des_path = '/tmp/my_files'
# First, let's delete any files that already exist locally
shutil.rmtree(local_des_path, ignore_errors=True)
# Let's create a temporary folder locally to hold our files
os.makedirs(local_des_path, exist_ok=True)
# Create a boto3 client and download the specified object
s3_client = boto3.client('s3')
s3_client.download_file(bucket_name, f'{object_prefix}/{object_key}', f'{local_des_path}/{object_key}')
# Now let's validate that the file exists in the local path we specified above
!ls -lah {local_des_path}
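If the object key doesn't exist or your role lacks permission, download_file raises a botocore ClientError. The snippet below is a minimal sketch of catching it, reusing the client and variables from the cell above; how you handle the error will depend on your workflow:
from botocore.exceptions import ClientError
try:
    s3_client.download_file(bucket_name, f'{object_prefix}/{object_key}', f'{local_des_path}/{object_key}')
except ClientError as e:
    # e.response carries the S3 error code, e.g. '404' for a missing object or '403' for access denied
    print(f"Download failed: {e.response['Error']['Code']} - {e.response['Error']['Message']}")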
How to download an entire folder to your local notebook environment using cloudpathlib
- cloudpathlib is a library that provides convenience functions on top of boto3 to help download and upload files easily
import os
import shutil
from cloudpathlib import CloudPath
bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
local_des_path = '/tmp/my_files/my_downloaded_folder/'
# First, let's delete any files that already exist locally
shutil.rmtree(local_des_path, ignore_errors=True)
# Let's create a local folder where we will download all the files from our s3 bucket
os.makedirs(local_des_path, exist_ok=True)
# Point a CloudPath at the s3 folder so we can download everything under it (please note that
# with a very large number of files or objects, you may need multi-threading or paginators)
cloud_path = CloudPath(f's3://{bucket_name}/{object_prefix}/')
cloud_path.download_to(local_des_path)
# Now let's validate that the folder and the files were downloaded
!ls -lah {local_des_path}
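For the very large folders mentioned in the comment above, boto3's paginator will list every object under a prefix, since a single list call returns at most 1000 keys. Here's a minimal sketch that downloads each listed object, reusing the bucket and path variables from the cell above:
import boto3
s3_client = boto3.client('s3')
# The paginator transparently follows continuation tokens across list_objects_v2 calls
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name, Prefix=f'{object_prefix}/'):
    for obj in page.get('Contents', []):
        # Download each object into the local folder, keeping only the file name
        s3_client.download_file(bucket_name, obj['Key'], f"{local_des_path}/{os.path.basename(obj['Key'])}")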
Uploading Files to an S3 bucket (For Amazon AWS Environments)
In this section, we'll show examples of how to upload files from your local notebook environment to an S3 bucket. Uploading files from other sources can be done in a similar fashion.
Let's create a dummy CSV File so that we can upload it to Amazon S3.
import csv
import os
import shutil
local_folder_path = '/tmp/my_files'
csv_file_name = 'my_sample_file.csv'
# First, let's delete any files that already exist locally
shutil.rmtree(local_folder_path, ignore_errors=True)
# Let's create a local folder to hold the file we're about to create
os.makedirs(local_folder_path, exist_ok=True)
header = ['name', 'area', 'country_code2', 'country_code3']
data = ['Mexico', 758400, 'MX', 'MEX']
with open(f'{local_folder_path}/{csv_file_name}', 'w', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)
# Verify file exists and that we can see the contents of the file
print(f'Files in path {local_folder_path}')
!ls -lah {local_folder_path}
print(f'\nContents of File :: {csv_file_name}')
!cat {local_folder_path}/{csv_file_name}
Uploading a file to Amazon S3 using AWS CLI
import subprocess
remote_bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
csv_file_name = 'my_sample_file.csv'
obj_in_s3_full_path = f's3://{remote_bucket_name}/{object_prefix}/{csv_file_name}'
local_folder_path = '/tmp/my_files'
# Now upload the file to s3
command = subprocess.check_output(f'aws s3 cp {local_folder_path}/{csv_file_name} {obj_in_s3_full_path}', shell=True)
# Let's validate the file was uploaded to s3
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/ | grep {csv_file_name}
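To upload an entire local folder instead of a single file, the AWS CLI's sync command copies everything that is new or changed between the source folder and the destination prefix. A minimal sketch reusing the variables above:
# Sync the whole local folder up to the s3 prefix
command = subprocess.check_output(f'aws s3 sync {local_folder_path} s3://{remote_bucket_name}/{object_prefix}/', shell=True)
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/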
Uploading a file to Amazon S3 using Boto3
import os
import boto3
remote_bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
csv_file_name = 'my_sample_file.csv'
local_folder_path = '/tmp/my_files'
# Create a boto3 client and upload the specified file
s3_client = boto3.client('s3')
s3_client.upload_file(f'{local_folder_path}/{csv_file_name}', remote_bucket_name, f'{object_prefix}/{csv_file_name}')
# Let's validate the file was uploaded to s3
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/ | grep {csv_file_name}
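upload_file also accepts an optional ExtraArgs dictionary for setting object metadata during the upload. Here's a minimal sketch that sets the content type; the metadata value itself is just an illustration:
# Upload with an explicit content type so s3 serves the object as CSV
s3_client.upload_file(f'{local_folder_path}/{csv_file_name}', remote_bucket_name, f'{object_prefix}/{csv_file_name}', ExtraArgs={'ContentType': 'text/csv'})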
Uploading a file to Amazon S3 using cloudpathlib
from cloudpathlib import CloudPath
remote_bucket_name = 'your-bucket-name'
object_prefix = 'other/sample-data/csv_test'
csv_file_name = 'my_sample_file.csv'
local_folder_path = '/tmp/my_files'
# Create a cloud path and upload from the local file
cloud_path = CloudPath(f's3://{remote_bucket_name}/{object_prefix}/{csv_file_name}')
cloud_path.upload_from(f'{local_folder_path}/{csv_file_name}', force_overwrite_to_cloud=True)
# Let's validate the file was uploaded to s3
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/ | grep {csv_file_name}
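cloudpathlib can also upload a whole folder in one call: point the CloudPath at the destination prefix and pass the local directory to upload_from, which accepts files and directories alike. A minimal sketch, assuming the same variables as above:
# Upload the entire local folder to the s3 prefix
folder_cloud_path = CloudPath(f's3://{remote_bucket_name}/{object_prefix}/')
folder_cloud_path.upload_from(local_folder_path, force_overwrite_to_cloud=True)
!aws s3 ls s3://{remote_bucket_name}/{object_prefix}/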
The best way to understand and learn these operations is through hands-on experience. Follow the steps below to create the sample notebook in your Syntasa environment:
- Download the sample notebook .ipynb file from this article.
- Create a new notebook in your Syntasa environment using the import notebook option.