What is the recommended best practice to connect to S3 using a Python Notebook?
Curious what the best practice is for connecting to S3 in a Syntasa Python notebook so that I can easily port my code over to a Spark Processor with minimal modification. How does the configuration differ when connecting to a project-internal S3 bucket versus a bucket that is external to my project?
-
Boto3 is required, so install the module and import it as shown below:
!pip install boto3
import boto3
If the S3 bucket is within the same project as Syntasa, you can use the instance profile, and credentials do not need to be passed in the code. For example, save the result of calling boto3.client with 's3' to a variable:
s3 = boto3.client('s3')
If the bucket is outside of the Syntasa project, or if you want to use your own credentials, you'll first need to obtain credentials that grant access to the bucket. If you do not know how to do this, please contact your environment administrator. Then pass the credentials to boto3.client, replacing the quoted placeholder text in the key and token fields:
s3 = boto3.client(
's3',
aws_access_key_id='access key',
aws_secret_access_key='secret key',
aws_session_token='session token'
)
Session credentials expire within a day or sooner, so you may need to obtain new credentials if your code runs over an extended period.
Then, to access S3 resources, call methods on that variable, for example:
s3.create_bucket(Bucket=bucket_name)
To learn more about Boto3, see the official Boto3 documentation site.