As organizations move toward hybrid and multi-cloud architectures, it’s increasingly common to work with data spread across multiple cloud providers. Two popular services in this domain are Google Cloud Storage (GCS) and Amazon S3. Sometimes, you may need to move data between them — for instance, to centralize analytics, archive logs, or trigger pipelines hosted on a different platform.
In this article, we’ll walk through how to copy a CSV file from a GCS bucket to an Amazon S3 bucket using Python. This is done completely in memory — no local disk writing needed — making the process faster and more suitable for serverless or containerized environments.
Prerequisites
- Access to:
  - A GCS bucket and a service account JSON key
  - An AWS S3 bucket and IAM credentials
- Required libraries:
  - google-cloud-storage
  - boto3
Step-by-Step Code Explanation
Below, the script is broken down step by step with a detailed explanation of each part; the complete code is listed at the end.
from io import BytesIO
import boto3
import subprocess
import sys
- BytesIO: creates an in-memory byte stream so the file never has to be written to disk (a tiny illustration follows below).
- boto3: the AWS SDK for Python, used to interact with S3.
- subprocess and sys: standard-library modules used by the install helper in the next step.
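To make the in-memory idea concrete, here is a tiny, self-contained illustration of BytesIO behaving like a file opened in binary mode; it is not part of the transfer script itself:

from io import BytesIO

buf = BytesIO()
buf.write(b"col_a,col_b\n1,2\n")  # write bytes as if writing to a file
buf.seek(0)                        # rewind to the start before reading
print(buf.read())                  # b'col_a,col_b\n1,2\n'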
Install Required GCS Library
def install(package):
subprocess.call([sys.executable, "-m", "pip", "install", package])
install("google-cloud-storage")
- This helper installs google-cloud-storage at runtime in case it is not already available, which is handy in ephemeral or serverless environments. A slightly more defensive variant that skips the install when the package is already present is sketched below.
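A minimal sketch of that variant, using only the standard library; the ensure_installed helper and its argument names are illustrative, not part of the original script:

import importlib
import subprocess
import sys

def ensure_installed(package, module_name=None):
    # module_name is the importable name, which can differ from the pip package name
    module_name = module_name or package.replace("-", "_")
    try:
        importlib.import_module(module_name)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

ensure_installed("google-cloud-storage", module_name="google.cloud.storage")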
Load GCS Credentials from Connection Parameters
key_file = getConnectionParam("@InputConnection1", "keyfile")
key_file_path = '/tmp/key.json'
with open(key_file_path, 'w') as file:
file.write(key_file)
- getConnectionParam() is assumed to be a helper function that retrieves connection values securely from your system or orchestrator.
- We write the GCP key file content to a temporary file, which is what the from_service_account_json() authentication method used in the next step expects. An in-memory alternative that avoids the temporary file is sketched below.
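If writing credentials to /tmp is undesirable, the Google auth libraries can also build credentials directly from the JSON string. A sketch, assuming key_file holds the raw service account JSON content retrieved above:

import json
from google.cloud import storage
from google.oauth2 import service_account

# Build credentials from the JSON string without touching disk
info = json.loads(key_file)
credentials = service_account.Credentials.from_service_account_info(info)
client = storage.Client(credentials=credentials, project=info.get("project_id"))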
Connecting to GCS and Downloading File to Memory
from google.cloud import storage
client = storage.Client.from_service_account_json(key_file_path)
bucket_name = getConnectionParam("@InputConnection1", "bucketName")
gcs_file_path = 'demo/random/Browser.csv'
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gcs_file_path)
file_stream = BytesIO()
blob.download_to_file(file_stream)
file_stream.seek(0)
- Authenticates using the service account JSON file.
- Downloads the file directly into memory using a BytesIO stream, which is fast and efficient.
- file_stream.seek(0) resets the stream position to the beginning so the upload in the next step reads the full contents.
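Recent versions of google-cloud-storage also offer download_as_bytes(), which achieves the same in-memory result a little more directly. A sketch, assuming blob is the object created above:

data = blob.download_as_bytes()  # returns the object's contents as bytes
file_stream = BytesIO(data)      # wrap in a stream so boto3 can read it like a file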
Upload File to Amazon S3
s3_client = boto3.client('s3')
s3_bucket_name = 'syntasa-demo'
s3_path = 'demo/random/file_from_gcs.csv'
s3_client.upload_fileobj(file_stream, s3_bucket_name, s3_path)
- Initializes an S3 client using default environment credentials (aws_access_key_id and aws_secret_access_key can also be passed explicitly, as sketched below).
- Uploads the in-memory stream to the destination path in the specified S3 bucket.
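If the runtime does not provide default AWS credentials, they can be supplied explicitly when creating the client. A sketch, assuming the key pair is exposed through the same getConnectionParam() helper under hypothetical parameter names:

# "@OutputConnection1", "accessKey" and "secretKey" are illustrative names;
# adjust them to however your connection stores AWS credentials.
aws_key = getConnectionParam("@OutputConnection1", "accessKey")
aws_secret = getConnectionParam("@OutputConnection1", "secretKey")

s3_client = boto3.client(
    's3',
    aws_access_key_id=aws_key,
    aws_secret_access_key=aws_secret,
    region_name='us-east-1',  # assumption: set this to your bucket's region
)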
Here is the complete code:
from io import BytesIO
import boto3
import subprocess
import sys
def install(package):
subprocess.call([sys.executable, "-m", "pip", "install", package])
# Install the library used to connect to GCS and download the file into memory
install("google-cloud-storage")
# Get keyfile from connection param
key_file = getConnectionParam("@InputConnection1","keyfile")
# Write the keyfile to local file
key_file_path = '/tmp/key.json'
# Open the file in write mode and write the content
with open(key_file_path, 'w') as file:
file.write(key_file)
# Create Connection using keyfile
from google.cloud import storage
client = storage.Client.from_service_account_json(key_file_path)
# Connect to GCS and locate the source file
bucket_name = getConnectionParam("@InputConnection1","bucketName")
gcs_file_path = 'demo/random/Browser.csv'
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gcs_file_path)
##### Keep the file in-memory and write to s3 using boto3 ####
# Read the file as a stream
file_stream = BytesIO()
blob.download_to_file(file_stream)
file_stream.seek(0) # Reset stream position to beginning
# --- Initialize S3 client ---
s3_client = boto3.client('s3')
s3_bucket_name = 'syntasa-demo'
s3_path = 'demo/random/file_from_gcs.csv'
# Upload stream directly to S3
s3_client.upload_fileobj(file_stream, s3_bucket_name, s3_path)
print("File transferred from GCS to S3 successfully.")
Conclusion
With just a few lines of code, we’ve demonstrated how to securely and efficiently transfer a CSV file from GCS to S3 using Python. This can be integrated into larger workflows, run as a scheduled job, or triggered via API events.
Key Advantages:
- No temporary local file needed (except credentials).
- Secure, credential-based access.
- Efficient for serverless and automated jobs.