As organizations move toward hybrid and multi-cloud architectures, it’s increasingly common to work with data spread across multiple cloud providers. Two popular services in this domain are Google Cloud Storage (GCS) and Amazon S3. Sometimes, you may need to move data between them — for instance, to centralize analytics, archive logs, or trigger pipelines hosted on a different platform.
In this article, we’ll walk through how to copy a CSV file from a GCS bucket to an Amazon S3 bucket using Python. This is done completely in memory — no local disk writing needed — making the process faster and more suitable for serverless or containerized environments.
Prerequisites
- Access to:
  - A GCS bucket and a service account JSON key
  - An AWS S3 bucket and IAM credentials
- Required libraries:
  - google-cloud-storage
  - boto3
Step-by-Step Code Explanation
Below, the script is broken down step by step with a detailed explanation of each part; the complete code is listed at the end.
from io import BytesIO
import boto3
import subprocess
import sys
- BytesIO: creates an in-memory byte stream so the file never has to be written to disk (a tiny illustration follows below).
- boto3: the AWS SDK for Python, used to interact with S3.
- subprocess and sys: standard-library modules used by the install helper in the next step.
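To make the in-memory idea concrete, here is a tiny, self-contained illustration of BytesIO behaving like a file opened in binary mode; it is not part of the transfer script itself:

from io import BytesIO

buf = BytesIO()
buf.write(b"col_a,col_b\n1,2\n")  # write bytes as if writing to a file
buf.seek(0)                        # rewind to the start before reading
print(buf.read())                  # b'col_a,col_b\n1,2\n'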
Install Required GCS Library
def install(package):
subprocess.call([sys.executable, "-m", "pip", "install", package])
install("google-cloud-storage")
- This helper installs google-cloud-storage at runtime in case it is not already available, which is handy in ephemeral or serverless environments. A slightly more defensive variant that skips the install when the package is already present is sketched below.
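A minimal sketch of that variant, using only the standard library; the ensure_installed helper and its argument names are illustrative, not part of the original script:

import importlib
import subprocess
import sys

def ensure_installed(package, module_name=None):
    # module_name is the importable name, which can differ from the pip package name
    module_name = module_name or package.replace("-", "_")
    try:
        importlib.import_module(module_name)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

ensure_installed("google-cloud-storage", module_name="google.cloud.storage")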
Load GCS Credentials from Connection Parameters
key_file = getConnectionParam("@InputConnection1", "keyfile")
key_file_path = '/tmp/key.json'
with open(key_file_path, 'w') as file:
file.write(key_file)
- getConnectionParam() is assumed to be a helper function that retrieves connection values securely from your system or orchestrator.
- We write the GCP key file content to a temporary file, which is what the from_service_account_json() authentication method used in the next step expects. An in-memory alternative that avoids the temporary file is sketched below.
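If writing credentials to /tmp is undesirable, the Google auth libraries can also build credentials directly from the JSON string. A sketch, assuming key_file holds the raw service account JSON content retrieved above:

import json
from google.cloud import storage
from google.oauth2 import service_account

# Build credentials from the JSON string without touching disk
info = json.loads(key_file)
credentials = service_account.Credentials.from_service_account_info(info)
client = storage.Client(credentials=credentials, project=info.get("project_id"))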
Connecting to GCS and Downloading File to Memory
from google.cloud import storage
client = storage.Client.from_service_account_json(key_file_path)
bucket_name = getConnectionParam("@InputConnection1", "bucketName")
gcs_file_path = 'demo/random/Browser.csv'
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gcs_file_path)
file_stream = BytesIO()
blob.download_to_file(file_stream)
file_stream.seek(0)
- Authenticates using the service account JSON file.
- Downloads the file directly into memory using a BytesIO stream, which is fast and efficient.
- file_stream.seek(0) resets the stream position to the beginning so the upload in the next step reads the full contents.
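Recent versions of google-cloud-storage also offer download_as_bytes(), which achieves the same in-memory result a little more directly. A sketch, assuming blob is the object created above:

data = blob.download_as_bytes()  # returns the object's contents as bytes
file_stream = BytesIO(data)      # wrap in a stream so boto3 can read it like a file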
Upload File to Amazon S3
s3_client = boto3.client('s3')
s3_bucket_name = 'syntasa-demo'
s3_path = 'demo/random/file_from_gcs.csv'
s3_client.upload_fileobj(file_stream, s3_bucket_name, s3_path)
- Initializes an S3 client using default environment credentials (aws_access_key_id and aws_secret_access_key can also be passed explicitly, as sketched below).
- Uploads the in-memory stream to the destination path in the specified S3 bucket.
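If the runtime does not provide default AWS credentials, they can be supplied explicitly when creating the client. A sketch, assuming the key pair is exposed through the same getConnectionParam() helper under hypothetical parameter names:

# "@OutputConnection1", "accessKey" and "secretKey" are illustrative names;
# adjust them to however your connection stores AWS credentials.
aws_key = getConnectionParam("@OutputConnection1", "accessKey")
aws_secret = getConnectionParam("@OutputConnection1", "secretKey")

s3_client = boto3.client(
    's3',
    aws_access_key_id=aws_key,
    aws_secret_access_key=aws_secret,
    region_name='us-east-1',  # assumption: set this to your bucket's region
)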
Here is the complete code:
from io import BytesIO
import boto3
import subprocess
import sys
def install(package):
subprocess.call([sys.executable, "-m", "pip", "install", package])
# Install the library used to connect to GCS and download the file into memory
install("google-cloud-storage")
# Get keyfile from connection param
key_file = getConnectionParam("@InputConnection1","keyfile")
# Write the keyfile to local file
key_file_path = '/tmp/key.json'
# Open the file in write mode and write the content
with open(key_file_path, 'w') as file:
file.write(key_file)
# Create Connection using keyfile
from google.cloud import storage
client = storage.Client.from_service_account_json(key_file_path)
# Connect to GCS and locate the source file
bucket_name = getConnectionParam("@InputConnection1","bucketName")
gcs_file_path = 'demo/random/Browser.csv'
bucket = client.get_bucket(bucket_name)
blob = bucket.blob(gcs_file_path)
##### Keep the file in-memory and write to s3 using boto3 ####
# Read the file as a stream
file_stream = BytesIO()
blob.download_to_file(file_stream)
file_stream.seek(0) # Reset stream position to beginning
# --- Initialize S3 client ---
s3_client = boto3.client('s3')
s3_bucket_name = 'syntasa-demo'
s3_path = 'demo/random/file_from_gcs.csv'
# Upload stream directly to S3
s3_client.upload_fileobj(file_stream, s3_bucket_name, s3_path)
print("File transferred from GCS to S3 successfully.")
Conclusion
With just a few lines of code, we’ve demonstrated how to securely and efficiently transfer a CSV file from GCS to S3 using Python. This can be integrated into larger workflows, run as a scheduled job, or triggered via API events.
Key Advantages:
- No temporary local file needed (except credentials).
- Secure, credential-based access.
- Efficient for serverless and automated jobs.