Introduction
Snowflake is a cloud-native data platform that separates compute from storage, letting you scale each independently. This guide shows you how to pull data out of Snowflake into your notebook using Python, outlines two options for doing so, and then demonstrates how to join web (clickstream) data with your offline (Snowflake) data:
- Pandas + Snowflake Connector
- SQLAlchemy + Snowflake SQLAlchemy Dialect
Prerequisites
- Snowflake Account - You need a valid Snowflake account, warehouse, database, schema, user, and role with SELECT privileges.
- Python 3.7+ Environment - We’ll use Python 3.9 in examples, but any 3.7+ works.
- Network Access - Please ensure your client machine or cluster can reach your Snowflake account URL (no VPN or firewall blocking).
Install Required Libraries:
Install the library (or libraries) for whichever method you choose:
pip install snowflake-connector-python pandas
pip install snowflake-sqlalchemy # for SQLAlchemy support
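To confirm the installs worked before connecting, a quick version check is handy. A minimal sketch using the standard library (importlib.metadata requires Python 3.8+):
from importlib.metadata import version
# Raises PackageNotFoundError if a package isn't installed
print(version("snowflake-connector-python"))
print(version("pandas"))
print(version("snowflake-sqlalchemy"))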
Option A: Pandas + Snowflake Connector
import snowflake.connector
import pandas as pd

conn = snowflake.connector.connect(
    user='YOUR_USER',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT',      # e.g. 'abc12345.eu-west-1'
    warehouse='YOUR_WAREHOUSE',  # e.g. 'COMPUTE_WH'
    database='YOUR_DATABASE',    # e.g. 'MY_DB'
    schema='YOUR_SCHEMA',        # e.g. 'PUBLIC'
    role='YOUR_ROLE'             # e.g. 'ANALYST_ROLE'
)
Tip: account should be just the account locator (abc12345 or abc12345.eu-west-1), without https:// or extra suffixes.
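Hard-coding credentials is fine for a throwaway notebook, but anywhere else you'll want to pull them from the environment instead. A minimal sketch (the variable names SNOWFLAKE_USER and SNOWFLAKE_PASSWORD are assumptions, not a Snowflake convention):
import os
import snowflake.connector

# Read credentials from environment variables set outside the notebook
conn = snowflake.connector.connect(
    user=os.environ['SNOWFLAKE_USER'],          # assumed variable name
    password=os.environ['SNOWFLAKE_PASSWORD'],  # assumed variable name
    account='YOUR_ACCOUNT',
    warehouse='YOUR_WAREHOUSE',
    database='YOUR_DATABASE',
    schema='YOUR_SCHEMA',
    role='YOUR_ROLE'
)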
Read Data into a DataFrame
sql = """
SELECT *
FROM YOUR_TABLE
WHERE YOUR_COLUMN = 'YOUR_CONDITION'
LIMIT 100
"""
# Execute & load into DataFrame
df = pd.read_sql(sql, conn)
# Peek at your data
df.head()
conn.close()
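For larger result sets, the connector also offers a native Arrow-based path that typically outperforms pd.read_sql. A sketch using the connector's fetch_pandas_all method (it requires the pyarrow package to be installed):
cur = conn.cursor()
cur.execute(sql)
df = cur.fetch_pandas_all()  # builds the DataFrame from Arrow result batches
cur.close()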
Option B: SQLAlchemy + snowflake-sqlalchemy
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    'snowflake://{user}:{password}@{account}/{database}/{schema}?warehouse={warehouse}&role={role}'.format(
        user='YOUR_USER',
        password='YOUR_PASS',
        account='YOUR_ACCOUNT',
        database='YOUR_DATABASE',
        schema='YOUR_SCHEMA',
        warehouse='YOUR_WAREHOUSE',
        role='YOUR_ROLE'
    )
)

# Read an entire table into a DataFrame
df2 = pd.read_sql_table('sample_orders', engine)
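snowflake-sqlalchemy also ships a URL helper that assembles and escapes this connection string for you, which avoids trouble when the password contains special characters. A sketch of the same engine built that way:
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# URL() builds and URL-encodes the snowflake:// connection string
engine = create_engine(URL(
    user='YOUR_USER',
    password='YOUR_PASS',
    account='YOUR_ACCOUNT',
    database='YOUR_DATABASE',
    schema='YOUR_SCHEMA',
    warehouse='YOUR_WAREHOUSE',
    role='YOUR_ROLE'
))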
Joining web and offline data
We now want to join the web (clickstream) data with the offline (Snowflake) data. First, convert your Pandas DataFrame into a Spark DataFrame and register it as a temporary view (this assumes a notebook where a SparkSession named spark already exists, as in Databricks or a PySpark shell):
df_spark = spark.createDataFrame(df2)
df_spark.createOrReplaceTempView("snowflake_view")
Next, join your web and offline (snowflake_view) data together:
# Replace `hive_table` with your actual Hive table name and the appropriate columns
joined = spark.sql("""
SELECT
s.*,
h.some_column
FROM snowflake_view s
JOIN hive_table h
ON s.order_id = h.order_id
WHERE s.order_date BETWEEN DATE '2025-01-01' AND DATE '2025-03-31'
""")
joined.show(10)
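If your cluster has the Snowflake Connector for Spark installed, you can skip the Pandas round trip entirely and read straight into a Spark DataFrame. A sketch using the connector's documented option keys (the account URL and table name are placeholders):
sf_options = {
    "sfURL": "YOUR_ACCOUNT.snowflakecomputing.com",
    "sfUser": "YOUR_USER",
    "sfPassword": "YOUR_PASSWORD",
    "sfDatabase": "YOUR_DATABASE",
    "sfSchema": "YOUR_SCHEMA",
    "sfWarehouse": "YOUR_WAREHOUSE",
    "sfRole": "YOUR_ROLE",
}

# Read a Snowflake table directly into Spark, no Pandas hop required
df_snow = (spark.read
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "sample_orders")
    .load())
df_snow.createOrReplaceTempView("snowflake_view")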
Conclusion
Whether you prefer Pandas, SQLAlchemy, or PySpark, Snowflake’s rich connector ecosystem makes it straightforward to read data right into the tool of your choice. Pick the method that aligns best with your data volume and processing needs:
- Pandas for quick ad-hoc analysis.
- SQLAlchemy for ORM-style workflows.
As a value add, we've also demonstrated how to join your web and offline data, letting you enrich and interrogate your analysis further.