Introduction
Snowflake is a cloud-native data platform that separates compute from storage, letting you scale each independently. This guide shows you how to pull data out of Snowflake into your notebook using Python, outlines two options for doing so, and then demonstrates how to join web (clickstream) data with your offline (Snowflake) data:
- Pandas + Snowflake Connector
- SQLAlchemy + Snowflake SQLAlchemy Dialect
Prerequisites
- Snowflake Account - You need a valid Snowflake account, warehouse, database, schema, user, and role with SELECT privileges.
- Python 3.7+ Environment - We’ll use Python 3.9 in examples, but any 3.7+ works.
- Network Access - Please ensure your client machine or cluster can reach your Snowflake account URL (no VPN or firewall blocking).
Install Required Libraries:
Install the library (or libraries) for whichever method you choose:
pip install snowflake-connector-python pandas
pip install snowflake-sqlalchemy # for SQLAlchemy support
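To confirm the installs worked before connecting, a quick version check is handy. A minimal sketch using the standard library (importlib.metadata requires Python 3.8+):
from importlib.metadata import version
# Raises PackageNotFoundError if a package isn't installed
print(version("snowflake-connector-python"))
print(version("pandas"))
print(version("snowflake-sqlalchemy"))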
Option A: Pandas + Snowflake Connector
import snowflake.connector
import pandas as pd

conn = snowflake.connector.connect(
    user='YOUR_USER',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT',      # e.g. 'abc12345.eu-west-1'
    warehouse='YOUR_WAREHOUSE',  # e.g. 'COMPUTE_WH'
    database='YOUR_DATABASE',    # e.g. 'MY_DB'
    schema='YOUR_SCHEMA',        # e.g. 'PUBLIC'
    role='YOUR_ROLE'             # e.g. 'ANALYST_ROLE'
)
Tip: account should be just the account locator (abc12345 or abc12345.eu-west-1), without https:// or extra suffixes.
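Hard-coding credentials is fine for a throwaway notebook, but anywhere else you'll want to pull them from the environment instead. A minimal sketch (the variable names SNOWFLAKE_USER and SNOWFLAKE_PASSWORD are assumptions, not a Snowflake convention):
import os
import snowflake.connector

# Read credentials from environment variables set outside the notebook
conn = snowflake.connector.connect(
    user=os.environ['SNOWFLAKE_USER'],          # assumed variable name
    password=os.environ['SNOWFLAKE_PASSWORD'],  # assumed variable name
    account='YOUR_ACCOUNT',
    warehouse='YOUR_WAREHOUSE',
    database='YOUR_DATABASE',
    schema='YOUR_SCHEMA',
    role='YOUR_ROLE'
)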
Read Data into a DataFrame
sql = """
SELECT *
FROM YOUR_TABLE
WHERE YOUR_COLUMN = 'YOUR_CONDITION'
LIMIT 100
"""
# Execute & load into DataFrame
df = pd.read_sql(sql, conn)
# Peek at your data
df.head()
conn.close()
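For larger result sets, the connector also offers a native Arrow-based path that typically outperforms pd.read_sql. A sketch using the connector's fetch_pandas_all method (it requires the pyarrow package to be installed):
cur = conn.cursor()
cur.execute(sql)
df = cur.fetch_pandas_all()  # builds the DataFrame from Arrow result batches
cur.close()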
Option B: SQLAlchemy + snowflake-sqlalchemy
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    'snowflake://{user}:{password}@{account}/{database}/{schema}?warehouse={warehouse}&role={role}'.format(
        user='YOUR_USER',
        password='YOUR_PASS',
        account='YOUR_ACCOUNT',
        database='YOUR_DATABASE',
        schema='YOUR_SCHEMA',
        warehouse='YOUR_WAREHOUSE',
        role='YOUR_ROLE'
    )
)

# Read an entire table into a DataFrame
df2 = pd.read_sql_table('sample_orders', engine)
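snowflake-sqlalchemy also ships a URL helper that assembles and escapes this connection string for you, which avoids trouble when the password contains special characters. A sketch of the same engine built that way:
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# URL() builds and URL-encodes the snowflake:// connection string
engine = create_engine(URL(
    user='YOUR_USER',
    password='YOUR_PASS',
    account='YOUR_ACCOUNT',
    database='YOUR_DATABASE',
    schema='YOUR_SCHEMA',
    warehouse='YOUR_WAREHOUSE',
    role='YOUR_ROLE'
))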
Joining web and offline data
We now want to join the web (clickstream) data with the offline (Snowflake) data. First, convert your Pandas DataFrame into a Spark DataFrame and register it as a temporary view (this assumes a notebook where a SparkSession named spark already exists, as in Databricks or a PySpark shell):
df_spark = spark.createDataFrame(df2)
df_spark.createOrReplaceTempView("snowflake_view")
Next, join your web and offline (snowflake_view) data together:
# Replace `hive_table` with your actual Hive table name and the appropriate columns
joined = spark.sql("""
SELECT
s.*,
h.some_column
FROM snowflake_view s
JOIN hive_table h
ON s.order_id = h.order_id
WHERE s.order_date BETWEEN DATE '2025-01-01' AND DATE '2025-03-31'
""")
joined.show(10)
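If your cluster has the Snowflake Connector for Spark installed, you can skip the Pandas round trip entirely and read straight into a Spark DataFrame. A sketch using the connector's documented option keys (the account URL and table name are placeholders):
sf_options = {
    "sfURL": "YOUR_ACCOUNT.snowflakecomputing.com",
    "sfUser": "YOUR_USER",
    "sfPassword": "YOUR_PASSWORD",
    "sfDatabase": "YOUR_DATABASE",
    "sfSchema": "YOUR_SCHEMA",
    "sfWarehouse": "YOUR_WAREHOUSE",
    "sfRole": "YOUR_ROLE",
}

# Read a Snowflake table directly into Spark, no Pandas hop required
df_snow = (spark.read
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "sample_orders")
    .load())
df_snow.createOrReplaceTempView("snowflake_view")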
Conclusion
Whether you prefer Pandas, SQLAlchemy, or PySpark, Snowflake’s rich connector ecosystem makes it straightforward to read data right into the tool of your choice. Pick the method that aligns best with your data volume and processing needs:
- Pandas for quick ad-hoc analysis.
- SQLAlchemy for ORM-style workflows.
As a value add, we've also demonstrated how to join your web and offline data, letting you enrich and interrogate your analysis further.