{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "839143a8-8a8c-4a43-b9d3-fa9db499890b",
   "metadata": {},
   "source": [
    "\n",
    "<h3>Introduction </h3>\n",
    "\n",
    "Snowflake is a cloud-native data platform that separates compute from storage, letting you scale each independently. This guide shows you how to pull data out of Snowflake into your notebook using Python, and outlines two options, then demonstrates how to join web data with your offline (Snowflake) data. <br></br>\n",
    "1.Pandas + Snowflake Connector <br>\n",
    "2. SQLAlchemy + Snowflake SQLAlchemy Dialect"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "128acc06-206c-415f-b74c-6f112c3b9085",
   "metadata": {},
   "source": [
    "<h3>Prerequisites\n",
    "Snowflake Account</h3>\n",
    "\n",
    "You need a valid Snowflake account, warehouse, database, schema, user, and role with SELECT privileges.\n",
    "\n",
    "<h4>Python 3.7+ Environment</h4>\n",
    "\n",
    "We’ll use Python 3.9 in examples, but any 3.7+ works.\n",
    "\n",
    "<h4>Network Access</h4>\n",
    "\n",
    "Ensure your client machine or cluster can reach your Snowflake account URL (no VPN or firewall blocking).\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "30b6236b-e9dc-4109-9405-d8f3032b1c9e",
   "metadata": {},
   "source": [
    "<h3>Method 1:Pandas + Snowflake Connector</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1fea621-5786-4386-8500-122e2bfaf5f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install snowflake-connector-python pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aef7b180-721c-4b81-8839-215d2fb6544b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import snowflake.connector\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8a58762e-72bd-4749-8b8a-1d43cd2b7c02",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1) Fill in your own credentials here\n",
    "conn = snowflake.connector.connect(\n",
    "    user='YOUR_USER',\n",
    "    password='YOUR_PASS',\n",
    "    account='YOUR_ACCOUNT',  # ← just the account locator (no \"https://\", no extra \".snowflakecomputing.com\")\n",
    "    warehouse='YOUR_WAREHOUSE',\n",
    "    database='YOUR_DATABASE',\n",
    "    schema='YOUR_SCHEMA',\n",
    "    role='YOUR_ROLE'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6971f7a7-a249-4178-a5b7-f576c96d2657",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a query\n",
    "sql = \"\"\"\n",
    "SELECT *\n",
    "FROM YOUR_TABLE\n",
    "WHERE your_condition = 'example'\n",
    "LIMIT 100\n",
    "\"\"\" \n",
    "# Execute & load into DataFrame\n",
    "df = pd.read_sql(sql, conn)\n",
    "\n",
    "# Peek at your data\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dd174cb7-3e8e-4295-8d8e-029e90b6773b",
   "metadata": {},
   "outputs": [],
   "source": [
    "conn.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf1421a2-366c-4521-b2cb-4cdc9cec432f",
   "metadata": {},
   "source": [
    "<h3> Method 2: SQLAlchemy URL</h3>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4aaf0048-ecdb-4557-9111-67a9f3e19486",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install snowflake-sqlalchemy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29858b5a-e752-4810-bdac-f39361bc6a07",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import create_engine\n",
    "\n",
    "engine = create_engine(\n",
    "    'snowflake://{user}:{password}@{account}/{database}/{schema}?warehouse={warehouse}&role={role}'.format(\n",
    "    user      = 'YOUR_USER',\n",
    "    password  = 'YOUR_PASS',\n",
    "    account   = 'YOUR_ACCOUNT',\n",
    "    warehouse =  'YOUR_WAREHOUSE',\n",
    "    database  = 'YOUR_DATABASE',\n",
    "    schema    = 'YOUR_SCHEMA' ,\n",
    "    role      = 'YOUR_ROLE'\n",
    "    )\n",
    ")\n",
    "\n",
    "df2 = pd.read_sql_table('YOUR_TABLE', engine)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07771396-1385-4652-824a-b99c2700d518",
   "metadata": {},
   "outputs": [],
   "source": [
    "df2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55bf0d7a-bd65-482a-8ba2-339a633151a4",
   "metadata": {},
   "source": [
    "<h3> Joining Web and Offline data </h3>  \n",
    "\n",
    "Thsi will allow you to join the data pulled from snowlfake with your web data thus allowing you to interigate your data better  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e501de5a-34c9-4977-b8c7-c7f5d8d05a53",
   "metadata": {},
   "outputs": [],
   "source": [
    "#This creates a temp view to allow the spark dataframe to be used in spark sql\n",
    "df_spark = spark.createDataFrame(df2)\n",
    "df_snow_spark.createOrReplaceTempView(\"snowflake_view\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bd7484f-cd0a-4cd2-a214-377422a0e5ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Replace `hive_table` with your actual Hive table name and the appropriate colunmns\n",
    "joined = spark.sql(\"\"\"\n",
    "SELECT\n",
    "s.*,\n",
    "h.some_column\n",
    "FROM snowflake_view s\n",
    "JOIN hive_table h\n",
    "ON s.order_id = h.order_id\n",
    "WHERE s.order_date BETWEEN DATE '2025-01-01' AND DATE '2025-03-31'\n",
    "\"\"\")\n",
    "\n",
    "joined.show(10)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.9 (Recommended)",
   "language": "python",
   "name": "syntasa_kernel_python_39"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
