Get Snowflake Results in Pandas DataFrame

How-Tos FAQs
September 7, 2020

Get Started Transforming Your Data in Snowflake

Our team of advanced data analysts uses Python and R for their statistical analysis and visualization needs. They recently reached out requesting an easy way to query Snowflake using Python.

We found this to be a particularly interesting use case. While you could always use the tried and tested SQLAlchemy library to get your work done, there is a much better alternative, namely, “snowflake-connector-python.” Finally, we can extend this functionality to return the results as a Pandas DataFrame.

et snowflake results in Pandas DataFrame

In the above image, you will note that we are using the Snowflake connector library of python to establish a connection to Snowflake, much like SnowSQL works. We send our SQL queries to the required databases, tables, and other objects using the connection object. The results set is in the form of a cursor, which we will convert into a Pandas DataFrame.

_________________________________________________________________________________________

Our preferred code editor is VSCode. Its extensibility is something we find very useful, and we really enjoy working on/with it.

et snowflake results in Pandas DataFrame

Anyway, from VSCode, we have opened up a bash terminal and created a folder called “snowflake_to_pandas_df”.

et snowflake results in Pandas DataFrame

Next, we are going to create a virtual environment for our work, but this could be an optional step, depending on your needs. Read about it here .

et snowflake results in Pandas DataFrame

Activate the virtual environment and you will see the name of the virtual environment in parenthesis as above (snowflake_to_pandas_df). Also, ensure you are on Python version 3.6-3.9 (both inclusive). In our experience, 3.7.x works best. Also, ensure pip has the latest version.

Let’s first install the prerequisite Python libraries for this connector. This could take a minute to install.

et snowflake results in Pandas DataFrame

At the time of this writing, the latest Snowflake Connector version was 2.5.1. Please note, yours may vary. That said, we will be installing connector version 2.4.6, as we have had a stable experience with it. Also, we believe Snowflake implicitly recommends it by way of using it as part of their testing.

et snowflake results in Pandas DataFrame

Finally, let’s get the latest version of pandas in. With that said, we can conclude our initial setup for the Snowflake Connector.

______________________________________________________________________________

import sys
import os
import pandas as pd
import snowflake.connector as sf

Let’s start with importing the required libraries, specifically pandas and the Snowflake Connector.

# Creating a simple Python to Snowflake connector function
def snowflake_connection():
    # Initializing Snowflake Connectivity variables
    account = os.environ.get("snowflake_account")
   user = os.environ.get("snowflake_user")
    password = os.environ.get("snowflake_secret")
    warehouse = "COMPUTE_WH"
    role = "SYSADMIN"

We are going to create a function that returns the Snowflake connection object. We extract our Snowflake credentials using the os library, specifically the os.environ.get method.

# Within the same function definition

# Catching exceptions crudely
    try:
       connection = sf.connect(
            account=account,
            user=user,
            password=password,
            warehouse=warehouse,
            role=role,
        )
    except Exception as error:
        error = sys.exc_info()[0]
        message = sys.exc_info()[1]
        print(f"Error: {error}\nMessage: {message}")

Now, let’s pass the above-extracted environment variables to the Snowflake Connector and establish a connection to Snowflake. Here we try to capture the errors in a rather crude way, but feel free to go into the details of capturing exceptions in a better way.

# Returning connection object
    return connection

Finally, we can conclude our function is returning a Snowflake connection object.

Let’s create some sample data within our Snowflake instance.

CREATE OR REPLACE DATABASE TEST_1;
USE DATABASE TEST_1;
USE SCHEMA PUBLIC;

First off, let’s create a “TEST_1” database and use the “PUBLIC” schema to create our table.

CREATE OR REPLACE TABLE ORDERS
(
id NUMBER,
date DATE,
customer_name STRING,
city STRING,
amount NUMBER(10, 2)
);

Create a simple “ORDERS” table with the above-mentioned 5 columns in the PUBLIC schema.

INSERT INTO orders
SELECT 1,'08/01/2021','Mike Engeseth','Austin',10000
UNION ALL  
SELECT 2,'08/02/2021','Mike Jones','Dallas',20000
UNION ALL  
SELECT 3,'08/03/2021','John Engeseth','Boston',5000
UNION ALL  
SELECT 4,'08/04/2021','Michael Engeseth','Austin',15000
UNION ALL  
SELECT 5,'08/05/2021','Mike Coulthard','Boston',7000
UNION ALL  
SELECT 6,'08/06/2021','Paul Engeseth','Austin',25000
UNION ALL
SELECT 7,'08/10/2021','Andrew Engeseth','Dallas',15000
UNION ALL  
SELECT 8,'08/11/2021','Mike Shelby','Dallas',2000
UNION ALL  
SELECT 9,'08/20/2021','Robert Engeseth','Boston',1000
UNION ALL  
SELECT 10,'08/25/2021','Peter Engeseth','Austin',500
;

Let’s load 10 records into our table.

SELECT * FROM orders;

et snowflake results in Pandas DataFrame

Finally, that is how your dataset should look like.

_________________________________________________________________________________________

# Quick note, we’re out of the snowflake_connection function now

# Sample SQL query
sql_query = "SELECT * FROM TEST_1.PUBLIC.orders order by city;"

Let’s use a simple SQL query and get the results set that we desire within a pandas dataframe.

# Establish a connection and pull SQL query results
try:
    get_connection = snowflake_connection()
    cur = get_connection.cursor()
    cur.execute(sql_query)
    results = cur.fetchall()
except Exception as error:
    error = sys.exc_info()[0]
    message = sys.exc_info()[1]
    print(f"Error: {error}\nMessage: {message}")

The get_connection variable executes the function snowflake_connection(). As defined earlier, we don’t need to pass any parameters. The credentials to connect to Snowflake are taken from environment variables loaded into Python via the very convenient os library.

Next, we create a Snowflake results cursor that willl temporarily hold our results set within the cursor variable cur.

Then, we execute a SQL query (using fetchall) and store the results within the “results” variable.

We’ve wrapped all of the above steps within a crude try except block to catch any untoward exceptions.

# Getting the results into a pandas dataframe

sql_results = pd.DataFrame(
    results,
    columns=[col[0] for col in cur.description],
)

Finally, we call the Pandas DataFrame method and get the results perfectly aligned within a tabular/structured format. Please note, Pandas doesn’t automatically extract the columns from the Snowflake cursor object.

_________________________________________________________________________________________

# Importing required libraries
import sys
import os
import pandas as pd
import snowflake.connector as sf


# Creating a simple Python to Snowflake connector function
def snowflake_connection():
    # Initializing Snowflake Connectivity variables
    account = os.environ.get("snowflake_account")
    user = os.environ.get("snowflake_user")
    password = os.environ.get("snowflake_secret")
    warehouse = "COMPUTE_WH"
    role = "SYSADMIN"
    # Catching exceptions crudely
    try:
        connection = sf.connect(
            account=account,
            user=user,
            password=password,
            warehouse=warehouse,
            role=role,
        )
    except Exception as error:
        error = sys.exc_info()[0]
        message = sys.exc_info()[1]
        print(f"Error: {error}\nMessage: {message}")
    # Returning connection object
    return connection


# Sample SQL query
sql_query = "SELECT * FROM TEST_1.PUBLIC.orders order by city;"

# Establish a connection and pull SQL query results
try:
    get_connection = snowflake_connection()
    cur = get_connection.cursor()
    cur.execute(sql_query)
    results = cur.fetchall()
except Exception as error:
    error = sys.exc_info()[0]
    message = sys.exc_info()[1]
    print(f"Error: {error}\nMessage: {message}")

sql_results = pd.DataFrame(
    results,
    columns=[col[0] for col in cur.description],
)

et snowflake results in Pandas DataFrame

Putting it all together, we’ve used the snowflake-python-connector to establish a connection with Snowflake, retrieve the results within a Snowflake Cursor object and convert those results into a structured Pandas DataFrame object.