Overview:
- PostgreSQL is one of the most powerful and popular open source Database Management Systems.
- In Data Analysis, it is often required to write varying amount of data from a Python Program to a Relational Database Management System like PostgreSQL.
- Apart from applying various computational and statistical methods using pandas DataFrame, it is also possible to perform serialization operations like reading from and writing to a PostgreSQL table, reading dataframes from a MySQL database table and writing to it and similar other operations.
- To read from and write to SQL end-points like MySQL, PostgreSQL the pandas DataFrame class uses the SQLAlchemy engine.
Reading from a PostgreSQL table to a pandas DataFrame:
- The data to be analyzed is often from a data store like PostgreSQL table.
- Data from a PostgreSQL table can be read and loaded into a pandas DataFrame by calling the method DataFrame.read_sql() and passing the database connection obtained from the SQLAlchemy Engine as a parameter.
Example:
# Example python program to read data from a PostgreSQL table # and load into a pandas DataFrame import psycopg2 import pandas as pds from sqlalchemy import create_engine
# Create an engine instance alchemyEngine = create_engine('postgresql+psycopg2://test:@127.0.0.1', pool_recycle=3600);
# Connect to PostgreSQL server dbConnection = alchemyEngine.connect();
# Read data from PostgreSQL database table and load into a DataFrame instance dataFrame = pds.read_sql("select * from \"StudentScores\"", dbConnection);
pds.set_option('display.expand_frame_repr', False);
# Print the DataFrame print(dataFrame);
# Close the database connection dbConnection.close(); |
Output:
index Physics Chemistry Biology Mathematics Language 0 1211 57 61 76 56 67 1 1212 77 67 65 78 62 2 1213 65 71 56 63 70 |
Writing a pandas DataFrame to a PostgreSQL table:
- The following Python example, loads student scores from a list of tuples into a pandas DataFrame.
- It creates an SQLAlchemy Engine instance which will connect to the PostgreSQL on a subsequent call to the connect() method.
- Once a connection is made to the PostgreSQL server, the method to_sql() is called on the DataFrame instance , which persists the contents to a PostgreSQL table.
Example:
# Example Python program to serialize a pandas DataFrame # into a PostgreSQL table from sqlalchemy import create_engine import psycopg2 import pandas as pds
# Data - Marks scored studentScores = [(57, 61, 76, 56, 67), (77, 67, 65, 78, 62), (65, 71, 56, 63, 70) ]; # Create a DataFrame dataFrame = pds.DataFrame(studentScores, index=(1211,1212,1213), # Student ids as index columns=("Physics", "Chemistry", "Biology", "Mathematics", "Language") );
alchemyEngine = create_engine('postgresql+psycopg2://test:test@127.0.0.1/test', pool_recycle=3600); postgreSQLConnection = alchemyEngine.connect(); postgreSQLTable = "StudentScores";
try: frame = dataFrame.to_sql(postgreSQLTable, postgreSQLConnection, if_exists='fail'); except ValueError as vx: print(vx) except Exception as ex: print(ex) else: print("PostgreSQL Table %s has been created successfully."%postgreSQLTable); finally: postgreSQLConnection.close(); |
Output:
PostgreSQL Table StudentScores has been created successfully. |