Converting a pandas DataFrame into a Python dictionary

Overview:

  • A pandas DataFrame can be converted into a Python dictionary using the DataFrame instance method to_dict(). The layout of the returned object is selected with the parameter orient.
  • In dictionary orientation, each column becomes a dictionary that maps row labels to the values of that column. These dictionaries are wrapped in an outer dictionary, which is keyed by the column labels. Dictionary orientation is specified with the string literal "dict" for the parameter orient, and it is the default orientation.
  • In list orientation, each column becomes a list, and the lists are stored in a dictionary against the column labels. The row labels are not part of the output. List orientation is specified with the string literal "list" for the parameter orient.
  • In series orientation, each column becomes a pandas Series, and the Series instances are stored against the column labels in the returned dictionary. Each Series keeps the row labels as its index. Series orientation is specified with the string literal "series" for the parameter orient.
  • In split orientation, each row becomes a list, and these lists are wrapped in another list that is stored against the key "data" in the returned dictionary. The row labels are stored in a list against the key "index". The column labels are stored in a list against the key "columns". Split orientation is specified with the string literal "split" for the parameter orient.
  • In records orientation, each row becomes a dictionary that maps column labels to the cell values of that row. The dictionaries are returned as a list, so the return value is a list of dictionaries and the row labels are not included. Records orientation is specified with the string literal "records" for the parameter orient.
  • In index orientation, each row becomes a dictionary that maps column labels to the cell values of that row. These dictionaries are returned in an outer dictionary, which is keyed by the row labels. Index orientation is specified with the string literal "index" for the parameter orient.
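
The orientations listed above can be compared side by side on one small DataFrame. The following is a minimal sketch; the 2x2 data and the variable names are made up for illustration and are not part of the examples that follow.

```python
import pandas as pds

# Small illustrative DataFrame (values chosen only for this sketch)
df = pds.DataFrame([(1, 2), (3, 4)], index=("R1", "R2"), columns=("C1", "C2"))

as_dict = df.to_dict()                     # column label -> {row label: value}
as_list = df.to_dict(orient="list")        # column label -> list of values
as_series = df.to_dict(orient="series")    # column label -> pandas Series
as_split = df.to_dict(orient="split")      # keys "index", "columns" and "data"
as_records = df.to_dict(orient="records")  # list with one dictionary per row
as_index = df.to_dict(orient="index")      # row label -> {column label: value}

for name, value in [("dict", as_dict), ("list", as_list), ("split", as_split),
                    ("records", as_records), ("index", as_index)]:
    print(name, ":", value)

# In series orientation the values are pandas Series, not plain containers
print(type(as_series["C1"]))
```

Note that the records orientation is the only one in this sketch that returns a list rather than a dictionary.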

Example – DataFrame to dictionary conversion in dict mode:

# Example Python program that converts a pandas DataFrame into a Python dictionary
import pandas as pds

# Data
data = [(1, 2, 3),
        (4, 5, 6),
        (7, 8, 9)]

# Create a DataFrame
dataFrame = pds.DataFrame(data, index=("R1", "R2", "R3"), columns=("C1", "C2", "C3"))
print("Contents of the DataFrame:")
print(dataFrame)

# Convert the DataFrame to a dictionary (default "dict" orientation)
dictionaryObject = dataFrame.to_dict()
print("DataFrame as a dictionary:")
print(dictionaryObject)

Output:

Contents of the DataFrame:
    C1  C2  C3
R1   1   2   3
R2   4   5   6
R3   7   8   9
DataFrame as a dictionary:
{'C1': {'R1': 1, 'R2': 4, 'R3': 7}, 'C2': {'R1': 2, 'R2': 5, 'R3': 8}, 'C3': {'R1': 3, 'R2': 6, 'R3': 9}}

Example – DataFrame to dictionary conversion in list mode:

# Example Python program that converts a pandas DataFrame into a
# Python dictionary in list mode
import pandas as pds

# Data
dailyTemperature = {"01/Nov/2019": [65, 62],
                    "02/Nov/2019": [62, 60],
                    "03/Nov/2019": [61, 60],
                    "04/Nov/2019": [62, 60],
                    "05/Nov/2019": [64, 62]}

# Create a DataFrame
dataFrame = pds.DataFrame(dailyTemperature, index=("max", "min"))
print("Daily temperature from DataFrame:")
print(dataFrame)

# Convert the DataFrame to a dictionary
dictionaryInstance = dataFrame.to_dict(orient="list")
print("DataFrame as a dictionary(List orientation):")
print(dictionaryInstance)

Output:

Daily temperature from DataFrame:
     01/Nov/2019  02/Nov/2019  03/Nov/2019  04/Nov/2019  05/Nov/2019
max           65           62           61           62           64
min           62           60           60           60           62
DataFrame as a dictionary(List orientation):
{'01/Nov/2019': [65, 62], '02/Nov/2019': [62, 60], '03/Nov/2019': [61, 60], '04/Nov/2019': [62, 60], '05/Nov/2019': [64, 62]}
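
The output above no longer contains the row labels "max" and "min", because the list orientation keeps only the column values. The sketch below, using a shortened, made-up version of the temperature data, shows one way the labels could be kept aside and supplied again when a DataFrame is rebuilt; the variable names are illustrative.

```python
import pandas as pds

# Shortened, illustrative version of the temperature data
dataFrame = pds.DataFrame({"01/Nov/2019": [65, 62],
                           "02/Nov/2019": [62, 60]},
                          index=("max", "min"))

# The list orientation stores only the column values
asLists = dataFrame.to_dict(orient="list")
print(asLists)

# Keep the row labels separately if they are needed later
rowLabels = list(dataFrame.index)

# Rebuild a DataFrame by passing the saved row labels back in
rebuilt = pds.DataFrame(asLists, index=rowLabels)
print(rebuilt)
```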

 

Example - Making a dictionary of <key vs Series> entries from a pandas DataFrame:

# Example Python program that makes a Python dictionary
# containing key-value pairs of <column label, pandas.Series>
import pandas as pds

# Example data
fruitCalories = [("Apple",     52,  0.2),
                 ("Orange",    47,  0.1),
                 ("Pineapple", 50,  0.1),
                 ("Avocado",   160, 15.0),
                 ("Kiwi",      61,  0.5)]

columnHeaders = ["Fruit", "Calories", "Fat content"]

# Create a DataFrame
fruitData = pds.DataFrame(data=fruitCalories, columns=columnHeaders)

# Obtain a dictionary with entries <column label, pandas.Series>
nutriValsAsDict = fruitData.to_dict(orient="series")

print("Retrieving individual series from the dictionary:")
for key in nutriValsAsDict:
    print(nutriValsAsDict[key])
    print(type(nutriValsAsDict[key]))
 

Output:

Retrieving individual series from the dictionary:
0        Apple
1       Orange
2    Pineapple
3      Avocado
4         Kiwi
Name: Fruit, dtype: object
<class 'pandas.core.series.Series'>
0     52
1     47
2     50
3    160
4     61
Name: Calories, dtype: int64
<class 'pandas.core.series.Series'>
0     0.2
1     0.1
2     0.1
3    15.0
4     0.5
Name: Fat content, dtype: float64
<class 'pandas.core.series.Series'>

Example - Create a dictionary from a DataFrame that stores index, columns, data as separate entries:

# Example Python program that creates a dictionary
# from a DataFrame which will have index, column labels
# and data as separate entries
import pandas as pds

# All lengths are integers so that both length columns are numeric
riverLengths = [("Nile", 6650, 4130),
                ("Amazon", 6400, 3976),
                ("Yangtze", 6300, 3917),
                ("Mississippi", 6275, 3902),
                ("Yenisei", 5539, 3445)]

columns = ["Name of the River", "Length(KMs)", "Length(Miles)"]

print("DataFrame:")
riverData = pds.DataFrame(data=riverLengths, columns=columns)
print(riverData)

print("DataFrame as a dictionary with separate entries for index, column labels and data:")
riverDataDict = riverData.to_dict(orient="split")
print(riverDataDict)

Output:

DataFrame:
  Name of the River  Length(KMs)  Length(Miles)
0              Nile         6650           4130
1            Amazon         6400           3976
2           Yangtze         6300           3917
3       Mississippi         6275           3902
4           Yenisei         5539           3445
DataFrame as a dictionary with separate entries for index, column labels and data:
{'index': [0, 1, 2, 3, 4], 'columns': ['Name of the River', 'Length(KMs)', 'Length(Miles)'], 'data': [['Nile', 6650, 4130], ['Amazon', 6400, 3976], ['Yangtze', 6300, 3917], ['Mississippi', 6275, 3902], ['Yenisei', 5539, 3445]]}
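
Because the split orientation keeps the row labels, column labels and data under separate keys, the three entries map directly onto the index, columns and data arguments of the DataFrame constructor. The following is a minimal round-trip sketch on a shortened, illustrative version of the river data; the name rebuilt is made up for this sketch.

```python
import pandas as pds

# Shortened, illustrative version of the river data
riverData = pds.DataFrame(
    data=[("Nile", 6650, 4130), ("Amazon", 6400, 3976)],
    columns=["Name of the River", "Length(KMs)", "Length(Miles)"],
)

# Dictionary with the keys "index", "columns" and "data"
riverDataDict = riverData.to_dict(orient="split")

# Feed the three entries back into the DataFrame constructor
rebuilt = pds.DataFrame(
    data=riverDataDict["data"],
    index=riverDataDict["index"],
    columns=riverDataDict["columns"],
)
print(rebuilt)
```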

Example - DataFrame to dictionary conversion in index mode, with rows stored as <Row index : <Column-name Vs Cell-value>>:

# Example Python program that creates a dictionary of
# dictionaries from a pandas DataFrame.
# Returned dictionary stores key-value pairs in the
# form of <row index vs <column-name vs cell-value>>
import pandas as pds

# Data
countryData = [("Russia", "Moscow", 6601670, 146171015),
               ("Canada", "Ottawa", 3855100, 38048738),
               ("China", "Beijing", 3705407, 1400050000),
               ("United States of America", "Washington, D.C.", 3796742, 331449281),
               ("Brazil", "Brasília", 3287956, 210147125)]

columnHeaders = ["Country", "Capital", "Area(Sq.Miles)", "Population"]

# Create a pandas DataFrame
df = pds.DataFrame(data=countryData, columns=columnHeaders)
print("DataFrame:")
print(df)

# Obtain data in the form of a dictionary of dictionaries
print("DataFrame in records form <row index vs <column-name vs cell-value>>:")
recs = df.to_dict(orient="index")
print(recs)

Output:

DataFrame:
                    Country           Capital  Area(Sq.Miles)  Population
0                    Russia            Moscow         6601670   146171015
1                    Canada            Ottawa         3855100    38048738
2                     China           Beijing         3705407  1400050000
3  United States of America  Washington, D.C.         3796742   331449281
4                    Brazil          Brasília         3287956   210147125
DataFrame in records form <row index vs <column-name vs cell-value>>:
{0: {'Country': 'Russia', 'Capital': 'Moscow', 'Area(Sq.Miles)': 6601670, 'Population': 146171015}, 1: {'Country': 'Canada', 'Capital': 'Ottawa', 'Area(Sq.Miles)': 3855100, 'Population': 38048738}, 2: {'Country': 'China', 'Capital': 'Beijing', 'Area(Sq.Miles)': 3705407, 'Population': 1400050000}, 3: {'Country': 'United States of America', 'Capital': 'Washington, D.C.', 'Area(Sq.Miles)': 3796742, 'Population': 331449281}, 4: {'Country': 'Brazil', 'Capital': 'Brasília', 'Area(Sq.Miles)': 3287956, 'Population': 210147125}}
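
With the index orientation, a cell is reached by row label first and column label second, which is the reverse of the lookup order in the default dict orientation. The sketch below illustrates this on a shortened, made-up subset of the country data and rebuilds a DataFrame with DataFrame.from_dict(); the variable names are illustrative.

```python
import pandas as pds

# Shortened, illustrative subset of the country data
df = pds.DataFrame(
    data=[("Canada", "Ottawa"), ("Brazil", "Brasília")],
    columns=["Country", "Capital"],
)

# Index orientation: row label first, then column label
recs = df.to_dict(orient="index")
print(recs[1]["Capital"])

# Default dict orientation: column label first, then row label
byColumn = df.to_dict()
print(byColumn["Capital"][1])

# Rebuild a DataFrame from the row-keyed dictionary
rebuilt = pds.DataFrame.from_dict(recs, orient="index")
print(rebuilt)
```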

 


Copyright 2024 © pythontic.com