Pickling and Unpickling pandas DataFrame and Series objects

Overview:

  • In Python, pickling is the process of serialising an object into a byte stream that can be written to a disk file or held in a buffer.
  • Unpickling recreates the object from a file, a network stream or a buffer and makes it available to a Python program.
  • Most Python objects can be pickled and unpickled through the dump() and load() functions of Python's pickle module.
  • The pandas DataFrame and Series classes provide the method to_pickle() to simplify pickling. Unpickling is done through the read_pickle() function of the pandas module.
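
The points above mention both files and buffers. As a minimal sketch, assuming pandas is installed, the following program uses only the standard pickle and io modules to pickle a DataFrame into an in-memory bytes object and into a BytesIO buffer, and then unpickles it again. The sample data and the variable names are illustrative only.

```python
import io
import pickle

import pandas as pds

# Illustrative sample data
df = pds.DataFrame({"City": ["London", "Paris"], "Latitude": [51.50, 48.85]})

# dumps() serialises the object into a bytes object held in memory
pickledBytes = pickle.dumps(df)
restoredFromBytes = pickle.loads(pickledBytes)

# dump() and load() work with any binary file-like object, here a BytesIO buffer
buffer = io.BytesIO()
pickle.dump(df, buffer)
buffer.seek(0)  # Rewind the buffer before reading the pickle back
restoredFromBuffer = pickle.load(buffer)

print(type(pickledBytes))
print(restoredFromBytes.equals(df))
print(restoredFromBuffer.equals(df))
```

Unpickling can execute code chosen by whoever produced the pickle, so pickles received from a network or any other untrusted source should not be loaded.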

Example - Pickle a pandas DataFrame object:

# Example Python program that pickles a pandas
# DataFrame object into a disk file
import pandas as pds

# Data
columnLabels = ["City", "Latitude", "Longitude"]
locData      = [("New York", 40.71, -74.00),
                ("Los Angeles", 34.05, -118.24),
                ("London", 51.50, -0.12),
                ("Paris", 48.85,  2.35),
                ("Dubai", 25.26,  55.29),
                ("Hong Kong", 22.35, 114.18),
                ("Seoul", 37.56, 126.97),
                ("Tokyo", 35.68, 139.75)]

# Create a pandas DataFrame
df = pds.DataFrame(data=locData, columns=columnLabels)

# Pickle the DataFrame into a disk file
# (the /examples directory must already exist)
df.to_pickle("/examples/locdata.pkl")

Output of vi locdata.pkl:

<80>^E<95>=^D^@^@^@^@^@^@<8c>^Qpandas.core.frame<94><8c>        DataFrame<94><93><94>)<81><94>}<94>(<8c>^D_mgr<94><8c>^^pandas.core.internals.managers<94><8c>^LBlockManager<94><93><94>)<81><94>(]<94>(<8c>^Xpandas.core.indexes.base<94><8c>

_new_Index<94><93><94>h^K<8c>^EIndex<94><93><94>}<94>(<8c>^Ddata<94><8c>^Unumpy.core.multiarray<94><8c>^L_reconstruct<94><93><94><8c>^Enumpy<94><8c>^Gndarray<94><93><94>K^@<85><94>C^Ab<94><87><94>R<94>(K^AK^C<85><94>h^U<8c>^Edtype<94><93><94><8c>^BO8<94><89><88><87><94>R<94>(K^C<8c>^A|<94>NNNJÿÿÿÿJÿÿÿÿK?t<94>b<89>]<94>(<8c>^DCity<94><8c>^HLatitude<94><8c>   Longitude<94>et<94>b<8c>^Dname<94>Nu<86><94>R<94>h^M<8c>^Ypandas.core.indexes.range<94><8c>

RangeIndex<94><93><94>}<94>(h)N<8c>^Estart<94>K^@<8c>^Dstop<94>K^H<8c>^Dstep<94>K^Au<86><94>R<94>e]<94>(<8c>^Rnumpy.core.numeric<94><8c>^K_frombuffer<94><93><94>(<96><80>^@^@^@^@^@^@^@{^T®GáZD@fffff^FA@^@^@^@^@^@ÀI@ÍÌÌÌÌlH@Ãõ(\<8f>B9@<9a><99><99><99><99>Y6@Ház^T®ÇB@×£p=

×A@^@^@^@^@^@<80>RÀ<8f>Âõ(\<8f>]À¸^^<85>ëQ¸¾¿ÍÌÌÌÌÌ^B@<85>ëQ¸^^¥K@ìQ¸^^<85><8b>\@®Gáz^T¾_@^@^@^@^@^@xa@<94>h^^<8c>^Bf8<94><89><88><87><94>R<94>(K^C<8c>^A<<94>NNNJÿÿÿÿJÿÿÿÿK^@t<94>bK^BK^H<86><94><8c>^AC<94>t<94>R<94>h^Th^WK^@<85><94>h^Y<87><94>R<94>(K^AK^AK^H<86><94>h!<89>]<94>(<8c>^HNew York<94><8c>^KLos Angeles<94><8c>^FLondon<94><8c>^EParis<94><8c>^EDubai<94><8c> Hong Kong<94><8c>^ESeoul<94><8c>^ETokyo<94>et<94>be]<94>(h^Mh^O}<94>(h^Qh^Th^WK^@<85><94>h^Y<87><94>R<94>(K^AK^B<85><94>h!<89>]<94>(h&h'et<94>bh)Nu<86><94>R<94>h^Mh^O}<94>(h^Qh^Th^WK^@<85><94>h^Y<87><94>R<94>(K^AK^A<85><94>h!<89>]<94>h%at<94>bh)Nu<86><94>R<94>e}<94><8c>^F0.14.1<94>}<94>(<8c>^Daxes<94>h

@                                           

Example - Unpickle a pandas DataFrame object:

# Example Python program that unpickles a pandas
# DataFrame object from a disk file
import pandas as pds

# Storage path of a pickled object
picklePath = "/examples/locdata.pkl"

# Read the pickle and recreate the pandas object; here it is a DataFrame
df = pds.read_pickle(picklePath)

# Print the type and contents of the unpickled DataFrame
print(type(df))
print(df)

Output:

<class 'pandas.core.frame.DataFrame'>
          City  Latitude  Longitude
0     New York     40.71     -74.00
1  Los Angeles     34.05    -118.24
2       London     51.50      -0.12
3        Paris     48.85       2.35
4        Dubai     25.26      55.29
5    Hong Kong     22.35     114.18
6        Seoul     37.56     126.97
7        Tokyo     35.68     139.75
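
To confirm that nothing is lost between to_pickle() and read_pickle(), the unpickled DataFrame can be compared with the original. The sketch below is one way to do that, assuming a temporary directory is acceptable in place of /examples; it uses a subset of the location data, and the names original, restored, roundTripOk and dtypesMatch are illustrative.

```python
import os
import tempfile

import pandas as pds

# Subset of the location data, used only for illustration
columnLabels = ["City", "Latitude", "Longitude"]
locData = [("New York", 40.71, -74.00),
           ("London", 51.50, -0.12),
           ("Tokyo", 35.68, 139.75)]
original = pds.DataFrame(data=locData, columns=columnLabels)

# A temporary directory avoids depending on a fixed path such as /examples
with tempfile.TemporaryDirectory() as tempDir:
    picklePath = os.path.join(tempDir, "locdata.pkl")
    original.to_pickle(picklePath)
    restored = pds.read_pickle(picklePath)

# equals() checks that both frames have the same shape and elements
roundTripOk = restored.equals(original)

# The column dtypes can be compared separately
dtypesMatch = list(restored.dtypes) == list(original.dtypes)

print(roundTripOk, dtypesMatch)
```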

Example - Pickle a pandas Series object:

# Example Python program that pickles a
# pandas Series instance
import pandas as pds

# A list of measurements converted into a pandas Series
measurements = [60, 61.5, 62, 61.6, 62.5, 62.5, 62.2]
series       = pds.Series(data=measurements)

# Pickle the Series
seriesPicklePath = "/examples/msmts.pkl"
series.to_pickle(seriesPicklePath)

# Read the raw bytes of the pickle file and print them
with open(seriesPicklePath, mode='rb') as pickleFile:
    pickleBinary = pickleFile.read()

print(pickleBinary)

Output:

b'\x80\x05\x95q\x02\x00\x00\x00\x00\x00\x00\x8c\x12pandas.core.series\x94\x8c\x06Series
\x94\x93\x94)\x81\x94}\x94(\x8c\x04_mgr\x94\x8c\x1epandas.core.internals.managers\x94\x8c
\x12SingleBlockManager\x94\x93\x94)\x81\x94(]\x94\x8c\x18pandas.core.indexes.base\x94\x8c
\n_new_Index\x94\x93\x94\x8c\x19pandas.core.indexes.range\x94\x8c\nRangeIndex\x94\x93\x94}
\x94(\x8c\x04name\x94N\x8c\x05start\x94K\x00\x8c\x04stop\x94K\x07\x8c\x04step\x94K\x01u
\x86\x94R\x94a]\x94\x8c\x12numpy.core.numeric\x94\x8c\x0b_frombuffer\x94\x93\x94
(\x968\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00N@\x00\x00\x00\x00\x00\xc0N@
\x00\x00\x00\x00\x00\x00O@\xcd\xcc\xcc\xcc\xcc\xccN@\x00\x00\x00\x00\x00@O@\x00\x00
\x00\x00\x00@O@\x9a\x99\x99\x99\x99\x19O@\x94\x8c\x05numpy\x94\x8c\x05dtype\x94
\x93\x94\x8c\x02f8\x94\x89\x88\x87\x94R\x94(K\x03\x8c\x01<\x94NNNJ\xff\xff\xff\xffJ\xff
\xff\xff\xffK\x00t\x94bK\x07\x85\x94\x8c\x01C\x94t\x94R\x94a]\x94h\rh\x10}\x94
(h\x12Nh\x13K\x00h\x14K\x07h\x15K\x01u\x86\x94R\x94a}\x94\x8c\x060.14.1\x94}\x94
(\x8c\x04axes\x94h\n\x8c\x06blocks\x94]\x94}\x94(\x8c\x06values\x94h(\x8c\x08mgr_locs
\x94\x8c\x08builtins\x94\x8c\x05slice\x94\x93\x94K\x00K\x07K\x01\x87\x94R\x94uaust
\x94b\x8c\x04_typ\x94\x8c\x06series\x94\x8c\t_metadata\x94]\x94h\x12a\x8c\x05attrs\x94}
\x94\x8c\x06_flags\x94}\x94\x8c\x17allows_duplicate_labels\x94\x88sh\x12Nub.
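
The raw bytes are hard to read. As a hedged sketch, the standard pickletools module can disassemble a pickle into named opcodes, which makes strings such as the module and class names easier to spot. The program below assumes a temporary directory in place of /examples and a shortened measurement list; the variable names are illustrative.

```python
import io
import os
import pickletools
import tempfile

import pandas as pds

series = pds.Series(data=[60, 61.5, 62])

# Pickle the Series into a temporary file and read the raw bytes back
with tempfile.TemporaryDirectory() as tempDir:
    seriesPicklePath = os.path.join(tempDir, "msmts.pkl")
    series.to_pickle(seriesPicklePath)
    with open(seriesPicklePath, mode="rb") as pickleFile:
        pickleBinary = pickleFile.read()

# A pickle written with protocol 2 or later starts with the PROTO opcode (0x80)
# followed by one byte holding the protocol number
startsWithProto = pickleBinary[0] == 0x80
protocol = pickleBinary[1]
print("Pickle protocol:", protocol)

# pickletools.dis() writes a symbolic disassembly of the pickle to a text stream
listing = io.StringIO()
pickletools.dis(pickleBinary, out=listing)

# Print only the first few opcodes; the full listing is long
for line in listing.getvalue().splitlines()[:8]:
    print(line)
```

The leading \x80\x05 in the output above corresponds to the PROTO opcode followed by protocol number 5.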

Example - Unpickle a pandas Series object: 

# Example Python program that unpickles
# a pandas Series object
import pandas as pds

# Storage path of the pickled object
pickleStoragePath = "/examples/msmts.pkl"

# Unpickle a pickled pandas Series object
unpickled = pds.read_pickle(pickleStoragePath)

# Print the type and contents of the unpickled Series
print(type(unpickled))
print(unpickled)

Output:

<class 'pandas.core.series.Series'>
0    60.0
1    61.5
2    62.0
3    61.6
4    62.5
5    62.5
6    62.2
dtype: float64
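
A pickle stores the whole Series object, not just its values. The following sketch, assuming a temporary directory and a hypothetical set of index labels and a Series name that are not part of the example above, checks that the index, the name and the dtype are all restored by read_pickle().

```python
import os
import tempfile

import pandas as pds

# Hypothetical labels and name added to a few measurements for illustration
series = pds.Series(data=[60, 61.5, 62],
                    index=["mon", "tue", "wed"],
                    name="measurement")

# Pickle into a temporary file and unpickle it again
with tempfile.TemporaryDirectory() as tempDir:
    picklePath = os.path.join(tempDir, "msmts.pkl")
    series.to_pickle(picklePath)
    unpickled = pds.read_pickle(picklePath)

# The name, index labels and dtype come back along with the values
print(unpickled.name)
print(list(unpickled.index))
print(unpickled.dtype)
print(unpickled.equals(series))
```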

 


Copyright 2024 © pythontic.com