Attributes of pandas DataFrame class

Overview:

pandas, the Python data analysis library, provides the DataFrame class as a container for storing and manipulating two-dimensional data. In a nutshell, a pandas DataFrame is a labeled two-dimensional array with versatile computing capabilities. The DataFrame class encapsulates two-dimensional data, exposed as a numpy.ndarray, along with various other properties (attributes) and behavior (methods). This article provides an understanding of the pandas DataFrame by going through the attributes of the DataFrame class. The other articles in this section explore DataFrame capabilities through its methods.

Indexes of DataFrame class:

A DataFrame is a data container. An index helps to retrieve data by specifying its location. A DataFrame has two indexes: the row index and the column index. The DataFrame attribute index returns the row index and the attribute columns returns the column index. The actual values of the DataFrame are returned as an ndarray by the attribute values.

Retrieving the indexes of a DataFrame:

# Python program on indexes of a pandas DataFrame
import pandas as pd

# Data to load into a pandas DataFrame
data = [(11, 22, 33),
        (44, 55, 66),
        (77, 88, 99)]

# Create a pandas DataFrame instance
dataFrame = pd.DataFrame(data=data)

# Get the row and column indexes
rowIndex = dataFrame.index
colIndex = dataFrame.columns

print("Row Index:")
print(rowIndex)

print("Column Index:")
print(colIndex)

# The values attribute returns only the data, as a numpy.ndarray
print("Data values stored by the DataFrame:")
print(dataFrame.values)

# Printing the DataFrame itself shows the labels along with the data
print("Whole DataFrame with row indexes, columns and values together:")
print(dataFrame)

Output:

Row Index:
RangeIndex(start=0, stop=3, step=1)
Column Index:
RangeIndex(start=0, stop=3, step=1)
Data values stored by the DataFrame:
[[11 22 33]
 [44 55 66]
 [77 88 99]]
Whole DataFrame with row indexes, columns and values together:
    0   1   2
0  11  22  33
1  44  55  66
2  77  88  99
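
The example above relies on the default integer labels. As a minimal sketch of how the two indexes help retrieve data, the snippet below assigns explicit labels and looks up a single cell by label. The labels "r1" to "r3" and "a" to "c" are hypothetical names chosen only for illustration.

```python
import pandas as pd

data = [(11, 22, 33),
        (44, 55, 66),
        (77, 88, 99)]

# Hypothetical row and column labels, used only for this illustration
frame = pd.DataFrame(data, index=["r1", "r2", "r3"], columns=["a", "b", "c"])

row_labels = list(frame.index)
col_labels = list(frame.columns)
print(row_labels)   # ['r1', 'r2', 'r3']
print(col_labels)   # ['a', 'b', 'c']

# Label-based lookup: the value at row "r2", column "c"
cell = frame.loc["r2", "c"]
print(cell)         # 66

# The values attribute still returns only the data, without the labels
print(frame.values.shape)   # (3, 3)
```

With explicit labels, index and columns return Index objects holding those labels instead of a RangeIndex.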

Data types of elements in a DataFrame:

  • A DataFrame has one or more columns.
  • Each column can hold a different type of element. One column of a DataFrame can hold all integers while another column has all its elements as strings.
  • A column itself can hold objects of several types. It is possible for a DataFrame column to have some of its members as integers, some as floats and the remaining elements as instances of a class like complex.
  • A column with heterogeneous data elements has the dtype object.
  • The Python example below has a DataFrame with a first column of all floating-point numbers, a second column of all integers and a last column with an integer, a string and a complex number.
  • The DataFrame attribute dtypes returns the dtype of each column of the DataFrame as a pandas Series.

Example:

import pandas as pd

# The third column mixes an integer, a string and a complex number
data = [(1.0, 2, 3),
        (0.1, 1, '.3'),
        (2.0, 3, complex(2, -1))]

dataFrame = pd.DataFrame(data=data)

print("Contents of DataFrame:")
print(dataFrame)

print("Types of the columns of the DataFrame:")
print(dataFrame.dtypes)

Output:

Contents of DataFrame:
     0  1       2
0  1.0  2       3
1  0.1  1      .3
2  2.0  3  (2-1j)
Types of the columns of the DataFrame:
0    float64
1      int64
2     object
dtype: object
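
To see what the object dtype means in practice, the following sketch inspects the Python type of each element in the mixed column. It reuses the data from the example above; the variable names are illustrative only.

```python
import pandas as pd

data = [(1.0, 2, 3),
        (0.1, 1, '.3'),
        (2.0, 3, complex(2, -1))]
frame = pd.DataFrame(data)

# Column 2 is reported with the dtype object ...
mixed = frame[2]
print(mixed.dtype)

# ... but each element keeps its own Python type
element_types = [type(value).__name__ for value in mixed]
print(element_types)
```

The dtype object only says that the column stores references to arbitrary Python objects; it does not describe the individual elements.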

 

Axes, Shape and Number of Dimensions:

  • The axes attribute of DataFrame class contains both the row axis index and the column axis index.
  • The ndim attribute returns the number of dimensions, which is 2 for a DataFrame instance. 
  • The shape attribute holds the shape of the two-dimensional DataFrame as a tuple of the form (rows, columns). For example, a shape of (2, 2) means a DataFrame instance with 2 rows and 2 columns, and a shape of (5, 1) means a DataFrame instance with 5 rows and 1 column.
  • The size attribute provides the number of elements present in the DataFrame.

 

Example:

import pandas as pd

data = [("A", "B", "C"),
        ("D", "E", "F"),
        ("G", "H", "I")]

dataFrame = pd.DataFrame(data=data)

print("Contents of the DataFrame:")
print(dataFrame)

print("DataFrame Axes:")
print(dataFrame.axes)

print("Number of Dimensions present in the DataFrame:")
print(dataFrame.ndim)

print("Shape of the DataFrame:")
print(dataFrame.shape)

print("Number of elements in the DataFrame:")
print(dataFrame.size)

# A flat list of five values becomes a DataFrame with five rows and one column
data = [1.0, 2.0, 3.0, 4.0, 5.0]

dataFrame = pd.DataFrame(data=data)

print("New DataFrame:")
print(dataFrame)

print("Shape of the new DataFrame:")
print(dataFrame.shape)

print("Number of elements in the new DataFrame:")
print(dataFrame.size)

Output:

Contents of the DataFrame:
   0  1  2
0  A  B  C
1  D  E  F
2  G  H  I
DataFrame Axes:
[RangeIndex(start=0, stop=3, step=1), RangeIndex(start=0, stop=3, step=1)]
Number of Dimensions present in the DataFrame:
2
Shape of the DataFrame:
(3, 3)
Number of elements in the DataFrame:
9
New DataFrame:
     0
0  1.0
1  2.0
2  3.0
3  4.0
4  5.0
Shape of the new DataFrame:
(5, 1)
Number of elements in the new DataFrame:
5
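
The attributes in this section are related to each other. The short sketch below, which uses a small DataFrame made up for this illustration, checks that size equals rows times columns and that axes has one entry per dimension.

```python
import pandas as pd

frame = pd.DataFrame([("A", "B", "C"),
                      ("D", "E", "F")])

# shape is a (rows, columns) tuple, so it can be unpacked directly
rows, cols = frame.shape
print(rows, cols)                      # 2 3

# size is the total number of elements: rows * columns
print(frame.size == rows * cols)       # True

# axes holds one index per dimension: [row index, column index]
print(len(frame.axes) == frame.ndim)   # True

# len() of a DataFrame is its number of rows
print(len(frame))                      # 2
```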

 

Checking whether a DataFrame is empty:

The DataFrame attribute empty comes in handy when you need to check whether a DataFrame is empty. It is True when the DataFrame has no elements, that is, when either of its axes has length 0.

 

Example:

import pandas as pd

data = [("A", "B"),
        ("C", "D")]

dataFrame = pd.DataFrame(data=data)

print("The DataFrame:")
print(dataFrame)
print(f"Is the DataFrame empty:{dataFrame.empty}")

Output:

The DataFrame:
   0  1
0  A  B
1  C  D
Is the DataFrame empty:False
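
The example above only covers the non-empty case. As a complementary sketch, the snippet below shows a DataFrame for which empty is True, and a case that often surprises: a DataFrame that holds only missing values is not considered empty. The column name "a" and the variable names are placeholders for this illustration.

```python
import numpy as np
import pandas as pd

# Columns are defined but there are no rows, so there are no elements
no_rows = pd.DataFrame(columns=["a", "b"])
print(no_rows.empty)             # True

# One row holding NaN: the element is missing, but the row still exists
only_nan = pd.DataFrame({"a": [np.nan]})
print(only_nan.empty)            # False

# Dropping the rows with missing values leaves an empty DataFrame
print(only_nan.dropna().empty)   # True
```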

 


Copyright 2024 © pythontic.com