The str_len() function of the NumPy strings module

Overview:

The numpy.strings.str_len() function computes the length of each string in a NumPy array-like whose elements have the bytes_, str_ or StringDType dtype. The lengths are returned as an ndarray of integers. When the argument is a scalar, a single integer is returned.
For byte strings, the length is the number of bytes.
For Unicode strings, the length is the number of Unicode code points. A single visible character may be made up of more than one code point (for example, a base letter followed by a combining mark). Hence, the length of a Unicode string is not necessarily the number of characters a reader sees, but the total number of code points in the string.

Example 1 - Finding lengths of unicode strings:

In this example, the dtype 'U10' limits each string to at most ten code points. If a string is longer than ten, the remaining characters are silently discarded. To avoid this, the variable-length string type StringDType can be used, as in Example 3.

# Example Python program that finds the length of unicode strings
# present in an ndarray
import numpy

# Create a NumPy array of unicode strings
stringArray = numpy.empty(shape = (2, 2), dtype = numpy.dtype('U10'))

# Hello in different languages
stringArray[0][0] = "Hello" # In English
stringArray[0][1] = "こんにちは" # In Japanese
stringArray[1][0] = "안녕하세요" # In Korean
stringArray[1][1] = "สวัสดี" # In Thai
print("An ndarray of unicode strings:")
print(stringArray)

lengths = numpy.strings.str_len(stringArray)
print("Length of unicode strings:")
print(lengths)

Output:

An ndarray of unicode strings:
[['Hello' 'こんにちは']
 ['안녕하세요' 'สวัสดี']]
Length of unicode strings:
[[5 5]
 [5 6]]

Example 2 - Finding lengths of strings made of bytes:

# Example Python program that finds the length of a
# string of ASCII bytes and a string of Unicode
# characters using both the numpy.strings.str_len()
# function and the Python len() function
import numpy

# Get the length of ASCII bytes
msg = b"Hello world"
msgLength = numpy.strings.str_len(msg)
print(msgLength)

msgLength = len(msg)
print(msgLength)

# Get the length of a Unicode string
msg = "สวัสดีโลก"
msgLength = len(msg)
print(msgLength)
msgLength = numpy.strings.str_len(msg)
print(msgLength)

Output:

11
11
9
9

Example 3 - Using StringDType:

# Example Python program that uses variable length
# strings as the elements of an ndarray and finds the
# length of the strings using the numpy.strings.str_len() function
import numpy

# Create an ndarray whose elements are variable length strings
vStrings = numpy.empty(shape = (2, 2), dtype = numpy.dtypes.StringDType())

vStrings[0][0] = "A"
vStrings[0][1] = "AAAA"
vStrings[1][0] = "AAAAAA"
vStrings[1][1] = "AAAAAAAAAAAA"

lens = numpy.strings.str_len(vStrings)

# Print the array
print(vStrings)

# Print the string lengths
print(lens)

Output:

[['A' 'AAAA']
 ['AAAAAA' 'AAAAAAAAAAAA']]
[[ 1  4]
 [ 6 12]]