NumPy - 11 Boolean Indexing
>>> names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
>>> data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])
>>> names
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')
>>> data
array([[ 4, 7],
[ 0, 2],
[ -5, 6],
[ 0, 0],
[ 1, 2],
[-12, -4],
[ 3, 4]])
Each name corresponds to a row in the data
array, and we can use Boolean indexing to filter data based on the values in names
.
Selecting Rows for a Specific Name
Like arithmetic operations, comparisons (==
!=
) with arrays are also vectorized.
>>> names == "Bob"
array([ True, False, False, True, False, False, False])
>>> data[names == "Bob"]
array([[4, 7],
[0, 0]])
- The condition
names == "Bob"
generates a Boolean array. - This Boolean array is used to index the
data
array and return the corresponding rows.
The boolean array must be of the same length as the array axis it is indexing.
Selecting Specific Columns for a Name
To select the rows where names == "Bob"
and also index the columns too.
>>> data[names == "Bob", 1: ]
array([[7],
[0]])
>>> data[names == "Bob", 1 ]
array([7, 0])
- The
1:
specifies that only the columns starting from index 1 will be included.
Selecting Everything Except a Specific Name
To select rows where the name is not “Bob”, the condition can be negated using !=
or ~
:
>>> names != "Bob"
array([False, True, True, False, True, True, True])
>>> ~(names != "Bob")
array([ True, False, False, True, False, False, False])
>>> ~(names == "Bob")
array([False, True, True, False, True, True, True])
>>> data[names != "Bob"]
array([[ 0, 2],
[ -5, 6],
[ 1, 2],
[-12, -4],
[ 3, 4]])
The ~
operator is useful to invert a boolean array referenced by a variable.
>>> cond = names == "Bob"
>>> data[ ~cond ]
array([[ 0, 2],
[ -5, 6],
[ 1, 2],
[-12, -4],
[ 3, 4]])
- The variable
cond
stores the condition, and~cond
is used to invert the Boolean array.
Selecting Multiple Names Using Boolean Operators (’&’, ‘|’)
combining conditions using the bitwise operators &
(and) and |
(or).
>>> names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
>>> data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2],[-12, -4], [3,4]])
>>> mask = (names == "Bob") | (names == "Will")
>>> mask
array([ True, False, True, True, True, False, False])
>>> data[mask]
array([[ 4, 7],
[-5, 6],
[ 0, 0],
[ 1, 2]])
&
is for logical AND, and|
is for logical OR.- Important: Do not use Python’s
and
oror
for element-wise Boolean operations. Use&
and|
for Boolean arrays.
Selecting data from an array by Boolean Indexing and assigning the results to a new variable always creates a new copy of the data, even if the returned array is unchanged.
When indexing and modifying arrays with Boolean conditions, the changes are applied directly to the original array.
Modifying Data Based on a Condition
>>> data[names != "Joe"] = 7
>>> data
array([[ 7, 7],
[ 0, 2],
[ 7, 7],
[ 7, 7],
[ 7, 7],
[-12, -4],
[ 3, 4]])
- The condition
names != "Joe"
identifies rows where the name is not “Joe”. - These rows are updated to have the value
7
in thedata
array.