I want to check whether a column in my dataframe has missing values (subject to a given condition) and, if so, replace those missing values with '-'. Here's my code:
for i in range(len(sample)):
    if sample['label'] != 0 & sample['attack_cat'].isnull() == True:
        sample['attack_cat'] = sample['attack_cat'].fillna('-')
    else:
        sample['attack_cat']
I get this error:
in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
When I step in with the debugger, I end up here:
@final
def __nonzero__(self):
    raise ValueError(
        f"The truth value of a {type(self).__name__} is ambiguous. "
        "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
    )
Do you have any idea how to solve this? Thanks.
CodePudding user response:
You can simply use pandas.DataFrame.loc with a boolean mask:
mask = sample['label'].ne(0) & sample['attack_cat'].isnull()
sample.loc[mask, 'attack_cat'] = '-'
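To see the mask at work, here is a minimal sketch on a made-up four-row frame. Only the column names come from the question; the values (including the attack names) are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
sample = pd.DataFrame(
    {
        "label": [0, 1, 1, 0],
        "attack_cat": [np.nan, np.nan, "DoS", "Normal"],
    }
)

# True only where label is non-zero AND attack_cat is missing.
mask = sample["label"].ne(0) & sample["attack_cat"].isnull()

# Assigning through .loc modifies the original frame.
sample.loc[mask, "attack_cat"] = "-"

print(sample)
```

Row 0 keeps its missing value because its label is 0; only row 1 satisfies both conditions and is filled.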
CodePudding user response:
If I'm understanding correctly, you should be able to define your condition and use .loc to fill your nulls:
cond = (sample['label'] != 0) & (sample['attack_cat'].isnull())
sample.loc[cond, 'attack_cat'] = sample.loc[cond, 'attack_cat'].fillna('-')
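A few things to note, though. When you combine multiple conditions, wrap each one in parentheses:
(sample['label'] != 0) & (sample['attack_cat'].isnull())
rather than
sample['label'] != 0 & sample['attack_cat'].isnull()
because & binds more tightly than != and ==, so the unparenthesized version evaluates 0 & sample['attack_cat'].isnull() first.
Also, you don't need isnull() == True; isnull() on its own already returns booleans.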
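The precedence issue can be reproduced in isolation. The sketch below uses two made-up Series (the names `label` and `is_null` are illustrative, not from the question) and shows that the unparenthesized form raises the same ValueError, while the parenthesized form yields a mask.

```python
import pandas as pd

# Hypothetical stand-ins for sample['label'] and sample['attack_cat'].isnull().
label = pd.Series([0, 1, 2])
is_null = pd.Series([False, True, True])

# Without parentheses Python evaluates `0 & is_null` first, so the line is
# really the chained comparison `label != (0 & is_null) == True`.
# A chained comparison needs bool() of a Series, which raises ValueError.
try:
    label != 0 & is_null == True
    raised = False
except ValueError:
    raised = True

# With parentheses the comparison runs first, then `&` combines element-wise.
mask = (label != 0) & is_null

print(raised, mask.tolist())
```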
Also, you're iterating over a range of numbers without really doing anything with them: you write for i in range(len(sample)): but i doesn't show up anywhere else in your code. If you want to iterate through a dataframe and do something row by row, you'll need something like
for index, row in sample.iterrows():
    # row['label'] is a scalar here, so use pd.isnull() rather than .isnull()
    if pd.isnull(row['label']):
        ...  # etc.
or
for i in range(len(sample)):
    if pd.isnull(sample.iloc[i]['label']):
        ...  # etc.
And lastly, look carefully at which columns your condition tests. Checking that label is not equal to 0 and that attack_cat is null makes sense, because those are two different columns. If both tests were on the same column, i.e. sample['label'].isnull() together with sample['label'] != 0, the != 0 part would be redundant: a missing value already compares as not equal to 0.
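The redundancy is easy to verify on a small made-up Series: a missing value compares as not equal to 0, so adding `!= 0` to an `isnull()` test on the same column changes nothing. The names below are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical label column containing one missing value.
label = pd.Series([0.0, 1.0, np.nan])

not_zero = label != 0        # NaN != 0 evaluates to True
is_missing = label.isnull()

# Combining both tests on the same column gives exactly the isnull() mask.
combined = not_zero & is_missing

print(combined.equals(is_missing))
```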
CodePudding user response:
You don't need to iterate through the dataframe to fill the missing values. Here's how you could do it:
sample.loc[(sample['label'] != 0) & (sample['attack_cat'].isna()), 'attack_cat'] = '-'
Full code with sample data
# == Necessary Imports =======================================
# Postpones evaluation of annotations, so `list | None` also works on Python < 3.10
from __future__ import annotations

import pandas as pd

# Used to generate a random sample to test the code.
import numpy as np
def generate_sample_dataframe(
    size: int = 20,
    choices_attack_cat: list | None = None,
    choices_label: list | None = None,
) -> pd.DataFrame:
    """
    Generate a sample dataframe with two columns:

    * 'label'
    * 'attack_cat'

    Parameters
    ----------
    size : int, default=20
        The number of rows in the dataframe.
    choices_attack_cat : list | None, optional
        The possible values for the column 'attack_cat'.
        Default labels:

        * True
        * False
        * None
    choices_label : list | None, optional
        The possible values for the column 'label'.
        Default labels:

        * 0
        * 1
        * 2
        * 3
        * 4
        * 5
        * None

    Returns
    -------
    pd.DataFrame
        A dataframe with two columns: 'label' and 'attack_cat'.

    Examples
    --------
    >>> generate_sample_dataframe(size=5)
       label  attack_cat
    0      0        True
    1      0        True
    2      0        True
    3      0        True
    4      0        True
    >>> generate_sample_dataframe(size=5, choices_attack_cat=[True, False])
       label  attack_cat
    0      0        True
    1      0        True
    2      0        True
    3      0        True
    4      0        True
    >>> generate_sample_dataframe(size=5, choices_label=[0, 1, 2])
       label  attack_cat
    0      0        True
    1      0        True
    2      0        True
    3      0        True
    4      0        True
    >>> generate_sample_dataframe(
    ...     size=5,
    ...     choices_attack_cat=[True, False],
    ...     choices_label=[0, 1, 2],
    ... )
       label  attack_cat
    0      0        True
    1      0        True
    2      0        True
    3      0        True
    4      0        True
    """
    if choices_attack_cat is None:
        choices_attack_cat = [True, False, None]
    elif not hasattr(choices_attack_cat, "__iter__") or isinstance(
        choices_attack_cat, str
    ):
        choices_attack_cat = [choices_attack_cat]
    if choices_label is None:
        choices_label = [0, 1, 2, 3, 4, 5, None]
    elif not hasattr(choices_label, "__iter__") or isinstance(
        choices_label, str
    ):
        choices_label = [choices_label]
    return pd.DataFrame(
        {
            "label": np.random.choice(choices_label, size=size),
            "attack_cat": np.random.choice(choices_attack_cat, size=size),
        }
    )
# == Random Sample DataFrame ==================================
sample = generate_sample_dataframe(100)

# == Solution =================================================
# For rows where "label" is different from 0 and "attack_cat"
# is missing, set "attack_cat" to "-".
# Notes:
# - The `&` operator is the element-wise counterpart of `and`.
#   For an element-wise `or` condition, use `|`.
sample.loc[(sample['label'] != 0) & (sample['attack_cat'].isna()), 'attack_cat'] = '-'
Iterating through the dataframe
If you really want to iterate the dataframe, you could use something like this:
for index, row in sample.iterrows():
    if row['label'] != 0 and pd.isna(row['attack_cat']):
        # Write with .loc[row, column] so the original dataframe is updated
        sample.loc[index, 'attack_cat'] = '-'
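As a sanity check, the loop and the vectorized assignment should produce the same result. The sketch below compares them on invented data (only the column names come from the question). The loop writes with `.loc[index, column]`, because assigning through a chained lookup such as `sample.iloc[index]['attack_cat'] = ...` may modify a temporary copy and leave `sample` unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
sample = pd.DataFrame(
    {
        "label": [0, 1, 2, 0, 3],
        "attack_cat": [np.nan, np.nan, "Fuzzers", "Normal", np.nan],
    }
)

# Vectorized version, applied to a copy.
vectorized = sample.copy()
mask = (vectorized["label"] != 0) & vectorized["attack_cat"].isna()
vectorized.loc[mask, "attack_cat"] = "-"

# Row-by-row version, applied to another copy.
looped = sample.copy()
for index, row in looped.iterrows():
    if row["label"] != 0 and pd.isna(row["attack_cat"]):
        looped.loc[index, "attack_cat"] = "-"

# DataFrame.equals treats missing values in the same position as equal.
print(vectorized.equals(looped))
```

On anything but tiny frames the vectorized form is usually the better choice, since iterrows() builds a new Series for every row.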