I have a DataFrame with a column 'A1' that contains multiple 'Hello' strings as well as positive and negative integers. I want to count the 'Hello' strings, all numbers >= 0, and all numbers < 0, so that I end up with three counts.
| index | A1 |
|---|---|
| 0 | 1 |
| 1 | Hello |
| 2 | -8 |
| 3 | Hello |
So the output should be 1 for posNums, 1 for negNums, and 2 for helloCount.
posNums = df.where(df['A1'] >= 0).sum()
This obviously doesn't work, because one can't compare a string to an int. How can I add a condition that skips the strings when I count the ints, and vice versa?
CodePudding user response:
One way is to use pd.to_numeric:
import pandas as pd
df = pd.DataFrame({"A1": ["Hello", 1, -1, "Hello", "Hello", -2, 2, -3]})
agg_funcs = {
"negative": lambda x: x.lt(0).sum(),
"positive": lambda x: x.ge(0).sum(),
"nans": lambda x: x.isna().sum()
}
out = pd.to_numeric(df["A1"], errors="coerce").agg(agg_funcs)
out:
negative 3
positive 2
nans 3
Name: A1, dtype: int64
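Note that the "nans" entry counts every value that could not be converted to a number, not only 'Hello'. If only the 'Hello' strings should be counted, one possible variation (a minimal sketch using the sample data from the question, with the variable names the question asks for) compares against the string directly:

```python
import pandas as pd

# Sample data taken from the table in the question.
df = pd.DataFrame({"A1": [1, "Hello", -8, "Hello"]})

# Non-numeric entries become NaN; any comparison against NaN is False,
# so the strings drop out of both numeric counts.
nums = pd.to_numeric(df["A1"], errors="coerce")

posNums = int((nums >= 0).sum())
negNums = int((nums < 0).sum())

# Element-wise equality on the original column counts exactly 'Hello'.
helloCount = int((df["A1"] == "Hello").sum())

print(posNums, negNums, helloCount)  # 1 1 2
```

Summing a boolean Series counts its True values, which is why no explicit filtering step is needed here.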
CodePudding user response:
Is something like this what you're looking for?
import pandas as pd
df = pd.DataFrame({'A1': ['hello', 1, 2, 3, 4, 5, -1, -2, -3, -4, -5, 'world']})
# Count non-negative integers (>= 0, as the question asks), negative integers, and strings
count_pos = df[df['A1'].apply(lambda x: isinstance(x, int) and x >= 0)].count()
count_neg = df[df['A1'].apply(lambda x: isinstance(x, int) and x < 0)].count()
count_str = df[df['A1'].apply(lambda x: isinstance(x, str))].count()
Printing the three counts will output:
A1 5
dtype: int64
A1 5
dtype: int64
A1 2
dtype: int64
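The same type-checking idea can also be done in a single pass by labelling each cell and counting the labels. This is only a sketch: `classify` and the label "otherStr" are hypothetical names introduced for illustration, and it assumes every non-string value in the column is a number.

```python
import pandas as pd

# Sample data taken from the table in the question.
df = pd.DataFrame({"A1": [1, "Hello", -8, "Hello"]})

def classify(value):
    """Hypothetical helper: return a label for a single cell."""
    if isinstance(value, str):
        return "helloCount" if value == "Hello" else "otherStr"
    # Assumption: anything that is not a string is numeric.
    return "posNums" if value >= 0 else "negNums"

# Map every cell to its label, then count how often each label occurs.
counts = df["A1"].map(classify).value_counts()

# .get with a default avoids a KeyError when a label never occurs.
print(counts.get("posNums", 0), counts.get("negNums", 0), counts.get("helloCount", 0))  # 1 1 2
```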