I have a dataframe:
group id
A 009x
A 010x
B 009x
B 002x
C 002x
C 003x
How do I make a new column named new that categorizes each row, by group, under the following three conditions:
- If all id values in the group consist ONLY of 009x and 010x, then categorize as g1
- If one id value in the group is one of 009x or 010x AND another id value is not one of 009x or 010x, then categorize as g2
- Otherwise, just use the id value
Desired result:
group id new
A 009x g1
A 010x g1
B 009x g2
B 002x g2
C 002x 002x
C 003x 003x
import pandas as pd

data = {
    'group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'id': ['009x', '010x', '009x', '002x', '002x', '003x'],
}
df = pd.DataFrame(data)
df
CodePudding user response:
I hope I've understood your question correctly. You can use .groupby() with a custom function:
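def categorize_fn(x):
    # True for rows whose id is one of the two special values
    tmp = x["id"].isin(["009x", "010x"])
    x = x.copy()  # work on a copy rather than the group pandas passes in
    if tmp.all():
        # every id in the group is 009x or 010x
        x["new"] = "g1"
    elif tmp.any():
        # the group mixes special and non-special ids
        x["new"] = "g2"
    else:
        # no special ids: keep each row's own id
        x["new"] = x["id"]
    return x

df = df.groupby("group", group_keys=False).apply(categorize_fn)
print(df)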
Prints:
group id new
0 A 009x g1
1 A 010x g1
2 B 009x g2
3 B 002x g2
4 C 002x 002x
5 C 003x 003x
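
If you would rather avoid a Python-level function per group, the same three rules can probably be written with groupby().transform() and numpy.select. This is only a sketch of an alternative, not part of the answer above: the names is_special, all_special and any_special are made up for illustration, and it assumes the special ids are exactly 009x and 010x as in the question. It also sidesteps groupby().apply(), whose handling of the grouping column has changed in recent pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "C"],
    "id": ["009x", "010x", "009x", "002x", "002x", "003x"],
})

# True where the row's id is one of the two special values
is_special = df["id"].isin(["009x", "010x"])

# Broadcast the per-group all()/any() results back onto every row
all_special = is_special.groupby(df["group"]).transform("all")
any_special = is_special.groupby(df["group"]).transform("any")

# np.select uses the first condition that matches, so g1 wins over g2;
# rows matching neither condition fall back to their own id
df["new"] = np.select([all_special, any_special], ["g1", "g2"], default=df["id"])
print(df)
```

The order of the conditions matters: a group where all ids are special also satisfies the "any" test, so the "all" condition has to come first.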