I have a dataframe
df =
C1 C2
a. 2
d. 8
d. 5
d. 5
b. 3
b. 4
c. 5
a. 6
b. 7
I want to add a new column, type. For every row whose C1 value occurs at most 2 times in the column, type should be low; otherwise it should keep the original C1 value. So the new df will look like this:
df_new =
C1 C2 type
a. 2 low
d. 8 d
d. 5 d
d. 5 d
b. 3 b
b. 4 b
c. 5 low
a. 6 low
b. 7 b
How can I do this?
Thanks
CodePudding user response:
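You can use pandas.DataFrame.groupby to group the rows by 'C1' and count the rows in each group. Then pass a lambda to transform on the grouped column that returns low when the group has at most 2 rows, or the group's original value (with the trailing dot stripped by [:-1]) otherwise. Alternatively, you can use numpy.where on the group sizes returned by groupby.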
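import numpy as np
# Option 1: per-group lambda; the scalar it returns is broadcast to every row of the group
df['type'] = df.groupby('C1')['C1'].transform(lambda g: 'low' if len(g) <= 2 else g.iloc[0][:-1])
# Option 2: compute each row's group size, then choose with numpy.where
g = df.groupby('C1')['C1'].transform('size')
df['type'] = np.where(g <= 2, 'low', df['C1'].str[:-1])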
print(df)
Output:
C1 C2 type
0 a. 2 low
1 d. 8 d
2 d. 5 d
3 d. 5 d
4 b. 3 b
5 b. 4 b
6 c. 5 low
7 a. 6 low
8 b. 7 b
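For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question's sample data, assuming the trailing dot is part of each C1 value (as the [:-1] slicing above implies). It is a third variant rather than the approach shown above: it looks up counts with Series.map and value_counts, then overwrites the rare values with Series.mask. The name counts is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; the trailing dots in C1 are assumed to be real.
df = pd.DataFrame({
    "C1": ["a.", "d.", "d.", "d.", "b.", "b.", "c.", "a.", "b."],
    "C2": [2, 8, 5, 5, 3, 4, 5, 6, 7],
})

# For each row, how many rows share its C1 value (aligned to the original index).
counts = df["C1"].map(df["C1"].value_counts())

# Strip the trailing dot, then replace values from small groups with 'low'.
df["type"] = df["C1"].str.rstrip(".").mask(counts <= 2, "low")

print(df)
```

Unlike np.where, which returns a plain NumPy array, mask returns a Series that keeps the frame's index. Either works for assigning a new column here.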