I have a dataframe as follows:
A  | B
-----------
AA | 101
AA | 102
AA | 103.5
AA | 104
AA | 105
I would like to add a column C that increases by 1 on each row but skips any
row where B is a decimal (non-integer) number, so that I get a dataframe like this:
A  | B     | C
---------------
AA | 101   | 1
AA | 102   | 2
AA | 103.5 |
AA | 104   | 3
AA | 105   | 4
I tried using something like this:
df.insert(2, 'C', range(1, len(df)))
df.loc[is_integer(df['order']), 'detailed_category_id'] =...
But I'm not sure this is correct, so any help would be appreciated. Thanks!
CodePudding user response:
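You can use df['B'].eq(df['B'].astype(int)) to check whether each value is an integer. The cumulative sum of that boolean mask numbers the integer rows 1, 2, 3, and so on, and boolean indexing with the same mask assigns the count only to those rows: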
m = df['B'].eq(df['B'].astype(int))
df.loc[m, 'C'] = m.cumsum()
print(df)
If column A contains several groups and you want the count to restart for each group, use groupby.cumsum instead:
df.loc[m, 'C'] = m.groupby(df['A']).cumsum()
Output:
A B C
0 AA 101.0 1.0
1 AA 102.0 2.0
2 AA 103.5 NaN
3 AA 104.0 3.0
4 AA 105.0 4.0
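As the output shows, C comes back as floats because the skipped row holds NaN. The following is a minimal, self-contained sketch of the grouped variant that also converts C to pandas' nullable integer dtype. The two-group sample data is made up for illustration, and the mask uses mod(1).eq(0) rather than astype(int), a swapped-in check that should also tolerate missing values in B (astype(int) raises on NaN).

```python
import pandas as pd

# Hypothetical sample data: two groups in A, one non-integer value in each
df = pd.DataFrame({
    "A": ["AA", "AA", "AA", "AA", "BB", "BB", "BB"],
    "B": [101, 102, 103.5, 104, 201.5, 202, 203],
})

# True where B has no fractional part; NaN % 1 is NaN, so missing values give False
m = df["B"].mod(1).eq(0)

# Running count of integer rows within each group of A,
# written only to the rows where the mask is True
df.loc[m, "C"] = m.groupby(df["A"]).cumsum()

# Nullable integer dtype: skipped rows become <NA> and the counts stay integers
df["C"] = df["C"].astype("Int64")

print(df)
```

With this data, C should read 1, 2, <NA>, 3 for group AA and <NA>, 1, 2 for group BB. If float output is acceptable, the final astype("Int64") step can simply be dropped.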