I am looking to derive a new column value from the previous row in my dataframe, adding 1 to the previous row's value to keep a kind of running total:
df['Touch_No'] = np.where((df.Time_btween_steps.isnull()) | (df.Time_btween_steps > 30), 1, df.First_touch.shift().add(1))
I basically want to check if the column value is null: if it is, treat that as the first activity and reset the counter; if not, add 1 to the previous activity. This gives me a running total of the number of outreaches we are doing to specific people:
Expected outcome:
Time Between Steps | Touch_No
Null               | 1
0                  | 2
5.4                | 3
6.7                | 4
2                  | 5
null               | 1
1                  | 2
CodePudding user response:
Answer using a combination of cumsum(), groupby(), and cumcount():
import numpy as np
import pandas as pd
df = pd.DataFrame(data=[None, 0, 5.4, 6.7, 2, None, 1], columns=['Time_btween_steps'])
# flag reset rows (null or a gap over 30) with 0, everything else with 1
df['Touch_No'] = np.where(df.Time_btween_steps.isnull() | (df.Time_btween_steps > 30), 0, 1)
# each 0 starts a new group; cumcount numbers the rows within it, +1 so the count starts at 1
df['consec'] = df['Touch_No'].groupby((df['Touch_No'] == 0).cumsum()).cumcount() + 1
df.head(10)
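If you want the running total written straight into Touch_No instead of a helper consec column, the same idea collapses into two lines. This is just a sketch built on the df and imports above; reset is an illustrative name, not something from the original answer:
reset = df.Time_btween_steps.isnull() | (df.Time_btween_steps > 30)
df['Touch_No'] = df.groupby(reset.cumsum()).cumcount() + 1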
CodePudding user response:
Edited according to your clarification:
import pandas as pd
df = pd.DataFrame({'Time_btween_steps': [None, 0, 5.4, 6.7, 2, None, 1],
                   'Touch_No': [50, 1, 2, 3, 4, 35, 1]})
# a null value or a gap over 30 marks the start of a new touch sequence
mask = pd.isna(df['Time_btween_steps']) | (df['Time_btween_steps'] > 30)
df.loc[~mask, 'Touch_No'] += 1
df.loc[mask, 'Touch_No'] = 1
Returns:
   Time_btween_steps  Touch_No
0                NaN         1
1                0.0         2
2                5.4         3
3                6.7         4
4                2.0         5
5                NaN         1
6                1.0         2
In my opinion a solution like this is much more readable. We increment by 1 where the condition is not met, and we set the ones where the condition is true to 1. You can combine these into a single line if you wish.
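For instance, a single-line version of the same idea (a sketch that reuses the mask defined above and assumes numpy is imported as np) could be:
df['Touch_No'] = np.where(mask, 1, df['Touch_No'] + 1)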
Old answer for posterity.
Here is a simple solution using pandas apply, which takes a function.
import pandas as pd
df = pd.DataFrame(data=[1, 2, 3, 4, None, 5, 0], columns=['test'])
# add 1 to each value, resetting to 0 where the value is missing
df.test.apply(lambda x: 0 if pd.isna(x) else x + 1)
Which returns:
0 2.0
1 3.0
2 4.0
3 5.0
4 0.0
5 6.0
6 1.0
Here I wrote the function in place, but if you have more complicated logic, such as resetting when the number is something else, you can write a custom function and pass it in instead of the lambda. This is not the only way to do it, but if your data frame isn't huge (hundreds of thousands of rows), it should be performant. If you don't want a copy but want to overwrite the column, assign the result back by prepending df['test'] = to the last line.
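As a sketch of that custom-function idea, a helper that also resets when the gap exceeds 30 (mirroring the threshold from the question; step is just an illustrative name) might look like:
def step(x):
    # reset on a missing value or a gap over 30, otherwise increment
    if pd.isna(x) or x > 30:
        return 0
    return x + 1

df['test'] = df.test.apply(step)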
If you want the output to be ints, you can also do:
df['test'].astype(int)
but be careful about converting None/Null to int.
CodePudding user response:
Using np.where and index values with ffill for partitioning, and a simple rank:
import numpy as np
import pandas as pd
sodf = pd.DataFrame({'time_bw_steps': [None, 0, 5.4, 6.7, 2, None, 1]})
# mark the index at each null, then forward-fill so every row in a run shares a partition id
sodf['touch_partition'] = np.where(sodf.time_bw_steps.isna(), sodf.index, np.nan)
sodf['touch_partition'] = sodf['touch_partition'].ffill()
# rank(method='first') numbers the rows 1, 2, 3, ... within each partition
sodf['touch_no'] = sodf.groupby('touch_partition')['touch_partition'].rank(method='first')
sodf.drop(columns=['touch_partition'], inplace=True)
sodf
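With the sample data, the result should look roughly like this (rank returns floats, hence the 1.0-style values):
   time_bw_steps  touch_no
0            NaN       1.0
1            0.0       2.0
2            5.4       3.0
3            6.7       4.0
4            2.0       5.0
5            NaN       1.0
6            1.0       2.0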