I think this problem is quite easy, but I cannot find the right pandas function for it or a relevant question on Stack Overflow. I tried it with cumsum but did not succeed. My dataframe looks as follows:
import pandas as pd

# Initialize the data as a dict of lists.
data = {'time_id': [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 7],
        'param1': [20, 3, 4, 21, 19, 8, 9, 18, 6, 4, 2]}

# Create the DataFrame.
df = pd.DataFrame(data)
Now I want to add a new column whose value increases by one after every third time id. The output table should look like this:
User response:
Subtract 1, then use integer division by 3, and finally add 1:
df['new_col'] = df['time_id'].sub(1) // 3 + 1
print(df)
    time_id  param1  new_col
0         1      20        1
1         1       3        1
2         2       4        1
3         3      21        1
4         3      19        1
5         4       8        2
6         5       9        2
7         5      18        2
8         5       6        2
9         6       4        2
10        7       2        3
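To see why the arithmetic works: subtracting 1 shifts the ids to start at 0, floor division by 3 then maps 0, 1, 2 to block 0 and 3, 4, 5 to block 1, and adding 1 makes the labels start at 1. A minimal self-contained sketch follows; the variable `n` is a hypothetical name introduced here for the group size, and the approach assumes integer ids that start at 1 without gaps.

```python
import pandas as pd

df = pd.DataFrame({'time_id': [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 7],
                   'param1': [20, 3, 4, 21, 19, 8, 9, 18, 6, 4, 2]})

n = 3  # hypothetical name: how many consecutive time ids share one label

# Shift ids to start at 0, floor-divide into blocks of n,
# then shift the block labels so they start at 1.
df['new_col'] = (df['time_id'] - 1) // n + 1

print(df['new_col'].tolist())
```

Changing `n` should give blocks of a different size, as long as the ids stay gap-free.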
If subtracting 1 is not possible, e.g. because the column has another dtype (here timedeltas), use factorize instead:
df['time_id'] = pd.to_timedelta(df['time_id'], unit='s')
print(df)
           time_id  param1
0  0 days 00:00:01      20
1  0 days 00:00:01       3
2  0 days 00:00:02       4
3  0 days 00:00:03      21
4  0 days 00:00:03      19
5  0 days 00:00:04       8
6  0 days 00:00:05       9
7  0 days 00:00:05      18
8  0 days 00:00:05       6
9  0 days 00:00:06       4
10 0 days 00:00:07       2
df['new_col'] = pd.factorize(df['time_id'])[0] // 3 + 1
print(df)
           time_id  param1  new_col
0  0 days 00:00:01      20        1
1  0 days 00:00:01       3        1
2  0 days 00:00:02       4        1
3  0 days 00:00:03      21        1
4  0 days 00:00:03      19        1
5  0 days 00:00:04       8        2
6  0 days 00:00:05       9        2
7  0 days 00:00:05      18        2
8  0 days 00:00:05       6        2
9  0 days 00:00:06       4        2
10 0 days 00:00:07       2        3
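The factorize approach works because pd.factorize numbers the distinct values 0, 1, 2, ... in order of first appearance, so the actual values (and any gaps between them) no longer matter. A small sketch with made-up ids that contain gaps, purely for illustration:

```python
import pandas as pd

# Made-up ids with gaps; subtracting 1 and dividing would not work here.
df = pd.DataFrame({'time_id': [10, 10, 25, 40, 40, 41, 50, 50, 50, 77, 90]})

# codes[i] is the position of time_id[i] among the distinct values,
# counted from 0 in order of first appearance.
codes, uniques = pd.factorize(df['time_id'])

# Every 3 distinct ids share one label; labels start at 1.
df['new_col'] = codes // 3 + 1

print(codes.tolist())
print(df['new_col'].tolist())
```

Because the codes follow order of first appearance, this assumes the rows are already ordered by time id; for unordered data, pd.factorize(..., sort=True) numbers the values in sorted order instead.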