I have a data frame like the one below, and I want to add an id column that restarts from 0 each time the node value changes.
node1,0.858
node1,0.897
node1,0.954
node2,3.784
node2,7.640
node2,11.592
For example, I want the output below:
0, node1, 0.858
1, node1, 0.897
2, node1, 0.954
0, node2, 3.784
1, node2, 7.640
2, node2, 11.592
I have tried to use an index based on the node values, but this does not reset the column's value when a new node appears. I could use a loop, but that is an anti-pattern in pandas.
CodePudding user response:
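You can group by the column that defines the partition and then call cumcount(), which numbers the rows within each group starting from 0. (cumsum() over a helper column of ones, minus 1, gives the same counter.) Then use set_index() to make the new column the index. You can skip the set_index() line if you just need the per-group counter as a regular column.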
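import pandas as pd

data = {'Name': ['node1', 'node1', 'node1', 'node2', 'node2', 'node3'],
        'Value': [1000, 20000, 40000, 30000, 589, 682]}
df = pd.DataFrame(data)
# Number the rows within each 'Name' group, starting at 0
df['New_Index'] = df.groupby('Name').cumcount()
# Optional: make the per-group counter the index
df.set_index('New_Index', inplace=True)
print(df)  # display(df) also works, but only inside Jupyter/IPython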
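Applied to the data from the question, the same idea might look like the sketch below. The question shows no header row, so the column names 'node' and 'value' are assumptions, as is the choice to place 'id' first with insert(). The cumsum() variant is included only to show that it produces the same counter.

```python
import io
import pandas as pd

# The question's sample rows; the column names 'node' and 'value' are assumed.
raw = """node1,0.858
node1,0.897
node1,0.954
node2,3.784
node2,7.640
node2,11.592"""
df = pd.read_csv(io.StringIO(raw), header=None, names=["node", "value"])

# cumcount() numbers rows within each group from 0, in order of appearance.
df.insert(0, "id", df.groupby("node").cumcount())

# Equivalent counter via cumsum() over a column of ones, shifted to start at 0.
alt = df.assign(one=1).groupby("node")["one"].cumsum() - 1

print(df.to_string(index=False))
```

Because groupby() collects all rows with the same key, the counter continues if a node value reappears later in the frame. If the id should restart on every change of node, even for a repeated name, a different grouping key would be needed.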