Home > Net >  Manipulate Pandas dataframe based on index and column values
Manipulate Pandas dataframe based on index and column values

Time:03-01

I have the following pandas dataframe (in code):

original_df = pd.DataFrame([['ID1', 4], ['ID1', 4], ['ID1', 4], ['ID2', 5], ['ID2', 5], ['ID2', 5], ['ID3', 6], ['ID3', 6], ['ID3', 6]], columns=['Index', 'Value'])

Based on each occurence of the change in Index value, only the first value column's integer should be maintained, and the rest of the values should be changed to 0 value.

The resulting manipulated dataframe should look like this:

desired_df = pd.DataFrame([['ID1', 4], ['ID1', 0], ['ID1', 0], ['ID2', 5], ['ID2', 0], ['ID2', 0], ['ID3', 6], ['ID3', 0], ['ID3', 0]], columns=['Index', 'Value'])

I have tried numerous different manipulation techniques, but none has worked. I have tried substitution of values, but this doesn't scale well for many rows at once. Please could someone provide any suggestions ?

CodePudding user response:

Set 0 for all values if difference with next row Series.where:

mask = original_df['Index'].ne(original_df['Index'].shift())
original_df['Value'] = original_df['Value'].where(mask, 0)

Or use:

mask = original_df['Index'].ne(original_df['Index'].shift())
original_df['Value'] = np.where(mask, original_df['Value'], 0)

print (original_df)  
  Index  Value
0   ID1      4
1   ID1      0
2   ID1      0
3   ID2      5
4   ID2      0
5   ID2      0
6   ID3      6
7   ID3      0
8   ID3      0

If groups are sorted is possible use Series.duplicated:

mask = original_df['Index'].duplicated()
original_df['Value'] = original_df['Value'].mask(mask, 0)
original_df['Value'] = np.where(mask, 0, original_df['Value'])
  • Related