I have the following pandas dataframe (in code):
original_df = pd.DataFrame([['ID1', 4], ['ID1', 4], ['ID1', 4], ['ID2', 5], ['ID2', 5], ['ID2', 5], ['ID3', 6], ['ID3', 6], ['ID3', 6]], columns=['Index', 'Value'])
Based on each occurence of the change in Index value, only the first value column's integer should be maintained, and the rest of the values should be changed to 0 value.
The resulting manipulated dataframe should look like this:
desired_df = pd.DataFrame([['ID1', 4], ['ID1', 0], ['ID1', 0], ['ID2', 5], ['ID2', 0], ['ID2', 0], ['ID3', 6], ['ID3', 0], ['ID3', 0]], columns=['Index', 'Value'])
I have tried numerous different manipulation techniques, but none has worked. I have tried substitution of values, but this doesn't scale well for many rows at once. Please could someone provide any suggestions ?
CodePudding user response:
Set 0
for all values if difference with next row Series.where
:
mask = original_df['Index'].ne(original_df['Index'].shift())
original_df['Value'] = original_df['Value'].where(mask, 0)
Or use:
mask = original_df['Index'].ne(original_df['Index'].shift())
original_df['Value'] = np.where(mask, original_df['Value'], 0)
print (original_df)
Index Value
0 ID1 4
1 ID1 0
2 ID1 0
3 ID2 5
4 ID2 0
5 ID2 0
6 ID3 6
7 ID3 0
8 ID3 0
If groups are sorted is possible use Series.duplicated
:
mask = original_df['Index'].duplicated()
original_df['Value'] = original_df['Value'].mask(mask, 0)
original_df['Value'] = np.where(mask, 0, original_df['Value'])