I have a DataFrame df
with four columns; Date
, Location
, Category
, and Value
. For each Date
and Location
, I try to change the values in column Value
which contain a value in the other column Category
with a 5 or higher and replace it by the value of the row with Category
5.
df:
Date Location Category Value
20220101 FE 1 0.23
20220101 FE 2 0.24
20220101 FE 3 0.26
20220101 FE 4 0.27
20220101 FE 5 0.28
20220101 FE 6 0.30
20220101 RP 5 0.32
20220101 RP 6 0.35
20220102 FE 1 0.20
20220102 FE 2 0.23
20220102 FE 3 0.25
20220102 FE 4 0.26
20220102 FE 5 0.28
20220102 FE 6 0.32
df_new:
Date Location Category Value
20220101 FE 1 0.23
20220101 FE 2 0.24
20220101 FE 3 0.26
20220101 FE 4 0.27
20220101 FE 5 0.28
20220101 FE 6 0.28 <-- changed with value from row with Category == 5
20220101 RP 5 0.32
20220101 RP 6 0.32 <-- changed with value from row with Category == 5
20220102 FE 1 0.20
20220102 FE 2 0.23
20220102 FE 3 0.25
20220102 FE 4 0.26
20220102 FE 5 0.28
20220102 FE 6 0.28 <-- changed with value from row with Category == 5
So far, I was only able to extract the Value
of a specific Date
and Location
of the Category
= 5.
df.loc[(df['Date'] == 20220101) & (df['Location'] == 'FE') & (df['Category'] == 5), 'Value'].iloc[0]
Is there an easy and efficient way to change the column values in the column Value
? Many thanks!
For reproducability:
df = pd.DataFrame({
'Date':[20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220101, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102, 20220102],
'Location':['FE', 'FE', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP', 'FE', 'FE', 'FE', 'FE', 'FE', 'FE', 'RP', 'RP'],
'Category':[1, 2, 3, 4, 5, 6, 5, 6, 1, 2, 3, 4, 5, 6, 5, 6],
'Value':[0.23, 0.24, 0.26, 0.27, 0.28, 0.3, 0.32, 0.35, 0.2, 0.23, 0.25, 0.26, 0.28, 0.32, 0.34, 0.36]
})
CodePudding user response:
Assuming the Categories are in ascending order per group, you can mask
the values for Categories > 5, and groupby.ffill
:
df['Value'] = (df['Value'].mask(df['Category'].gt(5))
.groupby([df['Date'], df['Location']])
.ffill()
)
output (as new column Value2 for comparison):
Date Location Category Value Value2
0 20220101 FE 1 0.23 0.23
1 20220101 FE 2 0.24 0.24
2 20220101 FE 3 0.26 0.26
3 20220101 FE 4 0.27 0.27
4 20220101 FE 5 0.28 0.28
5 20220101 FE 6 0.30 0.28
6 20220101 RP 5 0.32 0.32
7 20220101 RP 6 0.35 0.32
8 20220102 FE 1 0.20 0.20
9 20220102 FE 2 0.23 0.23
10 20220102 FE 3 0.25 0.25
11 20220102 FE 4 0.26 0.26
12 20220102 FE 5 0.28 0.28
13 20220102 FE 6 0.32 0.28
14 20220102 RP 5 0.34 0.34
15 20220102 RP 6 0.36 0.34