I have a dataframe with these characteristics (the indexes are float values):
import pandas as pd
d = {'A': [1,2,3,4,5,6,7,8,9,10],
'B': [1,2,3,4,5,6,7,8,9,10],
'C': [1,2,3,4,5,6,7,8,9,10],
'D': ['one','one','one','one','one','two','two','two','two','two']}
df = pd.DataFrame(data=d)
df
A B C D
50.0 1 1 1 one
50.2 2 2 2 one
50.4 3 3 3 one
50.6 4 4 4 one
50.8 5 5 5 one
51.0 6 6 6 two
51.2 7 7 7 two
51.4 8 8 8 two
51.6 9 9 9 two
51.8 10 10 10 two
And a list of offsets with these values (they are also floats):
offsets = [[0.4, 0.6, 0.8], [0.2, 0.4, 0.6]]
I need to iterate through my dataframe over columns A, B and C, choosing the categorical values from column D, replacing the last values from columns A, B and C by nan according their indexes in relation to the offsets in my list, resulting in a dataframe like this:
A B C D
50.0 1 1 1 one
50.2 2 2 nan one
50.4 3 nan nan one
50.6 nan nan nan one
50.8 nan nan nan one
51.0 6 6 6 two
51.2 7 7 7 two
51.4 8 8 nan two
51.6 9 nan nan two
51.8 nan nan nan two
The value of the offset means what values must be set to nan from the bottom up. For example: offsets[0][0]=0.4, so for column A when D == 'one', the two values from the bottom up must be set to nan (rows 4 and 3, 50.8-0.4 = 50.4 - 50.4 doesn't change). For A when D == 'two', the offsets[1][0]=0.2, so one value from the bottom up must be set to nan (row 9, 51.8-0.2 = 51.6 - 51.6 doesn't change). Offsets[1][0]=0.6, so for column B when D == 'one', the three values from the bottom up must be set to nan (rows 4, 3 and 2, 50.8-0.6 = 50.2 - 50.2 doesn't change). For B when D == 'two', the offsets[1][1]=0.4, so two values from the bottom up must be set to nan (rows 9 and 8, 51.8-0.4 = 51.4 - 51.4 doesn't change). For column C is the same.
Any idea how to do this? A quick comment - I want to replace these values in the dataframe itself, without creating a new one.
CodePudding user response:
One approach is to use apply
to set the last values of each column to NaN:
import pandas as pd
# toy data
df = pd.DataFrame(data={'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'B': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'D': ['one', 'one', 'one', 'one', 'one', 'two', 'two', 'two', 'two', 'two']})
offsets = [2, 3, 4]
offset_lookup = dict(zip(df.columns[:3], offsets))
def funny_shift(x, ofs=None):
"""This function shift each column by the given offset in the ofs parameter"""
for column, offset in ofs.items():
x.loc[x.index[-1 * offset:], column] = None
return x
df.loc[:, ["A", "B", "C"]] = df.groupby("D").apply(funny_shift, ofs=offset_lookup)
print(df)
Output
A B C D
0 1.0 1.0 1.0 one
1 2.0 2.0 NaN one
2 3.0 NaN NaN one
3 NaN NaN NaN one
4 NaN NaN NaN one
5 6.0 6.0 6.0 two
6 7.0 7.0 NaN two
7 8.0 NaN NaN two
8 NaN NaN NaN two
9 NaN NaN NaN two
UPDATE
If you have multiple updates per group, you could do:
offsets = [[2, 3, 4], [1, 2, 3]]
offset_lookup = (dict(zip(df.columns[:3], offset)) for offset in offsets)
def funny_shift(x, ofs=None):
"""This function shift each column by the given offset in the ofs parameter"""
current = next(ofs)
for column, offset in current.items():
x.loc[x.index[-1 * offset:], column] = None
return x
df.loc[:, ["A", "B", "C"]] = df.groupby("D").apply(funny_shift, ofs=offset_lookup)
print(df)