I have dataframe
# | A | B | C | D | E
-- ------- ------- ------- ------- -------
1 | "5" | "4" | "2" | "3" | "2022-11-29" |
2 | "5" | "d" | "2" | "3" | "2022-11-30" |
3 | "5" | "4" | "2" | "h" | "2022-11-29" |
4 | "4" | "4" | "2" | "3" | "2022-11-28" |
5 | "4" | "4" | "g" | "3" | "2022-11-29" |
I would like to find changes in same id (column A), but ignore changes in column E
expected result:
ID 5 changed in column B (changedTovalue "d") and column D (changedTovalue "h")
ID 4 changed in column C (changedTovalue "g")
is this possible to do whit pandas?
CodePudding user response:
You can compare most common value per columns by DataFrame.mode
by DataFrame.eq
, set missing values by DataFrame.where
if no match, reshape by DataFrame.stack
, last convert to DataFrame
:
df1 = df.set_index('A').drop('E', axis=1)
print (df1)
B C D
A
5 4 2 3
5 d 2 g <- added new not matched value
5 4 2 h
4 4 2 3
4 4 g 3
s = df1.mode().iloc[0]
df2 = (df1.where(df1.ne(s)).stack()
.rename_axis(['A','cols'])
.reset_index(name='val'))
print (df2)
A cols val
0 5 B d
1 5 D g
2 5 D h
3 4 C g
EDIT:
df1 = df.set_index(['A','E'])
print (df1)
B C D
A E
5 2022-11-29 4 2 3
2022-11-30 d 2 3
2022-11-29 4 2 h
4 2022-11-28 4 2 3
2022-11-29 4 g 3
s = df1.mode().iloc[0]
df2 = (df1.where(df1.ne(s)).stack()
.rename_axis(['A','E','cols'])
.reset_index(name='val')
)
print (df2)
A E cols val
0 5 2022-11-30 B d
1 5 2022-11-29 D h
2 4 2022-11-29 C g