found changes on same id (column A) pandas-CodePudding

I have dataframe

# |  A    |   B   |   C   |   D   |   E   
-- ------- ------- ------- ------- -------
1 |  "5"  |  "4"  |  "2"  |  "3"  |  "2022-11-29"  |
2 |  "5"  |  "d"  |  "2"  |  "3"  |  "2022-11-30"  |
3 |  "5"  |  "4"  |  "2"  |  "h"  |  "2022-11-29"  |
4 |  "4"  |  "4"  |  "2"  |  "3"  |  "2022-11-28"  |
5 |  "4"  |  "4"  |  "g"  |  "3"  |  "2022-11-29"  |

I would like to find changes in same id (column A), but ignore changes in column E

expected result:

ID 5 changed in column B (changedTovalue "d") and column D (changedTovalue "h")
ID 4 changed in column C (changedTovalue "g")

is this possible to do whit pandas?

CodePudding user response：

You can compare most common value per columns by DataFrame.mode by DataFrame.eq, set missing values by DataFrame.where if no match, reshape by DataFrame.stack, last convert to DataFrame:

df1 = df.set_index('A').drop('E', axis=1)

print (df1)
   B  C  D
A         
5  4  2  3
5  d  2  g <- added new not matched value
5  4  2  h
4  4  2  3
4  4  g  3

s = df1.mode().iloc[0]


df2 = (df1.where(df1.ne(s)).stack()
          .rename_axis(['A','cols'])
          .reset_index(name='val'))
print (df2)
   A cols val
0  5    B   d
1  5    D   g
2  5    D   h
3  4    C   g

EDIT:

df1 = df.set_index(['A','E'])
print (df1)
              B  C  D
A E                  
5 2022-11-29  4  2  3
  2022-11-30  d  2  3
  2022-11-29  4  2  h
4 2022-11-28  4  2  3
  2022-11-29  4  g  3

s = df1.mode().iloc[0]


df2 = (df1.where(df1.ne(s)).stack()
           .rename_axis(['A','E','cols'])
           .reset_index(name='val')
          )
print (df2)
   A           E cols val
0  5  2022-11-30    B   d
1  5  2022-11-29    D   h
2  4  2022-11-29    C   g