Home > database >  found changes on same id (column A) pandas
found changes on same id (column A) pandas

Time:11-30

I have dataframe

# |  A    |   B   |   C   |   D   |   E   
-- ------- ------- ------- ------- -------
1 |  "5"  |  "4"  |  "2"  |  "3"  |  "2022-11-29"  |
2 |  "5"  |  "d"  |  "2"  |  "3"  |  "2022-11-30"  |
3 |  "5"  |  "4"  |  "2"  |  "h"  |  "2022-11-29"  |
4 |  "4"  |  "4"  |  "2"  |  "3"  |  "2022-11-28"  |
5 |  "4"  |  "4"  |  "g"  |  "3"  |  "2022-11-29"  |

I would like to find changes in same id (column A), but ignore changes in column E

expected result:

ID 5 changed in column B (changedTovalue "d") and column D (changedTovalue "h")
ID 4 changed in column C (changedTovalue "g")

is this possible to do whit pandas?

CodePudding user response:

You can compare most common value per columns by DataFrame.mode by DataFrame.eq, set missing values by DataFrame.where if no match, reshape by DataFrame.stack, last convert to DataFrame:

df1 = df.set_index('A').drop('E', axis=1)

print (df1)
   B  C  D
A         
5  4  2  3
5  d  2  g <- added new not matched value
5  4  2  h
4  4  2  3
4  4  g  3

s = df1.mode().iloc[0]


df2 = (df1.where(df1.ne(s)).stack()
          .rename_axis(['A','cols'])
          .reset_index(name='val'))
print (df2)
   A cols val
0  5    B   d
1  5    D   g
2  5    D   h
3  4    C   g

EDIT:

df1 = df.set_index(['A','E'])
print (df1)
              B  C  D
A E                  
5 2022-11-29  4  2  3
  2022-11-30  d  2  3
  2022-11-29  4  2  h
4 2022-11-28  4  2  3
  2022-11-29  4  g  3

s = df1.mode().iloc[0]


df2 = (df1.where(df1.ne(s)).stack()
           .rename_axis(['A','E','cols'])
           .reset_index(name='val')
          )
print (df2)
   A           E cols val
0  5  2022-11-30    B   d
1  5  2022-11-29    D   h
2  4  2022-11-29    C   g
  • Related