I have two DataFrame df1 and df2.
df1 is the original dataset and df2 is the dataset made from df1 after some manipulation.
In df1 I have column 'log' and in df2 I have column 'log1' and 'log2' two columns.
where the values in columns 'log1' and 'log2' contains in column 'log' in df1.
df2 sample below
date id log1 log2
1 uu1q (2,4) (3,5)
1 uu1q (2,4) (7,6)
1 uu1q (3,5) (7,6)
5 u25a (4,7) (3,9)
5 uu25a (1,9) (3,9)
6 ua3b7 (1,1) (2,2)
6 ua3b7 (1,1) (3,3)
6 ua3b7 (2,2) (3,3)
df1 column sample with data below
date id log name col1 col2
1 uu1q (2,4) xyz 1123 qqq
1 uu1q (3,5) aas 2132 wew
1 uu1q (7,6) wqas 2567 uuo
5 u25a (4,7) enj 666 ttt
5 fff (0,0) ddd 0 lll
Now I want to take fetch/filter all the records from df1 based on column values for each row in df2 i.e. based on 'date'
, 'id'
, 'log1'
or 'log2'
and compare it with columns in df1 i.e.
'date'
, 'id'
, 'log'
.
NOTE: values columns 'log1' and 'log2' contained in single column 'log'
CodePudding user response:
IIUC, you're looking for a chained isin
:
out = df1[df1['date'].isin(df2['date']) & df1['id'].isin(df2['id']) & (df1['log'].isin(df2['log1']) | df1['log'].isin(df2['log2']))]
Output:
date id log name col1 col2
0 1 uu1q (2,4) xyz 1123 qqq
1 1 uu1q (3,5) aas 2132 wew
2 1 uu1q (7,6) wqas 2567 uuo
3 5 u25a (4,7) enj 666 ttt
CodePudding user response:
Use DataFrame.melt
for column log
from log1, log2...
columns and for filtering inner join in DataFrame.merge
:
df = (df2.melt(['date','id'], value_name='log')
.drop('variable', axis=1)
.drop_duplicates()
.merge(df1))
print (df)
date id log name col1 col2
0 1 uu1q (2,4) xyz 1123 qqq
1 1 uu1q (3,5) aas 2132 wew
2 5 u25a (4,7) enj 666 ttt
3 1 uu1q (7,6) wqas 2567 uuo