I have two DataFrames. column "video_path" is common in both the dataframes. I need to extract details from df1 if it matches with df2 and also with the value of yes/no.
df1
df2
Expected result:
What i tried
newdf= df1.merge(frame, left_on='video_path', right_on='Video_path', how='inner')
But I'm sure its not correct.
code to create data frames
df1 = {'video_path': {0: 'video/file_path/1.mp4', 1: 'video/file_path/1.mp4', 2: 'video/file_path/1.mp4', 3: 'video/file_path/2.mp4', 4: 'video/file_path/2.mp4', 5: 'video/file_path/2.mp4', 6: 'video/file_path/2.mp4', 7: 'video/file_path/2.mp4', 8: 'video/file_path/3.mp4', 9: 'video/file_path/3.mp4', 10: 'video/file_path/3.mp4', 11: 'video/file_path/4.mp4', 12: 'video/file_path/4.mp4', 13: 'video/file_path/4.mp4', 14: 'video/file_path/4.mp4', 15: 'video/file_path/5.mp4', 16: 'video/file_path/5.mp4', 17: 'video/file_path/5.mp4', 18: 'video/file_path/5.mp4', 19: 'video/file_path/6.mp4', 20: 'video/file_path/6.mp4', 21: 'video/file_path/6.mp4', 22: 'video/file_path/6.mp4', 23: 'video/file_path/6.mp4'}, 'frame_details': {0: 'frame_1.jpg', 1: 'frame_2.jpg', 2: 'frame_3.jpg', 3: 'frame_1.jpg', 4: 'frame_2.jpg', 5: 'frame_3.jpg', 6: 'frame_4.jpg', 7: 'frame_5.jpg', 8: 'frame_1.jpg', 9: 'frame_2.jpg', 10: 'frame_3.jpg', 11: 'frame_1.jpg', 12: 'frame_2.jpg', 13: 'frame_3.jpg', 14: 'frame_4.jpg', 15: 'frame_1.jpg', 16: 'frame_2.jpg', 17: 'frame_3.jpg', 18: 'frame_4.jpg', 19: 'frame_1.jpg', 20: 'frame_2.jpg', 21: 'frame_3.jpg', 22: 'frame_4.jpg', 23: 'frame_5.jpg'}, 'width': {0: 520, 1: 520, 2: 520, 3: 120, 4: 120, 5: 120, 6: 120, 7: 120, 8: 720, 9: 720, 10: 720, 11: 1080, 12: 1080, 13: 1080, 14: 1080, 15: 480, 16: 480, 17: 480, 18: 480, 19: 640, 20: 640, 21: 640, 22: 640, 23: 640}, 'height': {0: 225, 1: 225, 2: 225, 3: 120, 4: 120, 5: 120, 6: 120, 7: 120, 8: 480, 9: 480, 10: 480, 11: 1920, 12: 1920, 13: 1920, 14: 1920, 15: 640, 16: 640, 17: 640, 18: 640, 19: 480, 20: 480, 21: 480, 22: 480, 23: 480}, 'hasAudio': {0: 'yes', 1: 'yes', 2: 'yes', 3: 'yes', 4: 'yes', 5: 'yes', 6: 'yes', 7: 'yes', 8: 'yes', 9: 'yes', 10: 'yes', 11: 'no', 12: 'no', 13: 'no', 14: 'no', 15: 'no', 16: 'no', 17: 'no', 18: 'no', 19: 'yes', 20: 'yes', 21: 'yes', 22: 'yes', 23: 'yes'}}
df2 = {'Video_path': {0: 'video/file_path/1.mp4',
1: 'video/file_path/2.mp4',
2: 'video/file_path/4.mp4',
3: 'video/file_path/6.mp4',
4: 'video/file_path/7.mp4',
5: 'video/file_path/8.mp4',
6: 'video/file_path/9.mp4'},
'isPresent': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan, 6: nan}
CodePudding user response:
Swap df1
and df2
with left join and indicator
parameter, last set column isPresent
by Series.map
:
newdf= df2.merge(df1.rename(columns={'video_path':'Video_path'}),
on='Video_path',
how='left',
indicator=True)
newdf['isPresent'] = newdf.pop('_merge').map({'both':'yes', 'left_only':'no'})
print (newdf)
Video_path isPresent frame_details width height hasAudio
0 video/file_path/1.mp4 yes frame_1.jpg 520.0 225.0 yes
1 video/file_path/1.mp4 yes frame_2.jpg 520.0 225.0 yes
2 video/file_path/1.mp4 yes frame_3.jpg 520.0 225.0 yes
3 video/file_path/2.mp4 yes frame_1.jpg 120.0 120.0 yes
4 video/file_path/2.mp4 yes frame_2.jpg 120.0 120.0 yes
5 video/file_path/2.mp4 yes frame_3.jpg 120.0 120.0 yes
6 video/file_path/2.mp4 yes frame_4.jpg 120.0 120.0 yes
7 video/file_path/2.mp4 yes frame_5.jpg 120.0 120.0 yes
8 video/file_path/4.mp4 yes frame_1.jpg 1080.0 1920.0 no
9 video/file_path/4.mp4 yes frame_2.jpg 1080.0 1920.0 no
10 video/file_path/4.mp4 yes frame_3.jpg 1080.0 1920.0 no
11 video/file_path/4.mp4 yes frame_4.jpg 1080.0 1920.0 no
12 video/file_path/6.mp4 yes frame_1.jpg 640.0 480.0 yes
13 video/file_path/6.mp4 yes frame_2.jpg 640.0 480.0 yes
14 video/file_path/6.mp4 yes frame_3.jpg 640.0 480.0 yes
15 video/file_path/6.mp4 yes frame_4.jpg 640.0 480.0 yes
16 video/file_path/6.mp4 yes frame_5.jpg 640.0 480.0 yes
17 video/file_path/7.mp4 no NaN NaN NaN NaN
18 video/file_path/8.mp4 no NaN NaN NaN NaN
19 video/file_path/9.mp4 no NaN NaN NaN NaN