For example I have two dataframes
df = [{'id':1, 'A': 'False', 'B' : 'False' , 'C':'NA'},{'id':2, 'A': 'True', 'B' : 'False' , 'C':'NA'}]
df2 = [{'A':'True', 'B': 'False' , 'C':'NA', 'D': 'True'}]
idea should be to count the values based on the row of df which matches, df2 columns
df:
id | A | B | C |
---|---|---|---|
1 | False | False | NA |
2 | True | False | NA |
df2:
A | B | C | D |
---|---|---|---|
True | False | NA | True |
Output:
id | A | B | C | Count |
---|---|---|---|---|
1 | False | False | NA | 0 |
2 | True | False | NA | 1 |
I tried something like
for i in range(columns): x = action_value_counts_df.columns[i] if compare_column.equals(action_value_counts_df[x]): print(x, 'Matched') else: print(x,'Not Matched')
This code did not help
CodePudding user response:
- Merge
df
,df2
using overlapping columns. - Count overlapping columns.
- Merge count result DataFrame into
df
. - Replace NA with 0 in the 'Count' column.
import pandas as pd
df = pd.DataFrame([
{'id':1, 'A': 'False', 'B' : 'False' , 'C':'NA'},
{'id':2, 'A': 'True', 'B' : 'False' , 'C':'NA'}])
df2 = pd.DataFrame([
{'A':'True', 'B': 'False' , 'C':'NA', 'D': 'True'}])
# 1
match_df = pd.merge(left=df, right=df2, on=['A', 'B', 'C'], how='inner')
match_df = match_df.assign(Count=1)
"""
id A B C D Count
0 2 True False NA True 1
"""
# 2
match_count_df = match_df.groupby(['A', 'B', 'C'], as_index=False).count()
match_count_df = match_count_df[['A', 'B', 'C', 'Count']]
"""
A B C Count
0 True False NA 1
"""
# 3
result_df = pd.merge(left=df, right=match_count_df, how='left')
"""
id A B C Count
0 1 False False NA NaN
1 2 True False NA 1.0
"""
# 4
result_df.loc[:, 'Count'] = result_df['Count'].fillna(0)
"""
id A B C Count
0 1 False False NA 0.0
1 2 True False NA 1.0
"""
CodePudding user response:
You can compare two dataframes rowwise with arranged columns (from df
to df2
) and ignored NaN
values (as they are not comparable):
df.assign(Count=df.set_index('id').apply(lambda x: (x.dropna() == df2[x.index].squeeze().dropna()).all() * 1, axis=1).values)
id A B C Count
0 1 False False NaN 0
1 2 True False NaN 1