Home > Mobile >  I need to find the count values using a specific row that matches the columns for the other datafram
I need to find the count values using a specific row that matches the columns for the other datafram

Time:02-04

For example I have two dataframes

df = [{'id':1, 'A': 'False', 'B' : 'False' , 'C':'NA'},{'id':2, 'A': 'True', 'B' : 'False' , 'C':'NA'}]

df2 = [{'A':'True', 'B': 'False' , 'C':'NA', 'D': 'True'}]

idea should be to count the values based on the row of df which matches, df2 columns

df:

id A B C
1 False False NA
2 True False NA

df2:

A B C D
True False NA True

Output:

id A B C Count
1 False False NA 0
2 True False NA 1

I tried something like

for i in range(columns): x = action_value_counts_df.columns[i] if compare_column.equals(action_value_counts_df[x]): print(x, 'Matched') else: print(x,'Not Matched')

This code did not help

CodePudding user response:

  1. Merge df, df2 using overlapping columns.
  2. Count overlapping columns.
  3. Merge count result DataFrame into df.
  4. Replace NA with 0 in the 'Count' column.
import pandas as pd


df = pd.DataFrame([
    {'id':1, 'A': 'False', 'B' : 'False' , 'C':'NA'},
    {'id':2, 'A': 'True', 'B' : 'False' , 'C':'NA'}])
df2 = pd.DataFrame([
    {'A':'True', 'B': 'False' , 'C':'NA', 'D': 'True'}])

# 1
match_df = pd.merge(left=df, right=df2, on=['A', 'B', 'C'], how='inner')
match_df = match_df.assign(Count=1)
"""
   id     A      B   C     D  Count
0   2  True  False  NA  True      1
"""

# 2
match_count_df = match_df.groupby(['A', 'B', 'C'], as_index=False).count()
match_count_df = match_count_df[['A', 'B', 'C', 'Count']]
"""
      A      B   C  Count
0  True  False  NA      1
"""

# 3
result_df = pd.merge(left=df, right=match_count_df, how='left')
"""
   id      A      B   C  Count
0   1  False  False  NA    NaN
1   2   True  False  NA    1.0
"""

# 4
result_df.loc[:, 'Count'] = result_df['Count'].fillna(0)
"""
   id      A      B   C  Count
0   1  False  False  NA    0.0
1   2   True  False  NA    1.0
"""

CodePudding user response:

You can compare two dataframes rowwise with arranged columns (from df to df2) and ignored NaN values (as they are not comparable):

df.assign(Count=df.set_index('id').apply(lambda x: (x.dropna() == df2[x.index].squeeze().dropna()).all() * 1, axis=1).values)

  id      A      B   C  Count
0  1  False  False NaN      0
1  2   True  False NaN      1
  • Related