Efficient way to match two columns that share the same type of data

In this problem, columns 'A' and 'B' both store the same kind of data (page numbers). 'Hits_A' is the sum of hits for each page in 'A' (from a previous grouping, not shown). I'd like to sum 'Hits_A' grouped by column 'B', and then relate those sums back to the page numbers in column 'A', like so:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7], 'B': [3, 4, 5, 2, 1, 1, 6],
                   'Hits_A': [10, 40, 50, 35, 24, 60, 30]})

tmp = df.drop('A', axis=1)
tmp = tmp.groupby('B').sum().reset_index()
tmp = tmp.rename(columns={'B':'A', 'Hits_A':'Hits_B'})

output = pd.merge(df, tmp, how='left', on='A').drop('B', axis=1)

print(df)

yields

   A  B  Hits_A
0  1  3      10
1  2  4      40
2  3  5      50
3  4  2      35
4  5  1      24
5  6  1      60
6  7  6      30

print(output)

yields

   A  Hits_A  Hits_B
0  1      10    84.0
1  2      40    35.0
2  3      50    10.0
3  4      35    40.0
4  5      24    50.0
5  6      60    30.0
6  7      30     NaN

These are the results I want to replicate in a less janky, cleaner-looking manner. I'm not very used to things like lambda functions, and was wondering whether this could all be achieved in fewer lines.

CodePudding user response:

We can group by 'B', sum 'Hits_A', and then map the grouped sums onto column 'A':

df['Hits_B'] = df['A'].map(df.groupby('B')['Hits_A'].sum())

   A  B  Hits_A  Hits_B
0  1  3      10    84.0
1  2  4      40    35.0
2  3  5      50    10.0
3  4  2      35    40.0
4  5  1      24    50.0
5  6  1      60    30.0
6  7  6      30     NaN
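
For anyone who finds the one-liner a bit dense, here is a rough step-by-step sketch of the same idea, using the DataFrame from the question. The names hits_by_b and Hits_B_filled are made up for illustration, and the last step (filling NaN with 0) is only an assumption about what might be wanted, not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7], 'B': [3, 4, 5, 2, 1, 1, 6],
                   'Hits_A': [10, 40, 50, 35, 24, 60, 30]})

# Step 1: total Hits_A per page number in B.
# The result is a Series whose index holds the distinct values of B.
hits_by_b = df.groupby('B')['Hits_A'].sum()

# Step 2: look up each value of A in that Series' index.
# Page 7 never appears in B, so it has no match and becomes NaN,
# which is also why the new column is float rather than int.
df['Hits_B'] = df['A'].map(hits_by_b)

# Optional (assumption): treat "no match" as 0 hits and keep integers.
df['Hits_B_filled'] = df['Hits_B'].fillna(0).astype(int)

print(df)
```

Since Series.map with a Series argument is just an index lookup, no merge or column renaming is needed. If column 'B' should not appear in the final result, as in the question's output, df.drop(columns='B') removes it afterwards.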