summing by keywords and by groups in pandas

Time:12-14

I have the following problem.

I have a dataframe with keywords and their groups:

[image: the keywords dataframe, with keyword and group columns]

My task is to look for these keywords in the description column of another dataframe and count their occurrences in each description, adding one column per keyword. I have already done that part with this code:

keywords = df["keyword"].to_list()
for key in keywords:
    new_df[key] = new_df["description"].str.lower().str.count(key)

The resulting dataframe is:

[image: the resulting dataframe, with one count column per keyword]

I would like to sum these columns by the group given in the keywords dataframe, so that all the columns within one group (for example gas, gases and gasoline) are replaced by a single column, gas, holding their sum.

CodePudding user response:

If you need to aggregate the counts by the group column, you can build a dictionary of Series keyed by (keyword, group) tuples, join them with concat, aggregate by the group level, and add the result to the existing DataFrame:

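# df holds the keywords and groups; new_df holds the descriptions
# one Series of counts per keyword, keyed by (keyword, group)
d = {(key, g): new_df["description"].str.lower().str.count(key)
     for key, g in zip(df["keyword"], df["group"])}

# sum the keyword columns within each group (level 1 of the column MultiIndex);
# the double transpose avoids groupby(axis=1), which is deprecated in recent pandas
new_df = new_df.join(pd.concat(d, axis=1).T.groupby(level=1).sum().T)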
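
To make the approach above easier to check, here is a minimal self-contained sketch on made-up data. The keywords, groups and descriptions are hypothetical, since the original dataframes were only posted as images, and the transpose-based grouping is one possible way to sum columns by group rather than the only one.

```python
import pandas as pd

# Hypothetical keyword table: each keyword belongs to one group.
df = pd.DataFrame({
    "keyword": ["gas", "gasoline", "oil", "petroleum"],
    "group": ["gas", "gas", "oil", "oil"],
})

# Hypothetical descriptions to search.
new_df = pd.DataFrame({
    "description": [
        "Gas prices rose while oil fell",
        "Gasoline and petroleum products",
        "No match here",
    ]
})

# Lowercase once, then count every keyword; keys are (keyword, group) tuples,
# so concat builds a two-level column index.
text = new_df["description"].str.lower()
counts = {(key, g): text.str.count(key)
          for key, g in zip(df["keyword"], df["group"])}

# Sum the keyword columns that share a group (level 1 of the column index).
grouped = pd.concat(counts, axis=1).T.groupby(level=1).sum().T

result = new_df.join(grouped)
print(result)
```

Two things to keep in mind. `str.count` treats each keyword as a regular expression and matches substrings, so "gasoline" is counted once for the keyword gas and once for gasoline, giving a group total of 2 for the second description. Also, `join` raises an error if `new_df` already has a column with the same name as a group, so this sketch starts from a dataframe without the per-keyword columns.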