group by removing column I'd like to group by in pandas

Time:01-19

I'm trying to take a list of lists, load it into pandas, and sum it up by one value.

My list of lists:

[['she', 'walked', 4],
 ['she', 'my', 3],
 ['she', 'dog', 2],
 ['she', 'to', 1],
 ['sniffed', 'I', 5],
 ['sniffed', 'walked', 4],
 ['sniffed', 'my', 3],
 ['sniffed', 'dog', 2],
 ['sniffed', 'to', 1]]

I create the dataframe:

import pandas as pd
df = pd.DataFrame(distanceList, columns=['word1', 'word2', 'weight'])

The result looks weird (it has an extra index column for some reason):

      word1  word2   weight
0     I      walked  5
1     I      my      4
2     I      dog     3
3     I      to      2
4     I      the     1
...   ...    ...     ...
1135  I      walked  5
1136  I      my      4
1137  I      dog     3
1138  I      to      2
1139  I      the     1
1140 rows × 3 columns
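The leftmost numbers are the DataFrame's default row index (a RangeIndex), not a fourth column. A minimal sketch to check this, assuming the nine rows shown in the question stand in for the longer list that produced the 1140-row frame (only the variable name distanceList comes from the question):

```python
import pandas as pd

# The nine rows shown in the question (the full list is longer).
distanceList = [['she', 'walked', 4],
                ['she', 'my', 3],
                ['she', 'dog', 2],
                ['she', 'to', 1],
                ['sniffed', 'I', 5],
                ['sniffed', 'walked', 4],
                ['sniffed', 'my', 3],
                ['sniffed', 'dog', 2],
                ['sniffed', 'to', 1]]

df = pd.DataFrame(distanceList, columns=['word1', 'word2', 'weight'])

# Only the three named columns exist; 0, 1, 2, ... is the row index.
print(df.columns.tolist())
print(df.shape)
print(type(df.index).__name__)

# If the index is distracting, print the frame without it.
print(df.to_string(index=False))
```

The index is not stored as data, so it does not take part in groupby or sum.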

But when I sum it, it seems to combine the words. I used this:

df.groupby('weight').sum()

word1   word2
weight      
1   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   thethethethethetotototototototototototototothe...
2   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   totototodogdogdogdogdogdogdogdogdogdogdogdogdo...
3   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   dogdogdogmymymymymymymymymymymymymymymymydogdo...
4   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   mymywalkedwalkedwalkedwalkedwalkedwalkedwalked...
5   Iwalkedmydogtotheparkandshesniffedgrassthenrol...   walkedIIIIIIIIIIIIIIIIIIwalkedIIIIIIIIIIIIIIII...

What I want is this: if I have

dog, cat, 1
dog, cat, 5
dog, rabbit, 1

then the result should be:

dog, cat, 6
dog, rabbit, 1

Answer:

The code you want is as follows.

df.groupby('word1')['weight'].sum()

This computes the sum of weight for each distinct value of word1.
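For the nine rows listed in the question, that gives one total per word1: 4 + 3 + 2 + 1 = 10 for "she" and 5 + 4 + 3 + 2 + 1 = 15 for "sniffed". A minimal sketch on just those rows:

```python
import pandas as pd

distanceList = [['she', 'walked', 4], ['she', 'my', 3], ['she', 'dog', 2],
                ['she', 'to', 1], ['sniffed', 'I', 5], ['sniffed', 'walked', 4],
                ['sniffed', 'my', 3], ['sniffed', 'dog', 2], ['sniffed', 'to', 1]]
df = pd.DataFrame(distanceList, columns=['word1', 'word2', 'weight'])

# One total per distinct word1; word2 is not part of the key, so it is ignored.
totals = df.groupby('word1')['weight'].sum()
print(totals)
```

Because word2 is dropped from the key, this collapses different word2 values together, which is not what the dog/cat example asks for.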

Your code groups by weight and then sums word1 and word2 within each group. For string columns, summing means concatenating the strings, which is why you get results like Iwalkedmydogtotheparkandshesniffedgrassthenrol...
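A small illustration of that concatenation, using made-up rows rather than the question's data: summing an object (string) column applies `+` to the strings.

```python
import pandas as pd

# Summing a column of strings joins them, because '+' on str concatenates.
words = pd.Series(['I', 'walked', 'my', 'dog'])
print(words.sum())

# The same happens per group when the strings are summed inside a groupby.
df = pd.DataFrame({'word2': ['walked', 'my', 'walked'], 'weight': [4, 3, 4]})
grouped = df.groupby('weight')['word2'].sum()
print(grouped)
```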

Edit: I misread the example data. Group by both word columns instead:

df.groupby(['word1', 'word2'], as_index=False)['weight'].sum()

The result is as follows:

    word1   word2   weight
0   dog      cat      6
1   dog      rabbit   1
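A self-contained version of the dog/cat example from the question, to check that grouping on both word columns gives the desired totals:

```python
import pandas as pd

rows = [['dog', 'cat', 1],
        ['dog', 'cat', 5],
        ['dog', 'rabbit', 1]]
df = pd.DataFrame(rows, columns=['word1', 'word2', 'weight'])

# Each distinct (word1, word2) pair gets its own total.
# as_index=False keeps word1 and word2 as ordinary columns in the result.
result = df.groupby(['word1', 'word2'], as_index=False)['weight'].sum()
print(result)
```

Without as_index=False the pair becomes a MultiIndex on the result; calling .reset_index() on that Series gives the same three-column table.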