Home > OS >  Aggregate values pandas
Aggregate values pandas

Time:10-19

I have a pandas dataframe like this:

Id A B C D

1 a b c d
2 a b   d 
2 a   c d
3 a     d
3 a b c

I want to aggregate the empty values for the columns B-C and D, using the values contained in the other rows, by using the information for the same Id.

The resulting data frame should be the following:

Id A B C D
1 a b c d
2 a b c d 
3 a b c d

There can be the possibility to have different values in the first column (A), for the same Id. In this case instead of putting the first instance I prefer to put another value indicating this event.

So for e.g.

Id A B C D   
1 a b c d
2 a b   d 
2 x   c d

It becomes:

Id A B C D   
1 a b c d
2 f b c d 

CodePudding user response:

IIUC, you can use groupby_last:

>>> df.groupby('Id').last()
    A  B  C  D
Id            
1   a  b  c  d
2   a  b  c  d
3   a  b  c  d
>>>  df.groupby('Id').last()
    A  B  C  D
Id            
1   a  b  c  d
2   x  b  c  d

You can use another function like first, sum, mean or a custom function with apply but the idea still stay the same.

CodePudding user response:

The best way I can think to do this is to iterate through each unique Id, slicing it out of the original dataframe, and constructing a new row as a product of merging the relevant rows:

def aggregate(df):
    ids = df['Id'].unique()
    rows = []
    for id in ids:
        relevant = df[df['Id'] == id]
        newrow = {c: "" for c in df.columns}
        for _, row in relevant.iterrows():
            for col in newrow:
                if row[col]:
                    if len(newrow[col]):
                        if newrow[col][-1] == row[col]:
                            continue
                    newrow[col]  = row[col]
        rows.append(newrow)
    return pd.DataFrame(rows)
  • Related