Home > other >  Manipulating dataframe rows Python
Manipulating dataframe rows Python

Time:11-19

I have a dataframe (adjusted for simplicity) as follows:

                Location Code  Technology  Latitude  Longitude  ... Frequency
0                    ABLSERVP      Type A      11.1       11.1  ...       850
2                    ABLSERVP      Type A      11.1       11.1  ...       700
4                    ABLSERVP      Type B      11.1       11.1  ...       850
...                       ...         ...       ...        ...  ...       ...
1300                    CSEY3      Type A      22.2       22.2  ...      2100
1301                    CSEY3      Type A      22.2       22.2  ...       700
...                       ...         ...       ...        ...  ...       ...
265064                  CSEY1      Type A      33.3       33.3  ...       750
265065                  CSEY3      Type B      22.2       22.2  ...       850

What I'm trying to achieve:

                Location Code  Technologies  Latitude  Longitude  ...  Type A's  Type B's  ...  
0                    ABLSERVP      Type A,B      11.1       11.1  ...   700,850       850  ...
...                       ...         ...         ...       ...        
265064                  CSEY1        Type A      33.3       33.3  ...       750       n/a  ...
265065                  CSEY3      Type A,B      22.2       22.2  ...  700,2100       850  ...

Since I have multiple columns and rows, I included the ellipses to represent. Is there anyway to do this without having to loop through the entire dataframe (I've read that this is inefficient and is one of the LAST resort).

My attempt: I first sorted based on location code as follows:

x=x.sort_values(by='Location Code')

I thought I could get the required result by doing: df = x.groupby(['Location Code', 'Technology']).sum()

This obviously doesn't work as it sums the frequencies instead of listing them. Any help?

CodePudding user response:

Try with groupby, pivot and join:

tech = x.groupby("Location Code")["Technology"].agg(lambda x: ", ".join(x.unique().tolist()))
pivoted = (x.pivot_table(index="Location Code", 
                         columns="Technology", 
                         values="Frequency", 
                         aggfunc=lambda x: ", ".join(x.astype(str)))
           )
output = tech.to_frame().join(pivoted)

>>> output

                   Technology     Type A Type B
Location Code                                  
ABLSERVP       Type A, Type B   850, 700    850
CSEY1                  Type A        750    NaN
CSEY3          Type A, Type B  2100, 700    850
  • Related