I have a dataframe (adjusted for simplicity) as follows:
Location Code Technology Latitude Longitude ... Frequency
0 ABLSERVP Type A 11.1 11.1 ... 850
2 ABLSERVP Type A 11.1 11.1 ... 700
4 ABLSERVP Type B 11.1 11.1 ... 850
... ... ... ... ... ... ...
1300 CSEY3 Type A 22.2 22.2 ... 2100
1301 CSEY3 Type A 22.2 22.2 ... 700
... ... ... ... ... ... ...
265064 CSEY1 Type A 33.3 33.3 ... 750
265065 CSEY3 Type B 22.2 22.2 ... 850
What I'm trying to achieve:
Location Code Technologies Latitude Longitude ... Type A's Type B's ...
0 ABLSERVP Type A,B 11.1 11.1 ... 700,850 850 ...
... ... ... ... ...
265064 CSEY1 Type A 33.3 33.3 ... 750 n/a ...
265065 CSEY3 Type A,B 22.2 22.2 ... 700,2100 850 ...
Since I have multiple columns and rows, I included the ellipses to represent. Is there anyway to do this without having to loop through the entire dataframe (I've read that this is inefficient and is one of the LAST resort).
My attempt: I first sorted based on location code as follows:
x=x.sort_values(by='Location Code')
I thought I could get the required result by doing: df = x.groupby(['Location Code', 'Technology']).sum()
This obviously doesn't work as it sums the frequencies instead of listing them. Any help?
CodePudding user response:
Try with groupby
, pivot
and join
:
tech = x.groupby("Location Code")["Technology"].agg(lambda x: ", ".join(x.unique().tolist()))
pivoted = (x.pivot_table(index="Location Code",
columns="Technology",
values="Frequency",
aggfunc=lambda x: ", ".join(x.astype(str)))
)
output = tech.to_frame().join(pivoted)
>>> output
Technology Type A Type B
Location Code
ABLSERVP Type A, Type B 850, 700 850
CSEY1 Type A 750 NaN
CSEY3 Type A, Type B 2100, 700 850