Let's say I have a dataframe like this:
Column1 | Column2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Platform_key |
---|---|---|---|---|---|---|---|
amazonwebservicesaws | asiapacificmumbai | 38.33 | nan | nan | nan | nan | amazonwebservicesaws_asiapacificmumbai |
amazonwebservicesaws | asiapacificmumbai | nan | nan | nan | nan | 1.83 | amazonwebservicesaws_asiapacificmumbai |
amazonwebservicesaws | asiapacificmumbai | nan | nan | nan | 5 | nan | amazonwebservicesaws_asiapacificmumbai |
amazonwebservicesaws | asiapacificmumbai | nan | nan | 2.21 | nan | nan | amazonwebservicesaws_asiapacificmumbai |
amazonwebservicesaws | asiapacificmumbai | nan | 20.83 | nan | nan | nan | amazonwebservicesaws_asiapacificmumbai |
I want to combine all of these rows into a single row per platform key, keeping the non-NaN value from each column. There are 5 rows in the example, but the real dataset has more rows and more columns than shown here. So like this:
Column1 | Column2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Platform_key |
---|---|---|---|---|---|---|---|
amazonwebservicesaws | asiapacificmumbai | 38.33 | 20.83 | 2.21 | 5 | 1.83 | amazonwebservicesaws_asiapacificmumbai |
What is the best way to do this?
CodePudding user response:
We can just use groupby with first, which will pick the first non-NaN value per column within each group:
out = df.groupby(['Platform_key'], as_index=False).first()
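To check this against the sample from the question, here is a minimal sketch that rebuilds the five rows by hand. The DataFrame construction is my own assumption about how the data looks; in particular it assumes the nan cells are real NaN values and not the string "nan", since first only skips real missing values. The final reindexing step is also an addition, because with as_index=False the key column is moved to the front.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the sample rows (assumed: "nan" cells are real NaN)
key = "amazonwebservicesaws_asiapacificmumbai"
df = pd.DataFrame({
    "Column1": ["amazonwebservicesaws"] * 5,
    "Column2": ["asiapacificmumbai"] * 5,
    "Column 3": [38.33, np.nan, np.nan, np.nan, np.nan],
    "Column 4": [np.nan, np.nan, np.nan, np.nan, 20.83],
    "Column 5": [np.nan, np.nan, np.nan, 2.21, np.nan],
    "Column 6": [np.nan, np.nan, 5, np.nan, np.nan],
    "Column 7": [np.nan, 1.83, np.nan, np.nan, np.nan],
    "Platform_key": [key] * 5,
})

# One row per key; first() takes the first non-null value in each column
out = df.groupby("Platform_key", as_index=False).first()

# Optional: put the columns back in their original order
out = out[df.columns]

print(out)
```

If several rows in the same group have a value in the same column, only the first one is kept, so this fits best when each column has at most one non-NaN value per key, as in the sample.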