The dataset I'm working with is as follows. I'd want to add new columns to the table based on the row classifications. At the same time, I'd want to have only one row every month. I'm not sure if I should use groupby or not.
Year Month Index Humidity Temperature Pressure date
2019 1 High 100% 30 °C 1021 mbar 20191
2019 1 Low 28% 9 °C 1011 mbar 20191
2019 1 Average 65% 21 °C 1016 mbar 20191
2019 2 High 100% 32 °C 1020 mbar 20192
2019 2 Low 28% 10 °C 1008 mbar 20192
2019 2 Average 63% 18°C 1014 mbar 20192
So the output value looks like this:
**Desired output**
Year Month HighHumidity LowHumidity AverageHumidity HighTemperature LowTemperature AverageTemperature HighPressure LowPressure AveragePressure date
2019 1 100% 28% 65% 30 °C 9 °C 21 °C 1021 mbar 1011 mbar 1016 mbar 20191
2019 2 100% 27% 63% 32 °C 10 °C 18 °C 1020 mbar 1008 mbar 1014 mbar 20192
I experimented with the following code. However, the index column works with all other columns, however I only want to use particular columns in this operation.:
df = df.pivot(index='date', columns=['Index'])
df.columns = ['_'.join((col[1], col[0])) for col in df.columns]
Average_Year High_Year Low_Year Average_Month High_Month Low_Month Average_Humidity High_Humidity Low_Humidity Average_Temperature High_Temperature Low_Temperature Average_Pressure High_Pressure Low_Pressure
date
20191 2019 2019 2019 1 1 1 65% 100% 28% 21 °C 30 °C 9 °C 1016 mbar 1021 mbar 1011 mbar
201910 2019 2019 2019 10 10 10 81% 100% 49% 29 °C 35 °C 22 °C 1011 mbar 1016 mbar 1007 mbar
201911 2019 2019 2019 11 11 11 77% 100% 49% 26 °C 33 °C 16 °C 1013 mbar 1017 mbar 1006 mbar
201912 2019 2019 2019 12 12 12 77% 100% 38% 21 °C 31 °C 10 °C 1016 mbar 1021 mbar 1012 mbar
20192 2019 2019 2019 2 2 2 65% 100% 28% 23 °C 32 °C 10 °C 1015 mbar 1020 mbar 1008 mbar
CodePudding user response:
You can add Year
and Month
to parameter index
:
df = df.pivot(index=['Year','Month','date'], columns=['Index'])
df.columns = ['_'.join((col[1], col[0])) for col in df.columns]
df = df.reset_index()
#for correct order date column
df['date'] = df.pop('date')
print (df)
Year Month Average_Humidity High_Humidity Low_Humidity \
0 2019 1 65% 100% 28%
1 2019 2 63% 100% 28%
Average_Temperature High_Temperature Low_Temperature Average_Pressure \
0 21°C 30°C 9°C 1016mbar
1 18°C 32°C 10°C 1014mbar
High_Pressure Low_Pressure date
0 1021mbar 1011mbar 20191
1 1020mbar 1008mbar 20192