This is a sample of my df, consisting of temperature and rain (mm) per city:
Datetime | Berlin_temperature | Dublin_temperature | London_temperature | Paris_temperature | Berlin_rain | Dublin_rain | London_rain | Paris_rain |
---|---|---|---|---|---|---|---|---|
2022-01-01 10:00:00 | 24 | 24 | 24 | 24 | 10 | 10 | 10 | 10 |
2022-01-01 11:00:00 | 24 | 24 | 24 | 24 | 10 | 10 | 10 | 10 |
2022-01-01 12:00:00 | 24 | 24 | 24 | 24 | 10 | 10 | 10 | 10 |
2022-01-01 13:00:00 | 24 | 24 | 24 | 24 | 10 | 10 | 10 | 10 |
I want to achieve following output as a dataframe:
Datetime | City | Temperature | Rainfall |
---|---|---|---|
2022-01-01 10:00:00 | Berlin | 24 | 10 |
2022-01-01 10:00:00 | Dublin | 24 | 10 |
2022-01-01 10:00:00 | London | 24 | 10 |
2022-01-01 10:00:00 | Paris | 24 | 10 |
2022-01-01 11:00:00 | Berlin | 24 | 10 |
2022-01-01 11:00:00 | Dublin | 24 | 10 |
2022-01-01 11:00:00 | London | 24 | 10 |
2022-01-01 11:00:00 | Paris | 24 | 10 |
2022-01-01 12:00:00 | ... | ... | ... |
At the moment I don't know how to achieve this by transposing or something similar. How would this be possible?
CodePudding user response:
Use DataFrame.stack
with MultiIndex
created by splitted columns with _
- but first convert Datetime
to index by DataFrame.set_index
:
df1 = df.set_index('Datetime')
df1.columns = df1.columns.str.split('_', expand=True)
df1 = df1.stack(0).rename_axis(['Datetime','City']).reset_index()
print (df1)
Datetime City rain temperature
0 2022-01-01 10:00:00 Berlin 10 24
1 2022-01-01 10:00:00 Dublin 10 24
2 2022-01-01 10:00:00 London 10 24
3 2022-01-01 10:00:00 Paris 10 24
4 2022-01-01 11:00:00 Berlin 10 24
5 2022-01-01 11:00:00 Dublin 10 24
6 2022-01-01 11:00:00 London 10 24
7 2022-01-01 11:00:00 Paris 10 24
8 2022-01-01 12:00:00 Berlin 10 24
9 2022-01-01 12:00:00 Dublin 10 24
10 2022-01-01 12:00:00 London 10 24
11 2022-01-01 12:00:00 Paris 10 24
12 2022-01-01 13:00:00 Berlin 10 24
13 2022-01-01 13:00:00 Dublin 10 24
14 2022-01-01 13:00:00 London 10 24
15 2022-01-01 13:00:00 Paris 10 24
CodePudding user response:
Using janitor.pivot_longer
:
import janitor
out = df.pivot_longer(index='Datetime', names_to=('City', '.value'),
names_sep='_', sort_by_appearance=True)
Output:
Datetime City temperature rain
0 2022-01-01 10:00:00 Berlin 24 10
1 2022-01-01 10:00:00 Dublin 24 10
2 2022-01-01 10:00:00 London 24 10
3 2022-01-01 10:00:00 Paris 24 10
4 2022-01-01 11:00:00 Berlin 24 10
5 2022-01-01 11:00:00 Dublin 24 10
6 2022-01-01 11:00:00 London 24 10
7 2022-01-01 11:00:00 Paris 24 10
8 2022-01-01 12:00:00 Berlin 24 10
9 2022-01-01 12:00:00 Dublin 24 10
10 2022-01-01 12:00:00 London 24 10
11 2022-01-01 12:00:00 Paris 24 10
12 2022-01-01 13:00:00 Berlin 24 10
13 2022-01-01 13:00:00 Dublin 24 10
14 2022-01-01 13:00:00 London 24 10
15 2022-01-01 13:00:00 Paris 24 10