I would like to melt my pandas dataframe from this structure...
Name | Color year 2020 | Color year 2019 | Size year 2020 | Size year 2019 |
---|---|---|---|---|
A | red | red | big | small |
B | red | blue | big | medium |
... to this one.
Name | Year | Color | Size |
---|---|---|---|
A | 2020 | red | big |
B | 2020 | red | big |
A | 2019 | red | small |
B | 2019 | blue | medium |
I tried to melt with
df_melted = df.melt(
id_vars=['Name', 'Color year 2020', 'Color year 2019'],
value_vars=['Size year 2020', 'Size year 2019'],
var_name='Year',
value_name='Size')
df_final = df_melted.melt(
id_vars=['Name', 'Year'],
value_vars=['Color year 2020', 'Color year 2019'],
var_name='Year',
value_name='Color')
But it doesn't seem right... Is it possible to do it in one way?
CodePudding user response:
You can check with wide_to_long
out = pd.wide_to_long(df, ['Color year ','Size year '],
i = 'Name', j ='Year',suffix = '\w ')
out.columns = out.columns.str.split().str[0]
out.reset_index(inplace=True)
out
Out[151]:
Name Year Color Size
0 A 2020 red big
1 B 2020 red big
2 A 2019 red small
3 B 2019 blue medium
CodePudding user response:
Another version of pd.wide_to_long
:
out = pd.wide_to_long(df.rename(columns=lambda x: x.replace(' year ', '')),
['Color', 'Size'], 'Name', 'Year').reset_index()
print(out)
# Output
Name Year Color Size
0 A 2020 red big
1 B 2020 red big
2 A 2019 red small
3 B 2019 blue medium
CodePudding user response:
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df.pivot_longer(index = 'Name',
names_to = ('.value', 'year'),
names_sep = ' year ')
Name year Color Size
0 A 2020 red big
1 B 2020 red big
2 A 2019 red small
3 B 2019 blue medium
In the above solution, the .value
determines which parts of the column labels remain as headers - the labels are split apart with the names_sep
.