I'm trying to melt a dataframe
Year | Baseline_low | Baseline_high | Overall_low | Overall_high
================================================================
2022 | 1 | 2 | 3 | 4
2023 | 5 | 6 | 7 | 8
to this:
2022 | Baseline | low | 1
2022 | Baseline | high | 2
There are a lot more columns that end with _low
or _high
than I've listed so I would like to use a pattern to split the column.
Is it possible to do this with melt?
I've seen this similar question but it has a slightly different structure to my dataframe.
CodePudding user response:
If you aim to be pure pandas, then yes melt
is they key:
import io
import pandas as pd
df = pd.read_csv(io.StringIO("""Year | Baseline_low | Baseline_high | Overall_low | Overall_high
2022 | 1 | 2 | 3 | 4
2023 | 5 | 6 | 7 | 8"""), sep="|")
df.columns = [column.strip() for column in df.columns]
final = df.melt(id_vars=["Year"])
final["Type"] = final["variable"].apply(lambda x: x.split("_")[0])
final["State"] = final["variable"].apply(lambda x: x.split("_")[1])
Then just split information you need based on column names, it returns:
Year variable value Type State
0 2022 Baseline_low 1 Baseline low
1 2023 Baseline_low 5 Baseline low
2 2022 Baseline_high 2 Baseline high
3 2023 Baseline_high 6 Baseline high
4 2022 Overall_low 3 Overall low
5 2023 Overall_low 7 Overall low
6 2022 Overall_high 4 Overall high
7 2023 Overall_high 8 Overall high
CodePudding user response:
You cannot achieve this with melt
only. You need post-processing or to use a reshaping with a MultiIndex.
It's however a good use case for janitor
's pivot_longer
:
import janitor
df.pivot_longer(
index='Year',
names_to=('.value', 'new'),
names_sep='_',
sort_by_appearance=True
)
output:
Year new Baseline Overall
0 2022 low 1 3
1 2022 high 2 4
2 2023 low 5 7
3 2023 high 6 8
CodePudding user response:
#melt
df2=df.melt('Year')
# split the columns
df2[['col','level']]=df2['variable'].str.split('_', expand=True)
# pivot
df2.pivot(index=['Year','level'], columns='col', values='value').reset_index().rename_axis(columns=None)
#rename_axis, thanks to @mozway!
Year level Baseline Overall
0 2022 high 2 4
1 2022 low 1 3
2 2023 high 6 8
3 2023 low 5 7