I have a dataframe of the following form:
Year 1 Grade | Year 2 Grade | Year 3 Grade | Year 4 Grade | Year 1 Students | Year 2 Students | Year 3 Students | Year 4 Students |
---|---|---|---|---|---|---|---|
60 | 70 | 80 | 100 | 20 | 32 | 18 | 25 |
I would like to somehow transpose this table to the following format:
Year | Grade | Students |
---|---|---|
1 | 60 | 20 |
2 | 70 | 32 |
3 | 80 | 18 |
4 | 100 | 25 |
I created a list of years and initiated a new dataframe with the "year" column. I was thinking of matching the year integer to the column name containing it in the original DF, match and assign the correct value, but got stuck there.
CodePudding user response:
You need a manual reshaping using a split
of the Index into a MultiIndex:
out = (df
.set_axis(df.columns.str.split(expand=True), axis=1) # make MultiIndex
.iloc[0] # select row as Series
.unstack() # unstack Grade/Students
.droplevel(0) # remove literal "Year"
.rename_axis('Year') # set index name
.reset_index() # index to column
)
output:
Year Grade Students
0 1 60 20
1 2 70 32
2 3 80 18
3 4 100 25
Or using pivot_longer
from janitor
:
import janitor
out = (df
.pivot_longer(names_sep=' ', names_to=['_', 'Year', 'col'])
.pivot('Year', 'col', 'value').reset_index() # pivot Grade/Students as columns
)
CodePudding user response:
I came up with these. grades here are your first row.
df = pd.DataFrame(grades) # your dataframe without columns
grades = np.array(df.iloc[0]).reshape(4,2) # extract the first row, turn it into an array and reshape it to 4*2
new_df = pd.DataFrame(grades).reset_index()
new_df.columns = ['Year', 'Grade', 'Students'] # rename the columns
CodePudding user response:
Here's one way to do it. Feel free to ask questions about how it works.
import pandas as pd
cols = ["Year 1 Grade", "Year 2 Grade", "Year 3 Grade" , "Year 4 Grade",
"Year 1 Students", "Year 2 Students", "Year 3 Students", "Year 4 Students"]
vals = [60,70,80,100,20,32,18,25]
vals = [[v] for v in vals]
df = pd.DataFrame({k:v for k,v in zip(cols,vals)})
grades = df.filter(like="Grade").T.reset_index(drop=True).rename(columns={0:"Grades"})
students = df.filter(like="Student").T.reset_index(drop=True).rename(columns={0:"Students"})
pd.concat([grades,students], axis=1)