Python 3 match values based on column name similarity-CodePudding

I have a dataframe of the following form:

Year 1 Grade	Year 2 Grade	Year 3 Grade	Year 4 Grade	Year 1 Students	Year 2 Students	Year 3 Students	Year 4 Students
60	70	80	100	20	32	18	25

I would like to somehow transpose this table to the following format:

Year	Grade	Students
1	60	20
2	70	32
3	80	18
4	100	25

I created a list of years and initiated a new dataframe with the "year" column. I was thinking of matching the year integer to the column name containing it in the original DF, match and assign the correct value, but got stuck there.

CodePudding user response：

You need a manual reshaping using a split of the Index into a MultiIndex:

out = (df
 .set_axis(df.columns.str.split(expand=True), axis=1) # make MultiIndex
 .iloc[0]                                             # select row as Series
 .unstack()                                           # unstack Grade/Students
 .droplevel(0)                                        # remove literal "Year"
 .rename_axis('Year')                                 # set index name
 .reset_index()                                       # index to column
)

output:

  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25

Or using pivot_longer from janitor:

import janitor

out = (df
   .pivot_longer(names_sep=' ', names_to=['_', 'Year', 'col'])
   .pivot('Year', 'col', 'value').reset_index() # pivot Grade/Students as columns
)

CodePudding user response：

I came up with these. grades here are your first row.

df = pd.DataFrame(grades) # your dataframe without columns

grades = np.array(df.iloc[0]).reshape(4,2) # extract the first row, turn it into an array and reshape it to 4*2
new_df = pd.DataFrame(grades).reset_index()

new_df.columns = ['Year', 'Grade', 'Students'] # rename the columns

CodePudding user response：

Here's one way to do it. Feel free to ask questions about how it works.

import pandas as pd
cols = ["Year 1 Grade", "Year 2 Grade", "Year 3 Grade" , "Year 4 Grade",
        "Year 1 Students", "Year 2 Students", "Year 3 Students", "Year 4 Students"]
vals = [60,70,80,100,20,32,18,25]
vals = [[v] for v in vals]
df = pd.DataFrame({k:v for k,v in zip(cols,vals)})

grades = df.filter(like="Grade").T.reset_index(drop=True).rename(columns={0:"Grades"})
students = df.filter(like="Student").T.reset_index(drop=True).rename(columns={0:"Students"})
pd.concat([grades,students], axis=1)