Home > Software engineering >  Python 3 match values based on column name similarity
Python 3 match values based on column name similarity

Time:07-29

I have a dataframe of the following form:

Year 1 Grade Year 2 Grade Year 3 Grade Year 4 Grade Year 1 Students Year 2 Students Year 3 Students Year 4 Students
60 70 80 100 20 32 18 25

I would like to somehow transpose this table to the following format:

Year Grade Students
1 60 20
2 70 32
3 80 18
4 100 25

I created a list of years and initiated a new dataframe with the "year" column. I was thinking of matching the year integer to the column name containing it in the original DF, match and assign the correct value, but got stuck there.

CodePudding user response:

You need a manual reshaping using a split of the Index into a MultiIndex:

out = (df
 .set_axis(df.columns.str.split(expand=True), axis=1) # make MultiIndex
 .iloc[0]                                             # select row as Series
 .unstack()                                           # unstack Grade/Students
 .droplevel(0)                                        # remove literal "Year"
 .rename_axis('Year')                                 # set index name
 .reset_index()                                       # index to column
)

output:

  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25

Or using pivot_longer from janitor:

import janitor

out = (df
   .pivot_longer(names_sep=' ', names_to=['_', 'Year', 'col'])
   .pivot('Year', 'col', 'value').reset_index() # pivot Grade/Students as columns
)

CodePudding user response:

I came up with these. grades here are your first row.

df = pd.DataFrame(grades) # your dataframe without columns

grades = np.array(df.iloc[0]).reshape(4,2) # extract the first row, turn it into an array and reshape it to 4*2
new_df = pd.DataFrame(grades).reset_index()

new_df.columns = ['Year', 'Grade', 'Students'] # rename the columns

CodePudding user response:

Here's one way to do it. Feel free to ask questions about how it works.

import pandas as pd
cols = ["Year 1 Grade", "Year 2 Grade", "Year 3 Grade" , "Year 4 Grade",
        "Year 1 Students", "Year 2 Students", "Year 3 Students", "Year 4 Students"]
vals = [60,70,80,100,20,32,18,25]
vals = [[v] for v in vals]
df = pd.DataFrame({k:v for k,v in zip(cols,vals)})

grades = df.filter(like="Grade").T.reset_index(drop=True).rename(columns={0:"Grades"})
students = df.filter(like="Student").T.reset_index(drop=True).rename(columns={0:"Students"})
pd.concat([grades,students], axis=1)
  • Related