Home > OS >  How to make the Data Frame into a different view?
How to make the Data Frame into a different view?

Time:12-23

For example, we have data on cities and the population in them in different years.

City 2000 2005 2010 2015
City_1 10 11 12 13
City_2 15 14 13 12
City_3 20 30 25 35

I need to get the DataFrame in this form:

City Year Population
City_1 2000 10
City_1 2005 11
City_1 2010 12
City_1 2015 13
City_2 2000 15
City_2 2005 14
City_2 2010 13
City_2 2015 12
City_3 2000 20
City_3 2005 30
City_3 2010 25
City_3 2015 35

How can this be done using Python?

CodePudding user response:

Simply use pandas.melt() and then (how you wanted) put the results in order by city:

This function is useful to massage a DataFrame into a format where one or more columns are identifier variables (id_vars), while all other columns, considered measured variables (value_vars), are “unpivoted” to the row axis, leaving just two non-identifier columns, ‘variable’ and ‘value’.

import pandas as pd

df = pd.read_csv('csv.csv')
df = pd.melt(df, id_vars=['City'], var_name='Year', value_name='Population')
df = df.sort_values('City').set_index('City') # Sort `City` values so that they go together and set the index back to `City`
print(df)
        Year  Population
City                    
City_1  2000          10
City_1  2005          11
City_1  2010          12
City_1  2015          13
City_2  2000          15
City_2  2005          14
City_2  2010          13
City_2  2015          12
City_3  2000          20
City_3  2005          30
City_3  2010          25
City_3  2015          35

CodePudding user response:

I think you are looking for the melt function. It's basically a reverse pivot that converts your data to a wide long format.

df_unpivoted = df.melt(id_vars=['City'], var_name='Year', value_name='Population')
print(df_unpivoted)

You can read the documentation here if you want to learn more.

https://pandas.pydata.org/docs/reference/api/pandas.melt.html

CodePudding user response:

unstack:

df.set_index('City').unstack().reset_index().sort_values(by='City')

Output:

  level_0   City    0
0   2000    City_1  10
3   2005    City_1  11
6   2010    City_1  12
9   2015    City_1  13
1   2000    City_2  15
4   2005    City_2  14
7   2010    City_2  13
10  2015    City_2  12
2   2000    City_3  20
5   2005    City_3  30
8   2010    City_3  25
11  2015    City_3  35
  • Related