Pandas: Restructure a dataframe to column values

I have the following dataframe where the cities are columns and ages are the values:

City1	City2	City3
2	14	61
51	73	35
42	38	13
12	75	24
27	42	78

I want to create a new dataframe where the columns are age groups, and the cities are the index, like so:

	0-20	20-40	40-60	60-80
City1	2	1	2	0
City2	1	1	1	2
City3	1	2	0	2

Is this possible to do in pandas?

CodePudding user response：

Try this, using pd.cut:

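import numpy as np
import pandas as pd

# Stack to long form, then bin each age; the outer bins are open-ended.
dfc = pd.cut(df.rename_axis('Cities', axis=1).stack(),
             bins=[-np.inf,20,40,60,np.inf],
             labels='0-20 20-40 40-60 60-80'.split(' ')).reset_index()

# Count ages per city and age group.
pd.crosstab(dfc['Cities'], dfc[0]).reset_index()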

Output:

0 Cities  0-20  20-40  40-60  60-80
0  City1     2      1      2      0
1  City2     1      1      1      2
2  City3     1      2      0      2
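
The question asks for the cities as the index, so the final reset_index() may not be wanted. Below is a minimal sketch of the same pd.cut plus pd.crosstab idea that keeps the cities as the index. It assumes the sample data from the question and finite bin edges from 0 to 80 (ages outside that range would become NaN and not be counted); the names long, Age and Group are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "City1": [2, 51, 42, 12, 27],
    "City2": [14, 73, 38, 75, 42],
    "City3": [61, 35, 13, 24, 78],
})

bins = [0, 20, 40, 60, 80]
labels = ["0-20", "20-40", "40-60", "60-80"]

# Long format: one row per (city, age) pair.
long = df.melt(var_name="Cities", value_name="Age")

# Assign each age to a group; intervals are right-closed, e.g. (0, 20].
long["Group"] = pd.cut(long["Age"], bins=bins, labels=labels)

# Count ages per city and group; the cities stay as the index.
result = pd.crosstab(long["Cities"], long["Group"])
print(result)
```

The counts should match the output above, with City1 to City3 as row labels instead of a separate Cities column.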

CodePudding user response：

# This should work.

import pandas as pd

# Create the DataFrame.

data = [[2, 14, 61], [51, 73, 35], [42, 38, 13], [12, 75, 24], [27, 42, 78]]

df = pd.DataFrame(data, columns=['City1', 'City2', 'City3'])

# Count the values that fall in each interval (left, right] for every city.

bins = [(0, 20), (20, 40), (40, 60), (60, 80)]

data_new = [[df[(df[city] > left) & (df[city] <= right)][city].count() for left, right in bins] for city in df.columns]

# Create a new DataFrame from the counts.

df_new = pd.DataFrame(data_new, index=df.columns, columns=['0-20', '20-40', '40-60', '60-80'])

# The key point is to pass index= (the city names) along with data and columns when creating the new DataFrame.

CodePudding user response：

Here is a solution using pd.Series.between for every combination of age range and city.

new_data = []
for city in df.columns:
    new_city = []
    for left, right in [(0,20),(20,40),(40,60),(60,80)]:
        new_city.append(df[city].between(left,right, inclusive="left").sum())
    new_data.append(new_city)
new_df = pd.DataFrame(new_data, columns=["0-20","20-40","40-60","60-80"], index=df.columns)
new_df
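
One detail worth checking: between(..., inclusive="left") counts the half-open interval [left, right), whereas pd.cut uses (left, right] by default. None of the sample ages sits exactly on a boundary, so both conventions give the same counts here, but they can differ on other data. A small sketch with made-up ages (not from the question), assuming pandas 1.3 or later for the string form of inclusive:

```python
import pandas as pd

# Hypothetical ages chosen to sit on the bin edges 20 and 40.
ages = pd.Series([20, 20, 30, 40])

# between(..., inclusive="left") keeps the left edge: [20, 40)
left_closed = ages.between(20, 40, inclusive="left").sum()

# pd.cut defaults to right-closed intervals: (20, 40]
right_closed = pd.cut(ages, bins=[20, 40]).notna().sum()

# right=False makes pd.cut left-closed as well: [20, 40)
cut_left_closed = pd.cut(ages, bins=[20, 40], right=False).notna().sum()

print(left_closed, right_closed, cut_left_closed)
```

If the two approaches need to agree on boundary values, passing right=False to pd.cut is one way to align them.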