I have a dataset like the one below:
import pandas as pd

df = pd.DataFrame({
    'state': ['California'] * 4 + ['Florida'] * 5 + ['Minnesota'] * 3 + ['New Hampshire'],
    'population': ['50-100', '0-50', '150-200', '50-100', '0-50', '150-200',
                   '100-150', 'NA', '0-50', 'NA', '100-150', '50-100', 'NA'],
    'locale': ['rural', 'urban', 'town', 'suburb', 'suburb', 'urban', 'rural',
               'suburb', 'NA', 'town', 'town', 'urban', 'rural'],
})
For each state, I want one new column per category of every other column, holding how many times that category occurs in the state. An example row is below:
state       population=0-50  population=50-100  population=100-150  population=150-200  locale=rural  locale=urban  locale=town  locale=suburb
California  1                2                  0                   1                   1             1             1            1
CodePudding user response:
You can play around with this:
df.groupby(['population', 'locale']).apply(lambda x: x['state'].value_counts().to_frame()).unstack().reset_index()
reference - stack thread
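A variation on the same groupby / value_counts / unstack idea, as a rough sketch only: count each column per state separately, then join the pieces side by side. The `col=` prefix and the name `counts_vc` are illustrative choices of mine, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['California'] * 4 + ['Florida'] * 5 + ['Minnesota'] * 3 + ['New Hampshire'],
    'population': ['50-100', '0-50', '150-200', '50-100', '0-50', '150-200',
                   '100-150', 'NA', '0-50', 'NA', '100-150', '50-100', 'NA'],
    'locale': ['rural', 'urban', 'town', 'suburb', 'suburb', 'urban', 'rural',
               'suburb', 'NA', 'town', 'town', 'urban', 'rural'],
})

# One wide table of counts per source column: rows are states,
# columns are the categories of that column (missing combinations become 0).
parts = [
    df.groupby('state')[col]
      .value_counts()
      .unstack(fill_value=0)
      .add_prefix(f'{col}=')   # illustrative naming, e.g. 'locale=town'
    for col in ['population', 'locale']
]

# Align the per-column tables on the state index and flatten the index.
counts_vc = pd.concat(parts, axis=1).reset_index()
print(counts_vc)
```

The 'NA' strings are counted like any other category here, so the result includes `population=NA` and `locale=NA` columns.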
CodePudding user response:
Use pd.get_dummies followed by GroupBy.sum(), as follows:
(pd.get_dummies(df.set_index('state'))
   .groupby('state').sum()
   .reset_index()
)
Result:
           state  population_0-50  population_100-150  population_150-200  population_50-100  population_NA  locale_NA  locale_rural  locale_suburb  locale_town  locale_urban
0     California                1                   0                   1                  2              0          0             1              1            1             1
1        Florida                2                   1                   1                  0              1          1             1              2            0             1
2      Minnesota                0                   1                   0                  1              1          0             0              0            2             1
3  New Hampshire                0                   0                   0                  0              1          0             1              0            0             0
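If the `population=0-50` style of column name from the question matters, `pd.get_dummies` accepts a `prefix_sep` argument. A minimal self-contained sketch, assuming the same data as in the question; the `.astype(int)` step is my addition so the counts print as plain integers regardless of pandas version:

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['California'] * 4 + ['Florida'] * 5 + ['Minnesota'] * 3 + ['New Hampshire'],
    'population': ['50-100', '0-50', '150-200', '50-100', '0-50', '150-200',
                   '100-150', 'NA', '0-50', 'NA', '100-150', '50-100', 'NA'],
    'locale': ['rural', 'urban', 'town', 'suburb', 'suburb', 'urban', 'rural',
               'suburb', 'NA', 'town', 'town', 'urban', 'rural'],
})

counts = (
    # One indicator column per category, named '<column>=<category>'.
    pd.get_dummies(df.set_index('state'), prefix_sep='=')
    # Summing the indicators within each state gives the counts.
    .groupby('state').sum()
    .astype(int)
    .reset_index()
)
print(counts)
```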
If you want to exclude the entries with value NA, you can use:
(pd.get_dummies(df[df != 'NA'].set_index('state'))
   .groupby('state').sum()
   .reset_index()
)
Result:
           state  population_0-50  population_100-150  population_150-200  population_50-100  locale_rural  locale_suburb  locale_town  locale_urban
0     California                1                   0                   1                  2             1              1            1             1
1        Florida                2                   1                   1                  0             1              2            0             1
2      Minnesota                0                   1                   0                  1             0              0            2             1
3  New Hampshire                0                   0                   0                  0             1              0            0             0
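The columns come out sorted alphabetically, so `population_100-150` lands before `population_50-100`. If you want the order from the question's example row, one option is to reindex the columns explicitly. A sketch, assuming the list `wanted` below (my own name, with the order copied from the question) covers every column you care about:

```python
import pandas as pd

df = pd.DataFrame({
    'state': ['California'] * 4 + ['Florida'] * 5 + ['Minnesota'] * 3 + ['New Hampshire'],
    'population': ['50-100', '0-50', '150-200', '50-100', '0-50', '150-200',
                   '100-150', 'NA', '0-50', 'NA', '100-150', '50-100', 'NA'],
    'locale': ['rural', 'urban', 'town', 'suburb', 'suburb', 'urban', 'rural',
               'suburb', 'NA', 'town', 'town', 'urban', 'rural'],
})

# Column order taken from the example row in the question (assumption).
wanted = ['population_0-50', 'population_50-100', 'population_100-150',
          'population_150-200', 'locale_rural', 'locale_urban',
          'locale_town', 'locale_suburb']

ordered = (
    # df != 'NA' masks the 'NA' strings to NaN, which get_dummies skips.
    pd.get_dummies(df[df != 'NA'].set_index('state'))
    .groupby('state').sum()
    # Put the columns in the requested order; a category that never
    # occurs in the data would still get a column filled with 0.
    .reindex(columns=wanted, fill_value=0)
    .astype(int)
    .reset_index()
)
print(ordered)
```

Any column not listed in `wanted` is dropped by the reindex, so keep the list in sync with your categories.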