I have this table here and I want to make "Daily ICU occupancy" as a column with the value under as a row. Same for "Daily hospital occupancy". How do I do that? I was thinking to create a new data frame having only "Daily ICU occupancy", same for Daily hospital occupancy" and then merging both data frame in one. However, it does not seem like the best approach. I am sure have a better way to do this.
entity iso_code date indicator value
127422 United States USA 2020-07-15 Daily ICU occupancy 9245.0
127424 United States USA 2020-07-15 Daily hospital occupancy 33759.0
127426 United States USA 2020-07-16 Daily ICU occupancy 9797.0
127428 United States USA 2020-07-16 Daily hospital occupancy 38537.0
127430 United States USA 2020-07-17 Daily ICU occupancy 10700.0
Update: Just to be more clear what I am looking for. I want my new data table to be like this
entity iso_code date Daily ICU occupancy Daily hospital occupancy
127422 United States USA 2020-07-15 9245.0 33759.0
127426 United States USA 2020-07-16 9797.0 38537.0
127430 United States USA 2020-07-17 10700.0 38600.0
CodePudding user response:
you can try using pivot to achieve this. modify the below code
df.pivot_table(index=['entity','iso_code'], columns='indicator',
values='value').reset_index()
Note my answer is based on formatting issues when copying to clipboard. do adjust to suit your use case
indicator entity iso_code ICU occupancy hospital occupancy
0 2020-07-15 Daily 9245.0 33759.0
1 2020-07-16 Daily 9797.0 38537.0
2 2020-07-17 Daily 10700.0 NaN
CodePudding user response:
df = pd.DataFrame({
'entity':[127422, 127424, 127426, 127428, 127430],
'iso_code':['United States', 'United States', 'United States', 'United States', 'United States'],
'date':['USA 2020-07-15', 'USA 2020-07-15', 'USA 2020-07-16', 'USA 2020-07-16', 'USA 2020-07-17'],
'indicator':['Daily ICU occupancy', 'Daily hospital occupancy', 'Daily ICU occupancy', 'Daily hospital occupancy', 'Daily ICU occupancy'],
'value':[9245.0, 33759.0, 9797.0, 38537.0, 10700.0]
})
print(df)
entity iso_code date indicator value
0 127422 United States USA 2020-07-15 Daily ICU occupancy 9245.0
1 127424 United States USA 2020-07-15 Daily hospital occupancy 33759.0
2 127426 United States USA 2020-07-16 Daily ICU occupancy 9797.0
3 127428 United States USA 2020-07-16 Daily hospital occupancy 38537.0
4 127430 United States USA 2020-07-17 Daily ICU occupancy 10700.0
df = df.pivot(index=['iso_code', 'date'], columns='indicator', values='value').reset_index().reset_index(drop=True)
print(df)
indicator iso_code date Daily ICU occupancy \
0 United States USA 2020-07-15 9245.0
1 United States USA 2020-07-16 9797.0
2 United States USA 2020-07-17 10700.0
indicator Daily hospital occupancy
0 33759.0
1 38537.0
2 NaN