I have a DataFrame containing various medical measurements of different patients over a number of hours (two in this example). For instance, the DataFrame looks something like this:
patientid hour measurementx measurementy
1 1 13.5 2030
1 2 13.9 2013
2 1 11.5 1890
2 2 14.9 2009
Now I need to construct a new DataFrame with one row per patient, placing each hour's measurements side by side, like this:
patientid hour measurementx measurementy hour measurementx measurementy
1 1 13.5 2030 2 13.9 2013
2 1 11.5 1890 2 14.9 2009
I'm quite new to Python and have been struggling with this simple operation. I have been trying something like the following, concatenating the rows of my data x_binary onto an empty DataFrame x_binary_compact:
old_id = 1
for row in x_binary.itertuples(index = False):
    new_id = row[0]
    if new_id == old_id:
        pd.concat((x_binary_compact, row), axis=1)
    else:
        old_id = new_id
        pd.concat((x_binary_compact), row, axis=0)
But I get an empty DataFrame as a result, so something is not right.
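The loop above cannot fill x_binary_compact: pd.concat returns a new object rather than modifying its arguments in place, its return value is never assigned, and it only accepts Series and DataFrame objects, not the tuples produced by itertuples. A common pattern when looping is to collect plain Python lists and build the DataFrame once at the end. The sketch below does that per patient with groupby; the toy x_binary frame, value_cols and the hour1/measurementx1 naming scheme are illustrative assumptions, and it assumes every patient has the same number of hours, already sorted by hour.

```python
import pandas as pd

# Hypothetical stand-in for x_binary from the question (sorted by patientid, then hour)
x_binary = pd.DataFrame({"patientid": [1, 1, 2, 2],
                         "hour": [1, 2, 1, 2],
                         "measurementx": [13.5, 13.9, 11.5, 14.9],
                         "measurementy": [2030, 2013, 1890, 2009]})

value_cols = ["hour", "measurementx", "measurementy"]
rows = []
for pid, group in x_binary.groupby("patientid", sort=True):
    flat = [pid]
    # Append this patient's values hour by hour: hour, measurementx, measurementy, hour, ...
    for rec in group[value_cols].itertuples(index=False):
        flat.extend(rec)
    rows.append(flat)

# Assumes every patient has the same number of hours
n_hours = int(x_binary.groupby("patientid").size().max())
columns = ["patientid"] + [f"{c}{k}" for k in range(1, n_hours + 1) for c in value_cols]

# Build the DataFrame once, after the loop, instead of concatenating inside it
x_binary_compact = pd.DataFrame(rows, columns=columns)
print(x_binary_compact)
```

Building one list per patient and constructing the DataFrame at the end also avoids repeated pd.concat calls, each of which copies the data.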
Answer:
Here is a solution:
import pandas as pd
import numpy as np
df = pd.DataFrame({'patientid': [1, 1, 2, 2],
'hour': [1, 2, 1, 2],
'measurementx': [13.5, 13.9, 11.5, 14.9],
'measurementy': [2030, 2013, 1890, 2009]})
# Number each patient's rows 1, 2, ... and move that counter into the columns
df2 = df.set_index(['patientid', df.groupby('patientid').cumcount() + 1]).unstack()
df2.columns = df2.columns.droplevel(1)
# Reorder columns: positions 0, 2, 4 (first hour) then 1, 3, 5 (second hour). With 3 rows per patient id you would need a step of 3, etc.
df2 = df2.iloc[:, list(np.arange(0, len(df2.columns), 2)) + list(np.arange(0, len(df2.columns) - 1, 2) + 1)]
df2
#Out:
# hour measurementx measurementy hour measurementx measurementy
#patientid
#1 1 13.5 2030 2 13.9 2013
#2 1 11.5 1890 2 14.9 2009
You can call .reset_index() at the end if you want patientid as a column.
Obviously, having multiple columns with the same name is not a great idea if you are going to analyse the result further (for example, df2['hour'] returns both hour columns). But if you are only printing it, exporting it to Excel, etc., this answer works.
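If you do want to analyse the result, one option is to keep the same reshaping but turn each (name, counter) column pair into a unique name instead of dropping the counter. The sketch below is a variant of the answer above, not part of it; value_cols, wide and the name-plus-number scheme are illustrative choices. Because the column order is built from the largest row count per patient, it does not need a hand-tuned step size.

```python
import pandas as pd

df = pd.DataFrame({'patientid': [1, 1, 2, 2],
                   'hour': [1, 2, 1, 2],
                   'measurementx': [13.5, 13.9, 11.5, 14.9],
                   'measurementy': [2030, 2013, 1890, 2009]})

value_cols = ['hour', 'measurementx', 'measurementy']
# Same reshaping as above: number each patient's rows 1, 2, ... and unstack that counter
wide = df.set_index(['patientid', df.groupby('patientid').cumcount() + 1]).unstack()
# Columns are (name, k) pairs; join them into unique names such as 'hour1'
wide.columns = [f'{name}{k}' for name, k in wide.columns]
# Put all columns for k = 1 first, then k = 2, and so on, for any number of rows per patient
n = int(df.groupby('patientid').size().max())
order = [f'{c}{k}' for k in range(1, n + 1) for c in value_cols]
wide = wide[order].reset_index()
print(wide)
```

Patients with fewer rows than the maximum simply get NaN in the missing columns, because unstack fills the gaps.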
Answer:
I think this might be what you want.
import pandas as pd
import io
s = '''patientid hour measurementx measurementy
1 1 13.5 2030
1 2 13.9 2013
2 1 11.5 1890
2 2 14.9 2009'''
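df = pd.read_csv(io.StringIO(s), sep=r"\s+")
# One row per patientid; columns become (original column, hour) pairs
df1 = df.pivot(index="patientid", columns="hour", values=["hour", "measurementx", "measurementy"])
df1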
The result is shown below:
hour measurementx measurementy
hour 1 2 1 2 1 2
patientid
1 1.0 2.0 13.5 13.9 2030.0 2013.0
2 1.0 2.0 11.5 14.9 1890.0 2009.0
Then rename the columns to unique names and reorder them to get the table you want:
# Join each (column, hour) pair into a single unique name, e.g. ('hour', 1) -> 'hour1'
new_names = []
for name, hour in df1.columns:
    new_names.append(str(name) + str(hour))
df1.columns = new_names
df1.reset_index()[["patientid", "hour1", "measurementx1", "measurementy1", "hour2", "measurementx2", "measurementy2"]]
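The column list above is typed out by hand for two hours. A sketch that derives the same order from the hour values in the data, assuming every patient is measured at the same set of hours (value_cols, hours and result are illustrative names, not from the answer):

```python
import io
import pandas as pd

s = '''patientid hour measurementx measurementy
1 1 13.5 2030
1 2 13.9 2013
2 1 11.5 1890
2 2 14.9 2009'''
df = pd.read_csv(io.StringIO(s), sep=r"\s+")

value_cols = ["hour", "measurementx", "measurementy"]
df1 = df.pivot(index="patientid", columns="hour", values=value_cols)

# Build the (column, hour) order from the data instead of typing the names by hand
hours = sorted(df["hour"].unique().tolist())
df1 = df1[[(c, h) for h in hours for c in value_cols]]
# Flatten the pairs into unique names such as 'measurementx2'
df1.columns = [f"{c}{h}" for c, h in df1.columns]
result = df1.reset_index()
print(result)
```

As in the output shown here, the values come out as floats, because pivot stores the mixed integer and float columns together before reshaping.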
Output:
patientid hour1 measurementx1 measurementy1 hour2 measurementx2 measurementy2
1 1.0 13.5 2030.0 2.0 13.9 2013.0
2 1.0 11.5 1890.0 2.0 14.9 2009.0