Creating a new Dataframe from an existing one

Time:04-10

I have a Dataframe containing various medical measurements of different patients over a number of hours (in this example 2). For instance, the dataframe is something like this:

patientid  hour measurementx measurementy
 1          1    13.5         2030
 1          2    13.9         2013
 2          1    11.5         1890
 2          2    14.9         2009 

Now, I need to construct a new Dataframe that basically groups all measurements for each patient, which would look like this:

patientid  hour measurementx measurementy  hour  measurementx measurementy
1          1    13.5         2030          2     13.9         2013
2          1    11.5         1890          2     14.9         2009

I'm quite new to Python and I have been struggling with this simple operation. I have been trying something like this, concatenating an empty DataFrame x_binary_compact with my data x_binary:

old_id = 1
for row in x_binary.itertuples(index = False):
    new_id = row[0]
    if new_id == old_id:
        pd.concat((x_binary_compact, row), axis=1)
    else:
        old_id = new_id
        pd.concat((x_binary_compact), row, axis=0)

But I get an empty DataFrame as a result, so something is not right.
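
One likely contributor to the empty result, whatever else is wrong in the loop: pd.concat returns a new object and leaves its inputs untouched, so a call whose return value is never assigned has no visible effect. A minimal sketch with made-up frames (base and extra are illustrative names, not taken from the code above):

```python
import pandas as pd

# Two small illustrative frames (hypothetical data, not the real x_binary).
base = pd.DataFrame({"patientid": [1], "hour": [1]})
extra = pd.DataFrame({"patientid": [1], "hour": [2]})

# The return value is discarded here, so `base` is left unchanged.
pd.concat([base, extra])
rows_without_assignment = len(base)

# Assigning the result is what actually keeps the concatenated rows.
combined = pd.concat([base, extra], ignore_index=True)
rows_with_assignment = len(combined)

print(rows_without_assignment, rows_with_assignment)
```

Building the result row by row is also usually slower than a single reshape, which is what the answers below do.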

CodePudding user response:

Here is a solution:

import pandas as pd
import numpy as np
df = pd.DataFrame({'patientid': [1, 1, 2, 2],
                   'hour': [1, 2, 1, 2],
                   'measurementx': [13.5, 13.9, 11.5, 14.9],
                   'measurementy': [2030, 2013, 1890, 2009]})

df2 = df.set_index(['patientid', df.groupby('patientid').cumcount() + 1]).unstack()
df2.columns = df2.columns.droplevel(1)
# sort columns in steps of 2, even first then odd.  If there are 3 for each patient id, would need step of 3, etc.
df2 = df2.iloc[:, list(np.arange(0, len(df2.columns), 2)) + list(np.arange(0, len(df2.columns)-1, 2) + 1)]

df2
#Out: 
#           hour  measurementx  measurementy  hour  measurementx  measurementy
#patientid                                                                    
#1             1          13.5          2030     2          13.9          2013
#2             1          11.5          1890     2          14.9          2009

You can use .reset_index() at the end, if you want the patientid as a column.

Obviously, having multiple columns with the same name is not a great idea if you are then going to analyse it. But if you are printing it, exporting to Excel etc. then this answer works.
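
To illustrate that caveat, here is a small sketch using a hand-built frame (wide is a hypothetical stand-in for df2, with made-up layout): selecting a duplicated label returns every matching column as a DataFrame rather than a single Series, which tends to surprise later analysis code.

```python
import pandas as pd

# Hand-built frame with repeated column labels (stand-in for df2).
wide = pd.DataFrame(
    [[1, 13.5, 2, 13.9]],
    columns=["hour", "measurementx", "hour", "measurementx"],
)

# With duplicate labels, this selects BOTH "hour" columns.
selected = wide["hour"]

print(type(selected).__name__)  # a DataFrame, not a Series
print(selected.shape)
```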

CodePudding user response:

I think this might be what you want.

import pandas as pd
import io

s = '''patientid  hour measurementx measurementy
1          1    13.5         2030
1          2    13.9         2013
2          1    11.5         1890
2          2    14.9         2009'''

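df = pd.read_csv(io.StringIO(s), sep=r"\s+")
# Keyword arguments are required by recent pandas versions; the result is kept as df1 for the renaming step.
df1 = df.pivot(index="patientid", columns="hour", values=["hour", "measurementx", "measurementy"])
df1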

The result is shown below :

             hour      measurementx        measurementy
hour         1  2           1   2            1     2
patientid                       
1           1.0 2.0       13.5  13.9      2030.0  2013.0
2           1.0 2.0       11.5  14.9      1890.0  2009.0

Then rename the columns to unique names and reorder them to get the desired table.

new_names = []
for i in df1.columns :
    new_names.append(str(i[0]) + str(i[1]))
df1.columns = new_names
df1.reset_index()[["patientid", "hour1", "measurementx1", "measurementy1", "hour2", "measurementx2", "measurementy2"]]

Output :

patientid   hour1   measurementx1   measurementy1   hour2   measurementx2   measurementy2
1            1.0         13.5          2030.0        2.0         13.9       2013.0
2            1.0         11.5          1890.0        2.0         14.9       2009.0
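
If the number of hours per patient varies, the hard-coded column list above would need updating by hand. A possible variation, sketched under the assumption that the data looks like the sample in the question (the name wide and the suffix scheme are illustrative choices), sorts the column MultiIndex by hour before flattening it, so the order does not have to be typed out:

```python
import pandas as pd

df = pd.DataFrame({
    "patientid": [1, 1, 2, 2],
    "hour": [1, 2, 1, 2],
    "measurementx": [13.5, 13.9, 11.5, 14.9],
    "measurementy": [2030, 2013, 1890, 2009],
})

# One row per patient; columns become (measurement name, hour) pairs.
wide = df.pivot(index="patientid", columns="hour",
                values=["measurementx", "measurementy"])

# Sort by the hour level so each hour's measurements sit next to each other.
wide = wide.sort_index(axis=1, level=1)

# Flatten (name, hour) into unique labels such as "measurementx1".
wide.columns = [f"{name}{hour}" for name, hour in wide.columns]
wide = wide.reset_index()

print(wide)
```

The hour itself is left out of the values here because it is already encoded in each column suffix; it can be added back to the values list if a separate hour column per block is wanted.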