Iterating through a Pandas Dataframe to construct a new Dataframe

Time:04-10

I have a Dataframe containing various medical measurements of different patients over a number of hours (in this example 2). For instance, the dataframe is something like this:

patientid  hour measurementx measurementy
 1          1    13.5         2030
 1          2    13.9         2013
 2          1    11.5         1890
 2          2    14.9         2009  

Now, I need to construct a new Dataframe that basically groups all measurements for each patient, which would look like this:

patientid  hour measurementx measurementy  hour  measurementx measurementy
1          1    13.5         2030          2     13.9         2013
2          1    11.5         1890          2     14.9         2009

I'm quite new to Python and I have been struggling with this simple operation. I have been trying something like this, concatenating an empty Dataframe x_binary_compact with my data x_binary:

old_id = 1
for row in x_binary.itertuples(index = False):
    new_id = row[0]
    if new_id == old_id:
        pd.concat(x_binary_compact, row, axis=1)
    else:
        old_id = new_id
        pd.concat(x_binary_compact, row, axis=0)

But I get a

TypeError: concat() got multiple values for argument 'axis'

I know my initial idea probably isn't right, so any other possible solution will work.

CodePudding user response:

I would recommend pivoting the dataframe and collecting all values for each id into lists.

import pandas as pd

df = pd.DataFrame(data={'id': [1, 1, 2, 2], 'hour': [1, 2, 1, 2], 'measurement': [1, 2, 3, 4]})
# pivot_table returns a new dataframe, so assign the result; 'id' stays as the index
df = df.pivot_table(index='id', aggfunc=list)
print(df)
      hour measurement
id                    
1   [1, 2]      [1, 2]
2   [1, 2]      [3, 4]

And to obtain the values for a specific id:

id = 1
# Each cell holds a plain Python list, so no .values is needed
print(df.loc[id, 'measurement'])

CodePudding user response:

If df is your dataframe you could try

df_result = pd.concat(
    [sdf for _, sdf in df.set_index("patientid", drop=True).groupby("hour")],
    axis=1
)

Result for

df =

   patientid  hour  measurementx  measurementy
0          1     1          13.5          2030
1          1     2          13.9          2013
2          2     1          11.5          1890
3          2     2          14.9          2009

is

           hour  measurementx  measurementy  hour  measurementx  measurementy
patientid                                                                    
1             1          13.5          2030     2          13.9          2013
2             1          11.5          1890     2          14.9          2009
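
Note that the result above has repeated column names (hour, measurementx and measurementy each appear twice), which can make later column selection ambiguous. If unique names are preferred, one possible alternative is a plain pivot followed by flattening the column labels. This is only a sketch: it assumes each (patientid, hour) pair occurs at most once, and the `_h1` / `_h2` suffixes are a naming choice made here for illustration, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "patientid": [1, 1, 2, 2],
    "hour": [1, 2, 1, 2],
    "measurementx": [13.5, 13.9, 11.5, 14.9],
    "measurementy": [2030, 2013, 1890, 2009],
})

# One row per patient; columns become (measurement name, hour) pairs.
# pivot raises a ValueError if a (patientid, hour) pair is duplicated.
wide = df.pivot(index="patientid", columns="hour")

# Flatten the two-level column labels into single strings
# such as "measurementx_h1" (illustrative naming only).
wide.columns = [f"{name}_h{hour}" for name, hour in wide.columns]

print(wide)
```

The hour is then encoded in the column name instead of being repeated as its own column, so a value can be selected with something like wide.loc[1, "measurementx_h2"].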