Home > Enterprise >  How to align columns with in a Dataframe?
How to align columns with in a Dataframe?

Time:04-22

My data: 62 people have 400 data points, some have 200 (and 200 NaN rows).

(Rows = 400, Columns = 62)

I have to drop every other value for the individuals that have 400 data points, so that everyone has 200. I have keep the first row (index 0), drop the second (index 1), keep the third row (index 2), and so on. I cannot just drop all the rows with Nan, because then they wouldn't be aligned.

    p1   p2   p3   p4
0  0.0  1.0  0.0  1.0
1  2.0  3.0  0.3  0.4
2  4.0  5.0  NaN  NaN
3  6.0  7.0  NaN  NaN

Result

    p1   p2   p3   p4
0  0.0  1.0  0.0  1.0
1  4.0  5.0  0.3  0.4

CodePudding user response:

New answer:

Given the input, we could select the even numbered rows from the columns without NaN, reset its index and concatenate it with the columns with NaNs, after dropping their NaNs:

msk = df.isna().any()
pd.concat([df.loc[::2, ~msk].reset_index(drop=True), df.loc[:, msk].dropna()], axis=1)

Output:

    p1   p2   p3   p4
0  0.0  1.0  0.0  1.0
1  4.0  5.0  0.3  0.4

Old answer:

IIUC, you have a DataFrame that looks like below:

    p1   p2   p3   p4
0  0.0  1.0  0.0  1.0
1  2.0  3.0  NaN  NaN
2  4.0  5.0  4.0  NaN
3  6.0  7.0  NaN  7.0
4  8.0  9.0  8.0  9.0

As you noted, if we use df.dropna(), we'll remove most of the observations. IIUC, your approach would not get rid of all the NaN values. I propose an alternative approach, one that shifts all non-NaN values up in each column:

n = 3 # for your case n=200; this is the number of non-NaN values in columns with NaN
out = (df.stack()
       .groupby(level=1).sample(n)
       .reset_index(name='val')
       .assign(i=lambda x: x.groupby('level_1').cumcount())
       .pivot('i', 'level_1', 'val')
       .reset_index(drop=True).rename_axis(columns=[None]))

Output:

    p1   p2   p3   p4
0  2.0  3.0  8.0  9.0
1  4.0  5.0  4.0  1.0
2  8.0  1.0  0.0  7.0
  • Related