Move the NaNs from the end to the beginning of each column while maintaining the order of the numbers

Time:09-06

I don't know how to move the NaNs in each column to the beginning of that column, so that the non-NaN values end up at the bottom of the column in their original order.

    import pandas as pd
    import numpy as np

    q = pd.Series(dtype=int)
    w = pd.Series(data=np.array([6,23,4]))
    e = pd.Series(data=np.array([2,3,7,89,12,45]))
    r = pd.Series(data=np.array([98,25,84,12]))
    t = pd.Series(data=np.array([36,41,74,29,86,77,55]))
    
    q = pd.concat([q, w], axis=1)
    q = pd.concat([q, e], axis=1)
    q = pd.concat([q, r], axis=1)
    q = pd.concat([q, t], axis=1)
    
    print(q)

CodePudding user response:

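Use Series.sort_values with the key parameter on each column via DataFrame.apply. Sorting on pd.notna puts the NaNs first, and kind="mergesort" keeps the sort stable, so the numbers stay in their original order (the default quicksort does not guarantee that):

df = pd.concat([q, w, e, r, t], axis=1).apply(lambda x: x.sort_values(key=pd.notna, kind="mergesort").to_numpy())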

print(df)
    0     1     2     3   4
0 NaN   NaN   NaN   NaN  36
1 NaN   NaN   2.0   NaN  41
2 NaN   NaN   3.0   NaN  74
3 NaN   NaN   7.0  98.0  29
4 NaN   6.0  89.0  25.0  86
5 NaN  23.0  12.0  84.0  77
6 NaN   4.0  45.0  12.0  55
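
To check that the relative order really survives on longer columns, here is a minimal self-contained sketch of the same idea. The helper name nans_to_top and the small sample frame are assumptions made for illustration, not part of the answer above; the sketch builds the result column by column instead of going through apply.

```python
import numpy as np
import pandas as pd


def nans_to_top(df):
    """Move NaNs to the top of every column, keeping the other values in order.

    Hypothetical helper: sorts each column on the boolean mask pd.notna
    (False sorts before True) with a stable algorithm.
    """
    sorted_cols = {
        name: col.sort_values(key=pd.notna, kind="mergesort").to_numpy()
        for name, col in df.items()
    }
    return pd.DataFrame(sorted_cols, index=df.index)


# Sample data with trailing NaNs, as produced by concatenating short Series
df = pd.DataFrame({"w": [6, 23, 4, np.nan, np.nan], "e": [2, 3, 7, 89, 12]})
result = nans_to_top(df)
print(result)
```

Because the sort key is only True/False, a stable sort is what keeps ties (all the non-NaN values) in their original sequence.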

Another idea is to reverse each array before concat, and afterwards flip the rows back with DataFrame.sort_index(ascending=False):

q = pd.Series(dtype=int)
w = pd.Series(data=np.array([6,23,4])[::-1])
e = pd.Series(data=np.array([2,3,7,89,12,45])[::-1])
r = pd.Series(data=np.array([98,25,84,12])[::-1])
t = pd.Series(data=np.array([36,41,74,29,86,77,55])[::-1])


df = pd.concat([q, w,e,r,t], axis=1).sort_index(ascending=False, ignore_index=True)
print(df)
    0     1     2     3   4
0 NaN   NaN   NaN   NaN  36
1 NaN   NaN   2.0   NaN  41
2 NaN   NaN   3.0   NaN  74
3 NaN   NaN   7.0  98.0  29
4 NaN   6.0  89.0  25.0  86
5 NaN  23.0  12.0  84.0  77
6 NaN   4.0  45.0  12.0  55
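
The reversal trick can also be wrapped in a small function. This is only a sketch: the name stack_bottom_aligned is hypothetical, and it assumes the input is a list of plain Python lists rather than the Series used above.

```python
import pandas as pd


def stack_bottom_aligned(columns):
    """Concatenate lists as columns so that shorter ones are padded at the top.

    Hypothetical helper: each list is reversed, so after concat the padding
    NaNs sit at the bottom; sorting the index in descending order then flips
    the rows, which restores the original order and moves the NaNs to the top.
    """
    reversed_series = [pd.Series(values[::-1]) for values in columns]
    frame = pd.concat(reversed_series, axis=1)
    return frame.sort_index(ascending=False, ignore_index=True)


df = stack_bottom_aligned([[6, 23, 4], [2, 3, 7, 89, 12, 45]])
print(df)
```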

CodePudding user response:

Would this make sense to you?

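import pandas as pd

columns = {
    "w": [6, 23, 4],
    "e": [2, 3, 7, 89, 12, 45],
    "r": [98, 25, 84, 12],
    "t": [36, 41, 74, 29, 86, 77, 55],
}

max_length = max(len(values) for values in columns.values())
# Left-pad every list with None so that all columns have max_length entries;
# pandas turns the None entries into NaN when it builds the columns
data = {
    name: [None] * (max_length - len(values)) + values
    for name, values in columns.items()
}
df = pd.DataFrame(data)
print(df)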

Result:

      w     e     r   t
0   NaN   NaN   NaN  36
1   NaN   2.0   NaN  41
2   NaN   3.0   NaN  74
3   NaN   7.0  98.0  29
4   6.0  89.0  25.0  86
5  23.0  12.0  84.0  77
6   4.0  45.0  12.0  55

CodePudding user response:

My solution (maybe not the best):

I would use the shift function and apply it to each column of q, the DataFrame built in the question.

First we need to give the columns unique names, because the shift amount for each column is looked up by column name below.

col_names = ["A", "B", "C", "D", "E"]

q.columns = col_names  # rename the columns from the list

cols_nan = q.isna().sum()  # count the NaN values in each column (a Series indexed by column name)

df = q.apply(lambda x: x.shift(periods=int(cols_nan[x.name])))  # shift each column down by its NaN count

Output:

   A     B     C     D   E
0 NaN   NaN   NaN   NaN  36
1 NaN   NaN   2.0   NaN  41
2 NaN   NaN   3.0   NaN  74
3 NaN   NaN   7.0  98.0  29
4 NaN   6.0  89.0  25.0  86
5 NaN  23.0  12.0  84.0  77
6 NaN   4.0  45.0  12.0  55
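
A self-contained sketch of the shift approach follows. The helper name shift_trailing_nans and the sample frame are assumptions for illustration. Note that this only works when all NaNs in a column sit at the end, as they do after concat here; with a NaN in the middle, the shift would push real values off the bottom of the column.

```python
import numpy as np
import pandas as pd


def shift_trailing_nans(df):
    """Shift each column down by its NaN count (hypothetical helper).

    Assumes every NaN in a column is trailing, so the shift moves exactly
    the NaN rows off the bottom and introduces the same number at the top.
    """
    nan_counts = df.isna().sum()
    return df.apply(lambda col: col.shift(periods=int(nan_counts[col.name])))


df = pd.DataFrame({"B": [6, 23, 4, np.nan, np.nan], "E": [36, 41, 74, 29, 86]})
shifted = shift_trailing_nans(df)
print(shifted)
```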