I don't know how to move the NaN values in each column to the top of that column, so that the non-NaN values end up at the bottom of the column.
import pandas as pd
import numpy as np
q = pd.Series(dtype=int)
w = pd.Series(data=np.array([6,23,4]))
e = pd.Series(data=np.array([2,3,7,89,12,45]))
r = pd.Series(data=np.array([98,25,84,12]))
t = pd.Series(data=np.array([36,41,74,29,86,77,55]))
q = pd.concat([q, w], axis=1)
q = pd.concat([q, e], axis=1)
q = pd.concat([q, r], axis=1)
q = pd.concat([q, t], axis=1)
print(q)
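As a side note, the same input layout can probably be built in one step. The sketch below assumes named columns are acceptable; the names `w`, `e`, `r`, `t` are borrowed from the variable names and are otherwise arbitrary. It also leaves out the all-NaN first column that the empty `pd.Series(dtype=int)` adds in the question. Passing a dict of Series to `pd.DataFrame` aligns them on the union of their indexes, so the shorter Series are padded with NaN at the bottom:

```python
import numpy as np
import pandas as pd

w = pd.Series(np.array([6, 23, 4]))
e = pd.Series(np.array([2, 3, 7, 89, 12, 45]))
r = pd.Series(np.array([98, 25, 84, 12]))
t = pd.Series(np.array([36, 41, 74, 29, 86, 77, 55]))

# Aligning on the union index (0..6) pads the shorter Series with NaN at the bottom
q = pd.DataFrame({"w": w, "e": e, "r": r, "t": t})
print(q)
```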
Answer:
Use Series.sort_values with its key parameter (available since pandas 1.1.0), applied to each column through DataFrame.apply:
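# q is the empty pd.Series(dtype=int) from the question, before any concat
# pd.notna sorts NaN (False) first; kind="stable" keeps the other values in order
df = pd.concat([q, w, e, r, t], axis=1).apply(lambda x: x.sort_values(key=pd.notna, kind="stable").to_numpy())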
print(df)
     0     1     2     3   4
0  NaN   NaN   NaN   NaN  36
1  NaN   NaN   2.0   NaN  41
2  NaN   NaN   3.0   NaN  74
3  NaN   NaN   7.0  98.0  29
4  NaN   6.0  89.0  25.0  86
5  NaN  23.0  12.0  84.0  77
6  NaN   4.0  45.0  12.0  55
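A self-contained sketch of the same sort_values approach wrapped in a reusable function (the name `nan_to_top` is hypothetical). It is tried on a longer column with NaNs scattered through it, which is where `kind="stable"` matters: the default `kind="quicksort"` is not guaranteed to keep equal keys in their original order, so without it the non-NaN values could come out reordered. The `.to_numpy()` call drops the sorted index so that `apply` writes the values back by position:

```python
import numpy as np
import pandas as pd

def nan_to_top(df: pd.DataFrame) -> pd.DataFrame:
    """Move NaNs to the top of each column, keeping the other values in order."""
    # pd.notna maps NaN -> False and values -> True; False sorts first.
    # kind="stable" preserves the original order among equal keys.
    return df.apply(
        lambda col: col.sort_values(key=pd.notna, kind="stable").to_numpy()
    )

# A 40-row column with a NaN at every third position
values = np.arange(40, dtype=float)
values[::3] = np.nan
df = pd.DataFrame({"a": values})
out = nan_to_top(df)
print(out)
```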
Another idea is to reverse each array before the concat, and then reverse the row order again with DataFrame.sort_index:
q = pd.Series(dtype=int)
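# Reverse each array so that its last value gets index label 0
w = pd.Series(data=np.array([6,23,4])[::-1])
e = pd.Series(data=np.array([2,3,7,89,12,45])[::-1])
r = pd.Series(data=np.array([98,25,84,12])[::-1])
t = pd.Series(data=np.array([36,41,74,29,86,77,55])[::-1])
# Shorter columns have NaN at the highest labels; sorting descending moves those rows to the top
df = pd.concat([q, w,e,r,t], axis=1).sort_index(ascending=False, ignore_index=True)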
print(df)
     0     1     2     3   4
0  NaN   NaN   NaN   NaN  36
1  NaN   NaN   2.0   NaN  41
2  NaN   NaN   3.0   NaN  74
3  NaN   NaN   7.0  98.0  29
4  NaN   6.0  89.0  25.0  86
5  NaN  23.0  12.0  84.0  77
6  NaN   4.0  45.0  12.0  55
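The reversal trick works because it controls which index labels each value receives. A minimal sketch of the same idea without reversing anything: give each Series an index that ends on the last row, so the outer join in `concat` leaves the missing rows, at the top, as NaN. The helper name `align_bottom` is hypothetical, and the final `sort_index()` is there in case `concat` returns the union of the indexes in first-seen order rather than sorted:

```python
import pandas as pd

def align_bottom(*arrays) -> pd.DataFrame:
    """Concatenate 1-D sequences as columns, aligned to the last row."""
    n = max(len(a) for a in arrays)
    # Index each column by positions n - len(a) .. n - 1 so it ends on row n - 1
    parts = [pd.Series(a, index=range(n - len(a), n)) for a in arrays]
    # Outer join on the index: rows a part does not cover become NaN
    return pd.concat(parts, axis=1).sort_index()

df = align_bottom([6, 23, 4], [2, 3, 7, 89, 12, 45], [98, 25, 84, 12],
                  [36, 41, 74, 29, 86, 77, 55])
print(df)
```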
Answer:
Would this make sense to you?
import numpy as np
import pandas as pd

columns = {
    "w": [6, 23, 4],
    "e": [2, 3, 7, 89, 12, 45],
    "r": [98, 25, 84, 12],
    "t": [36, 41, 74, 29, 86, 77, 55],
}
max_length = max(len(column) for column in columns.values())
# Left-pad each list with NaN so every column has max_length entries
data = {
    name: [np.nan] * (max_length - len(column)) + column
    for name, column in columns.items()
}
df = pd.DataFrame(data)
print(df)
Result:
      w     e     r   t
0   NaN   NaN   NaN  36
1   NaN   2.0   NaN  41
2   NaN   3.0   NaN  74
3   NaN   7.0  98.0  29
4   6.0  89.0  25.0  86
5  23.0  12.0  84.0  77
6   4.0  45.0  12.0  55
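The padding idea can also be written with a preallocated NumPy array: start from an all-NaN array and copy each list into the bottom rows of its column. A sketch, assuming an all-float result is acceptable (the NaN fill makes every column float, including `t`) and reusing the column names from the result above:

```python
import numpy as np
import pandas as pd

columns = {
    "w": [6, 23, 4],
    "e": [2, 3, 7, 89, 12, 45],
    "r": [98, 25, 84, 12],
    "t": [36, 41, 74, 29, 86, 77, 55],
}
max_length = max(len(col) for col in columns.values())
# One row per position, one column per list, prefilled with NaN
arr = np.full((max_length, len(columns)), np.nan)
for j, col in enumerate(columns.values()):
    # Fill the last len(col) rows of column j; the rows above stay NaN
    arr[max_length - len(col):, j] = col
df = pd.DataFrame(arr, columns=list(columns))
print(df)
```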
Answer:
My solution (maybe not the best):
I would use the shift method and apply it to each column.
But first the columns need unique names, because the shift amount for each column is looked up by column name.
col_names = ["A", "B", "C", "D", "E"]
q.columns = col_names  # give the five columns unique names
cols_nan = q.isna().sum()  # number of NaN values per column, as a Series indexed by column name
# Shift each column down by its NaN count (assumes all NaNs are at the bottom of the column)
df = q.apply(lambda x: x.shift(periods=cols_nan[x.name]))
print(df)
Output:
     A     B     C     D   E
0  NaN   NaN   NaN   NaN  36
1  NaN   NaN   2.0   NaN  41
2  NaN   NaN   3.0   NaN  74
3  NaN   NaN   7.0  98.0  29
4  NaN   6.0  89.0  25.0  86
5  NaN  23.0  12.0  84.0  77
6  NaN   4.0  45.0  12.0  55
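A variant of the shift approach, as a sketch: since `apply` passes each column to the lambda as a Series, the NaN count can be computed inside the lambda, so the lookup by column name (and the renaming it requires) is not needed for this step. Like the approach above, it assumes every NaN sits at the bottom of its column, which is how the question's `concat` produces them; a NaN in the middle of a column would not be handled correctly. The two-column frame here is only a small stand-in for the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "w": pd.Series([6, 23, 4]),
    "t": pd.Series([36, 41, 74, 29, 86, 77, 55]),
})
# Shift each column down by its own number of (trailing) NaNs
out = df.apply(lambda col: col.shift(periods=int(col.isna().sum())))
print(out)
```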