I have a column of data in which some rows are NaN (see image). I want to split the column at the NaN values and start a new column wherever a value appears after a NaN. For instance, a new column should start at row 7 and continue through the subsequent rows until the next NaN. I have tried this, but it crams all the data together.
Col1
0 Start
1 65
2 oft
3 23:59:02
4 12-Feb-99
5 NaN
6 NaN
7 17
8 Sparkle
9 10
I have used the code below to break them into groups:
df['groups'] = df['Col1'].isnull().cumsum()
Col1 groups
0 Start 0
1 65 0
2 oft 0
3 23:59:02 0
4 12-Feb-99 0
5 NaN 1
6 NaN 2
7 17 2
8 Sparkle 2
9 10 2
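To make the grouping step concrete, here is a minimal sketch that rebuilds the sample column and applies the same `isna().cumsum()` idea. The sample values are typed in from the table above and stored as strings, which is an assumption about the real data. Note that each NaN row receives a label too, so those rows would need to be dropped before reshaping.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed to be strings plus NaN)
df = pd.DataFrame(
    {"Col1": ["Start", "65", "oft", "23:59:02", "12-Feb-99",
              np.nan, np.nan, "17", "Sparkle", "10"]},
    dtype=object,
)

# True on NaN rows; the running sum goes up by one at every NaN,
# so all rows after a NaN share a label until the next NaN appears
is_gap = df["Col1"].isna()
df["groups"] = is_gap.cumsum()

print(df)
```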
I now intend to stack the data into different columns based on the group numbers:
Col1 Col2 Col3 ... ColN
0 Start NaN NaN ...
1 65 17 ....
2 oft Sparkle ....
3 23:59:02 10 ...
4 12-Feb-99
CodePudding user response:
I suggest slicing the pandas DataFrame manually by index instead of using NumPy to slice.
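import numpy as np
import pandas as pd

# Positions of the NaN rows (assumes a default 0..n-1 integer index
# and that the data column is named "col")
index = df.index[df["col"].isna()].to_list()
# Each segment starts right after a NaN and ends right before the next one
starting_index = [0] + [i + 1 for i in index]
ending_index = [i - 1 for i in index] + [len(df) - 1]
n = 0
for i, j in zip(starting_index, ending_index):
    if i <= j:  # skip empty segments, e.g. between consecutive NaNs
        n += 1
        # Object dtype so mixed text and numbers can be assigned
        df[f"col{n}"] = pd.Series(np.nan, index=df.index, dtype=object)
        df.loc[: j - i, f"col{n}"] = df.loc[i:j, "col"].values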
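As an alternative that builds on the group numbers from the question, the sketch below drops the NaN rows, groups the remaining values by their cumulative NaN count, and places the groups side by side. The helper name `split_on_nan` and the `Col1..ColN` output names are hypothetical, and the sketch assumes the input index has no duplicates.

```python
import numpy as np
import pandas as pd

def split_on_nan(series: pd.Series) -> pd.DataFrame:
    """Split a Series into side-by-side columns at each run of NaN values."""
    groups = series.isna().cumsum()   # label goes up by one at every NaN
    valid = series.dropna()           # keep only the real values
    # One chunk per label, re-indexed from 0 so the chunks line up by row
    pieces = [
        chunk.reset_index(drop=True)
        for _, chunk in valid.groupby(groups.loc[valid.index])
    ]
    names = [f"Col{k}" for k in range(1, len(pieces) + 1)]
    # Shorter chunks are padded with NaN at the bottom
    return pd.concat(pieces, axis=1, keys=names)

# Sample data reconstructed from the question (assumed to be strings plus NaN)
col = pd.Series(
    ["Start", "65", "oft", "23:59:02", "12-Feb-99",
     np.nan, np.nan, "17", "Sparkle", "10"],
    dtype=object,
)
result = split_on_nan(col)
print(result)
```

With this approach, consecutive NaN rows produce no empty column, because a label that only covers NaN rows disappears once the NaN rows are dropped.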