I have a DataFrame that looks like this:
| characters | result |
|:----------:|:------:|
| b | TP |
| a | TP |
| t | FN |
| NaN | None |
| c | TN |
| o | FP |
| p | TP |
I previously exploded it from the words "bat" and "cop". Each word is separated by a NaN row. I would like to bring the words back into the DataFrame, like this:
| characters | result | word |
|:----------:|:------:|:----:|
| b | TP | bat |
| a | TP | bat |
| t | FN | bat |
| NaN | None | None |
| c | TN | cop |
| o | FP | cop |
| p | TP | cop |
Edit:
Please ignore the result column. Only the characters and word columns matter here. The original DataFrame consisted of the word column, and I applied pandas explode() to get the characters column.
CodePudding user response:
You could create a custom group key that identifies each run of consecutive non-NaN values, join the characters within each group, and map the result back onto the original DataFrame:
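# True on the NaN separator rows
m = df['characters'].isna()
# new group id at every NaN/non-NaN switch; NaN rows get no group id
group = (m != m.shift()).cumsum().mask(m)
# join each group's characters into one word
to_map = df.groupby(group)['characters'].agg(''.join)
# look up each row's word by its group id
df['word'] = group.map(to_map)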
Output:
  characters result word
0          b     TP  bat
1          a     TP  bat
2          t     FN  bat
3        NaN   None  NaN
4          c     TN  cop
5          o     FP  cop
6          p     TP  cop
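For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample DataFrame is rebuilt by hand from the table in the question, which is an assumption about how the real data is stored (in particular, that the separator rows hold real NaN values and not the string "NaN"). It also uses a slightly simpler group key, `m.cumsum()`, which should behave the same as long as the only thing separating words is one or more NaN rows.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({
    "characters": ["b", "a", "t", np.nan, "c", "o", "p"],
    "result": ["TP", "TP", "FN", None, "TN", "FP", "TP"],
})

# True on the NaN separator rows.
m = df["characters"].isna()

# Running count of separators seen so far acts as a word id;
# the separator rows themselves are masked out so they belong to no word.
group = m.cumsum().mask(m)

# One joined string per word id.
words = df.groupby(group)["characters"].agg("".join)

# Broadcast each word back to the rows it came from; separator rows stay NaN.
df["word"] = group.map(words)

print(df)
```

If the separators are the literal string "NaN", build the mask with `df["characters"].eq("NaN")` instead of `isna()`.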