How to implode (back to the original) pandas characters into words separated by NaN rows?

Time:10-25

I have a DataFrame that looks like this:

| characters | result |
|:----------:|:------:| 
| b          | TP    |
| a          | TP    | 
| t          | FN    | 
| NaN        | None  | 
| c          | TN    |  
| o          | FP    |  
| p          | TP    |  

I previously exploded it from the words "bat" and "cop". Each word is separated by a NaN row. I would like to bring the words back, in a DataFrame like this:

| characters | result | word |
|:----------:|:------:|:----:|
| b          | TP     | bat  |
| a          | TP     | bat  |
| t          | FN     | bat  |
| NaN        | None   | None |
| c          | TN     | cop  |
| o          | FP     | cop  |
| p          | TP     | cop  |

Edit: Please ignore the result column. Only the characters and word columns matter here. The original DataFrame had only the word column; I applied pandas explode() to it to get the characters column.

CodePudding user response:

You could create a custom group key that identifies each run of consecutive non-NaN values, join the characters within each group, then map the joined words back onto the original DataFrame:

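m = df['characters'].isna()  # True on the NaN separator rows
group = (m!=m.shift()).cumsum().mask(m)  # new id at each NaN/non-NaN switch, NaN on separators
to_map = df.groupby(group)['characters'].apply(lambda g: ''.join(g))  # one joined word per id
df['word'] = group.map(to_map)  # broadcast each word back to its rows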

output:

  characters result word
0          b     TP  bat
1          a     TP  bat
2          t     FN  bat
3        NaN   None  NaN
4          c     TN  cop
5          o     FP  cop
6          p     TP  cop
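
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and wraps the same idea in a helper; the name `add_word_column` and its `col` parameter are hypothetical, not part of pandas. It also swaps in a slightly simpler group key, `is_sep.cumsum()`, in place of the `m != m.shift()` comparison used above, on the assumption that every NaN row is a separator.

```python
import numpy as np
import pandas as pd


def add_word_column(df, col="characters"):
    """Hypothetical helper: label each row with the word its character belongs to."""
    is_sep = df[col].isna()  # True on the NaN separator rows
    # The running count of separators seen so far gives one id per word;
    # mask() blanks that id on the separator rows themselves.
    word_id = is_sep.cumsum().mask(is_sep)
    # groupby drops NaN keys by default, so separator rows are never joined.
    words = df[col].groupby(word_id).agg("".join)
    out = df.copy()
    out["word"] = word_id.map(words)  # separator rows map to NaN
    return out


# Sample input rebuilt from the question's table.
df = pd.DataFrame({
    "characters": ["b", "a", "t", np.nan, "c", "o", "p"],
    "result": ["TP", "TP", "FN", None, "TN", "FP", "TP"],
})

out = add_word_column(df)
print(out)
```

The helper returns a copy instead of modifying `df` in place, which is only a design preference. Because the key is a running count of separators, two NaN rows in a row should still yield distinct word ids for the words on either side.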