Add values to multiple columns of a pandas DataFrame based on a condition

Time:11-16

I have a DataFrame that looks like this:

  corpus zero_level_name time labels  A  B  C
0     ff                    f         1  1
1     gg               g    G
2     hh               h    H         1  1  1
3     ii               i    I
4     jj               j    J         1

I want to put 0 in all the empty cells of columns A to C. Is it possible to do this in one go?

CodePudding user response:

Assuming you have either NaNs or empty strings in your DataFrame, you can use:

df.update(df.loc[:, 'A':'C'].replace('', 0).fillna(0))

NB: this produces no output; the DataFrame is modified in place.

Also note that changing the values does not change the dtypes. If you need integers, run this instead:

cols = df.loc[:, 'A':'C'].columns
df[cols] = df[cols].replace('', 0).fillna(0).astype(int)

Updated df:

  corpus zero_level_name time labels  A  B  C
0     ff                    f         1  1  0
1     gg               g    G         0  0  0
2     hh               h    H         1  1  1
3     ii               i    I         0  0  0
4     jj               j    J         1  0  0
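
To compare the two variants side by side, here is a minimal sketch on a small made-up frame. The sample values and the helper `make_df` are assumptions for illustration, not the asker's data. The first variant fills the cells in place with `update`; the second assigns the result back and casts A to C to integers.

```python
import numpy as np
import pandas as pd


def make_df():
    # Hypothetical sample: A..C mix integers, empty strings and NaN.
    return pd.DataFrame({
        "corpus": ["ff", "gg", "hh"],
        "A": [1, "", 1],
        "B": [1, np.nan, 1],
        "C": ["", np.nan, 1],
    })


# Variant 1: update() returns None and modifies df1 itself.
df1 = make_df()
ret = df1.update(df1.loc[:, "A":"C"].replace("", 0).fillna(0))

# Variant 2: assign back and cast, so A..C end up with an integer dtype.
df2 = make_df()
cols = df2.loc[:, "A":"C"].columns
df2[cols] = df2[cols].replace("", 0).fillna(0).astype(int)

# Print the dtypes of both frames to compare them.
print(df1.dtypes)
print(df2.dtypes)
```

Depending on the pandas version, `replace` and `fillna` on object columns may emit a FutureWarning about downcasting; the filled values are the same either way.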

If you only have empty strings:

df.update(df.loc[:, 'A':'C'].replace('', 0))

Or only NaNs:

df.update(df.loc[:, 'A':'C'].fillna(0))

CodePudding user response:

So there's probably a better way to do this - but the first thing that comes to mind is:

pd.concat([df[['corpus', 'zero_level_name', 'time', 'labels']],df[['A','B','C']].fillna(0)], axis=1)

I think that gets what you're looking for: the other columns are kept as is, the blank (NaN) cells in A to C are filled with 0, and everything comes back as one DataFrame.
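
A minimal sketch of the same idea, on a made-up frame whose blanks are NaN (the sample values and the names `fill_cols`, `other_cols` and `result` are assumptions for illustration). Note that `concat` returns a new DataFrame and leaves the original untouched, and that `fillna` does not touch empty strings:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the blanks here are NaN, not empty strings.
df = pd.DataFrame({
    "corpus": ["ff", "gg"],
    "labels": [np.nan, np.nan],
    "A": [1.0, np.nan],
    "B": [np.nan, np.nan],
})

fill_cols = ["A", "B"]
other_cols = [c for c in df.columns if c not in fill_cols]

# Glue the untouched columns and the filled columns back together.
result = pd.concat([df[other_cols], df[fill_cols].fillna(0)], axis=1)
print(result)
```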

CodePudding user response:

import numpy as np
import pandas as pd

# Rebuild the example frame; None marks the empty cells.
# Each inner list is one column, hence the transpose (.T).
df = pd.DataFrame(
    data=np.array([
        ['ff', 'gg', 'hh', 'ii', 'jj'],
        [None, 'g', 'h', 'i', 'j'],
        ['f', 'G', 'H', 'I', 'J'],
        [None, None, None, None, None],
        [1, None, 1, None, 1],
        [1, None, 1, None, None],
        [None, None, 1, None, None]
    ]).T,
    columns=['corpus', 'zero_level_name', 'time', 'labels', 'A', 'B', 'C'],
)

# Fill the missing values in A, B and C only; the other columns stay as they are.
df[['A', 'B', 'C']] = df[['A', 'B', 'C']].fillna(0)

CodePudding user response:

Select the relevant columns and apply a mask.

cols = ['A', 'B', 'C']
df[cols] = df[cols].mask(df[cols] == '', 0)
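
As a quick check of what the mask does, here is a small sketch on made-up data (the sample values are assumptions). The condition `df[cols] == ''` only matches empty strings, so a NaN cell is left as it is unless you also call `fillna(0)`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: empty strings plus one NaN in column A.
df = pd.DataFrame({
    "corpus": ["ff", "gg", "hh"],
    "A": [1, "", np.nan],
    "B": ["", 1, ""],
})

cols = ["A", "B"]
# Replace every cell equal to '' with 0; all other cells are kept.
df[cols] = df[cols].mask(df[cols] == "", 0)
print(df)
```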