Conditionally Set Values Greater Than 0 To 1

Time:10-29

I have a dataframe that looks like this, with many more date columns:

              AUTHOR        2022-07-01  2022-10-14      2022-10-15 .....
0            Kathrine          0.0         7.0              0.0
1            Catherine         0.0         13.0             17.0
2            Amanda Jane       0.0         0.0              0.0
3            Jaqueline         0.0         3.0              0.0
4            Christine         0.0         0.0              0.0

I would like to set values in each column after the AUTHOR to 1 when the value is greater than 0, so the resulting table would look like this:

              AUTHOR        2022-07-01  2022-10-14      2022-10-15 .....
0            Kathrine          0.0         1.0              0.0
1            Catherine         0.0         1.0              1.0
2            Amanda Jane       0.0         0.0              0.0
3            Jaqueline         0.0         1.0              0.0
4            Christine         0.0         0.0              0.0

I tried the following line of code but got an error, which makes sense, because the assignment also touches the non-numeric AUTHOR column. I need to figure out how to apply this code to just the date columns while keeping the AUTHOR column in my table.

Counts[Counts != 0] = 1


TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

CodePudding user response:

You can select the date columns first, then apply mask to just those columns:

cols = df.drop(columns='AUTHOR').columns
# or
cols = df.filter(regex=r'\d{4}-\d{2}-\d{2}').columns  # raw string so \d is not treated as a string escape
# or
cols = df.select_dtypes(include='number').columns

df[cols] = df[cols].mask(df[cols] != 0, 1)
print(df)

        AUTHOR  2022-07-01  2022-10-14  2022-10-15
0     Kathrine         0.0         1.0         0.0
1    Catherine         0.0         1.0         1.0
2  Amanda Jane         0.0         0.0         0.0
3    Jaqueline         0.0         1.0         0.0
4    Christine         0.0         0.0         0.0
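
For reference, here is a minimal end-to-end sketch of the same idea. The sample frame is rebuilt from the three date columns shown in the question, and it uses `> 0` to match the wording of the question; on non-negative counts this should behave the same as `!= 0`, and it only differs if negative values are present.

```python
import pandas as pd

# Sample data copied from the question (only the three date columns shown there).
df = pd.DataFrame({
    "AUTHOR": ["Kathrine", "Catherine", "Amanda Jane", "Jaqueline", "Christine"],
    "2022-07-01": [0.0, 0.0, 0.0, 0.0, 0.0],
    "2022-10-14": [7.0, 13.0, 0.0, 3.0, 0.0],
    "2022-10-15": [0.0, 17.0, 0.0, 0.0, 0.0],
})

# Treat every column except AUTHOR as a date column.
cols = df.columns.drop("AUTHOR")

# Replace values greater than 0 with 1; all other values are left untouched.
df[cols] = df[cols].mask(df[cols] > 0, 1)
print(df)
```

Because only the numeric columns are selected, the assignment never touches AUTHOR, which is what avoids the mixed-types error.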

CodePudding user response:

Since you only want to exclude the first column, you could set it as the index, apply the mask to the remaining columns, and then reset the index at the end.

# The chain returns a new frame, so assign the result back to df.
df = df.set_index('AUTHOR').pipe(lambda g: g.mask(g > 0, 1)).reset_index()
df

     AUTHOR  2022-10-14  2022-10-15
0  Kathrine         0.0         1.0
1  Cathrine         1.0         1.0
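
The index-based approach can be sketched as a self-contained script. The small frame below is an assumption made for illustration (three rows and two date columns taken from the question), and `result` is just a name chosen here.

```python
import pandas as pd

# Illustrative sample: a subset of the rows and columns from the question.
df = pd.DataFrame({
    "AUTHOR": ["Kathrine", "Catherine", "Amanda Jane"],
    "2022-10-14": [7.0, 13.0, 0.0],
    "2022-10-15": [0.0, 17.0, 0.0],
})

result = (
    df.set_index("AUTHOR")               # move the text column into the index
      .pipe(lambda g: g.mask(g > 0, 1))  # set positive values to 1
      .reset_index()                     # restore AUTHOR as a regular column
)
print(result)
```

None of these calls modify `df` in place, so the original counts stay available in `df` and the converted values live in `result`.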