I have a dataframe that looks like this, with many more date columns
AUTHOR 2022-07-01 2022-10-14 2022-10-15 .....
0 Kathrine 0.0 7.0 0.0
1 Catherine 0.0 13.0 17.0
2 Amanda Jane 0.0 0.0 0.0
3 Jaqueline 0.0 3.0 0.0
4 Christine 0.0 0.0 0.0
I would like to set values in each column after the AUTHOR
to 1 when the value is greater than 0, so the resulting table would look like this:
AUTHOR 2022-07-01 2022-10-14 2022-10-15 .....
0 Kathrine 0.0 1.0 0.0
1 Catherine 0.0 1.0 1.0
2 Amanda Jane 0.0 0.0 0.0
3 Jaqueline 0.0 1.0 0.0
4 Christine 0.0 0.0 0.0
I tried the following line of code but got an error, which makes sense. As I need to figure out how to apply this code just to the date columns while also keeping the AUTHOR
column in my table.
Counts[Counts != 0] = 1
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
CodePudding user response:
You can select the date column first then mask on these columns
cols = df.drop(columns='AUTHOR').columns
# or
cols = df.filter(regex='\d{4}-\d{2}-\d{2}').columns
# or
cols = df.select_dtypes(include='number').columns
df[cols] = df[cols].mask(df[cols] != 0, 1)
print(df)
AUTHOR 2022-07-01 2022-10-14 2022-10-15
0 Kathrine 0.0 1.0 0.0
1 Catherine 0.0 1.0 1.0
2 Amanda Jane 0.0 0.0 0.0
3 Jaqueline 0.0 1.0 0.0
4 Christine 0.0 0.0 0.0
CodePudding user response:
Since you would like to only exclude the first column you could first set it as index and then create your booleans. In the end you will reset the index.
df.set_index('AUTHOR').pipe(lambda g: g.mask(g > 0, 1)).reset_index()
df
AUTHOR 2022-10-14 2022-10-15
0 Kathrine 0.0 1.0
1 Cathrine 1.0 1.0