I'm building out some error logging and trying to catch null values in specific columns.
Essentially, I want to go from a dataframe and a list of columns to a dataframe with an extra column listing which of those columns are null for each row. Note, I will also be doing the same for negative values etc.
Example:
columns_list = ['A','B','D']
Date | A | B | C | D |
---|---|---|---|---|
2022-01-01 | 1 | 22 | 1231 | -121 |
2022-01-02 | 11 | NaN | NaN | NaN |
2022-01-03 | NaN | 52 | 12 | 0 |
2022-01-04 | 11 | 27 | NaN | 3434 |
The following code gives the output below, but I want to use columns_list so that column C is not returned in X:
df['X'] = df.apply(lambda x: ','.join(x[x.isnull()].index), axis=1)
Date | A | B | C | D | X |
---|---|---|---|---|---|
2022-01-02 | 11 | NaN | NaN | NaN | B,C,D |
2022-01-03 | NaN | 52 | 12 | 0 | A |
2022-01-04 | 11 | 27 | NaN | 3434 | C |
Thanking you all in advance!
CodePudding user response:
Just subset your columns:
df['X'] = df[columns_list].apply(lambda x: ','.join(x[x.isnull()].index), axis=1)
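For reference, here's a minimal runnable sketch of the subsetting approach, assuming a reconstruction of the example dataframe from the question; the `neg` column is a hypothetical extension showing the same pattern applied to the negative-value check mentioned in the question:
import pandas as pd
import numpy as np

# Reconstruction of the example dataframe from the question
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04']),
    'A': [1, 11, np.nan, 11],
    'B': [22, np.nan, 52, 27],
    'C': [1231, np.nan, 12, np.nan],
    'D': [-121, np.nan, 0, 3434],
})
columns_list = ['A', 'B', 'D']

# Restrict the row-wise check to columns_list, then join the names of the
# columns that are null in each row
df['X'] = df[columns_list].apply(lambda x: ','.join(x[x.isnull()].index), axis=1)

# Hypothetical extension: the same pattern flags negative values instead
df['neg'] = df[columns_list].apply(lambda x: ','.join(x[x < 0].index), axis=1)

print(df)
With columns_list = ['A','B','D'], X comes out as 'B,D' for 2022-01-02 and 'A' for 2022-01-03; C is no longer reported, and rows with nothing to flag get an empty string.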