I am doing cohort analysis and the dataset I'm using has 15 months as the name as columns with revenue and around 7k user_id rows. I need to get a new column with the month when the user was last time active.
2021-01-01 | 2021-02-01 |
---|---|
3456. | Nan |
Nan. | 8679 |
Result should be like this
2021-01-01 | 2021-02-01 | Last_month |
---|---|---|
3456. | Nan | 2021-01-01 |
Nan. | 8679 | 2021-02-01 |
I have tried few options but it didnt work
users.apply(pd.Series.last_valid_index)
CodePudding user response:
using a boolean and idxmax()
might be the solution here
df['last_month'] = (~df.isna()).idxmax(axis=1)
print(df)
2021-01-01 2021-02-01 last_month
0 3456 NaN 2021-01-01
1 NaN 8679 2021-02-01
CodePudding user response:
Example
data = {'2021-01-01': {0: 3456, 1: None}, '2021-02-01': {0: None, 1: 8679}}
df = pd.DataFrame(data)
df
2021-01-01 2021-02-01
0 3456.0 NaN
1 NaN 8679.0
Code
df.apply(lambda x: x.last_valid_index(), axis=1)
output:
0 2021-01-01
1 2021-02-01
dtype: object
full Code
make output to Last_month
column
df.assign(Last_month=df.apply(lambda x: x.last_valid_index(), axis=1))
result:
2021-01-01 2021-02-01 Last_month
0 3456.0 NaN 2021-01-01
1 NaN 8679.0 2021-02-01