Find date of the first occurance of values in columns of a data frame

How to find the date of the first occurrence of a value for columns A and B in this data frame?

So, I want 2012-04-03 of A and 2012-04-04 of column B:

|                     |   A |   B |
|:--------------------|----:|----:|
| 2012-04-01 00:00:00 | nan | nan |
| 2012-04-02 00:00:00 | nan | nan |
| 2012-04-03 00:00:00 |   4 | nan | <- First occurrence of A
| 2012-04-04 00:00:00 |   6 |   2 | <- First occurrence of B
| 2012-04-05 00:00:00 |   5 | nan |
| 2012-04-06 00:00:00 | nan |   2 |
| 2012-04-07 00:00:00 |   8 |   3 |
| 2012-04-08 00:00:00 |   4 | nan |

Here is the code that makes the df:

df = pd.DataFrame(data={"A":[np.NaN, np.NaN, 4,6,5,np.NaN,8,4],"B":[np.NaN,np.NaN,np.NaN,2,np.NaN,2,3, np.NaN,]}, index=pd.date_range('2012-04-01', '2012-04-08'))

I tried with iterating over the columns, then using dropna() to get rid of NaNs then retrieve the date via the index. ... I am sure that there are better ways.

CodePudding user response：

Use first_valid_index:

>>> df.apply(lambda x: x.first_valid_index())
A   2012-04-03
B   2012-04-04
dtype: datetime64[ns]

CodePudding user response：

for col in df.columns:
    print(df[df[col].notna()].head(1))