How to find the date of the first non-NaN value in columns A and B of this DataFrame?
So, I want 2012-04-03 for column A and 2012-04-04 for column B:
| | A | B | Note |
|:--------------------|----:|----:|:----------------------|
| 2012-04-01 00:00:00 | nan | nan | |
| 2012-04-02 00:00:00 | nan | nan | |
| 2012-04-03 00:00:00 | 4 | nan | First occurrence of A |
| 2012-04-04 00:00:00 | 6 | 2 | First occurrence of B |
| 2012-04-05 00:00:00 | 5 | nan | |
| 2012-04-06 00:00:00 | nan | 2 | |
| 2012-04-07 00:00:00 | 8 | 3 | |
| 2012-04-08 00:00:00 | 4 | nan | |
Here is the code that makes the df:
```python
import numpy as np
import pandas as pd
df = pd.DataFrame(data={"A": [np.nan, np.nan, 4, 6, 5, np.nan, 8, 4], "B": [np.nan, np.nan, np.nan, 2, np.nan, 2, 3, np.nan]}, index=pd.date_range('2012-04-01', '2012-04-08'))
```
I tried iterating over the columns, using `dropna()` to get rid of the NaNs, and then retrieving the date from the index... I am sure there are better ways.
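For reference, the loop-plus-`dropna()` approach described in the question might look roughly like this. It is a sketch under the assumption that every column has at least one non-NaN value; the name `first_dates` is just illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the question (np.nan instead of np.NaN, which NumPy 2.0 removed)
df = pd.DataFrame(
    data={
        "A": [np.nan, np.nan, 4, 6, 5, np.nan, 8, 4],
        "B": [np.nan, np.nan, np.nan, 2, np.nan, 2, 3, np.nan],
    },
    index=pd.date_range("2012-04-01", "2012-04-08"),
)

# dropna() drops the NaN rows; the first remaining index label is the date.
# Caveat: index[0] raises IndexError if a column is entirely NaN.
first_dates = {col: df[col].dropna().index[0] for col in df.columns}
print(first_dates)
```

It works, but it builds a filtered copy of every column just to read one label, which is why a dedicated method is preferable.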
Answer:
Use `first_valid_index` on each column:
```python
>>> df.apply(lambda x: x.first_valid_index())
A   2012-04-03
B   2012-04-04
dtype: datetime64[ns]
```
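A vectorized alternative (not part of the original answer) is `df.notna().idxmax()`: on a boolean frame, `idxmax` returns the label of the first `True` in each column. One caveat is that a column with no values at all is all `False`, and `idxmax` then returns the first index label instead of a missing value. The sketch below masks that case to `NaT`; the name `first_dates` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={
        "A": [np.nan, np.nan, 4, 6, 5, np.nan, 8, 4],
        "B": [np.nan, np.nan, np.nan, 2, np.nan, 2, 3, np.nan],
    },
    index=pd.date_range("2012-04-01", "2012-04-08"),
)

has_value = df.notna()
# idxmax on booleans gives the label of the first True per column;
# where() turns columns without any non-NaN value into NaT.
first_dates = has_value.idxmax().where(has_value.any())
print(first_dates)
```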
Answer:
```python
for col in df.columns:
    # First row where this column is non-null; its index label is the date
    print(df[df[col].notna()].head(1))
```
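The loop above prints the whole first row for each column. If only the dates are needed, one possible adaptation is to collect the index label of each `head(1)` result into a Series, falling back to `NaT` when a column has no values at all. This is a sketch, and `first_dates` is an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data={
        "A": [np.nan, np.nan, 4, 6, 5, np.nan, 8, 4],
        "B": [np.nan, np.nan, np.nan, 2, np.nan, 2, 3, np.nan],
    },
    index=pd.date_range("2012-04-01", "2012-04-08"),
)

dates = {}
for col in df.columns:
    first_row = df[df[col].notna()].head(1)  # at most one row
    # head(1) is empty when the column is entirely NaN
    dates[col] = first_row.index[0] if not first_row.empty else pd.NaT

first_dates = pd.Series(dates)
print(first_dates)
```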