Currently I have a DataFrame as below:
import pandas as pd
import numpy as np
# np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0
d = {'name': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
     'Year': ['2000', '2010', '2020', '2000', '2010', '2020', '2000', '2010', '2020'],
     'v1': [np.nan, np.nan, np.nan, 41, 51, 61, 71, 81, 91],
     'v2': [12, 22, 32, np.nan, 52, np.nan, 72, 82, 92],
     'v3': [13, 23, 33, 43, 53, 63, np.nan, 83, np.nan]}
df = pd.DataFrame(d)
df
name  Year   v1   v2   v3
a     2000  NaN   12   13
a     2010  NaN   22   23
a     2020  NaN   32   33
b     2000   41  NaN   43
b     2010   51   52   53
b     2020   61  NaN   63
c     2000   71   72  NaN
c     2010   81   82   83
c     2020   91   92  NaN
I'm trying to get the latest available (non-missing) value in each column for each name, as follows:
name   v1   v2   v3
a     NaN   32   33
b      61   52   63
c      91   92   83
Is there a way to achieve this?
CodePudding user response:
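Simply use groupby followed by last, passing as_index=False as a parameter to groupby. GroupBy.last skips missing values by default, so each column gets its last non-null value within the group (this assumes the rows are already ordered by Year within each name):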
df.groupby('name', as_index=False).last()
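Alternatively, if you know that the last year is "2020", you can select those rows directly. Note that this keeps the NaN values present in the 2020 rows (v2 for b, v3 for c) instead of falling back to earlier years, so it only matches the desired result when the latest year has no gaps: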
df.query('Year == "2020"')
Output of the groupby approach:
  name  Year    v1    v2    v3
0    a  2020   NaN  32.0  33.0
1    b  2020  61.0  52.0  63.0
2    c  2020  91.0  92.0  83.0
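If the rows are not guaranteed to be in year order, or if the Year column should be left out as in the desired result, the groupby approach can be adapted along these lines. This is a minimal sketch under a few assumptions: the variable name latest is only illustrative, and sorting Year as strings is only safe here because every year has four digits.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'Year': ['2000', '2010', '2020'] * 3,
    'v1': [np.nan, np.nan, np.nan, 41, 51, 61, 71, 81, 91],
    'v2': [12, 22, 32, np.nan, 52, np.nan, 72, 82, 92],
    'v3': [13, 23, 33, 43, 53, 63, np.nan, 83, np.nan],
})

# Sort first so that "last" means "latest year" within each name.
# Selecting only v1..v3 leaves Year out of the result.
# GroupBy.last() skips NaN, so each column falls back to its most
# recent non-null value; a column that is all NaN stays NaN.
latest = (
    df.sort_values(['name', 'Year'])
      .groupby('name', as_index=False)[['v1', 'v2', 'v3']]
      .last()
)
print(latest)
```

If Year were stored as integers or dates rather than strings, the same sort would work without relying on the four-digit assumption.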