I have a DataFrame whose missing rows should be filled by recognizing the pattern in the existing rows, the way Excel's fill handle does. For example, if a column holds consecutive integers, it should be continued with the next numbers.
Is there any function in Python that does this?
import numpy as np
import pandas as pd
d = { 'year': [2019,2020,2019,2020,np.nan,np.nan], 'cat1': [1,2,3,4,np.nan,np.nan], 'cat2': ['c1','c1','c1','c2',np.nan,np.nan]}
df = pd.DataFrame(data=d)
df
year cat1 cat2
0 2019.0 1.0 c1
1 2020.0 2.0 c1
2 2019.0 3.0 c1
3 2020.0 4.0 c2
4 NaN NaN NaN
5 NaN NaN NaN
Required output:
year cat1 cat2
0 2019.0 1.0 c1
1 2020.0 2.0 c1
2 2019.0 3.0 c1
3 2020.0 4.0 c2
4 2019.0 5.0 c2 #here can be ignored if it can't understand the earlier pattern
5 2020.0 6.0 c2 #here can be ignored if it can't understand the earlier pattern
I tried df.interpolate(method='krogh'). It fills cat1 with 1, 2, 3, 4, 5, 6, but the other columns come out wrong.
CodePudding user response:
I tested some stuff out and did some more research. It appears pandas does not currently offer the functionality you're looking for.
df['cat1'].interpolate(method='linear')
will only work if the first and last values are already filled in. You would have to manually assign df.loc[5, 'cat1'] = 6
in this example; a linear interpolation would then fill the gap between the known values.
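As a small illustration of that point, the sketch below supplies the last value of cat1 by hand and then interpolates. It only uses the cat1 values from the question; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# cat1 from the question, with the two trailing gaps.
cat1 = pd.Series([1, 2, 3, 4, np.nan, np.nan])

# Supply the endpoint by hand so interpolation has a value to aim for.
anchored = cat1.copy()
anchored.loc[5] = 6

# Linear interpolation now fills the single remaining gap at index 4.
filled = anchored.interpolate(method='linear')
print(filled.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

This still requires knowing the final value in advance, so it does not replace a real pattern-detection step.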
Some Options:
If the data is small enough, you can always export to Excel and use the fill there, then bring back into pandas.
Analyze the patterns yourself and design your own fill methods. For example, to get the year, you can use
df['year'] = df.index.to_series().apply(lambda x: 2019 if x % 2 == 0 else 2020)
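One way the "design your own fill methods" option could look is sketched below. The helper extend_series is hypothetical (it is not part of pandas) and it assumes only two rules: continue a constant numeric step, or continue a short repeating cycle. Anything else is left as NaN, which matches the "can be ignored" note in the question.

```python
import numpy as np
import pandas as pd


def extend_series(s, max_period=4):
    """Hypothetical helper: fill trailing NaNs by continuing a simple pattern."""
    known = s.dropna()
    n_known = len(known)
    n_missing = len(s) - n_known
    # Only handle the case where every NaN sits at the end of the column.
    if n_missing == 0 or n_known < 2 or s.iloc[:n_known].isna().any():
        return s
    out = s.copy()
    values = known.tolist()

    # Rule 1: numeric column with a constant step (1, 2, 3, 4 -> 5, 6).
    if pd.api.types.is_numeric_dtype(known):
        steps = np.diff(known.to_numpy(dtype=float))
        if np.allclose(steps, steps[0]):
            out.iloc[n_known:] = values[-1] + steps[0] * np.arange(1, n_missing + 1)
            return out

    # Rule 2: values repeat with a short period seen at least twice.
    for period in range(1, max_period + 1):
        repeats = all(values[i] == values[i - period] for i in range(period, n_known))
        if 2 * period <= n_known and repeats:
            out.iloc[n_known:] = [values[(n_known + k) % period] for k in range(n_missing)]
            return out

    # No rule matched: leave the gaps untouched.
    return out


d = {'year': [2019, 2020, 2019, 2020, np.nan, np.nan],
     'cat1': [1, 2, 3, 4, np.nan, np.nan],
     'cat2': ['c1', 'c1', 'c1', 'c2', np.nan, np.nan]}
df = pd.DataFrame(data=d)

# Apply the helper column by column.
filled = pd.DataFrame({col: extend_series(df[col]) for col in df.columns})
print(filled)
```

With the question's data, year continues as 2019, 2020 and cat1 as 5, 6, while cat2 stays NaN because neither rule recognizes c1, c1, c1, c2.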
There are other Stack Overflow questions very similar to this, and none that I saw have a generic answer.
CodePudding user response:
Try the fillna(value) method, which replaces each NaN with the value passed to it.
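A minimal sketch of that call, using two columns from the question and fill values chosen purely for illustration. Note that fillna writes the same constant into every gap of a column, so it does not continue a sequence such as 1, 2, 3, 4.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat1': [1, 2, 3, 4, np.nan, np.nan],
    'cat2': ['c1', 'c1', 'c1', 'c2', np.nan, np.nan],
})

# A dict maps each column name to the constant used for its missing cells.
filled = df.fillna({'cat1': 0, 'cat2': 'c2'})
print(filled)
```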