I have the following dataframe:
data = {'feature1_in_use?': [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
'feature1_available?': [0, 1, 1, 'NA', 1, 'NA', 1, 1, 'NA', 1, 1]}
df = pd.DataFrame(data)
In the column 'feature1_available?' I want to replace each 'NA' with the value from 'feature1_in_use?', but only when that value is 1; otherwise I want to fill it with 'X' or any valid string.
It should look like this:
data = {'feature1_in_use?': [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
'feature1_available?': [0, 1, 1, 'X', 1, 1, 1, 1, 1, 1, 1]}
df = pd.DataFrame(data)
CodePudding user response:
Let's say you want your blank value to be 'x'. Then do the following (with numpy imported as np):
in_use = df['feature1_in_use?']
df['feature1_available?'] = df['feature1_available?'].replace('NA', np.nan).fillna(in_use.where(in_use == 1))
df['feature1_available?'] = df['feature1_available?'].fillna('x')
Step 1 turns the 'NA' strings into real NaN and fills them from feature1_in_use?, but only where that column is 1. Step 2 fills any remaining NaN values with 'x' (or any other value you choose).
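For comparison, the same two steps can be sketched with boolean masks and .loc instead of fillna. This is a different technique from the one in this answer, and it assumes the missing entries are the literal string 'NA', as in the question's sample data. The names is_missing and is_used are only illustrative.

```python
import pandas as pd

data = {'feature1_in_use?': [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
        'feature1_available?': [0, 1, 1, 'NA', 1, 'NA', 1, 1, 'NA', 1, 1]}
df = pd.DataFrame(data)

# Assumption: missing entries are the literal string 'NA', not real NaN.
is_missing = df['feature1_available?'].eq('NA')
is_used = df['feature1_in_use?'].eq(1)

# Step 1: where the feature is in use, write that 1 into the missing slot.
df.loc[is_missing & is_used, 'feature1_available?'] = 1
# Step 2: every other missing slot gets the fallback string.
df.loc[is_missing & ~is_used, 'feature1_available?'] = 'X'

print(df)
```

Because the column already has object dtype (it mixes integers and strings), no conversion to NaN is needed here.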
CodePudding user response:
A flexible option you could use would be np.select()
import numpy as np
import pandas as pd
data = {'feature1_in_use?': [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
'feature1_available?': [0, 1, 1, 'NA', 1, 'NA', 1, 1, 'NA', 1, 1]}
df = pd.DataFrame(data)
condition_list = [
((df['feature1_available?'] == 'NA') & (df['feature1_in_use?'] == 1)),
((df['feature1_available?'] == 'NA') & (df['feature1_in_use?'] == 0))
]
choice_list = [df['feature1_in_use?'].astype(object), 'X']  # object dtype so the integer and string choices share a common dtype
df['feature1_available?'] = np.select(condition_list, choice_list, df['feature1_available?'])
df
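If the same rule has to be applied to several column pairs, the np.select() idea could be wrapped in a small helper. The sketch below is only an illustration: fill_available and its parameters are hypothetical names, and it assumes the in-use column holds only 0 and 1, so "not 1" is treated the same as 0.

```python
import numpy as np
import pandas as pd


def fill_available(df, in_use_col, available_col, placeholder='NA', fallback='X'):
    """Return available_col with placeholder entries resolved via np.select."""
    is_missing = df[available_col].eq(placeholder).to_numpy()
    is_used = df[in_use_col].eq(1).to_numpy()
    conditions = [is_missing & is_used, is_missing & ~is_used]
    # Object-dtype choices so integers and strings can live in one array.
    choices = [df[in_use_col].to_numpy(dtype=object),
               np.array(fallback, dtype=object)]
    # Rows that match neither condition keep their current value.
    default = df[available_col].to_numpy(dtype=object)
    filled = np.select(conditions, choices, default=default)
    return pd.Series(filled, index=df.index, name=available_col)


data = {'feature1_in_use?': [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
        'feature1_available?': [0, 1, 1, 'NA', 1, 'NA', 1, 1, 'NA', 1, 1]}
df = pd.DataFrame(data)
df['feature1_available?'] = fill_available(df, 'feature1_in_use?', 'feature1_available?')
print(df)
```

np.select() returns a plain NumPy array, so the helper wraps it in a Series with the original index before it is assigned back.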
CodePudding user response:
The first line creates a Series that holds 1 where 'feature1_in_use?' is 1 and 'X' everywhere else. Then 'NA' is replaced with np.nan,
which allows fillna to fill the missing values from the Series that was created.
s = df['feature1_in_use?'].where(df['feature1_in_use?'].eq(1),'X')
df.replace('NA',np.nan).fillna({'feature1_available?':s})
Output:
    feature1_in_use?  feature1_available?
0                  0                  0.0
1                  0                  1.0
2                  1                  1.0
3                  0                    X
4                  1                  1.0
5                  1                    1
6                  1                  1.0
7                  0                  1.0
8                  1                    1
9                  0                  1.0
10                 1                  1.0
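The mix of 1.0 and 1 in the output above most likely comes from the column being converted to float once 'NA' becomes NaN. A variant that skips the NaN round trip is sketched below. It uses Series.mask instead of replace plus fillna, so it is a substitution rather than this answer's exact method, and it assumes the placeholder is the literal string 'NA'.

```python
import pandas as pd

data = {'feature1_in_use?': [0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1],
        'feature1_available?': [0, 1, 1, 'NA', 1, 'NA', 1, 1, 'NA', 1, 1]}
df = pd.DataFrame(data)

in_use = df['feature1_in_use?']
# Same helper Series as in the answer: 1 where in use, 'X' elsewhere.
# Casting to object first lets integers and the string sit side by side.
s = in_use.astype(object).where(in_use.eq(1), 'X')

available = df['feature1_available?']
# Swap only the 'NA' placeholders for the matching entry of s.
df['feature1_available?'] = available.mask(available.eq('NA'), s)

print(df)
```

Since no NaN is ever introduced, the existing integers should stay as they are and only the three 'NA' rows change.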