I have a dataframe as a result of a pivot which has several thousand columns (representing time-boxed attributes). Below is a much shortened version for resemblance.
d = {'incount - 14:00': [1,'NaN', 1,1,'NaN','NaN','NaN','NaN',1],
'incount - 15:00': [2,1,2,'NaN','NaN','NaN',1,4,'NaN'],
'outcount - 14:00':[2,'NaN',1,1,1,1,2,2,1]
'outcount - 15:00':[2,2,1,1,'NaN',2,'NaN',1,1]}
df = pd.DataFrame(data=d)
I want to replace the NaNs in columns that contain "incount" with 0 (leaving other columns untouched). I have tried the following but predictably it does not recognise the column name.
df['incount'] = df_all['incount'].fillna(0)
I need the ability to search the column names and only impact those containing a defined string.
CodePudding user response:
you can use:
loop_cols = list(df.columns[df.columns.str.contains('incount',na=False)]) #get columns containing incount as a list
#or
#loop_cols = [col for col in df.columns if 'incount' in col]
print(loop_cols)
'''
['incount - 14:00', 'incount - 15:00']
'''
for i in loop_cols:
df[i]=df[i].fillna(0)
CodePudding user response:
try this:
m = df.columns[df.columns.str.startswith('incount')]
df.loc[:, m] = df.loc[:, m].fillna(0)
print(df)