I am working with pandas in Python. My DataFrame looks like this:
|Group | Group1 | Group2 | Group3 |
|------|---------|---------|---------|
|11abc | q | q | q |
|q | q | q | q |
|q | 22abc | q | q |
I would like to create an additional column. If any item in a row contains the substring "abc", its exact value should be copied to the new column.
Expected result:
|Group | Group1 | Group2 | Group3 | NewColumn |
|------|---------|---------|---------|-----------|
|11abc | q | q | q | 11abc |
|q | q | q | q | (no value)|
|q | 22abc | q | q | 22abc |
I googled for a while and found that pd.Series.str.contains('abc') returns True for matching values. I then planned to loop through the columns. As a first step, I tried a single column:
if df_target[df_target['Group'].str.contains('abc')]:
    df_target['NewColumn'] = df_target['Group']
However, I get the error "Cannot mask with non-boolean array containing NA / NaN values". I have taken a step back and am not sure whether I am heading in the right direction. What would be the appropriate way to do this kind of data transformation in pandas?
Many thanks..
CodePudding user response:
IIUC, stack and perform your match:
s = df.stack()
out = df.join(s[s.str.contains('abc')]
              .groupby(level=0).agg(','.join)
              .rename('NewColumn'))
output:
Group Group1 Group2 Group3 NewColumn
0 11abc q q q 11abc
1 q q q q NaN
2 q 22abc q q 22abc
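To make the intermediate steps easier to follow, here is a self-contained sketch of the same idea on the sample data from the question. The variable names (`stacked`, `matches`, `per_row`) are only illustrative, and `na=False` is an addition of mine so that missing values count as non-matches instead of breaking the boolean mask:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Group": ["11abc", "q", "q"],
    "Group1": ["q", "q", "22abc"],
    "Group2": ["q", "q", "q"],
    "Group3": ["q", "q", "q"],
})

# One long Series indexed by (row label, column name)
stacked = df.stack()

# Keep only the cells containing "abc"; na=False treats missing cells as non-matches
matches = stacked[stacked.str.contains("abc", na=False)]

# Collapse back to one value per original row (index level 0)
per_row = matches.groupby(level=0).agg(",".join).rename("NewColumn")

# Rows without any match get NaN from the index-aligned join
out = df.join(per_row)
print(out)
```

Because the join aligns on the index, rows with no match end up as NaN rather than an empty string.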
CodePudding user response:
If multiple matches per row are possible, you can filter each row and join the matching values with ", ":
df['NewColumn'] = df.apply(lambda x: ', '.join(x[x.str.contains('abc')]), axis=1)
print (df)
Group Group1 Group2 Group3 NewColumn
0 11abc q q q 11abc
1 q q q q
2 q 22abc q q 22abc
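As for the error in the question: `str.contains` returns NaN for missing values, and a mask containing NaN cannot be used for boolean indexing. A minimal sketch, assuming some cells may be missing, is to pass `na=False`. The helper name `collect_matches` and the sample data below are hypothetical, and `regex=False` is my assumption that a literal substring match is wanted:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing cell and a row with two matches
df = pd.DataFrame({
    "Group": ["11abc", np.nan, "q", "q"],
    "Group1": ["q", "q", "22abc", "q"],
    "Group2": ["q", "33abc", "44abc", "q"],
})

def collect_matches(row, pattern="abc"):
    # na=False turns missing cells into False so the mask is purely boolean
    mask = row.str.contains(pattern, regex=False, na=False)
    return ", ".join(row[mask])

df["NewColumn"] = df.apply(collect_matches, axis=1)
print(df)
```

With this variant, rows without a match get an empty string instead of NaN.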