Check if a df has any str value rowwise and create a new column with that value-CodePudding

I have a df that looks like

df:

A	B	C	D
2.0	NaN	Text String 1	3.0
Text String 2	NaN	3.0	2.0
2.0	Text String 3	3.0	NaN

I want create a new column new_column that look for all the row values in the column, and as long as any column contains string like [^A-Za-z\s], my new_column get that exact value

for example, my desired output is something like

A	B	C	D	new_column
2.0	NaN	Text String 1	3.0	Text String 1
Text String 2	NaN	3.0	2.0	Text String 2
2.0	Text String 3	3.0	NaN	Text String 3

I've tried str.contains or other df methods but the problem is to get the value for any column that has a text value, not from any specific column.

Thanks!

CodePudding user response：

You can reshape by DataFrame.stack and select strings columns by regex:

s = df.stack()
df['new_column'] = s[s.astype('str').str.contains('(^[A-Za-z\s])')].droplevel(1)

Similar idea is comvert values to numeric if possible and get non numeric strings values:

s = df.stack()
df['new_column'] = s[pd.to_numeric(s, errors='coerce').isna()].droplevel(1)

Or join values by , and select strings with number pattern:

df['new_column'] = df.astype('str').agg(','.join, 1).str.extract('([A-Za-z\s] \d )')
print (df)
               A              B              C    D     new_column
0            2.0            NaN  Text String 1  3.0  Text String 1
1  Text String 2            NaN            3.0  2.0  Text String 2
2            2.0  Text String 3            3.0  NaN  Text String 3

CodePudding user response：

You can write a function to select the string value from a row and then apply that to the dataFrame:

def get_str(row):
    return [v for v in row if type(v) is str][0]

df['new_column'] = df.apply(get_str, axis=1)

Output:

               A              B              C    D     new_column
0            2.0            NaN  Text String 1  3.0  Text String 1
1  Text String 2            NaN            3.0  2.0  Text String 2
2            2.0  Text String 3            3.0  NaN  Text String 3

If a row might not contain a string value, you will need to check for that situation in get_str and return a default value e.g.

def get_str(row):
    try:
        return [v for v in row if type(v) is str][0]
    except:
        return np.nan

df = pd.DataFrame({ 'A' : [2.0, 'Text String 2', 2.0, 3.0], 'B': [np.nan, np.nan, 'Text String 3', 2.0], 'C': ['Text String 1', 3.0, 3.0, np.nan], 'D': [3.0, 2.0, np.nan, 1.0] })

df['new_column'] = df.apply(get_str, axis=1)

Output:

               A              B              C    D     new_column
0            2.0            NaN  Text String 1  3.0  Text String 1
1  Text String 2            NaN            3.0  2.0  Text String 2
2            2.0  Text String 3            3.0  NaN  Text String 3
3            3.0            2.0            NaN  1.0            NaN