I have a dataframe like this:
import pandas as pd
df_test = pd.DataFrame({'ID1': ['A', 'B', 'C', 'BA', 'BA', 'AB', '>', '>', '>', '>'],
                        'ID2': ['', '', '', '', '', '', 'mh', 'mh', 'nn', 'nn']})
df_test
I want to obtain a dataframe like the one below, based on the column 'ID1':
1. if len(ID1) > 1, then ID1 = ID1[-1] (for example, 'BA' and 'AB' are replaced with 'A' and 'B', respectively);
2. if ID1 == '>', then ID1 = ID2 (for example, each '>' is replaced with the corresponding 'mh' or 'nn').
df_result = pd.DataFrame({'ID1':['A','B','C','A','A','B','mh','mh','nn','nn']})
df_result
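Stated as code, the two rules amount to a per-row function. This is a minimal sketch assuming the df_test above; the helper name last_char_or_id2 is hypothetical, and DataFrame.apply with axis=1 is slower than vectorized string methods, but it spells the logic out explicitly.

```python
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A', 'B', 'C', 'BA', 'BA', 'AB', '>', '>', '>', '>'],
                        'ID2': ['', '', '', '', '', '', 'mh', 'mh', 'nn', 'nn']})

def last_char_or_id2(row):
    """Hypothetical helper: apply the question's two rules to a single row."""
    id1 = row['ID1']
    if len(id1) > 1:      # rule 1: multi-character codes keep only their last character
        id1 = id1[-1]
    if id1 == '>':        # rule 2: '>' is replaced by the value in ID2
        id1 = row['ID2']
    return id1

# apply(axis=1) passes each row as a Series; the result is one value per row
df_result = df_test.apply(last_char_or_id2, axis=1).to_frame('ID1')
print(df_result)
```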
Answer:
Use the .str accessor:
import numpy as np
out = df_test['ID1'].str[-1].replace('>', np.nan).fillna(df_test['ID2']).to_frame()
print(out)
# Output
ID1
0 A
1 B
2 C
3 A
4 A
5 B
6 mh
7 mh
8 nn
9 nn
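To see why the chain works, it can help to inspect each intermediate Series. A sketch assuming the same df_test: .str[-1] leaves single characters such as 'A' and '>' unchanged, replace turns '>' into NaN to mark the rows that should take ID2, and fillna with a Series fills those gaps by index alignment. The variable names last and missing are only illustrative.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A', 'B', 'C', 'BA', 'BA', 'AB', '>', '>', '>', '>'],
                        'ID2': ['', '', '', '', '', '', 'mh', 'mh', 'nn', 'nn']})

last = df_test['ID1'].str[-1]            # 'BA' -> 'A', 'AB' -> 'B'; 'A' and '>' unchanged
print(last.tolist())
missing = last.replace('>', np.nan)      # NaN marks rows that should take ID2
print(missing.isna().tolist())
out = missing.fillna(df_test['ID2']).to_frame()  # fill NaN from ID2, aligned on the index
print(out['ID1'].tolist())
```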
Answer:
You can use .str[-1] to select the last character regardless of the length of the strings in the column, and use <column>.where(cond, other_col) to keep the values where cond is True and fill in the rest with the corresponding values from other_col:
df_test['ID1'] = df_test.assign(ID1=df_test['ID1'].str[-1]).pipe(lambda x: x['ID1'].where(x['ID1'] != '>', x['ID2']))
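The same where idea can be written without assign and pipe by keeping the last-character Series in a variable. A minimal sketch assuming the question's df_test; last is just an illustrative name.

```python
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A', 'B', 'C', 'BA', 'BA', 'AB', '>', '>', '>', '>'],
                        'ID2': ['', '', '', '', '', '', 'mh', 'mh', 'nn', 'nn']})

last = df_test['ID1'].str[-1]  # last character of every ID1 value
# where() keeps values where the condition is True and takes ID2 elsewhere
df_test['ID1'] = last.where(last != '>', df_test['ID2'])
print(df_test[['ID1']])
```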
Answer:
You can try using np.where:
import pandas as pd
import numpy as np
df_test = pd.DataFrame({'ID1':['A','B','C','BA','BA','AB','>','>','>','>'],
'ID2':['','','','','','','mh','mh','nn','nn']})
df_test['ID1'] = np.where(df_test['ID1'].str.len() > 1, df_test['ID1'].str[-1], df_test['ID1'])
df_test['ID1'] = np.where(df_test['ID1'] == '>', df_test['ID2'], df_test['ID1'])
df_test = df_test.drop('ID2', axis=1)
print(df_test)
ID1
0 A
1 B
2 C
3 A
4 A
5 B
6 mh
7 mh
8 nn
9 nn
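Both conditions can also go into a single np.select call, which picks, row by row, the choice for the first condition that is True and falls back to default otherwise. A sketch assuming the question's df_test; note one difference from the chained answers: a value such as 'A>' would end up as '>' here, because each row takes only its first matching choice.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A', 'B', 'C', 'BA', 'BA', 'AB', '>', '>', '>', '>'],
                        'ID2': ['', '', '', '', '', '', 'mh', 'mh', 'nn', 'nn']})

id1 = df_test['ID1']
# np.select expects plain boolean arrays as conditions
conditions = [id1.eq('>').to_numpy(dtype=bool),
              (id1.str.len() > 1).to_numpy(dtype=bool)]
choices = [df_test['ID2'], id1.str[-1]]
# rows matching neither condition keep their original ID1 value
df_test['ID1'] = np.select(conditions, choices, default=id1)
print(df_test[['ID1']])
```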
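Answer:
Let us use mask. On its own it only handles the '>' rule, so reduce ID1 to its last character with .str[-1] first, and assign the result back instead of using inplace=True (an in-place call on df_test.ID1 may not update df_test when pandas' Copy-on-Write is enabled):
df_test['ID1'] = df_test['ID1'].str[-1].mask(df_test['ID1'].eq('>'), df_test['ID2'])
df_test
Out[217]:
ID1 ID2
0 A
1 B
2 C
3 A
4 A
5 B
6 mh mh
7 mh mh
8 nn nn
9 nn nn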
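To check an answer against the expected df_result from the question, pandas ships a testing helper that compares two frames and raises an AssertionError describing any mismatch in values, dtypes, index, or columns. A sketch assuming the question's df_test and df_result, using the mask-based expression as the candidate.

```python
import pandas as pd

df_test = pd.DataFrame({'ID1': ['A', 'B', 'C', 'BA', 'BA', 'AB', '>', '>', '>', '>'],
                        'ID2': ['', '', '', '', '', '', 'mh', 'mh', 'nn', 'nn']})
df_result = pd.DataFrame({'ID1': ['A', 'B', 'C', 'A', 'A', 'B', 'mh', 'mh', 'nn', 'nn']})

# candidate answer: last character, then '>' replaced from ID2
out = df_test['ID1'].str[-1].mask(df_test['ID1'].eq('>'), df_test['ID2']).to_frame()

# raises AssertionError with a description of the difference if the frames differ
pd.testing.assert_frame_equal(out, df_result)
print("matches df_result")
```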