Extracting features from dataframe

Time:03-11

I have a pandas DataFrame like this:

   ID       Phone   ex
0   1  5333371000  533
1   2  5354321938  535
2   3     3840812  384
3   4     5451215  545
4   5  2125121278  212

For example, if "ex" starts with 533, 535, or 545, a new variable should be "personal"; otherwise it should be "notpersonal".

Sample output:

   ID       Phone   ex       iswhat
0   1  5333371000  533     personal
1   2  5354321938  535     personal
2   3     3840812  384  notpersonal
3   4     5451215  545     personal
4   5  2125121278  212  notpersonal

How can I do that?

CodePudding user response:

You can use np.where:

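import numpy as np
# Rows whose "ex" value is in the list get 'personal', all others 'not personal'
df['iswhat'] = np.where(df['ex'].isin([533, 535, 545]), 'personal', 'not personal')
print(df)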

# Output
   ID       Phone   ex        iswhat
0   1  5333371000  533      personal
1   2  5354321938  535      personal
2   3     3840812  384  not personal
3   4     5451215  545      personal
4   5  2125121278  212  not personal
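
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes "ex" is stored as integers; the name personal_prefixes is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; "ex" is assumed to be an integer column.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Phone": [5333371000, 5354321938, 3840812, 5451215, 2125121278],
    "ex": [533, 535, 384, 545, 212],
})

personal_prefixes = [533, 535, 545]  # illustrative name, not from the question

# isin() builds a boolean mask; np.where() picks a label per row from that mask.
df["iswhat"] = np.where(df["ex"].isin(personal_prefixes), "personal", "not personal")
print(df)
```

If "ex" were stored as strings instead, the list would need string values ('533', '535', '545') for isin to match.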

Update

You can also use your Phone column directly:

df['iswhat'] = np.where(df['Phone'].astype(str).str.match('533|535|545'), 
                        'personal', 'not personal')

Note: If the Phone column already contains strings, you can safely remove .astype(str).
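
If you would rather avoid a regular expression, one possible variation is to slice the first three characters of Phone and test them with isin. This is only a sketch under the assumption that Phone holds integers as in the question; the variable name prefix is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample phone numbers copied from the question (assumed to be integers).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Phone": [5333371000, 5354321938, 3840812, 5451215, 2125121278],
})

# Convert to text and keep the first three characters of each number.
prefix = df["Phone"].astype(str).str[:3]

# Compare against string prefixes, because the slice above yields strings.
df["iswhat"] = np.where(prefix.isin(["533", "535", "545"]), "personal", "not personal")
print(df)
```

This gives the same labels as the str.match version for the sample data, since str.match also only looks at the start of each string.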

CodePudding user response:

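We can use np.where along with str.contains. The .str accessor only works on string data, so if "ex" holds integers, convert it with .astype(str) first:

df["iswhat"] = np.where(df["ex"].astype(str).str.contains(r'^(?:533|535|545)$'),
                        'personal', 'notpersonal')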
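
A caveat on column type: the .str accessor is only available for string data. The sketch below assumes "ex" is an integer column (as the printed frame in the question suggests) and shows both the failure and the conversion that avoids it. The names str_accessor_failed and ex_as_text are illustrative only.

```python
import numpy as np
import pandas as pd

# "ex" values copied from the question, assumed to be stored as integers.
df = pd.DataFrame({"ex": [533, 535, 384, 545, 212]})

# Accessing .str on an integer column raises AttributeError.
try:
    df["ex"].str.contains("533")
    str_accessor_failed = False
except AttributeError:
    str_accessor_failed = True

# Converting to text first makes the string methods available.
ex_as_text = df["ex"].astype(str)

# The ^...$ anchors require the whole value to equal one of the prefixes.
is_personal = ex_as_text.str.contains(r"^(?:533|535|545)$")
df["iswhat"] = np.where(is_personal, "personal", "notpersonal")
print(df)
```

The non-capturing group (?:...) is kept from the answer above; with a plain capturing group, pandas emits a warning about match groups when str.contains is used.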