Home > database >  Filter columns containing values and NaN using specific characters and create seperate columns
Filter columns containing values and NaN using specific characters and create seperate columns

Time:09-07

I have a dataframe containing columns in the below format

df =

ID     Folder Name    Country
300    ABC 12345      CANADA
1000   NaN            USA
450    AML 2233       USA
111    ABC 2234       USA
550    AML 3312       AFRICA

Output needs to be in the below format

ID     Folder Name    Country    Folder Name - ABC   Folder Name - AML
300    ABC 12345      CANADA      ABC 12345             NaN
1000     NaN          USA         NaN                   NaN
450    AML 2233       USA         NaN                   AML 2233
111    ABC 2234       USA         ABC 2234              NaN
550    AML 3312       AFRICA      NaN                   AML 3312

I tried using the below python code:-

df_['Folder Name - ABC'] = df['Folder Name'].apply(lambda x: x.str.startswith('ABC',na = False))

Can you please help me where i am going wrong?

CodePudding user response:

You should not use apply but boolean indexing:

df.loc[df['Folder Name'].str.startswith('ABC', na=False),
       'Folder Name - ABC'] = df['Folder Name']

However, a better approach that would not require you to loop over all possible codes would be to extract the code, pivot_table and merge:

out = df.merge(
         df.assign(col=df['Folder Name'].str.extract('(\w )'))
           .pivot_table(index='ID', columns='col',
                        values='Folder Name', aggfunc='first')
           .add_prefix('Folder Name - '),
         on='ID', how='left'
)

output:

     ID Folder Name Country Folder Name - ABC Folder Name - AML
0   300   ABC 12345  CANADA         ABC 12345               NaN
1  1000         NaN     USA               NaN               NaN
2   450    AML 2233     USA               NaN          AML 2233
3   111    ABC 2234     USA          ABC 2234               NaN
4   550    AML 3312  AFRICA               NaN          AML 3312

CodePudding user response:

the startswith methode return True or False so your column will contains just a boolean values instead you can try this :

df_['Folder Name - ABC'] = df['Folder Name'].apply(lambda x: x if x.str.startswith('ABC',na = False))

CodePudding user response:

does this code do the trick?

df['Folder Name - ABC'] = df['Folder Name'].where(df['Folder Name'].str.startswith('ABC'))
  • Related