Home > OS >  Columns selection on specific text
Columns selection on specific text

Time:01-21

I want to extract specific columns that contain specific names. Below you can see my data

import numpy as np
import pandas as pd


data = {
         'Names': ['Store (007) Total amount of Sales ',
                   'Store perc (65) Total amount of sales ',
                   'Mall store, aid (005) Total amount of sales',
                   'Increase in the value of sales / Additional seling (22) Total amount of sales',
                   'Dividends (0233) Amount of income tax',
                   'Other income (098) Total amount of Sales',
                   'Other income (0245) Amount of Income Tax',
                    ],
         'Sales':[10,10,9,7,5,5,5],
         
         
        }

df = pd.DataFrame(data, columns = ['Names',
                                   'Sales',
                             
                                   ])
df

This data have some specific columns that I need to be selected in the separate data frame. Keywords for this selection are words Total amount of Sales or Total amount of sales . These words are placed after the second brackets ). Also please take into account that text is no trimmed so empty spaces are possible.

enter image description here

So can anybody help me how to solve this ?

CodePudding user response:

Use Series.str.contains without test cases with case=False in boolean indexing:

df1 = df[df['Names'].str.contains('Total amount of Sales', case=False)]
print (df1)
                                               Names  Sales
0                 Store (007) Total amount of Sales      10
1             Store perc (65) Total amount of sales      10
2        Mall store, aid (005) Total amount of sales      9
3  Increase in the value of sales / Additional se...      7
5           Other income (098) Total amount of Sales      5

Or if need test sales or Sales use:

df2 = df[df['Names'].str.contains('Total amount of [Ss]ales')]

CodePudding user response:

try this :

searchfor = ['Total amount of Sales','Total amount of sales']
df = df[df['Names'].str.contains('|'.join(searchfor))]
df.reset_index(inplace=True, drop=True)
print(df)
  • Related