I want to extract specific columns that contain specific names. Below you can see my data
import numpy as np
import pandas as pd
data = {
'Names': ['Store (007) Total amount of Sales ',
'Store perc (65) Total amount of sales ',
'Mall store, aid (005) Total amount of sales',
'Increase in the value of sales / Additional seling (22) Total amount of sales',
'Dividends (0233) Amount of income tax',
'Other income (098) Total amount of Sales',
'Other income (0245) Amount of Income Tax',
],
'Sales':[10,10,9,7,5,5,5],
}
df = pd.DataFrame(data, columns = ['Names',
'Sales',
])
df
This data have some specific columns that I need to be selected in the separate data frame. Keywords for this selection are words Total amount of Sales
or Total amount of sales
. These words are placed after the second brackets )
. Also please take into account that text is no trimmed so empty spaces are possible.
So can anybody help me how to solve this ?
CodePudding user response:
Use Series.str.contains
without test cases with case=False
in boolean indexing
:
df1 = df[df['Names'].str.contains('Total amount of Sales', case=False)]
print (df1)
Names Sales
0 Store (007) Total amount of Sales 10
1 Store perc (65) Total amount of sales 10
2 Mall store, aid (005) Total amount of sales 9
3 Increase in the value of sales / Additional se... 7
5 Other income (098) Total amount of Sales 5
Or if need test sales
or Sales
use:
df2 = df[df['Names'].str.contains('Total amount of [Ss]ales')]
CodePudding user response:
try this :
searchfor = ['Total amount of Sales','Total amount of sales']
df = df[df['Names'].str.contains('|'.join(searchfor))]
df.reset_index(inplace=True, drop=True)
print(df)