I'm still a Python beginner. I would like to extract only the records that start with a specific prefix, like 'Wrong Data', for the group 'Specific Group' from df:
I'm trying to do this with a loop. Please see below:
names_list = []
for name in df['short_desc']:
    if 'Specifc Group' in df['group']:
        if name.startswith("Wrong Data"):
            names_list.append(name)
But this loop doesn't extract what I would like to have. I'm not sure what went wrong. Could you please help?
CodePudding user response:
You need to use .str.startswith to find rows where a column starts with a particular value:
subset = df[df['short_desc'].str.startswith('Wrong Data') & df['group'].eq('Specific Group')]
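As for why the original loop collects nothing: the `in` operator applied to a pandas Series tests the index labels, not the values, so `'Specific Group' in df['group']` is False with a default integer index. The sketch below shows this on a small made-up frame (the three rows are illustrative, not the asker's data) and then builds the `names_list` the question was after:

```python
import pandas as pd

# Illustrative data only; the column names follow the question.
df = pd.DataFrame({
    'short_desc': ['Wrong Data. Contact your admin',
                   'General Issue',
                   'Wrong Data. Contact your admin'],
    'group': ['Specific Group', 'Master Group', 'First Group'],
})

# `in` on a Series checks the index labels (0, 1, 2 here), not the values.
in_index = 'Specific Group' in df['group']
# Checking against the underlying values behaves as the loop intended.
in_values = 'Specific Group' in df['group'].values

# Combine both conditions into one boolean mask, then extract the column as a list.
mask = df['short_desc'].str.startswith('Wrong Data') & df['group'].eq('Specific Group')
names_list = df.loc[mask, 'short_desc'].tolist()

print(in_index, in_values)
print(names_list)
```

Even with the membership test fixed, the loop would still check the whole column rather than the group of the current row, which is why the row-wise mask is the simpler route.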
CodePudding user response:
The cool thing about pandas is that you don't have to do these things in a loop.
import pandas as pd
data = [
['Closed', 'j.snow', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'j.doe', 'General Issue', 'Master Group'],
['Closed', 'j.snow', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'm.smith', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'a.richards', 'Wrong Data. Contact your admin', 'Specific Group'],
['Closed', 'a.blecha', 'General Issue', 'Master Group'],
['Closed', 'r.kipling', 'Wrong Data. Contact your admin', 'First Group']
]
df = pd.DataFrame(data, columns=['status', 'created', 'short_desc', 'group'])
print(df)
# Pick only those rows where short_desc starts with "Wrong".
df1 = df[df['short_desc'].str.startswith('Wrong')]
# Pick only those rows where group is "Specific Group".
df1 = df1[df1['group']=='Specific Group']
# Print the "short_desc" column.
print(df1['short_desc'])
Or, in a single expression:
df1 = df[
    (df['short_desc'].str.startswith('Wrong')) &
    (df['group']=='Specific Group')
]
This is pandas' boolean indexing (sometimes called boolean masking). Each of those comparisons returns a Series of booleans, True where the condition holds, and & combines them element by element. Passing that Series to df[...] returns only the rows where the element is True.
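To make the masking visible, here is a minimal sketch that builds each boolean Series separately before combining them. The four-row frame is made up for illustration, and the `na=False` argument is an addition not used in the answer above: it makes a missing description count as a non-match instead of producing a missing value in the mask.

```python
import pandas as pd

# Illustrative data only; the third row has a missing description.
df = pd.DataFrame({
    'short_desc': ['Wrong Data. Contact your admin',
                   'General Issue',
                   None,
                   'Wrong Data. Contact your admin'],
    'group': ['Specific Group', 'Master Group', 'Specific Group', 'First Group'],
})

# One boolean per row: does short_desc start with "Wrong"?
# na=False treats the missing description as "does not match".
starts_wrong = df['short_desc'].str.startswith('Wrong', na=False)

# One boolean per row: is the group exactly "Specific Group"?
in_group = df['group'] == 'Specific Group'

# Element-wise AND of the two masks.
mask = starts_wrong & in_group

# Keep only the rows where the mask is True.
result = df[mask]

print(mask.tolist())
print(result)
```

When the conditions are written inline instead, the parentheses around each comparison are needed because `&` binds more tightly than `==` in Python.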