Create a loop startswith specific string pandas

Time:05-10

I'm still a Python beginner. I would like to extract only the records whose short_desc starts with a specific prefix, such as 'Wrong Data', and whose group is 'Specific Group', from my DataFrame df.


I tried to do this with a loop:

names_list = []
for name in df['short_desc']:
    if 'Specifc Group' in df['group']:
        if name.startswith("Wrong Data"):
            names_list.append(name)

But this loop doesn't extract what I would like to have. I'm not sure what went wrong. Could you please help?

CodePudding user response:

You need to use .str.startswith to find rows where a column starts with a particular value:

subset = df[df['short_desc'].str.startswith('Wrong Data') & df['group'].eq('Specific Group')]
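As for why the original loop collects nothing, here is a minimal sketch on a small made-up DataFrame (not the asker's data). It illustrates that the in operator applied to a Series tests the index labels rather than the values, so the group check is False on every pass; the 'Specifc' typo in the question would prevent a match in any case. It then shows one possible loop that pairs the two columns row by row with zip.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({
    'short_desc': ['Wrong Data. Contact your admin',
                   'General Issue',
                   'Wrong Data. Contact your admin'],
    'group': ['Specific Group', 'Master Group', 'First Group'],
})

# `in` on a Series looks at the index labels (0, 1, 2), not the values.
in_index = 'Specific Group' in df['group']            # False
in_values = 'Specific Group' in df['group'].tolist()  # True

# A row-by-row loop has to compare both columns of the same row.
names_list = []
for name, group in zip(df['short_desc'], df['group']):
    if group == 'Specific Group' and name.startswith('Wrong Data'):
        names_list.append(name)

print(in_index, in_values)
print(names_list)
```

The loop works, but the vectorized filter above is shorter and is usually the preferred pandas style.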

CodePudding user response:

The cool thing about pandas is that you don't have to do these things in a loop.

import pandas as pd
data = [
    ['Closed', 'j.snow', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'j.doe', 'General Issue', 'Master Group'],
    ['Closed', 'j.snow', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'm.smith', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'a.richards', 'Wrong Data.  Contact your admin', 'Specific Group'],
    ['Closed', 'a.blecha', 'General Issue', 'Master Group'],
    ['Closed', 'r.kipling', 'Wrong Data.  Contact your admin', 'First Group']
]

df = pd.DataFrame(data, columns=['status', 'created', 'short_desc', 'group'])
print(df)
# Pick only those rows where short_desc starts with "Wrong".
df1 = df[df['short_desc'].str.startswith('Wrong')]
# Pick only those rows where group is "Specific Group".
df1 = df1[df1['group']=='Specific Group'] 
# Print the "short_desc" column.
print(df1['short_desc'])

Or, in a single expression:

df1 = df[
    (df['short_desc'].str.startswith('Wrong')) &
    (df['group'] == 'Specific Group')
]

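This is pandas' boolean indexing (sometimes called boolean masking). Each condition returns a Series of booleans, True where the condition holds, and & combines two such Series element-wise. Passing the result to df[...] returns only the rows where the value is True. Parentheses around a comparison such as df['group']=='Specific Group' are needed because & binds more tightly than ==.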
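To make the mask visible, here is a small sketch that builds each boolean Series separately before indexing. The data is made up for illustration, and the missing value in short_desc is an assumption added only to show the na=False argument of .str.startswith, which treats missing entries as non-matches.

```python
import pandas as pd

# Hypothetical data; the None entry is an assumption for illustration.
df = pd.DataFrame({
    'short_desc': ['Wrong Data', 'General Issue', None, 'Wrong Data'],
    'group': ['Specific Group', 'Master Group', 'Specific Group', 'First Group'],
})

# One boolean Series per condition, aligned with df's rows.
starts = df['short_desc'].str.startswith('Wrong', na=False)
in_group = df['group'] == 'Specific Group'

# Element-wise AND of the two conditions.
mask = starts & in_group
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True.
subset = df[mask]
print(subset)
```

Only the first row satisfies both conditions here, so subset contains that single row.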
