Home > Mobile >  Filter dataframe with condition where all substrings in a column start with particular value
Filter dataframe with condition where all substrings in a column start with particular value

Time:02-26

I have below data

   system_id           software_versions
0       1001          ['VA100', 'VB101']
1       1002          ['VA100', 'VA101']
2       1003          ['VA100', 'VB102','VC101']

I need to return rows which has software_versions beginning with VA only.

In this example, only second row matches this criteria, as it contains versions VA100 and VA101.

But other rows have some other version which is not starting with VA

Expected output

   system_id   software_versions
1       1002  ['VA100', 'VA101'] 

I began with this code. I'm not able to proceed further as how I can achieve this.

import pandas as pd
df = pd.DataFrame([[1001,"['VA100', 'VB101']"], [1002,"['VA100', 'VA101']"], [1003,"['VA100', 'VB102','VC101']"]],columns=['system_id', 'software_versions'])

Note: Actual data will come from a csv file, it has more than 1000 entries. I have provided the exact sample data here.

CodePudding user response:

Idea is use generator for test if all values in list starting by VA and filter in boolean indexing:

import ast

df['software_versions'] = df['software_versions'].apply(ast.literal_eval)

df = df[df['software_versions'].map(lambda x: all(y.startswith('VA') for y in x))]
print (df)
   system_id software_versions
1       1001    [VA100, VA101]
  • Related