Home > Mobile >  Get index of column containing string of key:value pairs
Get index of column containing string of key:value pairs

Time:02-27

I have a data frame df:

Name   Student_info                                                          School
Rajat  {"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"}   XYZ
Vivek  {"FirstName":"Vivek", "LastName":"Vishwa", "birthDate":"1999-07-09"}  ABC
Ram    {"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"}          ABC
John   {"FirstName":"", "LastName":"Mac", "birthDate":"1999-08-03"}          ABC

I want to get the index of FirstName and LastName (indices of the keys and corresponding values). How can I get this? e.g: index of "FirstName":"Rajat" inside the Student_info column.

I tried:

index = [i for i in df.columns if isinstance(data[i][0], dict)]

But this gives index as null (as it's not a dictionary, but a string of values in the form of "key":"value" pairs)

I want to check if "FirstName":"" is empty:

ValueToCheck = ""
ValuesInDataframe = ?  # get the index of empty string of key FirstName
if (ValueToCheck == ValuesInDataframe):
    return true

What is the best way to get the index to compare the strings?

CodePudding user response:

I assume your df is being read from a csv (or similar) file, something along the lines of:

data = StringIO('''
Name     Student_info                                                       School
Rajat    {"FirstName":"Rajat","LastName":"Sinha","birthDate":"1999-05-01"}     XYZ
Vivek    {"FirstName":"Vivek","LastName":"Vishwa","birthDate":"1999-07-09"}    ABC
Ram      {"FirstName":"","LastName":"Ram","birthDate":"1999-05-09"}            ABC
John     {"FirstName":"","LastName":"Mac","birthDate":"1999-08-03"}            ABC
''')
df = pd.read_csv(data, sep = '\s ')

In this case 'Student_info' has strings that 'look like' dicts.

We can convert them to actual dicts by pply-ing eval and then expanding into separate columns:

df_si = df['Student_info'].apply(eval).apply(pd.Series)

df_si looks like this:


    FirstName   LastName    birthDate
0   Rajat       Sinha       1999-05-01
1   Vivek       Vishwa      1999-07-09
2               Ram         1999-05-09
3               Mac         1999-08-03

You can combine this with the rest of df using join:

df = df[['Name', 'School']].join(df_si)

Now df looks like this:

    Name    School    FirstName    LastName    birthDate
--  ------  --------  -----------  ----------  -----------
 0  Rajat   XYZ       Rajat        Sinha       1999-05-01
 1  Vivek   ABC       Vivek        Vishwa      1999-07-09
 2  Ram     ABC                    Ram         1999-05-09
 3  John    ABC                    Mac         1999-08-03

You can now extract those rows where Firstname is an empty string:

df[df['FirstName']=='']

output:

    Name    School    FirstName    LastName    birthDate
--  ------  --------  -----------  ----------  -----------
 2  Ram     ABC                    Ram         1999-05-09
 3  John    ABC                    Mac         1999-08-03

CodePudding user response:

Your question is still a bit unclear. Does this solve your problem:

If the Student_info column contains strings instead of dictionaries then do this first:

df.Student_info = df.Student_info.map(eval)

Now you can use .str.get to extract the indices for the rows that meet the requirement:

indices = df[df.Student_info.str.get('FirstName').eq('')].index

Result:

Int64Index([2, 3], dtype='int64')

So

df.loc[indices]

results in:

   Name                                       Student_info School
2   Ram  {'FirstName': '', 'LastName': 'Ram', 'birthDat...'   ABC
3  John  {'FirstName': '', 'LastName': 'Mac', 'birthDat...'   ABC
  • Related