I have a data frame df
:
Name Student_info School
Rajat {"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"} XYZ
Vivek {"FirstName":"Vivek", "LastName":"Vishwa", "birthDate":"1999-07-09"} ABC
Ram {"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"} ABC
John {"FirstName":"", "LastName":"Mac", "birthDate":"1999-08-03"} ABC
I want to get the index of FirstName
and LastName
(indices of the keys and corresponding values). How can I get this? e.g: index of "FirstName":"Rajat"
inside the Student_info
column.
I tried:
index = [i for i in df.columns if isinstance(data[i][0], dict)]
But this gives index as null (as it's not a dictionary, but a string of values in the form of "key":"value"
pairs)
I want to check if "FirstName":""
is empty:
ValueToCheck = ""
ValuesInDataframe = ? # get the index of empty string of key FirstName
if (ValueToCheck == ValuesInDataframe):
return true
What is the best way to get the index to compare the strings?
CodePudding user response:
I assume your df is being read from a csv (or similar) file, something along the lines of:
data = StringIO('''
Name Student_info School
Rajat {"FirstName":"Rajat","LastName":"Sinha","birthDate":"1999-05-01"} XYZ
Vivek {"FirstName":"Vivek","LastName":"Vishwa","birthDate":"1999-07-09"} ABC
Ram {"FirstName":"","LastName":"Ram","birthDate":"1999-05-09"} ABC
John {"FirstName":"","LastName":"Mac","birthDate":"1999-08-03"} ABC
''')
df = pd.read_csv(data, sep = '\s ')
In this case 'Student_info' has strings that 'look like' dicts.
We can convert them to actual dicts by pply
-ing eval
and then expanding into separate columns:
df_si = df['Student_info'].apply(eval).apply(pd.Series)
df_si
looks like this:
FirstName LastName birthDate
0 Rajat Sinha 1999-05-01
1 Vivek Vishwa 1999-07-09
2 Ram 1999-05-09
3 Mac 1999-08-03
You can combine this with the rest of df using join
:
df = df[['Name', 'School']].join(df_si)
Now df
looks like this:
Name School FirstName LastName birthDate
-- ------ -------- ----------- ---------- -----------
0 Rajat XYZ Rajat Sinha 1999-05-01
1 Vivek ABC Vivek Vishwa 1999-07-09
2 Ram ABC Ram 1999-05-09
3 John ABC Mac 1999-08-03
You can now extract those rows where Firstname is an empty string:
df[df['FirstName']=='']
output:
Name School FirstName LastName birthDate
-- ------ -------- ----------- ---------- -----------
2 Ram ABC Ram 1999-05-09
3 John ABC Mac 1999-08-03
CodePudding user response:
Your question is still a bit unclear. Does this solve your problem:
If the Student_info
column contains strings instead of dictionaries then do this first:
df.Student_info = df.Student_info.map(eval)
Now you can use .str.get
to extract the indices for the rows that meet the requirement:
indices = df[df.Student_info.str.get('FirstName').eq('')].index
Result:
Int64Index([2, 3], dtype='int64')
So
df.loc[indices]
results in:
Name Student_info School
2 Ram {'FirstName': '', 'LastName': 'Ram', 'birthDat...' ABC
3 John {'FirstName': '', 'LastName': 'Mac', 'birthDat...' ABC