I have a JSON file that I'm reading into a pandas DataFrame. It looks like this:
{
    ...
    ...
    "Info": [
        {
            "Type": "A",
            "Desc": "4848",
            ...
        },
        {
            "Type": "P",
            "Desc": "3763",
            ...
        },
        {
            "Type": "S",
            "Desc": "AUBERT",
            ...
        }
    ],
    ...
}
I have a function that loops over the "Info" field and, depending on "Type", stores information in a dictionary and returns that dictionary. I then want to create new columns in my df from the values stored in the dictionary using df.apply. Please see below:
def extract_info(self):
    def extract_data(df):
        dic = {'a': None, 'p': None, 's': None}
        for info in df['Info']:
            if info['Type'] == "A":
                dic['a'] = info['Desc']
            if info['Type'] == "P":
                dic['p'] = info['Desc']
            if info['Type'] == "S":
                dic['s'] = info['Desc']
        return dic
    self.df['A'] = self.df.apply(extract_data, axis=1)['a']
    self.df['P'] = self.df.apply(extract_data, axis=1)['p']
    self.df['S'] = self.df.apply(extract_data, axis=1)['s']
    return self
I have also tried doing:
self.df['A'] = self.df.apply(lambda x: extract_data(x['a']), axis=1)
But neither of these works for me. I have looked at other SO posts about using df.apply with a function that returns a dictionary, but did not find what I need for my case. Please help.
I could write 3 separate functions like extract_A, extract_B and extract_C that each return a single value to make df.apply work, but that means running the for loop 3 times, once per function. Any suggestions other than using a dictionary are welcome too. Thanks.
CodePudding user response:
I'm not sure what you're getting at with your nested functions and your use of self. I think you can get what you need with a single function:
import pandas as pd

input_dict = {
    "col1": [1, 2, 3],
    "Info": [
        {
            "Type": "A",
            "Desc": "4848",
        },
        {
            "Type": "P",
            "Desc": "3763",
        },
        {
            "Type": "S",
            "Desc": "AUBERT",
        }
    ]
}

def extract_data(info_col, typ):
    # Return Desc when the row's Type matches; otherwise fall through to None
    if info_col['Type'] == typ:
        return info_col['Desc']

df = pd.DataFrame(input_dict)
df['A'] = df['Info'].apply(lambda x: extract_data(x, 'A'))
df['P'] = df['Info'].apply(lambda x: extract_data(x, 'P'))
df['S'] = df['Info'].apply(lambda x: extract_data(x, 'S'))
Output:
   col1                             Info     A     P       S
0     1    {'Type': 'A', 'Desc': '4848'}  4848  None    None
1     2    {'Type': 'P', 'Desc': '3763'}  None  3763    None
2     3  {'Type': 'S', 'Desc': 'AUBERT'}  None  None  AUBERT
Is this what you're looking for?
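Note that the frame above holds one Info dict per row. If each row instead holds the whole list, as the JSON in the question suggests, the same idea might look like the sketch below. The helper name first_desc and the one-row sample frame are assumptions made for illustration, not something taken from the question.

```python
import pandas as pd

def first_desc(info_list, typ):
    """Return the Desc of the first entry whose Type equals typ, else None."""
    for info in info_list:
        if info["Type"] == typ:
            return info["Desc"]
    return None

# Hypothetical one-row frame: the Info cell holds the full list of dicts
df = pd.DataFrame({
    "col1": [1],
    "Info": [[
        {"Type": "A", "Desc": "4848"},
        {"Type": "P", "Desc": "3763"},
        {"Type": "S", "Desc": "AUBERT"},
    ]],
})

# One new column per type; t=typ binds the current loop value to the lambda
for typ in ("A", "P", "S"):
    df[typ] = df["Info"].apply(lambda lst, t=typ: first_desc(lst, t))
```

This still scans each list once per type, so it does not avoid the repeated loop the question mentions.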
CodePudding user response:
Instead of storing the values in a dictionary, I can store them in variables and return them from my extract_data function. Then I can assign these values to new columns in self.df directly, using the result_type parameter of df.apply.
def extract_info(self):
    def extract_data(df):
        a = None
        p = None
        s = None
        for info in df['Info']:
            if info['Type'] == "A":
                a = info['Desc']
            if info['Type'] == "P":
                p = info['Desc']
            if info['Type'] == "S":
                s = info['Desc']
        return a, p, s
    self.df[['A', 'P', 'S']] = self.df.apply(extract_data, axis=1, result_type="expand")
    return self
Output:
      A     P       S
0  4848  3763  AUBERT
...
...
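A dictionary-returning function can also be made to work in a single pass, by building a DataFrame from the returned dicts instead of indexing the result of apply. This is a minimal sketch, assuming each row's Info is a list of dicts; the helper name collect_desc and the sample rows (including the "1111" value) are made up for illustration.

```python
import pandas as pd

def collect_desc(info_list):
    """Scan one row's Info list once and return the Desc found for each Type."""
    out = {"A": None, "P": None, "S": None}
    for info in info_list:
        if info["Type"] in out:
            out[info["Type"]] = info["Desc"]
    return out

# Hypothetical sample data: the second row has only a "P" entry
df = pd.DataFrame({
    "Info": [
        [
            {"Type": "A", "Desc": "4848"},
            {"Type": "P", "Desc": "3763"},
            {"Type": "S", "Desc": "AUBERT"},
        ],
        [{"Type": "P", "Desc": "1111"}],
    ]
})

# One dict per row -> one DataFrame whose columns are the dict keys
extracted = pd.DataFrame(df["Info"].map(collect_desc).tolist(), index=df.index)
df[["A", "P", "S"]] = extracted[["A", "P", "S"]]
```

Each list is scanned only once, and types missing from a row are left empty.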