I have a DataFrame that has a json array as the values of one column. I want to select one of those as the values of the column and get rid of the rest. I have gotten the desired values into a series but I can't figure out how to join them back to the DataFrame in place of the existing column:
import json
from pandas.io.json import json_normalize
df = pd.DataFrame({
'bank_account': [101, 102, 201, 301],
'data': [
'{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
'{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0}, "fees": {"monthly": 0}}, "user_name": "Alice"}',
'{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
'{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
]},
index = ['Alice', 'Alice', 'Bob', 'Carol']
)
lst = []
for d in df['data']:
d = pd.read_json(d, lines=True)['uid'].values[0]
lst.append(d)
s = pd.DataFrame(lst)
df['data'] = s
print(s)
print(df)
returns
0
0 100
1 100
2 200
3 300
bank_account data
Alice 101 NaN
Alice 102 NaN
Bob 201 NaN
Carol 301 NaN
currently and I don't know why the data column shows all nan values. Any help appreciated.
Updated Issue: Some of the rows have lists of json arrays instead of just one. Here is what I have so far:
import json
from pandas.io.json import json_normalize
df = pd.DataFrame({
'bank_account': [101, 102, 201, 301],
'data': [
'[{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"},{"uid": 150, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "jer"}]',
'{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0}, "fees": {"monthly": 0}}, "user_name": "Alice"}',
'{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
'{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
]},
index = ['Alice', 'Alice', 'Bob', 'Carol']
)
# df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])
df["data"] = df["data"].apply(lambda array : (",".join(list(map(lambda x : pd.read_json(x, lines=True)["uid"][0], array),(df['data'])))))
print(df)
CodePudding user response:
This works for me:
df = pd.DataFrame({
'bank_account': [101, 102, 201, 301],
'data': [
'{"uid": 100, "account_type": 1, "account_data": {"currency": {"current": 1000, "minimum": -500}, "fees": {"monthly": 13.5}}, "user_name": "Alice"}',
'{"uid": 100, "account_type": 2, "account_data": {"currency": {"current": 2000, "minimum": 0}, "fees": {"monthly": 0}}, "user_name": "Alice"}',
'{"uid": 200, "account_type": 1, "account_data": {"currency": {"current": 3000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Bob"}',
'{"uid": 300, "account_type": 1, "account_data": {"currency": {"current": 4000, "minimum": 0}, "fees": {"monthly": 13.5}}, "user_name": "Carol"}'
]},
index = ['Alice', 'Alice', 'Bob', 'Carol']
)
df["data"] = df["data"].apply(lambda x: pd.read_json(x, lines=True)["uid"][0])
Your code does not work because df
and s
have different indices. If you want to fix your code set df['data'] = s[0].values
(instead of df['data'] = s
) before your two print statements.
CodePudding user response:
As @rachwa notes, the issue is that the indexes don't match because the index of s
is numbers while the index of df
is names. If you assign lst
directly instead of casting it to a DataFrame, you will get the desired outcome, i.e.
df['data'] = lst
would work as expected.
You could also use json.loads
instead of read_json
(it should be faster):
import json
df['data'] = [json.loads(d)['uid'] for d in df['data']]
Output:
bank_account data
Alice 101 100
Alice 102 100
Bob 201 200
Carol 301 300