I have DataFrame in Python Pandas like below ("col1" is "object" data type):
col1
------------
{"CITY":"MADRID","A_DT":"2022-05-10T00:00","PLACE":"XYT"}
{"CITY":"BERLIN","A_DT":"2022-07-11T00:00","PLACE":"TR"}
{"CITY":"BOSTON","A_DT":null,"PLACE":"abbnpK"}
...
And I need to create new column "col2" (string data type) where I need to have value which is between "A_DT":" and ","PLACE" in appropriate form for date, so as a result I need something like below:
col1 | col2
----------------------------------------------------------|-------------
{"CITY":"MADRID","A_DT":"2022-05-10T00:00","PLACE":"XYT"} | 2022-05-10
{"CITY":"BERLIN","A_DT":"2022-07-11T00:00","PLACE":"TR"} | 2022-07-11
{"CITY":"BOSTON","A_DT":null,"PLACE":"abbnpK"} | null
... | ...
I tried to use code like below, but I have error and it does not work probably because sometimes there is value null sometimes value like "2022-07-11T00:00" with quote:
df.col1.str.split('A_DT":"').apply(lambda x: x[1].split('","PLACE"')
Can you suggest better solution appropriate for my DataFrame in Python Pandas ?
CodePudding user response:
That looks like json data. It's probably a good idea to parse it into a dataframe and work with that.
import json
df2 = pd.DataFrame.from_records(df.col1.apply(json.loads))
gives
CITY A_DT PLACE
0 MADRID 2022-05-10T00:00 XYT
1 BERLIN 2022-07-11T00:00 TR
2 BOSTON None abbnpK
Now you can simply select the data via df2.A_DT
.
CodePudding user response:
You can use this regex to get the string in between your 2 strings
df["date"] = df.col1.astype(str).str.extract("'A_DT': '(.*?)', 'PLACE")
Output:
print(df)
col1 | date | |
---|---|---|
0 | {'CITY': 'MADRID', 'A_DT': '2022-05-10T00:00', 'PLACE': 'XYT'} | 2022-05-10T00:00 |
1 | {'CITY': 'BERLIN', 'A_DT': '2022-07-11T00:00', 'PLACE': 'TR'} | 2022-07-11T00:00 |