I converted an XML file to CSV and ended up with this dataframe column, "data[column]":
`0 Jan:2018,000/XXX|Dec:2017,000/XXX|Nov:2017,000...
1 Apr:2018,000/XXX|Mar:2018,000/STD|Feb:2018,000...
2 Apr:2019,000/XXX|Mar:2019,000/XXX|Feb:2019,000...
3 Jan:2019,000/XXX|
4 Dec:2018,000/XXX|Nov:2018,000/XXX|Oct:2018,000...
5 Feb:2019,000/XXX|Jan:2019,000/XXX|Dec:2018,000...
6 May:2015,XXX/XXX|Apr:2015,XXX/XXX|Mar:2015,XXX...`
I want to split each row of this column on "|" and, from every piece, take the first value after the comma.
Example:
000,000,000.....
000,000,000...
000,000,000...
000...
000,000,000...
XXX,XXX,XXX...
and store the result in the dataframe.
I used this function:
def my_split(string):
    for x in new.str.split("|"):
        for y in x:
            print(y.split(",")[-1][0:3])

new.apply(my_split)
But I am getting the values for every row printed one after the other:
000
000
000
000
000
000
000
CodePudding user response:
import pandas as pd

s = """0 Jan:2018,000/XXX|Dec:2017,000/XXX|Nov:2017,000...
1 Apr:2018,000/XXX|Mar:2018,000/STD|Feb:2018,000...
2 Apr:2019,000/XXX|Mar:2019,000/XXX|Feb:2019,000...
3 Jan:2019,000/XXX|
4 Dec:2018,000/XXX|Nov:2018,000/XXX|Oct:2018,000...
5 Feb:2019,000/XXX|Jan:2019,000/XXX|Dec:2018,000...
6 May:2015,XXX/XXX|Apr:2015,XXX/XXX|Mar:2015,XXX..."""
# One row per line of the sample text, in a single column named 'col'
df = pd.DataFrame(s.split('\n'), columns=['col'])
def custom_strip_fnc(m):
    # Keep only the pieces that contain a comma, then take the first 3 characters after it
    ar = [k.split(',')[1][0:3] for k in m.split('|') if ',' in k]
    return ar

df['splitted'] = df['col'].apply(custom_strip_fnc)
df
col splitted
0 0 Jan:2018,000/XXX|Dec:2017,000/XXX|Nov:2017,0... [000, 000, 000]
1 1 Apr:2018,000/XXX|Mar:2018,000/STD|Feb:2018,0... [000, 000, 000]
2 2 Apr:2019,000/XXX|Mar:2019,000/XXX|Feb:2019,0... [000, 000, 000]
3 3 Jan:2019,000/XXX| [000]
4 4 Dec:2018,000/XXX|Nov:2018,000/XXX|Oct:2018,0... [000, 000, 000]
5 5 Feb:2019,000/XXX|Jan:2019,000/XXX|Dec:2018,0... [000, 000, 000]
6 6 May:2015,XXX/XXX|Apr:2015,XXX/XXX|Mar:2015,X... [XXX, XXX, XXX]
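
If you want one comma-separated string per row (as in the question's example) instead of a list, something along these lines might work. This is only a sketch: the sample rows are made up in the same shape as the question's data, and the names `first_codes`, `codes` and `codes_regex` are hypothetical. The regex variant also assumes every value after a comma is at least 3 characters long.

```python
import pandas as pd

# Illustrative rows in the same shape as the question's column (not the real data)
df = pd.DataFrame({
    "col": [
        "Jan:2018,000/XXX|Dec:2017,000/XXX|Nov:2017,000/XXX",
        "Jan:2019,000/XXX|",
        "May:2015,XXX/XXX|Apr:2015,XXX/XXX",
    ]
})

def first_codes(cell):
    # Split on "|", skip empty pieces (e.g. after a trailing "|"),
    # and keep the first 3 characters after the comma in each piece
    pieces = [p for p in cell.split("|") if "," in p]
    return ",".join(p.split(",")[1][:3] for p in pieces)

# Return the value instead of printing it, so apply() can store it per row
df["codes"] = df["col"].apply(first_codes)

# Same idea with pandas string methods: capture 3 characters after each comma
df["codes_regex"] = df["col"].str.findall(r",(.{3})").str.join(",")

print(df[["codes", "codes_regex"]])
```

The main difference from the function in the question is that it returns a value for the single cell it receives, rather than looping over the whole column and printing.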