I am working on a test dataset that looks like this:
print(df.head(10))
0 NaN
1 93/2; 99/3; 05/4;
2 NaN
3 NaN
4 NaN
5 NaN
Now I want to convert the string "93/2; 99/3; 05/4;" into a neater data structure for further analysis. The first step is to split on the ";":
df = df.apply(lambda x: x.split(';'))
which yields
0 []
1 [93/2, 99/3, 05/4, ]
2 []
3 []
4 []
5 []
6 []
7 []
8 []
9 []
As you can see, the last element of the list is an empty string, which I want to delete. I was thinking about using the .pop() function, but that yields
df = df.apply(lambda x: x.pop())
print(df.head(10))
0
1
2
3
4
5
6
7
8
9
If I use slicing instead
df = df.apply(lambda x: x[:-1])
I get the expected output
0 []
1 [93/2, 99/3, 05/4]
2 []
3 []
4 []
5 []
6 []
7 []
8 []
9 []
Could anyone please explain why the pop function is not working here as I expected?
Thank you in advance!
CodePudding user response:
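It works. You assign the return value of pop() back into your df. pop() removes the last element from the list and returns the element that gets popped, so you assign that element to your df. Here the popped element is the trailing empty string in every row, which is why the output looks blank.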
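To make the point about the return value concrete, here is a minimal sketch in plain Python. The sample rows are an assumption (empty strings stand in for the blank rows of the question), and the list comprehension stands in for df.apply, which likewise keeps whatever the lambda returns.

```python
# Hypothetical sample rows mirroring the question's data.
rows = ["", "93/2; 99/3; 05/4;", ""]

# Split each row on ";" (every resulting list ends with an empty string).
lists = [row.split(";") for row in rows]

# pop() removes the last element in place and RETURNS that element,
# so collecting the return values collects the empty strings.
popped = [lst.pop() for lst in lists]

# Slicing returns a new list without the last element instead.
sliced = [row.split(";")[:-1] for row in rows]

print(popped)  # the removed elements
print(lists)   # the original lists, now mutated by pop()
print(sliced)  # new lists produced by slicing
```

In other words, pop() did shorten the lists, but the assignment kept the popped values rather than the shortened lists, whereas x[:-1] returns the shortened list itself.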
CodePudding user response:
There is a vectorized method to split strings, str.split, and one to strip characters, str.rstrip:
Assuming a dataframe here, although your example might indicate you have a Series:
# strip the trailing ';', then split on ';' followed by optional whitespace
df['lst'] = df['col'].str.rstrip(';').str.split(r';\s*')
If you have a Series:
ser2 = ser.str.rstrip(';').str.split(r';\s*')
output:
                 col                 lst
0                NaN                 NaN
1  93/2; 99/3; 05/4;  [93/2, 99/3, 05/4]
2                NaN                 NaN
3                NaN                 NaN
4                NaN                 NaN
5                NaN                 NaN
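For anyone who wants to try this end to end, below is a small self-contained sketch of the Series variant. The sample values are assumed from the question, and the names ser and ser2 are only illustrative.

```python
import pandas as pd

# Hypothetical sample data: missing values around one populated row.
ser = pd.Series([float("nan"), "93/2; 99/3; 05/4;", float("nan")])

# Strip the trailing ';' first, then split on ';' plus any whitespace
# after it. Missing values pass through both steps as missing.
ser2 = ser.str.rstrip(";").str.split(r";\s*")

print(ser2)
```

Stripping before splitting avoids the trailing empty string altogether, and the \s* in the pattern also removes the space after each separator, so no separate cleanup step is needed.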