I have a pandas string Series like the following:
s = pd.Series(["12345678.0","45678912.0", "0", "2983129416.0", "62441626.0"])
First I need to cut off the decimal part:
result = s.str.split(".", n=1, expand=True)[0]
Then I want to return the string if its length is 8, and an empty string ("") otherwise. Filtering like this:
s[s.str.len() == 8]
only keeps the strings whose length is 8 and drops the rest, but I need empty strings in the positions where the value is not 8 characters long. I couldn't figure out how to do this properly, so thanks in advance for any ideas!
Expected result:
s = pd.Series(["12345678","45678912", "", "", "62441626"])
Answer:
import pandas as pd
import numpy as np
s = pd.Series(["12345678.0","45678912.0", "0", "2983129416.0", "62441626.0"])
# Cut the decimal part
result = s.str.split(".", n=1, expand=True)[0]
# np.where() returns a NumPy array, so wrap it in a Series that keeps the original index
result = pd.Series(np.where(result.str.len() == 8, result, ''), index=result.index)
#or
#result = result.apply(lambda x: x if len(x) == 8 else "")
print(result)
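A minimal sketch that packages the same two steps into a reusable function. The name `blank_unless_length` and its `length` parameter are hypothetical, chosen only for illustration. It uses np.where as above, then wraps the array that np.where returns in a Series on the original index.

```python
import numpy as np
import pandas as pd


def blank_unless_length(strings: pd.Series, length: int = 8) -> pd.Series:
    """Hypothetical helper: keep the integer part if it has `length` characters, else ""."""
    # Keep everything before the first "." (strings without a "." are kept whole)
    integer_part = strings.str.split(".", n=1, expand=True)[0]
    # np.where returns a NumPy array, so rebuild a Series on the original index
    return pd.Series(
        np.where(integer_part.str.len() == length, integer_part, ""),
        index=strings.index,
    )


s = pd.Series(["12345678.0", "45678912.0", "0", "2983129416.0", "62441626.0"])
result = blank_unless_length(s)
print(result.tolist())  # ['12345678', '45678912', '', '', '62441626']
```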
Answer:
You can use a regular expression: match strings that consist of exactly 8 digits from the start of the string to the end, optionally followed by a "." and more digits (this decimal part is matched but not captured):
print(s.str.extract(r'^(\d{8})(?:\.\d+)?$').fillna(''))
Prints:
0
0 12345678
1 45678912
2
3
4 62441626
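If you want a Series rather than the one-column DataFrame printed above, `str.extract` accepts `expand=False`, which returns a Series when the pattern has exactly one capture group. A sketch, assuming the same input Series:

```python
import pandas as pd

s = pd.Series(["12345678.0", "45678912.0", "0", "2983129416.0", "62441626.0"])

# One capture group + expand=False -> a Series; non-matching rows become NaN
result = s.str.extract(r"^(\d{8})(?:\.\d+)?$", expand=False).fillna("")
print(result.tolist())  # ['12345678', '45678912', '', '', '62441626']
```

This does the "cut the decimal part" and "check the length" steps in a single pass, at the cost of also rejecting values whose integer part contains non-digit characters.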
Answer:
A pandas Series has a where method that you can use after you've split the strings on the "." character.
import pandas as pd
s = pd.Series(["12345678.0","45678912.0", "0", "2983129416.0", "62441626.0"])
s = s.str.split(".", n=1, expand=True)[0]
print(s)
result = s.where(s.str.len() == 8, "")
print(result)
The where method takes a condition and an other argument. Where the condition is True, it keeps the value from the Series; where it is False, it returns the other value instead.
Output:
0 12345678
1 45678912
2 0
3 2983129416
4 62441626
Name: 0, dtype: object
0 12345678
1 45678912
2
3
4 62441626
Name: 0, dtype: object
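Series.mask is the complement of where: it replaces values where the condition is True. A sketch of the same result written with mask, keeping the split result in a separate variable (`integer_part`, a name chosen here for illustration) instead of overwriting `s`:

```python
import pandas as pd

s = pd.Series(["12345678.0", "45678912.0", "0", "2983129416.0", "62441626.0"])
integer_part = s.str.split(".", n=1, expand=True)[0]

# mask(cond, other): replace with "" wherever the length is NOT 8
result = integer_part.mask(integer_part.str.len() != 8, "")
print(result.tolist())  # ['12345678', '45678912', '', '', '62441626']
```

Whether to use where or mask is mostly a readability choice: where states the condition for keeping a value, mask states the condition for replacing it.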