Pandas String Series, return string if length equals number, otherwise return empty string

Time:01-26

I have a Pandas string series as the following:

s = pd.Series(["12345678.0","45678912.0", "0", "2983129416.0", "62441626.0"])

First, I need to strip the decimal part:

result = s.str.split(".", n=1, expand=True)[0]

I want to find a way to return the string if its length is 8, and an empty string ("") otherwise:

s[s.str.len() == 8]

Of course, this only keeps the strings whose length is 8, but I need empty strings in the positions where the value is not 8 characters long. I couldn't figure out how to do this properly on my own, so thanks in advance for any ideas!

Expected result:

s = pd.Series(["12345678","45678912", "", "", "62441626"])

CodePudding user response:

import pandas as pd
import numpy as np

s = pd.Series(["12345678.0","45678912.0", "0", "2983129416.0", "62441626.0"])

# Cut the decimal part
result = s.str.split(".", n=1, expand=True)[0]

# Use np.where() to keep 8-character strings and blank out the rest
# (note: np.where() returns a NumPy array, not a Series)
result = np.where(result.str.len() == 8, result, '')

# or, to get a Series back:

# result = result.apply(lambda x: x if len(x) == 8 else "")

print(result)
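
One thing to be aware of with the np.where() version: the result is a plain NumPy array, so the Series index is dropped. A minimal sketch of wrapping it back into a Series, assuming the same sample data as in the question (the names digits, as_array and as_series are just illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(["12345678.0", "45678912.0", "0", "2983129416.0", "62441626.0"])

# Keep only the part before the first "."
digits = s.str.split(".", n=1, expand=True)[0]

# np.where() returns a NumPy array, not a Series
as_array = np.where(digits.str.len() == 8, digits, "")

# Wrap the array to get a Series aligned with the original index
as_series = pd.Series(as_array, index=s.index)

print(type(as_array).__name__)
print(as_series.tolist())
```

If you need to assign the result back into a DataFrame column, either form should work, but the wrapped Series keeps the index explicit.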

CodePudding user response:

You can use a regular expression: match strings that consist of exactly 8 digits from the start of the string to the end, ignoring an optional part after the .:

print(s.str.extract(r'^(\d{8})(?:\.\d+)?$').fillna(''))

Prints:

          0
0  12345678
1  45678912
2          
3          
4  62441626
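
Note that the call above prints a one-column DataFrame (hence the 0 header). If a Series is wanted, as in the question's expected result, passing expand=False to str.extract should do it when the pattern has a single capture group. A small sketch under that assumption, using the question's sample data:

```python
import pandas as pd

s = pd.Series(["12345678.0", "45678912.0", "0", "2983129416.0", "62441626.0"])

# Exactly 8 digits, optionally followed by "." and more digits.
# expand=False returns a Series because there is only one capture group.
result = s.str.extract(r'^(\d{8})(?:\.\d+)?$', expand=False).fillna('')

print(result.tolist())
```

Non-matching rows come back as NaN from extract, which is why fillna('') is still needed.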

CodePudding user response:

A pandas Series has a where method that you can use after you've split the strings on the . character.

import pandas as pd

s = pd.Series(["12345678.0","45678912.0", "0", "2983129416.0", "62441626.0"])
s = s.str.split(".", n=1, expand=True)[0]
print(s)

result = s.where(s.str.len() == 8, "")
print(result)

The where method takes a condition and an other argument. Where the condition is True, it keeps the value from the series; where it is False, it substitutes the other value.

Output:

0      12345678
1      45678912
2             0
3    2983129416
4      62441626
Name: 0, dtype: object
0    12345678
1    45678912
2            
3            
4    62441626
Name: 0, dtype: object
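
To make the where semantics a bit more concrete: Series.mask is the inverse operation, replacing values where the condition is True. The sketch below, again on the question's sample data, shows that the two calls give the same result when the condition is negated (digits, is_eight, kept_where and kept_mask are illustrative names):

```python
import pandas as pd

s = pd.Series(["12345678.0", "45678912.0", "0", "2983129416.0", "62441626.0"])

# Keep only the part before the first "."
digits = s.str.split(".", n=1, expand=True)[0]

is_eight = digits.str.len() == 8

# where: keep values where the condition is True, otherwise use ""
kept_where = digits.where(is_eight, "")

# mask: replace values where the condition is True, so negate it
kept_mask = digits.mask(~is_eight, "")

print(kept_where.tolist())
print(kept_where.equals(kept_mask))
```

Which one reads better is mostly a matter of whether the condition describes what to keep or what to blank out.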

