Split a DataFrame string column into lists of 10-character chunks?
I have the DataFrame below and need to split each contact_num value into a list of 10-character strings. Non-numeric values such as "not available" should stay as they are.
| contact_num |
| -------------------------------|
| 01111784885788634878 |
| 247782788869775178889785427889 |
| not available |
| 2478544756 |
Expected output:
| contact_num |
| ------------------------------------ |
| [0111178488, 5788634878] |
| [2477827888, 6977517888, 9785427889] |
| not available |
| [2478544756] |
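Answer:
Try building a mask that selects values made only of digits (at least 10 of them), then use str.findall to pull out consecutive 10-digit chunks. Note that \d{10} silently drops any trailing digits when a value's length is not a multiple of 10: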
mask = df["contact_num"].str.contains(r"^\d{10,}$", regex=True)
df.loc[mask, "contact_num"] = df.loc[mask, "contact_num"].str.findall(r"\d{10}")
print(df)
Prints:
contact_num
0 [0111178488, 5788634878]
1 [2477827888, 6977517888, 9785427889]
2 not available
3 [2478544756]
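A self-contained variant of the regex approach above, offered as a sketch rather than a drop-in replacement. It builds the sample data, uses str.fullmatch (available in pandas 1.1 and later) for the mask, and switches the pattern to \d{1,10} so a final chunk shorter than 10 digits is kept instead of dropped. The 12-digit row is a hypothetical value added only to show that case. The result is assembled with a list comprehension instead of masked .loc assignment, which keeps the list-valued cells explicit.

```python
import pandas as pd

df = pd.DataFrame({"contact_num": [
    "01111784885788634878",
    "247782788869775178889785427889",
    "not available",
    "2478544756",
    "123456789012",  # hypothetical row: 12 digits, not a multiple of 10
]})

s = df["contact_num"]
# True only for values made entirely of digits (at least 10); na=False treats missing values as no match
mask = s.str.fullmatch(r"\d{10,}", na=False)
# \d{1,10} is greedy: it takes 10 digits at a time and keeps a shorter final chunk
chunks = s.str.findall(r"\d{1,10}")
# Use the chunk list where the mask matched, otherwise keep the original value
df["contact_num"] = pd.Series(
    [c if m else v for v, c, m in zip(s, chunks, mask)],
    index=df.index,
    dtype=object,
)
print(df)
```

With this pattern the hypothetical 12-digit value becomes ["1234567890", "12"] rather than ["1234567890"].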
Answer:
You can use DataFrame.apply with a custom function. This is more verbose than a vectorized approach, but it gives you explicit control over each step.
import pandas as pd
# sample data as a list of dicts
data = [
{"contact_num": "01111784885788634878"},
{"contact_num": "247782788869775178889785427889"},
{"contact_num": "not available"},
{"contact_num": "2478544756"}
]
df = pd.DataFrame(data)
def split_func(row):
    if row.contact_num.isnumeric():  # check if the current value is all digits
        # slice the string into consecutive 10-character pieces
        return [row.contact_num[i:i + 10] for i in range(0, len(row.contact_num), 10)]
    return row.contact_num  # if not numeric, return the current value unchanged
df.contact_num = df.apply(split_func, axis=1)  # apply the function to each row
print(df)
Output:
contact_num
0 [0111178488, 5788634878]
1 [2477827888, 6977517888, 9785427889]
2 not available
3 [2478544756]
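Because only one column is involved, the row-wise apply can be replaced by Series.map, which calls the function once per value. A minimal sketch; chunk is a hypothetical helper name, and the isinstance check is an added assumption so that non-string values such as NaN pass through unchanged instead of raising an AttributeError.

```python
import pandas as pd

def chunk(value, size=10):
    # Split all-digit strings into size-character pieces; leave anything else unchanged
    if isinstance(value, str) and value.isdigit():
        return [value[i:i + size] for i in range(0, len(value), size)]
    return value

df = pd.DataFrame({"contact_num": [
    "01111784885788634878",
    "247782788869775178889785427889",
    "not available",
    "2478544756",
]})

# Series.map works element by element, so no row-wise apply is needed
df["contact_num"] = df["contact_num"].map(chunk)
print(df)
```

Unlike \d{10} in the regex answer, slicing keeps a shorter final piece when the length is not a multiple of 10.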