Split a string into a list of smaller fixed-length strings - CodePudding

How do I split the strings in a DataFrame column into lists of fixed-length chunks?

I have the DataFrame below. I need to split each number into 10-character chunks and store them as a list in the same column, leaving non-numeric values unchanged.

|         contact_num            | 
| -------------------------------|
| 01111784885788634878           |
| 247782788869775178889785427889 |
| not available                  |
| 2478544756                     |

expected output:

|         contact_num               | 
| ----------------------------------|
| [0111178488,5788634878]           |
| [2477827888,6977517888,9785427889]|
| not available                     |
| [2478544756]                      |

CodePudding user response:

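Try building a mask for the rows that contain only digits (at least 10 of them), then split just those rows with str.findall: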

mask = df["contact_num"].str.contains(r"^\d{10,}$", regex=True)

df.loc[mask, "contact_num"] = df.loc[mask, "contact_num"].str.findall(r"\d{10}")
print(df)

Prints:

                            contact_num
0              [0111178488, 5788634878]
1  [2477827888, 6977517888, 9785427889]
2                         not available
3                          [2478544756]
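
One thing to watch for with this approach: pandas documents `Series.str.findall` as applying `re.findall` to each element, so the pattern `\d{10}` only returns complete 10-digit groups. The question's data always has lengths that are multiples of 10, but if a value did not, the leftover digits would be dropped silently. The sketch below uses a made-up 12-digit value (not from the question) to show the difference between `\d{10}` and `\d{1,10}`, which keeps the remainder as a shorter final chunk.

```python
import re

# Hypothetical 12-digit value, used only to illustrate the edge case.
number = "247854475612"

# Only complete groups of exactly 10 digits are returned; "12" is lost.
strict = re.findall(r"\d{10}", number)

# Greedy match of 1 to 10 digits keeps the shorter trailing chunk.
lenient = re.findall(r"\d{1,10}", number)

print(strict)   # ['2478544756']
print(lenient)  # ['2478544756', '12']
```

If partial chunks should be kept, passing `r"\d{1,10}"` to `str.findall` in the answer above should give the same behaviour per row.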

CodePudding user response:

You can use pandas apply with a custom function. This is more verbose than strictly necessary, but it makes each step explicit and easy to adjust.

import pandas as pd

# your data as a list of dicts (one dict per row)
data = [
    {"contact_num": "01111784885788634878"},
    {"contact_num": "247782788869775178889785427889"},
    {"contact_num": "not available"},
    {"contact_num": "2478544756"}
]

df = pd.DataFrame(data)

def split_func(row):
    if row.contact_num.isnumeric():  # check if current value is numeric
        # slice the string into consecutive 10-character chunks
        return [row.contact_num[i:i + 10] for i in range(0, len(row.contact_num), 10)]
    return row.contact_num  # if not numeric, return current value unchanged

df.contact_num = df.apply(split_func, axis=1)  # apply function to each row
print(df)

The output will be:

                            contact_num
0              [0111178488, 5788634878]
1  [2477827888, 6977517888, 9785427889]
2                         not available
3                          [2478544756]
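
As for the less verbose way mentioned above, here is a minimal sketch of one option: since only a single column is involved, `Series.map` can call the function once per cell, so no row object or `axis=1` is needed. The helper name `chunk_digits` and its `size` parameter are my own, not part of pandas, and the `isinstance` check is an added assumption so that missing values pass through untouched.

```python
import pandas as pd


def chunk_digits(value, size=10):
    """Split an all-digit string into `size`-character pieces.

    Anything that is not an all-digit string is returned unchanged.
    """
    if isinstance(value, str) and value.isnumeric():
        return [value[i:i + size] for i in range(0, len(value), size)]
    return value


df = pd.DataFrame({"contact_num": [
    "01111784885788634878",
    "247782788869775178889785427889",
    "not available",
    "2478544756",
]})

# Series.map applies chunk_digits to each cell of the column.
df["contact_num"] = df["contact_num"].map(chunk_digits)
result = df["contact_num"].tolist()
print(df)
```

Unlike the regex version, slicing keeps a shorter final chunk when the length is not a multiple of `size`, so pick whichever behaviour matches your data.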