Regex Python string

Time:10-07

I need to read a worksheet and reformat numbers that are stored without punctuation. For example, I need this number: 12345678901234567890 to look like this: 1234567-89.0123.4.56.7890

In JS I did something like this in a Google Sheets spreadsheet and it worked, but I need to embed this in a Python script. I'm using pandas to read the spreadsheet, and I'm having trouble putting it together in Python.

function reorder() {
  // put all of the strings into an array
  const a = ["12345678901234567890", "12345678901234567890"];
  // cut each string into fixed-width pieces and join them with the separators
  const b = a.map(s => {
    return `${s.slice(0,7)}-${s.slice(7,9)}.${s.slice(9,13)}.${s.slice(13,14)}.${s.slice(14,16)}.${s.slice(16)}`;
  });
  return b;
}

CodePudding user response:

Assuming the dataframe that OP reads from Excel looks like this

import pandas as pd

df = pd.DataFrame({'string': ['12345678901234567890', '12345678901234567890']})

[Out]:
                 string
0  12345678901234567890
1  12345678901234567890
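
The frame above assumes the column already holds strings. A spreadsheet column of long digit runs may be read as numbers instead, in which case slicing will not work and leading zeros or precision can be lost. One option is to ask pandas for text up front (pandas.read_excel accepts a dtype argument). The sketch below is a plain-Python fallback that normalises individual cell values; to_digit_string and its width parameter are hypothetical names, and the fixed width of 20 is an assumption taken from the example number.

```python
def to_digit_string(value, width=20):
    """Return a cell value as a zero-padded digit string (hypothetical helper)."""
    if isinstance(value, float):
        # A float cannot hold 20 significant digits exactly, so refuse it
        raise TypeError("value was read as float; re-read the column as text")
    text = str(value).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a plain digit string: {text!r}")
    # Restore leading zeros that a numeric cell would have dropped
    return text.zfill(width)


print(to_digit_string(12345678901234567890))
print(to_digit_string("  12345678901234567890 "))
print(to_digit_string(345678901234567890))
```

Rejecting floats rather than converting them is a deliberate choice here: once a 20-digit number has passed through a float, the trailing digits are no longer reliable.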

There are various ways to achieve OP's goal.

One, for example, is using pandas.Series.apply with a lambda function as follows

df['string'] = df['string'].apply(lambda x: f"{x[0:7]}-{x[7:9]}.{x[9:13]}.{x[13:14]}.{x[14:16]}.{x[16:]}")

[Out]:
                      string
0  1234567-89.0123.4.56.7890
1  1234567-89.0123.4.56.7890
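
If the column may contain empty cells or values of a different length, the lambda can be swapped for a named function that checks its input first. This is only a sketch: format_number is a hypothetical name, and leaving unexpected values untouched is an assumption about what OP wants.

```python
def format_number(s):
    """Insert the separators into a 20-digit string; return anything else unchanged."""
    if not isinstance(s, str) or len(s) != 20 or not s.isdigit():
        return s  # assumption: leave unexpected values as they are
    # Same slice positions as the lambda: 7-2.4.1.2.4
    return f"{s[0:7]}-{s[7:9]}.{s[9:13]}.{s[13:14]}.{s[14:16]}.{s[16:]}"


for value in ["12345678901234567890", "123", None]:
    print(repr(format_number(value)))
```

It plugs into the same call as before: df['string'] = df['string'].apply(format_number).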

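Alternatively, starting from the original unformatted column, one can use pandas.Series.str.replace with a regular expression and capture groups as

df['string'] = df['string'].str.replace(r'^(\d{7})(\d{2})(\d{4})(\d)(\d{2})(\d{4})$', r'\1-\2.\3.\4.\5.\6', regex=True)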

[Out]:
                      string
0  1234567-89.0123.4.56.7890
1  1234567-89.0123.4.56.7890
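
To confirm that every value ended up in the target layout, a regular expression check can help. A minimal sketch using only the standard re module; FORMATTED and is_formatted are hypothetical names, and the pattern assumes the 7-2.4.1.2.4 grouping from the question.

```python
import re

# Assumed target layout: 7 digits, '-', then groups of 2, 4, 1, 2 and 4 digits joined by '.'
FORMATTED = re.compile(r"\d{7}-\d{2}\.\d{4}\.\d\.\d{2}\.\d{4}")


def is_formatted(s):
    """Return True only if the whole string matches the target layout."""
    return isinstance(s, str) and FORMATTED.fullmatch(s) is not None


print(is_formatted("1234567-89.0123.4.56.7890"))
print(is_formatted("12345678901234567890"))
```

Using fullmatch rather than a "contains" style search matters here: a partial match would also accept strings with extra characters before or after the pattern.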

Notes:

  • Adjust the column name (here it is string) and the dataframe name (here it is df) to match your own data.