Extract numbers with 10 or more digits from a column with mixed data types in Python

Time:06-17

I have a column in a dataset (Column2) that contains both text and numbers. I need to extract the numbers that have 10 or more digits into a new column (Column3). Any idea how to do this?

Column1 Column2
A ghjy 123456677777 rttt 123.987 rtdggd
ABC 90999888877 asrteg 12.98 tggff 12300004
B thdhdjdj 123 jsjsjjsjl tehshshs 126666555533333
DLT 1.2897 thhhsskkkk 456633388899000022
XYZ tteerr 12.34

Expected output:

|Column3|
|-------|
|123456677777|
|90999888877|
|126666555533333|
|456633388899000000|
| |

I tried a few approaches (regex, lambda functions, apply, map), but each one treats the entire column as a single string. I didn't want to split the values because the real dataset has many words and numbers in each cell.

CodePudding user response:

You could try:

df['Column3'] = df['Column2'].str.extract(r'(\d{10,})')
print(df)

  Column1                                          Column2             Column3
0       A            ghjy 123456677777 rttt 123.987 rtdggd        123456677777
1     ABC          90999888877 asrteg 12.98 tggff 12300004         90999888877
2       B  thdhdjdj 123 jsjsjjsjl tehshshs 126666555533333     126666555533333
3     DLT             1.2897 thhhsskkkk 456633388899000022  456633388899000022
4     XYZ                                     tteerr 12.34                 NaN

To allow for multiple matches per string, you could do:

df['Column3'] = df['Column2'].str.findall(r'(\d{10,})').apply(', '.join)
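One caveat: `\d{10,}` matches any run of ten or more digits, including runs glued to letters or sitting inside a decimal number. The following is a minimal sketch, assuming such cases should be skipped. It uses the standard `re` module so it runs without a DataFrame. The names `LOOSE`, `STRICT` and `samples` are illustrative and not part of the original answer, and the second sample string is made up to show the difference.

```python
import re

# Pattern from the answer above: any run of 10 or more digits.
LOOSE = re.compile(r'\d{10,}')

# Assumed stricter variant: the run must not be preceded by a word
# character or a dot, and must not be followed by a word character
# or by a dot plus a digit (i.e. a decimal part).
STRICT = re.compile(r'(?<![\w.])\d{10,}(?!\w|\.\d)')

samples = [
    "ghjy 123456677777 rttt 123.987 rtdggd",   # row from the question
    "id12345678901 and 1234567890.55",         # hypothetical edge cases
    "tteerr 12.34",                            # row from the question
]

loose = [LOOSE.findall(s) for s in samples]
strict = [STRICT.findall(s) for s in samples]

print(loose)
print(strict)
```

If this stricter behaviour is what you want, the same pattern string (`STRICT.pattern`) can be passed to `df['Column2'].str.findall(...)`. For `str.extract`, wrap the digit part in a capture group first.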

CodePudding user response:

Maybe this works:

  • Take the value of Column2
  • Split it on whitespace
  • Loop over the resulting pieces
  • Check whether each piece is numeric and has a length of 10 or more
  • Keep the piece if that check passes
  • Assign the result to Column3
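The steps listed above can be sketched in plain Python as follows. This is only one possible reading of them: the helper name `long_digit_tokens`, the `min_len` parameter, and the choice to join multiple hits with a comma are assumptions, not something the answer specifies. The sample rows are the Column2 values from the question.

```python
def long_digit_tokens(text, min_len=10):
    """Return whitespace-separated tokens made only of digits with at least min_len characters."""
    if not isinstance(text, str):  # e.g. a missing value (NaN) in a pandas column
        return []
    return [tok for tok in text.split() if tok.isdigit() and len(tok) >= min_len]


# Column2 values from the question.
rows = [
    "ghjy 123456677777 rttt 123.987 rtdggd",
    "90999888877 asrteg 12.98 tggff 12300004",
    "thdhdjdj 123 jsjsjjsjl tehshshs 126666555533333",
    "1.2897 thhhsskkkk 456633388899000022",
    "tteerr 12.34",
]

# One output string per row; rows without a match become an empty string.
column3 = [", ".join(long_digit_tokens(row)) for row in rows]

for value in column3:
    print(repr(value))
```

With a DataFrame, the same helper could be applied as `df['Column3'] = df['Column2'].apply(lambda s: ', '.join(long_digit_tokens(s)))`. Because it only splits on whitespace, a number attached to punctuation (for example `1234567890,`) would be missed, which is one reason the regex answer may be more robust.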