Home > other >  split column pandas and get string between |
split column pandas and get string between |

Time:06-07

I have a df column like this:

idx PRECIOS
0 1//,2/3699/,3/3699/,4/3699/,6/3699/,7/4499/
1 1//,2/3650/,3/3650/,4/3650/,6/3650/,7/9087/

I need to get the values between // and put them into a new column. Output should look like this:

idx PRECIO1 PRECIO 2 PRECIO 3 PRECIO 4 PRECIO 5
0 3699 3699 3699 3699 4499
1 3650 3650 3650 3650 9087

is it possible to split ? Thanks in advance

CodePudding user response:

Here's an approach that uses apply to walk through each string one character at a time. It's not vectorized and doesn't take advantage of much pandas utilities so its slow, but hopefully fast enough

import pandas as pd

def extract_data(s):
    """
    Walk through the string keeping track of if we are within a / pair
    """
    within = False
    values = []
    curr_val = ''
    
    for c in s:
        if c == '/':
            #if we hit a / then we "reverse" whether or not we were within a pair of slashes
            #if we were within a pair, then now we're not and the opposite too
            within = not within
            continue
            
            
        if within:
            #c is not a / and we are within a pair, so add c to the curr_val string
            curr_val  = c
        elif curr_val:
            #we are not in a pair and curr_val has a non-null value
            #add the previous curr_val as an int to the list of values
            values.append(int(curr_val))
            curr_val = ''
            
    if curr_val:
        #add the last curr_val after the loop
        values.append(int(curr_val))
        
    return values
            

df = pd.DataFrame({
    'idx':['r1','r2'],
    'age':[27,78],
    'PRECIOS':[
        '1//,2/3699/,3/3699/,4/3699/,6/3699/,7/4499/',
        '1//,2/3650/,3/3650/,4/3650/,6/3650/,7/9087/',
    ],
})

split_values = df['PRECIOS'].apply(extract_data) #get a series of list of int

val_df = pd.DataFrame(index=df.index, data=split_values.to_list()) #create a df from split values
val_df.columns = ['PRECIO{}'.format(i 1) for i,c in enumerate(val_df.columns)] #name the columns

out_df = pd.concat((df,val_df), axis=1) #concatenate the original df and the split-df together (not sure if you care about doing this)
out_df

Output

enter image description here

CodePudding user response:

try this:

(df.join(df.pop('PRECIOS')
         .str.findall(r'(?:,\d \/)(\d )(?:\/)')
         .apply(pd.Series)
         .rename(columns=lambda x: f'PRECIO {x 1}')))

>>>
   idx  PRECIO 1    PRECIO 2    PRECIO 3    PRECIO 4    PRECIO 5
0   0   3699        3699        3699        3699        4499
1   1   3650        3650        3650        3650        9087
  • Related