I have a df column like this:
idx | PRECIOS |
---|---|
0 | 1//,2/3699/,3/3699/,4/3699/,6/3699/,7/4499/ |
1 | 1//,2/3650/,3/3650/,4/3650/,6/3650/,7/9087/ |
I need to get the values between // and put them into a new column. Output should look like this:
idx | PRECIO1 | PRECIO 2 | PRECIO 3 | PRECIO 4 | PRECIO 5 |
---|---|---|---|---|---|
0 | 3699 | 3699 | 3699 | 3699 | 4499 |
1 | 3650 | 3650 | 3650 | 3650 | 9087 |
is it possible to split ? Thanks in advance
CodePudding user response:
Here's an approach that uses apply
to walk through each string one character at a time. It's not vectorized and doesn't take advantage of much pandas utilities so its slow, but hopefully fast enough
import pandas as pd
def extract_data(s):
"""
Walk through the string keeping track of if we are within a / pair
"""
within = False
values = []
curr_val = ''
for c in s:
if c == '/':
#if we hit a / then we "reverse" whether or not we were within a pair of slashes
#if we were within a pair, then now we're not and the opposite too
within = not within
continue
if within:
#c is not a / and we are within a pair, so add c to the curr_val string
curr_val = c
elif curr_val:
#we are not in a pair and curr_val has a non-null value
#add the previous curr_val as an int to the list of values
values.append(int(curr_val))
curr_val = ''
if curr_val:
#add the last curr_val after the loop
values.append(int(curr_val))
return values
df = pd.DataFrame({
'idx':['r1','r2'],
'age':[27,78],
'PRECIOS':[
'1//,2/3699/,3/3699/,4/3699/,6/3699/,7/4499/',
'1//,2/3650/,3/3650/,4/3650/,6/3650/,7/9087/',
],
})
split_values = df['PRECIOS'].apply(extract_data) #get a series of list of int
val_df = pd.DataFrame(index=df.index, data=split_values.to_list()) #create a df from split values
val_df.columns = ['PRECIO{}'.format(i 1) for i,c in enumerate(val_df.columns)] #name the columns
out_df = pd.concat((df,val_df), axis=1) #concatenate the original df and the split-df together (not sure if you care about doing this)
out_df
Output
CodePudding user response:
try this:
(df.join(df.pop('PRECIOS')
.str.findall(r'(?:,\d \/)(\d )(?:\/)')
.apply(pd.Series)
.rename(columns=lambda x: f'PRECIO {x 1}')))
>>>
idx PRECIO 1 PRECIO 2 PRECIO 3 PRECIO 4 PRECIO 5
0 0 3699 3699 3699 3699 4499
1 1 3650 3650 3650 3650 9087