Home > Software design >  Splitting Row into Multiple Rows in Pandas Dataframe
Splitting Row into Multiple Rows in Pandas Dataframe

Time:02-19

Say I have a dataframe like this:

data = {'state':['FL', 'FL', 'FL'],
        'territory':[1, 2, 3],
        'proportion':['A-0.7/B-0.3', 'A-0.7/B-0.3', 'A-0.7/B-0.3'],
        'value':[10, 10, 10]}
df = pd.DataFrame(data)
  state  territory   proportion  value
0    FL          1  A-0.7/B-0.3     10
1    FL          2  A-0.7/B-0.3     10
2    FL          3  A-0.7/B-0.3     10

For each row, I want to split the 'value' into two rows based on the 'proportion'

data = {'state':['FL', 'FL', 'FL', 'FL', 'FL', 'FL'], 
        'territory':[1, 1, 2, 2, 3, 3],
        'proportion':['A', 'B', 'A', 'B', 'A', 'B'],
        'value':[7, 3, 7, 3, 7, 3,]}
pd.DataFrame(data)
  state  territory proportion  value
0    FL          1          A      7
1    FL          1          B      3
2    FL          2          A      7
3    FL          2          B      3
4    FL          3          A      7
5    FL          3          B      3

How do I do this?

CodePudding user response:

You could split proportion on /, then explode to create the new rows.

Then split again to get the letter and number, assign the letter, convert the number to float and multiply to the original value:

df2 = (
 df.assign(proportion=df['proportion'].str.split('/'))
   .explode('proportion')
   .reset_index(drop=True)
)
d = df2['proportion'].str.split('-', expand=True)

df2['proportion'] = d[0]
df2['value'] *= d[1].astype(float)

output:

  state  territory proportion  value
0    FL          1          A    7.0
1    FL          1          B    3.0
2    FL          2          A    7.0
3    FL          2          B    3.0
4    FL          3          A    7.0
5    FL          3          B    3.0

CodePudding user response:

We do split 2 times , one follow with explode to add additional rows another need to adding the additional columns

out = df.explode('proportion').reset_index(drop=True)
out = out.join(out.pop('proportion').str.split('-',expand=True).rename(columns = {0:'proportion',1:'values'}))
out
Out[217]: 
  state  territory  value proportion values
0    FL          1     10          A    0.7
1    FL          1     10          B    0.3
2    FL          2     10          A    0.7
3    FL          2     10          B    0.3
4    FL          3     10          A    0.7
5    FL          3     10          B    0.3
  • Related