Home > Software design >  Pandas duplicate and alter records
Pandas duplicate and alter records

Time:07-18

I want to split the record if it contains / below into 2 rows with the same volume and premium values but with group values of 4942 and 5350. The second transformation

Current dataframe
                  Group     Volume    Premium
                 4941.0     5721.0    5057.76
              4942/5350     6154.0    5462.46

transformed
                  Group     Volume    Premium
                 4941.0     5721.0    5057.76
                   4942     6154.0    5462.46
                   5350     6154.0    5462.46

I tried the code below and the first record group value should be 4941 but it returns nan instead

     dfRosters=(dfRosters.set_index(['Volume', 'Premium'])
     .apply(lambda x: x.str.split('/').explode())
    .reset_index())

  Volume    Premium           Group
0      5721.0    5057.76             NaN
1      6154.0    5462.46            4942
2      6154.0    5462.46            5350

CodePudding user response:

This should work:

df = pd.DataFrame({"group": ["4941.0", "4942/5350"], "volume": ["5721.0", "6154.0"], "Premium": ["5057.76", "5462.46"]})
df['group'] =  df['group'].apply(lambda x: x.split("/"))
df = df.explode("group")

The explode method converts a list into a separate rows and duplicates the other rows for each item in the list.

CodePudding user response:

Use str.split for vectorial and error-proof splitting, then explode:

out = (df.assign(**{'Group': df['Group'].str.split('/')})
         .explode('Group', ignore_index=True)
      )

output:

    Group  Volume  Premium
0  4941.0  5721.0  5057.76
1    4942  6154.0  5462.46
2    5350  6154.0  5462.46

To also convert the types:

out = (df
   .assign(**{'Group': df['Group'].str.split('/')})
   .explode('Group', ignore_index=True)
   .astype({'Group': float})
)

output:

    Group  Volume  Premium
0  4941.0  5721.0  5057.76
1  4942.0  6154.0  5462.46
2  5350.0  6154.0  5462.46

CodePudding user response:

    dfRosters['Group'] = dfRosters['Group'].astype(str)
    dfRosters=(dfRosters.set_index(['Volume', 'Premium'])
    .apply(lambda x: x.str.split('/').explode())
    .reset_index())

Had to format group as string to resolve the issue

  • Related