Home > database >  How to create an arithmetic expression with lenght of items in each row of a column using python pan
How to create an arithmetic expression with lenght of items in each row of a column using python pan

Time:09-21

I have a data frame like this:

data = {'id': ['id_01, id_02', 
               'id_03', 
               'id_04', 
               'id_05', 
               'id_06, id_07, id_08'], 
        'price': [100, 200, 300, 400, 500]}

df = pd.DataFrame(data)
df

output:

enter image description here

I did this to split each id and analyse each as single row:

new_df = df.assign(new_id=df.id.str.split(",")).explode('new_id')
new_df

output:

enter image description here

So far, so good =) Now I'd like to reach this result below, where I calculate each price divided by the lenght of id items in each row, like this:

enter image description here

How can I reach this result using the most simple way for a beginner student?

CodePudding user response:

Calculate this before you explode when you can access the length of id items via .str.len():

(df.assign(new_id=df.id.str.split(","))
   .assign(new_price=lambda df: df.price / df.new_id.str.len())
   .explode('new_id'))

                    id  price  new_id   new_price
0         id_01, id_02    100   id_01   50.000000
0         id_01, id_02    100   id_02   50.000000
1                id_03    200   id_03  200.000000
2                id_04    300   id_04  300.000000
3                id_05    400   id_05  400.000000
4  id_06, id_07, id_08    500   id_06  166.666667
4  id_06, id_07, id_08    500   id_07  166.666667
4  id_06, id_07, id_08    500   id_08  166.666667

CodePudding user response:

Another way and in one line, str. split into a column, find the len of new list and use it to find average on dynamically. Faster than a lambda.

   new_df = (df.assign(new_id=df.id.str.split(","),#new colume
    price=df['price'].div(df.id.str.split(",").str.len())#Find average
                .astype(int)).explode('new_id')#Explode to expnd the df
     )

output

            id           price  new_id
0         id_01, id_02     50   id_01
0         id_01, id_02     50   id_02
1                id_03    200   id_03
2                id_04    300   id_04
3                id_05    400   id_05
4  id_06, id_07, id_08    166   id_06
4  id_06, id_07, id_08    166   id_07
4  id_06, id_07, id_08    166   id_08

CodePudding user response:

Probably not the best solution, but you can use the fact that exploded rows have the same index:

new_df['new_price'] = new_df['price']/new_df.groupby(new_df.index).transform('count')['id']
  • Related