Mesh / divide / explode values in a column of a DataFrame according to a number of meshes for each

Time:10-16

Given a DataFrame

df1 :

    value   mesh
0   10      2
1   12      3
2   5       2

obtain a new DataFrame df2 in which each row of df1 is replaced by mesh rows, each one holding the corresponding value of df1 divided by its mesh:

df2 :

    value/mesh
0   5
1   5
2   4
3   4
4   4
5   2.5
6   2.5

More generally, given:

df1 :

    value   mesh_value  other_value 
0   10      2           0
1   12      3           1
2   5       2           2

obtain:

df2 :

    value/mesh_value    other_value
0   5                   0
1   5                   0
2   4                   1
3   4                   1
4   4                   1
5   2.5                 2
6   2.5                 2
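
To experiment with the examples, the general input and the expected result can be built as follows. This is only a setup sketch: the values and column labels are copied from the tables above, and the name expected is introduced here purely for comparison.

```python
import pandas as pd

# Input frame from the general example above.
df1 = pd.DataFrame({
    "value": [10, 12, 5],
    "mesh_value": [2, 3, 2],
    "other_value": [0, 1, 2],
})

# Expected result, copied from the df2 table above
# ("expected" is a hypothetical name, not part of the question).
expected = pd.DataFrame({
    "value/mesh_value": [5, 5, 4, 4, 4, 2.5, 2.5],
    "other_value": [0, 0, 1, 1, 1, 2, 2],
})

# The number of output rows should equal the sum of the mesh column.
print(len(expected), df1["mesh_value"].sum())
```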

CodePudding user response:

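You can use map: build a dict that maps each quotient of df1 to its row index, then look up each value of df2 in it (df2 is assumed to exist already). Assign the result to a new column such as df2['new'] if you want to keep it.

df2['value/mesh'].map(dict(zip(df1.eval('value/mesh'), df1.index)))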
Out[243]: 
0    0
1    0
2    1
3    1
4    1
5    2
6    2
Name: value/mesh, dtype: int64
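
One caveat worth checking before relying on this lookup: a dict keeps a single entry per key, so two rows of df1 that yield the same quotient would both map to the index of the later row. The sketch below adds a made-up fourth row (8, 2), which is not part of the question, to show the collision.

```python
import pandas as pd

# Hypothetical input: the extra row (8, 2) is an assumption added so that
# two rows share the quotient 4.0 (12 / 3 and 8 / 2).
df1 = pd.DataFrame({"value": [10, 12, 5, 8], "mesh": [2, 3, 2, 2]})

quotients = df1["value"] / df1["mesh"]      # 5.0, 4.0, 2.5, 4.0
lookup = dict(zip(quotients, df1.index))    # later duplicates overwrite earlier ones

# Only three keys survive, and 4.0 now points at row 3 instead of row 1.
print(lookup)
```

With the data from the question all quotients are distinct, so the lookup gives the output shown above.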

CodePudding user response:

Try as follows:

  • Use Series.div to compute value / mesh_value, then apply Series.reindex with np.repeat(df.index, df.mesh_value), so that each index label is repeated mesh_value times (df here is the question's df1).
  • Next, use pd.concat to combine the result with df.other_value along axis=1.
  • Finally, rename the column holding value / mesh_value (its default name will be 0) using df.rename, and chain df.reset_index to get back a standard index.

import numpy as np
import pandas as pd

# quotient per row, repeated mesh_value times, then joined with other_value
df2 = pd.concat([df.value.div(df.mesh_value).reindex(
    np.repeat(df.index, df.mesh_value)), df.other_value], axis=1)\
    .rename(columns={0: 'value_mesh_value'}).reset_index(drop=True)

print(df2)

   value_mesh_value  other_value
0               5.0            0
1               5.0            0
2               4.0            1
3               4.0            1
4               4.0            1
5               2.5            2
6               2.5            2
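
To see what the np.repeat and reindex steps do on their own, the intermediate objects can be inspected separately. This is a minimal sketch that assumes df is the general input frame from the question; the names repeated_index, quotient and repeated are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Assumption: df is the general input frame from the question.
df = pd.DataFrame({"value": [10, 12, 5],
                   "mesh_value": [2, 3, 2],
                   "other_value": [0, 1, 2]})

# Each index label is repeated as many times as its mesh_value.
repeated_index = np.repeat(df.index, df.mesh_value)
print(repeated_index.tolist())   # [0, 0, 1, 1, 1, 2, 2]

# One quotient per original row ...
quotient = df.value.div(df.mesh_value)
# ... expanded to one quotient per mesh cell by reindexing on the repeated labels.
repeated = quotient.reindex(repeated_index)
print(repeated.tolist())         # [5.0, 5.0, 4.0, 4.0, 4.0, 2.5, 2.5]
```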

Or slightly different:

  • Use df.assign to add a column with the result of df.value.div(df.mesh_value), then reindex and rename in the same way as above.
  • Use df.drop to get rid of the columns you don't want (value, mesh_value), and use df.iloc to change the column order: we want ['value_mesh_value', 'other_value'] rather than the other way around, hence [1, 0]. Again, reset the index.
  • We put all of this between parentheses and assign it to df2.

df2 = (df.assign(tmp=df.value.div(df.mesh_value))
       .reindex(np.repeat(df.index, df.mesh_value))
       .rename(columns={'tmp': 'value_mesh_value'})
       .drop(columns=['value', 'mesh_value']).iloc[:, [1, 0]]
       .reset_index(drop=True))

# same result
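
For comparison, the same table can also be produced by repeating whole rows with Index.repeat and label-based selection. This is a different technique from the two shown above, offered as a minimal sketch: the function name explode_by_mesh and its parameters are hypothetical, and the sketch assumes the index of the input frame is unique.

```python
import pandas as pd

def explode_by_mesh(df, value_col, mesh_col, out_col):
    """Repeat each row mesh_col times and store value_col / mesh_col in out_col."""
    # Repeat every index label according to its mesh count, then select by label
    # (assumes the index of df has no duplicates).
    out = df.loc[df.index.repeat(df[mesh_col])].copy()
    out[out_col] = out[value_col] / out[mesh_col]
    # Put the quotient first, keep the remaining columns, and reset the index.
    rest = [c for c in df.columns if c not in (value_col, mesh_col)]
    return out[[out_col] + rest].reset_index(drop=True)

df = pd.DataFrame({"value": [10, 12, 5],
                   "mesh_value": [2, 3, 2],
                   "other_value": [0, 1, 2]})
df2 = explode_by_mesh(df, "value", "mesh_value", "value_mesh_value")
print(df2)
```

Because the helper keeps every column other than the value and mesh columns, it also covers frames with more than one extra column.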