Home > Back-end >  Matching a data frame row (pandas) to a separate data frame row and performing a calculation if the
Matching a data frame row (pandas) to a separate data frame row and performing a calculation if the

Time:10-07

newbie python/coder who is trying to make data logger downloads and calculations a smoother process as a side project. Anyways I have two data frames. The first is "data" which contains the following (number of rows shortened for simplicity):

    Logger Name    Date and Time  Battery   Temp(C)  Sensor Reading(dg)  Sensor Temp(C)  Array #
0   TDX  10/1/2021 13:35                2.93             15.59            8772.737            14.5      833
1   TDX  10/1/2021 13:36                2.93             15.59            8773.426            14.5      834
2   TDX  10/1/2021 13:36                2.93             15.59            8773.570            14.5      835
3   TDX  10/1/2021 13:37                2.93             15.59            8773.793            14.5      836

The second is "param" which has parameters which contains values that I use to make calculations:

Transducer_ID  elevation_tom  elevation_toc  elevation_ground  elevation_tos calculation  gage_factor  xd_zero_reading  thermal_factor  xd_temp_at_zero_reading  piezo_elev  piezo_downhole_depth
0   TDX            NaN            NaN              1000            NaN      linear     -0.04135             9138        0.003119                     24.8        1600                   400
1  Test            NaN            NaN              1000            NaN      linear     -0.18320             8997       -0.170100                     22.6         800                   200

Now what I hope the code will be able to do is make a new column in "data" called "Linear P" which populates based on this calculation that uses variables from both dataframes: [digits_zero_digits - Sensor Reading(dg)] * abs(gage_factor). Now this is not a problem if "param" only had one Transducer ID and the same number of rows as "data", but in reality it has lots of rows with different IDs.

So my question is this. What is the best way to accomplish my goal? Is it to loop over the column or is there something more efficient using the pandas library?

Thanks in advance!

edit: the output I am looking for is this

    Logger Name    Date and Time  Battery Voltage(v)  Internal Temp(C)  Sensor Reading(dg)  Sensor Temp(C)  Array #   Linear P
0   TDX  10/1/2021 13:35                2.93             15.59            8772.737            14.5      833  15.103625
1   TDX  10/1/2021 13:36                2.93             15.59            8773.426            14.5      834  15.075135
2   TDX  10/1/2021 13:36                2.93             15.59            8773.570            14.5      835  15.069181
3   TDX  10/1/2021 13:37                2.93             15.59            8773.793            14.5      836  15.059959

CodePudding user response:

The more efficient way would be based on my experience :

  1. join the two data frame using (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html).
  2. make calculation on the result dataframe ( df["Linear P"] = df["Sensor Reading(dg)"] * ... ) .

CodePudding user response:

Just figured out a way to do it that seems pretty efficient. I simply remove the data in "param" that I do not need:

z = data.iloc[0,0]
param = param[param.Transducer_ID == z]

With the data filtered I pull out only the needed values from param:

x = piezo_param.iloc[0, 7]
y = piezo_param.iloc[0, 6]

And perform the calculation: data['Linear P'] = (x - data['Sensor Reading(dg)']) * abs(y)

Let me know if this seems like the best way to get the job done!

  • Related