Home > Software design >  Python row wise operation without loops
Python row wise operation without loops

Time:02-08

Solved: Thanks to the help of @mozway I solved my problem. Basically merged the df's, grouped them up and combined them into one column deleted the duplicates.

df2 = df2.merge(df1, how='left', left_on=df2["code2_firstTwoDigits"], right_on=df1["code"])
df2['code'] = df2.groupby(['code2_firstTwoDigits'])['code'].transform(lambda x : ', '.join(x))
df2= df2.drop_duplicates()

I want to execute an operation row wise but I read, as a Python newby, that I shouldn't use loops in Python.

I have two dataframes which basically look like this

df1 =
   code    random_values   code_Values
0   1234    house           3.21
1   2345    car             5.32
2   2342    white           1.00
3   2344    wolf            2.23
4   2345    black           5.00
5   3456    house           5.99
6   3422    car             3.21
7   3666    white           5.32
8   4567    wolf            1.00
9   4511    black           2.23
10  4721    red             5.00

df2 = 
   code2   random_values   code2_firstTwoDigits
0   1264    house           12
1   2397    car             23
2   3498    white           34
3   4588    wolf            45
4   6388    black           63

I'm probably trying to do too many operations at once and I have a feeling that my brain didn't adjust to Python coming from C#. But what I'm trying to do is:

  • Create columns "SameCodeStructure" and "SameCodeStructure_SumOfValues" in df2
  • Iterate through all rows of df2 and search for values in df1["code"] which match with the value of "code2_firstTwoDigits"
  • Save those values from df1["code"] in df2["SameCodeStructure"]
  • Add the value from those matching rows into df2["SameCodeStructure_SumOfValues"]

It's hard to explain without getting confused again, so I tried a visual representation of my question how I would love it to look like:

df3 = 
    code2   random_values   code2_firstTwoDigits    SameCodeStructure           SameCodeStructure_SumOfValues
0   1264    house           12                      1234                        3.21
1   2397    car             23                      2345, 2342, 2344, 2345      13.55
2   3498    white           34                      3456, 3422                  9.2
3   4588    wolf            45                      4567, 4511                  3.23
4   6388    black           63                      -                           -

I managed to look up the matching codes for a single row (3), but I'm struggling to find a helping hand how to transform this into an overall solution for all rows of df2

pd.Series(df2[ df2["code2"].str.match (df1.iloc[3]["code_Values"] ) ]["code2"]).str.cat(sep=", ", others=None)

And I'm thankful for all the advice. Python seems to be easier in terms of "the amount of written code" for me, but it's really difficult to think "in less code"

CodePudding user response:

The exact logic of the output is unclear, but you want merge and groupby agg:

(df1.merge(df2, left_on=df1['code']//100, right_on='code2_firstTwoDigits', suffixes=('', '_2'))
    .groupby('code2_firstTwoDigits')
    .agg({'code_Values': 'sum',
          'code': list,
          'random_values': 'first'})
)

output:

                      code_Values                      code random_values
code2_firstTwoDigits                                                     
12                           3.21                    [1234]         house
23                          13.55  [2345, 2342, 2344, 2345]           car
34                           9.20              [3456, 3422]         house
45                           3.23              [4567, 4511]          wolf
  •  Tags:  
  • Related