Home > Enterprise >  Pandas compare column in same data frame and replace values based on comparison
Pandas compare column in same data frame and replace values based on comparison

Time:02-18

I have a excel sheet that consist of over 6000 rows. There are two column , "IP Address CMDB" that contain IP addresses and another column called "IP Address LM". I am trying to look for IP address that belongs to "IP Address CMDB" in "IP Address LM" and if "IP Address LM" contain that IP address return ABCD. I could not attach excel sheet so I have attached screenshot of it.

enter image description here

for col in report:
    if col == "IP Address CMDB":
        col_num = report[col]
        for num in col_num:
            if report["IP Address LM"].str.contains(num):
                print("ABCD")
<ipython-input-13-40cfae2bd937>:5: UserWarning: This pattern has match groups. To actually get the groups, use str.extract.
  if report["IP Address LM"].str.contains(num):
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-40cfae2bd937> in <module>
      3         col_num = report[col]
      4         for num in col_num:
----> 5             if report["IP Address LM"].str.contains(num):
      6                 print("ABCD")
      7 

c:\users\rohit verma\appdata\local\programs\python\python39\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1535     @final
   1536     def __nonzero__(self):
-> 1537         raise ValueError(
   1538             f"The truth value of a {type(self).__name__} is ambiguous. "
   1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

CodePudding user response:

You can simply use the below code to check whether the IP Address LM contains what belongs to IP Address CMDB:

checkColumn = []
for index, row in df.iterrows():
  Ip = row["IP Address LM"]
  toCheck = row["IP Address CMDB"]
  if toCheck in Ip:
    checkColumn.append("ABCD")
  else:
    checkColumn.append(None)
df["check"] = checkColumn

Explanation

iterrows() function loops over all of the dataframe's rows. Then using a logic statement such as toCheck in Ip we are trying to check whether the value exists in the abovementioned column or not. If not, returns None, otherwise, returns ABCD as requested.

CodePudding user response:

As the source DataFrame (report) I created:

                        IP Address CMDB IP Address LM
0                   10.1.0.36,10.1.53.1     10.1.0.36
1                            10.1.11.21    10.1.11.21
2  10.1.148.20,192.168.128.3,10.1.5.130    10.1.5.130
3                            10.1.5.100    10.1.5.140
4                            10.1.6.120    10.1.6.140

To identify rows where IP Address CMDB contains IP Address LM you can run e.g.:

report.apply(lambda row: row['IP Address LM'] in row['IP Address CMDB'], axis=1)

Details:

  1. report.apply - applies the given lambda function to each row (due to axis=1 parameter).
  2. row['IP Address LM'] in row['IP Address CMDB'] - creates temporary lists of characters from both columns in the current row and checks whether the left list is contained in the right one.
  3. The returned value actually answers your question (does IP Address CMDB contain IP Address LM).

The result is:

0     True
1     True
2     True
3    False
4    False
dtype: bool

As you can see, IP Address CMDB in first 3 rows contains IP Address LM from the current row.

If you want to do something more, write your own function including your actions, returning some result for the current row, and replace the lambda function with this function.

And a note about your code: str.contains can be used to check whether an element of a column contains a fixed value, but you actually want to check containment for values in the current row only.

  • Related