A more Pythonic way to compare a value to the previous value in a list

I have the following code which I feel is not very pythonic:

old_hostname = None
for i, row in dupes.iterrows():
    if i == 0:
        old_hostname = row['Hostname']
    else:
        if row['Hostname'] != old_hostname:
            print('-----')
    print(f"{row['Name']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} ")
    old_hostname = row['Hostname']

It operates on a pandas DataFrame called 'dupes' (but it could just as well be a plain list of lists) that looks something like this:

hostname1,aabb.ccdd.eef0,1.1.1.1
hostname1,aabb.ccdd.eef1,1.1.1.2
hostname2,aabb.ccdd.eef5,1.1.2.1
hostname3,aabb.ccdd.e0ff,1.1.4.1
hostname3,aabb.ccdd.e1ff,1.1.5.1
hostname3,aabb.ccdd.e2ff,1.1.6.1
...

The output would be this:

hostname1,aabb.ccdd.eef0,1.1.1.1
hostname1,aabb.ccdd.eef1,1.1.1.2
-----
hostname2,aabb.ccdd.eef5,1.1.2.1
-----
hostname3,aabb.ccdd.e0ff,1.1.4.1
hostname3,aabb.ccdd.e1ff,1.1.5.1
hostname3,aabb.ccdd.e2ff,1.1.6.1
-----

The code works fine, but I feel I am missing a more compact, Pythonic way of doing it. The major snag I see is the first row, which needs special handling because there is no previous row['Hostname'] to compare against. Note that I set old_hostname to None to avoid a warning from PyCharm about referencing a variable that may not exist.

CodePudding user response：

I'm not sure if this really answers your question but... If you have a plain list where you want to compare adjacent elements then zip() is your friend. For example:

myList = [1, 2, 3, 4, 5]

for x, y in zip(myList, myList[1:]):
  if x < y: # or whatever
    pass # do something
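
Applied to the data in the question, the same pairing idea might look like the sketch below. It assumes the rows are plain tuples of (hostname, MAC, IP) held in a list; the `lines_with_separators` helper and its parameters are illustrative names, not something from the original post.

```python
def lines_with_separators(rows, key=lambda row: row[0], sep="-----"):
    """Return the output lines, with sep inserted wherever the key changes."""
    # The first row has no predecessor, so it is emitted without a comparison.
    lines = [",".join(row) for row in rows[:1]]
    # zip pairs every row with the row that follows it.
    for previous, current in zip(rows, rows[1:]):
        if key(current) != key(previous):
            lines.append(sep)
        lines.append(",".join(current))
    return lines


rows = [
    ("hostname1", "aabb.ccdd.eef0", "1.1.1.1"),
    ("hostname1", "aabb.ccdd.eef1", "1.1.1.2"),
    ("hostname2", "aabb.ccdd.eef5", "1.1.2.1"),
    ("hostname3", "aabb.ccdd.e0ff", "1.1.4.1"),
    ("hostname3", "aabb.ccdd.e1ff", "1.1.5.1"),
    ("hostname3", "aabb.ccdd.e2ff", "1.1.6.1"),
]

lines = lines_with_separators(rows)
print("\n".join(lines))
```

Slicing with rows[1:] copies the list, so this only suits sequences that fit comfortably in memory. On Python 3.10 or later, itertools.pairwise(rows) yields the same pairs without the copy.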

CodePudding user response：

It's possible that groupby might help. It's part of the standard library (in itertools), and it takes any iterable and groups consecutive values that share a key. The key is a function that is called on each object in the iterable; in this case, it returns the Hostname field. groupby returns an iterator that yields tuples of the form (<key>, group), where each group is itself an iterator over the rows in that run. Because only adjacent rows are grouped together, the input has to be ordered by hostname already, as it is here.

from itertools import groupby


rows = [
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef0", "IPv4 Address": "1.1.1.1"},
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef1", "IPv4 Address": "1.1.1.2"},
    {"Hostname": "hostname2", "MAC": "aabb.ccdd.eef5", "IPv4 Address": "1.1.2.1"},
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e0ff", "IPv4 Address": "1.1.4.1"},
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e1ff", "IPv4 Address": "1.1.5.1"},
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e2ff", "IPv4 Address": "1.1.6.1"},
]



for hostname, grouped_rows in groupby(rows, key=lambda row: row['Hostname']):
    for row in grouped_rows:
        print(f"{row['Hostname']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} ")
    print('------')

The output is:

hostname1                        aabb.ccdd.eef0     1.1.1.1          
hostname1                        aabb.ccdd.eef1     1.1.1.2          
------
hostname2                        aabb.ccdd.eef5     1.1.2.1          
------
hostname3                        aabb.ccdd.e0ff     1.1.4.1          
hostname3                        aabb.ccdd.e1ff     1.1.5.1          
hostname3                        aabb.ccdd.e2ff     1.1.6.1          
------

groupby hands back iterators rather than lists for efficiency. On the surface these don't give much insight into what the resulting structure looks like, so here is what groupby would return if it provided lists instead:

[('hostname1',
  [{'Hostname': 'hostname1',
    'IPv4 Address': '1.1.1.1',
    'MAC': 'aabb.ccdd.eef0'},
   {'Hostname': 'hostname1',
    'IPv4 Address': '1.1.1.2',
    'MAC': 'aabb.ccdd.eef1'}]),
 ('hostname2',
  [{'Hostname': 'hostname2',
    'IPv4 Address': '1.1.2.1',
    'MAC': 'aabb.ccdd.eef5'}]),
 ('hostname3',
  [{'Hostname': 'hostname3',
    'IPv4 Address': '1.1.4.1',
    'MAC': 'aabb.ccdd.e0ff'},
   {'Hostname': 'hostname3',
    'IPv4 Address': '1.1.5.1',
    'MAC': 'aabb.ccdd.e1ff'},
   {'Hostname': 'hostname3',
    'IPv4 Address': '1.1.6.1',
    'MAC': 'aabb.ccdd.e2ff'}])]
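
A list view like the one above can be produced by calling list() on each group as it is yielded. The sketch below also shows why ordering matters: groupby starts a new group every time the key changes, so a hostname that reappears later gets a second group unless the rows are sorted first. The trimmed-down rows and the extra 'hostname1' entry at the end are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef0"},
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef1"},
    {"Hostname": "hostname2", "MAC": "aabb.ccdd.eef5"},
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef9"},  # hypothetical out-of-order row
]

by_hostname = itemgetter("Hostname")

# Each group is a lazy iterator; list() copies its rows out before
# groupby advances to the next key.
materialized = [(key, list(group)) for key, group in groupby(rows, key=by_hostname)]
run_keys = [key for key, _ in materialized]

# Sorting by the same key first merges the two separate hostname1 runs.
sorted_rows = sorted(rows, key=by_hostname)
sorted_keys = [key for key, _ in groupby(sorted_rows, key=by_hostname)]

print(run_keys)
print(sorted_keys)
```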

CodePudding user response：

Your approach seems fine to me. The one change I would make is to call iter() on the list, or assign the result of iterrows() to a variable, so that you can call next() to pull off the first row before the loop starts.

import pandas

## -----------------------
## simple utility method to handle formatting
## -----------------------
def str_to_print(row):
    return f"{row['Name']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} "
## -----------------------

## -----------------------
## Example Data
## -----------------------
dupes = pandas.DataFrame([
    {"Hostname": "a", "Name": "a", "MAC": "a", "IPv4 Address": "a"},
    {"Hostname": "a", "Name": "a", "MAC": "a", "IPv4 Address": "a"},
    {"Hostname": "b", "Name": "b", "MAC": "b", "IPv4 Address": "b"},
    {"Hostname": "b", "Name": "b", "MAC": "b", "IPv4 Address": "b"},
    {"Hostname": "c", "Name": "c", "MAC": "c", "IPv4 Address": "c"},
])
## -----------------------

## -----------------------
## get the iter so we can call next()
## -----------------------
rows = dupes.iterrows()
## -----------------------

## -----------------------
## Set and handle the prior value
## -----------------------
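# the default must be a 2-tuple so that unpacking still works on an empty DataFrame
_, prior_row = next(rows, (None, None))
if prior_row is not None:
    print(str_to_print(prior_row))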
## -----------------------

## -----------------------
## handle the remaining values
## -----------------------
for _, row in rows:
    if row['Hostname'] != prior_row['Hostname']:
        print('-----')
    print(str_to_print(row))
    prior_row = row
## -----------------------
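
The same iter()/next() pattern can be wrapped in a small generator so that it works on any iterable, including ones that cannot be sliced. This is only a sketch: `with_separators`, its `key` argument, and the sentinel object are illustrative choices rather than part of the answer above.

```python
def with_separators(items, key, sep="-----"):
    """Yield items in order, yielding sep whenever key(item) changes."""
    iterator = iter(items)
    missing = object()  # sentinel that cannot collide with real data
    prior = next(iterator, missing)
    if prior is missing:
        return  # empty input: nothing to yield
    yield prior
    for item in iterator:
        if key(item) != key(prior):
            yield sep
        yield item
        prior = item


hostnames = ["a", "a", "b", "c", "c"]
result = list(with_separators(hostnames, key=str))
print(result)
```

With a DataFrame, the items could be the rows taken from dupes.iterrows() and the key a function that returns row['Hostname'].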

CodePudding user response：

Here's a pandas approach that uses duplicated() to find the last row for each hostname. It assumes that rows with the same hostname are adjacent:

import pandas as pd

dupes = pd.DataFrame(
    {'Hostname': {0: 'hostname1',
      1: 'hostname1',
      2: 'hostname2',
      3: 'hostname3',
      4: 'hostname3',
      5: 'hostname3'},
     'IPv4 Address': {0: '1.1.1.1',
      1: '1.1.1.2',
      2: '1.1.2.1',
      3: '1.1.4.1',
      4: '1.1.5.1',
      5: '1.1.6.1'},
     'MAC': {0: 'aabb.ccdd.eef0',
      1: 'aabb.ccdd.eef1',
      2: 'aabb.ccdd.eef5',
      3: 'aabb.ccdd.e0ff',
      4: 'aabb.ccdd.e1ff',
      5: 'aabb.ccdd.e2ff'}}
)

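# keep='last' flags every row except the last one for each hostname,
# so a False value marks the final row of a group
is_dup = dupes['Hostname'].duplicated(keep='last')

for i, row in dupes.iterrows():
    print(f"{row['Hostname']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} ")
    if not is_dup[i]:
        print('----')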
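
Another option, not used in the answers above, is to compare the column with a copy of itself shifted down by one row. The sketch below assumes the same 'dupes' columns as the question; `changed` is an illustrative name. Unlike duplicated(), the comparison only looks at the row directly above, so it matches the "compare to previous value" wording of the question.

```python
import pandas as pd

dupes = pd.DataFrame({
    "Hostname": ["hostname1", "hostname1", "hostname2",
                 "hostname3", "hostname3", "hostname3"],
    "MAC": ["aabb.ccdd.eef0", "aabb.ccdd.eef1", "aabb.ccdd.eef5",
            "aabb.ccdd.e0ff", "aabb.ccdd.e1ff", "aabb.ccdd.e2ff"],
    "IPv4 Address": ["1.1.1.1", "1.1.1.2", "1.1.2.1",
                     "1.1.4.1", "1.1.5.1", "1.1.6.1"],
})

# shift() moves every value down one row, so each hostname is compared
# with the hostname of the row directly above it.
changed = dupes["Hostname"] != dupes["Hostname"].shift()
# The first row is compared against NaN and comes out True; clear it so no
# separator is printed before the first row (the slice is safe on an empty frame).
changed.iloc[:1] = False

for (_, row), is_new_group in zip(dupes.iterrows(), changed):
    if is_new_group:
        print("-----")
    print(f"{row['Hostname']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} ")
```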