I have the following code which I feel is not very pythonic:
old_hostname = None
for i, row in dupes.iterrows():
    if i == 0:
        old_hostname = row['Hostname']
    else:
        if row['Hostname'] != old_hostname:
            print('-----')
    print(f"{row['Name']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} ")
    old_hostname = row['Hostname']
It operates on a pandas DataFrame called 'dupes' (but it could just as well be a plain list of lists), something like this:
hostname1,aabb.ccdd.eef0,1.1.1.1
hostname1,aabb.ccdd.eef1,1.1.1.2
hostname2,aabb.ccdd.eef5,1.1.2.1
hostname3,aabb.ccdd.e0ff,1.1.4.1
hostname3,aabb.ccdd.e1ff,1.1.5.1
hostname3,aabb.ccdd.e2ff,1.1.6.1
...
The output would be this:
hostname1,aabb.ccdd.eef0,1.1.1.1
hostname1,aabb.ccdd.eef1,1.1.1.2
-----
hostname2,aabb.ccdd.eef5,1.1.2.1
-----
hostname3,aabb.ccdd.e0ff,1.1.4.1
hostname3,aabb.ccdd.e1ff,1.1.5.1
hostname3,aabb.ccdd.e2ff,1.1.6.1
-----
The code works fine, but I feel I am missing a more compact, Pythonic way of doing it. The major snag I see is the first row, which needs special handling because there is no previous row['Hostname'] to compare against. Note that I set old_hostname to None to avoid a warning from PyCharm about referencing a variable that may not be defined.
CodePudding user response:
I'm not sure if this really answers your question but... If you have a plain list where you want to compare adjacent elements then zip() is your friend. For example:
myList = [1, 2, 3, 4, 5]
for x, y in zip(myList, myList[1:]):
    if x < y:  # or whatever
        pass  # do something
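Applied to the question's data, the same pairing can decide where a separator goes. This is only a sketch, assuming the rows are plain tuples of (hostname, MAC, IP) held in a list; the function name lines_with_separators is made up for illustration.

```python
def lines_with_separators(rows, sep="-----"):
    """Return printable lines, with sep added after the last row of each hostname."""
    lines = []
    # Pair each row with its successor; the final row is paired with None.
    for current, following in zip(rows, rows[1:] + [None]):
        lines.append(",".join(current))
        if following is None or following[0] != current[0]:
            lines.append(sep)
    return lines


rows = [
    ("hostname1", "aabb.ccdd.eef0", "1.1.1.1"),
    ("hostname1", "aabb.ccdd.eef1", "1.1.1.2"),
    ("hostname2", "aabb.ccdd.eef5", "1.1.2.1"),
]
for line in lines_with_separators(rows):
    print(line)
```

Padding the shifted copy with None means the first row needs no special case, and the last row still gets its closing separator.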
CodePudding user response:
It's possible that groupby might help. It's part of the standard library (itertools), and it takes any iterable and groups consecutive values that share a key. The key is a function that is called on each object in the iterable; in this case, it would return the Hostname field. groupby returns an iterator that yields tuples of the form (<key>, group), where each group is itself an iterator over the grouped values. Because only adjacent items are grouped, the input has to be sorted by hostname already, as it is in your example.
from itertools import groupby

rows = [
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef0", "IPv4 Address": "1.1.1.1"},
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef1", "IPv4 Address": "1.1.1.2"},
    {"Hostname": "hostname2", "MAC": "aabb.ccdd.eef5", "IPv4 Address": "1.1.2.1"},
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e0ff", "IPv4 Address": "1.1.4.1"},
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e1ff", "IPv4 Address": "1.1.5.1"},
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e2ff", "IPv4 Address": "1.1.6.1"},
]

for hostname, grouped_rows in groupby(rows, key=lambda row: row['Hostname']):
    for row in grouped_rows:
        print(f"{row['Hostname']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} ")
    print('------')
The output is:
hostname1 aabb.ccdd.eef0 1.1.1.1
hostname1 aabb.ccdd.eef1 1.1.1.2
------
hostname2 aabb.ccdd.eef5 1.1.2.1
------
hostname3 aabb.ccdd.e0ff 1.1.4.1
hostname3 aabb.ccdd.e1ff 1.1.5.1
hostname3 aabb.ccdd.e2ff 1.1.6.1
------
For efficiency, groupby yields lazy iterators rather than lists. On the surface these don't give much insight into what the resulting structure looks like, so here is what groupby would return if it produced lists instead:
[('hostname1',
  [{'Hostname': 'hostname1',
    'IPv4 Address': '1.1.1.1',
    'MAC': 'aabb.ccdd.eef0'},
   {'Hostname': 'hostname1',
    'IPv4 Address': '1.1.1.2',
    'MAC': 'aabb.ccdd.eef1'}]),
 ('hostname2',
  [{'Hostname': 'hostname2',
    'IPv4 Address': '1.1.2.1',
    'MAC': 'aabb.ccdd.eef5'}]),
 ('hostname3',
  [{'Hostname': 'hostname3',
    'IPv4 Address': '1.1.4.1',
    'MAC': 'aabb.ccdd.e0ff'},
   {'Hostname': 'hostname3',
    'IPv4 Address': '1.1.5.1',
    'MAC': 'aabb.ccdd.e1ff'},
   {'Hostname': 'hostname3',
    'IPv4 Address': '1.1.6.1',
    'MAC': 'aabb.ccdd.e2ff'}])]
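If you want to see that list-based structure for real, one way is to turn each group into a list yourself. The following is a rough sketch rather than anything groupby provides directly; the helper name grouped_as_lists is made up, and the sort step is there on the assumption that the input might not already be ordered by hostname.

```python
from itertools import groupby
from operator import itemgetter


def grouped_as_lists(rows, key):
    """Materialize groupby's lazy groups into (key, list) pairs."""
    # Sort first: groupby only merges consecutive items with equal keys.
    ordered = sorted(rows, key=key)
    return [(k, list(group)) for k, group in groupby(ordered, key=key)]


# Deliberately out of order to show why the sort matters.
records = [
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e0ff"},
    {"Hostname": "hostname1", "MAC": "aabb.ccdd.eef0"},
    {"Hostname": "hostname3", "MAC": "aabb.ccdd.e1ff"},
]
result = grouped_as_lists(records, key=itemgetter("Hostname"))
for hostname, group in result:
    print(hostname, [r["MAC"] for r in group])
```

Without the sort, the two hostname3 records would end up in two separate groups, because they are not adjacent in the input.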
CodePudding user response:
Your approach seems fine to me. The one change I would make is to call iter() on a list, or assign the result of iterrows() to a variable, so that you can call next() on it.
import pandas


def str_to_print(row):
    """Simple utility to handle formatting of one row."""
    return f"{row['Name']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} "


# Example data
dupes = pandas.DataFrame([
    {"Hostname": "a", "Name": "a", "MAC": "a", "IPv4 Address": "a"},
    {"Hostname": "a", "Name": "a", "MAC": "a", "IPv4 Address": "a"},
    {"Hostname": "b", "Name": "b", "MAC": "b", "IPv4 Address": "b"},
    {"Hostname": "b", "Name": "b", "MAC": "b", "IPv4 Address": "b"},
    {"Hostname": "c", "Name": "c", "MAC": "c", "IPv4 Address": "c"},
])

# Get the iterator so we can call next()
rows = dupes.iterrows()

# Set and handle the prior value. next(rows, None) returns None for an
# empty DataFrame, so check before unpacking the (index, row) tuple.
first = next(rows, None)
if first is not None:
    _, prior_row = first
    print(str_to_print(prior_row))

    # Handle the remaining values
    for _, row in rows:
        if row['Hostname'] != prior_row['Hostname']:
            print('-----')
        print(str_to_print(row))
        prior_row = row
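The same iter()/next() idea also works on a plain list, as mentioned above. Here is a minimal sketch of it as a generator, assuming nothing about pandas; the name separated and its parameters are invented for this example.

```python
def separated(items, key, sep="-----"):
    """Yield items, with sep yielded between runs that share the same key."""
    iterator = iter(items)
    sentinel = object()  # unique marker, so None can still be a valid item
    prior = next(iterator, sentinel)
    if prior is sentinel:
        return  # empty input: nothing to yield
    yield prior
    for item in iterator:
        if key(item) != key(prior):
            yield sep
        yield item
        prior = item


hosts = ["a", "a", "b", "b", "c"]
print(list(separated(hosts, key=lambda h: h)))
```

Pulling the first item off with next() before the loop is what removes the "is this the first row?" check from the loop body.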
CodePudding user response:
Here's a pandas approach that makes use of duplicated():
import pandas as pd

dupes = pd.DataFrame(
    {'Hostname': {0: 'hostname1',
                  1: 'hostname1',
                  2: 'hostname2',
                  3: 'hostname3',
                  4: 'hostname3',
                  5: 'hostname3'},
     'IPv4 Address': {0: '1.1.1.1',
                      1: '1.1.1.2',
                      2: '1.1.2.1',
                      3: '1.1.4.1',
                      4: '1.1.5.1',
                      5: '1.1.6.1'},
     'MAC': {0: 'aabb.ccdd.eef0',
             1: 'aabb.ccdd.eef1',
             2: 'aabb.ccdd.eef5',
             3: 'aabb.ccdd.e0ff',
             4: 'aabb.ccdd.e1ff',
             5: 'aabb.ccdd.e2ff'}}
)
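# True for every row except the last occurrence of each hostname
is_dup = dupes['Hostname'].duplicated(keep='last')
for i, row in dupes.iterrows():
    print(f"{row['Hostname']:<32} {row['MAC']:<18} {row['IPv4 Address']:<16} ")
    # The last row of each hostname closes its group
    if not is_dup[i]:
        print('----')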