I have a CSV file that contains network packets. Each row has a column giving the protocol and another giving the packet size. I want to distinguish between two protocols: if a packet is http, its size should be made negative, e.g. 168 becomes -168.
I am using pandas to read the CSV file, but I have not found a way to iterate through each row and check whether the protocol column contains the string 'http'.
I have tried the following, but it prints all rows, not just the http ones:
dataset = pandas.read_csv('test.csv', engine='python')
dataset.columns = ['packet', 'Time', 'Source', 'Dest', 'Proto', 'Length', 'Info']
dataset.index.name = 'packet'
for x in dataset.index:
    if dataset['Proto'].eq('http').any():
        print(dataset['Length'])
CodePudding user response:
If I understand the question correctly, this should do what you're looking for.
import pandas as pd
data = pd.DataFrame({'protocol':['http', 'http', 'ssh'], 'packet':[1,2,3]})
data.loc[data.protocol=='http', 'packet'] = -data.loc[data.protocol=='http','packet']
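In your code, dataset['Proto'].eq('http').any() asks whether any value in the whole Proto column equals 'http'. The test does not depend on x, so if even one row matches it is true on every iteration, and each time you print the entire Length column.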
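Applied to the column names from the question, the same masking idea might look like the sketch below. The sample rows are made up to stand in for test.csv, and the lower-casing step is an assumption in case the capture tool writes the protocol as 'HTTP'.

```python
import pandas as pd

# Hypothetical rows standing in for the CSV; column names follow the question.
dataset = pd.DataFrame({
    'Proto': ['HTTP', 'TCP', 'http', 'DNS'],
    'Length': [168, 60, 512, 90],
})

# Boolean mask: True on rows whose protocol is http, ignoring case (assumption).
is_http = dataset['Proto'].str.lower().eq('http')

# Negate Length on the matching rows only; other rows are left untouched.
dataset.loc[is_http, 'Length'] = -dataset.loc[is_http, 'Length']

# Show just the http rows.
print(dataset[is_http])
```

The mask is computed once for the whole column, so no explicit loop over the index is needed.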
CodePudding user response:
I understood the same as Marc. I have a similar answer, although using fillna at the end:
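import pandas as pd
df = pd.DataFrame({'type': ['http', 'https'], 'packet': [168, 168]})
df.loc[df['type'] == 'http', 'new_col'] = df['packet'] * -1
# assign the result back: fillna(..., inplace=True) on df.new_col may not update df in recent pandas
# astype(int) because the temporary NaN values make new_col a float column
df['new_col'] = df['new_col'].fillna(df['packet']).astype(int)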
df
    type  packet  new_col
0   http     168     -168
1  https     168      168
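If the intermediate NaN values are unwanted, one possible alternative is numpy.where, which picks between two values per row in a single step. This is only a sketch on the same two-row example; np.where is swapped in here and is not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'type': ['http', 'https'], 'packet': [168, 168]})

# Where type is exactly 'http' take the negated size, otherwise keep the size as is.
df['new_col'] = np.where(df['type'] == 'http', -df['packet'], df['packet'])

print(df)
```

Because no NaN is ever introduced, new_col keeps an integer dtype without a separate conversion.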