My code:
```python
from time import sleep

import pandas as pd

# GB_model (the trained classifier) and newx (its feature list) are
# defined earlier in script.py.
counter = 0
while True:
    try:
        test = pd.read_csv('test.csv', nrows=1, skiprows=counter)
        counter += 1
    except Exception:  # e.g. no new row in the file yet
        sleep(1)
        print('sleep')
        continue
    test = test.iloc[::, ::]
    print('counter', counter)
    test = test[newx]  # the same features the model was trained on
    print('########### Binary classification is loading #############')
    print('test', test)
    result1 = GB_model.predict(test)  # launch prediction
    # print(result1)
    print('......Binary classfication result......')
    for i in result1:
        if i == 1:
            print('Attack traffic!!')
        else:
            print('Benign')
```
[Image: example of the rows and columns in the CSV file]
The error: only the first row gets predicted, and every following row fails with the error below. I think it appears because the following rows reach the predict function without the row headers (feature names).
```
└─# python script.py
/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
counter 1
########### Binary classification is loading #############
test    protocol  flow_duration  tot_fwd_pkts  tot_bwd_pkts  ...  init_bwd_win_byts  fwd_seg_size_min  active_mean  idle_mean
0        17   5.003721e+06             4             0  ...                  0                 8          0.0        0.0

[1 rows x 31 columns]
......Binary classfication result......
Benign
:( no attack traffic maybe next time
counter 2
Traceback (most recent call last):
  File "/home/kali/Desktop/cicflowmeter-0.1.6/project/script.py", line 90, in <module>
    test = test[newx]
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/frame.py", line 3511, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5842, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['protocol', 'flow_duration', 'tot_fwd_pkts', 'tot_bwd_pkts',\n 'totlen_fwd_pkts', 'totlen_bwd_pkts', 'fwd_pkt_len_mean',\n 'fwd_pkt_len_std', 'bwd_pkt_len_mean', 'flow_byts_s', 'flow_pkts_s',\n 'flow_iat_std', 'flow_iat_min', 'fwd_iat_tot', 'fwd_iat_min',\n 'bwd_iat_tot', 'bwd_iat_min', 'fwd_psh_flags', 'fwd_urg_flags',\n 'bwd_pkts_s', 'fin_flag_cnt', 'rst_flag_cnt', 'psh_flag_cnt',\n 'ack_flag_cnt', 'urg_flag_cnt', 'down_up_ratio', 'init_fwd_win_byts',\n 'init_bwd_win_byts', 'fwd_seg_size_min', 'active_mean', 'idle_mean'],\n dtype='object')] are in the [columns]"
```
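The guess about missing headers can be checked in isolation. In pandas, an integer `skiprows=n` skips the first n lines of the file, the header line included, so from the second iteration on, the first remaining data row is parsed as the header and none of the feature names exist. Passing a range that starts at 1 skips data lines but keeps line 0 (the header). A minimal sketch, assuming a small in-memory CSV with made-up columns `protocol` and `flow_duration` in place of `test.csv`; `read_row` is a hypothetical helper:

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: a header line plus three data rows.
CSV_TEXT = "protocol,flow_duration\n17,5003721.0\n6,120.5\n17,88.0\n"


def read_row(counter: int, keep_header: bool) -> pd.DataFrame:
    """Read the data row at position `counter` (0-based) as a 1-row frame."""
    if keep_header:
        # Skip data lines 1..counter but never line 0, the header.
        skip = range(1, counter + 1)
    else:
        # What the question's loop does: skip the first `counter` lines,
        # which removes the header as soon as counter >= 1.
        skip = counter
    return pd.read_csv(io.StringIO(CSV_TEXT), nrows=1, skiprows=skip)


features = ["protocol", "flow_duration"]  # plays the role of `newx`

broken = read_row(1, keep_header=False)
print(list(broken.columns))  # a data row was used as the header

fixed = read_row(1, keep_header=True)
print(fixed[features])  # feature selection works on every iteration
```

With the header kept, reading past the last row returns an empty DataFrame instead of raising, so a polling loop like the one above would also need to check `test.empty` (and only advance `counter` when a row was actually read) before calling `predict`.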
Answer:
If you don't want to read the whole file at once, you can read it in chunks (`chunksize` is the number of rows per chunk):
```python
import pandas as pd

for row in pd.read_csv('filename.csv', chunksize=1):
    print(row)
```
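Unlike the integer `skiprows` in the question, every chunk produced by `chunksize` carries the header's column names, so selecting the feature columns works for each row. A minimal sketch; the in-memory CSV and the `features` list are hypothetical stand-ins for `filename.csv` and `newx`, and the model call is left out:

```python
import io

import pandas as pd

# Hypothetical stand-in for filename.csv (the 'extra' column is not a feature).
CSV_TEXT = "protocol,flow_duration,extra\n17,5003721.0,a\n6,120.5,b\n"
features = ["protocol", "flow_duration"]  # plays the role of `newx`

selected = []
# With chunksize=1, read_csv returns an iterator of one-row DataFrames.
for chunk in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=1):
    row = chunk[features]  # the header's names are present in every chunk
    selected.append(row)
    print(row)
```

Each `row` here has the same columns the model was trained on, so it could be passed to `predict` directly.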
But your real problem may be that your file is NOT a correct CSV file.
In CSV every row needs the same columns, but your rows seem to contain different values in different rows, so you should read it as a normal text file and parse every row yourself (with your own code).
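Reading the file as plain text could look like the sketch below. It uses Python's standard `csv` module to pair each line with the header and collects the line numbers whose field count does not match, instead of handing them to the model. The sample text and the function name `parse_rows` are hypothetical, and it assumes no quoted field spans several lines:

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def parse_rows(lines: Iterable[str]) -> Tuple[List[Dict[str, str]], List[int]]:
    """Pair each data line with the header; collect line numbers that don't fit."""
    reader = csv.reader(lines)
    header = next(reader)
    good: List[Dict[str, str]] = []
    bad: List[int] = []
    # Line 1 is the header, so data lines start at 2.
    for line_no, fields in enumerate(reader, start=2):
        if len(fields) != len(header):
            bad.append(line_no)  # wrong number of values: needs custom handling
            continue
        good.append(dict(zip(header, fields)))
    return good, bad


SAMPLE = "protocol,flow_duration\n17,5003721.0\n6\n17,88.0,extra\n6,120.5\n"
rows, rejected = parse_rows(io.StringIO(SAMPLE))
print(rows)
print("malformed lines:", rejected)
```

The well-formed rows can then be turned into a DataFrame with `pd.DataFrame(rows)`; the values are still strings at that point, so they would need converting to numbers before prediction.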