How should I iterate over a CSV file row by row to get one prediction at a time?

My code:

import pandas as pd
from time import sleep

# GB_model (the trained model) and newx (its feature columns) come from the part of script.py not shown here.
counter = 0
while True:
    try:
        test = pd.read_csv('test.csv', nrows=1, skiprows=counter)
        counter += 1
    except:
        sleep(1)
        print('sleep')
        continue

    test = test.iloc[::, ::]  # selects every row and column (no-op)

    print('counter', counter)

    test = test[newx]  # the same features the model was trained on
    print('########### Binary classification is loading #############')
    print('test', test)
    result1 = GB_model.predict(test)  # launch prediction

    # print(result1)

    print('......Binary classfication result......')

    for i in result1:
        if i == 1:
            print('Attack traffic!!')
        else:
            print('Benign')

Example of the rows and columns in the CSV file:

https://pmqu-my.sharepoint.com/:x:/g/personal/3710137_upm_edu_sa/EWwYIThGjwhLnaHM-0OirdcBwQyIHfy8o1WG_M0tcohJOg?e=n8cvQR

The error: only the first row gets predicted, and the next row fails with the error below. I think this happens because the following rows reach the predict function without the header row (the feature names).

└─# python script.py                                                                                                
/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
counter 1
########### Binary classification is loading #############
test    protocol  flow_duration  tot_fwd_pkts  tot_bwd_pkts  ...  init_bwd_win_byts  fwd_seg_size_min  active_mean  idle_mean
0        17   5.003721e+06             4             0  ...                  0                 8          0.0        0.0

[1 rows x 31 columns]
......Binary classfication result......
Benign
:( no attack traffic maybe next time
counter 2
Traceback (most recent call last):
  File "/home/kali/Desktop/cicflowmeter-0.1.6/project/script.py", line 90, in <module>
    test  = test[newx]
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/frame.py", line 3511, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5842, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['protocol', 'flow_duration', 'tot_fwd_pkts', 'tot_bwd_pkts',\n       'totlen_fwd_pkts', 'totlen_bwd_pkts', 'fwd_pkt_len_mean',\n       'fwd_pkt_len_std', 'bwd_pkt_len_mean', 'flow_byts_s', 'flow_pkts_s',\n       'flow_iat_std', 'flow_iat_min', 'fwd_iat_tot', 'fwd_iat_min',\n       'bwd_iat_tot', 'bwd_iat_min', 'fwd_psh_flags', 'fwd_urg_flags',\n       'bwd_pkts_s', 'fin_flag_cnt', 'rst_flag_cnt', 'psh_flag_cnt',\n       'ack_flag_cnt', 'urg_flag_cnt', 'down_up_ratio', 'init_fwd_win_byts',\n       'init_bwd_win_byts', 'fwd_seg_size_min', 'active_mean', 'idle_mean'],\n      dtype='object')] are in the [columns]"
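The traceback fits the asker's diagnosis. With an integer, skiprows skips that many lines from the top of the file, and line 0 is the header, so from the second pass on (counter == 1) the first remaining data row is parsed as the header and none of the names in newx exist. pandas also accepts a list-like of 0-based line numbers for skiprows, so skipping range(1, counter + 1) leaves the header in place. A minimal sketch, assuming an in-memory two-row CSV in place of test.csv; read_row is a hypothetical helper name:

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: one header line plus two data rows.
CSV_TEXT = "protocol,flow_duration\n17,5003721.0\n6,1200.0\n"

# What the question's loop does on its second pass (counter == 1):
# an integer skiprows drops line 0, which is the header.
broken = pd.read_csv(io.StringIO(CSV_TEXT), nrows=1, skiprows=1)
print(list(broken.columns))  # the first data row became the header


def read_row(source, index):
    """Return data row `index` (0-based) as a one-row DataFrame that keeps
    the header's column names, or None if that row does not exist yet."""
    # Skip file lines 1..index (earlier data rows) but keep line 0, the header.
    row = pd.read_csv(source, nrows=1, skiprows=range(1, index + 1))
    return None if row.empty else row


first = read_row(io.StringIO(CSV_TEXT), 0)
second = read_row(io.StringIO(CSV_TEXT), 1)
missing = read_row(io.StringIO(CSV_TEXT), 2)  # past the last data row
print(second)
```

In the polling loop, read_row('test.csv', counter) could replace the read inside the try block, with sleep(1) and continue when it returns None and counter incremented only after a row has been predicted. read_csv still raises if test.csv does not exist yet or has no header line, so the try/except keeps a purpose; note also that the file is re-read from the top on every pass, which may get slow as it grows.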

Answer:

If you don't want to read the whole file at once, you can read it in chunks of a given number of rows:

import pandas as pd

# each `row` is a one-row DataFrame that keeps the header's column names
for row in pd.read_csv('filename.csv', chunksize=1):
    print(row)
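Applied to the question, chunksize=1 gives the one-prediction-at-a-time loop directly: every chunk carries the column names from the header, so selecting the feature columns works on every row. A minimal sketch, assuming a model with a scikit-learn-style predict method; StubModel, classify_rows and the sample data are hypothetical stand-ins for GB_model, the asker's loop and test.csv:

```python
import io

import pandas as pd


class StubModel:
    """Hypothetical stand-in for GB_model: labels protocol 6 as attack (1)."""

    def predict(self, frame):
        return (frame["protocol"] == 6).astype(int).to_numpy()


def classify_rows(source, model, features):
    """Yield one label per CSV row, reading the file one row at a time."""
    for chunk in pd.read_csv(source, chunksize=1):
        # chunk is a one-row DataFrame, so column selection by name works.
        label = model.predict(chunk[features])[0]
        yield "Attack traffic!!" if label == 1 else "Benign"


CSV_TEXT = "protocol,flow_duration,extra\n17,5003721.0,x\n6,1200.0,y\n"
labels = list(classify_rows(io.StringIO(CSV_TEXT), StubModel(),
                            ["protocol", "flow_duration"]))
print(labels)
```

Unlike the asker's while True loop, this stops at the current end of the file and does not pick up rows appended later.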

But your real problem may be that your file is NOT a correct CSV file.

In CSV every row needs the same columns, but you have different values in different rows, so you should read it as a normal text file and parse every row differently (with your own code).
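If the rows really do differ, one way to read the file "as a normal text file" is Python's csv module, which splits each line into fields without requiring a fixed number of columns. A sketch only, assuming the first line is the header; parse_irregular_csv and its policy of setting mismatched rows aside are hypothetical. All values come back as strings, so they would still need converting before reaching the model.

```python
import csv
import io


def parse_irregular_csv(lines):
    """Split CSV lines into header, well-formed rows (as dicts) and odd rows.

    Assumed policy: a row whose field count differs from the header is not
    guessed at; it is returned with its line number for custom parsing.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows, odd = [], []
    for fields in reader:
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
        else:
            # reader.line_num counts source lines read so far.
            odd.append((reader.line_num, fields))
    return header, rows, odd


# Hypothetical sample: the third line has one field too many.
sample = io.StringIO("protocol,flow_duration\n17,5003721.0\n6,1200.0,extra\n")
header, rows, odd = parse_irregular_csv(sample)
print(rows)
print(odd)
```

For the real file, the same function could be given open('test.csv', newline='') instead of the StringIO sample.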