Home > Net >  How to create row wise CSV for vectorized dataframe?
How to create row wise CSV for vectorized dataframe?

Time:07-11

What I am trying to do is basically pulling out keywords from a processed file of a log file and creating a vectorized dataframe of those keywords. But when I am writing that dataframe into CSV, words are in the columns and their respective value in the second row. While I want the words to be in rows and their value in second column.

trial.py :

import re
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def removeNumbers(list):
   #doing something

def processFiles(filename):
   #doing something

def readFile(fileName):
   #doing something

# Build our text
processFiles("log.txt")
text = readFile("processedFile.txt")


vectorizer = CountVectorizer()

matrix = vectorizer.fit_transform([text])

counts = pd.DataFrame(matrix.toarray(),
                      columns=vectorizer.get_feature_names_out())



counts.to_csv("keywords_count.csv")

keywords_count.csv looks like this :

,accept,accepted,action,add,address,agent,allocated,api,api_action_sender,api_reader,apihandle,apiinitialize,apiterminate,appl,associate,attempt,available,bd,bdfb,broken,ceased,check_signals,chose,cksm,cl,clcat,client,close,code,complete,conf,configuration,connection,connfd,constructing,control,creating,ctcd,delresp,dereg,deregistering,does,dreg_process,dst,dump,edci,engine,entering,entity,entity_initialize,entries,entry,event,event_establishsessionsend,event_timert_expire,exist,exists,exit,exiting,expect,expired,failed,fc,file,filter,flg,flow,flow_timer_start,flow_timer_stop,forward,gateway,handle,home,hop,if,ifaeddrg_byaddr,ifidx,image,images,index,inf,info,informational,init_policyapi,initialization,initialized,install,interface,ioctl,ip,len,level,lih,link,list,local,locate_configfile,log,loopback,mailbox,mailbox_register,mailslot,mailslot_create,mailslot_send,mailslot_sitter,main,mcast_add,module,msg,necessary,new,node,obj,old,open_socket,operation,os,outgoing,papi_debug,papilogfunc,papiuservalue,path,pathdelta,pathed,pathtear,pipe,policy,process,proterr,proto,qoshandle,qoshd,qosmgr,qosmgr_request,qosmgr_response,query,querying,rapi,raw,rc,read_physical_netif,readbuffer,ready,reason,received,reentering,reg_process,registered,registering,registerwithpolicyapi,registration,remove,req,request,reservation,response,result,resv,resvdelta,resved,resvresp,return,returned,route,router_forward_getoi,rpapi_getpolicydata,rpapi_getspecdata,rpapi_reg_unregflow,rsv,rsvp,rsvp_action_nhop,rsvp_api_open,rsvp_event,rsvp_event_establishsession,rsvp_event_mapsession,rsvp_event_propagate,rsvp_explode_packet,rsvp_flow_statemachine,rsvp_hop,rsvp_parse_objects,rsvpd,rsvpfindactionname,rsvpfindservicedetailsonactname,rsvpgettspec,rsvpputactionname,rsvpremactionname,rthdl,send,sender,sender_withdraw,sending,service,sess,session,sessioned,setsockopt,settcpimage,sigalrm,signal,sigterm,socket,source,specified,src,start,started,state,status,stop,stopped,style,successful,supported,tc,tcp,tcpcs,term,term_policyapi,terminate,terminated,terminator,timer,tout,tr,trace,traffic,traffic_action_oif,traffic_reader,ttl,type,udp,unregistered,unregisterfrompolicyapi,user,using,vlink,warning,wf,writing
0,1,1,1,1,18,1,28,8,1,6,1,3,2,1,1,2,4,2,1,1,1,1,1,4,1,3,1,1,1,1,1,1,2,1,9,2,22,2,1,1,1,2,3,3,2,5,2,20,7,7,1,7,31,1,6,1,6,1,17,1,6,4,8,1,2,4,4,12,7,2,7,7,1,4,1,2,7,1,1,7,7,147,2,14,1,8,1,18,9,5,4,1,4,2,1,1,1,1,1,24,23,20,27,9,7,3,4,1,2,2,2,1,4,1,2,1,1,1,3,1,1,7,1,2,4,2,2,10,1,3,2,1,2,4,4,6,1,1,4,4,8,12,1,2,12,9,3,1,1,3,2,2,1,4,3,2,6,4,1,20,1,1,1,17,35,11,3,12,4,38,8,1,4,1,7,1,4,26,4,8,2,3,3,3,3,3,1,1,1,1,9,3,3,10,4,4,2,6,8,1,6,12,1,3,4,9,26,2,5,2,4,10,1,2,2,1,1,8,2,2,1,2,6,1,119,2,2,3,4,5,14,1,3,1,1,1,4,4,1

CodePudding user response:

Transpose your dataframe:

counts.T.to_csv("keywords_count.csv")
  • Related