Convert Varying Column Length to Rows in Pandas

Time:12-29

I'm trying to create a graph with Seaborn that shows all of the Windows events on the Domain Controller that took place in a given time range. The set of events varies between runs: you might have five events now, but 25 when you run the program again in 10 minutes.

With that said, I've been able to parse these events (labeled Actions) from a mumbo-jumbo of other data in the log file and then create a DataFrame in Pandas. The script outputs the data as a dictionary. After creating the DataFrame, this is what the output looks like:

   logged-out  kerberos-authentication-ticket-requested  logged-in  created-process  exited-process
1           1                                         5          2                1               1

Note: The values you see above are the number of times each action took place within that time frame.

That would be good enough if a table were all I needed. When I try to pass this DataFrame to Seaborn, I get an error, because I don't know what to name the x and y axes: the columns are always changing. My solution was to use df.melt() to convert those columns into rows and then label the only two columns needed ('Action', 'Count'). But that's where I fumbled multiple times. I can't figure out how to use df.melt() correctly.

Here is my code:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#Ever-changing data
actions = {'logged-out': 2, 'kerberos-authentication-ticket-requested': 5, 'logged-in': 2, 
           'created-process': 1, 'exited-process': 1, 'logged-out': 1}

#Create DataFrame
data = actions
index = 1 * pd.RangeIndex(start=1, stop=2) #add automatic index
df = pd.DataFrame(data,index=index,columns=['Action','Count'])
print(df)


#Convert Columns to Rows and Add 
df.melt(id_vars=["Action", "Count"], 
        var_name="Action", 
        value_name="Count")


#Create graph
sns.barplot(data=df,x='Action',y='Count',
              palette=['#476a6f','#e63946'],
              dodge=False,saturation=0.65)

plt.savefig('fig.png')
plt.show()

Any help is appreciated.

CodePudding user response:

You can use:

df.melt(var_name="Action", value_name="Count")

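without using any id_vars! When id_vars is omitted, melt() unpivots every column, so each action name becomes a row in "Action" and its value lands in "Count". Two other things to watch: melt() returns a new DataFrame rather than modifying df, so assign the result and plot that; and df should be created without columns=['Action','Count'], because those names are not keys in the dictionary.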
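To make the suggestion concrete, here is a minimal sketch of the reshaping step, assuming the dictionary from the question (with the duplicate 'logged-out' key collapsed to its last value, as Python does). The name long_df and the alternative at the end are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Example counts; in the real script this dict changes on every run.
actions = {
    "logged-out": 1,
    "kerberos-authentication-ticket-requested": 5,
    "logged-in": 2,
    "created-process": 1,
    "exited-process": 1,
}

# One row, one column per action. No columns= argument here: passing
# columns=['Action', 'Count'] would select keys that are not in the dict.
df = pd.DataFrame(actions, index=[1])

# With no id_vars, every column is unpivoted into an (Action, Count) row.
# melt() returns a new DataFrame, so keep the result.
long_df = df.melt(var_name="Action", value_name="Count")
print(long_df)

# Alternative (an assumption, not from the answer): build the long
# table straight from the dict and skip the wide DataFrame entirely.
direct = pd.DataFrame(list(actions.items()), columns=["Action", "Count"])
```

The long table should then work with the plotting call from the question, for example sns.barplot(data=long_df, x='Action', y='Count'), because the axis names no longer depend on which events showed up in the log.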
