and from the begining I thanks everyone that seeks to help.
I have started to learn python and came across a opportunity to use python to my advantage at work
Im basically made a script that reads a google sheets file, import it into pandas and cleaned up the data.
In the end, I just wanna have the name of the agents in the columns and all of their values for resolucao column below them so I can take the average amount of time for all of the agentes, but I'm struggling to make it with a list comprehension / for loop.
This is what the DataFrame looks like after I cleaned it up
And this is the Code that I tried to Run
PS: Sorry for the messy code.
agentes_unique = list(df['Agente'].unique())
agentes_duplicated = df['Agente']
value_resolucao_duplicated = df['resolucao']
n_of_rows = []
for row in range(len(df)):
n_of_rows.append(row)
i = 0
while n_of_rows[i] < len(n_of_rows):
df2 = pd.DataFrame({agentes_unique[i]: (value for value in df['resolucao'][i] if df['Agente'][i] == agentes_unique[i])})
i = 1
df2.to_excel('teste.xlsx',index = True, header = True)
But in the end it came to this error:
Traceback (most recent call last):
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 385, in get_loc
return self._range.index(new_key)
ValueError: 0 is not in range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Users\FELIPE\Desktop\Python\webscraping\bot_csv_extract\bot_login\main.py", line 50, in <module>
df2 = pd.DataFrame({'Agente': (valor for valor in df['resolucao'][i] if df['Agente'][i] == 'Gabriel')})
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 958, in __getitem__
return self._get_value(key)
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 387, in get_loc
raise KeyError(key) from err
KeyError: 0
I feel like I'm making some obvious mistake but I cant fix it
Again, thanks to anyone who tries to help
CodePudding user response:
Are you looking to do something like this? This is just sample data, but a good start for what you are looking to do if I understand what your wanting to do.
data = {
'Column1' : ['Data', 'Another_Data', 'More_Data', 'Last_Data'],
'Agente' : ['Name', 'Another_Name', 'More_Names', 'Last_Name'],
'Data' : [1, 2, 3, 4]
}
df = pd.DataFrame(data)
df = df.pivot(index = 'Column1', columns=['Agente'], values = 'Data')
df.reset_index()
CodePudding user response:
It is not recommended to use for loops against pandas DataFrames: It is considered messy and inefficient. With some practice you will be able to approach problems in such a way that you will not need to use for loops in these cases.
From what I understand, your goal can be realized in 3 simple steps:
1. Select the 2 columns of interest. I recommend you take a look at how to access different elements of a dataFrame:
df = df[["Agent", "resolucao"]]
2. Convert the column you want to average to a numeric value. Say seconds:
df["resolucao"] = pd.to_timedelta(df['resolucao'].astype(str)).dt.total_seconds()
3. Apply an average aggregation, via the groupby()
function:
df = df.groupby(["Agente"]).mean().reset_index()
Hope this helps.
For the next time, I also recommend you to not post the database as an image in order to be able to reproduce your code.
Cheers and keep it up!