how can I make a for loop to populate a DataFrame?-CodePudding

and from the begining I thanks everyone that seeks to help.

I have started to learn python and came across a opportunity to use python to my advantage at work

Im basically made a script that reads a google sheets file, import it into pandas and cleaned up the data.

In the end, I just wanna have the name of the agents in the columns and all of their values for resolucao column below them so I can take the average amount of time for all of the agentes, but I'm struggling to make it with a list comprehension / for loop.

This is what the DataFrame looks like after I cleaned it up Data cleaned up

And this is the Code that I tried to Run

PS: Sorry for the messy code.

agentes_unique = list(df['Agente'].unique())
agentes_duplicated = df['Agente']
value_resolucao_duplicated = df['resolucao']
n_of_rows = []
for row in range(len(df)):
    n_of_rows.append(row)

i = 0
while n_of_rows[i] < len(n_of_rows):
    df2 = pd.DataFrame({agentes_unique[i]: (value for value in df['resolucao'][i] if df['Agente'][i] == agentes_unique[i])})
    i = 1
df2.to_excel('teste.xlsx',index = True, header = True)

But in the end it came to this error:

Traceback (most recent call last):
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 385, in get_loc
    return self._range.index(new_key)
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\FELIPE\Desktop\Python\webscraping\bot_csv_extract\bot_login\main.py", line 50, in <module>
    df2 = pd.DataFrame({'Agente': (valor for valor in df['resolucao'][i] if df['Agente'][i] == 'Gabriel')})
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 958, in __getitem__   
    return self._get_value(key)
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 1069, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 387, in get_loc
    raise KeyError(key) from err
KeyError: 0

I feel like I'm making some obvious mistake but I cant fix it

Again, thanks to anyone who tries to help

CodePudding user response：

Are you looking to do something like this? This is just sample data, but a good start for what you are looking to do if I understand what your wanting to do.

data = {
    'Column1' : ['Data', 'Another_Data', 'More_Data', 'Last_Data'],
    'Agente' : ['Name', 'Another_Name', 'More_Names', 'Last_Name'],
    'Data' : [1, 2, 3, 4]
}
df = pd.DataFrame(data)
df = df.pivot(index = 'Column1', columns=['Agente'], values = 'Data')
df.reset_index()

CodePudding user response：

It is not recommended to use for loops against pandas DataFrames: It is considered messy and inefficient. With some practice you will be able to approach problems in such a way that you will not need to use for loops in these cases.

From what I understand, your goal can be realized in 3 simple steps:

1. Select the 2 columns of interest. I recommend you take a look at how to access different elements of a dataFrame:

df = df[["Agent", "resolucao"]]

2. Convert the column you want to average to a numeric value. Say seconds:

df["resolucao"] = pd.to_timedelta(df['resolucao'].astype(str)).dt.total_seconds()

3. Apply an average aggregation, via the groupby() function:

df = df.groupby(["Agente"]).mean().reset_index()

Hope this helps.

For the next time, I also recommend you to not post the database as an image in order to be able to reproduce your code.

Cheers and keep it up!