Home > Net >  How to use pandas to create columns from one with pivot_table
How to use pandas to create columns from one with pivot_table

Time:10-04

I have a csv with unstructured information. I want to use pivot_table (or merge ?) from pandas to have only one rows for every instance.task_id.number and to spread out metric question in several colums.

For example if i have 4 instance.task_id_number, i need to have 4 columns of metric.question I tried with pivot and pivot.table and merge but nothing match my expectation.

Thanks for your help !

#Edit : as asked, i did it as a example :

What i have :

df = pd.DataFrame([["A", 2], ["A", 3], ["A", 6], ["B", 10], ["B", 11], ["B", 12]])

what i want :

df2 = pd.DataFrame([["A", 2, 3, 6], ["B", 10, 11, 12]])

#Edit 2 : What i tried with pivot_table with the real dataframe. I put aggfunc with "metric.question and drop it in values.

I got the error :

AttributeError: 'SeriesGroupBy' object has no attribute 'index'.

I tried to reset the index but it doesn't work better. The code :

import pandas as pd

stockage = pd.read_csv(r"C:\Users\vion1\Ele\Engie\Import_Engie\asmt_assessment_instance_question.csv", encoding="cp1252")
df = pd.DataFrame(stockage)
#df = df.filter(["instance.task_id.number", "metric.question"], axis = 1)


df2 = df.reset_index(drop = True).pivot_table(index=['instance.task_id.number'],
             columns='metric.question',
             values=["instance","instance.trigger_id","instance.task_id.number","instance.taken_on","instance.state",
             "string_value","metric.order","value","sys_updated_on","instance.task_id.company",
             "instance.user.u_company_customer.u_customer_trigram","instance.task_id.contact_type",
             "instance.task_id.assignment_group"], aggfunc="metric.question")

print(df2)
df2.to_csv(r"C:\Users\vion1\Ele\Engie\Import_Engie\resultat.csv")

CodePudding user response:

Can you try this:

>>> df.assign(cols=df.groupby('instance.task_id.number').cumcount()) \
      .pivot(index='instance.task_id.number',
             columns='cols',
             values='metric.question') \
      .rename_axis(index=None, columns=None)

             0   1   2   3
REQ0510079  Q1  Q2  Q3  Q4
REQ0527568  Q1  Q2  Q3  Q4

Old answer

Following my comment:

data = {'instance.task_id.number': ['REQ0510079','REQ0510079','REQ0510079','REQ0510079',
                                    'REQ0527568','REQ0527568','REQ0527568','REQ0527568'],
        'metric.question': ['Q1', 'Q2', 'Q3', 'Q4', 'Q1', 'Q2', 'Q3', 'Q4']}
df = pd.DataFrame(data)

Using pivot:

>>> df.pivot(index='instance.task_id.number',
             columns='metric.question',
             values='metric.question')

metric.question          Q1  Q2  Q3  Q4
instance.task_id.number                
REQ0510079               Q1  Q2  Q3  Q4
REQ0527568               Q1  Q2  Q3  Q4

  • Related