How to group by Id, pivot several values into columns, and extract a certain word in Python

Time:11-13

I have data like this (you can also see my notebook here: link, or the raw file here: link):

Id  Type  Label         Value                                 Value2
1   A     Introduction  This Project will be created By Mr.X
1   A     Capacity      100MB
1   A     Speed         10Km/h
1   A     Weight        10kg
2   A     Introduction  This-Project-will-be-created-By-Mr.A
2   A     Capacity      100MB
2   A     Speed         5km/h
2   A     Weight        1kg
3   B     Introduction                                        This Project will be created By Mr.C
3   B     Capacity                                            100MB
3   B     Speed                                               5km/h
3   B     Weight                                              1kg
4   B     Introduction                                        This Project will be created By Mr.D
4   B     Capacity                                            100MB
4   B     Speed                                               5km/h
4   B     Weight                                              1kg
4   B     Height                                              1m
4   B     Color                                               red

You can see that Type A has its label value in the Value column, but Type B has it in the Value2 column. I want to group the rows by Id and transpose the Label values into columns, like this:

Id PJ Capacity Speed Weight
1 Mr.X 100MB 10Km/h 10kg
2 Mr.A 100MB 5km/h 1kg
3 Mr.C 100MB 5km/h 1kg

The PJ column comes from the Introduction value, but I only want the person's name. Note that some values in my data use the - symbol instead of spaces.

I'm a beginner in Python and I don't know how to do this. Cleaning the data in Excel would be hard because there is a lot of it. Thank you.

CodePudding user response:

First

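Fill the missing Value entries from Value2 with fillna, then use pivot (the arguments are passed by keyword, which pandas 2.0 and later require):

(df
 # take Value where present, otherwise fall back to Value2
 .assign(Value=df['Value'].fillna(df['Value2']))
 # one row per Id, one column per Label
 .pivot(index='Id', columns='Label', values='Value'))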

output:

Label   Capacity    Introduction                            Speed   Weight
Id              
1       100MB       This Project will be created By Mr.X    10Km/h  10kg
2       100MB       This-Project-will-be-created-By-Mr.A    5km/h   1kg
3       100MB       This Project will be created By Mr.C    5km/h   1kg
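
For anyone who wants to run this step without the notebook, here is a minimal sketch. It assumes only the sample rows for Ids 1 to 3 from the question, typed in by hand, with the unused Value or Value2 cell left as None. The names `rows` and `wide` are just for illustration.

```python
import pandas as pd

# Sample rows copied from the question (Ids 1-3 only, an assumption).
# Type A keeps its text in Value, Type B keeps it in Value2.
rows = [
    (1, "A", "Introduction", "This Project will be created By Mr.X", None),
    (1, "A", "Capacity", "100MB", None),
    (1, "A", "Speed", "10Km/h", None),
    (1, "A", "Weight", "10kg", None),
    (2, "A", "Introduction", "This-Project-will-be-created-By-Mr.A", None),
    (2, "A", "Capacity", "100MB", None),
    (2, "A", "Speed", "5km/h", None),
    (2, "A", "Weight", "1kg", None),
    (3, "B", "Introduction", None, "This Project will be created By Mr.C"),
    (3, "B", "Capacity", None, "100MB"),
    (3, "B", "Speed", None, "5km/h"),
    (3, "B", "Weight", None, "1kg"),
]
df = pd.DataFrame(rows, columns=["Id", "Type", "Label", "Value", "Value2"])

wide = (
    df
    # coalesce the two value columns into one
    .assign(Value=df["Value"].fillna(df["Value2"]))
    # one row per Id, one column per Label
    .pivot(index="Id", columns="Label", values="Value")
)
print(wide)
```

Note that pivot raises an error if the same Id and Label pair appears more than once, so this only works when each label occurs at most once per Id, as in the sample.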



Next

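Create the PJ column and call reset_index (full code, including the first step):

(df
 .assign(Value=df['Value'].fillna(df['Value2']))
 .pivot(index='Id', columns='Label', values='Value')
 # keep only the text after "By " or "By-", i.e. the person's name
 .assign(PJ=lambda x: x['Introduction'].str.split('By[ -]').str[1])
 # reorder to PJ, Capacity, Speed, Weight (this drops Introduction)
 .iloc[:, [-1, 0, 2, 3]].reset_index())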

output:

Label   Id  PJ      Capacity    Speed   Weight
0       1   Mr.X    100MB       10Km/h  10kg
1       2   Mr.A    100MB       5km/h   1kg
2       3   Mr.C    100MB       5km/h   1kg
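
The name extraction can be tried on its own. Below is a small sketch that runs the same split on the three Introduction strings from the question and, as an alternative I am adding here (not part of the answer above), pulls the name with `str.extract` and a capture group. The variable names are illustrative.

```python
import pandas as pd

# The three Introduction values from the question's sample.
intro = pd.Series(
    [
        "This Project will be created By Mr.X",
        "This-Project-will-be-created-By-Mr.A",
        "This Project will be created By Mr.C",
    ]
)

# Approach used in the answer: split on "By" followed by a space or a hyphen,
# then take the second piece.
by_split = intro.str.split("By[ -]").str[1]

# Alternative: capture everything after "By " / "By-" up to the end of the string.
by_extract = intro.str.extract(r"By[ -](.+)$", expand=False)

print(by_split.tolist())
print(by_extract.tolist())
```

Both approaches assume every Introduction contains "By" followed by a space or hyphen; a row without it yields a missing value rather than an error.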

CodePudding user response:

You should first melt to reshape the "Value..." columns into long form, then pivot on the new "value" column:

(df.melt(['Id', 'Type', 'Label'])
   .dropna(subset=['value'])
   .pivot(index=['Id', 'Type'], columns='Label', values='value')
   .rename_axis(columns=None)
   .dropna(axis=1) # remove incomplete columns
   .reset_index()
 )

Output:

   Id Type Capacity                          Introduction   Speed Weight
0   1    A    100MB  This Project will be created By Mr.X  10Km/h   10kg
1   2    A    100MB  This-Project-will-be-created-By-Mr.A   5km/h    1kg
2   3    B    100MB  This Project will be created By Mr.C   5km/h    1kg
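
To see what the `dropna(axis=1)` step does, here is a runnable sketch of the same chain that also includes Id 4 from the question, whose extra Height and Color labels exist for no other Id. The `records` dictionary and the loop that builds the rows are my own scaffolding for recreating the sample, not part of the original data format.

```python
import pandas as pd

# Sample from the question, keyed by Id: (Type, {Label: text}).
common_b = {"Capacity": "100MB", "Speed": "5km/h", "Weight": "1kg"}
records = {
    1: ("A", {"Introduction": "This Project will be created By Mr.X",
              "Capacity": "100MB", "Speed": "10Km/h", "Weight": "10kg"}),
    2: ("A", {"Introduction": "This-Project-will-be-created-By-Mr.A", **common_b}),
    3: ("B", {"Introduction": "This Project will be created By Mr.C", **common_b}),
    4: ("B", {"Introduction": "This Project will be created By Mr.D", **common_b,
              "Height": "1m", "Color": "red"}),
}

# Type A stores the text in Value, Type B stores it in Value2.
rows = [
    (id_, typ, label, text if typ == "A" else None, text if typ == "B" else None)
    for id_, (typ, labels) in records.items()
    for label, text in labels.items()
]
df = pd.DataFrame(rows, columns=["Id", "Type", "Label", "Value", "Value2"])

out = (
    df.melt(["Id", "Type", "Label"])   # stack Value and Value2 into one "value" column
      .dropna(subset=["value"])        # keep only the cell that was filled
      .pivot(index=["Id", "Type"], columns="Label", values="value")
      .rename_axis(columns=None)
      .dropna(axis=1)                  # Height and Color are missing for Ids 1-3
      .reset_index()
)
print(out)
```

With this input, Id 4 stays as a row, but Height and Color disappear because those columns are not filled for every Id. If you want to keep them, leave out the `dropna(axis=1)` step and accept missing values in those columns.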