Home > Software design >  How to create new columns based on values before delimiter and add values after the delimiter in Pan
How to create new columns based on values before delimiter and add values after the delimiter in Pan

Time:02-15

I am reading a .txt in Pandas. I want to create new column names based on values before '=>' and add values in that column, based on value after '=>'

Input:

other_tags
"access"=>"agricultural","agricultural"=>"no"
"access"=>"customers"
"access"=>"customers","amenity"=>"parking"
"description"=>"GRAND PERE DE MON AMI"
"description"=>"GRAND PERE"
"design"=>"asymmetric","power"=>"tower"
"design"=>"asymmetric","power"=>"tower"

code:

import pandas as pd
df1 = pd.read_csv('try.txt', sep ='\t')
df1['access_type'] = df1['other_tags'].get('access')
df1['amenity'] = df1['other_tags'].get('amenity')
print(df1)

expected_output:

other_tags                                         access_type                            amenity
"access"=>"agricultural","agricultural"=>"no"      agricultural 
"access"=>"customers"                              customers    
"access"=>"customers","amenity"=>"parking"         customers                              parking
"description"=>"GRAND PERE DE MON AMI"      
"description"=>"GRAND PERE"     
"design"=>"asymmetric","power"=>"tower"     
"design"=>"asymmetric","power"=>"tower"     


CodePudding user response:

Explode and expand your column other_tags then pivot it and finally join it to your original dataframe.

df = df.join(df['other_tags'].str.replace('"', '')
                 .str.split(',').explode().str.split('=>', expand=True)
                 .reset_index().pivot('index', 0, 1).fillna(''))

Output

>>> df
                                      other_tags        access agricultural  amenity            description      design  power
0  "access"=>"agricultural","agricultural"=>"no"  agricultural           no                                                   
1                          "access"=>"customers"     customers                                                                
2     "access"=>"customers","amenity"=>"parking"     customers               parking                                          
3         "description"=>"GRAND PERE DE MON AMI"                                      GRAND PERE DE MON AMI                   
4                    "description"=>"GRAND PERE"                                                 GRAND PERE                   
5        "design"=>"asymmetric","power"=>"tower"                                                             asymmetric  tower
6        "design"=>"asymmetric","power"=>"tower"                                                             asymmetric  tower

If you want to keep only the 2 columns access and amenity:

COLS_TO_KEEP = ['access', 'amenity']

df = df.join(df['other_tags'].str.replace('"', '')
                 .str.split(',').explode().str.split('=>', expand=True)
                 .reset_index().pivot('index', 0, 1)[COLS_TO_KEEP].fillna(''))

Output:

>>> df
                                      other_tags        access  amenity
0  "access"=>"agricultural","agricultural"=>"no"  agricultural         
1                          "access"=>"customers"     customers         
2     "access"=>"customers","amenity"=>"parking"     customers  parking
3         "description"=>"GRAND PERE DE MON AMI"                       
4                    "description"=>"GRAND PERE"                       
5        "design"=>"asymmetric","power"=>"tower"                       
6        "design"=>"asymmetric","power"=>"tower"                       
  • Related