Home > Blockchain >  How to split text file with Pipe delimiter in Python and then have number of columns equal to number
How to split text file with Pipe delimiter in Python and then have number of columns equal to number

Time:11-18

I have a text file as shown below with "|" delimiters, the number of values for an attribute can be any. Need new columns based equal to number of values for an attribute.

Attribute1|6
Attribute2|10|15|27
Attribute3|3|7

Required output in DataFrame should be:

Attribute1         6
Attribute2_val_1  10
Attribute2_val_2  15
Attribute3_val_3  27
Attribute3_val_1   3
Attribute3_val_2   7

CodePudding user response:

You can use the following:

MAX_COL = 10
df = (pd.read_csv('filename.txt', sep=r'\|', names=range(MAX_COL 1))
        .dropna(how='all', axis=1)
        .set_index(0).rename_axis('id').stack()
        .reset_index(name='value')
        .assign(id=lambda d: (d['id'].where(d.groupby('id')['id'].transform('size').le(1),
                                            d['id'] '_val_' d['level_1'].astype(str))
                             )
               )
        .drop(columns=['level_1'])
      )

output:

>>> df
                 id  value
0        Attribute1    6.0
1  Attribute2_val_1   10.0
2  Attribute2_val_2   15.0
3  Attribute2_val_3   27.0
4  Attribute3_val_1    3.0
5  Attribute3_val_2    7.0

alternative:

with open('filename.txt') as f:
    df = pd.DataFrame([[s[0], s[1:]] for s in
                        map(lambda s: s.strip().split('|'), f.readlines())],
                      columns=['id', 'value']
                     )

df = (df.explode('value')
        .assign(id=lambda d: (d['id'].where(d.groupby('id')['id'].transform('size').le(1),
                                            d['id'] '_val_' d.groupby('id')['id'].cumcount().add(1).astype(str))
                             )
               )
      )

output:

                 id value
0        Attribute1     6
1  Attribute2_val_1    10
1  Attribute2_val_2    15
1  Attribute2_val_3    27
2  Attribute3_val_1     3
2  Attribute3_val_2     7
  • Related