Home > OS >  How to add new dataframe column from json type series(string)
How to add new dataframe column from json type series(string)

Time:12-29

I want to parse and process my dataframe data.

I tried using join, assign ... etc. I succeed parsing 'allowed' column with below code:

allowed_expanded = df1.allowed.apply(lambda x:pd.Series(x))
allowed_expanded.columns = ['{}.{}'.format('allowed',i) for i in allowed_expanded]

and the result:

# allowed_expanded

                                             allowed.0                  allowed.1   allowed.2
0           {'IPProtocol': 'tcp', 'ports': ['53']}                        NaN         NaN
1   {'IPProtocol': 'tcp', 'ports': ['22', '3389']}                    NaN         NaN
2                               {'IPProtocol': 'icmp'}     {'IPProtocol': 'sctp'}         NaN
3                            {'IPProtocol': 'all'}                        NaN         NaN

but this is not what I want.

what should do I do ?

now my data looks:

# print(df)
          network                                            allowed
0           vpc-1           [{'IPProtocol': 'tcp', 'ports': ['53']}]
1           vpc-1   [{'IPProtocol': 'tcp', 'ports': ['22', '3389']}]
2           vpc-1   [{'IPProtocol': 'icmp'}, {'IPProtocol': 'sctp'}]
3           vpc-1                            [{'IPProtocol': 'all'}]
...

and... what I want:

# print(df)
          network           allowed.IPProtocol    allowed.ports
0           vpc-1                          tcp               53
1           vpc-1                          tcp         22, 3389
2           vpc-1                   icmp, sctp                -
3           vpc-1                          all                -
...

CodePudding user response:

def func(row):
    IPProtocol = []
    ports = []
    for item in row:
        IPProtocol.append(item.get('IPProtocol', None))
        ports.append(item.get('ports', None))
    return pd.Series([IPProtocol, ports])

df[['allowed.IPProtocol', 'allowed.ports']] = df['allowed'].apply(lambda x: func(x))

I hope it helps!

CodePudding user response:

Can you try this:

import numpy as np

df['allowed.IPProtocol']=df['allowed'].apply(lambda x: ', '.join([i['IPProtocol'] for i in x]))
df['allowed.ports']=df['allowed'].apply(lambda x: ', '.join([', '.join(i['ports']) if 'ports' in list(i.keys()) else 'nan' for i in x]))

Output:

|    | network   | allowed                                          | allowed.IPProtocol   | allowed.ports   |
|---:|:----------|:-------------------------------------------------|:---------------------|:----------------|
|  0 | vpc-1     | [{'IPProtocol': 'tcp', 'ports': ['53']}]         | tcp                  | 53              |
|  1 | vpc-1     | [{'IPProtocol': 'tcp', 'ports': ['22', '3389']}] | tcp                  | 22, 3389        |
|  2 | vpc-1     | [{'IPProtocol': 'icmp'}, {'IPProtocol': 'sctp'}] | icmp, sctp           | nan, nan        |
|  3 | vpc-1     | [{'IPProtocol': 'all'}]                          | all                  | nan             |
  • Related