Split columns conditionally on string

Time:05-17

I have a data frame with the following shape:

    0   1
0   OTT:81  DVBC:398
1   OTT:81  DVBC:474
2   OTT:81  DVBC:474
3   OTT:81  DVBC:454
4   OTT:81  DVBC:443
5   OTT:1   DVBC:254
6   DVBC:151    None
7   OTT:1   DVBC:243
8   OTT:1   DVBC:254
9   DVBC:227    None

I want column 1 to take the value of column 0 if column 0 contains "DVBC". Then split the values on ":" and fill the empty ones with 0. The end data frame should look like this:

    OTT DVBC
0   81  398
1   81  474
2   81  474
3   81  454
4   81  443
5   1   254
6   0   151
7   1   243
8   1   254
9   0   227

I tried to do this, starting with:

if df[0].str.contains("DVBC") is True:
    df[1] = df[0]

But after this the data frame looks the same, and I am not sure why. My idea is then to pass the values to their respective columns, split on ":", and rename the columns. How can I implement this?

CodePudding user response:

A general solution that splits the values on : and pivots them: first create a Series with DataFrame.stack, split it with Series.str.split, then reshape with DataFrame.pivot:

df = df.stack().str.split(':', expand=True).reset_index()

# pivot takes keyword-only arguments in pandas 2.0 and later
df = df.pivot(index='level_0', columns=0, values=1).fillna(0).rename_axis(index=None, columns=None)
print (df)
  DVBC OTT
0  398  81
1  474  81
2  474  81
3  454  81
4  443  81
5  254   1
6  151   0
7  243   1
8  254   1
9  227   0
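
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same stack/split/pivot idea. It rebuilds the sample frame from the question and assumes the empty cells are real None values rather than the string "None". The names tidy and wide are made up for illustration, and the dropna, integer conversion and column reordering are additions to the answer above, not part of it.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumption: empty cells are real None)
df = pd.DataFrame({
    0: ["OTT:81", "OTT:81", "OTT:81", "OTT:81", "OTT:81",
        "OTT:1", "DVBC:151", "OTT:1", "OTT:1", "DVBC:227"],
    1: ["DVBC:398", "DVBC:474", "DVBC:474", "DVBC:454", "DVBC:443",
        "DVBC:254", None, "DVBC:243", "DVBC:254", None],
})

# Long form: one row per non-missing cell, split into key (0) and value (1).
# dropna() is a guard in case stack() keeps missing cells in your pandas version.
tidy = df.stack().dropna().str.split(":", expand=True).reset_index()
tidy[1] = tidy[1].astype(int)

# Wide form: one column per key, missing combinations filled with 0
wide = (tidy.pivot(index="level_0", columns=0, values=1)
            .fillna(0)
            .astype(int)
            .rename_axis(index=None, columns=None))

# pivot sorts the new columns alphabetically; restore the order from the question
wide = wide[["OTT", "DVBC"]]
print(wide)
```

Converting to integers before the pivot keeps the fill value 0 and the parsed numbers in the same dtype, instead of mixing strings with an integer 0.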

CodePudding user response:

Here is one way that should work with any number of columns:

(df
 .apply(lambda c: c.str.extract(r':(\d+)', expand=False))
 .ffill(axis=1)
 .mask(df.replace('None', pd.NA).isnull().shift(-1, axis=1, fill_value=False), 0)
)

output:

    OTT   DVBC
0    81   398
1    81   474
2    81   474
3    81   454
4    81   443
5     1   254
6     0   151
7     1   243
8     1   254
9     0   227
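
As for why the attempt in the question leaves the frame unchanged: df[0].str.contains("DVBC") returns a boolean Series, and a Series is never the object True, so the "is True" test is always False and the assignment never runs. Below is a rough sketch of the asker's own plan done with a boolean mask instead. It assumes the empty cells are real None values, and the names mask, number_after_colon and result are hypothetical, picked only for this example.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumption: empty cells are real None)
df = pd.DataFrame({
    0: ["OTT:81", "OTT:81", "OTT:81", "OTT:81", "OTT:81",
        "OTT:1", "DVBC:151", "OTT:1", "OTT:1", "DVBC:227"],
    1: ["DVBC:398", "DVBC:474", "DVBC:474", "DVBC:454", "DVBC:443",
        "DVBC:254", None, "DVBC:243", "DVBC:254", None],
})

# Row-wise boolean mask: True where column 0 holds a DVBC entry
mask = df[0].str.contains("DVBC")
print(mask is True)  # prints False: a Series is not the singleton True

# Move the DVBC entries into column 1 and blank out column 0 on those rows
df.loc[mask, 1] = df.loc[mask, 0]
df.loc[mask, 0] = None


def number_after_colon(col):
    """Return the integer after ':' for each cell, or 0 where the cell is missing."""
    return pd.to_numeric(col.str.split(":").str[1]).fillna(0).astype(int)


result = pd.DataFrame({
    "OTT": number_after_colon(df[0]),
    "DVBC": number_after_colon(df[1]),
})
print(result)
```

This only covers the two-column layout from the question; the answers above are the better fit if the number of columns or keys varies.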