Home > database >  split columns wrt column names using pandas dataframe
split columns wrt column names using pandas dataframe

Time:12-06

I have a dataframe df:

(A,B)   (B,C)   (D,B)   (E,F)
  0      3        0       1
  1      1        3       0
  2      2        4       2

I want to split it into different columns for all columns in df as shown below:

A   B   B   C   D   B   E   F
0   0   3   3   0   0   1   1
1   1   1   1   3   3   0   0
2   2   2   2   4   4   2   2

and add similar columns together:

 A  B   C   D   E   F
 0  3   3   0   1   1
 1  5   1   3   0   0
 2  6   2   4   2   2

how to achieve this using pandas?

CodePudding user response:

With , you can use this :

out = (
        df
          .T
          .reset_index()
          .assign(col= lambda x: x.pop("index").str.strip("()").str.split(","))
          .explode("col")
          .groupby("col", as_index=False).sum()
          .set_index("col")
          .T
          .rename_axis(None, axis=1)
       )

# Output :

print(out)
​
   A  B  C  D  E  F
0  0  3  3  0  1  1
1  1  5  1  3  0  0
2  2  8  2  4  2  2

CodePudding user response:

i think (A, B) as tuple

pd.concat([pd.DataFrame([df[i].tolist()] * len(i), index=list(i)) for i in df.columns]).sum(level=0).T

result:

    A   B   C   D   E   F
0   0   3   3   0   1   1
1   1   5   1   3   0   0
2   2   8   2   4   2   2

if future warning occur, use following code:

pd.concat([pd.DataFrame([df[i].tolist()] * len(i), index=list(i)) for i in df.columns]).groupby(level=0).sum().T

same result

CodePudding user response:

Use concat with removed levels with MultiIndex in columns by Series.str.findall:

df.columns = df.columns.str.findall('(\w )').map(tuple)

df = (pd.concat([df.droplevel(x, axis=1) for x in range(df.columns.nlevels)], axis=1)
        .groupby(level=0, axis=1)
        .sum())
print (df)
   A  B  C  D  E  F
0  0  3  3  0  1  1
1  1  5  1  3  0  0
2  2  8  2  4  2  2

For write ouput to file without index use:

df.to_csv('file.csv', index=False)

CodePudding user response:

You can use findall to extract the variables in the header, then melt and explode, finallypivot_table:

out = (df
   .reset_index().melt('index')
   .assign(variable=lambda d: d['variable'].str.findall('(\w )'))
   .explode('variable')
   .pivot_table(index='index', columns='variable', values='value', aggfunc='sum')
   .rename_axis(index=None, columns=None)
)

Output:

   A  B  C  D  E  F
0  0  3  3  0  1  1
1  1  5  1  3  0  0
2  2  8  2  4  2  2

Reproducible input:

df = pd.DataFrame({'(A,B)': [0, 1, 2],
                   '(B,C)': [3, 1, 2],
                   '(D,B)': [0, 3, 4],
                   '(E,F)': [1, 0, 2]})

printing/saving without index:

print(out.to_string(index=False))
 A  B  C  D  E  F
 0  3  3  0  1  1
 1  5  1  3  0  0
 2  8  2  4  2  2


# as file
out.to_csv('yourfile.csv', index=False)
  • Related