Home > Back-end >  python-CSV Multiple Columns with the same header into one column
python-CSV Multiple Columns with the same header into one column

Time:11-11

I have a CSV file with company data with 22 rows and 6500 columns. The columns have the same names and I should get the columns with the same names stacked into individual columns according to their headers.

I have now the data in one df like this:

Y   C   Y   C   Y   C

1.  a   1.  b.  1.  c.

2.  a.  2.  b.  2.  c.  

and I need to get it like this:

Y   C

1.  a. 

2.  a. 

1.  b.

2.  b.

1.  c.

2.  c. 

CodePudding user response:

I would try an attempt where you slice the df in chunks by iteration and concat them back together, since the column names can't be identified distinctly.

EDIT
Changed answer to new input:

chunksize = 2
stacked_df = (
    pd.concat(
        [
            df.iloc[:, i:i chunksize] for i in range(0, len(df.columns), chunksize)
        ]
    )
    .reset_index(drop=True))
print(stacked_df)
   Y  C
0  1  a
1  2  a
2  1  b
3  2  b
4  1  c
5  2  c

CodePudding user response:

I couldn't resist looking for a solution. The best I found so far accounts for the fact that pd.read_csv addresses repeated column names by appending '.N' to the duplicates.

In [2]: df = pd.read_csv('duplicate_columns.csv')
In [3]: df
Out[3]:
   1  2  3  4 1.1 2.1 3.1 4.1 1.2 2.2 3.2 4.2
0  a  q  j  e   w   e   r   t   y   u   d   s
1  b  w  w  f   c   e   f   g   d   c   s   a
2  d  q  e  h   c   f   b   f   a   w   q   r

To put your data into the same column...

  1. Group the columns by their original names.
  2. Apply a flattener to convert to a series of arrays.
  3. Create a new data frame from the series viewed as a dict.
In [3]: grouper = lambda l: l.split('.')[0]    # peels off added suffix
In [4]: flattener = lambda v: v.stack().values # reshape groups
In [4]: pd.DataFrame(df.groupby(by=grouper, axis='columns')
   ...:                .apply(flattener)
   ...:                .to_dict())
Out[4]:
   1  2  3  4
0  a  q  j  e
1  w  e  r  t
2  y  u  d  s
3  b  w  w  f
4  c  e  f  g
5  d  c  s  a
6  d  q  e  h
7  c  f  b  f
8  a  w  q  r

I'd love to see a cleaner, less obtuse, general solution.

  • Related