Home > Enterprise >  Split Single Column(1,000 rows) into two smaller columns(500 each)
Split Single Column(1,000 rows) into two smaller columns(500 each)

Time:06-17

How to split a single column containing 1000 rows into chunk of two columns containing 500 rows per column in pandas.

I have a csv file that contains a single column and I need to split this into multiple columns. Below is the format in csv.

Steps I took:
I had multiple csv files containing one column with 364 rows. I concatenated them after converting them into a dataframe, but it copies the file in a linear fashion.

Code I tried

monthly_list = []
for file in ['D0_monthly.csv','c1_monthly.csv','c2_monthly.csv','c2i_monthly.csv','c3i_monthly.csv','c4i_monthly.csv','D1_monthly.csv','D2i_monthly.csv','D3i_monthly.csv','D4i_monthly.csv',
                        'D2j_monthly.csv','D3j_monthly.csv','D4j_monthly.csv','c2j_monthly.csv','c3j_monthly.csv','c4j_monthly.csv']:
    monthly_file = pd.read_csv(file,header=None,index_col=None,skiprows=[0])
    monthly_list.append(monthly_file)
monthly_all_file = pd.concat(monthly_list)

How the data is:

column1
1
2
3
.
.
364
1
2
3
.
.
364

I need to split the above column in the format shown below.
What the data should be:

column1 column2
1 1
2 2
3 3
4 4
5 5
. .
. .
. .
364 364

CodePudding user response:

I don't know if anyone has a more efficient solution but using pd. merge on a temp column should solve your issue. Here is a quick implementation of what you could write.

csv1['temp'] = 1
csv2['temp'] = 1
new_df=pd.merge(csv1,csv2,on=["temp"])
new_df.drop("temp",axis=1)

I hope this helps!

CodePudding user response:

This answer just shows 10 target rows as an example. That can easily be changed to 364 or 500.

A dataframe where column1 contains 2 sets of 10 rows

tgt_row_len = 10

df = pd.DataFrame({'column1': np.tile(np.arange(1,tgt_row_len 1),2)})
print(df)

    column1
0         1
1         2
2         3
3         4
4         5
5         6
6         7
7         8
8         9
9        10
10        1
11        2
12        3
13        4
14        5
15        6
16        7
17        8
18        9
19       10

Move the bottom set of rows to column2

df.assign(column2=df['column1'].shift(-tgt_row_len)).iloc[:tgt_row_len].astype(int)

   column1  column2
0        1        1
1        2        2
2        3        3
3        4        4
4        5        5
5        6        6
6        7        7
7        8        8
8        9        9
9       10       10
  • Related