I'll try to represent my problem, basing on simple example below. I have below main csv and I am trying to split into 2 or more csv basing on column header, keeping the unique column id in intact in every csv file.
Below is the code I am trying to figure out, but not quite getting the result.
import pandas as pd
df = pd.read_csv('abc.csv')
df[['id','name','age']] = df['csv1'].str.split(' ', expand=True)
csv
id name age color Gender
0 101 Jack 23 white M
1 102 Mary 25 black F
2 103 Tom 24 brown M
Output required
csv1
id name age
0 101 Jack 23
1 102 Mary 25
2 103 Tom 24
csv2 -
id color Gender
0 101 white M
1 102 black F
2 103 brown M
CodePudding user response:
UPDATE
I found a better apporach with np.array_split
I used this example df:
x y R TR x_c y_c xxx yyy RRR TTTR xxx_c yyy_c
id
1256780.0 13989 6241 6.689222 20.986341 14050.83 6315.33 213989 36241 46.689222 520.986341 614050.83 76315.33
12000.0 14013 6278 53.152036 0.000000 14060.00 6288.00 214013 36278 453.152036 5.000000 614060.00 76288.00
1100.0 14111 6379 87.598357 5.000000 14070.55 7000.00 214111 36379 487.598357 55.000000 614070.55 76288.00
which has 12 columns.
# the 4 means, split the df into 4 evenly sized chunks
chunks = np.array_split(df,4, axis=1)
Chunks
is a list containing all seperate dataframes.
Output:
# chunks[0]
x y R
id
1256780.0 13989.0 6241.0 6.689222
12000.0 14013.0 6278.0 53.152036
1100.0 14111.0 6379.0 87.598357
# chunks[1]
TR x_c y_c
id
1256780.0 20.986341 14050.83 6315.33
12000.0 0.000000 14060.00 6288.00
1100.0 5.000000 14070.55 7000.00
# chunks[2]
xxx yyy RRR
id
1256780.0 213989.0 36241.0 46.689222
12000.0 214013.0 36278.0 453.152036
1100.0 214111.0 36379.0 487.598357
# chunks[3]
TTTR xxx_c yyy_c
id
1256780.0 520.986341 614050.83 76315.33
12000.0 5.000000 614060.00 76288.00
1100.0 55.000000 614070.55 76288.00
Old answer:
You could calculate half of the columns and then use iloc
to split the dataframes into two parts.
df = df.set_index('id')
half = len(df.columns)//2
df1, df2 = df.iloc[:,:half], df.iloc[:,half:]
df1 = df1.reset_index()
df2 = df2.reset_index()
Output:
#df1
id name age
0 101 Jack 23
1 102 Mary 25
2 103 Tom 24
#df2
id color Gender
0 101 white M
1 102 black F
2 103 brown M