How to split a csv_file into two files: one containing 40% of the original data, the other 60%. The d

Time:11-10

I have a csv file. The columns are ['A', 'B', 'C'], and there are 1000 rows of original data:

A   B   C
1   0   1
-1  2   0
.   .   .
1   0   0

I need 40% of this data in one csv file and 60% in the other, but the rows must be shuffled randomly first. Ideally this would use the pandas module in Python.

I tried

import numpy as np
import pandas as pd
df=pd.read_csv('filename.csv')
np.random.permutation(df)
df[0:400].to_csv('filename1.csv')
df[401:].to_csv('filename2.csv')

but np.random.permutation(df) only returns a NumPy array, not a DataFrame.

CodePudding user response:

Try it this way, with shuffling before saving (complete snippet):

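import numpy as np
import pandas as pd

df = pd.read_csv('filename.csv')

per = 40
n_rows = len(df)
split = int(n_rows * per / 100)

# Shuffle all rows first, so that both files get a random selection
shuffled = df.iloc[np.random.permutation(n_rows)]

# First 40% of the shuffled rows
perdf = shuffled.iloc[:split]
perdf.to_csv('40perdf.csv', index=False)

# Remaining 60% of the shuffled rows
perdf60 = shuffled.iloc[split:]
perdf60.to_csv('60perdf.csv', index=False)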

Note: not tested. Please test it and let me know.
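
One way to test a split like this without touching any files is to run it on a small stand-in DataFrame and check the sizes of the two parts. This is only a sketch: the helper name split_shuffled, the seed argument, and the 10-row demo frame are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd


def split_shuffled(df, per=40, seed=None):
    """Shuffle the rows of df, then split them into per% and (100 - per)%."""
    rng = np.random.default_rng(seed)
    shuffled = df.iloc[rng.permutation(len(df))]
    split = int(len(df) * per / 100)
    return shuffled.iloc[:split], shuffled.iloc[split:]


# Hypothetical 10-row stand-in for the real CSV data
demo = pd.DataFrame({"A": range(10), "B": range(10, 20), "C": range(20, 30)})
part40, part60 = split_shuffled(demo, per=40, seed=0)

print(len(part40), len(part60))  # 4 6
# No row may end up in both parts
print(set(part40.index).isdisjoint(part60.index))  # True
```

Because the rows are shuffled once and then cut at a single position, every row lands in exactly one of the two parts.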

CodePudding user response:

The problem is that you don't keep the result of the permutation: np.random.permutation returns a new shuffled array and leaves df unchanged.

import pandas as pd
import numpy as np

df = pd.read_csv(r"C:\temp\test1.csv", sep=',')
# the source file looks like this
# A,B,C
# 0,1,1
# 0,0,0
# 1,1,0
# 0,0,0
# 0,0,1
# 2,0,0

# np.random.permutation returns a plain array, so wrap it in a DataFrame
# again and reuse the original column names
df = pd.DataFrame(np.random.permutation(df), columns=df.columns)

split_place = int(df.shape[0]*0.4)
df[0:split_place].to_csv(r'c:\temp\filename1.csv', index=False, columns=None, sep=',')
# the file will contain something like
# A,B,C
# 0,0,1
# 0,0,0

df[split_place:].to_csv(r'c:\temp\filename2.csv',index=False,  sep=',')
# if you don't need the header row, pass header=False

More information about saving to CSV is in the pandas documentation for DataFrame.to_csv.
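
The difference between the question's attempt and this answer can be seen on a small in-memory frame. The sketch below assumes a hypothetical 6-row DataFrame in place of the CSV file. It shows that np.random.permutation returns a new NumPy array and leaves the DataFrame as it was, and that permuting row positions with iloc is one way to keep the column names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data read from the CSV file
df = pd.DataFrame({"A": [0, 0, 1, 0, 0, 2],
                   "B": [1, 0, 1, 0, 0, 0],
                   "C": [1, 0, 0, 0, 1, 0]})
original = df.copy()

# The return value is a shuffled copy; calling this without assigning
# the result has no visible effect
result = np.random.permutation(df)
print(type(result).__name__)   # ndarray
print(df.equals(original))     # True: df itself was not shuffled

# Permuting row positions keeps a DataFrame with its column names
shuffled = df.iloc[np.random.permutation(len(df))]
print(list(shuffled.columns))  # ['A', 'B', 'C']
```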

CodePudding user response:

Use pandas.DataFrame.sample to draw a shuffled 40% of the rows without replacement, then drop those rows from the original table to get the remaining 60%.

df_40 = df.sample(frac=0.4)
df_60 = df.drop(df_40.index)
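
Combined with reading and writing the files, this approach might look like the sketch below. The split_csv helper, the seed argument, and the demo file names (written to a temporary directory so the example runs on its own) are assumptions for illustration. Two things to keep in mind: drop(df_40.index) relies on the index labels being unique, which holds for the default index produced by read_csv, and the 60% part comes out in its original order, so the sketch shuffles it with sample(frac=1).

```python
import os
import tempfile

import pandas as pd


def split_csv(src, out_40, out_60, frac=0.4, seed=None):
    """Write a random `frac` of the rows of src to out_40, the rest to out_60."""
    df = pd.read_csv(src)
    df_40 = df.sample(frac=frac, random_state=seed)  # shuffled, without replacement
    df_60 = df.drop(df_40.index)                     # remaining rows, original order
    df_60 = df_60.sample(frac=1, random_state=seed)  # shuffle the remainder as well
    df_40.to_csv(out_40, index=False)
    df_60.to_csv(out_60, index=False)
    return df_40, df_60


# Hypothetical demo input: 10 rows with columns A, B, C
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_input.csv")
    out_40 = os.path.join(tmp, "demo_40.csv")
    out_60 = os.path.join(tmp, "demo_60.csv")
    pd.DataFrame({"A": range(10), "B": range(10), "C": range(10)}).to_csv(
        src, index=False)

    df_40, df_60 = split_csv(src, out_40, out_60, seed=1)
    written_40 = pd.read_csv(out_40)  # read back to check what was saved

print(len(df_40), len(df_60))  # 4 6
```

Passing index=False keeps the old row numbers out of the output files. Leave it out if those numbers are needed to trace rows back to the source file.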