I have a textfile with the separator ";".
"Age ";"AgeRange "
0;"000019";
1;"000019";
2;"000019";
3;"000019";
4;"000019";
5;"000019";
6;"000019";
7;"000019";
8;"000019";
9;"000019";
10;"000019";
11;"000019";
12;"000019";
13;"000019";
14;"000019";
15;"000019";
16;"000019";
17;"000019";
18;"000019";
19;"000019";
20;"020024";
When using pd.DataFrame with sep=";" I am getting columns Index(['Age ', 'AgeRange '], dtype='object').
with open(f, "rb") as f:
file_io_obj = BytesIO(f.read())
if config['file_type'] == 'txt':
fil:pd.DataFrame = pd.read_csv(file_io_obj, header=dataHeader, skipfooter=dataSkipFooter, dtype=str, sep=config['file_separator'])
But in my dataframe my rows have now also a NaN value.
Age AgeRange
0 000019 NaN
1 000019 NaN
2 000019 NaN
3 000019 NaN
4 000019 NaN
I want the following DataFrame:
Age AgeRange
0 000019
1 000019
2 000019
3 000019
4 000019
Same script will handle multiple files with the same setup:
"Inst ";"Year ";"WageType ";"Budget/consumption ";"consumption.type ";"consumption.type "
"DY";"2017";"_L_";"F";"90";"DY201790";
"DY";"2017";"000";"B";"01";"DY201701";
"DY";"2017";"000";"F";"01";"DY201701";
"DY";"2017";"005";"B";"01";"DY201701";
"DY";"2017";"005";"F";"01";"DY201701";
"DY";"2017";"006";"B";"01";"DY201701";
"DY";"2017";"006";"F";"01";"DY201701";
"DY";"2017";"008";"B";"01";"DY201701";
"DY";"2017";"008";"F";"01";"DY201701";
Can anyone help? Thanks in advance.
CodePudding user response:
Do this:
import pandas as pd
data = pd.read_csv('age.txt', sep=';', dtype='str', usecols=[0,1])
or
data = pd.read_csv('age.txt', sep=';', dtype='str', usecols=[0,1]).reset_index(drop=True)
which returns
Age AgeRange
0 0 000019
1 1 000019
2 2 000019
3 3 000019
4 4 000019
5 5 000019
6 6 000019
7 7 000019
8 8 000019
9 9 000019
10 10 000019
11 11 000019
12 12 000019
13 13 000019
14 14 000019
15 15 000019
16 16 000019
17 17 000019
18 18 000019
19 19 000019
20 20 020024
CodePudding user response:
Is this what you need?
import pandas as pd
data = pd.read_csv('yourfile.txt', sep=';', dtype='str', usecols=[1])
Then if needed, rename your index:
data.index.set_names(['Age'], inplace=True)
And/or reset it:
data.reset_index(inplace=True)
Output:
Age AgeRange
0 0 000019
1 1 000019
2 2 000019
3 3 000019
4 4 000019
etc.