Pandas: handling a text file whose lines end with a separator

I have a text file that uses ";" as the separator. Every data line also ends with a ";", but the header line does not.

"Age ";"AgeRange "
0;"000019";
1;"000019";
2;"000019";
3;"000019";
4;"000019";
5;"000019";
6;"000019";
7;"000019";
8;"000019";
9;"000019";
10;"000019";
11;"000019";
12;"000019";
13;"000019";
14;"000019";
15;"000019";
16;"000019";
17;"000019";
18;"000019";
19;"000019";
20;"020024";

When I read it with pd.read_csv and sep=";" I get the columns Index(['Age ', 'AgeRange '], dtype='object').

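    from io import BytesIO

    import pandas as pd

    # Read the whole file into memory, then parse it from the buffer.
    with open(f, "rb") as fh:
        file_io_obj = BytesIO(fh.read())

    if config['file_type'] == 'txt':
        fil: pd.DataFrame = pd.read_csv(file_io_obj, header=dataHeader, skipfooter=dataSkipFooter, dtype=str, sep=config['file_separator'])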

But the values end up shifted: the first field is used as the index, and every row now has a NaN in the last column.


     Age   AgeRange
0  000019        NaN
1  000019        NaN
2  000019        NaN
3  000019        NaN
4  000019        NaN

I want the following DataFrame:

Age  AgeRange
0    000019
1    000019
2    000019
3    000019
4    000019

The same script has to handle several files with the same layout but different columns, for example:

"Inst ";"Year ";"WageType ";"Budget/consumption ";"consumption.type ";"consumption.type "
"DY";"2017";"_L_";"F";"90";"DY201790";
"DY";"2017";"000";"B";"01";"DY201701";
"DY";"2017";"000";"F";"01";"DY201701";
"DY";"2017";"005";"B";"01";"DY201701";
"DY";"2017";"005";"F";"01";"DY201701";
"DY";"2017";"006";"B";"01";"DY201701";
"DY";"2017";"006";"F";"01";"DY201701";
"DY";"2017";"008";"B";"01";"DY201701";
"DY";"2017";"008";"F";"01";"DY201701";

Can anyone help? Thanks in advance.

CodePudding user response：

Pass usecols with the two header columns; pandas then no longer uses the first field as the index:

import pandas as pd
data = pd.read_csv('age.txt', sep=';', dtype='str', usecols=[0,1])

which returns

   Age  AgeRange 
0     0    000019
1     1    000019
2     2    000019
3     3    000019
4     4    000019
5     5    000019
6     6    000019
7     7    000019
8     8    000019
9     9    000019
10   10    000019
11   11    000019
12   12    000019
13   13    000019
14   14    000019
15   15    000019
16   16    000019
17   17    000019
18   18    000019
19   19    000019
20   20    020024
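
If the number of columns differs from file to file, as with the second sample in the question, a hard-coded usecols=[0,1] will not carry over. A possible alternative is index_col=False, which the pandas documentation describes for malformed files that have a delimiter at the end of each line. A minimal sketch, assuming an in-memory copy of a few sample lines (the names sample and df are only for illustration):

```python
from io import StringIO

import pandas as pd

# A few lines of the sample file from the question; each data row ends with ";".
sample = '"Age ";"AgeRange "\n0;"000019";\n1;"000019";\n20;"020024";\n'

# index_col=False tells pandas not to use the first column as the index,
# so the values line up with the two header names.
df = pd.read_csv(StringIO(sample), sep=";", dtype=str, index_col=False)

print(df.columns.tolist())
print(df)
```

Because no column positions are hard-coded, the same call should also suit files with a different number of columns, as long as they share the trailing-separator layout.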

CodePudding user response：

Is this what you need?

import pandas as pd
data = pd.read_csv('yourfile.txt', sep=';', dtype='str', usecols=[1])

Then if needed, rename your index:

data.index.set_names(['Age'], inplace=True)

And/or reset it:

data.reset_index(inplace=True)

Output:

    Age AgeRange
0   0   000019
1   1   000019
2   2   000019
3   3   000019
4   4   000019

etc.
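
The header names in the file end with a space ('Age ', 'AgeRange '), while the desired frame in the question uses 'Age' and 'AgeRange'. One way to tidy that up, sketched here on an abbreviated in-memory sample, is to strip the column labels after loading. The index_col=False argument is just one documented way to cope with the trailing ";", and the variable names are assumptions for illustration:

```python
from io import StringIO

import pandas as pd

# Abbreviated sample from the question; header names carry a trailing space.
sample = '"Age ";"AgeRange "\n0;"000019";\n1;"000019";\n'

# index_col=False keeps the first column as data despite the trailing ";".
df = pd.read_csv(StringIO(sample), sep=";", dtype=str, index_col=False)

# Remove leading/trailing whitespace from every column label.
df.columns = df.columns.str.strip()

print(df.columns.tolist())
```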