How to read data that has been split into multiple columns?

Time:12-11

I have the following dataframe:

q 
  1 0.83         97 0.7         193 0.238782    289 0.129692    385 0.090692
  2 0.75         98 0.7         194 0.238782    290 0.129692    386 0.090692
  ...
 96 0.94693     192 0.299753    288 0.145046    384 0.0965338   480 0.0823061

The data comes from another source, where what was originally a single long column was split into several side-by-side blocks. Despite the many columns, there are really only two variables: an index and the value 'q'. Each block is an (index, q) pair of columns, and the first index of each block continues from the last index of the previous block (1 to 96, then 97 to 192, and so on).

How can I read this data with pandas? I could assign names to each column and then merge them all together, but I am looking for a more elegant solution, especially since the number of columns is not fixed.

This is the code that I am using at the moment:

q_param = pd.read_csv('Initial_solutions/initial_q_20y.dat', delim_whitespace=True)

This does not give the layout I need. I would prefer a pandas solution, but I can also work without pandas.

EDIT:

At the request of @user17242583, the following command:

print(q_param.head().to_dict())

Gives this output:

{'q': {(1, 0.83, 97, 0.7, 193, 0.238782, 289, 0.129692, 385): 0.090692, (2, 0.75, 98, 0.7, 194, 0.238782, 290, 0.129692, 386): 0.090692, (3, 0.64, 99, 0.64, 195, 0.238782, 291, 0.129692, 387): 0.090692, (4, 0.7, 100, 0.7, 196, 0.238782, 292, 0.129692, 388): 0.0884839, (5, 0.64, 101, 0.64, 197, 0.238782, 293, 0.129692, 389): 0.090692}}
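The tuple keys in that output show what went wrong: the header line has only one name ('q') while every data row has ten fields, so pandas treats the first nine fields as a (Multi)Index and keeps only the last field as column 'q'. A minimal sketch reproducing this on two rows copied from the question (using `sep=r"\s+"`, which is equivalent to `delim_whitespace=True`; the in-memory `io.StringIO` sample stands in for the real file):

```python
import io

import pandas as pd

# Two rows copied from the question's file: a one-name header ('q')
# followed by rows of ten whitespace-separated fields.
sample = io.StringIO(
    "q\n"
    "  1 0.83   97 0.7   193 0.238782  289 0.129692  385 0.090692\n"
    "  2 0.75   98 0.7   194 0.238782  290 0.129692  386 0.090692\n"
)

# The header has fewer names than the rows have fields, so pandas uses the
# leading nine fields as a MultiIndex and only the last field becomes 'q'.
q_param = pd.read_csv(sample, sep=r"\s+")

print(q_param.index.nlevels)  # 9
print(list(q_param.columns))  # ['q']
```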

Answer:

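Read the file without a header (skipping the lone 'q' line) so that all ten fields become columns, then stack the even-position columns (indices) and the odd-position columns (values):

import pandas as pd

# header=None + skiprows=1: drop the 'q' line so every field is a data column
q = pd.read_csv('Initial_solutions/initial_q_20y.dat', sep=r'\s+', header=None, skiprows=1)
# Columns 0, 2, 4, ... hold the indices; columns 1, 3, 5, ... hold the q values
data = {
    0: pd.concat(q[c] for c in q.columns[0::2]).reset_index(drop=True),
    1: pd.concat(q[c] for c in q.columns[1::2]).reset_index(drop=True),
}
df = pd.DataFrame(data)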

Output (for the five sample rows shown in the question's edit):

>>> df
      0         1
0     1  0.830000
1     2  0.750000
2     3  0.640000
3     4  0.700000
4     5  0.640000
5    97  0.700000
6    98  0.700000
7    99  0.640000
8   100  0.700000
9   101  0.640000
10  193  0.238782
11  194  0.238782
12  195  0.238782
13  196  0.238782
14  197  0.238782
15  289  0.129692
16  290  0.129692
17  291  0.129692
18  292  0.129692
19  293  0.129692
20  385  0.090692
21  386  0.090692
22  387  0.090692
23  388  0.088484
24  389  0.090692
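A more compact variant, sketched under the assumption that every data row has the same even number of fields (no shorter final block): flatten the table row by row into (index, value) pairs and sort by index. The helper name `read_split_columns` and the small sample below are hypothetical, made up for illustration.

```python
import io

import pandas as pd


def read_split_columns(source):
    """Hypothetical helper: rebuild one (index -> q) Series from split blocks.

    Assumes a single header line (e.g. 'q') that is skipped, and that every
    data row has the same even number of fields: index, value, index, value, ...
    """
    wide = pd.read_csv(source, sep=r"\s+", header=None, skiprows=1)
    # Row-major reshape turns each row 'i v i v ...' into consecutive (i, v) pairs.
    pairs = wide.to_numpy(dtype=float).reshape(-1, 2)
    q = pd.Series(pairs[:, 1], index=pairs[:, 0].astype(int), name="q")
    # The pairs come out interleaved across blocks (1, 4, 7, 2, 5, ...),
    # so sorting by index restores the original single-column order.
    return q.sort_index()


# Tiny made-up example: three blocks of three rows each.
sample = io.StringIO(
    "q\n"
    " 1 0.1  4 0.4  7 0.7\n"
    " 2 0.2  5 0.5  8 0.8\n"
    " 3 0.3  6 0.6  9 0.9\n"
)
q = read_split_columns(sample)
```

With the real file this would be called as `read_split_columns('Initial_solutions/initial_q_20y.dat')`. It works for any number of column blocks because the reshape only depends on the fields coming in (index, value) pairs; the trade-off compared with the column-slicing answer is that it relies on sorting, and a shorter last block (NaN padding) would break the `astype(int)` step.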

Answer:

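Most of your data ended up in the index. Because the header line has only one name ('q') while each row has ten fields, pandas uses the first nine fields as a MultiIndex and keeps only the last field as column 'q'. Move the index levels back into ordinary columns (named level_0 to level_8) with:

df = q_param.reset_index()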
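`reset_index()` alone still leaves a wide table with ten columns. A minimal sketch of one way to finish the job from there, assuming the same file layout as in the question: relabel the columns by position and stack the (index, value) pairs. Two rows copied from the question stand in for the real file, and the output column names `idx` and `q` are illustrative choices.

```python
import io

import pandas as pd

# Two rows copied from the question, read the same way as in the question
# (one-name header, so the first nine fields become a MultiIndex).
sample = io.StringIO(
    "q\n"
    "  1 0.83   97 0.7   193 0.238782  289 0.129692  385 0.090692\n"
    "  2 0.75   98 0.7   194 0.238782  290 0.129692  386 0.090692\n"
)
q_param = pd.read_csv(sample, sep=r"\s+")

# reset_index() turns the nine index levels into columns level_0 ... level_8,
# followed by 'q'; relabel all ten columns by position 0..9.
flat = q_param.reset_index()
flat.columns = range(flat.shape[1])

# Even positions hold indices, odd positions hold values: stack each kind.
df = pd.DataFrame({
    "idx": pd.concat([flat[c] for c in flat.columns[0::2]], ignore_index=True),
    "q": pd.concat([flat[c] for c in flat.columns[1::2]], ignore_index=True),
}).sort_values("idx", ignore_index=True)
```

Because the column slices use `flat.columns[0::2]` and `flat.columns[1::2]`, the same code handles any number of column blocks.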