I have a NumPy array of the form (3*1):
['1 2 3', '4 5 6 7 8 9', '10 11 12 13'].
Each row is a collection of feature values and each row has a different number of features. How can I split this array into multiple columns with each column representing a feature for all rows?
Update: I am now using on DataFrames. So, I have a data frame with one column. Want to expand it and also convert it to integers. How can I do that?
Expectation: I want to split data in each row into multiple columns (features). Then, each column to numerics.
CodePudding user response:
You can use expand=True
in str.split
like below:
df = pd.DataFrame({'features':['191 367 614 634 711',
'1202 1220 131 1730 2281 2572 2602 2611 2824',
'2855 2940 3149 3313 3560 3568 3824 4185 4266']})
df.features.str.split(expand=True).astype(float).add_prefix('feature_')
Output:
feature_0 feature_1 feature_2 ... feature_6 feature_7 feature_8
0 191.0 367.0 614.0 ... NaN NaN NaN
1 1202.0 1220.0 131.0 ... 2602.0 2611.0 2824.0
2 2855.0 2940.0 3149.0 ... 3824.0 4185.0 4266.0
CodePudding user response:
Since numpy doesn't support non-rectangular arrays, it would be easiest to use pandas for this:
df = pd.concat([pd.Series(l) for l in pd.Series(a).str.split(' ').tolist()], axis=1).T.add_prefix('feature_')
Output:
>>> df
feature_0 feature_1 feature_2 feature_3 feature_4 feature_5
0 1 2 3 NaN NaN NaN
1 4 5 6 7 8 9
2 10 11 12 13 NaN NaN
CodePudding user response:
You show a list (but the following also works if it's an array of strings):
In [44]: alist = ['1 2 3', '4 5 6 7 8 9', '10 11 12 13']
In [45]: alist
Out[45]: ['1 2 3', '4 5 6 7 8 9', '10 11 12 13']
Split each string into a list:
In [46]: blist = [s.split() for s in alist]
In [47]: blist
Out[47]: [['1', '2', '3'], ['4', '5', '6', '7', '8', '9'], ['10', '11', '12', '13']]
convert each number string to int:
In [48]: clist = [[int(x) for x in s] for s in blist]
In [49]: clist
Out[49]: [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10, 11, 12, 13]]
Since the sublists differ in length, I'm going to refrain from processing this as arrays. If you specify how such a list of lists maps on to your notion of features, we can proceed.