I have a DataFrame that looks as follows. There are two columns, the second of which contains numpy arrays that differ in shape (here: (2, 1), (2, 2), (2, 3)). Example:
   class  data
0      0  [[3], [17]]
1      1  [[9, 5], [8, 19]]
2      1  [[8, 16, 13], [17, 19, 10]]
I would now like to flatten the data column to get a 1D array [3, 17, 9, 5, 8, 19, 8, 16, 13, 17, 19, 10], apply a function to this vector, and restore the original shape of the DataFrame. For example, if I want to subtract the mean of the vector from all elements, the desired output is this:
   class  data
0      0  [[-9], [5]]
1      1  [[-3, -7], [-4, 7]]
2      1  [[-4, 4, 1], [5, 7, -2]]
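As a quick arithmetic check of this example: the twelve values sum to 144, so the mean is 12, and subtracting 12 from each element gives the desired output (for instance 3 - 12 = -9 and 17 - 12 = 5).

```latex
\bar{x} = \frac{3 + 17 + 9 + 5 + 8 + 19 + 8 + 16 + 13 + 17 + 19 + 10}{12} = \frac{144}{12} = 12
```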
How can I best achieve this transformation?
Edit for @mozway:
I generated the DataFrame like this:
import numpy as np
import pandas as pd

data = []
np.random.seed(8)
for i in range(1, 4):
    data.append(np.random.randint(0, 20, (2, i)))  # shapes (2, 1), (2, 2), (2, 3)
category = {"class": [0, 1, 1]}
df = pd.DataFrame(category)
df["data"] = data
A function to transform the 1D array mentioned before would be arr -= np.mean(arr).
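One caveat about that transform: the example arrays hold integers, and subtracting a floating-point mean in place from an integer array raises a NumPy casting error (UFuncTypeError, a subclass of TypeError), because the float result cannot be stored back into the integer buffer under the default same_kind casting rule. A minimal sketch, using the flattened example values as an assumed input:

```python
import numpy as np

arr = np.array([3, 17, 9, 5, 8, 19, 8, 16, 13, 17, 19, 10])  # integer dtype

raised = False
try:
    # np.mean returns a float64 scalar; writing float results into an
    # int array in place is rejected by NumPy's casting check.
    arr -= np.mean(arr)
except TypeError:  # NumPy's UFuncTypeError subclasses TypeError
    raised = True
print("in-place subtraction on an int array raised:", raised)

# Converting to a float array first lets the in-place version work.
arr = arr.astype(np.float64)
arr -= np.mean(arr)
print(arr)
```

Converting to float once, before the transform, is usually simpler than rewriting the function as arr = arr - np.mean(arr).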
Answer:
Assuming flat is the already transformed 1D array below (the values after subtracting the mean):
[-9. 5. -3. -7. -4. 7. -4. 4. 1. 5. 7. -2.]
One approach could be the following:
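from collections.abc import Iterable

import numpy as np

def nested_unflatten(da, placeholder):
    # Walk the nested structure of `placeholder` and replace every scalar
    # leaf with the next value drawn from the iterator `da`.
    res = []
    for e in placeholder:
        if isinstance(e, Iterable):
            res.append(nested_unflatten(da, e))
        else:
            res.append(next(da))
    return res

flat = np.array([-9., 5., -3., -7., -4., 7., -4., 4., 1., 5., 7., -2.])
un_flat = nested_unflatten(iter(flat), df["data"])
print(un_flat)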
Output (as printed by NumPy < 2; NumPy 2 shows each element as np.float64(...)):
[[[-9.0], [5.0]], [[-3.0, -7.0], [-4.0, 7.0]], [[-4.0, 4.0, 1.0], [5.0, 7.0, -2.0]]]
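The result above is a list of nested lists; to restore the DataFrame as the question asks, each nested list can be turned back into an ndarray and written to the data column. A minimal sketch: it repeats the nested_unflatten helper so the snippet runs on its own, and it rebuilds the example DataFrame from the literal values shown in the question (an assumption, since the question generates it randomly). The object array is filled element by element so that pandas stores one 2D array per row.

```python
from collections.abc import Iterable

import numpy as np
import pandas as pd

def nested_unflatten(da, placeholder):
    # Replace each scalar leaf of `placeholder` with the next value of `da`.
    res = []
    for e in placeholder:
        if isinstance(e, Iterable):
            res.append(nested_unflatten(da, e))
        else:
            res.append(next(da))
    return res

def to_object_array(items):
    # Build a 1D object array holding one element per item.
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out

arrays = [
    np.array([[3], [17]]),
    np.array([[9, 5], [8, 19]]),
    np.array([[8, 16, 13], [17, 19, 10]]),
]
df = pd.DataFrame({"class": [0, 1, 1], "data": to_object_array(arrays)})

flat = np.array([-9., 5., -3., -7., -4., 7., -4., 4., 1., 5., 7., -2.])
un_flat = nested_unflatten(iter(flat), df["data"])

# np.array turns each nested list back into an array; the nesting depth
# and lengths reproduce the original (2, k) shapes.
df["data"] = to_object_array([np.array(nested) for nested in un_flat])
print(df)
```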
If you are also interested in a flatten function:
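from collections.abc import Iterable

def flatten(da):
    # Recursively collect every scalar leaf of `da` into one flat list,
    # in the same order that nested_unflatten consumes them.
    res = []
    for e in da:
        if isinstance(e, Iterable):
            res.extend(flatten(e))
        else:
            res.append(e)
    return res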
It can be used to obtain flat from the example, like:
flat = np.array(flatten(df["data"]), dtype=np.float64)  # float dtype so the in-place subtraction works
flat -= np.mean(flat)
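A different technique, not the one used in this answer, avoids the recursive helpers entirely: concatenate the raveled arrays with np.concatenate, apply the function to the 1D result, then cut it back with np.split at the cumulative element counts and reshape each piece. A minimal sketch; the helper name transform_flat is hypothetical, and func is assumed to return a new array rather than modify its argument in place.

```python
import numpy as np
import pandas as pd

def transform_flat(series, func):
    """Hypothetical helper: flatten all arrays in `series`, apply `func`
    to the combined 1D vector, and restore the original shapes."""
    arrays = list(series)
    shapes = [a.shape for a in arrays]
    sizes = [a.size for a in arrays]
    # One float vector holding every element in row-major (C) order.
    flat = np.concatenate([a.ravel() for a in arrays]).astype(np.float64)
    out = func(flat)
    # Split points are the running totals of the sizes, minus the last one.
    pieces = np.split(out, np.cumsum(sizes)[:-1])
    result = np.empty(len(arrays), dtype=object)
    for i, (piece, shape) in enumerate(zip(pieces, shapes)):
        result[i] = piece.reshape(shape)
    return pd.Series(result, index=series.index)

# Example DataFrame rebuilt from the literal values in the question.
arrays = [
    np.array([[3], [17]]),
    np.array([[9, 5], [8, 19]]),
    np.array([[8, 16, 13], [17, 19, 10]]),
]
data = np.empty(len(arrays), dtype=object)
for i, a in enumerate(arrays):
    data[i] = a
df = pd.DataFrame({"class": [0, 1, 1], "data": data})

df["data"] = transform_flat(df["data"], lambda v: v - v.mean())
print(df)
```

Because ravel and reshape both use row-major order, the element order in flat matches the order produced by the recursive flatten above, so either approach feeds the same vector to the function.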