I have DataFrame containing values about shops and categories in one column.
Date | Spent | ... | Category/Shop |
---|---|---|---|
2022-08-04 | 126.98 | ... | Supermarkets |
2022-08-04 | NaN | ... | ShopName |
2022-08-04 | 119.70 | ... | Supermarkets |
2022-08-04 | NaN | ... | ShopName |
...
I need to separate last column into to columns:
Date | Spent | ... | Category | Shop |
---|---|---|---|---|
2022-08-04 | 126.98 | ... | Supermarkets | ShopName |
2022-08-04 | 119.70 | ... | Supermarkets | ShopName |
How can this be done?
We can assume that every second row in the Category/Shop column contains the name of the store that needs to be moved to a new column.
CodePudding user response:
Based on the sample and expecting future similar behavior I would do it with groupby
df = df.fillna(method='ffill').groupby(['Date','Spent'])['Category/Shop'].apply(list).reset_index()
df['Category'],df['Shop'] = df['Category/Shop'].str[0],df['Category/Shop'].str[1]
df = df.drop(columns='Category/Shop')
Outputting:
Date Spent Category Shop
0 2022-08-04 119.70 Supermarkets ShopName
1 2022-08-04 126.98 Supermarkets ShopName
CodePudding user response:
Apply the pandas series str. split() function on the “Address” column and pass the delimiter (comma in this case) on which you want to split the column. Also, make sure to pass True to the expand parameter.
CodePudding user response:
I would go for iloc
to retrieve every second row, and build a new dataframe with pd.concat
.
Given df
is your source DataFrame, it would look like this:
pd.concat(
[
# start at the first row and use every 2nd row of df
df.iloc[::2].reset_index(drop=True),
# start at the second row and use every 2nd row of df, but only the last column
df.iloc[1::2]["category/shop"].reset_index(drop=True)
],
# concatenate along columns
axis=1
)
Output:
date spent category/shop category/shop
0 2022-08-04 126.98 Supermarkets ShopName
1 2022-08-04 119.70 Supermarkets ShopName