I have following csv
Name Date Qty Date Qty Date Qty
---------------------------------------------------
ABC Jan 2023 10 Feb 2023 11 Mar 2023 12
XYZ Jan 2023 20 Feb 2023 21 Mar 2023 22
I want output as follows in csv/dataframe
Name Date Qty
---------------------
ABC Jan 2023 10
ABC Feb 2023 11
ABC Mar 2023 12
XYZ Jan 2023 20
XYZ Feb 2023 21
XYZ Mar 2023 22
How I achieve this result?
CodePudding user response:
A bit complicated but does the job. You can execute step by step to view the transformation:
>>> (df.melt('Name').assign(row=lambda x: x.groupby('variable').cumcount())
.pivot(['row', 'Name'], 'variable', 'value')
.reset_index('Name').rename_axis(index=None, columns=None))
Name Date Qty
0 ABC Jan 2023 10
1 XYZ Jan 2023 20
2 ABC Feb 2023 11
3 XYZ Feb 2023 21
4 ABC Mar 2023 12
5 XYZ Mar 2023 22
CodePudding user response:
Less streamlined solution compared to @Corralien's. Also uses melt and pivot.
import pandas as pd
import io
#-----------------------------------------------#
#Recreate OP's table with duplicate column names#
#-----------------------------------------------#
df = pd.read_csv(io.StringIO("""
ABC Jan-2023 10 Feb-2023 11 Mar-2023 12
XYZ Jan-2023 20 Feb-2023 21 Mar-2023 22
"""),header=None,delim_whitespace=True)
df.columns = ['Name','Date','Qty','Date','Qty','Date','Qty']
#-----------------#
#Start of solution#
#-----------------#
#melt from wide to long (maintains order)
melted_df = df.melt(
id_vars='Name',
var_name='col',
value_name='val',
)
#add a number for Date1/Date2/Date3 to keep track of Qty1/Qty2/Qty3 etc
melted_df['col_number'] = melted_df.groupby(['Name','col']).cumcount()
#pivot back to wide form
wide_df = melted_df.pivot(
index=['Name','col_number'],
columns='col',
values='val',
).reset_index().drop(columns=['col_number'])
wide_df.columns.name = None #remove column index name
#Final output
print(wide_df)
Output
Name Date Qty
0 ABC Jan-2023 10
1 ABC Feb-2023 11
2 ABC Mar-2023 12
3 XYZ Jan-2023 20
4 XYZ Feb-2023 21
5 XYZ Mar-2023 22
CodePudding user response:
import pandas as pd
# create a dataframe from your original csv
df = pd.read_csv("original.csv")
# use pandas' melt function to transform the dataframe
df = df.melt(id_vars=["Name"], var_name="Date", value_name="Qty")
# split the "Date" column into two separate columns for "Date" and "Qty"
df[['Date', 'Qty']] = df['Date'].str.split(expand=True)
# change the "Date" and "Qty" columns to appropriate data types
df["Date"] = pd.to_datetime(df["Date"])
df["Qty"] = df["Qty"].astype(int)
# sort the values by Name and Date
df = df.sort_values(by=["Name", "Date"])
# write the transformed dataframe to a new csv
df.to_csv("transformed.csv", index=False)