I have a sample similar to the problem I am running into. Here, I have company name and revenue for three years, with the revenue split across three different datasets. When I concatenate the data, it looks as follows:
company_name 2020 Revenue 2021 Revenue 2022 Revenue
0 company_1 10.0 NaN NaN
1 company_2 20.0 NaN NaN
2 company_3 30.0 NaN NaN
3 company_1 NaN 20.0 NaN
4 company_2 NaN 30.0 NaN
5 company_3 NaN 40.0 NaN
6 company_1 NaN NaN 50.0
7 company_2 NaN NaN 60.0
8 company_3 NaN NaN 70.0
9 company_4 NaN NaN 80.0
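For reference, the frame above can be reproduced with a sketch like this (the three per-year DataFrames are hypothetical reconstructions of the datasets described, not the actual source data):

```python
import pandas as pd

# Assumed shape of the three yearly datasets, one revenue column each
df1 = pd.DataFrame({'company_name': ['company_1', 'company_2', 'company_3'],
                    '2020 Revenue': [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({'company_name': ['company_1', 'company_2', 'company_3'],
                    '2021 Revenue': [20.0, 30.0, 40.0]})
df3 = pd.DataFrame({'company_name': ['company_1', 'company_2', 'company_3', 'company_4'],
                    '2022 Revenue': [50.0, 60.0, 70.0, 80.0]})

# concat stacks the rows, so each year's revenue lands in its own
# block of rows with NaN in the other years' columns
df = pd.concat([df1, df2, df3], ignore_index=True)
print(df)
```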
What I am trying to do is have each company name followed by its actual revenue values; in a sense, drop the duplicate company_name rows and fold that data into the corresponding company_name row. My desired output:
company_name 2020 Revenue 2021 Revenue 2022 Revenue
0 company_1 10 20 50
1 company_2 20 30 60
2 company_3 30 40 70
3 company_4 0 0 80
CodePudding user response:
Use melt and pivot_table:
out = (df.melt('company_name').dropna()
.pivot_table('value', 'company_name', 'variable', fill_value=0)
.rename_axis(columns=None).reset_index())
print(out)
# Output
company_name 2020 Revenue 2021 Revenue 2022 Revenue
0 company_1 10 20 50
1 company_2 20 30 60
2 company_3 30 40 70
3 company_4 0 0 80
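To see why this works, it may help to look at the intermediate long-format frame that melt produces before pivot_table reshapes it back to wide. A minimal sketch, assuming df is the concatenated frame from the question:

```python
import pandas as pd

# Reconstruction of the concatenated frame from the question
df = pd.DataFrame({
    'company_name': ['company_1', 'company_2', 'company_3'] * 3 + ['company_4'],
    '2020 Revenue': [10.0, 20.0, 30.0] + [None] * 7,
    '2021 Revenue': [None] * 3 + [20.0, 30.0, 40.0] + [None] * 4,
    '2022 Revenue': [None] * 6 + [50.0, 60.0, 70.0, 80.0],
})

# melt turns the three revenue columns into (variable, value) pairs;
# dropna then keeps only the rows that actually carry a number,
# leaving one (company, year, revenue) row per data point
melted = df.melt('company_name').dropna()
print(melted)
```

pivot_table then uses 'company_name' as the index and 'variable' as the columns, and fill_value=0 supplies the zeros for the year/company combinations that never appear (company_4 in 2020 and 2021).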
CodePudding user response:
You can try:
df.set_index('company_name').stack().unstack().reset_index()
Or
df.groupby('company_name', as_index=False).first()
Output:
company_name 2020 Revenue 2021 Revenue 2022 Revenue
0 company_1 10.0 20.0 50.0
1 company_2 20.0 30.0 60.0
2 company_3 30.0 40.0 70.0
3 company_4 NaN NaN 80.0
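Note that groupby().first() skips NaN within each group but still leaves NaN where a company has no data for a year at all (company_4 in 2020 and 2021). To match the 0-filled desired output exactly, chain a fillna(0). A self-contained sketch, assuming df is the concatenated frame from the question:

```python
import pandas as pd

# Reconstruction of the concatenated frame from the question
df = pd.DataFrame({
    'company_name': ['company_1', 'company_2', 'company_3'] * 3 + ['company_4'],
    '2020 Revenue': [10.0, 20.0, 30.0] + [None] * 7,
    '2021 Revenue': [None] * 3 + [20.0, 30.0, 40.0] + [None] * 4,
    '2022 Revenue': [None] * 6 + [50.0, 60.0, 70.0, 80.0],
})

# first() takes the first non-NaN value per company and column;
# fillna(0) replaces the remaining all-NaN gaps with 0
out = df.groupby('company_name', as_index=False).first().fillna(0)
print(out)
```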
CodePudding user response:
I would say concat might not be the join you should be using; instead try a merge: df_merge = pd.merge(df1, df2, how='outer', on='company_name'). (An outer join keeps companies that appear in only some of the datasets, such as company_4 here; an inner join would drop them.)
Then you can do the same again with df_merge (your newly merged data) and the next dataframe. This keeps the rows aligned with each other and only adds the columns the frames do not share. If the dataframes contain more than the two columns you are looking at, you may need a little more cleaning to get only the results you want, but this should for the most part get you started with your data in the correct place.
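The repeated pairwise merge described above can be written compactly with functools.reduce. A sketch under the same assumed three-frame setup as the question (df1, df2, df3 are hypothetical reconstructions of the yearly datasets):

```python
from functools import reduce

import pandas as pd

# Assumed yearly datasets, one revenue column each
df1 = pd.DataFrame({'company_name': ['company_1', 'company_2', 'company_3'],
                    '2020 Revenue': [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({'company_name': ['company_1', 'company_2', 'company_3'],
                    '2021 Revenue': [20.0, 30.0, 40.0]})
df3 = pd.DataFrame({'company_name': ['company_1', 'company_2', 'company_3', 'company_4'],
                    '2022 Revenue': [50.0, 60.0, 70.0, 80.0]})

# Fold the list of frames into one wide frame, merging pairwise on
# company_name; outer join keeps company_4, which only exists in df3
merged = reduce(
    lambda left, right: pd.merge(left, right, on='company_name', how='outer'),
    [df1, df2, df3],
).fillna(0)
print(merged)
```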