I have a DataFrame with non-aligned columns of different lengths. How do I sort the DataFrame so that it starts at a year for which all countries have values? Furthermore, some of the year data contains noise, e.g. the USA entry for 1858 (1858[a]).
Year  Value  Country  Year  Value  Country  Year     Value  Country
1900  1000   France   1920  1250   Germany  1855     872    USA
1901  1010   France   1921  1255   Germany  1856     870    USA
1902  1014   France   1922  1258   Germany  1857     885    USA
1903  1020   France   1923  1278   Germany  1858[a]  895    USA
                                            2021     2680   USA
The columns that are shorter than the USA columns are blank at the end of the DataFrame.
Answer:
I hope this is what you need:
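import pandas as pd

# df is your original DataFrame (adapt the name); a boolean mask also works when
# the headers are literally repeated ('Year', 'Year', 'Year')
years = df.loc[:, df.columns.str.contains('Year')]
values = df.loc[:, df.columns.str.contains('Value')]
countries = df.loc[:, df.columns.str.contains('Country')]
# ravel(order='F') flattens column by column, so each country's block stays together
new_df = pd.DataFrame({
    'Year': years.to_numpy().ravel(order='F'),
    'Value': values.to_numpy().ravel(order='F'),
    'Country': countries.to_numpy().ravel(order='F'),
}).dropna()
# keep only the 4-digit year so noise such as '1858[a]' does not break sorting
new_df['Year'] = new_df['Year'].astype(str).str.extract(r'(\d{4})', expand=False).astype(int)
new_df = new_df.sort_values(by='Year')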
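Once the data is in long form (Year, Value, Country) with clean integer years, one way to start at the first year for which every country has a value is to pivot to one column per country and take the earliest complete row. This is a minimal sketch; the small frame below is hypothetical illustration data chosen so the countries overlap, and the names wide, start_year and aligned are just for illustration:

```python
import pandas as pd

# Hypothetical long-format data in the same shape as new_df (Year, Value, Country)
new_df = pd.DataFrame({
    'Year': [1855, 1856, 1857, 1858, 1856, 1857, 1857, 1858],
    'Value': [872, 870, 885, 895, 1000, 1010, 1250, 1255],
    'Country': ['USA'] * 4 + ['France'] * 2 + ['Germany'] * 2,
})

# One row per year, one column per country; missing combinations become NaN
wide = new_df.pivot(index='Year', columns='Country', values='Value')

# Earliest year in which no country is missing a value
start_year = wide.dropna().index.min()

# Keep every year from that point onwards (later gaps stay as NaN)
aligned = wide.loc[start_year:]
print(aligned)
```

Calling wide.dropna() on its own would instead keep only the complete years, dropping rows such as 1858 where France is blank. If no year is complete, start_year is NaN, so it is worth checking for that on real data.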
Answer:
You could use pd.concat. This is specific to this layout, though, and requires you to know the structure of your columns.
# df data
>>> df
Year Value Country Year Value Country Year Value Country
0 1900 1000 France 1920.0 1250.0 Germany 1855 872.0 USA
1 1901 1010 France 1921.0 1255.0 Germany 1856 870.0 USA
2 1902 1014 France 1922.0 1258.0 Germany 1857 885.0 USA
3 1903 1020 France 1923.0 1278.0 Germany 1858[a] 895.0 USA
4 2021 2680 USA NaN NaN None None NaN None
>>> new = pd.concat([df.iloc[:, :3], df.iloc[:, 3:6], df.iloc[:, 6:9]])
>>> new
Year Value Country
0 1900 1000.0 France
1 1901 1010.0 France
2 1902 1014.0 France
3 1903 1020.0 France
4 2021 2680.0 USA
0 1920.0 1250.0 Germany
1 1921.0 1255.0 Germany
2 1922.0 1258.0 Germany
3 1923.0 1278.0 Germany
4 NaN NaN None
0 1855 872.0 USA
1 1856 870.0 USA
2 1857 885.0 USA
3 1858[a] 895.0 USA
4 None NaN None
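The three iloc slices above are hard-coded. Assuming the frame is always made of consecutive Year, Value, Country blocks, a sketch that builds the slices in a loop, drops the NaN padding rows and strips footnote markers like [a] from the years could look like this. The frame is rebuilt from the question's data so the block runs on its own, with 2021 placed under the USA columns; the name long_df is only for illustration:

```python
import pandas as pd

# The question's layout: repeated Year/Value/Country headers,
# shorter blocks padded with missing values at the end
df = pd.DataFrame(
    [
        [1900, 1000, 'France', 1920, 1250, 'Germany', '1855', 872, 'USA'],
        [1901, 1010, 'France', 1921, 1255, 'Germany', '1856', 870, 'USA'],
        [1902, 1014, 'France', 1922, 1258, 'Germany', '1857', 885, 'USA'],
        [1903, 1020, 'France', 1923, 1278, 'Germany', '1858[a]', 895, 'USA'],
        [None, None, None, None, None, None, '2021', 2680, 'USA'],
    ],
    columns=['Year', 'Value', 'Country'] * 3,
)

# Assumes every block is ordered Year, Value, Country; step through them 3 at a time
blocks = [df.iloc[:, i:i + 3] for i in range(0, df.shape[1], 3)]

# Stack the blocks and drop the padding rows of the shorter countries
long_df = pd.concat(blocks, ignore_index=True).dropna()

# Keep only the 4-digit year, removing noise such as '[a]'
long_df = long_df.assign(
    Year=long_df['Year'].astype(str).str.extract(r'(\d{4})', expand=False).astype(int)
)
long_df = long_df.sort_values(['Country', 'Year'], ignore_index=True)
print(long_df)
```

Because pd.concat aligns the blocks by column name, each slice only needs its own three distinct headers; the duplicates across blocks do not matter.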