Sorting misaligned DataFrame by certain columns

Time:12-22

I have a DataFrame whose column groups are not aligned: each country's Year/Value/Country block has a different length and starts at a different year. How do I sort the DataFrame so that it starts at the first year for which every country has a value? In addition, some of the year data contains noise, e.g. 1858[a] for the USA.

Year Value Country Year Value Country Year   Value Country
1900 1000  France  1920 1250  Germany 1855    872   USA
1901 1010  France  1921 1255  Germany 1856    870   USA
1902 1014  France  1922 1258  Germany 1857    885   USA
1903 1020  France  1923 1278  Germany 1858[a] 895   USA

                                      2021    2680  USA

The values are blank at the end of the DataFrame for the column groups that are shorter than the USA's.

CodePudding user response:

I hope this is what you need:

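import pandas as pd
# Assumes your DataFrame is named df; adapt the name as needed.
# Boolean masks pick up every column that shares a (repeated) name.
columns_year = df.columns.str.contains('Year')
columns_value = df.columns.str.contains('Value')
columns_country = df.columns.str.contains('Country')
new_df = pd.DataFrame()
# Flatten column by column (order='F') so each country's rows stay together.
new_df['Year'] = df.loc[:, columns_year].to_numpy().ravel(order='F')
new_df['Value'] = df.loc[:, columns_value].to_numpy().ravel(order='F')
new_df['Country'] = df.loc[:, columns_country].to_numpy().ravel(order='F')
# Strip noise such as "1858[a]" so the years compare as numbers when sorting.
new_df['Year'] = pd.to_numeric(new_df['Year'].astype(str).str.extract(r'(\d{4})', expand=False))
# sort_values returns a new DataFrame, so keep the result.
new_df = new_df.sort_values(by=['Year'])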

CodePudding user response:

You could use pd.concat. This is specific to this layout, though, and requires you to know the structure of your columns (here, three columns per country).

# df data
>>> df
   Year  Value Country    Year   Value  Country     Year  Value Country
0  1900   1000  France  1920.0  1250.0  Germany     1855  872.0     USA
1  1901   1010  France  1921.0  1255.0  Germany     1856  870.0     USA
2  1902   1014  France  1922.0  1258.0  Germany     1857  885.0     USA
3  1903   1020  France  1923.0  1278.0  Germany  1858[a]  895.0     USA
4  2021   2680     USA     NaN     NaN     None     None    NaN    None

new = pd.concat([df.iloc[:, :3], df.iloc[:, 3:6], df.iloc[:, 6:9]])
>>> new
     Year     Value   Country
0    1900     1000.0  France
1    1901     1010.0  France
2    1902     1014.0  France
3    1903     1020.0  France
4    2021     2680.0  USA
0    1920.0   1250.0  Germany
1    1921.0   1255.0  Germany
2    1922.0   1258.0  Germany
3    1923.0   1278.0  Germany
4    NaN      NaN     None
0    1855     872.0   USA
1    1856     870.0   USA
2    1857     885.0   USA
3    1858[a]  895.0   USA
4    None     NaN     None
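
To get from the stacked frame to what the question asks for (a frame that starts at the first year every country has data for, with noise such as 1858[a] removed), a rough sketch along these lines might help. The helper names stack_blocks and align_from_common_start are made up for illustration, and the toy values are invented apart from the 1858[a] pattern. It assumes three columns per country, that noise always leaves a four-digit year intact, and that each country's years are contiguous from its first year onward.

```python
import pandas as pd

# Toy wide frame shaped like the question's data. The values are invented
# for illustration; only the noisy "1858[a]" pattern comes from the question.
wide = pd.DataFrame(
    [
        [1919, 1200, "France", 1920, 1250, "Germany", 1857, 885, "USA"],
        [1920, 1210, "France", 1921, 1255, "Germany", "1858[a]", 895, "USA"],
        [1921, 1220, "France", None, None, None, 1920, 1500, "USA"],
        [None, None, None, None, None, None, 1921, 1510, "USA"],
    ],
    columns=["Year", "Value", "Country"] * 3,
)


def stack_blocks(df: pd.DataFrame, width: int = 3) -> pd.DataFrame:
    """Stack consecutive groups of `width` columns on top of each other."""
    blocks = [df.iloc[:, i:i + width] for i in range(0, df.shape[1], width)]
    return pd.concat(blocks, ignore_index=True)


def align_from_common_start(stacked: pd.DataFrame) -> pd.DataFrame:
    """Clean Year, then keep rows from the first year all countries share."""
    # Padding rows from the shorter blocks have no country; drop them.
    out = stacked.dropna(subset=["Country"]).copy()
    # Keep only the first four-digit run, so "1858[a]" and "1920.0" both work.
    digits = out["Year"].astype(str).str.extract(r"(\d{4})", expand=False)
    out["Year"] = pd.to_numeric(digits)
    out = out.dropna(subset=["Year"]).astype({"Year": int})
    # The latest of the per-country first years is the earliest year from
    # which every country has started reporting.
    start = out.groupby("Country")["Year"].min().max()
    return (
        out[out["Year"] >= start]
        .sort_values(["Year", "Country"])
        .reset_index(drop=True)
    )


result = align_from_common_start(stack_blocks(wide))
print(result)
```

With the toy data the common start year is 1920, because Germany's block starts last. If the year series can have gaps, the max-of-minimums rule is no longer enough, and you would need to check each year for all countries instead.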