Home > Software design >  How to create a dataframe from existing data frame where few column are appended as rows?
How to create a dataframe from existing data frame where few column are appended as rows?

Time:05-19

I have a data frame like below:

data = {'Age': [20, 30, 19, 21],'city1':['ny','nj','ln','tampa'],'country1':['usa','usa','usa','usa'],'city2':['london','edinburg',np.nan,'tampa'],
        'country2':['uk','uk','uk','usa'],
        'city1':['ny','london',np.nan,np.nan],'country2':['usa','uk',np.nan,np.nan]}  
df1=pd.DataFrame(data)
print(df1)
    Age city1   country1    city2   country2
0   20  ny      usa        london    usa
1   30  london  usa        edinburg  uk
2   19  NaN     usa        NaN       NaN
3   21  NaN     usa        tampa     NaN

Now I want to create a new data frame where the age column values are repeated based on the half of the number of columns leaving the age column. In above data frame leaving age column there are four column who's half is 2. So, the age column values have to be repeated twice. Once the new age column is formed then I need to append city1, country1 has a row and city2,country2 as a second row(something like shown in the excepted output). Although I was able to repeat the values as list and tried to get the values as a list from the other columns and append as rows like shown below:-

code:-
#for repeating the value.
main_list = np.repeat(df1['Age'],2)
#for getting the column values 
r=[]
for i in range(len(df1)):
    r.append(df1.iloc[:,1:3].loc[i].values.tolist())
print(r)
[['ny', 'usa'], ['london', 'usa'], [nan, 'usa'], [nan, 'usa']]

but as you see it gives only the values for city1,country1 but not for city2,country2 which is raising an error when appending the list r values as rows to the new data frame as below:-

newdata = {'Age':main_list}
res=pd.DataFrame(newdata)
print(res)
   Age
0   20
0   20
1   30
1   30
2   19
2   19
3   21
3   21

res.loc[len(res)] = r
print(res)
ValueError: cannot set a row with mismatched columns

How do I get the excepted list of values and create a data frame like below:-

Excepted output:-

r =[['ny', 'usa'], ['london', 'usa'],['london', 'usa'],['edinburg','uk'],
                     [nan, 'usa'],[nan,nan],[nan, 'usa'],['tampa',nan]]

Final data frame:-

   Age  city     country
0   20  'ny'     'usa'
0   20  'london' 'usa'
1   30  'london' 'usa'
1   30  'edinburg''uk'
2   19   NaN     'usa'
2   19   NaN      NaN
3   21   NaN     'usa'
3   21  'tampa'  NaN

CodePudding user response:

You can use wide_to_long:

(pd
 .wide_to_long(df1.reset_index(),
               stubnames=['city', 'country'], i=['index', 'Age'], j='id')
 .droplevel(-1)
 .reset_index('Age')
)

output:

       Age      city country
index                       
0       20        ny     usa
0       20    london     usa
1       30    london     usa
1       30  edinburg      uk
2       19       NaN     usa
2       19       NaN     NaN
3       21       NaN     usa
3       21     tampa     NaN

CodePudding user response:

You could also use the function pivot_longer provided by janitor:

import janitor
df1.pivot_longer('Age', names_to = ['.value', 'group'], names_pattern = '(\\D )(\\d )')

   Age group      city country
0   20     1        ny     usa
1   30     1    london     usa
2   19     1       NaN     usa
3   21     1       NaN     usa
4   20     2    london     usa
5   30     2  edinburg      uk
6   19     2       NaN     NaN
7   21     2     tampa     NaN
  • Related