I have a data frame like below:
data = {'Age': [20, 30, 19, 21],'city1':['ny','nj','ln','tampa'],'country1':['usa','usa','usa','usa'],'city2':['london','edinburg',np.nan,'tampa'],
'country2':['uk','uk','uk','usa'],
'city1':['ny','london',np.nan,np.nan],'country2':['usa','uk',np.nan,np.nan]}
df1=pd.DataFrame(data)
print(df1)
Age city1 country1 city2 country2
0 20 ny usa london usa
1 30 london usa edinburg uk
2 19 NaN usa NaN NaN
3 21 NaN usa tampa NaN
Now I want to create a new data frame where the age
column values are repeated based on the half of the number of columns leaving the age
column. In above data frame leaving age column there are four column who's half is 2. So, the age column values have to be repeated twice. Once the new age column is formed then I need to append city1,
country1
has a row and city2,country2
as a second row(something like shown in the excepted output). Although I was able to repeat the values as list and tried to get the values as a list from the other columns and append as rows like shown below:-
code:-
#for repeating the value.
main_list = np.repeat(df1['Age'],2)
#for getting the column values
r=[]
for i in range(len(df1)):
r.append(df1.iloc[:,1:3].loc[i].values.tolist())
print(r)
[['ny', 'usa'], ['london', 'usa'], [nan, 'usa'], [nan, 'usa']]
but as you see it gives only the values for city1,country1
but not for city2,country2
which is raising an error when appending the list r values as rows to the new data frame as below:-
newdata = {'Age':main_list}
res=pd.DataFrame(newdata)
print(res)
Age
0 20
0 20
1 30
1 30
2 19
2 19
3 21
3 21
res.loc[len(res)] = r
print(res)
ValueError: cannot set a row with mismatched columns
How do I get the excepted list of values and create a data frame like below:-
Excepted output:-
r =[['ny', 'usa'], ['london', 'usa'],['london', 'usa'],['edinburg','uk'],
[nan, 'usa'],[nan,nan],[nan, 'usa'],['tampa',nan]]
Final data frame:-
Age city country
0 20 'ny' 'usa'
0 20 'london' 'usa'
1 30 'london' 'usa'
1 30 'edinburg''uk'
2 19 NaN 'usa'
2 19 NaN NaN
3 21 NaN 'usa'
3 21 'tampa' NaN
CodePudding user response:
You can use wide_to_long
:
(pd
.wide_to_long(df1.reset_index(),
stubnames=['city', 'country'], i=['index', 'Age'], j='id')
.droplevel(-1)
.reset_index('Age')
)
output:
Age city country
index
0 20 ny usa
0 20 london usa
1 30 london usa
1 30 edinburg uk
2 19 NaN usa
2 19 NaN NaN
3 21 NaN usa
3 21 tampa NaN
CodePudding user response:
You could also use the function pivot_longer
provided by janitor
:
import janitor
df1.pivot_longer('Age', names_to = ['.value', 'group'], names_pattern = '(\\D )(\\d )')
Age group city country
0 20 1 ny usa
1 30 1 london usa
2 19 1 NaN usa
3 21 1 NaN usa
4 20 2 london usa
5 30 2 edinburg uk
6 19 2 NaN NaN
7 21 2 tampa NaN