I have a csv file like the list below (which has thousands of rows);
name,location,time
james,ond,5
conor,aas,2
james,jja,3
elisa,aab,1
mike,sjs,1
elisa,ond,5
elisa,mmm,2
How can I turn this into (basicly make the values in second column (location) as new columns, search in the names for each and make the names don't repeat and put 0 as values on new columns if it is not there. The desired outcome with the code is:
name,ond,aas,jja,aab,sjs,mmm
james,5,0,3,0,0,0
conor,0,2,0,0,0,0
elisa,5,0,0,1,0,2
mike,0,0,0,0,1,0
What I tried so far to do this is to load it twice as two different databases and get a value but it fails all the time giving exactly the same result before.
import pandas as pd
df1 = pd.read_csv("file.csv")
df2 = pd.read_csv("file.csv")
df1['time'] = df2['time'].where(df1[['name','location']].isin(df2).all(axis=1)).fillna('0')
or the below code to extract the values but still does not work:
df1.merge(df2, on=['name','location'], how='left').fillna(0)
CodePudding user response:
You can use pd.crosstab
:
pd.crosstab(index=df['name'], columns=df['location'], values=df['time'], aggfunc=lambda x: x).\
fillna(0).reset_index()
Output:
location name aab aas jja mmm ond sjs
0 conor 0.0 2.0 0.0 0.0 0.0 0.0
1 elisa 1.0 0.0 0.0 2.0 5.0 0.0
2 james 0.0 0.0 3.0 0.0 5.0 0.0
3 mike 0.0 0.0 0.0 0.0 0.0 1.0
Here location
is the column name, you can get rid of it via df.columns.name = None
CodePudding user response:
Maybe this one is the solution you are looking for:
Given the dataframe df
name location time
0 james ond 5
1 conor aas 2
2 james jja 3
3 elisa aab 1
4 mike sjs 1
5 elisa ond 5
6 elisa mmm 2
import pandas as pd
pivot = pd.pivot_table(df, values='time', index='name', columns='location', aggfunc='sum', fill_value=0)
Output:
location aab aas jja mmm ond sjs
name
conor 0 2 0 0 0 0
elisa 1 0 0 2 5 0
james 0 0 3 0 5 0
mike 0 0 0 0 0 1
Note that if you have different values for the same combination of name
and location
, you have to select an appropriate aggfunc
, to properly display the value you need. Ypu can also pass a dictionary
to the aggfunc value, as explained in Discussion on aggfunc