(I'm fairly new to Python and completely new to Pandas.)
I have software usage data in a tab-separated txt file like this:
IP_Addr Date Col2 Version Col4 Col5 Lang Country
160.86.229.29 2021-11-01 00:00:14.919 9.6 337722669 3 ja JPN
154.28.188.105 2021-11-01 00:00:19.774 9.7 480113424 3 de DEU
154.6.16.129 2021-11-01 00:00:52.460 9.0 3278201755 2 en USA
218.45.244.124 2021-11-01 00:01:33.853 9.7 1961440872 2 ja JPN
178.248.141.33 2021-11-01 00:01:51.114 9.5 2795265301 2 en EST
The DataFrame is imported correctly, and groupby methods like this work all right:
df.IP_Addr.groupby(df.Country).nunique()
However, when I try to create a pivot table with this line:
country_and_lang = df.pivot_table(index=df.Country, columns=df.Lang, values=df.IP_Addr, aggfunc=df.IP_Addr.count)
I get
KeyError: '160.86.229.29'
where the "key" is the first IP value - which should not be used as a key at all.
What am I doing wrong?
Answer:
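Pass the column names as strings instead of the columns themselves. When values receives a list-like such as df.IP_Addr, pandas treats each element as a column label to look up, which is why the first IP address appears in the KeyError. Likewise, aggfunc should be a function or a function name such as 'count', not a bound method like df.IP_Addr.count: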
country_and_lang = df.pivot_table(index='Country', columns='Lang',
                                  values='IP_Addr', aggfunc='count')
print(country_and_lang)
# Output
Lang      de   en   ja
Country
DEU      1.0  NaN  NaN
EST      NaN  1.0  NaN
JPN      NaN  NaN  2.0
USA      NaN  1.0  NaN
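To check this on your side, here is a minimal self-contained sketch. It rebuilds the five sample rows from the question in memory (assuming the real file is tab-separated with exactly those headers) and compares the failing call with the working one. The names raw and error_key are only illustrative.

```python
import io

import pandas as pd

# Sample rows copied from the question, joined with tabs
# (assumption: the real file uses the same header line).
raw = (
    "IP_Addr\tDate\tCol2\tVersion\tCol4\tCol5\tLang\tCountry\n"
    "160.86.229.29\t2021-11-01\t00:00:14.919\t9.6\t337722669\t3\tja\tJPN\n"
    "154.28.188.105\t2021-11-01\t00:00:19.774\t9.7\t480113424\t3\tde\tDEU\n"
    "154.6.16.129\t2021-11-01\t00:00:52.460\t9.0\t3278201755\t2\ten\tUSA\n"
    "218.45.244.124\t2021-11-01\t00:01:33.853\t9.7\t1961440872\t2\tja\tJPN\n"
    "178.248.141.33\t2021-11-01\t00:01:51.114\t9.5\t2795265301\t2\ten\tEST\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Passing the Series as values: pandas looks up each IP string as a column
# label, so this should fail with the KeyError reported in the question.
error_key = None
try:
    df.pivot_table(index="Country", columns="Lang",
                   values=df.IP_Addr, aggfunc="count")
except KeyError as exc:
    error_key = exc.args[0]
print("KeyError for:", error_key)

# Passing the column label as a string works.
country_and_lang = df.pivot_table(index="Country", columns="Lang",
                                  values="IP_Addr", aggfunc="count")
print(country_and_lang)
```

The same rule applies to index and columns: strings (or lists of strings) naming columns are the least surprising thing to pass.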
Or use pd.crosstab:
country_and_lang = pd.crosstab(df['Country'], df['Lang'],
                               values=df['IP_Addr'], aggfunc='count')
print(country_and_lang)
# Output
Lang      de   en   ja
Country
DEU      1.0  NaN  NaN
EST      NaN  1.0  NaN
JPN      NaN  NaN  2.0
USA      NaN  1.0  NaN
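Two optional variations, sketched on a cut-down copy of the sample data (only the three columns involved, typed in by hand). I'm assuming you may want zeros instead of NaN for combinations that never occur, and, since your groupby used nunique(), possibly distinct IPs rather than row counts. Neither is required for the fix above.

```python
import pandas as pd

# Hand-typed miniature of the question's data (assumption: only these
# three columns matter for the table).
df = pd.DataFrame({
    "IP_Addr": ["160.86.229.29", "154.28.188.105", "154.6.16.129",
                "218.45.244.124", "178.248.141.33"],
    "Lang": ["ja", "de", "en", "ja", "en"],
    "Country": ["JPN", "DEU", "USA", "JPN", "EST"],
})

# fill_value=0 fills the cells for Country/Lang pairs that never occur.
counts = df.pivot_table(index="Country", columns="Lang",
                        values="IP_Addr", aggfunc="count", fill_value=0)

# aggfunc="nunique" counts distinct IPs per cell instead of rows.
unique_ips = df.pivot_table(index="Country", columns="Lang",
                            values="IP_Addr", aggfunc="nunique", fill_value=0)

# The same table built with groupby, closer to the code in the question.
unique_ips_gb = (df.groupby(["Country", "Lang"])["IP_Addr"]
                   .nunique()
                   .unstack(fill_value=0))

print(counts)
print(unique_ips)
print(unique_ips_gb)
```

On this sample every IP is distinct, so all three tables hold the same numbers. They would only differ once the same IP appears more than once for a Country/Lang pair.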