argument of type "float" is not iterable when trying to use for loop

I have a countrydf as below, in which each cell in the country column contains a list of the countries where the movie was released.

countrydf

id  Country            release_year
s1  [US]                 2020
s2  [South Africa]       2021
s3  NaN                  2021
s4  NaN                  2021
s5  [India]              2021

I want to make a new df that looks like this:

country_yeardf

Year    US   UK    Japan  India 
1925    NaN  NaN   NaN    NaN
1926    NaN  NaN   NaN    NaN
1927    NaN  NaN   NaN    NaN
1928    NaN  NaN   NaN    NaN

It has the release year and the number of movies released in each country. My approach: starting from a blank df like the second one, run a for loop that counts the movies released and updates the corresponding cell.

countrylist=['Afghanistan', 'Aland Islands', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Anguilla', 'Antarctica', ….]
for x in countrylist:
    for j in  list(range(0,8807)):
        if x in countrydf.country[j]:
            t=int (countrydf.release_year[j] )
            country_yeardf.at[t, x] = country_yeardf.at[t, x] + 1

An error occurred, which read:

TypeError                                 Traceback (most recent call last)
<ipython-input-25-225281f8759a> in <module>()
      1 for x in countrylist:
      2  for j in li:
----> 3     if x in countrydf.country[j]:
      4         t=int(countrydf.release_year[j])
      5         country_yeardf.at[t, x] = country_yeardf.at[t, x] + 1

TypeError: argument of type 'float' is not iterable

I don't know which value is of float type here. I checked the type of countrydf.country[j] and it returned int. I am using pandas and am just getting started with it. Can anyone please explain the error and suggest a way to build the df I want? P.S. My English is not so good, so I hope you can understand.

CodePudding user response：

Here is a solution using groupby, shown on a simplified frame with one country per row:

import pandas as pd
df = pd.DataFrame([['US', 2015], ['India', 2015], ['US', 2015], ['Russia', 2016]], columns=['country', 'year'])

  country  year
0      US  2015
1   India  2015
2      US  2015
3  Russia  2016

Now just groupby country and year and unstack the output:

df.groupby(['year', 'country']).size().unstack()
country  India  Russia   US
year
2015       1.0     NaN  2.0
2016       NaN     1.0  NaN
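
The example above has one country per row. If the column holds lists, as in the question, one option is to flatten it first with DataFrame.explode. A minimal sketch, assuming pandas 0.25 or later and a small stand-in frame built from the question's sample rows (the names flat and counts are only illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for countrydf, built from the sample rows in the question.
countrydf = pd.DataFrame({
    'id': ['s1', 's2', 's3', 's4', 's5'],
    'Country': [['US'], ['South Africa'], np.nan, np.nan, ['India']],
    'release_year': [2020, 2021, 2021, 2021, 2021],
})

# One row per (movie, country); rows with a missing Country are dropped.
flat = countrydf.explode('Country').dropna(subset=['Country'])

# Count movies per year and country, then pivot countries into columns.
# fill_value=0 puts 0 instead of NaN where a combination never occurs.
counts = flat.groupby(['release_year', 'Country']).size().unstack(fill_value=0)
print(counts)
```

Dropping the NaN rows before grouping is a choice made here for the sketch; movies with no country listed simply do not contribute to any column.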

CodePudding user response：

Some alternative ways to achieve this in pandas without loops.

If the Country column can hold more than one value per list, you can try the following:

>>df['Country'].str.join("|").str.get_dummies().groupby(df['release_year']).sum()

              India  South Africa  US
release_year                         
2020              0             0   1
2021              1             1   0
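
To try the one-liner on rows that really do contain several countries, here is a minimal self-contained sketch. The sample rows, including the two-country row, are made up for illustration, and the names dummies and per_year are not from the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Country': [['US'], ['South Africa', 'US'], np.nan, ['India']],
    'release_year': [2020, 2021, 2021, 2021],
})

# Join each list into a single "A|B" string, then split it into one
# 0/1 indicator column per country ('|' is the default separator).
# Rows where Country is NaN stay NaN after the join and become all zeros.
dummies = df['Country'].str.join('|').str.get_dummies()

# Sum the indicators within each release year.
per_year = dummies.groupby(df['release_year']).sum()
print(per_year)
```

This relies on no country name containing the '|' character, since that character is used as the separator.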

Otherwise, if Country has just one value per list, as in your example, you can use crosstab:

>>pd.crosstab(df['release_year'],df['Country'].str[0])

Country       India  South Africa  US
release_year                         
2020              0             0   1
2021              1             1   0
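
As for the TypeError in the question: pandas stores a missing value in an object column as NaN, which is a Python float, so a test like `x in countrydf.country[j]` fails on rows such as s3 and s4, where there is no list to search. A minimal sketch of the failure and of one possible guard, assuming the same kind of data (the helper name contains_country is hypothetical):

```python
import numpy as np
import pandas as pd

country = pd.Series([['US'], np.nan, ['India']])

# The missing entry is a float, not a list.
print(type(country[1]))

# `in` needs a container on the right-hand side, so this raises TypeError.
try:
    'US' in country[1]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

def contains_country(cell, name):
    """Return True only when cell is a list that contains name."""
    return isinstance(cell, list) and name in cell

# The guard skips NaN cells instead of raising.
matches = [contains_country(cell, 'US') for cell in country]
print(matches)
```

With a guard like this the original loop would run, although the vectorized approaches above are likely to be much faster on a frame with thousands of rows.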