How to loop to consecutively go through a list of strings, assign value to each string and return it

Say instead of a dictionary I have these lists:

cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')

Europe = ('London', 'Berlin')

America = ('New York', 'Vancouver')

Asia = ('Tokyo', 'Bangkok')

I want to create a pd.DataFrame from them like this:

City	Continent
New York	America
Vancouver	America
London	Europe
Berlin	Europe
Tokyo	Asia
Bangkok	Asia

Note: this is a minimal reproducible example to keep it simple; the real dataset is more like city -> country -> continent.

I understand with such a small sample it would be possible to manually create a dictionary, but in the real example there are many more data-points. So I need to automate it.

I've tried a for loop and a while loop with conditions such as "if Europe in cities", but that doesn't do anything. I think that's because it evaluates to False, since it compares the whole list "Europe" against the whole list "cities".

Either way, my idea was that the loop would go through every city in the cities list and return (city, continent) for each one. I just don't know how to actually make that work.

I am very new and I wasn't able to figure anything out from looking at similar questions.

Thank you for any direction!

Answer:

Problem in your code:

First, look at the snippet you used: if Europe in cities: never matched anything. That is expected, because the in operator checks whether the whole Europe tuple is a single element of cities. It does not check the individual elements ('London', 'Berlin') one by one, so the condition is always False.
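A quick way to see the difference in a Python shell, using the tuples from the question:

```python
cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')

# `in` asks whether the whole Europe tuple is one element of cities
print(Europe in cities)                        # False

# Checking the elements one at a time gives the answer you were after
print([city in cities for city in Europe])     # [True, True]

# The per-city test a loop needs: is this single city in the Europe tuple?
print('London' in Europe)                      # True
```
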

Solution:

First, import pandas and recreate the sample data you provided (as lists):

# Import pandas
import pandas as pd

# Sample data
cities = ['New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok']
Europe = ['London', 'Berlin']
America = ['New York', 'Vancouver']
Asia = ['Tokyo', 'Bangkok']

Your expected output has two columns:

City: already available as the cities list.

Continent: has to be generated from the other lists (in our case Europe, America and Asia).

To generate the continent list, loop over cities and check which of those lists contains each city:

# Make the continent list
continent = []

# Look up each city in the Europe, America and Asia lists
for city in cities:
    if city in Europe:
        continent.append('Europe')
    elif city in America:
        continent.append('America')
    elif city in Asia:
        continent.append('Asia')
    else:
        # No match: append None so continent stays as long as cities
        continent.append(None)

# Print the continent list
continent

# Output of the above code:
['America', 'America', 'Europe', 'Europe', 'Asia', 'Asia']
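With many continents, the if/elif chain grows with every new group. A sketch that keeps the groups in one dictionary instead (continent_groups is a name I chose, not from the question) and records None for a city that belongs to no group, so both lists always have the same length:

```python
cities = ['New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok']
Europe = ['London', 'Berlin']
America = ['New York', 'Vancouver']
Asia = ['Tokyo', 'Bangkok']

# Hypothetical helper dict: continent name -> list of its cities
continent_groups = {'Europe': Europe, 'America': America, 'Asia': Asia}

continent = []
for city in cities:
    # First continent whose list contains the city, or None if none does
    match = next((name for name, members in continent_groups.items()
                  if city in members), None)
    continent.append(match)

print(continent)
# ['America', 'America', 'Europe', 'Europe', 'Asia', 'Asia']
```

Each lookup scans every group, which is fine for small data; for a large dataset, building a reverse city -> continent dictionary once is cheaper.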

This gives the expected continent list. Now build the pd.DataFrame() from the two lists:

# Build the DataFrame from the city and continent lists
data_df = pd.DataFrame({'City': cities, 'Continent': continent})

# Print Results
data_df

# Output of the above code:
    City        Continent
0   New York    America
1   Vancouver   America
2   London      Europe
3   Berlin      Europe
4   Tokyo       Asia
5   Bangkok     Asia
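The question mentions that the real data is city -> country -> continent. Assuming that data can be expressed as a city-to-country mapping and a country-to-continent mapping (both dictionaries below are made-up sample data), Series.map can chain the two lookups without any if/elif:

```python
import pandas as pd

# Hypothetical sample data: city -> country and country -> continent
city_to_country = {'New York': 'USA', 'Vancouver': 'Canada',
                   'London': 'UK', 'Berlin': 'Germany',
                   'Tokyo': 'Japan', 'Bangkok': 'Thailand'}
country_to_continent = {'USA': 'America', 'Canada': 'America',
                        'UK': 'Europe', 'Germany': 'Europe',
                        'Japan': 'Asia', 'Thailand': 'Asia'}

df = pd.DataFrame({'City': list(city_to_country)})
# map() looks each value up in the dict; keys it cannot find become NaN
df['Country'] = df['City'].map(city_to_country)
df['Continent'] = df['Country'].map(country_to_continent)
print(df)
```
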

Hope this helps. If you still run into errors, feel free to comment below.

Answer:

1: Counting elements

Repeat each continent's name as many times as it has cities, and collect the cities in the same order, so you get two lists of equal length:

import pandas as pd

cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')

continent = []
cities = []  # note: this replaces the original cities tuple
for name, cont in zip(['Europe', 'America', 'Asia'], [Europe, America, Asia]):
    continent += [name] * len(cont)  # one continent label per city
    cities += list(cont)             # the cities of that continent

df = pd.DataFrame({'City': cities, 'Continent': continent})
print(df)

This gives you the following result:

        City Continent
0     London    Europe
1     Berlin    Europe
2   New York   America
3  Vancouver   America
4      Tokyo      Asia
5    Bangkok      Asia

I think this is the best solution.
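One caveat with this approach: the City column is rebuilt from the continent tuples, so a city that is in cities but in none of the continent tuples silently disappears. A set-based check (a sketch using the question's tuples) can flag that, and cities listed under two continents, before building the DataFrame:

```python
cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')

assigned = set(Europe) | set(America) | set(Asia)

# Cities with no continent, and continent entries that are not in cities
missing = set(cities) - assigned
unknown = assigned - set(cities)

# A city listed under two continents makes the total exceed the set size
duplicated = len(Europe) + len(America) + len(Asia) - len(assigned)

print(missing, unknown, duplicated)  # set() set() 0
```
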

2: With a dictionary

You can build an intermediate dictionary that maps each city to its continent. Starting from your data:

cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')    
America = ('New York', 'Vancouver')    
Asia = ('Tokyo', 'Bangkok')

you would do this:

continent = dict()
for cont_name, cont_cities in zip(['Europe', 'America', 'Asia'], [Europe, America, Asia]):
    for city in cont_cities:
        continent[city] = cont_name

This gives you the following result:

{
    'London': 'Europe', 'Berlin': 'Europe',
    'New York': 'America', 'Vancouver': 'America',
    'Tokyo': 'Asia', 'Bangkok': 'Asia'
}

Then you can create your DataFrame:

df = pd.DataFrame(list(continent.items()), columns=['City', 'Continent'])
print(df)
        City Continent
0     London    Europe
1     Berlin    Europe
2   New York   America
3  Vancouver   America
4      Tokyo      Asia
5    Bangkok      Asia

This solution does not overwrite your cities tuple.
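If you want the rows in the original order of cities rather than grouped by continent, the same city -> continent dictionary can be applied to the cities tuple with Series.map. This is a sketch; a city missing from the dictionary would get NaN in the Continent column:

```python
import pandas as pd

cities = ('New York', 'Vancouver', 'London', 'Berlin', 'Tokyo', 'Bangkok')
Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')

# city -> continent lookup, built the same way as the loop above
continent = {city: name
             for name, group in zip(['Europe', 'America', 'Asia'], [Europe, America, Asia])
             for city in group}

df = pd.DataFrame({'City': list(cities)})
df['Continent'] = df['City'].map(continent)
print(df)
```
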

Answer:

I think in the long run you might want to eliminate loops for large datasets. You might also need to add more continents, depending on the content of your data.

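import pandas as pd

Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')

# Row number in the frame below -> continent name
continent = {0: 'Europe', 1: 'America', 2: 'Asia'}

# One row per continent, one column per city; stack() turns it into one row per city
df = pd.DataFrame([Europe, America, Asia]).stack().reset_index()
df['continent'] = df['level_0'].map(continent)
df.drop(['level_0', 'level_1'], inplace=True, axis=1)
print(df)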

You should get this output:

     0          continent
0   London      Europe
1   Berlin      Europe
2   New York    America
3   Vancouver   America
4   Tokyo       Asia
5   Bangkok     Asia

Feel free to adjust to suit your use case
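Another way to avoid an explicit Python loop, assuming pandas 0.25 or later (where Series.explode was added), is to put the groups in a Series and explode it into one row per city. This sketch also gives the City and Continent column names from the question directly:

```python
import pandas as pd

Europe = ('London', 'Berlin')
America = ('New York', 'Vancouver')
Asia = ('Tokyo', 'Bangkok')

# One entry per continent; each value is the tuple of its cities
groups = pd.Series({'Europe': Europe, 'America': America, 'Asia': Asia}, name='City')

df = (groups.explode()                 # one row per city, index repeats the continent
            .rename_axis('Continent')  # name the index so reset_index keeps that name
            .reset_index()[['City', 'Continent']])
print(df)
```

Continents of different sizes need no padding here, since each tuple is exploded on its own.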