Python string manipulation with pandas

Time:04-20

I'm trying to do some string manipulation with pandas and I would deeply appreciate your help! Here's my problem: I loaded a list of words from a CSV file into a pandas DataFrame called df, so that it looks as follows (here, I created the df manually):

import pandas as pd

data = {'Keyword': ['Apple', 'Banana', 'Peach', 'Strawberry', 'Blueberry'], 'Kategory': ['A', 'A', 'A', 'B', 'B']}

df = pd.DataFrame(data)

Now what I would like to do is some string manipulation based on the conditions shown below. The output of the string manipulation should be saved to a new column.

# new column to store the results
output = []

# set up the conditions
for Keyword in df:
    if df[Kategory] == 'A':
        output.append(Keyword + 'first choice')
        print(Keyword + 'first choice')
    else:
        output.append(Keyword + 'second choice')
        print(Keyword + 'second choice')

Thank you very much for your help!!

CodePudding user response:

You can try np.where

import numpy as np
df['col'] = np.where(df['Kategory'].eq('A'), df['Keyword'].add(' first choice'), df['Keyword'].add(' second choice'))
print(df)

      Keyword Kategory                       col
0       Apple        A        Apple first choice
1      Banana        A       Banana first choice
2       Peach        A        Peach first choice
3  Strawberry        B  Strawberry second choice
4   Blueberry        B   Blueberry second choice
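
For readers new to np.where, here is a minimal self-contained sketch of the same idea, assuming pandas and NumPy are installed. The column name col comes from the answer above; is_a is only an illustrative variable name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Keyword': ['Apple', 'Banana', 'Peach', 'Strawberry', 'Blueberry'],
    'Kategory': ['A', 'A', 'A', 'B', 'B'],
})

# Boolean mask: True where the row belongs to category 'A'
is_a = df['Kategory'].eq('A')

# np.where(condition, x, y) takes the value from x where the mask is True,
# and the value from y everywhere else, element by element
df['col'] = np.where(
    is_a,
    df['Keyword'] + ' first choice',
    df['Keyword'] + ' second choice',
)
print(df)
```

Both candidate columns are computed for every row and the mask only decides which one is kept, so no explicit loop is needed.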

CodePudding user response:

import pandas as pd

data = {'Keyword': ['Apple', 'Banana', 'Peach', 'Strawberry', 'Blueberry'], 'Kategory': ['A', 'A', 'A', 'B', 'B']}

df = pd.DataFrame(data)
output = []
# iterrows() yields an (index, row) pair for every row of the DataFrame
for idx, row in df.iterrows():
    if row['Kategory'] == 'A':
        output.append(row['Keyword'] + " " + 'first choice')
        # print(row['Keyword'] + " " + 'first choice')
    else:
        output.append(row['Keyword'] + " " + 'second choice')
        # print(row['Keyword'] + " " + 'second choice')

# store the collected strings in a new column
df['output'] = output
print(df)

      Keyword Kategory                    output
0       Apple        A        Apple first choice
1      Banana        A       Banana first choice
2       Peach        A        Peach first choice
3  Strawberry        B  Strawberry second choice
4   Blueberry        B   Blueberry second choice


I have tried to stay close to your approach, although you can also use np.where. To iterate over the rows of a DataFrame, use iterrows(), which gives you the index and the row on each pass.
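
As a small sketch of what iterrows() hands back on each pass (the two-row frame and the name pairs are made up for illustration), each item is an (index, row) pair, and the row is a Series that can be indexed by column name:

```python
import pandas as pd

df = pd.DataFrame({
    'Keyword': ['Apple', 'Strawberry'],
    'Kategory': ['A', 'B'],
})

# Collect the index plus the two column values seen on every iteration
pairs = [(idx, row['Keyword'], row['Kategory']) for idx, row in df.iterrows()]
print(pairs)
```

For large DataFrames a vectorized approach such as np.where is usually faster than a row-by-row loop, but the loop is easier to follow when starting out.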

CodePudding user response:

I imagine your error would be something like NameError: name 'Kategory' is not defined. This is because the variable Kategory doesn't actually exist. When accessing DataFrame columns, just like keys in dictionaries, you must pass the names as strings, not as variables.

Like this:

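# new column to store the results
output = []

# set up the conditions: access each column by its name as a string
# and walk through both columns row by row
for Keyword, Kategory in zip(df["Keyword"], df["Kategory"]):
    if Kategory == 'A':
        output.append(Keyword + ' first choice')
        print(Keyword + ' first choice')
    else:
        output.append(Keyword + ' second choice')
        print(Keyword + ' second choice')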
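
To see why the original loop fails, the following sketch reproduces the pitfalls on a made-up two-row frame. The variable names (bare_name_error, labels, mask, ambiguous) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Keyword': ['Apple', 'Strawberry'],
    'Kategory': ['A', 'B'],
})

# 1) A bare name is looked up as a Python variable, which does not exist here
try:
    df[Kategory]
    bare_name_error = None
except NameError as exc:
    bare_name_error = str(exc)

# 2) Iterating over a DataFrame directly yields column labels, not rows
labels = list(df)

# 3) Comparing a whole column to 'A' gives one boolean per row
mask = df['Kategory'] == 'A'

# 4) Using that boolean Series as an `if` condition raises ValueError
try:
    if mask:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

print(bare_name_error, labels, mask.tolist(), ambiguous)
```

So quoting the column name removes the NameError, but the condition still has to be evaluated per row, either in a loop over the rows or with a vectorized call.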

Good luck.
