String content becomes random integers after using append()

I'm writing a function to filter tweet data, keeping only the rows whose cleaned tweet contains a search word. Here's my code:

def twitter_filter(df, search):
  coun = 0
  date_ls = []
  id_ls = []
  content_ls = []
  lan_ls = []
  name_ls = []
  retweet_ls = []
  cleaned_tweet_ls = []

  for i, row in df.iterrows():
    if search in row.cleaned_tweet:
      date_ls.append(row.date)
      id_ls.append(row.id)
      content_ls.append(row.content)
      lan_ls.append(row.language)
      name_ls.append(row.name)
      retweet_ls.append(row.retweet)
      cleaned_tweet_ls.append(row.cleaned_tweet)
      
  new_dict = {
      "date": date_ls,
      "id": id_ls,
      "content": content_ls,
      "lan" : lan_ls,
      "name" : name_ls,
      "retweet" : retweet_ls,
      "cleaned_tweeet": cleaned_tweet_ls,

  }
  new_df = pd.DataFrame(new_dict)
  return new_df

Before filtering:

cleandf['name']
Out[6]: 
0            PryZmRuleZZ
1         Arbitration111
2                4kjweed
3         THEREALCAMOJOE
5              DailyBSC_
     
130997     Rabbitdogebsc
130999          gmtowner
131000    topcryptostats
131001     vGhostvRiderv
131002          gmtowner
Name: name, Length: 98177, dtype: object

After filtering, the user names become seemingly random integers:

cleanedogetweet['name']
Out[7]: 
0             3
1             5
2             9
3            12
4            34
 
80779    130997
80780    130999
80781    131000
80782    131001
80783    131002
Name: name, Length: 80784, dtype: int64

This problem only happens in the user name column; the other columns that contain strings are fine.

I expected the original user names to be kept. How can I solve this problem?

CodePudding user response：

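df.iterrows() yields each row as a pandas Series, and every Series has an attribute called name.

For a row, that attribute holds the row's index label, so row.name returns the index label rather than the value stored in the name column. The integers you see are therefore not random: they are the index labels of the rows that matched the search.

So it's better not to call the column name, because attribute access will always resolve to the row's name attribute instead of the column.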
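The difference can be seen on a tiny example. This is only a sketch: the two-row frame below is made-up data, with an index that skips a label to mimic a frame from which rows were dropped.

```python
import pandas as pd

# Hypothetical toy data (not the asker's real frame); the index skips 1.
df = pd.DataFrame(
    {
        "name": ["PryZmRuleZZ", "4kjweed"],
        "cleaned_tweet": ["doge up", "doge moon"],
    },
    index=[0, 2],
)

attr_values = []
col_values = []
for i, row in df.iterrows():
    # Attribute access hits Series.name, which is the row's index label.
    attr_values.append(row.name)
    # Bracket access looks up the column called "name".
    col_values.append(row["name"])

print(attr_values)
print(col_values)
```

Here attr_values collects the index labels and col_values collects the user names, which is the same mix-up that happens in the original function.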

You can rename the column with the rename method and use another name such as username, or you can keep the column as it is and read it with bracket indexing (row['name']) instead of attribute access, like this:

import pandas as pd

def twitter_filter(df, search):
    coun = 0
    date_ls = []
    id_ls = []
    content_ls = []
    lan_ls = []
    name_ls = []
    retweet_ls = []
    cleaned_tweet_ls = []

    for i, row in df.iterrows():
        if search in row.cleaned_tweet:
            # Bracket indexing reads the column value; row.name would
            # return the row's index label instead.
            date_ls.append(row['date'])
            id_ls.append(row['id'])
            content_ls.append(row['content'])
            lan_ls.append(row['language'])
            name_ls.append(row['name'])
            retweet_ls.append(row['retweet'])
            cleaned_tweet_ls.append(row['cleaned_tweet'])

    new_dict = {
        "date": date_ls,
        "id": id_ls,
        "content": content_ls,
        "lan": lan_ls,
        "user_name": name_ls,
        "retweet": retweet_ls,
        "cleaned_tweeet": cleaned_tweet_ls,
    }
    new_df = pd.DataFrame(new_dict)
    return new_df
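
As a possible alternative, the same filtering can be done without iterating over rows at all, which avoids the row.name trap entirely. The sketch below is not the function above: twitter_filter_mask is a hypothetical name, it keeps every column of the input under its original name (only name is renamed to user_name), and it treats missing tweets as non-matches, which is an assumption about the data.

```python
import pandas as pd

def twitter_filter_mask(df, search):
    # True where cleaned_tweet contains the literal search text;
    # regex=False matches plain substrings like the `in` test does,
    # and na=False treats missing tweets as non-matches (an assumption).
    mask = df["cleaned_tweet"].str.contains(search, regex=False, na=False)
    # Rename "name" so it can no longer be confused with Series.name,
    # then renumber the rows from 0.
    filtered = df.loc[mask].rename(columns={"name": "user_name"})
    return filtered.reset_index(drop=True)

# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {
        "name": ["PryZmRuleZZ", "Arbitration111", "4kjweed"],
        "cleaned_tweet": ["doge to the moon", "bitcoin news", "buy doge"],
    }
)
result = twitter_filter_mask(sample, "doge")
print(result)
```

On large frames this usually runs much faster than iterrows(), since the substring test is applied to the whole column at once.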