How to copy a value at specific number and location of rows in a column in Pandas?

Time:09-05

I am converting json_response to a dataframe by using the following commands:

import pandas as pd

df = pd.DataFrame(columns=["created_at", "username", "description", "tweet_id"])  # an empty dataframe to collect the results

data_nested = pd.json_normalize(json_response['data'])
df_temp = data_nested[["created_at", "username", "description"]].copy()
df = pd.concat([df, df_temp], ignore_index=True)
df.reset_index(inplace=True, drop=True)

Here is my sample json_response:

{
    "data": [
        {
            "created_at": "2020-01-01T12:24:45.000Z",
            "description": "This is a sample description",
            "id": "12345678",
            "name": "Sample Name",
            "username": "sample_name"
        }
    ],
    "meta": {
        "next_token": "sample_token",
        "result_count": 1
    }
}
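For reference, here is a minimal sketch of what the conversion above produces for this sample, assuming the response has already been parsed into a Python dict (hard-coded below as `json_response`). Only `json_response['data']` is normalized, so the `meta` block never reaches the dataframe.

```python
import pandas as pd

# The sample response from above, already parsed into a dict
json_response = {
    "data": [
        {
            "created_at": "2020-01-01T12:24:45.000Z",
            "description": "This is a sample description",
            "id": "12345678",
            "name": "Sample Name",
            "username": "sample_name",
        }
    ],
    "meta": {"next_token": "sample_token", "result_count": 1},
}

# Normalize only the "data" list: one row per retweeting user
data_nested = pd.json_normalize(json_response["data"])
df_temp = data_nested[["created_at", "username", "description"]].copy()
print(df_temp)
```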

This response is the result of querying the "Retweeted_by" endpoint of the Twitter API v2. Inside the loop, I am trying to save the tweet_id against each response (to know which resulting row corresponds to which requesting tweet_id) by doing df['tweet_id'] = tweet_id. I understand that this assigns the value to the whole column, so the last tweet_id overwrites everything else in it.
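A minimal reproduction of that overwrite, using two hypothetical responses (the `responses` dict below is made up for illustration and stands in for the real API output):

```python
import pandas as pd

# Hypothetical "data" lists, keyed by the tweet_id that was requested
responses = {
    "111": [{"created_at": "2020-01-01", "username": "alice", "description": "a"},
            {"created_at": "2020-01-02", "username": "bob", "description": "b"}],
    "222": [{"created_at": "2020-01-03", "username": "carol", "description": "c"}],
}

df = pd.DataFrame(columns=["created_at", "username", "description", "tweet_id"])
for tweet_id, data in responses.items():
    df_temp = pd.json_normalize(data)[["created_at", "username", "description"]].copy()
    df = pd.concat([df, df_temp], ignore_index=True)
    # Assigning a scalar to the whole column overwrites every row on each pass
    df["tweet_id"] = tweet_id

print(df["tweet_id"].tolist())  # every row ends up with the last id
```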

I also tried the following, using the index:

idx = df["username"].last_valid_index()
if pd.isnull(idx) or idx is None:
  df.loc[0, "tweet_id"] = tweet_id
else:
  df.loc[idx + 1, "tweet_id"] = tweet_id

This fails as well: if result_count in json_response is greater than 1, tweet_id is saved only in the next row, and the rows that belong to the response are left as NaN.
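A minimal reproduction with a hypothetical two-row response (`data` and `tweet_id` below are made-up values) shows what happens: `last_valid_index()` returns the label of the last row, so `df.loc[idx + 1, ...]` enlarges the frame by one extra row that holds only the tweet_id.

```python
import pandas as pd

# Hypothetical response with result_count == 2
data = [{"created_at": "2020-01-01", "username": "alice", "description": "a"},
        {"created_at": "2020-01-02", "username": "bob", "description": "b"}]
tweet_id = "111"

df = pd.DataFrame(columns=["created_at", "username", "description", "tweet_id"])
df_temp = pd.json_normalize(data)[["created_at", "username", "description"]].copy()
df = pd.concat([df, df_temp], ignore_index=True)

idx = df["username"].last_valid_index()
if pd.isnull(idx) or idx is None:
    df.loc[0, "tweet_id"] = tweet_id
else:
    # Writes to a new label one past the last row, so the two real rows stay NaN
    df.loc[idx + 1, "tweet_id"] = tweet_id

print(df)
```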

Can someone please suggest a solution? Thank you.

Answer:

Based on our exchange in the comments, here is my proposed solution:

import pandas as pd
import requests

tweet_id_list = [1, 2, 3]  # a list of all of your tweet ids

# Here you start looping through each id and getting its retweets.
# You could make this async, but I would be careful since rate limits are very
# strict on Twitter. They can disable your access if you go over the limit a lot.

all_dfs = []
for tweet_id in tweet_id_list:
    # Placeholder URL: substitute the real retweeted_by endpoint and your auth headers
    response = requests.get(f"url/{tweet_id}")
    json_response = response.json()

    # "data" may be missing when a tweet has no retweets
    temp_df = pd.DataFrame.from_records(json_response.get("data", []))
    temp_df["tweet_id"] = tweet_id

    all_dfs.append(temp_df)

# If you then want one big table with all the retweets and tweet_ids,
# simply do:

df = pd.concat(all_dfs, ignore_index=True)
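To try the loop without calling the API, here is a sketch where the HTTP request is replaced by a hypothetical `fetch_retweeters` function returning canned responses (the ids, names and usernames are made up). It also assumes that a tweet with no retweets comes back without a `"data"` key, which is why the sketch reads it with `.get("data", [])`.

```python
import pandas as pd

# Hypothetical canned responses, keyed by the requesting tweet id
FAKE_RESPONSES = {
    1: {"data": [{"id": "10", "name": "Alice", "username": "alice"},
                 {"id": "11", "name": "Bob", "username": "bob"}],
        "meta": {"result_count": 2}},
    2: {"meta": {"result_count": 0}},  # assumed shape when there are no retweets
    3: {"data": [{"id": "12", "name": "Carol", "username": "carol"}],
        "meta": {"result_count": 1}},
}

def fetch_retweeters(tweet_id):
    """Hypothetical stand-in for the HTTP request plus JSON parsing."""
    return FAKE_RESPONSES[tweet_id]

all_dfs = []
for tweet_id in [1, 2, 3]:
    json_response = fetch_retweeters(tweet_id)
    # An empty list gives an empty frame instead of a KeyError
    temp_df = pd.DataFrame.from_records(json_response.get("data", []))
    temp_df["tweet_id"] = tweet_id  # scalar fills every row of this frame only
    all_dfs.append(temp_df)

df = pd.concat(all_dfs, ignore_index=True)
print(df)
```

Tweet 2 contributes no rows, while every retweeter of tweets 1 and 3 is tagged with the id it came from.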

Just a bit of explanation.

For each tweet_id you create a dataframe of its retweets (temp_df) and add an extra column to it called tweet_id. When you assign a single value to a DataFrame column, pandas broadcasts it to every row of that dataframe, so each row of temp_df records the tweet_id it came from.

You then collect the dataframes for every tweet_id in the list all_dfs.

After the loop you are left with a list of dataframes. If you want one big table, concatenate them with pd.concat as shown in the code.
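Note that pd.concat keeps each frame's original row labels unless told otherwise, so every temp_df contributes its own 0, 1, ... labels. A small sketch with made-up frames shows the effect of `ignore_index=True`:

```python
import pandas as pd

# Two made-up per-tweet frames, each with its own 0-based index
a = pd.DataFrame({"username": ["alice", "bob"], "tweet_id": [1, 1]})
b = pd.DataFrame({"username": ["carol"], "tweet_id": [3]})

# Without ignore_index each frame keeps its labels, so 0 appears twice
kept = pd.concat([a, b])
print(kept.index.tolist())   # [0, 1, 0]

# With ignore_index=True the result gets a fresh 0..n-1 index
fresh = pd.concat([a, b], ignore_index=True)
print(fresh.index.tolist())  # [0, 1, 2]
```

With duplicate labels, `kept.loc[0]` returns two rows rather than one, which is easy to trip over later.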
