I'm struggling to get the results from my tweepy count into a usable format.
I'm using elevated Twitter access and the tweepy package.
counts = tweepy.Paginator(
    client.get_all_tweets_count,
    query=query, start_time=start_time,
    end_time=end_time,
    granularity='day')
time.sleep(1)
tweet_count = []
for count in counts:
    # count.data is itself a list of dicts, so appending it nests the lists
    tweet_count.append(count.data)
Instead of a DataFrame, I end up with a list of lists of dictionaries.
The output looks like this:
data = [[{'end': '2020-12-01T00:00:00.000Z', 'start': '2020-11-30T00:00:00.000Z', 'tweet_count': 5780}, {'end': '2020-12-02T00:00:00.000Z', 'start': '2020-12-01T00:00:00.000Z', 'tweet_count': 3093}, {'end': '2020-12-03T00:00:00.000Z', 'start': '2020-12-02T00:00:00.000Z', 'tweet_count': 7379},...}]]
How can I turn this into a tidy pandas DataFrame that looks like the following?
   Start                     End                       Count
0  2020-11-30T00:00:00.000Z  2020-12-01T00:00:00.000Z  5780
1  2020-12-01T00:00:00.000Z  2020-12-02T00:00:00.000Z  3093
2  ...                       ...                       ...
I'm not very experienced with Python, but I guess I'm missing something about how a list with dictionaries inside is structured.
CodePudding user response:
Use chain.from_iterable to flatten the nested list, then use pandas's DataFrame() constructor to build your dataframe.
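from itertools import chain
import pandas as pd
# Flatten the list of lists into a single list of dicts
data = list(chain.from_iterable(data))
# Build the frame and put the columns in the desired order
df = pd.DataFrame(data).reindex(['start', 'end', 'tweet_count'], axis=1)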
# Output:
print(df)
                      start                       end  tweet_count
0  2020-11-30T00:00:00.000Z  2020-12-01T00:00:00.000Z         5780
1  2020-12-01T00:00:00.000Z  2020-12-02T00:00:00.000Z         3093
2  2020-12-02T00:00:00.000Z  2020-12-03T00:00:00.000Z         7379
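
If you control the collection loop, one possible alternative is to avoid the nesting in the first place. Each page's data is already a list of dicts, so calling extend instead of append keeps tweet_count flat. The sketch below does not call the Twitter API: `pages` is a hypothetical stand-in for the lists that count.data yields, filled with the values from the question. The renaming to Start/End/Count and the pd.to_datetime conversion are optional extras, not something the answer above requires.

```python
import pandas as pd

# Hypothetical stand-in for the per-page `count.data` lists from the Paginator.
pages = [
    [
        {'end': '2020-12-01T00:00:00.000Z', 'start': '2020-11-30T00:00:00.000Z', 'tweet_count': 5780},
        {'end': '2020-12-02T00:00:00.000Z', 'start': '2020-12-01T00:00:00.000Z', 'tweet_count': 3093},
    ],
    [
        {'end': '2020-12-03T00:00:00.000Z', 'start': '2020-12-02T00:00:00.000Z', 'tweet_count': 7379},
    ],
]

tweet_count = []
for page in pages:
    # extend() adds each dict individually, so the list stays flat.
    # "or []" guards against a page whose data is None (an assumption).
    tweet_count.extend(page or [])

# Passing `columns` selects and orders the columns in one step.
df = pd.DataFrame(tweet_count, columns=['start', 'end', 'tweet_count'])
df = df.rename(columns={'start': 'Start', 'end': 'End', 'tweet_count': 'Count'})

# Optional: parse the ISO 8601 strings into timezone-aware timestamps.
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])

print(df)
```

With real timestamps in the Start and End columns, things like sorting by date or resampling by week should work directly, which is harder with plain strings.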