I have YouTube video data in JSON format:
{
"KWWLwotNcTo": {
"id": "KWWLwotNcTo",
"publishedAt": "2022-02-23T03:30:15Z",
"tags": [
"bgmi",
"dynamogaming",
"alphaclasher",
"hydrabts",
"hydraayush",
"hydrawrath",
"hydraemperorgaming",
"hydrahrishav",
"dynovlogs",
"alphavlogs",
"bittyexplore",
"vlogs",
"hydrabootcamp",
"hailhydra",
"hydradanger",
"hydrakanika",
"s8ulvlogs",
"hastervlogs",
"sauravjoshivlog",
"mumbaikarnikhil",
"tanmaybhatt",
"hydravss8ul",
"mortal",
"k18vlogs",
"8bitthug",
"8bitgoldy",
"8bitmamba",
"8bitmercy",
"karansehgal",
"akashdodeja",
"mohitchiikara",
"travelingdesi",
"MSK",
"flyingbeast",
"hydrabootcampvideo",
"hydrabagheera",
"ghatakgaming",
"GodLlollz",
"GodLkrontron",
"payalgaming",
"regaltoss",
"ghatakcontroversy",
"TxContro"
],
"categoryId": "24",
"duration": "PT13M58S",
"viewCount": "91677",
"likeCount": "24724",
"commentCount": "437",
"topicCategories": [
"https://en.wikipedia.org/wiki/Food",
"https://en.wikipedia.org/wiki/Lifestyle_(sociology)"
]
},
"PC_pAgJopIA": {
"id": "PC_pAgJopIA",
"publishedAt": "2021-08-27T14:00:45Z",
"tags": [
"polymars",
"game dev challenge",
"$1000",
"best game wins $1000",
"pygame",
"pygame jam",
"$1000 game dev challenge",
"gmtk",
"game jam",
"gmtk game jam",
"game dev",
"indie dev",
"24 hours",
"i made a game",
"two birds one stone",
"buckshot",
"barji",
"miziziziz",
"learning pygame",
"pygame game",
"indie game devlog",
"winner gets $1000",
"$1000 game jam",
"2 game devs vs $1000",
"game making challenge",
"$1000 game making challenge",
"$1000 gamedev challenge",
"gamedev",
"best c game wins $1000 - game making challenge"
],
"categoryId": "28",
"duration": "PT15M4S",
"viewCount": "546503",
"likeCount": "16851",
"commentCount": "728",
"topicCategories": ["https://en.wikipedia.org/wiki/Video_game_culture"]
},
"isAFtqGHz6Y": {
"id": "isAFtqGHz6Y",
"publishedAt": "2019-03-16T01:10:24Z",
"tags": [
"python telugu tutorial",
"python telugu",
"python tutorial",
"python tutorial in telugu",
"python introduction telugu",
"python introduction in telugu",
"python tutorial telugu web guru",
"python tutorial by telugu web guru",
"python tutorial by santosh",
"learn python",
"learn python programming",
"what is python",
"applications of python",
"learn python in telugu",
"core python programming",
"how to install python",
"how to execute python programs",
"how to run python programs"
],
"categoryId": "27",
"duration": "PT27M7S",
"viewCount": "670671",
"likeCount": "15805",
"commentCount": "1951",
"topicCategories": ["https://en.wikipedia.org/wiki/Knowledge"]
},
"I2wURDqiXdM": {
"id": "I2wURDqiXdM",
"publishedAt": "2018-07-07T02:16:12Z",
"tags": [
"howCode",
"how",
"code",
"howcode.org",
"howco.de",
"python",
"learn",
"5 minutes"
],
"categoryId": "27",
"duration": "PT6M41S",
"viewCount": "610607",
"likeCount": "25114",
"commentCount": "871",
"topicCategories": ["https://en.wikipedia.org/wiki/Knowledge"]
},
"qzyVMhAW9FQ": {
"id": "qzyVMhAW9FQ",
"publishedAt": "2021-08-21T10:48:05Z",
"tags": ["simplified learner", "python"],
"categoryId": "27",
"duration": "PT1M",
"viewCount": "1076470",
"likeCount": "84993",
"commentCount": "782",
"topicCategories": ["https://en.wikipedia.org/wiki/Knowledge"]
},
"WvhQhj4n6b8": {
"id": "WvhQhj4n6b8",
"publishedAt": "2019-10-22T16:30:02Z",
"tags": [
"yt:cc=on",
"what is python",
"python programming",
"python programming for beginners",
"what is python programming",
"python programming tutorial",
"python (programming language)",
"python for beginners",
"introduction to python",
"python basics",
"python training videos",
"python training",
"python tutorial",
"python tutorial for beginners",
"python programming language",
"learn python programming",
"why python",
"python edureka",
"python tutorial edureka",
"learn python from scratch",
"edureka",
"edureka python"
],
"categoryId": "27",
"duration": "PT9M37S",
"viewCount": "678497",
"likeCount": "16669",
"commentCount": "150",
"topicCategories": ["https://en.wikipedia.org/wiki/Knowledge"]
}
}
I have read the data into a DataFrame and extracted all the tags as a list:
import pandas as pd
df = pd.read_json('video_relevant_data.json', orient='index')
all_tags = list(set(",".join(df['tags'].apply(lambda x: ",".join(x)).to_list()).split(",")))
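One caveat with the join-and-split approach is that a tag which itself contains a comma would be split into separate pieces. A minimal sketch of an alternative that avoids this, assuming every row's tags value is a list; the two-row frame and its index labels are made up for illustration and stand in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame; note the tag containing a comma.
df = pd.DataFrame(
    {"tags": [["python", "learn, fast"], ["python", "django"]]},
    index=["a", "b"],
)

# explode() yields one row per tag, so each tag is kept whole;
# unique() drops duplicates, and sorted() gives a stable column order.
all_tags = sorted(df["tags"].explode().dropna().unique())
print(all_tags)
```

Sorting is optional; it only makes the column order reproducible, which a plain set does not guarantee.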
Using the tags, I have created new columns:
df_tags_as_columns = pd.concat([df, pd.DataFrame(columns=all_tags)])
Now I want to fill those columns so that a value is set to 1 if the tag is present in that row's list of tags.
Example:
Let's say that in row 1, tags contains the following list:
['python', 'django']
and I have the following columns:
                   tags  python  django  flask
0  ['python', 'django']     nan     nan    nan
I want the output to be:
                   tags  python  django  flask
0  ['python', 'django']       1       1    nan
CodePudding user response:
import pandas as pd
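items = ['python', 'django', 'flask']
# Build a ';'-joined string of 0/1 flags per row, one flag per item.
df['check'] = df['tags'].apply(lambda x: map(str, [int(item in x) for item in items])).apply(';'.join)
# Split the flags into one column per item (the values are the strings '0' and '1').
pd.DataFrame(df["check"].str.split(';', expand=True).values, columns=items)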
or
# Add one 0/1 column per item directly on df.
for item in items:
    df[item] = df['tags'].apply(lambda x: item in x).map(int)
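With a large number of tags, looping over every tag may become slow. A sketch of a vectorized alternative using explode and pd.get_dummies, assuming every row's tags value is a list; the sample frame and the index labels vid1 and vid2 are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the index plays the role of the video id.
df = pd.DataFrame(
    {"tags": [["python", "django"], ["flask"]]},
    index=["vid1", "vid2"],
)

# explode() produces one row per (video, tag) pair, get_dummies() one-hot
# encodes the tags, and groupby(level=0).max() collapses the rows back to
# a single row per video.
dummies = pd.get_dummies(df["tags"].explode(), dtype=int).groupby(level=0).max()

# join() aligns on the index, so each video keeps its own flags.
result = df.join(dummies)
print(result)
```

This produces 0 rather than NaN for absent tags, the same as the loop version.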
CodePudding user response:
Based on J. Doe's initial response, this is the solution I am using:
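# Encode each row's tag membership as a ';'-joined string of 0/1 flags, one per tag in all_tags.
df['check'] = df['tags'].apply(lambda x: ';'.join([str(int(item in x)) for item in all_tags]))
# Split the flags into one column per tag and convert them to integers.
tag_present_df = pd.DataFrame(df['check'].str.split(";", expand=True).values, columns=all_tags).astype(int)
# Reset the index so both frames align row by row, then join them side by side.
df.reset_index(inplace=True)
result = pd.concat([df, tag_present_df], axis=1)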
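The expected output in the question leaves NaN where a tag is absent. A minimal sketch of that variant, assuming every row's tags value is a list; the two-row frame and the three-tag all_tags list are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame and tag list.
df = pd.DataFrame({"tags": [["python", "django"], ["flask"]]})
all_tags = ["python", "django", "flask"]

for tag in all_tags:
    # 1 where the tag is in the row's list, NaN otherwise.
    # Binding t=tag fixes the current tag inside the lambda.
    df[tag] = df["tags"].apply(lambda tags, t=tag: 1 if t in tags else np.nan)

print(df)
```

Because NaN is a float, pandas stores each of these columns as float64, so present tags appear as 1.0.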