I have a DataFrame df, which you can create by running:
import pandas as pd
data = [10,20,30,40,50,60]
df = pd.DataFrame(data, columns=['Numbers'])
df
Now I want to check whether each name in an existing list is already a column of df. If it is not, I want to create a new column with that name and set its value to 0:
columns_list=["3","5","8","9","12"]
for i in columns_list:
    if i not in df.columns.to_list():
        df[i] = 0
How can I write this in one line? I have tried this:
[df[i]=0 for i in columns_list if i not in df.columns.to_list()]
However, the IDE returns:
SyntaxError: cannot assign to subscript here. Maybe you meant '==' instead of '='?
Can anyone help?
CodePudding user response:
import numpy as np
import pandas as pd
# Some example data
df = pd.DataFrame(
    np.random.randint(10, size=(5, 6)),
    columns=map(str, range(6))
)
#    0  1  2  3  4  5
# 0  9  4  8  7  3  6
# 1  6  9  0  5  3  4
# 2  7  9  0  9  0  3
# 3  4  4  6  4  6  4
# 4  6  9  7  1  5  5
columns_list=["3","5","8","9","12"]
# Figure out which columns in your list do not appear in your dataframe
# by creating a new Index and using pd.Index.difference:
df[ pd.Index(columns_list).difference(df.columns, sort=False) ] = 0
#    0  1  2  3  4  5  8  9  12
# 0  9  4  8  7  3  6  0  0   0
# 1  6  9  0  5  3  4  0  0   0
# 2  7  9  0  9  0  3  0  0   0
# 3  4  4  6  4  6  4  0  0   0
# 4  6  9  7  1  5  5  0  0   0
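As for why the attempt in the question fails: `df[i] = 0` is an assignment statement, and a list comprehension only accepts expressions, hence the SyntaxError. One possible one-line alternative, sketched here on the question's own data, is to build the missing columns in a dict comprehension and hand them to `DataFrame.assign`. This is not the `pd.Index.difference` approach shown above, just another option under the same assumptions (string column names, fill value 0).

```python
import pandas as pd

df = pd.DataFrame([10, 20, 30, 40, 50, 60], columns=["Numbers"])
columns_list = ["3", "5", "8", "9", "12"]

# The dict comprehension is an expression, so it is allowed on one line;
# assign() then adds every missing column with the value 0.
df = df.assign(**{c: 0 for c in columns_list if c not in df.columns})

print(df.columns.tolist())
# ['Numbers', '3', '5', '8', '9', '12']
```

Note that `assign` returns a new DataFrame rather than modifying `df` in place, which is why the result is assigned back.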
CodePudding user response:
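Try:
columns_list = ["3", "5", "8", "9", "12"]
# Reindex on the union of existing and requested column names;
# columns that did not exist yet are filled with 0.
# Note: a set does not preserve the original column order.
df = df.reindex(
    list(set(list(df.columns) + columns_list)),
    axis=1,
    fill_value=0,
)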
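Because a set is unordered, reindexing on it can shuffle the columns. If the order matters, one possible variation is to deduplicate with `dict.fromkeys`, which keeps first-seen order. The sketch below uses a small made-up DataFrame (the columns "3", "5", "7" and their values are illustrative assumptions, not data from the question).

```python
import pandas as pd

# Hypothetical example data: "3" and "5" already exist, "7" is not in the list.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["3", "5", "7"])
columns_list = ["3", "5", "8", "9", "12"]

# dict.fromkeys drops duplicates while keeping insertion order:
# existing columns first, then the new names from columns_list.
ordered = list(dict.fromkeys([*df.columns, *columns_list]))

# Existing columns keep their values; new columns are filled with 0.
df = df.reindex(columns=ordered, fill_value=0)

print(df.columns.tolist())
# ['3', '5', '7', '8', '9', '12']
```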