How to write a for-loop/if-statement for a dataframe (integer) column

Time:03-16

I have a dataframe with a column of integers that represent birth years. Each row contains a value of the form 20xx or 19xx, but some rows have only the xx part.

What I want to do is add 19 in front of the two-digit numbers if the integer is bigger than 22 (counting from 0), and add 20 in front of those that are smaller than or equal to 22.

This is what I wrote:

for x in DF.loc[DF["Year"] >= 2022]:
  x + 1900
  if:
    x >= 22 
  else:
    x + 2000

You can also change the code completely; I would just like you to explain what exactly your code does.

Thanks to everybody who takes the time to answer this.

CodePudding user response:

Instead of iterating through the rows, use where to change the whole column:

y = df["Year"] # just to save typing
df["Year"] = y.where(y > 99, (1900 + y).where(y > 22, y + 2000))

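or boolean indexing (this chained assignment can raise a SettingWithCopyWarning in pandas and may not modify the dataframe at all when copy-on-write is enabled, so the loc form is safer):

df["Year"][df["Year"].between(0, 22)] += 2000   # between is inclusive: 0..22 -> 20xx
df["Year"][df["Year"].between(23, 99)] += 1900  # 23..99 -> 19xx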

or loc:

df.loc[df["Year"].between(0, 22), "Year"] += 2000   # between is inclusive: 0..22 -> 20xx
df.loc[df["Year"].between(23, 99), "Year"] += 1900  # 23..99 -> 19xx
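Since the question asks what the code actually does, here is a minimal sketch of the where version split into two steps. The sample values are made up for illustration and are not from the question; `two_digit_fixed` is just a hypothetical helper name.

```python
import pandas as pd

# Hypothetical sample data: a mix of four-digit and two-digit years.
df = pd.DataFrame({"Year": [2002, 95, 1998, 3, 22, 1947]})

y = df["Year"]

# Inner where: keep 1900 + y wherever y > 22, otherwise use y + 2000.
two_digit_fixed = (1900 + y).where(y > 22, y + 2000)

# Outer where: keep y wherever it already includes the century (y > 99),
# otherwise take the value computed above.
df["Year"] = y.where(y > 99, two_digit_fixed)

print(df["Year"].tolist())  # [2002, 1995, 1998, 2003, 2022, 1947]
```

Series.where keeps the original value where the condition is True and substitutes the second argument where it is False, which is why the conditions read "inverted" compared to an if statement.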

CodePudding user response:

You can do it in one line with the apply method.

Example:

import pandas as pd

df = pd.DataFrame({'date': [2002, 95, 1998, 3, 56, 1947]})
print(df)

   date
0  2002
1    95
2  1998
3     3
4    56
5  1947

Then:

# assign back to the column (not to df) so the result stays a dataframe
df['date'] = df['date'].apply(lambda x: x + 1900 if 22 < x < 100 else (x + 2000 if x <= 22 else x))
print(df)

   date
0  2002
1  1995
2  1998
3  2003
4  1956
5  1947

CodePudding user response:

It is basically what you did, an if inside a for:

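new_list_of_years = []
for year in DF["Year"]:
    # leave four-digit years alone; otherwise pick the century from the two digits
    if year > 99:
        full_year = year
    else:
        full_year = year + 1900 if year > 22 else year + 2000
    new_list_of_years.append(full_year)

DF['Year'] = new_list_of_years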

Edit: You can also do that with a list comprehension and a conditional expression:

DF['Year'] = [year if year > 99 else (year + 1900 if year > 22 else year + 2000) for year in DF["Year"]]
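If the nested conditional is hard to read, the same rule can be pulled out into a small function. This is only a sketch: the name `expand_year` and the `pivot` parameter are hypothetical, and the cutoff of 22 is taken from the question.

```python
import pandas as pd

def expand_year(year, pivot=22):
    """Return a four-digit year; two-digit values up to `pivot` become 20xx."""
    if year > 99:
        # Already a full year, leave it unchanged.
        return year
    return year + 2000 if year <= pivot else year + 1900

# Sample data as used in the apply answer on this page.
DF = pd.DataFrame({"Year": [2002, 95, 1998, 3, 56, 1947]})
DF["Year"] = [expand_year(year) for year in DF["Year"]]

print(DF["Year"].tolist())  # [2002, 1995, 1998, 2003, 1956, 1947]
```

Keeping the cutoff as a parameter makes it easy to adjust later without touching the loop.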