How to write a for-loop/if-statement for a dataframe (integer) column

I have a dataframe with a column of integers that represent birth years. Most rows contain a full year (19xx or 20xx), but some rows contain only the last two digits (xx).

What I want to do is put 19 in front of the two-digit values that are greater than 22 (counting from 0), and put 20 in front of those that are less than or equal to 22.

This is what I wrote:

for x in DF.loc[DF["Year"] >= 2022]:
  x + 1900
  if:
    x >= 22 
  else:
    x + 2000
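For reference, here is a minimal sketch of the per-value rule described above, written as a plain Python function. The name expand_year and the sample values are hypothetical, and treating anything above 99 as an already-complete year is an assumption based on the description. The snippet also shows why the loop above cannot work as written: iterating over a DataFrame yields its column labels, not the values in its rows.

```python
import pandas as pd

def expand_year(x: int) -> int:
    """Hypothetical helper: turn a two-digit year into a four-digit one."""
    if x > 99:        # assumed to be a full year already (19xx or 20xx)
        return x
    if x > 22:        # 23..99 -> 1923..1999
        return x + 1900
    return x + 2000   # 0..22 -> 2000..2022

# Made-up sample data
DF = pd.DataFrame({"Year": [2002, 95, 1998, 3, 56, 22]})

# Looping over a DataFrame gives the column labels, not the row values
print(list(DF))  # ['Year']

# Looping over the column (a Series) gives the values themselves
print([expand_year(x) for x in DF["Year"]])
```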

You can also change the code completely; I would just like you to explain what exactly your code does.

Thanks to everybody who takes the time to answer this.

Answer:

Instead of iterating through the rows, use Series.where to update the whole column at once:

y = df["Year"] # just to save typing
df["Year"] = y.where(y > 99, (1900 + y).where(y > 22, y + 2000))
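A self-contained run of the where approach on made-up sample values. Series.where(cond, other) keeps each value where cond is True and takes the value from other where it is False, so the inner call picks 19xx or 20xx for every row, and the outer call keeps the rows that are already four-digit years.

```python
import pandas as pd

# Made-up sample data: a mix of full and two-digit years
df = pd.DataFrame({"Year": [2002, 95, 1998, 3, 56, 22, 23, 0]})

y = df["Year"]
# inner where: values above 22 become 19xx, the rest become 20xx
two_digit_fixed = (1900 + y).where(y > 22, y + 2000)
# outer where: keep anything above 99 unchanged, otherwise use the fixed value
df["Year"] = y.where(y > 99, two_digit_fixed)

print(df["Year"].tolist())
```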

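or boolean indexing (this is chained assignment, so pandas may emit a SettingWithCopyWarning, and under copy-on-write it no longer updates df at all; the loc version is safer):

df["Year"][df["Year"].between(0, 22)] += 2000
df["Year"][df["Year"].between(23, 99)] += 1900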

or loc:

df.loc[df["Year"].between(0, 22), "Year"] += 2000
df.loc[df["Year"].between(23, 99), "Year"] += 1900
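The loc variant on made-up sample data. Series.between is inclusive on both ends by default, so the ranges 0..22 and 23..99 match the rule in the question. Because the first line turns 0..22 into 2000..2022, those rows no longer fall in the 23..99 range when the second mask is computed, so no row is changed twice.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({"Year": [2002, 95, 1998, 3, 56, 22, 23, 0]})

# 0..22 -> 2000..2022 (between() includes both endpoints by default)
df.loc[df["Year"].between(0, 22), "Year"] += 2000
# rows changed above are now >= 2000, so they cannot match this range
df.loc[df["Year"].between(23, 99), "Year"] += 1900   # 23..99 -> 1923..1999

print(df["Year"].tolist())
```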

Answer:

You can do it in one line with the apply method.

Example:

import pandas as pd

df = pd.DataFrame({'date': [2002, 95, 1998, 3, 56, 1947]})
print(df)

   date
0  2002
1    95
2  1998
3     3
4    56
5  1947

Then:

df['date'] = df['date'].apply(lambda x: x + 1900 if 22 < x < 100 else (x + 2000 if x <= 22 else x))
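If the nested conditional in the lambda is hard to read, the same logic can be written as a named function and passed to apply. This is a sketch under the question's rule (values of 100 or more are assumed to be full years already); the name to_full_year is hypothetical. Note that the result is assigned back to the date column, so df stays a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'date': [2002, 95, 1998, 3, 56, 1947]})

def to_full_year(x):
    # hypothetical helper: values of 100 or more are left alone
    if x >= 100:
        return x
    # 23..99 -> 19xx, 0..22 -> 20xx
    return x + 1900 if x > 22 else x + 2000

df['date'] = df['date'].apply(to_full_year)
print(df)
```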

Answer:

It is basically what you did: an if inside a for loop.

new_list_of_years = []
for year in DF["Year"]:
    # leave four-digit years alone; 23..99 -> 19xx; 0..22 -> 20xx
    full_year = year if year > 99 else (year + 1900 if year > 22 else year + 2000)
    new_list_of_years.append(full_year)

DF["Year"] = new_list_of_years

Edit: you can also do this with a list comprehension:

DF["Year"] = [year if year > 99 else (year + 1900 if year > 22 else year + 2000) for year in DF["Year"]]
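A quick self-contained check, on made-up data, that an explicit for loop and a list comprehension produce the same column. The helper full_year is hypothetical and encodes the assumed rule (above 99 is already a full year, 23..99 becomes 19xx, 0..22 becomes 20xx).

```python
import pandas as pd

# Made-up sample data
DF = pd.DataFrame({"Year": [2002, 95, 1998, 3, 56, 22]})

def full_year(year):
    # hypothetical helper: >99 already full, 23..99 -> 19xx, 0..22 -> 20xx
    return year if year > 99 else (year + 1900 if year > 22 else year + 2000)

# explicit for loop
loop_result = []
for year in DF["Year"]:
    loop_result.append(full_year(year))

# list comprehension doing the same thing
comp_result = [full_year(year) for year in DF["Year"]]

print(loop_result == comp_result)  # True
DF["Year"] = comp_result
print(DF["Year"].tolist())
```

Assigning a plain Python list to a column works as long as it has exactly one entry per row; the values are matched by position, not by index label.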