Home > Enterprise >  How do I remove the values of some cells in the datetime.datetime format? How do I change those valu
How do I remove the values of some cells in the datetime.datetime format? How do I change those valu

Time:12-09

I am working with a document where the values in column A are numbers. When I import it, Python reads the column as a dtype: object, however, I cannot transform the column to Integer of Float because there are some values that are understood as datetime.datetime and not object data type. For example, the number 19.1 when imported in Python, it is read as datetime.datetime(1900, 1, 19, 0, 0).

I already spotted those cases where the data type is datetime.datetime and I know how to find them using the following code:

data.loc[data['A'] == datetime.datetime(1900, 1, 19, 0, 0)]

Nevertheless, I don't know how to access those specific cases to either delete these wrong values or change them for other values, in order to transform later the entire column to dtype: Int.

I tried to do so by using the following codes, but none worked:

data['A']=data['A'].replace(datetime.datetime(1900, 1, 19, 0, 0),'0') 
data['A']=data['A'].replace(datetime(1900, 1, 19, 0, 0),'0').

This is the data frame in the original Excel file:

        a
1      10
2      5
3      19.1
4      15
5      3.8

column dtype: number

This is the data frame imported by python using Pandas:

        a
1      10
2      5
3      1900-01-19 00:00:00
4      15
5      2021-08-03 00:00:00

column dtype: object 

The two cases (3 and 5) data type:

[datetime.datetime(1900, 1, 19, 0, 0),
 datetime.datetime(2021, 8, 3, 0, 0),]

CodePudding user response:

You can use drop to remove and where to replace as in:

import pandas as pd
import datetime

df = pd.DataFrame.from_dict({'a': [1,2,3,datetime.datetime(1900, 1, 19, 0, 0),4,5,datetime.datetime(1900, 1, 19, 0, 0),6]})

print("Original:")
print(df)

df_drop = df.drop(df.loc[df['a'] == datetime.datetime(1900, 1, 19, 0, 0)].index)

print("Rows dropped:")
print(df_drop)

df_replace = df.copy()

df_replace['a'] = df_replace['a'].where(df_replace['a']!=datetime.datetime(1900, 1, 19, 0, 0),-1)

print("Rows replaced:")
print(df_replace)

which results in:

Original:
                     a
0                    1
1                    2
2                    3
3  1900-01-19 00:00:00
4                    4
5                    5
6  1900-01-19 00:00:00
7                    6
Rows dropped:
   a
0  1
1  2
2  3
4  4
5  5
7  6
Rows replaced:
    a
0   1
1   2
2   3
3  -1
4   4
5   5
6  -1
7   6
  • Related