I have the following CSV file:
A|B|C
1100|8718|2021-11-21
1104|21|
I want to create a dataframe that gives me the date output as follows:
      A     B               C
0  1100  8718  20211121000000
1  1104    21              ""
This means:
    if C is empty:
        put double quotes
    else:
        format the date as yyyymmddhhmmss (padding hhmmss with 0s)
My code:
df['C'] = np.where(df['C'].empty, df['C'].str.replace('', '""'), df['C'] + '000000')
but it gives me the following:
      A     B           C
0  1100  8718  2021-11-21
1  1104    21           0
I have tried another piece of code:
if df['C'].empty:
    df['C'] = df['C'].str.replace('', '""')
else:
    df['C'] = df['C'].str.replace('-', '') + '000000'
OUTPUT:
      A     B               C
0  1100  8718  20211121000000
1  1104    21         0000000
CodePudding user response:
Use dt.strftime:
import pandas as pd
# parse_dates turns column C into datetime64; the empty cell becomes NaT
df = pd.read_csv('data.csv', sep='|', parse_dates=['C'])
df['C'] = df['C'].dt.strftime('%Y%m%d%H%M%S').fillna('""')
print(df)
# Output:
      A     B               C
0  1100  8718  20211121000000
1  1104    21              ""
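For context on why the attempts in the question fail: Series.empty is a single boolean that says whether the whole column has zero elements. It is not a row-by-row test, so np.where and the if statement see one False for every row. The following is a minimal sketch of an element-wise version that keeps the np.where approach. It assumes the column is read as plain strings (no parse_dates) and uses an in-memory copy of the file in place of data.csv; the names csv_text, missing and formatted are only illustrative.

```python
import io

import numpy as np
import pandas as pd

# In-memory stand-in for data.csv (assumption: same content as in the question)
csv_text = "A|B|C\n1100|8718|2021-11-21\n1104|21|\n"
df = pd.read_csv(io.StringIO(csv_text), sep="|")

# .empty is one bool for the whole column, not a per-row mask
print(df["C"].empty)

# isna() yields a per-row mask marking the empty cells
missing = df["C"].isna()

# Strip the dashes and append the zero time part; missing rows stay missing
formatted = df["C"].str.replace("-", "", regex=False) + "000000"

# Pick the literal "" for missing rows, the formatted string otherwise
df["C"] = np.where(missing, '""', formatted)
print(df)
```

Compared with dt.strftime, this variant never parses the dates, so it would not catch a malformed value such as 2021-13-45.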
CodePudding user response:
A good way would be to convert the column to datetime using pd.to_datetime with the parameter errors='coerce', then drop the resulting NaT values.
import pandas as pd
x = pd.DataFrame({
'one': 20211121000000,
'two': 'not true',
'three': '20211230'
}, index = [1])
# Unparseable values become NaT; the integer in 'one' is read as nanoseconds since the epoch
x.apply(lambda col: pd.to_datetime(col, errors='coerce')).T.dropna()
# Output:
1
one 1970-01-01 05:36:51.121
three 2021-12-30 00:00:00.000
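Applied to the data from the question, the same idea might look like the sketch below. It is an adaptation, not the code above: the DataFrame is built by hand to mirror the CSV, and the missing value is replaced with the literal "" and not dropped, because the question wants that row kept. The variable name parsed is only illustrative.

```python
import pandas as pd

# Hand-built stand-in for the CSV from the question (assumption)
df = pd.DataFrame({
    "A": [1100, 1104],
    "B": [8718, 21],
    "C": ["2021-11-21", None],
})

# Anything that cannot be parsed as a date becomes NaT
parsed = pd.to_datetime(df["C"], errors="coerce")

# Format valid dates; NaT comes back as a missing value, which fillna replaces
df["C"] = parsed.dt.strftime("%Y%m%d%H%M%S").fillna('""')
print(df)
```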