I have a DataFrame with a MultiIndex, built as follows:
import numpy as np
from itertools import product
import pandas as pd
c1 = np.arange(3,5,1)
c2 = np.arange(7,9,1)
c3 = np.arange(0,135,45)
df= pd.DataFrame(list(product(c1, c2, c3)), columns=['c1', 'c2','c3'])
df['c4'] = df.index
df = df.set_index(['c1', 'c2','c3'])
When I save the DataFrame to CSV, the MultiIndex levels c1, c2, c3 are written out in full on every row, so their values repeat. Since equal values of c1 and c2 always occur on consecutive rows, I want each of them to appear only once in the CSV file. How can I mask these repeated values in pandas before saving to CSV?
CodePudding user response:
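You can mask the repeated values before calling to_csv.
Note that this works on plain columns, so skip the set_index call (or call df.reset_index() first). The result is assigned back to the column instead of using inplace=True on df.c2, so it also works when copy-on-write is enabled, where an inplace call on df.c2 would not change df.
df['c2'] = df['c2'].mask(df.duplicated(['c1', 'c2']), '')
df['c1'] = df['c1'].mask(df.duplicated('c1'), '')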
df
Out[415]:
    c1  c2  c3  c4
0    3   7   0   0
1           45   1
2           90   2
3        8   0   3
4           45   4
5           90   5
6    4   7   0   6
7           45   7
8           90   8
9        8   0   9
10          45  10
11          90  11
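For completeness, here is a minimal end-to-end sketch of the same idea that also writes the CSV. The helper name `blank_repeats` is hypothetical (it is not part of pandas), and the sketch assumes, as in the question, that equal key values sit on consecutive rows; `duplicated` flags every later occurrence, not only adjacent ones. It writes to an in-memory buffer so it can run without touching the disk.

```python
import io
from itertools import product

import numpy as np
import pandas as pd

# Rebuild the frame from the question, without calling set_index.
c1 = np.arange(3, 5, 1)
c2 = np.arange(7, 9, 1)
c3 = np.arange(0, 135, 45)
df = pd.DataFrame(list(product(c1, c2, c3)), columns=["c1", "c2", "c3"])
df["c4"] = df.index


def blank_repeats(frame, keys):
    """Hypothetical helper: return a copy of `frame` in which repeated
    leading key combinations are replaced by empty strings."""
    out = frame.copy()
    for i in range(1, len(keys) + 1):
        col = keys[i - 1]
        # A row is a repeat for this column if the key prefix up to and
        # including the column has already been seen in the original frame.
        dup = frame.duplicated(keys[:i])
        # Cast to object so integers and '' can share one column.
        out[col] = frame[col].astype(object).mask(dup, "")
    return out


masked = blank_repeats(df, ["c1", "c2"])

# Write to an in-memory buffer; pass a file path instead to save to disk.
buf = io.StringIO()
masked.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

With `index=False` the row numbers are left out of the file, so the first data rows should read `3,7,0,0` followed by `,,45,1`. Because the duplicate flags are computed on the original frame and the result goes into a copy, the order in which the columns are masked does not matter and `df` itself stays unchanged.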