I´m using pandasql library in Python to make an special query of DataFrames, I trying to make a new row number column from the combination of some columns. I have the next example:
tabla_1= pd.DataFrame(['a','b','c'],columns=['letras'])
tabla_2= pd.DataFrame([1,2],columns=['numeros'])
tabla_3= pd.DataFrame(['rojo','verde'],columns=['colores'])
pysqldf = lambda q: sqldf(q, globals())
# Here is my query that doesn't work
q = '''
SELECT
letras,
numeros,
colores,
ROW_NUMBER() OVER (PARTITION BY (letras||numeros))
FROM
tabla_1 CROSS JOIN tabla_2 CROSS JOIN tabla_3
'''
pysqldf(q)
The table without the row number looks like this:
And the table with the new row number should look like this:
Row number should create a different number depending on the combinatio of the columns 'letras' and 'numeros'.
I appreciate the help!
CodePudding user response:
No need to use pandasql, you can easily do this using builtin pandas methods:
# Do a cross join
df = tabla_1.merge(tabla_2, how='cross').merge(tabla_3, how='cross')
# Do a groupby/partition and use ngroup to assign group numbers
df['rownum'] = df.groupby(['letras', 'numeros']).ngroup() 1
Result
letras numeros colores rownum
0 a 1 rojo 1
1 a 1 verde 1
2 a 2 rojo 2
3 a 2 verde 2
4 b 1 rojo 3
5 b 1 verde 3
6 b 2 rojo 4
7 b 2 verde 4
8 c 1 rojo 5
9 c 1 verde 5
10 c 2 rojo 6
11 c 2 verde 6