Question for the experts: how do I add an auto-increment index column to a DataFrame or RDD?

Time:09-20

As the title says: suppose I currently have a DataFrame (or an RDD converted from it) with these rows:
a, b, c
d, e, f
g, h, i
Now I want to add a column in front that holds an increasing sequence number:
1, a, b, c
2, d, e, f
3, g, h, i
The result can be either a DataFrame or an RDD.
Could any of the experts tell me how to do this?

User response:

There are two ways.
The first is a global generator (for example, a Sequence-type node, or an ID-issuing service that keeps handing out increasing values), but its efficiency is low.
The second is mapPartitions: get the index of the current partition, then give each row the value partition index × coefficient + a locally increasing counter, where the coefficient is the row count of the largest partition plus some headroom.
The former is the most convenient option; the latter is the fastest but easier to get wrong.
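A minimal sketch of the second approach, assuming partitions can be modeled as plain Python lists so the arithmetic can be checked without a cluster. `index_partition` and `add_index` are hypothetical names. `index_partition` takes (partition index, iterable of rows), which is the shape PySpark's `RDD.mapPartitionsWithIndex` passes to its function, so in a real job it could presumably be wired in as `rdd.mapPartitionsWithIndex(lambda i, it: index_partition(i, it, coefficient))`. Here the coefficient is computed exactly from the largest partition plus optional headroom; in Spark that would cost an extra counting pass.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, ...]


def index_partition(partition_index: int, rows: Iterable[Row],
                    coefficient: int) -> Iterator[tuple]:
    """Prefix each row with partition_index * coefficient + local position.

    Local positions start at 1 to match the numbering in the question.
    Ids stay unique only while no partition holds more than `coefficient` rows.
    """
    for local, row in enumerate(rows, start=1):
        if local > coefficient:
            # This id would collide with the first id of the next partition.
            raise ValueError(
                f"partition {partition_index} exceeds coefficient {coefficient}")
        yield (partition_index * coefficient + local, *row)


def add_index(partitions: List[List[Row]], headroom: int = 0) -> List[List[tuple]]:
    """Hypothetical driver: coefficient = largest partition size + headroom."""
    coefficient = max((len(p) for p in partitions), default=0) + headroom
    return [list(index_partition(i, p, coefficient))
            for i, p in enumerate(partitions)]


# Sample rows from the question, split into two partitions.
parts = [[("a", "b", "c"), ("d", "e", "f")], [("g", "h", "i")]]
print(add_index(parts))              # ids 1, 2, 3
print(add_index(parts, headroom=8))  # ids 1, 2, 11
```

With headroom the ids stay unique and increasing but gain gaps. For comparison, Spark SQL's `monotonically_increasing_id()` is documented to use the same idea with a fixed split (partition ID in the upper 31 bits, record number within the partition in the lower 33 bits), so its values are unique and increasing but not consecutive.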

User response:

One more option, and the most convenient one, is to repartition to a single partition: with only one partition, a local increment is automatically global. But it can easily run out of memory; with a large amount of data you will hit OOM.
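The reply's point is that a single partition makes a local counter global but forces every row through one task. The rough pure-Python model below contrasts that with a two-pass alternative, which as far as I know is the idea behind Spark's `RDD.zipWithIndex` (documented as ordering by partition index, then by position within each partition, and as triggering a Spark job when there is more than one partition): count each partition first, turn the counts into starting offsets, then number rows locally. Partitions are plain lists here, and `index_single_partition` and `index_with_offsets` are hypothetical helper names.

```python
from itertools import accumulate
from typing import List, Tuple

Row = Tuple[str, ...]


def index_single_partition(partitions: List[List[Row]]) -> List[tuple]:
    """The reply's approach: gather every row into one partition, then count.

    Simple, but the whole dataset must pass through a single task.
    """
    merged = [row for p in partitions for row in p]
    return [(i, *row) for i, row in enumerate(merged, start=1)]


def index_with_offsets(partitions: List[List[Row]]) -> List[List[tuple]]:
    """Two-pass alternative: per-partition counts -> offsets -> local numbering."""
    # Pass 1: one small integer per partition.
    sizes = [len(p) for p in partitions]
    # Exclusive prefix sum: partition k starts after all rows of partitions 0..k-1.
    offsets = [0] + list(accumulate(sizes))[:-1]
    # Pass 2: each partition numbers its own rows, shifted by its offset.
    return [[(offsets[k] + local, *row) for local, row in enumerate(p, start=1)]
            for k, p in enumerate(partitions)]


parts = [[("a", "b", "c"), ("d", "e", "f")], [], [("g", "h", "i")]]
print(index_single_partition(parts))
print(index_with_offsets(parts))
```

In a Spark setting the offsets step only needs one integer per partition rather than every row in one task, so it sidesteps the OOM risk while still giving consecutive numbers. Note that `zipWithIndex` starts at 0, so add 1 to match the numbering in the question.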

User response:

I need to solve a similar problem. Did the original poster ever find a solution?