apache spark - How to cube on two columns as if they were one?


I have the following attributes, and I'm interested in performing aggregations (a regular count, for example) on them:

'category', 'sub-category', age, city, education... (around 10 more) 

I'm interested in all possible combinations of these attributes in the group by, so the DataFrame cube function could help me achieve that.

But here is the catch: sub-category does not make sense without category, so in order to achieve this I need to combine rollup(category, sub-category) with cube(age, city, education...).
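To make the requirement concrete, here is a small sketch in plain Scala (no Spark involved) that enumerates the grouping sets such a combination would have to produce. The object and helper names (`GroupingSets`, `rollup`, `cube`, `combined`) are made up for illustration and are not Spark APIs; only two of the free attributes are used to keep the output short.

```scala
// Plain Scala, no Spark needed: enumerate the grouping sets the question asks for.
// All names here are hypothetical helpers, not part of the Spark API.
object GroupingSets {
  // rollup(c1, c2, ...) yields every prefix of the column list, including the empty one.
  def rollup(cols: List[String]): List[List[String]] =
    cols.inits.toList

  // cube(c1, c2, ...) yields every subset of the column list.
  def cube(cols: List[String]): List[List[String]] =
    cols.foldRight(List(List.empty[String])) { (c, acc) =>
      acc.map(c :: _) ++ acc
    }

  // Combining the two pairs every rollup prefix with every cube subset.
  def combined(hier: List[String], free: List[String]): List[List[String]] =
    for (r <- rollup(hier); c <- cube(free)) yield r ++ c

  def main(args: Array[String]): Unit =
    combined(List("category", "sub-category"), List("age", "city"))
      .foreach(gs => println(gs.mkString("(", ", ", ")")))
}
```

With two hierarchical columns and two free columns this gives 3 x 4 = 12 grouping sets, and none of them contains sub-category without category.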

How do I do this?

This is what I've tried, where test is the name of the table:

val data = sqlContext.sql("select category,'sub-category',age from test group by cube(rollup(category,'sub-category'), age )")

And this is the error I get:

org.apache.spark.sql.AnalysisException: expression 'test.category' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

I think what you want is the struct or expr functions to combine the 2 columns into 1, and then use cube on that.

With struct it'd be as follows:

df.rollup(struct("category", "sub-category") as "(cat,sub)")

With expr it's as simple as using "pure" SQL, i.e.

df.rollup(expr("(category, 'sub-category')") as "(cat,sub)")
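As a rough sketch of the struct idea applied to cube rather than rollup, something like the following might work. It assumes a local SparkSession and a DataFrame with columns named category, sub-category and age. The sample rows, the `cat_sub` alias, and the `CubeOnPair` object are invented for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, lit, struct}

object CubeOnPair {
  // Cube over (category, sub-category) treated as ONE grouping key, plus age.
  // "cat_sub" is a made-up alias for the combined struct column.
  def cubeOnPair(df: DataFrame): DataFrame =
    df.cube(struct("category", "sub-category").as("cat_sub"), col("age"))
      .agg(count(lit(1)).as("cnt"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cube-on-pair")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows, only to have something to aggregate.
    val df = Seq(
      ("books", "fiction", 30),
      ("books", "fiction", 40),
      ("books", "science", 30),
      ("music", "jazz", 40)
    ).toDF("category", "sub-category", "age")

    cubeOnPair(df).show(false)
    spark.stop()
  }
}
```

Because the pair is a single grouping key, it is either fully present or fully absent in each grouping set. This avoids sub-category appearing without category, but it also means there is no category-only subtotal, which a true rollup(category, sub-category) would include.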

But I'm just guessing...

