apache spark - How to cube on two columns as if they were one?
I have the following attributes that I am interested in performing aggregations (a regular count, for example) on:
'category', 'sub-category', age, city, education... (around 10 more)
I am interested in all the possible combinations of the attributes in the group by, so using the DataFrames cube function lets me achieve that.
But here is the catch: sub-category does not make sense without category, so in order to achieve this I would need to combine rollup(category, sub-category) with cube(age, city, education...).
How do I do this?
This is what I tried, where test is the name of the table:
val data = sqlContext.sql("select category, 'sub-category', age from test group by cube(rollup(category, 'sub-category'), age)")
and the error I get:
org.apache.spark.sql.AnalysisException: expression 'test.category' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
I think you want the struct or expr functions to combine the 2 columns into 1 and use cube on it.
With struct it'd be as follows:
df.rollup(struct("category", "sub-category") as "(cat,sub)")
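For instance, here is a minimal runnable sketch of that idea, using the newer SparkSession API and made-up sample data (only the column names come from the question; everything else is hypothetical). Note that this treats the (category, sub-category) pair as a single grouping unit, so it does not by itself reproduce the category-only level a true rollup would give:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder().master("local[*]").appName("cube-struct").getOrCreate()
import spark.implicits._

// Hypothetical sample data; the columns mirror the ones named in the question.
val df = Seq(
  ("electronics", "phones",  25, "NYC"),
  ("electronics", "laptops", 31, "LA"),
  ("clothing",    "shoes",   25, "NYC")
).toDF("category", "sub-category", "age", "city")

// Combine the two dependent columns into one struct column, then cube over
// that struct together with the independent attributes.
df.cube(struct(df("category"), df("sub-category")) as "cat_sub", df("age"), df("city"))
  .count()
  .show(false)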
With expr it's as simple as using "pure" SQL, i.e.
df.rollup(expr("(category, 'sub-category')") as "(cat,sub)")
but I'm just guessing...
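If neither of those gives exactly the hierarchy you are after, the combination the question describes (a rollup over category and sub-category crossed with a cube over the remaining attributes) can also be written out explicitly with GROUPING SETS. Below is a sketch under those assumptions, keeping only age from the cube side to stay short (extend the sets for city, education, etc.); it assumes the table is registered as test, as in the question, and that the hyphenated column name needs backticks in Spark SQL:

// The GROUPING SETS below are the cross product of the rollup levels
// {(category, sub-category), (category), ()} and the cube levels {(age), ()}.
val combined = sqlContext.sql("""
  select category, `sub-category`, age, count(*) as cnt
  from test
  group by category, `sub-category`, age grouping sets (
    (category, `sub-category`, age),
    (category, `sub-category`),
    (category, age),
    (category),
    (age),
    ()
  )
""")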