apache spark - How to cube on two columns as if they were one?
I have the following attributes that I am interested in performing aggregations (a regular count, for example) on:
'category', 'sub-category', age, city, education... (around 10 more)
I am interested in all the possible combinations of the attributes in the group by, so using the DataFrames cube function lets me achieve that.
But here is the catch: sub-category does not make sense without category, so in order to achieve this I would need to combine rollup(category, sub-category) with cube(age, city, education...).
How do I do this?
This is what I tried, where test is the name of the table:
val data = sqlContext.sql("select category, 'sub-category', age from test group by cube(rollup(category, 'sub-category'), age)")
and the error I get:
org.apache.spark.sql.AnalysisException: expression 'test.category' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
I think you want the struct or expr functions to combine the 2 columns into 1 and use cube on it.
With struct it'd be as follows:
df.rollup(struct("category", "sub-category") as "(cat,sub)")
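For instance, here is a minimal runnable sketch of that idea, using the newer SparkSession API and made-up sample data (only the column names come from the question; everything else is hypothetical). Note that this treats the (category, sub-category) pair as a single grouping unit, so it does not by itself reproduce the category-only level a true rollup would give:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder().master("local[*]").appName("cube-struct").getOrCreate()
import spark.implicits._

// Hypothetical sample data; the columns mirror the ones named in the question.
val df = Seq(
  ("electronics", "phones",  25, "NYC"),
  ("electronics", "laptops", 31, "LA"),
  ("clothing",    "shoes",   25, "NYC")
).toDF("category", "sub-category", "age", "city")

// Combine the two dependent columns into one struct column, then cube over
// that struct together with the independent attributes.
df.cube(struct(df("category"), df("sub-category")) as "cat_sub", df("age"), df("city"))
  .count()
  .show(false)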
With expr it's as simple as using "pure" SQL, i.e.
df.rollup(expr("(category, 'sub-category')") as "(cat,sub)")
but I'm just guessing...
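If neither of those gives exactly the hierarchy you are after, the combination the question describes (a rollup over category and sub-category crossed with a cube over the remaining attributes) can also be written out explicitly with GROUPING SETS. Below is a sketch under those assumptions, keeping only age from the cube side to stay short (extend the sets for city, education, etc.); it assumes the table is registered as test, as in the question, and that the hyphenated column name needs backticks in Spark SQL:

// The GROUPING SETS below are the cross product of the rollup levels
// {(category, sub-category), (category), ()} and the cube levels {(age), ()}.
val combined = sqlContext.sql("""
  select category, `sub-category`, age, count(*) as cnt
  from test
  group by category, `sub-category`, age grouping sets (
    (category, `sub-category`, age),
    (category, `sub-category`),
    (category, age),
    (category),
    (age),
    ()
  )
""")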