data.table - Widening a dataframe to get monthly sums of revenue for all unique values of catogorical columns in R -
i have df has data this:
sub = c("x001","x002", "x001","x003","x002","x001","x001","x003","x002","x003","x003","x002") month = c("201506", "201507", "201506","201507","201507","201508", "201508","201507","201508","201508", "201508", "201508") tech = c("mobile", "tablet", "pc","mobile","mobile","tablet", "pc","tablet","pc","pc", "mobile", "tablet") brand = c("apple", "samsung", "dell","apple","samsung","apple", "samsung","dell","samsung","dell", "dell", "dell") revenue = c(20, 15, 10,25,20,20, 17,9,14,12, 9, 11) df = data.frame(sub, month, brand, tech, revenue)
i want use sub , month key , 1 row every subscriber per month displays sum of revenues unique values in tech , brand subscriber month. example simplified , less columns have huge data set decided try doing data.table
.
i have managed 1 catagorical column, either tech or brand using this:
df1 <- dcast(df, sub + month ~ tech, fun=sum, value.var = "revenue")
but want 2 or more caqtogorical columns, far i've tried this:
df2 <- dcast(df, sub + month ~ tech+brand, fun=sum, value.var = "revenue")
and concatenates unique values of both catogorical columns , sums not want that. wan seperate columns each unique value of catogorical columns.
i'm new r , appreciate help.
(i assume df
data.table
rather data.frame
in example.)
one possible solution first melt
data while keeping sub
, month
, revenue
keys. way, brand
, tech
converted single variable value corresponding each existing combination of keys. way able dcast
operating against single column- in first example
dcast(melt(df, c(1:2, 5)), sub + month ~ value, sum, value.var = "revenue") # sub month pc apple dell mobile samsung tablet # 1: x001 201506 10 20 10 20 0 0 # 2: x001 201508 17 20 0 0 17 20 # 3: x002 201507 0 0 0 20 35 15 # 4: x002 201508 14 0 11 0 14 11 # 5: x003 201507 0 25 9 25 0 9 # 6: x003 201508 12 0 21 9 0 0
as per ops comment, can add prefix adding variable
formula. way, column ordered properly
dcast(melt(df, c(1:2, 5)), sub + month ~ variable + value, sum, value.var = "revenue") # sub month brand_apple brand_dell brand_samsung tech_pc tech_mobile tech_tablet # 1: x001 201506 20 10 0 10 20 0 # 2: x001 201508 20 0 17 17 0 20 # 3: x002 201507 0 0 35 0 20 15 # 4: x002 201508 0 11 14 14 0 11 # 5: x003 201507 25 9 0 0 25 9 # 6: x003 201508 0 21 0 12 9 0
Comments
Post a Comment