r - How to select a subset of least resembling combinations of a common vector? -
i need select 10 times different combinations of 15 items common set of 30 items, several constraints.
first constraint: each of 10 combinations, need remove 3 items should not present in specific combination. let's wf01, wf02 , wf03 first combination; wf04, wf05 , wf06 second combination , on.
second constraint: on ten combinations, each item should present same number of times (5).
third constraint: ten combinations should different possible 1 (i.e. should avoid sharing common items as possible).
i doing in excel, guess there should more efficient way in r!
here coding far:
# data: 30 items (pictures choose from) picturenames <- c("wf01", "wf02", "wf03", "wf04", "wf05", "wf06", "wf07", "wf08", "wf09", "wf10", "wf11", "wf12", "wf13", "wf14", "wf15", "wf16", "wf17", "wf18", "wf19", "wf20", "wf21", "wf22", "wf23", "wf24", "wf25", "wf26", "wf27", "wf28", "wf29", "wf30") set.seed(10) sample01 <- sample(picturenames[4:30], 15) sample02 <- sample(picturenames[c(1:3, 7:30)], 15) sample03 <- sample(picturenames[c(1:6, 10:30)], 15) sample04 <- sample(picturenames[c(1:9, 13:30)], 15) sample05 <- sample(picturenames[c(1:12, 16:30)], 15) sample06 <- sample(picturenames[c(1:15, 19:30)], 15) sample07 <- sample(picturenames[c(1:18, 22:30)], 15) sample08 <- sample(picturenames[c(1:21, 25:30)], 15) sample09 <- sample(picturenames[c(1:24, 28:30)], 15) sample10 <- sample(picturenames[c(1:27)], 15)
up now, implemented constraints manually in excel (i know…). first, ensured each item present 5 times across 10 different combinations. then, created matrix name of picture row , column. each cell counted how many times pair of pictures present in same combination across 10 combinations. value ranges 0 (pictures never together) 5 (pictures together). then, manually shifting pictures between combinations, managed avoid presence of 0s , 5s in matrix. afterwards, implemented optimisation task, each picture should associated many different pictures possible, while respecting previous constraints. so, each picture calculate standard deviation of column (or row) values. then, calculated standard deviation of these standard deviations. shift reduces value while respecting other constraints retained. managed arrive standard deviation of 0.036, have no idea whether best result can get. appreciated!!!
thanks in advance!
edit
after @roman-luštrik's remark, here new version of code. still horrible, , not close optimising matrix, getting there!
# data: 30 items (pictures choose from) picturenames <- data.frame(names = c("wf01", "wf02", "wf03", "wf04", "wf05", "wf06", "wf07", "wf08", "wf09", "wf10", "wf11", "wf12", "wf13", "wf14", "wf15", "wf16", "wf17", "wf18", "wf19", "wf20", "wf21", "wf22", "wf23", "wf24", "wf25", "wf26", "wf27", "wf28", "wf29", "wf30"), stringsasfactors = false) library(dplyr) set.seed(10) sample01 <- data.frame(names = sort(sample(picturenames$names[c(4:30)], 15))) sample02 <- anti_join(picturenames, sample01) sample03 <- data.frame(names = sort(sample(picturenames$names[c(1:6, 10:30)], 15))) sample04 <- anti_join(picturenames, sample03) sample05 <- data.frame(names = sort(sample(picturenames$names[c(1:12, 16:30)], 15))) sample06 <- anti_join(picturenames, sample05) sample07 <- data.frame(names = sort(sample(picturenames$names[c(1:18, 22:30)], 15))) sample08 <- anti_join(picturenames, sample07) sample09 <- data.frame(names = sort(sample(picturenames$names[c(1:24, 28:30)], 15))) sample10 <- anti_join(picturenames, sample09) allsamples <- data.frame(sample01, sample02, sample03, sample04, sample05, sample06, sample07, sample08, sample09, sample10) colnames(allsamples) <- c("sa", "sb", "sc", "sd", "se", "sf", "sg", "sh", "si", "sj") cooccurmat <- matrix(nrow = length(picturenames$names), ncol = length(picturenames$names), dimnames = c(picturenames, picturenames)) for(j in 1:length(picturenames$names)){ for(i in 1:length(picturenames$names)){ cooccurmat[i,j] <- sum(ifelse(sapply(picturenames, function(x)colsums(xor(allsamples == picturenames$names[j], allsamples == picturenames$names[i]))) == 2, 1, 0)) colsd <- apply(cooccurmat, 2, sd) overallsd <- sd(colsd) } } cooccurmat colsd overallsd
Comments
Post a Comment