r - Create multiple dataframes by subsetting one dataframe based on the condition of another dataframe
Suppose I have a dataframe df1:
       home.id timeframe_start timeframe_end
    2    58960      1476748800    1477353600
    4    56862      1474329600    1474934400
    6    40482      1454284800    1454889600
    8    52105      1476748800    1477353600
    10   37244      1476748800    1477353600
    12   58213      1476748800    1477353600
    14   17734      1458000000    1458604800
    16   39786      1458000000    1458604800
    18   42613      1458000000    1458604800
Then I have a second dataframe, df2, which includes the same home_ids but has many different rows for each of them (only part of it is displayed here):
         home_id             property_name timestamp_millis     value
    1      58960        inside_temperature     1.475849e+12 18.510000
    2      58960        inside_temperature     1.475850e+12 19.810000
    3      58960        inside_temperature     1.475850e+12 19.630000
    4      58960        inside_temperature     1.475850e+12 19.470000
    5      58960        inside_temperature     1.475850e+12 19.300000
    6      58960        inside_temperature     1.475851e+12 19.470000
    2482   58960 boiler_output_temperature     1.476755e+12 55.000000
    2483   58960 boiler_output_temperature     1.476755e+12 53.000000
    2484   58960 boiler_output_temperature     1.476755e+12 51.000000
    2485   58960 boiler_output_temperature     1.476755e+12 47.000000
    2486   58960 boiler_output_temperature     1.476755e+12 46.000000
    2487   58960 boiler_output_temperature     1.476756e+12 55.000000
    2488   58960 boiler_output_temperature     1.476756e+12 58.000000
    2489   58960 boiler_output_temperature     1.476756e+12 61.000000
Now I want to create, for every row of df1, a dataframe containing the rows of df2 that have the same id and fulfill two conditions: property_name == 'inside_temperature', and the timestamp lies within the df1 columns timeframe_start and timeframe_end.
So as a result I would have created 18 different dataframes, one for each row in df1, that include only 'inside_temperature' rows with timestamp values inside the range defined in df1. For example:
      home_id      property_name timestamp_millis     value
    1   58960 inside_temperature     1.475849e+12 18.510000
    2   58960 inside_temperature     1.475850e+12 19.810000
    3   58960 inside_temperature     1.475850e+12 19.630000
    4   58960 inside_temperature     1.475850e+12 19.470000
    5   58960 inside_temperature     1.475850e+12 19.300000
    6   58960 inside_temperature     1.475851e+12 19.470000
Since I don't have the dataframes to reproduce your code, I'd give a general suggestion: avoid for-loops and keep the data in one place.
You can use the tidyr and purrr packages (together with dplyr for group_by, filter and mutate).
For example:
    library(tidyverse)  # attaches dplyr, tidyr and purrr

    # group by home.id and nest the remaining columns
    df1 <- df1 %>% group_by(home.id) %>% nest()
Then write a function that takes the home.id and the rest of the data, applies the conditions you want to filter df2 by, and gives back a df with the desired rows.
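    getdetails <- function(id, data) {
      # add the conditions used to filter df2
      # note: timeframe_start/timeframe_end look like seconds while
      # timestamp_millis is in milliseconds, hence the * 1000 (drop it if not)
      df2 %>%
        filter(home_id == id &
               property_name == 'inside_temperature' &
               timestamp_millis > data$timeframe_start * 1000 &
               timestamp_millis < data$timeframe_end * 1000)
    }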
Then add a column to df1 that holds a list; each element of the list is the resulting df from the previous step.
    df1 <- df1 %>% mutate(all_data = map2(home.id, data, getdetails))
It might need some modifications, but something like this should work and give you a df with 18 rows holding all the info.
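To make the steps above concrete, here is a minimal end-to-end sketch on made-up toy data. The values in `df1` and `df2` below are invented for illustration, the time frames are assumed to be in seconds and the timestamps in milliseconds, and the final `setNames()` step is just one possible way to get back a separate dataframe per `home.id`.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data, invented for illustration only
df1 <- tibble(
  home.id         = c(1, 2),
  timeframe_start = c(100, 200),   # assumed to be seconds
  timeframe_end   = c(110, 210)
)
df2 <- tibble(
  home_id          = c(1, 1, 1, 2, 2),
  property_name    = c("inside_temperature", "inside_temperature",
                       "boiler_output_temperature", "inside_temperature",
                       "inside_temperature"),
  timestamp_millis = c(105000, 120000, 105000, 205000, 150000),
  value            = c(18.5, 19.8, 55, 19.6, 19.4)
)

# Filter df2 for one home.id, using that home's nested time frame
getdetails <- function(id, data) {
  df2 %>%
    filter(home_id == id &
           property_name == "inside_temperature" &
           timestamp_millis > data$timeframe_start * 1000 &
           timestamp_millis < data$timeframe_end * 1000)
}

# Nest the time frame per home.id, then attach the matching df2 rows
result <- df1 %>%
  group_by(home.id) %>%
  nest() %>%
  ungroup() %>%
  mutate(all_data = map2(home.id, data, getdetails))

# Pull the list-column out as a named list of dataframes, one per home.id
frames <- setNames(result$all_data, as.character(result$home.id))
frames[["1"]]
```

Keeping the results in a list-column (or a named list) rather than 18 separate variables means you can still loop over them with `map()` later, which is the point of keeping the data in one place.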