Mark Christopher Adkins
York University
About Myself:
I'm a fourth-year PhD student at York University in the Department of Psychology, in the Quantitative Methods area. My research interests broadly include open science practices, statistical pedagogy, Monte Carlo simulations, effect size estimation, and integrating technology into classrooms. I've taught short courses introducing R and the Tidyverse, as well as workshops on data cleaning and preregistration.
About the Creator:
Phil Chalmers is an Assistant Professor at York University in the Department of Psychology in the Quantitative Methods area, and an Associate Coordinator in the Statistical Consulting Service. He is an author of multiple R packages hosted on CRAN (e.g., mirt, mirtCAT, SimDesign, matlib), and focuses on research pertaining to psychological measurement, latent variable modelling and computational statistics.
MCS studies are computer-driven experimental investigations in which certain parameters, such as population means and standard deviations that are known a priori, are used to generate random (but plausible) sample data.
Mooney, C. Z. (1997). Monte Carlo Simulation. Thousand Oaks, CA: Sage.
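For example, here is a minimal sketch of that idea in base R (the parameter values are purely illustrative, not from the talk):

set.seed(42)
# population parameters known a priori
pop_mean <- 100
pop_sd   <- 15
# generate a random (but plausible) sample of 50 scores
sample_data <- rnorm(n = 50, mean = pop_mean, sd = pop_sd)
mean(sample_data)  # the sample estimate varies around the known mean of 100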
Photo by Rob Curran on Unsplash and Mika Baumeister on Unsplash
Papers are just advertising for the real work.
Simulations are not much different from empirical research (in fact, MCS is a kind of empirical research).
Without each of these pieces, results can be inconclusive or, worse yet, misleading.
I also recommend you check out "Episode 23: Carlo. Monte Carlo" of the Quantitude podcast on Monte Carlo Simulations hosted by Patrick Curran and Greg Hancock.
https://quantitudepod.org/listen/
To paraphrase my favourite part of that podcast:
Simulations do not replace thinking.
That's it!
That's the slide!
Designing an MCS follows the same principles as any other empirical research. Simulating data doesn't hurt your ability to make inferences. If anything, MCS results can help set up anchor points or boundaries for understanding particular phenomena. We can infer that when a particular set of assumptions is met we have strong evidence, and we can see how estimates (or whatever your outcomes are) are affected as assumptions start to drift.
What if we wanted to explore aggregate differences between two populations: a wait-listed control group and an intervention group?
More specifically, we want to examine how sample size and unequal group variances affect our ability to detect a mean difference between the groups.
I can't stress enough how important it is to have a well-articulated research question IN ADVANCE of actually coding.
As with any statistical test/model, there are many assumptions involved. The beauty of MCS is that you have control over the data generation, so you can ensure that certain assumptions are satisfied while systematically varying the degree of violation for others.
The levels of these factors must also carry substantive meaning and require thought and a defensible justification.
You might start with the ends as anchor points, and then create even intervals between these anchors.
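As a small sketch (the values here are hypothetical, not a recommendation), seq() makes it easy to build evenly spaced levels between two anchor points:

# purely illustrative: anchor each factor at defensible end points
# and fill in evenly spaced levels between them
sample_size <- seq(from = 30, to = 150, by = 30)        # 30, 60, 90, 120, 150
var_ratio   <- seq(from = 1, to = 3, length.out = 5)    # 1.0, 1.5, 2.0, 2.5, 3.0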
Statistical power is the probability that a test will correctly reject the null hypothesis.
Type I errors occur when a test incorrectly rejects the null hypothesis (a.k.a. a false positive).
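To make these definitions concrete, here is a minimal base-R sketch (not part of the SimDesign workflow yet) that estimates the empirical Type I error rate of a two-sample t-test by brute force:

set.seed(123)
p_values <- replicate(1000, {
  g1 <- rnorm(30)   # both groups come from the same population,
  g2 <- rnorm(30)   # so the null hypothesis is true
  t.test(g1, g2)$p.value
})
mean(p_values < .05)   # should land close to the nominal alpha of .05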
The SimDesign package uses a novel approach to conducting MCSs.
It functionally separates the Monte Carlo simulation into three distinct pieces, which are run for R replications within each condition. The SimDesign package emulates the theoretical separation typically found in empirical research, where the analysis stage is agnostic to the data generation process.
The main building blocks are:
the Design object
the Generate(), Analyse(), and Summarise() functions
the runSimulation() function
We begin by running the following code in the R console.
# running this code in the console will create
# a skeleton for our MCS
SimDesign::SimFunctions(file = "mcs", comments = TRUE)
comments = TRUE will insert a few helpful comments throughout the skeleton of our template.
If you are using RStudio, the file will be opened automatically. The file will also be saved into the current working directory (which, if you set up your R project and opened it in RStudio, will be the root directory of the project).
The Design object is how SimDesign keeps your simulation organized and makes it easy to add levels to existing factors, or even to add new factors.
### Define design conditions
Design <- createDesign(factor1 = NA, factor2 = NA)
The resulting Design object is a tibble/data.frame that contains the fully crossed design of all levels of your factors.
Each row is a unique combination of the levels of your factors.
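For instance (the factor names and levels here are only illustrative), two factors with 2 and 3 levels produce a fully crossed design with 6 rows:

library(SimDesign)

Design_example <- createDesign(sample_size = c(30, 60),
                               mean_diff   = c(0, 0.2, 0.5))
Design_example   # a 2 x 3 = 6 row tibble, one row per combination of levels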
pick meaningful factor names
Start small
include all of the relevant factors which pertain to the generation, analysis, or summarization of your data
The Design object is a tibble and can be manipulated using the dplyr "verb" functions.
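A quick sketch of what that looks like (the subset and the added factor are hypothetical):

library(dplyr)

# keep only the null conditions
Design_null <- Design %>% filter(mean_diff == 0)

# add a new factor after the fact
Design_extended <- Design %>% mutate(dist = "normal")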
If you need to add factors or levels later, simply update the createDesign() function and then rerun runSimulation().
Using our research question, update the createDesign()
function in the 'mcs.R' file we created to include any relevant factors (including any new levels you think might be appropriate).
Feel free to add additional factors/levels which might be interesting to you.
What did you come up with?
Were there any new factors (or new levels of old factors)?
### Define design conditions
Design <- createDesign(mean_diff = c(0),
                       sample_size = c(30),
                       var_ratio = c(1))
Notice that I included all relevant factors (for either my generation or analysis stage), each with one initial level.
This first level is a good baseline or reference point for the performance of the other design conditions. Ideally, these initial levels should be "well-behaved" because we are just setting up our simulation for the first time.
Remember, just like empirical research, this is an iterative process and we have to start somewhere.
Next, let's break down the anatomy of the Generate() function:
Like every R function, it needs a name (which is provided), arguments (which are provided), and a body (which is the hard part).
condition: a single row from the Design object we created in step 2
fixed_objects: a list of additional (fixed) objects, if any, made available to the function
Generate <- function(condition, fixed_objects = NULL) {
  # Define data generation code ...

  # Return a vector, matrix, data.frame, or list
  dat <- data.frame()
  dat
}
You can use source() to keep your code organized and easier to debug, and supply any additional objects your functions need via fixed_objects.
Generate()
SimDesign was deliberately designed to be flexible in terms of data structures passed between the core set of functions. Most times, a data.frame/tibble will be sufficient, but you can return any type of object you need.
Whatever you decide, remember that the dat object is what gets passed on to the Analyse() function, so knowing the type of data structure you are passing makes it easier to unpack and use the data in an efficient manner.
For our current example, we know that we are going to conduct some form of a t-test using this data, so I will demonstrate how to create some data in 'long' format.
There is a handy function within the SimDesign package, Attach(), which is similar to base R's attach():

Attach(condition)

This function extracts the objects contained in a list-like object (e.g., a data.frame or tibble) and places them into the environment in which the function was called.
Benefits: the factor levels can be referenced directly by name, with no need to index into the Design object.
Think carefully about how data generation should change as a function of the current condition of the simulation.
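As a rough illustration (assuming sample_size and mean_diff are factors in the current condition row), Attach() saves you from repeatedly indexing into condition inside Generate():

# without Attach(): index into the condition object explicitly
scores <- rnorm(n = condition$sample_size, mean = condition$mean_diff)

# with Attach(): the factor levels become objects in the local environment
Attach(condition)
scores <- rnorm(n = sample_size, mean = mean_diff)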
Example:
For every row in our Design object, data are always sampled from a normal distribution, but the center of that distribution is defined using the current level of a factor.
# generate data
rnorm(n = 30, mean = 0, sd = 1)

Attach(condition)
# generate data
rnorm(n = 30, mean = mean_diff, sd = 1)
Does the current level of a factor alter how a particular function should perform, or does it indicate that an entirely different sequence of code be used?
i.e., can we just change a function argument, or do we need to use logical branching?
Logical branching might be needed to ensure that only the appropriate code is run contingent on the current condition.
Example:
Perhaps we have another factor ("dist") in our Design object that contains information about which distribution to use for sampling.
# generate data
rnorm(n = 30, mean = 0, sd = 1)

Attach(condition)
if (dist == "normal") {
  # use a normal distribution
  rnorm(n = 30, mean = 0, sd = 1)
} else if (dist == "t") {
  # use a t distribution
  rt(n = 30, df = 35)
}
rnorm()
rt()
rbinom()
rHeadrick()
rint()
rinvWishart()
rmgh()
rmvnorm()
rmvt()
rtruncate()
rValeMaurelli()
See {extraDistr} (Wolodzko, 2019) for additional optimized distribution functions outside base R.
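For example, a small hedged sketch drawing from a truncated normal with {extraDistr} (the values are illustrative; argument names follow that package's rtnorm()):

library(extraDistr)

# draw 30 scores from a normal distribution truncated at +/- 2
scores <- rtnorm(n = 30, mean = 0, sd = 1, a = -2, b = 2)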
Use the factor levels from condition to generate two normally distributed samples and save them into a tibble/data.frame to be used later to conduct a t-test.

# open the help documentation for the t.test() and rnorm() functions
?t.test
?rnorm

Hint: the factor levels live in the condition object, and whatever you return becomes the dat object.
Generate()
Generate <- function(condition, fixed_objects = NULL) {
  # Define data generation code ...
  # expose the condition levels
  Attach(condition)

  # define control group variance in case I need to change it
  control_var <- 1

  # create two vectors, group and score, using all
  # of the variables from the current condition
  # (sample_size, mean_diff, and var_ratio)
  dat <- tibble(group = rep(x = c("Group 1", "Group 2"),
                            times = c(sample_size, sample_size)),
                score = c(rnorm(n = sample_size,
                                sd = sqrt(control_var)),
                          rnorm(n = sample_size,
                                mean = mean_diff,
                                sd = sqrt(control_var * var_ratio))))

  # Return a vector, matrix, data.frame, or list
  dat
}
Generate()
Remember, the Generate() function and Design object are just a regular R function and tibble, so we can test them out in the console.
# test the first condition
Design[1, ] %>%
  Generate() %>%
  group_by(group) %>%
  summarise(mean = mean(score),
            sample_size = n(),
            var = var(score))
## # A tibble: 2 x 4
## # Groups:   group [2]
##   group      mean sample_size   var
##   <chr>     <dbl>       <int> <dbl>
## 1 Group 1  0.0897          30 1.06
## 2 Group 2 -0.190           30 0.607
Analyse()
Next, we need to analyze our data. The Analyse() function from our skeleton template should look like this:
Analyse <- function(condition, dat, fixed_objects = NULL) {
  # Run statistical analyses of interest ...

  # Return a named vector or list
  ret <- c(stat1 = NaN, stat2 = NaN)
  ret
}
Keep in mind the steps we want this function to complete:
use the data created by the Generate() function (which is passed in as the dat object)
run the statistical analyses of interest
return the extracted statistics (named ret by default)

Notice that this core function is internally consistent with the Generate() function. The first argument is condition, and this function also has access to objects (if any) that were passed in via the fixed_objects list.
Similar in spirit to the Generate() function, think about which (if any) of your factors might alter the analysis performed.
It is possible that your research question pertains only to the generative mechanisms, and the analysis will remain unchanged (like our current example).
Do our factors require logical branching or simply using function arguments to achieve the desired results?
One big tip here is to never use magic numbers (i.e., numbers intended to be used as constants without explanation).
Taking this one step further, create objects to define and hold these values at the beginning of the function in case you want to systematically vary them using new factors in our Design object later.
These can be hard to foresee, but doing so builds in easier extensibility.
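A small sketch of the idea, using the confidence level as a stand-in constant (this is not part of the workshop's actual Analyse() code):

# harder to extend: the constant is buried inside the call
test_results <- t.test(score ~ group, data = dat, conf.level = 0.95)

# easier to extend: define the value once, in plain sight, so it can later
# be promoted to a factor in the Design object
ci_level <- 0.95
test_results <- t.test(score ~ group, data = dat, conf.level = ci_level)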
Using the data object, dat, conduct a t-test and extract the p-value (and any other statistics you feel might be important).
Analyse()
Here is one possible solution you could try:
Analyse <- function(condition, dat, fixed_objects = NULL) {
  # build in extensibility
  equal_variances <- FALSE

  # run the t-test
  test_results <- t.test(formula = score ~ group,
                         data = dat,
                         var.equal = equal_variances)

  # extract the important values from the t-test
  ret <- broom::tidy(test_results)

  # fix names of ret to be more meaningful
  colnames(ret)[c(1, 2, 3, 6)] <- c("est_mean_difference", "mean_1",
                                    "mean_2", "df")

  ret
}
Analyse()
We could also just extract the p-value like this:
ret <- c(p.value = test_results$p.value)
vignette("available-methods") # check which models have a tidier function
Summarise()
Finally, let's look at the skeleton of our Summarise() function.
Summarise <- function(condition, results, fixed_objects = NULL) {
  # Summarise the simulation results ...

  # Return a named vector of results
  ret <- c(bias = NaN, RMSE = NaN)
  ret
}
I want to highlight a few things here:
the Summarise() function only has access to the results returned from the Analyse() function
This is the point at which having a clearly articulated research question pays off. We should have clearly identified our outcomes PRIOR to coding our MCS.
For each condition, we have a set of R quantities of interest passed on from Analyse() which we need to summarize, each of which must be summarized carefully to avoid any errors.
For me, writing this function is where I encountered the most trouble, and consequently where I spent the most time debugging.
The key was to think about how to condense (i.e., summarize) the statistics I extracted from my Analyse() function down to a single number (or a pair of numbers if you are after confidence intervals), so that I end up with one summary value per outcome for each condition.
Similar to the Analyse() function, think about how you are going to structure the results. The results MUST be returned as a named vector or a data.frame with a single row.
Though as of last week, the development version of SimDesign added support for returning lists from the Summarise() function.
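For reference, a minimal illustration of the two valid return shapes (placeholder names and NaN values only):

# (1) a named vector
ret <- c(edr = NaN, bias = NaN)

# (2) a one-row data.frame
ret <- data.frame(edr = NaN, bias = NaN)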
Computing complicated summary statistics is an easy place to make programming mistakes, and these mistakes are often hard to spot and debug.
To help, SimDesign has many built-in summary functions which are easy to use and, most importantly, less error-prone (because they have been tested) than custom summary functions.
This table can be found on the last page of our article.
Summarise()
Here is one way to compute the empirical detection rate, EDR().
Summarise <- function(condition, results, fixed_objects = NULL) {
  # explicitly set the alpha level
  alpha_rate <- .05

  # Summarise the simulation results ...
  ret <- data.frame(edr = EDR(p = results$p.value, alpha = alpha_rate))

  # Return a named vector of results
  ret
}
Even now we have to think carefully about what our results mean. We just computed the proportion of the R simulations within each condition for which p was less than an \(\alpha\) of .05.
If mean_diff = 0, this proportion is the empirical Type I error rate.
If mean_diff \(\ne\) 0, this proportion is the empirical power.
runSimulation()
Now that we have the necessary building blocks for running a simulation, this final step involves running a single function, runSimulation().
This function will do all of the "heavy lifting" for us.
runSimulation() Arguments
Our skeleton is sufficient to run the simulation immediately.
### Run the simulation
res <- runSimulation(design = Design,
                     replications = 1000,
                     generate = Generate,
                     analyse = Analyse,
                     summarise = Summarise,
                     filename = paste0("SimDesign_first_ex_", lubridate::today()))
show
save=TRUE
runSimulation() Results
The object returned after a successful call to runSimulation() can best be understood as three distinct pieces that have been combined by columns:
the Design object used to control the simulation
the summary statistics returned from the Summarise() function
additional meta-information collected while the simulation ran
Each row of this tibble represents a set of results from a unique condition in your MCS (similar to how each row in the Design object represents a unique MCS condition).
runSimulation() Arguments
But let's take a peek at some of the built-in features which you might want to consider:
save_results = TRUE saves the analysis results from each condition so they can be re-summarised with reSummarise() at a later time
save_seeds = TRUE
save_details = list()
parallel = TRUE
packages = NULL
One thing that can sometimes break your code is attempting to incorporate parallel processing. Thankfully, the functional separation of the MCS components makes it easy to distribute the work to nodes. Usually, there are functions used that are not passed on to the nodes, which results in errors about unknown functions. Simply pass a character vector of package names, and SimDesign will ensure that functions from those packages are available on the nodes as well.
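A hedged sketch of what that call might look like for our example (the package list assumes the Generate() and Analyse() functions above rely on tibble and broom):

res <- runSimulation(design = Design,
                     replications = 1000,
                     generate = Generate,
                     analyse = Analyse,
                     summarise = Summarise,
                     parallel = TRUE,
                     packages = c("tibble", "broom"))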
if( == TRUE) {
return()
} else {
next.slide()
}
Let's be honest, debugging your code is the most time consuming and frustrating part of the coding process. While that might never change, there are some things you can do to make your life easier using SimDesign.
runSimulation() has an argument, debug, which can be used to set a flag to trigger R's interactive debugging mode.
This argument accepts the following options:
max_errors = 50L: a condition is replicated until R replications were successful or max_errors was reached.

# improper level for variance ratio
Design_variance_error <- createDesign(mean_diff = c(0, 0.2),
                                      sample_size = c(30, 60),
                                      var_ratio = c(1, 2, -1))
Design_variance_error
## # A tibble: 12 x 3
##    mean_diff sample_size var_ratio
##        <dbl>       <dbl>     <dbl>
##  1       0            30         1
##  2       0.2          30         1
##  3       0            60         1
##  4       0.2          60         1
##  5       0            30         2
##  6       0.2          30         2
##  7       0            60         2
##  8       0.2          60         2
##  9       0            30        -1
## 10       0.2          30        -1
## 11       0            60        -1
## 12       0.2          60        -1
If multiple conditions fatally terminated, use SimDesign's feedback in the R console to isolate those conditions and check for common levels of a factor.
You can subset the problematic conditions with dplyr::slice() by pasting the row number(s) right into the slice() function:

Design_fatal_conditions <- Design %>% slice(9, 10, 11, 12)
Then run runSimulation() again, but using only the problematic conditions (to save time).
As your data generation and analyses become more complex, it becomes increasingly difficult to recreate or isolate errors. On a more positive note, SimDesign will continue to replicate a condition until R replications occur or it hits max_errors and fatally terminates the condition.
Errors should always be investigated to ensure the quality of your results.
Thankfully, SimDesign captures the errors, their frequency, and the seeds, which can be used to exactly replicate the conditions that triggered the error in the first place.
To illustrate this, I want to manually throw an error in about 10% of my cases when var_ratio == 1. I will place this code into the Analyse() function.
# insert this code into the analysis function
if (rnorm(n = 1) < qnorm(p = .1) && var_ratio == 1) {
  stop("An obscure error has occurred. Alert the proper authorities!!!")
}
Once I rerun the simulation, the results will look a bit different than they did earlier.
Next, we want to use a new function, SimExtract(), and supply the results object and the additional argument what = "errors".
SimExtract(res, what = "errors")
We can also exactly replicate the error state by extracting the error_seeds and supplying them to the original runSimulation() function via the load_seeds argument (i.e., micro reproducibility).
# extract error seeds
error_seeds <- SimExtract(res, what = "error_seeds")

# pass them back into runSimulation() and set a debugging flag
sim_name <- paste0("SimDesign_first_ex_", lubridate::today())
res <- runSimulation(design = Design,
                     replications = 1000,
                     generate = Generate,
                     analyse = Analyse,
                     summarise = Summarise,
                     filename = sim_name,
                     load_seeds = error_seeds,
                     debug = "all")
aggregate_simulations()
So, reviewer 2 thinks you need to increase the number of replications, but you don't want to rerun the original analysis.
# initial simulation
runSimulation(design = Design,
              replications = 500,
              generate = Generate,
              analyse = Analyse,
              summarise = Summarise,
              filename = "first_sim_results")

# second simulation
runSimulation(design = Design,
              replications = 500,
              generate = Generate,
              analyse = Analyse,
              summarise = Summarise,
              filename = "second_sim_results")
First, make sure that both .rds files are located in the current working directory (which will be ensured if you are using an R project).
# aggregate the simulations
aggregate_simulations(files = c("first_sim_results", "second_sim_results"))
reSummarise()
To add additional summary statistics using pre-saved analysis results, simply write a new summary function, then call reSummarise() and pass it the new function and the name of the directory which holds the pre-saved results.
# expand the original Summarise() function
Summarise_extra <- function(condition, results, fixed_objects = NULL) {
  # Summarise the simulation results ...
  ret <- data.frame(edr = EDR(results$p.value),
                    bias = bias(results$est_mean_difference,
                                parameter = condition$mean_diff),
                    coverage = ECR(CIs = cbind(results$conf.low,
                                               results$conf.high),
                                   parameter = condition$mean_diff))

  # Return a named vector of results
  ret
}

# resummarise
sim_name <- paste0("SimDesign_second_ex_", lubridate::today())
reSummarise(summarise = Summarise_extra, dir = paste0(sim_name, "_results"))
There are a few things you can do to increase your chances of macro computational reproducibility.
Using the save_seeds = TRUE argument will save the seeds used to run your simulation. At any time in the future, you can supply these seeds to the load_seeds argument to exactly replicate your entire simulation.
One of my favourite things about the SimDesign package is the set of hidden attributes attached to the results.
summary(res)
You can also pair this with the {renv} package to take a snapshot of your R project which functionally separates the packages needed for the project and ensures that the appropriate versions are installed.
# initialize your R project-local environment
renv::init()

# save the state of the project library lockfile
renv::snapshot()

# collaborators can use {renv} to recreate your R environment
# using the lockfile
renv::restore()
https://rstudio.github.io/renv/articles/renv.html
While all good things must come to an end, I hope that you all were persuaded to write your MCSs using the SimDesign package.
Don't forget to celebrate your victories when it comes to coding. I like to insert little bits of encouragement to let me know when my simulations are complete.
beepr::beep(5)
praise::praise()
SimDesign also has integration with notification packages like {sendmailR} and {RPushbullet} to email or text you when your simulation has finished running.
What are your questions?