---
title: "Step-by-Step"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Step-by-Step}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  markdown: 
    wrap: 72
---

```{r options, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = ""
  
)
```

```{r setup, echo=FALSE}
library(stringr)

```

### Files

Several files are needed for execution of the MBBE package. These are:

1.  Model files. Standard NONMEM control files, one or more describing
    one or more candidate models to be averaged.

2.  Data file(s) - one or two. An analysis data set for the NONMEM model
    estimation in the bootstrap step is required. In addition, a data
    file for simulation of the intended virtual BE study is typically
    used. The analysis and simulation data sets may be the same file.

3.  The MBBE specification file. A json format file describing the
    analysis methods and parameters.

4.  (Optional) User supplied R code that returns a value that can be
    added to the default model selection criteria (BIC) used for model
    averaging. The file must be called "RPenaltyCode.R" and include a 
    function called MBBE_RPenaltyCode. The function MBBE_RPenaltyCode must return a 
    numeric penalty.

## First you need model(s)

Five models are provided with the MBBE package. Any number of model \> 1
can be used, bearing in mind that the total number of bootstrap samples
will be the number of models x the bootstrap sample size. In the present
example, models were selected using
[pyDarwin](https://certara.github.io/pyDarwin/html/index.html). Example
models (model1.mod - model5.mod) are at:

```{r echo = FALSE}
models_path <- system.file(package = "mbbe","examples")
```

```{r, echo=FALSE}
file.path(models_path,list.files(models_path, pattern = "\\.mod$"))
```

Let's view an example model file:

```{r, echo = FALSE}
readLines(list.files(models_path, pattern = "\\.mod$", full.names = TRUE)[1])
```

These model need not have a \$COV record (unless that is needed for any
user defined R code). There are few restriction on the structure of the
model. However, there are a significant number of requirement for both
the control file and data file format. It is recommended that the full
path to the data set be provided in the \$DATA record.

There are four critical requirements of the model files data sets. There
are:

1.  The values

-   TRT (treatment)

-   GROUP

-   PERIOD

-   SEQ (sequence)

    Note, these data items must be provided in the analysis data set as
    well as the simulation data set, even if they are not used. The
    GROUP (treatment and reference groups) must be used in the model.
    However, PERIOD and SEQ need not be included in the analysis model
    (and so can be all the same values, e.g., all "1"s). All four data
    items need to be included in the simulation data set (reflecting the
    multiple period cross over study design), and so must also be in the
    analysis data set and the \$INPUT record in the analysis control
    file(s).

2.  Only THETAs may be estimated. To accomplish this, \$OMEGA and
    \$SIGMA block can be FIXed to 1 and ETAs and EPSs multiplied by a
    THETA, see examples at

```{r include=FALSE}
models_path <- file.path(system.file(package = "mbbe"),"inst","examples")
if( .Platform$OS.type == "windows" )
  models_path <- str_replace_all(models_path,"//","\\")
```

```{r, echo=FALSE}
cat(models_path)
```

3.  The \$ESTIMATION record, followed by the \$THETA block MUST be the
    final two blocks in the control file (with the exception of any
    \$TABLE records that might be needed for any user supplied R code).

4.  The line

\|;;;; Start EST.

must appear immediately before the \$EST record. The reason for these
two requirements is that the simulation control files are constructed
from the analysis control files. All content in the analysis control
file is copied, then everything after the line

\|;;;; Start EST

is removed and replace with a \$SIM record, followed by the theta
estimates extracted from the .xml file for that model and bootstrap
sample. A \$TABLE record is then added, with data items required for the
power calculation. Note that this \$TABLE record appended to the
simulation control file will include:

-   TRT

-   GROUP

-   PERIOD

-   SEQ

and so these data must be available, either (preferably) in both the
analysis and simulation data sets or calculated in the analysis control
file. In addition the \$DATA record is edited (programmatically by the R
code) to reference the simulation data set, rather than the analysis
data. Note that the \$INPUT record is preserved unchanged between the
user provided analysis control file and the generated simulation control
file, and so the structure of the analysis and simulation data sets must
be the same.

If identifiability check is requested, then the option "SADDLE_RESET=1"
must be included in the \$ESTIMATION record of the analysis control
file.

## Next you need two data sets (although they can be the same one)

Two formatted data set are provided here:

```{r include=FALSE}
data_path <- file.path(system.file(package = "mbbe"),"inst","examples","data.csv")
data_sim_path <- file.path(system.file(package = "mbbe"),"inst","examples","data_sim.csv")
if( .Platform$OS.type == "windows" )
  data_path <- str_replace_all(data_path,"//","\\")
if( .Platform$OS.type == "windows" )
  data_sim_path <- str_replace_all(data_sim_path,"//","\\")
```

```{r, echo=FALSE}
cat(data_path)
cat(data_sim_path)
```

Note that, in this case, these two data sets are identical. In general,
the analysis data set, for bootstrap, will not be the same as the
simulation data. For example, it may be the case that only sparse data
may be available for analysis (e.g., clinical scenarios were few samples
can be collected), or a cross over study may not be feasible (e.g., long
acting injectable drugs). Still the simulation must be done using a
traditional study design, e.g., 4 period, cross over study. The typical
scenario is to use a different study design that the study(ies) that
generated the analysis data, assuming that if the analysis data were
from an adequate BE study, those data would be used to assess BE, rather
than MBBE.

The analysis data set is specified on the \$DATA record in the NONMEM
control files for the bootstrap. The simulation data set is specified as
in the MBBE specification file:

```{r include=FALSE}
MBBE_file <- file.path(system.file(package = "mbbe"),"inst","examples","mbbeargs.json") 
if( .Platform$OS.type == "windows" )
  MBBE_file <- str_replace_all(MBBE_file,"//","\\")
```

```{r, echo=FALSE}
cat(MBBE_file) 
```

Several notes on the data:

1.  As discussed above, data item for "GROUP" (1,2 are reference, 3,4
    are test in the examples) are included both in the analysis data set
    and the simulation data set. This is the simplest way to include
    these required variables in the simulation and analysis. The example
    simulation data set (data_sim.csv) and the source data set
    (data.csv) are both 4 period cross over studies, and so there are 4
    values for group (although only 2 formulations). It is not a
    requirement that the source data be a typical 4 period cross over
    study, although data for both test and reference formulation need to
    be included in order to estimate any formulation effects.

2.  Data item for "PERIOD" is provided (1-4). In general, between
    occasion variability (e.g., by study period) should be assessed and
    included in the model(s) if supported. The PERIOD data item is
    recommended in both the analysis and simulation data sets. If it is
    not included in the data set it must be calculated in the control
    file so it can be included in \$TABLE output for the simulations,

3.  Data item for sequence ("SEQ") is provided (1-2). Presumably, period
    and sequence were tested in the model selection and not found to be
    supported. However, similar to PERIOD, SEQ is required for the Monte
    Carlo simulation and statistical testing to be available for output
    in the \$TABLE step.

## User supplied penalty R code

The model averaging is, by default, based on the Bayesian Information
criteria (BIC), with the option to exclude any results that fail the
[Identifiability
test](https://www.page-meeting.org/pdf_assets/1345-PAGE_2017_SADDLE_RESET_Final.pdf).
Another, more general option is to include a user defined R function, to
return a penalty that is added to the BIC. The path to the file
containing this function can be specified in the R_code_path option in
the MBBE specification file, and the user_R\_code option set to "true".
The function must take the arguments:

-   run_dir - the parent run directory, as specified in the MBBE
    specification file

-   this_model - the model number

-   this_samp - the bootstrap sample number.

These arguments can then be used to access any output from the NONMEM
bootstrap model, such the .lst file, the .xml file. The path to the
model output files (.xml, .lst, .ext etc., as well as any \$TABLE
output) will be

run_dir/modelM/N

Where run_dir is the run directory (run_dir) specified in the MBBE
specification file, M is the model number and N is the bootstrap sample
number. The NONMEM output (.lst) and xml (.xml) files will be called:

bsSampM_N.lst

and bsSampM_N.xml

respectively, where M is the model number and N is the bootstrap sample
number. Other standard NONMEM output (e.g., .ext, .cor etc.) are
similarly named. Any \$TABLE file output, including \$SIM output, that
is specified in the analysis control files can be accesses as well,
using the file structure information described above.

The user defined R function should return, in all case, a numeric
penalty. An example R script to return a penalty for bias in simulated
Cmax is given in RPenaltyCode.R. This function requires that a 2nd
problem (for simulation) be included in the control files. An example
where RPenaltyCode.R returns a penalty for bias in prediction of Cmax is
given in:

```{r include=FALSE}
Rcode_path <- file.path(system.file(package = "mbbe"),"inst","example","RPenaltyCode.R")
if( .Platform$OS.type == "windows" )
  Rcode_path <- str_replace_all(Rcode_path,"//","\\")
```

```{r, echo=FALSE}
cat(Rcode_path)
```

### The MBBE Specification file

An example MBBE specification file is given at:

```{r include=FALSE}
MBBE_file <- file.path(system.file(package = "mbbe"),"inst","examples","mbbeargs.json")
if( .Platform$OS.type == "windows" )
  MBBE_file <- str_replace_all(MBBE_file,"//","\\")

```

```{r, echo=FALSE}
cat(MBBE_file)
```

The json file consists of key-value pairs. All pairs are required,
except "R_code_path": is not required if "user_R\_code": if set to
false.

The key-value pairs are:

##### "run_dir":

(string) parent directory where NONMEM is to be run. The bootstrap runs
will be executed in run_dir/modelM/N.

Where M is the model number and N is the bootstrap sample number. The
output (.lst) and xml (.xml) files will be bsSampM_N.lst and
bsSampM_N.xml.

##### "model_source":

(list) A list of paths to the candidate models. JSON syntax for list is:
["path1", "path2, ...]. Note that separator in path will be forward
slash "/" for any operating system.

##### "num_parallel":

(integer) Number of NONMEM executions to be run in parallel. In general
should not exceed the number cores.

##### "crash_value":

(numeric) A value to be assigned to the model selection criteria if the
calculation fails. Should be larger than any anticipated value for BIC +
any R penalty.

##### "nmfe_path":

(string) Path to nmfe??.bat. Note that separator in path will be forward
slash "/" regardless of operating system.

##### "delta_parms":

(numeric) Largest absolute fractional difference between any parameters
before and after the SADDLE_RESET that is permitted. Any model/sample
with a value larger than delta_parm will result in a failed
identifiability test for that model/sample and will be excluded for the
model averaging. If all models for a given sample fail the
identifiability test, that sample is excluded from the Monte Carlo
simulation.

##### "use_check_identifiable":

(logical) true or false, if identifability is to be checked.

##### "NCA_end_time":

(numeric) End time for calculation of AUC.

##### "rndseed":

(integer) Random seed for Bootstrap sampling and Monte Carlo Simulation.

##### "simulation_data_path":

(string) Path to simulation data set. Note that the separator in path
will be operating system dependent, as this text is simply copied
unchanged into the \$INPUT record.

##### "ngroups":

(integer) Number of GROUPs.

##### "samp_size":

(integer) Number of bootstrap and Monte Carlo Simulations.

##### "reference_groups":

(list of integers) Which of the GROUPs are reference formulation

##### "test_groups":

(list of integers) Which of the GROUPs are test formulation

##### "plan":

(string)- strategy for running R parallel. [Options
are](https://rdrr.io/cran/future/man/plan.html):

-   sequential: Resolves futures sequentially in the current R process,
    e.g. plan(sequential).

-   multisession: Resolves futures asynchronously (in parallel) in
    separate R sessions running in the background on the same machine,
    e.g. plan(multisession) and plan(multisession, workers = 2).

-   multicore: Resolves futures asynchronously (in parallel) in separate
    forked R processes running in the background on the same machine,
    e.g. plan(multicore) and plan(multicore, workers = 2). This strategy
    is not supported on Windows.

##### "alpha_error":

(numeric)- alpha error for computing one sided T tests.

##### "NTID":

(logical) true or false, whether this is a narrow therapeutic index
drug.

##### "model_averaging_by":

(string) "study" \| "subject". "study" is recommended, "subject" is
experimental.

##### "user_R\_code":

(logical) true or false, if user supplied R code is to be run and
included in model averaging penalty. If true, R_code_path is required.

##### "R_code_path":

(string) - path to user supplied R code for calculating additional
penalty for model averaging.

### Execution

MBBE is executed from the command line with

*mbbe::run_mbbe_json("path")*

where *"path*" is the path to the MBBE specification file.

## Output

## Console output

The start date and time of each event in the MBBE process will be
listed. The initial output from MBBE describes the options for running
MBBE and results of checks.

When the bootstrap starts, MBBE will print out a progress line that
describes what percent of the bootstrap models have been started.
Completion of the bootstrap step will occur after all the models are
started, and then completed.

Various status update message will be displayed as the bootstrap is
completed, model parameters are collected, if requested, user supplied R
code penalty etc.

Once the BIC and total penalty values are calculated, they will be
displayed.

When available, the estimated power for Cmax, AUC~last~ and
AUC~infinity~ will be displayed and the number of bootstrap sample for
each model that are identifiable is output.

Finally, the statistical output includes, for each Monte Carlo
simulation, whether the parameters (Cmax, AUC~last~ and AUC~infinity~)
are bioequivalence, the ratio for each and the upper and lower limits of
the confidence interval for the ratio.

### File output

Comma separated value (.csv) files with BIC values will be written to

run_dir/BICS.csv

and total penalties written to

run_dir/Total_penalties.csv

where run_dir is the run directory specified in the MBBE specification
file.

The power for each parameter is written out to the file

run_dir/MBBEpower[data-time].csv

where [data-time] is the analysis completion date and time.

All statistical results are written to run_dir/All_results.csv.

## Running the example.

An example can be run using the provide control files, data sets and
mbbe options file with the command:

mbbe::run_example(run_dir, nmfe??.bat, Include_R\_Code, plan)

where:

-   run_dir (string) is the parent directory for NONMEM execution

-   nmfe??.bat (string) is the full path to the nmfe??.bat file, where
    ?? is the NONMEM version (e.g., c:/nmfe74/util/nmfe74.bat)

-   Include_R\_Code = (logical), whether to include the (provided)
    script for a penalty in the model averaging. R script can be found
    in the file RPenaltyCode.r in run_dir. Default value is FALSE

-   plan is one of "sequential", "multicore", "multisession" with a
    default of "multisession"

The num_parallel will be set to the (number of cores - 1) or 1 if there
is only 1 core, and the analysis executed. Execution on a 32 core
machine with Include_R\_Code set to FALSE takes \~4 minutes and \~8
minutes on a 4 core machine. The sample size for the bootstrap and Monte
Carlo simulation is set to only 4, which will give a very poor estimate
of power. To reduce execution time, the FO method is used for the
bootstrap, rather than FOCEI.

Once the command is issued, the code will check if the run directory
(run_dir) is present, and if so, will ask permission to delete it. The
dialog that appears for ask permission may not be on top. If the
directory needs to be deleted, click the "OK" button.

Once started, various checks are performed and display of many of the
analysis parameters including the path(s) to the analysis model(s). An
example, where the run_dir is set to c://mbbe_ex is given below.

`if c:/mbbe_ex exists, a dialog will appear asking permission to remove it, it may not be on top`

`No additional penalty for model averaging will be used`

`MBBE will be run 7 x parallel`

`Estimated run time for example = 14 minutes`

`Example mbbe arguments file is at c:/mbbe_ex/example.json`

`2023-11-21 11:33:12 Start time`

`Model file(s) =`

`c:/mbbe_ex/model1.mod`

`c:/mbbe_ex/model2.mod`

`c:/mbbe_ex/model3.mod`

`c:/mbbe_ex/model4.mod`

`c:/mbbe_ex/model5.mod`

`reference groups = 1, 2`

`test groups = 3, 4`

`Run Directory = c:/mbbe_ex/example`

`Number of groups = 4`

`Model averaging will be by study`

`Bootstrap/Monte Carlo sample size = 5`

`nmfe??.bat path = c:/nm74g64/util/nmfe74.bat`

`Use_check_identifiability = FALSE`

`Narrow Therapeutic Index  = FALSE`

`Alpha error rate for bioequilvalence testing = 0.05`

`Number parallel runs for bootstrap, simulations and NCA = 7`

`Simulation data set = c:/mbbe_ex/data_sim.csv`

`Simulation data path = c:/mbbe_ex/data_sim.csv`

`Not using post run R code for model averaging selection`

After displaying the analysis parameters and options, the list of file
outputs will be displayed, e.g.,

`BICs values will be written to c:/mbbe_ex/example/BIC.csv`

`Total Model averaging penalties will be written to c:/mbbe_ex/example/Total_penalties.csv`

After that, the parameter values will be checked for consistent, the
path(s) to nmfe??.bat and the R code if requested will be checked. If
requirements are passed, the message:

`Passed requirements check`

will be displayed, otherwise error messages will be shown.

Once the checks are execution will begin, with messages about copying
the control files, sampling the data and starting the bootstrap model
runs. During the bootstrap model runs, a progress bar will be shown,
based on the number of bootstrap models **started** (not finished).

Once the bootstrap model runs are done, the BIC values will be
displayed, e.g.,

`BICS =`

`Model1   Model2   Model3   Model4   Model5 Best Max_Delta_parm Max_Delta`

`1 9624.202 9643.253 9626.252 9650.211 9643.971    1           -999      -999`

`2 9653.113 9696.959 9653.207 9658.488 9651.800    5           -999      -999`

`3 9272.172 9354.862 9272.172 9318.539 9304.562    1           -999      -999`

`4 9731.254 9779.648 9731.458 9754.807 9771.924    1           -999      -999`

`5 9147.007 9181.334 9143.620 9167.294 9155.821    3           -999      -999`

In this case, identifiability was not requested and so the
Max_Delta_parm and Max_Delta are not available (value = -999).

After the BIC values the total penalties (which is BIC+ any penalty from
the user supplied R code) will be displayed. In this case, as no use
supplied R code is used, they will be the same as the BICs.

`Total Penalties for bootstrap models =`

`Model1   Model2   Model3   Model4   Model5`

`1 9624.202 9643.253 9626.252 9650.211 9643.971`

`2 9653.113 9696.959 9653.207 9658.488 9651.800`

`3 9272.172 9354.862 9272.172 9318.539 9304.562`

`4 9731.254 9779.648 9731.458 9754.807 9771.924`

`5 9147.007 9181.334 9143.620 9167.294 9155.821`

Next, messages will appear about the simulations, and a progress bar
reflecting the number simulations

that have started. After simulations are done, the NCA parameters from
the simulations are calculated. Histograms are generated from these NCA
parameters (Cmax, AUCinf and AUClast). The path to these plots are
displayed:

`2023-11-21 11:39:32 Plots are saved in c:/mbbe_ex/example`

The elapse time is then shown, and the power for finding BE is given
for Cmax, AUClast and AUCinf:

`$Cmax_power`

`Power`

`1     1`

`$AUClast_power`

`Power`

`1     1`

`$AUCinf_power`

`Power`

`1     1`

The number of samples for each model that are identifiable is printed,
and finally the NCA statistics (Ratios of reference to test, upper and
lower limits and whether BE is shown) are printed

`$NCA_stats`

`Cmax_BE Cmax_Ratio Cmax_lower_CL Cmax_upper_CL AUClast_BE AUClast_Ratio AUClast_lower_CL`

`1    TRUE   99.63386     0.9880040     1.0047434       TRUE      99.39984        0.9900570`

`2    TRUE   98.85888     0.9807576     0.9964826       TRUE      99.63649        0.9927208`

`3    TRUE   98.86239     0.9802479     0.9970713       TRUE      99.48245        0.9913762`

`4    TRUE   99.83804     0.9898971     1.0069365       TRUE     100.06885        0.9970103`

`5    TRUE   99.75044     0.9885444     1.0065455       TRUE      99.76438        0.9944456`

`AUClast_upper_CL AUCinf_BE AUCinf_Ratio AUCinf_lower_CL AUCinf_upper_CL`

`1        0.9979555      TRUE     99.39840       0.9900409       0.9979428`

`2        1.0000224      TRUE     99.63729       0.9927314       1.0000279`

`3        0.9982847      TRUE     99.48119       0.9913649       0.9982709`

`4        1.0043803      TRUE    100.06714       0.9969955       1.0043608`

`5        1.0008522      TRUE     99.76512       0.9944545       1.0008582`

The histograms for AUCinf, AUClast and Cmax for test and reference in
the simulated studies are shown below. This plots are also saved to the
run directory.

![](images/All_models_AUCinf_histogram_by_treatment.jpeg){width="460"}

![](images/All_models_AUClast_histogram_by_treatment.jpeg){width="460"}

![](images/All_models_Cmax_histogram_by_treatment.jpeg){width="460"}