| Title: | Joint Covariance and Treatment-Effect Tests for Multiple Outcomes |
| Version: | 0.18.1 |
| Description: | Fits generalized linear models, Cox proportional-hazards models, log-rank tests, generalized estimating equations, mixed models with repeated measures, Kaplan-Meier curves, quantile differences, and hierarchical net-benefit (win-difference) and log win-ratio statistics jointly across multiple endpoints, and returns the full asymptotic covariance matrix linking them. Implements PATED (Prognostic Assisted Treatment Effect Detection), a randomized-trial method that exploits balanced prognostic covariates to tighten standard errors and increase statistical power without introducing bias. |
| URL: | https://github.com/zhangh12/multipleOutcomes, https://zhangh12.github.io/multipleOutcomes/ |
| BugReports: | https://github.com/zhangh12/multipleOutcomes/issues |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Imports: | dplyr, ggplot2, ggpubr, mmrm (≥ 0.3.15), mvtnorm, rlang, sandwich, stringr, survival, tidyr, tidyselect |
| Suggests: | asaur, coin, iBST, invGauss, JM, knitr, momentfit, numDeriv, pec, rmarkdown, testthat (≥ 3.0.0) |
| Depends: | R (≥ 3.5) |
| LazyData: | true |
| Config/roxygen2/version: | 8.0.0 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-06-15 07:11:49 UTC; zhhan |
| Author: | Han Zhang [aut, cre] |
| Maintainer: | Han Zhang <zhangh.ustc@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-15 07:40:02 UTC |
ACTG 320 Clinical Trial Dataset
Description
actg dataset from Hosmer et al.
Format
A data frame
- id
Identification Code
- time
Time to AIDS diagnosis or death (days).
- censor
Event indicator. 1 = AIDS defining diagnosis, 0 = Otherwise.
- time_d
Time to death (days)
- censor_d
Event indicator for death (only). 1 = Death, 0 = Otherwise.
- tx
Treatment indicator. 1 = Treatment includes IDV, 0 = Control group.
- txgrp
Treatment group indicator. 1 = ZDV + 3TC. 2 = ZDV + 3TC + IDV. 3 = d4T + 3TC. 4 = d4T + 3TC + IDV.
- strat2
CD4 stratum at screening. 0 = CD4 <= 50. 1 = CD4 > 50.
- sex
0 = Male. 1 = Female.
- raceth
Race/Ethnicity. 1 = White Non-Hispanic. 2 = Black Non-Hispanic. 3 = Hispanic. 4 = Asian, Pacific Islander. 5 = American Indian, Alaskan Native. 6 = Other/unknown.
- ivdrug
IV drug use history. 1 = Never. 2 = Currently. 3 = Previously.
- hemophil
Hemophiliac. 1 = Yes. 0 = No.
- karnof
Karnofsky Performance Scale. 100 = Normal; no complaint no evidence of disease. 90 = Normal activity possible; minor signs/symptoms of disease. 80 = Normal activity with effort; some signs/symptoms of disease. 70 = Cares for self; normal activity/active work not possible.
- cd4
Baseline CD4 count (Cells/Milliliter).
- priorzdv
Months of prior ZDV use (months).
- age
Age at Enrollment (years).
Source
ftp://ftp.wiley.com/public/sci_tech_med/survival
References
Hosmer, D.W. and Lemeshow, S. and May, S. (2008) Applied Survival Analysis: Regression Modeling of Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY
Examples
data(actg)
Extract Model Coefficients
Description
coef is a generic function.
Usage
## S3 method for class 'jointCovariance'
coef(object, model_index = NULL, ...)
Arguments
object |
an object returned by |
model_index |
|
... |
for debugging only |
Value
a vector of coefficient estimates
Creating Objects of Proportional Hazards Regression Model
Description
coxph_ is a wrapper function of survival::coxph to create an object to be
passed into jointCovariance, the main function of this package through its
argument .... The object defines how a proportional hazard model would be
fitted.
Usage
coxph_(formula, data_index = 1)
Arguments
formula |
see |
data_index |
integer. Index of the data frame in the |
Details
Not all arguments of survival::coxph are supported in coxph_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like weights, subset, etc.
Creating Objects of Generalized Estimation Equation Model
Description
gee_ is a wrapper function of gee::gee to create an object to be passed
into jointCovariance, the main function of this package through its
argument .... The object defines how a GEE model would be fitted.
This package does not import the package gee. Instead, codes of gee are
modified and integrated to compute score and information matrix. Thus, users
does not need to install the package gee to use this package.
Usage
gee_(formula, family, corstr, R = NULL, b = NULL, Mv = 1, data_index = 1)
Arguments
formula |
see |
family |
see |
corstr |
see |
R |
see |
b |
see |
Mv |
see |
data_index |
integer. Index of the data frame in the |
Details
Not all arguments of stats::gee are supported in gee_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like subset, etc.
Creating Objects of Generalized Linear Models
Description
glm_ is a wrapper function of stats::glm to create an object to be passed
into jointCovariance, the main function of this package through its
argument .... The object defines how a GLM model would be fitted.
Usage
glm_(formula, family, data_index = 1)
Arguments
formula |
see |
family |
currently supports |
data_index |
integer. Index of the data frame in the |
Details
Not all arguments of stats::glm are supported in glm_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like weights, subset, etc.
Rectal Indomethacin for Prevention of Post-ERCP Pancreatitis
Description
Patient-level data from a multi-center, randomized, double-blind, placebo-controlled 2-arm trial (n = 602) of rectal indomethacin (100 mg) versus placebo to prevent post-ERCP pancreatitis in high-risk patients, as reported by Elmunzer, Higgins, et al. (2012) in the New England Journal of Medicine.
This dataset was originally collected, cleaned, reformatted, and released
for public teaching and research use by Dr. Peter D. R. Higgins in the
medicaldata R package as indo_rct. The version shipped here is
redistributed in support of the worked examples in this package. The
variable definitions below follow medicaldata; note that a few
columns are stored as numeric (rather than factor) in this copy. Users who
need the authoritative copy or accompanying documentation should consult
medicaldata.
Format
A data frame with 602 observations on the following 33 variables:
idSubject identifier (numeric); leading digit indicates center. Range 1001–4003.
siteStudy site (factor, 4 levels):
1_UM= University of Michigan,2_IU= Indiana University,3_UK= University of Kentucky,4_Case= Case Western Reserve University.ageAge in years (numeric), range 19–90.
riskRisk score for post-ERCP pancreatitis (numeric), range 1–5.5.
genderSex (factor):
1_female,2_male.outcomePrimary outcome: post-ERCP pancreatitis (numeric, 1 = yes, 0 = no).
sodSphincter of Oddi dysfunction present (factor):
0_no,1_yes.pepHistory of prior post-ERCP pancreatitis (factor):
0_no,1_yes.recpancHistory of recurrent pancreatitis (factor):
0_no,1_yes.psphincPancreatic sphincterotomy performed (factor):
0_no,1_yes.precutSphincter pre-cut needed to enter papilla (factor):
0_no,1_yes.difcanCannulation of papilla was difficult (factor):
0_no,1_yes.pneudilPneumatic dilation of papilla performed (factor):
0_no,1_yes.ampAmpullectomy performed (factor):
0_no,1_yes.paninjContrast injected into pancreas (factor):
0_no,1_yes.acinarPancreas appeared to have acinarization on imaging (factor):
0_no,1_yes.brushBrushings taken from pancreatic duct (factor):
0_no,1_yes.asa81Aspirin used at 81 mg per day (factor with 3 levels):
0_no,1_yes, and a third level retained from the source coding.asa325Aspirin used at 325 mg per day (factor with 3 levels):
0_no,1_yes, and a third level retained from the source coding.asaAspirin used at any dose (factor with 3 levels):
0_no,1_yes, and a third level retained from the source coding.prophystentPancreatic duct stent placed per endoscopist judgment (factor):
0_no,1_yes.therastentPancreatic duct stent placed to treat narrowing (factor):
0_no,1_yes.pdstentPancreatic duct stent placed for any reason (factor):
0_no,1_yes.sodsomSphincter of Oddi manometry performed (factor):
0_no,1_yes.bsphincBiliary sphincterotomy performed (factor):
0_no,1_yes.bstentBiliary stent placed to relieve obstruction (factor):
0_no,1_yes.choleCholedocholithiasis present (factor):
0_no,1_yes.pbmalBiliary duct or pancreatic malignancy found (factor):
0_no,1_yes.trainTrainee participated in ERCP (factor):
0_no,1_yes.statusPatient status (factor):
0_inpatient,1_outpatient.typeSphincter of Oddi dysfunction type (factor):
0_no SOD,1_type 1,2_type 2,3_type 3.rxTreatment assignment (numeric, 1 = indomethacin, 0 = placebo).
bleedReportable gastrointestinal bleeding (numeric, coded 1 = no, 2 = yes;
NAwhen not assessed).
Source
Higgins, P. D. R. medicaldata: Data Package for Medical Datasets.
R package, dataset indo_rct.
https://CRAN.R-project.org/package=medicaldata
References
Elmunzer BJ, Higgins PDR, Saini SD, et al. A randomized trial of rectal indomethacin to prevent post-ERCP pancreatitis. New England Journal of Medicine 2012; 366(15):1414–1422. doi:10.1056/NEJMoa1111103
Examples
data(indo)
Fitting Regression Models for Multiple Outcomes and Returning the Matrix of Covariance
Description
jointCovariance can fit different types of models for multiple outcomes
simultaneously and return model parameters and variance-covariance matrix
for further analysis.
Usage
jointCovariance(..., data, nboot = 0, compute_cov = TRUE, seed = NULL)
Arguments
... |
objects returned by |
data |
a data frame if all models are fitted on the same dataset;
otherwise a list of data frames for fitting models in |
nboot |
non-zero integer if bootstrap is adopted. By default 0. |
compute_cov |
logic. If |
seed |
random seed when generate bootstrap data. |
Value
It returns an object of class "jointCovariance", which is a list containing the following components:
coefficients | an unnamed vector of coefficients of all fitted models.
Use id_map for variable mapping. |
mcov | a unnamed matrix of covariance of coefficients. Use id_map
for variable mapping. |
id_map | a list mapping the elements in coefficients and mcov to
variable names. |
n_shared_sample_sizes | a matrix of shared sample sizes between datasets being used to fit the models. |
call | the matched call. |
Examples
## More examples can be found in the vignettes.
library(survival)
library(mvtnorm)
library(tidyr)
genData <- function(seed = NULL){
set.seed(seed)
n <- 300
sigma <- matrix(.7, 4, 4)
diag(sigma) <- 1
v <- rmvnorm(n, sigma = sigma)
x1 <- v[, 1]
x2 <- v[, 2]
z1 <- (v[, 3] > 0) + 0
z2 <- v[, 4]
trt <- rbinom(n, 1, .5)
bet <- c(-.3,.3)
y <- -log(runif(n))/
exp(-.3 * x1 + .3 * x2 + z1 * .5 - z2 * .3 + .1 * trt + rnorm(n))
z1[sample.int(n, 50)] <- NA
z2[sample.int(n, 50)] <- NA
x1[sample.int(n, 50)] <- NA
x2[sample.int(n, 50)] <- NA
death <- ifelse(y > 2, 0, 1)
y[y > 2] <- 2
pid <- paste0('pid-', 1:n)
ret <- data.frame(
y = y, trt = trt,
z1 = z1, z2 = z2,
x1 = x1, x2 = x2,
death, pid)
ret
}
dat1 <- genData()
## create a dataset with repeated measurements x
dat2 <- dat1 %>% pivot_longer(c(x1, x2), names_to='visit', values_to='x') %>%
dplyr::select(x, trt, visit, pid) %>% as.data.frame()
dat2$visit <- as.factor(dat2$visit)
dat2$pid <- as.factor(dat2$pid)
fit <- jointCovariance(
coxph_(Surv(time = y, event = death) ~ trt, data_index = 1),
logrank_(Surv(time = y, event=death) ~ trt, data_index = 1),
glm_(z1 ~ trt, family = 'binomial', data_index = 1),
glm_(z2 ~ trt, family = 'gaussian', data_index = 1),
mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2),
gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2),
data = list(dat1, dat2))
fit
bfit <-jointCovariance(
coxph_(Surv(time=y, event=death) ~ trt, data_index = 1),
logrank_(Surv(time=y, event=death) ~ trt, data_index = 1),
glm_(z1 ~ trt, family = 'binomial', data_index = 1),
glm_(z2 ~ trt, family = 'gaussian', data_index = 1),
mmrm_(x ~ trt + us(visit | pid), reml = TRUE, data_index = 2),
gee_(x ~ trt, family = 'gaussian', corstr = 'independence', data_index = 2),
data = list(dat1, dat2), nboot = 10)
summary(bfit)
## km_() and quantile_() require nboot > 0 because they have no
## closed-form score. compute_cov is forced to FALSE for km_().
## When all models share one dataset, `data_index` and `list(...)`
## can be omitted.
kfit <- jointCovariance(
km_(Surv(time = y, event = death) ~ trt, conf_type = 'log',
times = c(0.5, 1, 1.5)),
glm_(z1 ~ trt, family = 'binomial'),
data = dat1, nboot = 30, seed = 1)
qfit <- jointCovariance(
quantile_(y ~ trt, probs = c(0.25, 0.5, 0.75)),
glm_(z2 ~ trt, family = 'gaussian'),
data = dat1, nboot = 30, seed = 1)
Creating Objects of Kaplan-Meier Curve
Description
km_ is a wrapper function creating an object of Kaplan-Meier curve to be
passed into jointCovariance, the main function of this package through its
argument .... The object defines how a Kaplan-Meier curve would be fitted.
Usage
km_(formula, times = NULL, conf_type, data_index = 1)
Arguments
formula |
a formula created by |
times |
numeric vector of time. Survival probabilities at |
conf_type |
character. Type of confidence interval. It must be one of
|
data_index |
integer. Index of the data frame in the |
Details
Usually, g-transformation is applied to the survival probability S(t)
to obtain pointwise confidence interval of a Kaplan-Meier curve. This
can be achieved by specifying conf_type. For identity transformation,
use conf_type = "plain".
This function can only be used with jointCovariance when the bootstrap
method is used to estimate variance-covariance matrix of multiple outcome
models.
Creating Objects of Logrank Test
Description
logrank_ is a wrapper function survival::coxph to create an object to be
passed into jointCovariance, the main function of this package through its
argument .... Logrank test is the score test under the proportional hazards
regression model. The object defines how a logrank test would be computed.
Usage
logrank_(formula, ties = c("efron", "breslow", "exact"), data_index = 1)
Arguments
formula |
see |
ties |
character string specifying the method for tie handling. One of
|
data_index |
integer. Index of the data frame in the |
Details
Not all arguments of survival::coxph are supported in logrank_ due to the
complexity in handling environment and scope, which is particularly difficult
for arguments like weights, subset, etc.
Creating Objects of Mixed Models for Repeated Measures
Description
mmrm_ is a wrapper function of mmrm::mmrm to create an object to be passed
into jointCovariance, the main function of this package through its
argument .... The object defines how a MMRM model would be fitted.
Usage
mmrm_(
formula,
covariance = NULL,
reml = TRUE,
control = mmrm::mmrm_control(...),
...,
data_index = 1
)
Arguments
formula |
see |
covariance |
see |
reml |
see |
control |
see |
... |
see |
data_index |
integer. Index of the data frame in the |
Details
The argument weights of mmrm::mmrm is supported in mmrm_ due to the
complexity in handling environment and scope.
Please always refer to help document of mmrm::mmrm before using mmrm_.
For example, time variable and observation ID must be factor variables in
some cases, otherwise error may be prompted. Users can call mmrm::mmrm
using the same arguments being passed to mmrm_ to check validity.
Endpoint Constructors for netbenefit_()
Description
These helpers build the individual endpoint specifications that
netbenefit_() consumes through its endpoints argument. Each call
returns a small list describing one comparison rule; the order in
which they appear in the endpoints list defines the hierarchical
priority (first = highest).
Usage
nb_tte(
time,
event,
direction = c("longer_better", "shorter_better"),
margin = 0,
censor_rule = c("informative", "ignore")
)
nb_continuous(
value,
direction = c("larger_better", "smaller_better"),
margin = 0
)
nb_binary(value, direction = c("larger_better", "smaller_better"))
Arguments
time |
character. Name of the time column in |
event |
character. Name of the event-indicator column in |
direction |
one of |
margin |
non-negative numeric. Minimum directional gap required to declare a winner on this endpoint. Defaults to 0. |
censor_rule |
( |
value |
character. Name of the outcome column in |
Value
An object of class c("nb_endpoint_<type>", "nb_endpoint")
carrying the rule and its arguments.
Direction
For each endpoint type, direction says whether a larger value (or
longer time, for TTE) counts as a win for the subject. With the
default convention, a treatment subject "wins" the endpoint when its
value, possibly minus a margin, exceeds the control subject's value.
Flip the default for endpoints where smaller (or shorter) is better
(e.g., pain scores; time to symptom relief).
Margin
margin is a non-negative clinically meaningful difference threshold.
A pair only declares a winner when the directional gap strictly exceeds
margin; otherwise the pair ties at this endpoint and falls through to
the next one. nb_binary() does not expose margin because binary
values admit only two outcomes.
Censoring
nb_tte() accepts a censor_rule argument. The default
"informative" matches the conventional Pocock/Buyse rule: if one
subject is censored and the other observed, a winner can only be
declared in the direction the censoring is uninformative about (i.e.,
the censored time is already past the event time). "ignore" drops
pairs where either subject is censored and treats them as ties at this
level.
See Also
Creating Objects of Hierarchical Net-Benefit (Win-Difference) Statistic
Description
netbenefit_ creates an object to be passed into jointCovariance or
pated through its ... argument. The object defines a hierarchical
net-benefit (a.k.a. win-difference, proportion-in-favor) statistic
across a list of endpoints, each declared by nb_tte(),
nb_continuous(), or nb_binary().
Usage
netbenefit_(formula, endpoints, data_index = 1)
Arguments
formula |
a two-sided R formula |
endpoints |
a non-empty list of endpoint specs built by |
data_index |
integer. Index of the data frame in the |
Details
The estimator is
\widehat\Delta = \frac{N_W - N_L}{N_W + N_L + N_T}
where N_W, N_L, and N_T are the numbers of treatment
wins, losses, and overall ties across all n_C \times n_T pairs
(control vs. treatment subject). The per-subject influence function is
available in closed form, so both the asymptotic and the bootstrap paths
of jointCovariance are supported.
The arm reference level is inferred from the arm column the same way
model.matrix(~ arm) would: levels(arm)[1] for factor, the smaller
value for numeric or logical, and the alphabetically first value for
character. To override, convert the column to a factor with the desired
level order before calling netbenefit_().
Value
An object of class c("jc_spec_netbenefit", "jc_spec").
See Also
nb_tte(), nb_continuous(), nb_binary(), jointCovariance(),
pated().
OAK Phase III Trial of Atezolizumab vs Docetaxel in NSCLC
Description
Patient-level data (primary intention-to-treat population, n = 850) from OAK, an international, open-label, randomized Phase III trial (NCT02008227) comparing the anti-PD-L1 antibody atezolizumab (MPDL3280A, 1200 mg IV every 3 weeks) with docetaxel (75 mg/m^2 IV every 3 weeks) as second- or third-line therapy in patients with previously treated locally advanced or metastatic non-small-cell lung cancer (NSCLC). Patients were randomized 1:1 to the two arms.
The variables retained here are those released in Supplementary
Table 8 of Gandara et al. (2018), Nature Medicine, who
retrospectively assayed banked baseline plasma samples for
blood-based tumor mutational burden (bTMB) and reported the
biomarker-evaluable analyses on this primary ITT cohort. The shipped
copy applies the following purely cosmetic transformations to the
source: source variable names lower-cased; categorical values
lower-cased; the string "." (the source's missing code)
converted to NA; numeric-looking character columns coerced to
numeric; and the event indicators inverted so that 1 = event,
0 = censored (the source uses the opposite convention). Categorical
variables are stored as character vectors (not factors);
convert to factor on demand with factor() if a modeling
function requires it. No rows are dropped: the full primary ITT
population is included.
Authoritative reference. This documentation is a convenience summary; users should consult the original publication and its supplementary materials (Gandara et al. 2018; Rittmeyer et al. 2017) for the definitive description of the trial design, eligibility criteria, endpoint definitions, and assay protocol.
Format
A data frame with 850 observations on 27 variables:
idAnonymized patient identifier; integer 301–1150. (identifier)
armRandomized treatment assignment, character:
"docetaxel"or"atezolizumab". (treatment assignment)pfsProgression-free survival, months (numeric). (post-randomization outcome)
pfs_eventPFS event indicator, integer: 1 = progression or death, 0 = censored. Inverted from the source's
PFS.CNSR. (post-randomization outcome)osOverall survival, months (numeric). (post-randomization outcome)
os_eventOS event indicator, integer: 1 = death, 0 = censored. Inverted from the source's
OS.CNSR. (post-randomization outcome)bcorBest confirmed overall response per investigator (RECIST 1.1), character:
"CR", "PR", "SD", "PD", "NE";NAwhen not assessable. (post-randomization outcome)btmbBlood-based tumor mutational burden: count of somatic base substitutions on a 394-gene,
\sim1.1 Mb panel of cell-free DNA from the baseline plasma sample (numeric);NAwhen the assay was not run or failed QC. (baseline characteristic; predictive of atezolizumab benefit atbtmb \ge 16; not strongly prognostic in itself)msafMaximum somatic allele frequency in the baseline cfDNA sample (numeric, 0–1); a tumor-content / shedding surrogate. (baseline characteristic; weakly prognostic)
cfdna_ngcfDNA mass entering NGS library construction (nanograms, numeric, capped at 100 ng). (baseline characteristic; not prognostic — assay input)
coverageMedian exon sequencing coverage (numeric). (baseline characteristic; not prognostic — assay quality)
qcbTMB assay QC status, character:
"pass", "fail";NAwhen not run. (baseline characteristic; not prognostic — assay quality)bepLogical flag for the biomarker-evaluable population used in the bTMB analyses:
coverage >= 800 & qc == "pass" & msaf >= 0.01(n = 642 of 850). (baseline derived flag; not prognostic in itself)ageBaseline age, years (numeric); range 33–85. (baseline characteristic; modestly prognostic)
sexSex, character:
"F", "M". (baseline characteristic; weakly prognostic; correlates with smoking and bTMB)raceRace, character:
"white", "asian", "other". (baseline characteristic; not robustly prognostic in this setting)histologyTumor histology, character:
"non-squamous", "squamous". (baseline characteristic; modestly prognostic)ecogECOG performance status at baseline, integer (0 or 1). (baseline characteristic; strongly prognostic for PFS and OS)
prior_txNumber of prior lines of systemic therapy, integer (1 or 2). (baseline characteristic; prognostic — patients with 2 prior lines tend to do worse)
smokingSmoking history, character:
"never", "previous", "current". (baseline characteristic; modestly prognostic and also a correlate of IO benefit through bTMB)sldBaseline sum of longest diameters of target lesions per RECIST 1.1, mm (numeric). (baseline characteristic; strongly prognostic — tumor burden)
metsitesNumber of metastatic sites at enrollment, integer. (baseline characteristic; strongly prognostic — disease burden)
krasKRAS mutation status from CRF, character:
"negative", "positive";NAwhen not tested. (baseline characteristic; weakly / inconsistently prognostic; possible predictive signal for IO)egfrEGFR mutation status from CRF, character:
"negative", "positive";NAwhen not tested. (baseline characteristic; weakly prognostic in IO-treated populations; strongly predictive of differential treatment effect — EGFR-positive patients typically derive less benefit from anti-PD-L1)eml4EML4-ALK rearrangement status from CRF, character:
"negative", "positive";NAwhen not tested. (baseline characteristic; analogous toegfrin interpretation — predictive of reduced IO benefit)tc1ic1PD-L1 IHC (Ventana SP142) dichotomized at the TC1IC1 cut-point, character:
"low"=TC0 and IC0;"high"=TC1/2/3 or IC1/2/3;NAwhen unknown. (baseline characteristic; predictive of atezolizumab benefit; weakly prognostic)tc3ic3PD-L1 IHC dichotomized at the TC3IC3 cut-point, character:
"low"=TC0/1/2 and IC0/1/2;"high"=TC3 or IC3;NAwhen unknown. (baseline characteristic; predictive of atezolizumab benefit; weakly prognostic)
Details
OAK's trial-level eligibility criteria (locally advanced or metastatic
NSCLC after platinum failure, ECOG 0–1, measurable disease per
RECIST 1.1, no prior immune-checkpoint therapy, adequate organ
function, etc.) were applied before randomization, so the
n = 850 primary ITT cohort is randomized 1:1. The bTMB assay was
performed retrospectively on banked baseline plasma; all assay-derived
variables therefore reflect quantities present before treatment
exposure but were generated post-randomization in calendar time.
EGFR, EML4-ALK, and PD-L1 IHC are recorded at screening
and can be used to subset without compromising randomization.
The published bTMB analyses (PFS and OS hazard ratios at the
btmb >= 16 cut-point) used the subset
bep & !(egfr %in% "positive") & !(eml4 %in% "positive") (n = 583).
OAK published a primary ITT of 850 patients and a secondary ITT of
1225 patients; only the primary ITT is included here. Variant-level
information (per-mutation chromosome, position, gene, protein change,
etc.) was released alongside this table but is not included here.
Variable role labels
Each variable in the format section above carries a role tag in italic parentheses at the end of its description. The tags combine two dimensions:
-
Baseline characteristic — measured (or, for the bTMB assay, derived from material collected) at the baseline visit before the first dose of study treatment. Using such variables as covariates does not break randomization in expectation, because they are functions of pre-treatment quantities only.
-
Prognostic — based on the clinical-oncology literature for previously treated advanced NSCLC, whether the variable has been reported to be associated with PFS or OS independent of treatment assignment. “Predictive” indicates association with differential atezolizumab vs docetaxel benefit rather than with outcome per se. Strength qualifiers (strong / modest / weak) are the author's reading of the literature and are no substitute for a formal analysis on the dataset at hand.
Variables that are outcomes, identifiers, or treatment assignment carry a single role tag in place of the two-dimensional tag.
Pre- vs post-randomization variables
For users planning to apply PATED or any analysis that relies on baseline covariates being balanced across arms:
- Pre-randomization (collected at screening; safe to condition on):
age, sex, race, histology, ecog, prior_tx, smoking, sld, metsites, kras, egfr, eml4, tc1ic1, tc3ic3.- Pre-treatment but derived post-randomization (measured on the baseline blood draw, before first dose; empirically balanced across arms per the source paper's Supp. Tables 4 and 5):
btmb, msaf, cfdna_ng, coverage, qc, bep.- Post-randomization outcomes (must not be used as covariates):
pfs, pfs_event, os, os_event, bcor.
Conditioning on the biomarker-evaluable population
(bep == TRUE, n = 642) or further restricting to
!(egfr %in% "positive") & !(eml4 %in% "positive") (the paper's primary
bTMB analysis subgroup, n = 583) is a selection on pre-treatment
variables and does not break randomization in expectation; Supp.
Tables 4 and 5 of the source paper confirm empirical arm-balance
(no nominally significant imbalance on age, race, ECOG, histology,
prior lines, KRAS, or PD-L1 IHC; the btmb >= 16
subgroup is enriched for males and for smokers, but those are
biomarker–covariate associations, not arm–covariate associations).
Users should still verify arm-balance on their working subset (e.g.
with tableone) before invoking methods that assume it.
Source
Built from Supplementary Table 8 (Excel file MOESM3) of:
Gandara DR, Paul SM, Kowanetz M, Schleifman E, Zou W, Li Y,
Rittmeyer A, Fehrenbacher L, Otto G, Malboeuf C, Lieber DS, Lipson D,
Silterra J, Amler L, Riehl T, Cummings CA, Hegde PS, Sandler A,
Ballinger M, Fabrizio D, Mok T, Shames DS. Blood-based tumor
mutational burden as a predictor of clinical benefit in non-small-cell
lung cancer patients treated with atezolizumab. Nature Medicine
2018; 24(9):1441–1448. doi:10.1038/s41591-018-0134-3.
Supplementary file downloaded from
https://www.nature.com/articles/s41591-018-0134-3.
See data-raw/poplar_oak.R in the package source for the exact
cleaning steps. Redistribution in this package is for non-commercial
research and teaching use; users intending other use should verify
the licensing terms of the Springer Nature supplementary materials.
Always cite the original publication when using this dataset.
References
Rittmeyer A, Barlesi F, Waterkamp D, Park K, Ciardiello F, von Pawel J, Gadgeel SM, Hida T, Kowalski DM, Dols MC, Cortinovis DL, Leach J, Polikoff J, Barrios C, Kabbinavar F, Frontera OA, De Marinis F, Turna H, Lee JS, Ballinger M, Kowanetz M, He P, Chen DS, Sandler A, Gandara DR; OAK Study Group. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): a Phase 3, open-label, multicentre randomised controlled trial. Lancet 2017; 389(10066):255–265. doi:10.1016/S0140-6736(16)32517-X.
Examples
data(oak)
table(oak$arm, useNA = "ifany")
summary(oak$btmb)
# Primary bTMB analysis subset (paper's n = 583); %in% keeps NA rows
# (untested patients) since they are not confirmed positive.
oak_bep <- subset(oak, bep & !(egfr %in% "positive") & !(eml4 %in% "positive"))
Prognostic Variables Assisted Treatment Effect Detection
Description
pated is a wrapper function of jointCovariance for testing treatment effect
in randomized clinical trials. It assumes that prognostic variables are fully
randomized. This assumption can help enhancing statistical power of conventional
approaches in detecting the treatment effect. Specifically, the sensitivity
of the conventional models specified in ... are improved by pated.
Usage
pated(
...,
data,
nboot = 0,
compute_cov = TRUE,
seed = NULL,
transform = "identity"
)
Arguments
... |
model specifications built by |
data |
either a single data frame (when all models are fitted on
the same dataset) or a list of data frames (one entry per |
nboot |
non-zero integer if bootstrap is adopted. By default 0. |
compute_cov |
logic. If |
seed |
random seed when generate bootstrap data. |
transform |
character. Now only supports |
Value
a data frame of testing results.
Examples
## More examples can be found in the vignettes.
library(survival)
library(mvtnorm)
genData <- function(seed = NULL){
set.seed(seed)
n <- 300
sigma <- matrix(c(1, .6, .6, 1), 2)
x <- rmvnorm(n, sigma = sigma)
z1 <- rbinom(n, 1, .6)
z2 <- rnorm(n)
trt <- rbinom(n, 1, .5)
bet <- c(-.2, .2)
y <- -.5 + x %*% bet + z1 * .3 - z2 * .1 + .1 * trt - .1 * rnorm(n)
death <- rbinom(n, 1, .8)
data.frame(
y = as.numeric(y), trt = trt,
z1 = z1, z2 = z2,
x1 = x[, 1], x2 = x[, 2],
death, pid = paste0('s-', seq_len(n))
)
}
dat <- genData(seed = 31415926)
## `data_index` defaults to 1 in every spec constructor and a single
## data.frame is auto-wrapped into a list, so neither needs spelling out
## when all models are fitted on the same dataset.
fit <-
pated(
coxph_(Surv(time = y, event = death) ~ trt),
glm_(z1 ~ trt, family = 'binomial'),
glm_(z2 ~ trt, family = 'gaussian'),
glm_(x1 ~ trt, family = 'gaussian'),
glm_(x2 ~ trt, family = 'gaussian'),
data = dat
)
fit
Plot PATED Analysis Results
Description
Plot PATED Analysis Results
Usage
## S3 method for class 'pated'
plot(x, ...)
Arguments
x |
an object returned from |
... |
currently not supported. |
Value
NULL
POPLAR Phase II Trial of Atezolizumab vs Docetaxel in NSCLC
Description
Patient-level data (full intention-to-treat population, n = 287) from POPLAR, a multi-center, open-label, randomized Phase II trial (NCT01903993) comparing the anti-PD-L1 antibody atezolizumab (MPDL3280A, 1200 mg IV every 3 weeks) with docetaxel (75 mg/m^2 IV every 3 weeks) as second- or third-line therapy in patients with previously treated locally advanced or metastatic non-small-cell lung cancer (NSCLC). Patients were randomized 1:1 to the two arms.
The variables retained here are those released in Supplementary
Table 8 of Gandara et al. (2018), Nature Medicine, who
retrospectively assayed banked baseline plasma samples for
blood-based tumor mutational burden (bTMB). The shipped copy applies
the following purely cosmetic transformations to the source: source
variable names lower-cased; categorical values lower-cased; the
string "." (the source's missing code) converted to NA;
numeric-looking character columns coerced to numeric; and the event
indicators inverted so that 1 = event, 0 = censored (the source
uses the opposite convention). Categorical variables are stored as
character vectors (not factors); convert to factor on demand
with factor() if a modeling function requires it. No rows are
dropped: the full ITT population is included.
Authoritative reference. This documentation is a convenience summary; users should consult the original publication and its supplementary materials (Gandara et al. 2018; Fehrenbacher et al. 2016) for the definitive description of the trial design, eligibility criteria, endpoint definitions, and assay protocol.
Format
A data frame with 287 observations on 25 variables:
idAnonymized patient identifier; integer 1–287. (identifier)
armRandomized treatment assignment, character:
"docetaxel"or"atezolizumab". (treatment assignment)pfsProgression-free survival, months (numeric). (post-randomization outcome)
pfs_eventPFS event indicator, integer: 1 = progression or death, 0 = censored. Inverted from the source's
PFS.CNSR. (post-randomization outcome)osOverall survival, months (numeric). (post-randomization outcome)
os_eventOS event indicator, integer: 1 = death, 0 = censored. Inverted from the source's
OS.CNSR. (post-randomization outcome)bcorBest confirmed overall response per investigator (RECIST 1.1), character:
"CR", "PR", "SD", "PD", "NE";NAwhen not assessable. (post-randomization outcome)btmbBlood-based tumor mutational burden: count of somatic base substitutions on a 394-gene,
\sim1.1 Mb panel of cell-free DNA from the baseline plasma sample (numeric);NAwhen the assay was not run or failed QC. (baseline characteristic; predictive of atezolizumab benefit atbtmb \ge 16; not strongly prognostic in itself)msafMaximum somatic allele frequency in the baseline cfDNA sample (numeric, 0–1); a tumor-content / shedding surrogate. (baseline characteristic; weakly prognostic)
cfdna_ngcfDNA mass entering NGS library construction (nanograms, numeric, capped at 100 ng). (baseline characteristic; not prognostic — assay input)
coverageMedian exon sequencing coverage (numeric). (baseline characteristic; not prognostic — assay quality)
qcbTMB assay QC status, character:
"pass", "fail";NAwhen not run. (baseline characteristic; not prognostic — assay quality)bepLogical flag for the biomarker-evaluable population used in the bTMB analyses:
coverage >= 800 & qc == "pass" & msaf >= 0.01(n = 211 of 287). (baseline derived flag; not prognostic in itself)ageBaseline age, years (numeric); range 36–84. (baseline characteristic; modestly prognostic)
sexSex, character:
"F", "M". (baseline characteristic; weakly prognostic; correlates with smoking and bTMB)raceRace, character:
"white", "asian", "other". (baseline characteristic; not robustly prognostic in this setting)histologyTumor histology, character:
"non-squamous", "squamous". (baseline characteristic; modestly prognostic)ecogECOG performance status at baseline, integer (0 or 1). (baseline characteristic; strongly prognostic for PFS and OS)
prior_txNumber of prior lines of systemic therapy, integer (1 or 2). (baseline characteristic; prognostic — patients with 2 prior lines tend to do worse)
smokingSmoking history, character:
"never", "previous", "current". (baseline characteristic; modestly prognostic and also a correlate of IO benefit through bTMB)sldBaseline sum of longest diameters of target lesions per RECIST 1.1, mm (numeric). (baseline characteristic; strongly prognostic — tumor burden)
metsitesNumber of metastatic sites at enrollment, integer. (baseline characteristic; strongly prognostic — disease burden)
krasKRAS mutation status from CRF, character:
"negative", "positive";NAwhen not tested. (baseline characteristic; weakly / inconsistently prognostic; possible predictive signal for IO)egfrEGFR mutation status from CRF, character:
"negative", "positive";NAwhen not tested. A single T790M patient is coded"positive". (baseline characteristic; weakly prognostic in IO-treated populations; strongly predictive of differential treatment effect — EGFR-positive patients typically derive less benefit from anti-PD-L1)eml4EML4-ALK rearrangement status from CRF, character:
"negative", "positive";NAwhen not tested. (baseline characteristic; analogous toegfrin interpretation — predictive of reduced IO benefit)
Details
POPLAR's trial-level eligibility criteria (locally advanced or metastatic NSCLC after platinum failure, ECOG 0–1, measurable disease, no prior anti-PD-L1/PD-1 therapy, adequate organ function, etc.) were applied before randomization, so the n = 287 ITT cohort is randomized 1:1. The bTMB assay was performed retrospectively on banked baseline plasma; all assay-derived variables therefore reflect quantities present before treatment exposure, but were generated post-randomization in calendar time. EGFR and EML4-ALK mutation status are CRF variables collected at screening and can be used to subset without compromising randomization. Variant-level information (per-mutation chromosome, position, gene, protein change, etc.) was released alongside this table but is not included here; consult the source supplementary file if needed.
Variable role labels
Each variable in the format section above carries a role tag in italic parentheses at the end of its description. The tags combine two dimensions:
-
Baseline characteristic — measured (or, for the bTMB assay, derived from material collected) at the baseline visit before the first dose of study treatment. Using such variables as covariates does not break randomization in expectation, because they are functions of pre-treatment quantities only.
-
Prognostic — based on the clinical-oncology literature for previously treated advanced NSCLC, whether the variable has been reported to be associated with PFS or OS independent of treatment assignment. “Predictive” indicates association with differential atezolizumab vs docetaxel benefit rather than with outcome per se. Strength qualifiers (strong / modest / weak) are the author's reading of the literature and are no substitute for a formal analysis on the dataset at hand.
Variables that are outcomes, identifiers, or treatment assignment carry a single role tag in place of the two-dimensional tag.
Pre- vs post-randomization variables
For users planning to apply PATED or any analysis that relies on baseline covariates being balanced across arms:
- Pre-randomization (collected at screening; safe to condition on):
age, sex, race, histology, ecog, prior_tx, smoking, sld, metsites, kras, egfr, eml4.- Pre-treatment but derived post-randomization (measured on the baseline blood draw, before first dose; empirically balanced across arms per the source paper's Supp. Tables 3 and 5):
btmb, msaf, cfdna_ng, coverage, qc, bep.- Post-randomization outcomes (must not be used as covariates):
pfs, pfs_event, os, os_event, bcor.
Conditioning on the biomarker-evaluable population (bep == TRUE)
or further restricting to !(egfr %in% "positive") & !(eml4 %in% "positive")
is a selection on pre-treatment variables and does not break
randomization in expectation; Supp. Tables 3 and 5 of the source paper
confirm empirical arm-balance. Users should still verify arm-balance
on their working subset (e.g. with tableone) before invoking
methods that assume it.
Source
Built from Supplementary Table 8 (Excel file MOESM3) of:
Gandara DR, Paul SM, Kowanetz M, Schleifman E, Zou W, Li Y,
Rittmeyer A, Fehrenbacher L, Otto G, Malboeuf C, Lieber DS, Lipson D,
Silterra J, Amler L, Riehl T, Cummings CA, Hegde PS, Sandler A,
Ballinger M, Fabrizio D, Mok T, Shames DS. Blood-based tumor
mutational burden as a predictor of clinical benefit in non-small-cell
lung cancer patients treated with atezolizumab. Nature Medicine
2018; 24(9):1441–1448. doi:10.1038/s41591-018-0134-3.
Supplementary file downloaded from
https://www.nature.com/articles/s41591-018-0134-3.
See data-raw/poplar_oak.R in the package source for the exact
cleaning steps. Redistribution in this package is for non-commercial
research and teaching use; users intending other use should verify
the licensing terms of the Springer Nature supplementary materials.
Always cite the original publication when using this dataset.
References
Fehrenbacher L, Spira A, Ballinger M, Kowanetz M, Vansteenkiste J, Mazieres J, Park K, Smith D, Artal-Cortes A, Lewanski C, Braiteh F, Waterkamp D, He P, Zou W, Chen DS, Yi J, Sandler A, Rittmeyer A; POPLAR Study Group. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, Phase 2 randomised controlled trial. Lancet 2016; 387(10030):1837–1846. doi:10.1016/S0140-6736(16)00587-0.
Examples
data(poplar)
table(poplar$arm, useNA = "ifany")
summary(poplar$btmb)
# Biomarker-evaluable subset used in the published bTMB analyses
poplar_bep <- subset(poplar, bep)
Title Summarize an Analysis of Multiple Outcomes.
Description
Summarize an analysis of multiple outcomes.
Usage
## S3 method for class 'summary.jointCovariance'
print(x, ...)
Arguments
x |
an object returned by |
... |
for debugging only. |
Value
an invisible object.
Examples
## no example
Creating Objects of Group Quantile Differences
Description
quantile_ is a wrapper function creating an object that, for each
requested probability, computes the difference between the two arms'
sample quantiles of the outcome. The object is passed to jointCovariance
or pated through .... Because the empirical-quantile estimator has no
tractable closed-form score, this engine is available only through the
bootstrap path (nboot > 0).
Usage
quantile_(formula, probs = c(0.25, 0.5, 0.75), data_index = 1)
Arguments
formula |
a two-sided formula |
probs |
numeric vector of probabilities in |
data_index |
integer. Index of the data frame in the |
Generating Data for Simulation and Testing
Description
simulateMoData generates data for simulation and testing purposes.
Usage
simulateMoData(n = 500, hr = 0.8, seed = NULL)
Arguments
n |
an integer for total sample size of a randomized control trial of two arms. |
hr |
hazard ratio of treatment. |
seed |
random seed. By default |
Object Summaries
Description
summary method for class jointCovariance.
Usage
## S3 method for class 'jointCovariance'
summary(object, model_index = NULL, ...)
Arguments
object |
an object returned by |
model_index |
|
... |
for debugging only |
Value
a list
Calculate Variance-Covariance Matrix for a Fitted Model Object
Description
Returns the variance-covariance matrix of the main parameters of fitted model
objects. The "main" parameters of models correspond to those returned by coef.
Usage
## S3 method for class 'jointCovariance'
vcov(object, model_index = NULL, ...)
Arguments
object |
an object returned by |
model_index |
|
... |
for debugging only |
Value
a matrix of covariance of all estimates
Creating Objects of Hierarchical Log Win-Ratio Statistic
Description
winratio_ creates an object to be passed into jointCovariance or
pated through its ... argument. The object defines a hierarchical
win-ratio statistic across a list of endpoints, each declared by
nb_tte(), nb_continuous(), or nb_binary().
Usage
winratio_(formula, endpoints, data_index = 1)
Arguments
formula |
a two-sided R formula |
endpoints |
a non-empty list of endpoint specs built by |
data_index |
integer. Index of the data frame in the |
Details
The fitted coefficient is on the log scale:
\widehat\theta = \log(\widehat{WR})
= \log(\widehat\pi_W / \widehat\pi_L)
where \widehat\pi_W and \widehat\pi_L are the proportions of
all control-treatment pairs where the treatment subject wins or loses,
respectively. Tied pairs remain in the denominator of both proportions
but do not contribute to either numerator.
The raw win ratio can be recovered by exponentiating the coefficient. The log scale is used because it gives the standard large-sample Wald representation and maps the null win ratio of 1 to 0.
The arm reference level is inferred from the arm column the same way
model.matrix(~ arm) would: levels(arm)[1] for factor, the smaller
value for numeric or logical, and the alphabetically first value for
character. To override, convert the column to a factor with the desired
level order before calling winratio_().
Value
An object of class c("jc_spec_winratio", "jc_spec").
See Also
nb_tte(), nb_continuous(), nb_binary(), jointCovariance(),
pated().