rapidopgs_single computes PGS from GWAS summary statistics using posteriors from Wakefield's approximate Bayes Factors

rapidopgs_single(
  data,
  N = NULL,
  trait = c("cc", "quant"),
  build = "hg19",
  pi_i = 1e-04,
  sd.prior = if (trait == "quant") {
    0.15
  } else {
    0.2
  },
  filt_threshold = NULL,
  recalc = TRUE,
  reference = NULL
)

Arguments

data

a data.table containing the GWAS summary statistics dataset, with all required columns (see Details).

N

a scalar representing the sample size of the study, or a string indicating the name of the column containing it. Required for quantitative traits only.

trait

a string specifying if the dataset corresponds to a case-control ("cc") or a quantitative trait ("quant") GWAS. If trait = "quant", an ALT_FREQ column is required.

build

a string containing the genome build of the dataset, either "hg19" (for hg19/GRCh37) or "hg38" (hg38/GRCh38). DEFAULT "hg19".

pi_i

a scalar representing the prior probability that a given SNP is causal (DEFAULT: \(1 \times 10^{-4}\)).

sd.prior

the prior specifies that BETA at causal SNPs follows a centred normal distribution with standard deviation sd.prior. Sensible and widely used DEFAULTs are 0.2 for case-control traits, and 0.15 * var(trait) for quantitative traits (selected if trait == "quant").

filt_threshold

a scalar indicating the ppi threshold (if filt_threshold < 1) or the number of top SNPs by absolute weights (if filt_threshold >= 1) to filter the dataset after PGS computation. If NULL (DEFAULT), no thresholding will be applied.

recalc

a logical indicating if weights should be recalculated after thresholding. Only relevant if filt_threshold is defined.

reference

a string indicating the path of the reference file that SNPs should be filtered and aligned to, see Details.

Value

a data.table containing the formatted sumstats dataset with computed PGS weights.

Details

This function takes a GWAS summary statistics dataset as input and aligns it to a reference panel file (if provided). It then assigns SNPs to LD blocks, computes Wakefield's ppi within each LD block, and generates PGS weights by multiplying those posteriors by the effect sizes (\(\beta\)). Optionally, it filters SNPs by a custom ppi threshold and then recalculates the weights, to improve accuracy.
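The ppi and weight computation for a single LD block can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper name wakefield_ppi_sketch is hypothetical, the three SNPs are assumed to share one LD block, and the normalisation (a "no causal SNP" term with weight 1 in the denominator) is an assumption.

```r
# Hypothetical helper, not part of the RapidoPGS API.
# Computes Wakefield-style posteriors for the SNPs of ONE LD block.
wakefield_ppi_sketch <- function(beta, se, pi_i = 1e-4, sd.prior = 0.2) {
  z <- beta / se          # z-score of each SNP
  V <- se^2               # sampling variance of the effect estimate
  W <- sd.prior^2         # prior variance of the effect at causal SNPs
  r <- W / (W + V)        # shrinkage factor

  # Log approximate Bayes factor in favour of association
  labf <- 0.5 * (log(1 - r) + r * z^2)

  # Assumed normalisation: each SNP gets prior pi_i, and the null
  # (no causal SNP in the block) contributes a term with log BF = 0.
  log_terms <- c(labf + log(pi_i), 0)
  m <- max(log_terms)
  log_denom <- m + log(sum(exp(log_terms - m)))  # log-sum-exp for stability

  exp(labf + log(pi_i) - log_denom)
}

# Three SNPs taken from the example dataset, treated here as one block
beta <- c(0.0153, 0.058, 0.0742)
se   <- c(0.0063, 0.0255, 0.043)

ppi    <- wakefield_ppi_sketch(beta, se)
weight <- ppi * beta    # PGS weight: posterior times effect size
```

Because each weight is the effect size scaled by a posterior between 0 and 1, it keeps the sign of \(\beta\) and is shrunk towards zero for SNPs with little evidence of association.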

If filt_threshold is instead 1 or greater, RapidoPGS selects the top filt_threshold SNPs by absolute weight (note: by weight, not by ppi).
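The two filtering modes can be illustrated with a small base R sketch. The helper filter_sketch and the toy data are hypothetical, and the strict comparison used in the ppi mode is an assumption; the optional recalculation of weights controlled by recalc is not shown.

```r
# Hypothetical helper illustrating the two modes of filt_threshold;
# this is not the package's own implementation.
filter_sketch <- function(df, filt_threshold = NULL) {
  if (is.null(filt_threshold)) {
    return(df)  # no thresholding
  }
  if (filt_threshold < 1) {
    # ppi mode: keep SNPs whose ppi exceeds the threshold
    # (strict ">" is an assumption)
    df[df$ppi > filt_threshold, , drop = FALSE]
  } else {
    # top-N mode: keep the N SNPs with the largest absolute weight
    ord <- order(abs(df$weight), decreasing = TRUE)
    df[head(ord, filt_threshold), , drop = FALSE]
  }
}

# Toy values, made up for illustration only
toy <- data.frame(
  SNPID  = c("snp1", "snp2", "snp3", "snp4"),
  ppi    = c(0.50, 0.01, 0.20, 0.001),
  weight = c(0.010, -0.002, -0.030, 0.0001)
)

by_ppi <- filter_sketch(toy, 0.1)  # threshold on ppi
top2   <- filter_sketch(toy, 2)    # top 2 SNPs by |weight|
```

In this toy example both calls keep snp1 and snp3, but the top-N mode returns them ordered by absolute weight, so a SNP with a modest ppi and a large effect size can outrank one with a higher ppi.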

To compute a PGS with this method, the GWAS summary statistics dataset must contain at least the following columns, with these exact column names:

CHR

Chromosome

BP

Base position (in GRCh37/hg19 or GRCh38/hg38). If using hg38, set build = "hg38".

REF

Reference, or non-effect allele

ALT

Alternative, or effect allele, the one \(\beta\) refers to

ALT_FREQ

Minor/ALT allele frequency in the tested population, or in a close population from a reference panel. Required for quantitative traits only.

BETA

Effect size, \(\beta\) (or log(OR))

SE

Standard error of \(\beta\)

If a reference is provided, it should have 5 columns: CHR, BP, SNPID, REF, and ALT. Also, it should be in the same build as the summary statistics. In both files, column order does not matter.
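Before calling the function, it can help to confirm that the required columns are present. The following base R sketch shows one way to do so; check_sumstats_columns is a hypothetical helper, not a RapidoPGS function, and it checks column names only, not their contents.

```r
# Hypothetical helper, not part of the RapidoPGS API.
# Returns the names of required columns that are missing from `data`.
check_sumstats_columns <- function(data, trait = c("cc", "quant")) {
  trait <- match.arg(trait)
  required <- c("CHR", "BP", "REF", "ALT", "BETA", "SE")
  if (trait == "quant") {
    # ALT_FREQ is only required for quantitative traits
    required <- c(required, "ALT_FREQ")
  }
  setdiff(required, names(data))
}

# One-row toy dataset without an ALT_FREQ column
toy <- data.frame(CHR = 4, BP = 1479959, REF = "C", ALT = "A",
                  BETA = 0.012, SE = 0.0099)

missing_cc    <- check_sumstats_columns(toy, "cc")
missing_quant <- check_sumstats_columns(toy, "quant")
```

Here missing_cc is empty, while missing_quant reports ALT_FREQ. Since column order does not matter, the check compares names as a set.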

Author

Guillermo Reales, Chris Wallace

Examples

library(data.table)
library(RapidoPGS)

# Toy summary statistics with the minimum columns for a case-control trait
sumstats <- data.table(
    SNPID=c("rs139096444","rs3843766","rs61977545","rs544733737",
        "rs2177641","rs183491817","rs72995775","rs78598863","rs1411315"),
    CHR=c(4,20,14,2,4,6,6,21,13),
    BP=c(1479959, 13000913, 29107209, 203573414, 57331393, 11003529, 149256398,
        25630085, 79166661),
    REF=c("C","C","C","T","G","C","C","G","T"),
    ALT=c("A","T","T","A","A","A","T","A","C"),
    BETA=c(0.012,0.0079,0.0224,0.0033,0.0153,0.058,0.0742,0.001,-0.0131),
    SE=c(0.0099,0.0066,0.0203,0.0171,0.0063,0.0255,0.043,0.0188,0.0074))

# Compute PGS weights using the default prior settings
PGS <- rapidopgs_single(sumstats, trait = "cc")
#> Assigning LD blocks...
#> Done!
#> Computing a RapidoPGS-single model for a case-control dataset...