R/rapfunc.R
rapidopgs_single.Rd
rapidopgs_single computes PGS from GWAS summary statistics using posteriors from Wakefield's approximate Bayes Factors.
rapidopgs_single(
  data,
  N = NULL,
  trait = c("cc", "quant"),
  build = "hg19",
  pi_i = 1e-04,
  sd.prior = if (trait == "quant") {
    0.15
  } else {
    0.2
  },
  filt_threshold = NULL,
  recalc = TRUE,
  reference = NULL
)
data: a data.table containing the GWAS summary statistics dataset with all required information.
N: a scalar representing the sample size of the study, or a string indicating the name of the column containing it. Required for quantitative traits only.
trait: a string specifying whether the dataset corresponds to a case-control ("cc") or a quantitative trait ("quant") GWAS. If trait = "quant", an ALT_FREQ column is required.
build: a string containing the genome build of the dataset, either "hg19" (for hg19/GRCh37) or "hg38" (hg38/GRCh38). DEFAULT "hg19".
pi_i: a scalar representing the prior probability that a SNP is causal (DEFAULT: \(1 \times 10^{-4}\)).
sd.prior: the prior specifies that BETA at causal SNPs follows a centred normal distribution with standard deviation sd.prior. Sensible and widely used DEFAULTs are 0.2 for case-control traits, and 0.15 * var(trait) for quantitative traits (selected if trait == "quant").
filt_threshold: a scalar indicating the ppi threshold (if filt_threshold < 1) or the number of top SNPs by absolute weights (if filt_threshold >= 1) used to filter the dataset after PGS computation. If NULL (DEFAULT), no thresholding will be applied.
recalc: a logical indicating whether weights should be recalculated after thresholding. Only relevant if filt_threshold is defined.
reference: a string indicating the path of the reference file that SNPs should be filtered and aligned to, see Details.
a data.table containing the formatted sumstats dataset with computed PGS weights.
This function takes a GWAS summary statistics dataset as input, aligns it to a reference panel file (if provided), assigns SNPs to LD blocks, and computes Wakefield's ppi within each LD block. It then generates PGS weights by multiplying those posteriors by the effect sizes (\(\beta\)). Optionally, it filters SNPs by a custom ppi threshold and then recalculates the weights to improve accuracy.
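The ppi and weight computation described above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: the helper names wakefield_labf and block_ppi are hypothetical, the block is assumed to contain at most one causal SNP, and the "no causal SNP" model is given a unit (unnormalised) weight as a simplifying assumption.

```r
# Illustrative sketch; helper names are hypothetical, not part of RapidoPGS.

# Log approximate Bayes factor (association vs. null) for each SNP,
# given its effect estimate, standard error and the prior SD (sd.prior).
wakefield_labf <- function(beta, se, sd.prior = 0.2) {
  z <- beta / se
  r <- sd.prior^2 / (sd.prior^2 + se^2)  # shrinkage factor
  0.5 * (log(1 - r) + r * z^2)
}

# Posterior probability for each SNP in ONE LD block.
# Assumptions: at most one causal SNP per block, prior pi_i per SNP,
# and a null model entered with unit weight.
block_ppi <- function(beta, se, pi_i = 1e-4, sd.prior = 0.2) {
  log_num <- wakefield_labf(beta, se, sd.prior) + log(pi_i)
  log_terms <- c(log_num, 0)              # last term: null model
  m <- max(log_terms)
  log_den <- m + log(sum(exp(log_terms - m)))  # log-sum-exp for stability
  exp(log_num - log_den)
}

# Toy values treated as if they belonged to a single LD block
beta <- c(0.012, 0.0153, 0.058)
se   <- c(0.0099, 0.0063, 0.0255)

ppi    <- block_ppi(beta, se)
weight <- ppi * beta   # PGS weight = posterior x effect size
```

Because each posterior lies between 0 and 1, every weight is the effect size shrunk towards zero, keeping the sign of \(\beta\).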
If filt_threshold is instead 1 or larger, RapidoPGS will select the top filt_threshold SNPs by absolute weight (note: by weight, not by ppi).
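The two filtering modes can be sketched as follows. The helper filter_weights and the toy data frame are hypothetical and only mirror the behaviour described in this page; whether the package uses a strict or non-strict comparison against the ppi threshold is an assumption here, and the optional recalculation step (recalc) is not shown.

```r
# Hypothetical helper illustrating the two filt_threshold modes;
# not the package's own code.
filter_weights <- function(df, filt_threshold) {
  if (filt_threshold < 1) {
    # ppi mode: keep SNPs whose ppi exceeds the threshold (strict ">" assumed)
    df[df$ppi > filt_threshold, , drop = FALSE]
  } else {
    # top-N mode: keep the N SNPs with the largest absolute weight
    ord <- order(abs(df$weight), decreasing = TRUE)
    df[head(ord, filt_threshold), , drop = FALSE]
  }
}

# Toy table with precomputed ppi and weights
toy <- data.frame(
  SNPID  = c("snp1", "snp2", "snp3", "snp4"),
  ppi    = c(0.5, 0.01, 0.2, 0.001),
  weight = c(0.01, -0.03, 0.002, 0.0005),
  stringsAsFactors = FALSE
)

by_ppi <- filter_weights(toy, 0.1)  # threshold on ppi
top2   <- filter_weights(toy, 2)    # top 2 SNPs by |weight|
```

In this toy table the two modes keep different SNPs: snp2 has a low ppi but the largest absolute weight, so it survives only the top-N filter.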
To compute PGS with this method, the GWAS summary statistics dataset must contain at least the following columns, with these exact column names:
CHR: Chromosome
BP: Base position (in GRCh37/hg19 or GRCh38/hg38). If using hg38, set build = "hg38" in the parameters
REF: Reference, or non-effect allele
ALT: Alternative, or effect allele, the one \(\beta\) refers to
ALT_FREQ: Minor/ALT allele frequency in the tested population, or in a close population from a reference panel. Required for quantitative traits only
BETA: \(\beta\) (or log(OR)), or effect sizes
SE: Standard error of \(\beta\)
If a reference is provided, it should have 5 columns: CHR, BP, SNPID, REF, and ALT. Also, it should be in the same build as the summary statistics. In both files, column order does not matter.
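Before calling the function, it can help to confirm that a dataset carries the expected column names. The sketch below is a hypothetical pre-check in base R, not something the package provides; it assumes the required names are CHR, BP, REF, ALT, BETA and SE, plus ALT_FREQ for quantitative traits.

```r
# Hypothetical pre-check; not part of RapidoPGS.
# Returns the required column names that are missing from `ds`.
check_sumstats_columns <- function(ds, trait = c("cc", "quant")) {
  trait <- match.arg(trait)
  required <- c("CHR", "BP", "REF", "ALT", "BETA", "SE")
  if (trait == "quant") {
    required <- c(required, "ALT_FREQ")  # needed for quantitative traits only
  }
  setdiff(required, names(ds))  # column order does not matter
}

# One-row toy dataset without ALT_FREQ
ds <- data.frame(
  CHR = 4, BP = 1479959, REF = "C", ALT = "A",
  BETA = 0.012, SE = 0.0099,
  stringsAsFactors = FALSE
)

missing_cc    <- check_sumstats_columns(ds, "cc")
missing_quant <- check_sumstats_columns(ds, "quant")
```

An empty result means every required column is present for the chosen trait type.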
library(data.table)
library(RapidoPGS)

# Toy case-control summary statistics with the minimum required columns
sumstats <- data.table(
  SNPID = c("rs139096444", "rs3843766", "rs61977545", "rs544733737",
            "rs2177641", "rs183491817", "rs72995775", "rs78598863", "rs1411315"),
  CHR = c(4, 20, 14, 2, 4, 6, 6, 21, 13),
  BP = c(1479959, 13000913, 29107209, 203573414, 57331393, 11003529, 149256398,
         25630085, 79166661),
  REF = c("C", "C", "C", "T", "G", "C", "C", "G", "T"),
  ALT = c("A", "T", "T", "A", "A", "A", "T", "A", "C"),
  BETA = c(0.012, 0.0079, 0.0224, 0.0033, 0.0153, 0.058, 0.0742, 0.001, -0.0131),
  SE = c(0.0099, 0.0066, 0.0203, 0.0171, 0.0063, 0.0255, 0.043, 0.0188, 0.0074)
)

PGS <- rapidopgs_single(sumstats, trait = "cc")
#> Assigning LD blocks...
#> Done!
#> Computing a RapidoPGS-single model for a case-control dataset...