`R/rapfunc.R`

`rapidopgs_single.Rd`

`rapidopgs_single`

Computes PGS from GWAS summary statistics using posteriors from Wakefield's approximate Bayes factors.

```
rapidopgs_single(
  data,
  N = NULL,
  trait = c("cc", "quant"),
  build = "hg19",
  pi_i = 1e-04,
  sd.prior = if (trait == "quant") {
    0.15
  } else {
    0.2
  },
  filt_threshold = NULL,
  recalc = TRUE,
  reference = NULL
)
```

- data
a data.table containing a GWAS summary statistics dataset with all required information.

- N
a scalar representing the sample size of the study, or a string indicating the name of the column containing it. Required for quantitative traits only.

- trait
a string specifying if the dataset corresponds to a case-control ("cc") or a quantitative trait ("quant") GWAS. If trait = "quant", an ALT_FREQ column is required.

- build
a string containing the genome build of the dataset, either "hg19" (for hg19/GRCh37) or "hg38" (hg38/GRCh38). DEFAULT "hg19".

- pi_i
a scalar representing the prior probability that a given SNP is causal (DEFAULT: \(1 \times 10^{-4}\)).

- sd.prior
the prior specifies that BETA at causal SNPs follows a centred normal distribution with standard deviation sd.prior. Sensible and widely used DEFAULTs are 0.2 for case control traits, and 0.15 * var(trait) for quantitative (selected if trait == "quant").

- filt_threshold
a scalar indicating the ppi threshold (if `filt_threshold` < 1) or the number of top SNPs by absolute weights (if `filt_threshold` >= 1) to filter the dataset after PGS computation. If NULL (DEFAULT), no thresholding will be applied.

- recalc
a logical indicating if weights should be recalculated after thresholding. Only relevant if `filt_threshold` is defined.

- reference
a string indicating the path of the reference file SNPs should be filtered and aligned to, see Details.

a data.table containing the formatted sumstats dataset with computed PGS weights.

This function takes a GWAS summary statistics dataset as input and aligns it to a reference panel file (if provided). It then assigns SNPs to LD blocks, computes Wakefield's ppi within each LD block, and generates PGS weights by multiplying those posteriors by the effect sizes (\(\beta\)). Optionally, it filters SNPs by a custom ppi threshold and then recalculates the weights to improve accuracy.

Alternatively, if `filt_threshold` is 1 or larger, RapidoPGS will select the top `filt_threshold` SNPs by absolute weights (note: weights, not ppi).
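The steps described above can be illustrated with a minimal base-R sketch for a single LD block. It is not the package's implementation: `wakefield_ppi()` and `filter_snps()` are hypothetical helpers, the posterior assumes a denominator of 1 plus the sum of prior-weighted Bayes factors (the 1 standing for "no causal SNP in the block"), and the ppi filter assumes a strict `>` comparison. LD block assignment and the `recalc` step are not shown.

```r
# Hypothetical helper: Wakefield's approximate Bayes factors turned into
# posterior probabilities within one LD block (sketch, not RapidoPGS code).
wakefield_ppi <- function(beta, se, pi_i = 1e-4, sd.prior = 0.2) {
  z <- beta / se                 # z-scores
  V <- se^2                      # variance of the effect estimates
  W <- sd.prior^2                # prior variance of causal effects
  r <- W / (W + V)               # shrinkage factor
  labf <- 0.5 * (log(1 - r) + r * z^2)   # log approximate Bayes factor
  lnum <- labf + log(pi_i)       # log(pi_i * BF_i)
  # Log-sum-exp over all SNPs plus the null term (log(1) = 0), for stability.
  m <- max(c(lnum, 0))
  ldenom <- m + log(sum(exp(c(lnum, 0) - m)))
  exp(lnum - ldenom)
}

# Hypothetical helper mirroring the two documented filt_threshold modes.
filter_snps <- function(ds, filt_threshold) {
  if (filt_threshold < 1) {
    # Assumed behaviour: keep SNPs whose ppi exceeds the threshold.
    ds[ds$ppi > filt_threshold, , drop = FALSE]
  } else {
    # Keep the top filt_threshold SNPs by absolute weight.
    head(ds[order(-abs(ds$weight)), , drop = FALSE], filt_threshold)
  }
}

# Toy block of three SNPs (values borrowed from the example dataset).
ds <- data.frame(BETA = c(0.012, 0.0079, 0.058),
                 SE   = c(0.0099, 0.0066, 0.0255))
ds$ppi    <- wakefield_ppi(ds$BETA, ds$SE)
ds$weight <- ds$ppi * ds$BETA    # PGS weight = posterior x effect size

by_ppi <- filter_snps(ds, 1e-4)  # ppi threshold mode
top2   <- filter_snps(ds, 2)     # top-N by absolute weight mode
```

Working on the log scale keeps the computation stable when Bayes factors are very large or very small, which is common with genome-wide data.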

To compute PGS using our method, the GWAS summary statistics file must contain at least the following columns, with these exact column names:

- CHR
Chromosome

- BP
Base position (in GRCh37/hg19 or GRCh38/hg38). If using hg38, use build = "hg38" in parameters

- REF
Reference, or non-effect allele

- ALT
Alternative, or effect allele, the one \(\beta\) refers to

- ALT_FREQ
Minor/ALT allele frequency in the tested population, or in a close population from a reference panel. Required for Quantitative traits only

- BETA
\(\beta\) (or log(OR)), or effect sizes

- SE
standard error of \(\beta\)

If a reference is provided, it should have 5 columns: CHR, BP, SNPID, REF, and ALT. Also, it should be in the same build as the summary statistics. In both files, column order does not matter.
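Because the column names must match exactly, it can help to check a dataset before calling the function. The sketch below uses a hypothetical helper, `check_sumstats_columns()`, which is not part of RapidoPGS; it only encodes the column requirements listed above and returns the names of any missing columns.

```r
# Hypothetical pre-flight check based on the column list above.
check_sumstats_columns <- function(data, trait = c("cc", "quant")) {
  trait <- match.arg(trait)
  required <- c("CHR", "BP", "REF", "ALT", "BETA", "SE")
  # ALT_FREQ is only required for quantitative traits.
  if (trait == "quant") required <- c(required, "ALT_FREQ")
  setdiff(required, names(data))
}

# Toy one-SNP dataset without ALT_FREQ.
toy <- data.frame(CHR = 4, BP = 1479959, REF = "C", ALT = "A",
                  BETA = 0.012, SE = 0.0099)

missing_cc    <- check_sumstats_columns(toy, "cc")     # nothing missing
missing_quant <- check_sumstats_columns(toy, "quant")  # ALT_FREQ missing
```

The same check works on a data.table, since it relies only on `names()`.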

```
library(data.table)
library(RapidoPGS)

# Small case-control summary statistics dataset with the required columns
sumstats <- data.table(
  SNPID = c("rs139096444", "rs3843766", "rs61977545", "rs544733737",
            "rs2177641", "rs183491817", "rs72995775", "rs78598863", "rs1411315"),
  CHR = c(4, 20, 14, 2, 4, 6, 6, 21, 13),
  BP = c(1479959, 13000913, 29107209, 203573414, 57331393, 11003529, 149256398,
         25630085, 79166661),
  REF = c("C", "C", "C", "T", "G", "C", "C", "G", "T"),
  ALT = c("A", "T", "T", "A", "A", "A", "T", "A", "C"),
  BETA = c(0.012, 0.0079, 0.0224, 0.0033, 0.0153, 0.058, 0.0742, 0.001, -0.0131),
  SE = c(0.0099, 0.0066, 0.0203, 0.0171, 0.0063, 0.0255, 0.043, 0.0188, 0.0074)
)

PGS <- rapidopgs_single(sumstats, trait = "cc")
#> Assigning LD blocks...
#> Done!
#> Computing a RapidoPGS-single model for a case-control dataset...
```