# Agent $$\textit{001}$$

## 1. Introduction

Agent is an open-source tool for fast genome-wide association studies (GWAS). It uses the agent file format described in "Compression for population genetic data through finite-state entropy". The tool converts files to the agent format and performs univariate GWAS. This page lists the changes for each version of Agent and provides the source code, build instructions, binary releases, other downloads (including example data files), the license (BSD 2-clause) and the Agent manual. If you make use of this tool, please cite: W. Chen and L. T. Elliott. Compression for population genetic data through finite-state entropy. 2021. bioRxiv preprint 10.1101/2021.02.17.431713v2.

All binaries are available at the git repository.

## 3. Changes

• June 1st 2021. Initial release of Agent 001. See downloads section for binary, source, and example data files.
• July 26th 2021. Updated Zstandard version from 1.4.4 to 1.5.0.

## 4. Building

See downloads section for the Agent 001 alpha source release. All sources including the latest development version are available at the git repository. Agent is currently built using the GNU make build system and depends on the presence of the Intel Math Kernel Library (MKL). Those familiar with the GNU make build system can consult the Makefile, and the path to the Intel MKL libraries can be given using the make flag --eval=MKLROOT:=.

## 5. Manual

Agent 001 build d63d4b1564, June 2021.

https://agent.engineering/

Copyright 2021, Winfield Chen and Lloyd T. Elliott.

USAGE

• agent --version Prints version and build information.
• agent --genotypes <GFILE> --convert <DESTINATION> Converts the genotype file specified by <GFILE> (in bgen genetic file format) to a new file named <DESTINATION> in the agent format (this file is overwritten if it already exists). The bgen file must not have a chromosome header column (i.e., it must be created with the -omit-chromosome flag) and must use 16-bit genotypes and zstd compression (i.e., it must be created with the flags -bgen-bits 16 and -bgen-compression zstd). For example, the command agent --genotypes m21-80-001.bgen --convert m21-80-001.a1 converts the example bgen data file m21-80-001.bgen provided above in the downloads section to the equivalent agent file m21-80-001.a1 which can be checked against the downloadable version also provided above.
• agent --genotypes <GFILE> --phenotypes <PFILE> <DESTINATION> Runs a GWAS using the genotype file specified by <GFILE> and the phenotype file specified by <PFILE>, writing the results to the output directory <DESTINATION>. <GFILE> must be in agent format and <PFILE> must be a space separated ASCII file containing one column per phenotype. The first line of <PFILE> must give the phenotype names (for example, “V1 V2 V3 ...”) and each remaining line must hold one sample, with the phenotype measurements listed in the same order as the column names. The number of samples and the order of samples must be the same in <GFILE> and <PFILE>. Missing values in <PFILE> are not currently supported. The results of the GWAS are stored in the directory <DESTINATION> (this directory is created if it doesn't already exist). <DESTINATION> is populated with the following files (which are overwritten if they already exist): “beta.bin” (the effect sizes), “se.bin” (the standard errors), “pval.bin” (the $$-\log_{10}$$ p-values), “tstat.bin” (the t-statistics) and “unpack” (a perl script to extract GWAS results). The “.bin” files are stored as serialized IEEE 64-bit doubles in little endian form with a phenotype-major organization (i.e., if N phenotypes are considered, the first N doubles are the values for the associations with the first genetic variant, in the order specified by <PFILE>, the next N are for the second genetic variant, and so on). If the unpack command is invoked on the shell without arguments, then an ASCII version of all of the “.bin” files is printed to the standard output, with columns grouped by phenotype. unpack supports several command line options to specify which “.bin” files to output, and which genetic variants to output. Note that unpack can be invoked using a relative path and will determine the location of <DESTINATION> based on unpack's absolute path.
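
The “.bin” layout described above can also be read without the unpack script. The following is a minimal Python sketch, assuming only what the manual states (little endian IEEE 64-bit doubles, N consecutive values per genetic variant). The function names `read_bin` and `to_real_pvalue` are hypothetical and are not part of Agent.

```python
import struct


def read_bin(path, n_phenotypes):
    """Return one row per genetic variant, each holding n_phenotypes
    values in the column order of the phenotype file."""
    with open(path, "rb") as handle:
        raw = handle.read()
    row_bytes = 8 * n_phenotypes  # each value is a 64-bit double
    if n_phenotypes < 1 or len(raw) % row_bytes != 0:
        raise ValueError("file size is not a multiple of 8 * n_phenotypes")
    n_values = len(raw) // 8
    # "<" selects little endian byte order, "d" a 64-bit double.
    flat = struct.unpack("<%dd" % n_values, raw)
    return [list(flat[i:i + n_phenotypes])
            for i in range(0, n_values, n_phenotypes)]


def to_real_pvalue(neg_log10_p):
    """Convert a stored -log10 p-value back to a p-value."""
    return 10.0 ** (-neg_log10_p)
```

For a run with three phenotypes, `read_bin("<DESTINATION>/pval.bin", 3)[0]` would give the three $$-\log_{10}$$ p-values for the first genetic variant. The number of phenotypes has to be supplied by the caller because the manual does not describe any header in the “.bin” files.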

• unpack [--genotypes <STRING>] [--beta] [--se] [--pval] [--tstat] [--real] [--precision <NUMBER>] [--scientific] [--threshold <NUMBER>]

• --genotypes <STRING> Only outputs columns for genotypes specified by the indices in <STRING>. The indices are 1-based (i.e., 1 indicates the first genetic variant in <GFILE>). <STRING> may be a natural number, or a dash separated range, or a comma separated list of natural numbers and dash separated ranges. A dash separated range must be of the form “A-B” where “A” is a natural number and “B” is a natural number or the empty string. The empty string for “B” denotes the last genotype. Dash separated ranges are inclusive, and if “B” is a natural number, then it must be greater than “A”. The numbers or ranges specified in the comma separated list must be in increasing order. If this flag is not provided, then all genetic variants are output.
• --beta, --se, --pval, --tstat Only output the specified summary statistics (effect size, standard error, $$-\log_{10}$$ p-value and t-statistic, respectively). If any of these flags is present, then only the types of summary statistics listed in the flags are output and no other types of summary statistics are output. If none of these flags are present, then all summary statistics are output.
• --real Outputs real p-values instead of $$-\log_{10}$$ p-values.
• --precision <NUMBER> Output summary statistics using a number of significant digits given by the natural number <NUMBER>.
• --scientific Output summary statistics using scientific notation.
• --threshold <NUMBER> Only output lines for genetic variants that have at least one phenotype with $$-\log_{10}$$ p-value greater than the positive real number <NUMBER>. If this flag is provided, then the columns of the output are preceded by a column named “INDEX” giving the index of the genetic variant.
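
The rules for the --genotypes selection string can be illustrated with a short Python sketch. This is not the unpack implementation (which is a perl script). It is only one reading of the rules above, and it assumes that indices start at 1, that items may not overlap, and that an index past the last variant is an error. The name `parse_genotype_selection` and its `n_variants` parameter are hypothetical.

```python
def parse_genotype_selection(spec, n_variants):
    """Expand a selection string such as "1,4-6,9-" into 1-based indices."""
    indices = []
    previous = 0  # largest index accepted so far
    for item in spec.split(","):
        first, dash, last = item.partition("-")
        if not first.isdigit() or int(first) < 1:
            raise ValueError("bad start in %r" % item)
        start = int(first)
        if not dash:
            end = start  # a single number
        elif last == "":
            end = n_variants  # open range "A-" runs to the last variant
        elif last.isdigit() and int(last) > start:
            end = int(last)  # closed range "A-B" with B greater than A
        else:
            raise ValueError("bad range %r" % item)
        if start <= previous or start > n_variants or end > n_variants:
            raise ValueError("item %r is out of order or out of bounds" % item)
        indices.extend(range(start, end + 1))  # ranges are inclusive
        previous = end
    return indices


print(parse_genotype_selection("1,4-6,9-", 10))  # → [1, 4, 5, 6, 9, 10]
```

Strings such as "4-2" (end not greater than start) or "5,2" (not in increasing order) raise `ValueError` in this sketch. How unpack itself reports such input is not stated in the manual.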