“One object to store them all, one object to find them, one object to query from and with ggplot2 visualize them.” (Lord Sauron)
pagoo is an encapsulated, object-oriented class system for analyzing bacterial pangenomes. It uses the R6 package as its backend. It was designed to facilitate and speed up the comparative analysis of multiple bacterial genomes by standardizing and optimizing routine tasks. A handful of operations come up every day when working with bacterial pangenomes: subsetting, summarizing, extracting, visualizing, and storing data. pagoo is intended to make these tasks as easy as possible.
The main idea behind pagoo is that, once you have reconstructed a pangenome, all the information and basic methods are embedded in a single object. To query this object, you simply use the $ operator, as if it were a named list in R.
This idea is common in other object-oriented programming languages, but not in R for end users. We have used the R6 package to take advantage of this style of programming and make it available to the R pangenomics community.
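As a rough illustration of the $ style of querying, the sketch below builds a tiny pangenome object from a made-up table and reads a few of its fields. The toy data are invented, and the assumption that the input needs gene, org, and cluster columns should be checked against the pagoo::pagoo() help page. The block only runs if pagoo is installed.

```r
# Minimal sketch; the toy table below is invented for illustration.
# Assumption: pagoo::pagoo() accepts a data.frame with columns
# "gene", "org" and "cluster" (check ?pagoo::pagoo).
if (requireNamespace("pagoo", quietly = TRUE)) {
  toy <- data.frame(
    gene    = paste0("gene", 1:6),
    org     = c("orgA", "orgB", "orgC", "orgA", "orgB", "orgA"),
    cluster = c("cl1", "cl1", "cl1", "cl2", "cl2", "cl3"),
    stringsAsFactors = FALSE
  )

  p <- pagoo::pagoo(data = toy)

  # Fields are queried with `$`, as with a named list.
  print(p$organisms)   # organisms currently in the dataset
  print(p$pan_matrix)  # gene cluster abundance matrix
  print(p$core_level)  # threshold used to define the core genome
}
```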
pagoo is composed of three R6 classes of increasing complexity, each one inheriting from the previous one. The most basic one, PgR6, contains basic subset methods and data manipulation functions. The second one, PgR6M, inherits all the methods and fields from PgR6 and adds statistical and visualization methods. The last one, PgR6MS, inherits all the capabilities of PgR6M and adds methods to manipulate DNA sequences.
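The layered design can be pictured with a small R6 sketch. The three classes below are hypothetical toys, not pagoo's actual implementation; they only show how each level inherits everything from the one before it and adds something of its own.

```r
library(R6)

# Hypothetical toy classes; names, fields and methods are illustrative only.
ToyBase <- R6Class("ToyBase",
  public = list(
    items = NULL,
    initialize = function(items) {
      self$items <- items
    },
    # Basic subsetting, available to every subclass.
    subset_items = function(i) {
      self$items[i]
    }
  )
)

# Second level: inherits from ToyBase and adds a summary method.
ToyStats <- R6Class("ToyStats",
  inherit = ToyBase,
  public = list(
    count = function() {
      length(self$items)
    }
  )
)

# Third level: inherits from ToyStats and adds sequence handling.
ToySeqs <- R6Class("ToySeqs",
  inherit = ToyStats,
  public = list(
    seqs = NULL,
    initialize = function(items, seqs) {
      super$initialize(items)  # reuse the parent's constructor
      self$seqs <- seqs
    },
    seq_lengths = function() {
      nchar(self$seqs)
    }
  )
)

obj <- ToySeqs$new(items = c("a", "b", "c"), seqs = c("ATG", "ATGC", "AT"))
obj$subset_items(1:2)  # method defined in ToyBase
obj$count()            # method defined in ToyStats
obj$seq_lengths()      # method defined in ToySeqs
class(obj)             # shows the full inheritance chain
```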
A quick look at the printed pagoo object gives some clues about how it is composed:
<PgR6MS>
  Inherits from: <PgR6M>
  Public:
    add_metadata: function (map = "org", data)
    cg_exp_decay_fit: function (raref, pcounts = 10, ...)
    clone: function (deep = FALSE)
    cloud_clusters: active binding
    cloud_genes: active binding
    cloud_sequences: active binding
    clusters: active binding
    core_clusters: active binding
    core_genes: active binding
    core_level: active binding
    core_seqs_4_phylo: function (max_per_org = 1, fill = TRUE)
    core_sequences: active binding
    dist: function (method = "bray", binary = FALSE, diag = FALSE, upper = FALSE,
    drop: function (x)
    dropped: active binding
    genes: active binding
    gg_barplot: function ()
    gg_binmap: function ()
    gg_curves: function (what = c("pangenome", "coregenome"), ...)
    gg_dist: function (method = "bray", ...)
    gg_pca: function (colour = NULL, ...)
    gg_pie: function ()
    initialize: function (data, org_meta, cluster_meta, core_level = 95, sep = "__",
    organisms: active binding
    pan_matrix: active binding
    pan_pca: function (center = TRUE, scale. = FALSE, ...)
    pg_power_law_fit: function (raref, ...)
    random_seed: active binding
    rarefact: function (what = "pangenome", n.perm = 10)
    recover: function (x)
    runShinyApp: function ()
    save_pangenomeRDS: function (file = "pangenome.rds", seqs.if.avail = TRUE)
    sequences: active binding
    shell_clusters: active binding
    shell_genes: active binding
    shell_sequences: active binding
    summary_stats: active binding
    write_pangenome: function (dir = "pangenome", force = FALSE)
  Private:
    .clusters: DataFrame
    .data: DataFrame
    .dropped: NULL
    .level: 95
    .organisms: DataFrame
    .panmatrix: 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
    .sep: __
    .sequences: DNAStringSet
    version: package_version, numeric_version
As you can see, there are public and private fields and methods. All raw data is stored in private, and you will not have easy access to it. Instead, you work with public functions and active bindings. Active bindings are functions that behave as if they were variables; here they query private information and return it to the user in a convenient form. Public methods only “see” active bindings, so their results depend on the state of the pangenome object. For instance, some organisms can be hidden from the dataset, or thresholds can be changed on the fly, which changes the results returned by the public methods.
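The interplay between private data, active bindings, and public methods can be sketched with a toy R6 class. This is not pagoo's code: the class, its fields, and the presence percentages are all made up, and the sketch only mimics the pattern of a threshold that, once changed, alters what a public method returns.

```r
library(R6)

# Hypothetical toy class; not part of pagoo.
ToyPg <- R6Class("ToyPg",
  public = list(
    initialize = function(presence) {
      # Raw data goes to private: percentage of organisms carrying each cluster.
      private$.presence <- presence
    },
    # Public method: it only uses the active binding `core`, never private data.
    n_core = function() {
      length(self$core)
    }
  ),
  active = list(
    # Read/write active binding: looks like a variable, runs a function.
    level = function(value) {
      if (missing(value)) return(private$.level)
      stopifnot(is.numeric(value), value > 0, value <= 100)
      private$.level <- value
    },
    # Read-only active binding: recomputed from private data on every access.
    core = function() {
      names(private$.presence)[private$.presence >= private$.level]
    }
  ),
  private = list(
    .presence = NULL,
    .level = 95
  )
)

toy <- ToyPg$new(c(cl1 = 100, cl2 = 96, cl3 = 50))
toy$core            # clusters present in at least 95% of organisms
toy$n_core()
toy$level <- 100    # change the threshold through the active binding
toy$n_core()        # the public method now reflects the new state
```

Because the public method reads the active binding rather than the raw data, changing the threshold is enough to change its result; nothing has to be recomputed by hand.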
pagoo works on the pangenome after it has been built, so you can use the pangenome reconstruction software of your preference. Although we recommend Pewit, our own pangenome reconstruction software, pagoo can read in the output of the most popular pangenome reconstruction software, roary, and we are planning to support others.
pagoo also runs a Shiny application that provides reactive interaction with the data and facilitates handling and visualization.
pagoo is available on CRAN and can be installed with install.packages("pagoo").
Alternatively, you can install the latest development version from GitHub using
if (!require("devtools")) install.packages("devtools")
devtools::install_github('iferres/pagoo')
All three classes are documented. You can access their R help pages as for any other function (for example, ?PgR6MS), but since documentation for R6 classes is still not standardized, we recommend reading the documentation of the pagoo::pagoo() function and using that function instead of the raw classes.