“One object to store them all, one object to find them, one object to query from and with ggplot2 visualize them.” (Lord Sauron)
Introduction
pagoo is an encapsulated, object-oriented class system for analyzing bacterial pangenomes. It uses the R6 package as its backend. It was designed to facilitate and speed up the comparative analysis of multiple bacterial genomes by standardizing and optimizing routine tasks. A handful of things are done every day when working with bacterial pangenomes: subsetting, summarizing, extracting, visualizing and storing data. pagoo is intended to make these tasks as easy as possible.
Philosophy
The main idea behind pagoo is that, once you have reconstructed a pangenome, all the information and basic methods are embedded in a single object. To query this object, you simply use the $ symbol as if it were a named list in R.
This idea is common in other object-oriented programming languages, but it is rarely exposed to end users in R. We have used the R6 package to take advantage of this style of programming and make it available to the R pangenomics community.
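As a minimal sketch of this idea, the example below compares `$` access on a plain named list with `$` access on a small R6 object. The class `ToyPangenome` and its members are hypothetical and exist only for illustration; they are not part of pagoo.

```r
library(R6)

# A plain named list: elements are reached with `$`.
pg_list <- list(organisms = c("org1", "org2", "org3"))
pg_list$organisms

# A toy R6 class (hypothetical, not part of pagoo): fields and methods
# are reached with the same `$` syntax.
ToyPangenome <- R6Class("ToyPangenome",
  public = list(
    organisms = NULL,
    initialize = function(organisms) {
      self$organisms <- organisms
    },
    # A method stored next to the data it works on.
    n_organisms = function() {
      length(self$organisms)
    }
  )
)

toy <- ToyPangenome$new(organisms = c("org1", "org2", "org3"))
toy$organisms      # field access, written like a list element
toy$n_organisms()  # method call on the same object
```

The only visible difference from a list is that some members are functions, so the data and the code that operates on it travel together in one object.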
pagoo is composed of three R6 classes of increasing complexity, where each more complex class inherits from the more basic one. The most basic class, PgR6, contains basic subsetting methods and data manipulation functions. The second one, PgR6M, inherits all the methods and fields of PgR6 and adds statistical and visualization methods. The last one, PgR6MS, inherits all the capabilities of PgR6M and adds methods to manipulate DNA sequences.
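The layered design described above can be sketched with three toy R6 classes. This is only an illustration of how such an inheritance chain works, not pagoo's actual implementation: the class names `ToyBase`, `ToyStats` and `ToySeqs` and all of their members are hypothetical.

```r
library(R6)

# Hypothetical three-level hierarchy, analogous to PgR6 -> PgR6M -> PgR6MS.
ToyBase <- R6Class("ToyBase",
  public = list(
    genes = NULL,
    initialize = function(genes) {
      self$genes <- genes
    },
    # Basic subsetting method.
    subset_genes = function(idx) {
      self$genes[idx]
    }
  )
)

ToyStats <- R6Class("ToyStats",
  inherit = ToyBase,
  public = list(
    # Adds a summary method on top of the inherited ones.
    gene_count = function() {
      length(self$genes)
    }
  )
)

ToySeqs <- R6Class("ToySeqs",
  inherit = ToyStats,
  public = list(
    seqs = NULL,
    initialize = function(genes, seqs) {
      super$initialize(genes)  # reuse the parent constructor
      self$seqs <- seqs
    },
    # Adds a sequence-related method.
    seq_lengths = function() {
      nchar(self$seqs)
    }
  )
)

obj <- ToySeqs$new(genes = c("gA", "gB"), seqs = c("ATG", "ATGAAA"))
obj$subset_genes(1)  # inherited from ToyBase
obj$gene_count()     # inherited from ToyStats
obj$seq_lengths()    # defined in ToySeqs
class(obj)           # lists the whole inheritance chain
```

An object of the most derived class exposes the methods of every level, which is why working with the most complete class gives access to everything the simpler ones offer.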
A quick look at the printed representation of a pagoo object gives some clues about how it is composed:
<PgR6MS>
  Inherits from: <PgR6M>
  Public:
    add_metadata: function (map = "org", data)
    cg_exp_decay_fit: function (raref, pcounts = 10, ...)
    clone: function (deep = FALSE)
    cloud_clusters: active binding
    cloud_genes: active binding
    cloud_sequences: active binding
    clusters: active binding
    core_clusters: active binding
    core_genes: active binding
    core_level: active binding
    core_seqs_4_phylo: function (max_per_org = 1, fill = TRUE)
    core_sequences: active binding
    dist: function (method = "bray", binary = FALSE, diag = FALSE, upper = FALSE,
    drop: function (x)
    dropped: active binding
    genes: active binding
    gg_barplot: function ()
    gg_binmap: function ()
    gg_curves: function (what = c("pangenome", "coregenome"), ...)
    gg_dist: function (method = "bray", ...)
    gg_pca: function (colour = NULL, ...)
    gg_pie: function ()
    initialize: function (data, org_meta, cluster_meta, core_level = 95, sep = "__",
    organisms: active binding
    pan_matrix: active binding
    pan_pca: function (center = TRUE, scale. = FALSE, ...)
    pg_power_law_fit: function (raref, ...)
    random_seed: active binding
    rarefact: function (what = "pangenome", n.perm = 10)
    recover: function (x)
    runShinyApp: function ()
    save_pangenomeRDS: function (file = "pangenome.rds", seqs.if.avail = TRUE)
    sequences: active binding
    shell_clusters: active binding
    shell_genes: active binding
    shell_sequences: active binding
    summary_stats: active binding
    write_pangenome: function (dir = "pangenome", force = FALSE)
  Private:
    .clusters: DataFrame
    .data: DataFrame
    .dropped: NULL
    .level: 95
    .organisms: DataFrame
    .panmatrix: 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
    .sep: __
    .sequences: DNAStringSet
    version: package_version, numeric_version
As you can see, the object has public and private fields and methods. All raw data is stored in the private part, which you cannot easily access. Instead, you work with the public functions and active bindings. Active bindings are functions that behave as if they were variables; here they query the private information and return it to the user in a convenient form. Public methods only "see" active bindings, so their results depend on the state of the pangenome object. For instance, some organisms can be hidden from the dataset, or thresholds can be changed on the fly, and both change the results returned by the public methods.
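The interplay between private data, active bindings and public methods can be sketched with a toy R6 class. This is a simplified illustration under our own assumptions, not pagoo's code: the class `ToyPg` and the method `n_genes_per_org()` are hypothetical, and the members `drop()`, `pan_matrix`, `.panmatrix` and `.dropped` only borrow their names from the printout above.

```r
library(R6)

ToyPg <- R6Class("ToyPg",
  public = list(
    initialize = function(panmatrix) {
      private$.panmatrix <- panmatrix
    },
    # Hide organisms without deleting the raw data.
    drop = function(x) {
      private$.dropped <- union(private$.dropped, x)
      invisible(self)
    },
    # Public method: it reads the active binding, never the private field.
    n_genes_per_org = function() {
      rowSums(self$pan_matrix)
    }
  ),
  active = list(
    # Read-only view of the private matrix that honours dropped organisms.
    pan_matrix = function(value) {
      if (!missing(value)) stop("pan_matrix is read-only")
      keep <- setdiff(rownames(private$.panmatrix), private$.dropped)
      private$.panmatrix[keep, , drop = FALSE]
    }
  ),
  private = list(
    .panmatrix = NULL,
    .dropped = character(0)
  )
)

# Toy presence/absence matrix: organisms in rows, clusters in columns.
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("org1", "org2", "org3"),
                            c("clusterA", "clusterB", "clusterC")))

toy <- ToyPg$new(m)
toy$pan_matrix         # looks like a field, but runs a function
toy$n_genes_per_org()  # computed on all three organisms
toy$drop("org2")       # change the state of the object
toy$n_genes_per_org()  # the same call now ignores org2
```

Because the public method goes through the active binding, hiding an organism changes every downstream result without touching the stored raw matrix.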
Pangenome reconstruction
Pagoo works on a pangenome after it has been built with any pangenome reconstruction software, so you can use the software of your preference. Although we recommend Pewit, our own pangenome reconstruction software, pagoo can also read in the output of the most popular pangenome reconstruction software, roary, and we plan to support others such as Panaroo, PanX and PIRATE. pagoo also runs a Shiny application that provides reactive interaction with the data and facilitates handling and visualization.
Installation
pagoo is available on CRAN:
install.packages("pagoo")
Alternatively, you can install the latest development version from GitHub using devtools:
if (!require("devtools")) install.packages("devtools")
devtools::install_github('iferres/pagoo')
Help
All three classes are documented, and you can access their R help pages as you would for any other function. However, because the documentation of R6 classes is not yet standardized, we recommend reading the documentation of the pagoo::pagoo() function instead, and using that function rather than the raw classes:
?pagoo