PgR6 with Methods. End users should use pagoo
instead of this class, since it is easier to understand.
Inherits: PgR6
Super class
pagoo::PgR6 -> PgR6M
Methods
Method new()
Create a PgR6M object.
Usage
PgR6M$new(
  data,
  org_meta,
  cluster_meta,
  core_level = 95,
  sep = "__",
  verbose = TRUE,
  DF,
  group_meta
)
Arguments
data
A data.frame or DataFrame containing at least the following columns: gene (gene name), org (name of the organism to which the gene belongs), and cluster (group of orthologs to which the gene belongs). More columns can be added as metadata for each gene.
org_meta
(optional) A data.frame or DataFrame containing additional metadata for organisms. This data.frame must have a column named "org" with valid organism names (that is, they should match those provided in data, column org); additional columns will be used as metadata. Each row should correspond to one organism.
cluster_meta
(optional) A data.frame or DataFrame containing additional metadata for clusters. This data.frame must have a column named "cluster" with valid cluster names (that is, they should match those provided in data, column cluster); additional columns will be used as metadata. Each row should correspond to one cluster.
core_level
The initial core_level, that is, the percentage of organisms a cluster must be present in to be considered part of the core genome. Must be a number between 100 and 85 (default: 95). You can change it later through the $core_level field once the object has been created.
sep
A separator. The default is '__' (two underscores). It is used to create a unique gid (gene identifier) for each gene. gids are created by pasting org to gene, separated by sep.
verbose
logical. Whether to display progress messages when loading the class.
DF
Deprecated. Use data instead.
group_meta
Deprecated. Use cluster_meta instead.
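As a minimal sketch of the expected input, the toy data.frame below has the three required columns, and the gid construction described for sep is reproduced with base paste(). All organism, gene, and cluster names are made up for illustration. The final, commented call shows how such a table would be passed to the constructor, assuming pagoo is installed.

```r
# Toy input table; all names are hypothetical.
genes <- data.frame(
  gene    = c("g1", "g2", "g1", "g2"),
  org     = c("orgA", "orgA", "orgB", "orgB"),
  cluster = c("c1", "c2", "c1", "c3"),
  stringsAsFactors = FALSE
)

# gids as described above: org pasted to gene, separated by sep.
sep <- "__"
gid <- paste(genes$org, genes$gene, sep = sep)
print(gid)

# Assuming pagoo is installed, the object would then be created with:
# p <- pagoo::PgR6M$new(data = genes, core_level = 95, sep = sep)
```

Because gene names are only unique within an organism here, the org prefix is what makes each gid unique.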
Method rarefact()
Rarefy the pangenome or coregenome. Computes the number of genes that belong to the pangenome or to the coregenome over a number of random permutations of increasingly larger samples of genomes.
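The idea can be illustrated with a small base R sketch. This is not pagoo's implementation: the helper rarefy(), the toy presence/absence matrix, and the choice to count clusters and to treat "core" as present in every sampled organism (rather than using core_level) are all assumptions made for illustration.

```r
set.seed(1)

# Toy presence/absence matrix: rows = organisms, columns = clusters (hypothetical data).
pm <- matrix(c(1, 1, 1, 0,
               1, 1, 0, 1,
               1, 0, 1, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("org", 1:3), paste0("c", 1:4)))

# Hypothetical helper: one row per sample size, one column per permutation.
rarefy <- function(pm, what = c("pangenome", "coregenome"), n.perm = 10) {
  what <- match.arg(what)
  n <- nrow(pm)
  sapply(seq_len(n.perm), function(i) {
    ord <- sample(n)  # random order in which organisms are added
    sapply(seq_len(n), function(k) {
      # Number of sampled organisms that carry each cluster.
      present <- colSums(pm[ord[seq_len(k)], , drop = FALSE] > 0)
      if (what == "pangenome") sum(present > 0) else sum(present == k)
    })
  })
}

pan  <- rarefy(pm, "pangenome")
core <- rarefy(pm, "coregenome")
```

Each column of pan is non-decreasing and each column of core is non-increasing as more genomes are added, which is the shape the fitting methods below rely on.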
Method dist()
Compute distance between all pairs of genomes. The default dist method is "bray" (Bray-Curtis distance). Another commonly used distance method is "jaccard", but you should set binary = FALSE (see below) to obtain a meaningful result. See vegdist for details; this is just a wrapper function.
Usage
PgR6M$dist(
  method = "bray",
  binary = FALSE,
  diag = FALSE,
  upper = FALSE,
  na.rm = FALSE,
  ...
)
Arguments
method
The distance method to use. See vegdist for available methods, and details for each one.
binary
Transform abundance matrix into a presence/absence matrix before computing distance.
diag
Compute diagonals.
upper
Return only the upper diagonal.
na.rm
Pairwise deletion of missing observations when computing dissimilarities.
...
Other parameters. See vegdist for details.
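To make the default method concrete, here is a hand-rolled Bray-Curtis dissimilarity in base R, with the effect of binary shown as an optional presence/absence transformation. The helper bray_curtis() and the toy abundance matrix are hypothetical; the method itself delegates to vegdist, so this is only a sketch of the formula sum(|x - y|) / sum(x + y).

```r
# Hypothetical helper implementing the Bray-Curtis formula for two abundance vectors.
bray_curtis <- function(x, y, binary = FALSE) {
  if (binary) {
    # Collapse abundances to presence/absence before computing the distance.
    x <- as.numeric(x > 0)
    y <- as.numeric(y > 0)
  }
  sum(abs(x - y)) / sum(x + y)
}

# Toy abundance matrix: rows = organisms, columns = clusters (made-up values).
pm <- rbind(orgA = c(1, 0, 2),
            orgB = c(1, 1, 0),
            orgC = c(0, 1, 1))

# Fill a square matrix with all pairwise distances, then convert to a dist object.
n <- nrow(pm)
d <- matrix(0, n, n, dimnames = list(rownames(pm), rownames(pm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(pm[i, ], pm[j, ])
  }
}
d <- as.dist(d)
print(d)
```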
Method pan_pca()
Performs a principal components analysis on the panmatrix.
Arguments
center
a logical value indicating whether the variables should be shifted to be zero centered. Alternately, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
scale.
a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is TRUE.
...
Other arguments. See prcomp.
Returns
Returns a list with class "prcomp". See prcomp for more information.
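The kind of object returned can be previewed by calling stats::prcomp() directly on a small matrix. The toy panmatrix below is made up, and calling prcomp() on it by hand is only an approximation of what the method does internally.

```r
# Toy panmatrix: rows = organisms, columns = clusters (hypothetical counts).
pm <- rbind(
  orgA = c(1, 1, 0, 0, 2),
  orgB = c(1, 0, 1, 0, 1),
  orgC = c(0, 1, 1, 1, 0),
  orgD = c(1, 1, 0, 1, 1)
)
colnames(pm) <- paste0("c", 1:5)

# center and scale. are the same arguments documented above.
pca <- prcomp(pm, center = TRUE, scale. = FALSE)

# Coordinates of each organism on the first two principal components.
print(pca$x[, 1:2])
```

The x element holds the organism coordinates, which is what a PCA scatter plot would display.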
Method pg_power_law_fit()
Fits a power law curve for the pangenome rarefaction simulation.
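A power law of the form y = K * x^gamma can be fitted by linear regression on log-transformed values. The sketch below uses that approach on synthetic data; it is an assumption made for illustration and not necessarily the fitting procedure pagoo uses. The values of K and gamma are made up.

```r
# Synthetic pangenome curve (hypothetical parameters): y = K * x^gamma
x <- 1:30
K <- 100
gamma <- 0.5
y <- K * x^gamma

# Linearization: log(y) = log(K) + gamma * log(x)
fit <- lm(log(y) ~ log(x))
K_hat <- exp(coef(fit)[[1]])
gamma_hat <- coef(fit)[[2]]

print(c(K = K_hat, gamma = gamma_hat))
```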
Method cg_exp_decay_fit()
Fits an exponential decay curve for the coregenome rarefaction simulation.
Arguments
raref
(Optional) A rarefaction matrix, as returned by rarefact().
pcounts
An integer of pseudo-counts. This is used to better fit the function at small numbers, as the linearization method requires subtracting a constant C, which is the coregenome size, from y. As y approaches the coregenome size, this difference tends to 0 and its logarithm diverges. By default pcounts = 10.
...
Further arguments to be passed to rarefact(). If raref is missing, it will be computed with default arguments, or with the ones provided here.
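The role of pcounts can be seen in a small sketch of the linearization for a curve of the form y = A * exp(B * x) + C. Estimating C as the smallest observed value minus pcounts is an assumption made here for illustration; it is not confirmed to be how pagoo chooses C. The curve parameters are made up.

```r
# Synthetic coregenome curve (hypothetical parameters): y = A * exp(B * x) + C
x <- 1:20
A <- 500
B <- -0.3
C_true <- 1000
y <- A * exp(B * x) + C_true

pcounts <- 10

# Without pseudo-counts, subtracting the smallest value leaves a zero,
# and log(0) is -Inf, which breaks the linear fit.
naive <- log(y - min(y))

# Assumption: take C as min(y) - pcounts so that y - C stays strictly positive.
C_hat <- min(y) - pcounts
fit <- lm(log(y - C_hat) ~ x)
A_hat <- exp(coef(fit)[[1]])
B_hat <- coef(fit)[[2]]

print(c(A = A_hat, B = B_hat, C = C_hat))
```

The estimates are approximate by construction, since the offset shifts the curve slightly; the point is that the logarithm stays finite.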
Method gg_binmap()
Plot a pangenome binary map representing the presence/absence of each gene within each organism.
Returns
A binary map (ggplot2::geom_raster()), and a gg object (ggplot2 package) invisibly.
Method gg_dist()
Plot a heatmap showing the computed distance between all pairs of organisms.
Arguments
method
Distance method. One of "Jaccard" (default), or "Manhattan", see above.
...
More arguments to be passed to distManhattan.
Returns
A heatmap (ggplot2::geom_tile()), and a gg object (ggplot2 package) invisibly.
Method gg_pca()
Plot a scatter plot of a Principal Components Analysis.
Arguments
colour
The name of the column in the $organisms field from which points will take colour (if provided). NULL (default) renders black points.
...
More arguments to be passed to ggplot2::autoplot().
Returns
A scatter plot (ggplot2::autoplot()), and a gg object (ggplot2 package) invisibly.
Method gg_pie()
Plot a pie chart showing the number of clusters of each pangenome category: core, shell, or cloud.
Method gg_curves()
Plot pangenome and/or coregenome curves with the fitted functions returned by pg_power_law_fit() and cg_exp_decay_fit(). You can add points by adding + geom_point() from the ggplot2 package.
Usage
PgR6M$gg_curves(what = c("pangenome", "coregenome"), ...)
Method runShinyApp()
Launch an interactive shiny app. It contains a sidebar with controls and switches to interact with the pagoo object. You can drop/recover organisms from the dataset, modify the core_level, visualize statistics, plots, and browse cluster and gene information. In the main body, it contains 2 tabs to switch between summary statistics plots and core genome information on one side, and accessory genome plots and information on the other.
The lower part of each tab contains two tables, side by side. On the "Summary" tab, the left one contains information about core clusters, with one cluster per row. When one of them is selected (by clicking), the one on the right is updated to show information about its genes (if provided), one gene per row. On the "Accessory" tab, a similar layout is shown, but in this case only accessory clusters/genes are displayed. A slider on the sidebar lets you select the accessory frequency range to display.
Give it a try!
Keep in mind that large pangenomes can slow down the app. More than 50-70 organisms often lead to a delay in the update of the plots and tables.