PgR6 basic class

A basic PgR6 class constructor. It contains basic fields and subset functions to handle a pangenome. End users should use pagoo instead, since it is easier to understand.

Active bindings

pan_matrix: The panmatrix. Rows are organisms and columns are groups of orthologs. Cells indicate the presence (>=1) or absence (0) of a given gene in a given organism. Cells can hold values greater than 1 if the cluster contains in-paralogs.
organisms: A DataFrame with available organism names, and organism number identifier as rownames(). (Dropped organisms will not be displayed in this field, see $dropped below). Additional metadata will be shown if provided, as additional columns.
genes: A SplitDataFrameList object with one entry per cluster. Each element contains a DataFrame with gene ids (<gid>) and additional metadata, if provided. gids are created by pasting organism and gene names, so duplicated gene names are avoided.
clusters: A DataFrame with the groups of orthologs (clusters). Additional metadata, if previously provided, is shown as additional columns. Each row corresponds to one cluster.
core_level: The percentage of organisms a gene must be in to be considered part of the core genome. core_level = 95 by default. It can't be set above 100, and setting it below 85 raises a warning.
core_genes: Like $genes, but only showing core genes.
core_clusters: Like $clusters, but only showing core clusters.
cloud_genes: Like $genes, but only showing cloud genes. These are defined as those clusters which contain a single gene (singletons), plus those with more than one gene whose organisms are probably clonal, given their identical overall gene content. Colloquially defined as strain-specific genes.
cloud_clusters: Like $clusters, but only showing cloud clusters as defined above.
shell_genes: Like $genes, but only showing shell genes. These are defined as those clusters that belong neither to the core genome nor to the cloud genome. Colloquially defined as genes that are present in some but not all strains, and that aren't strain-specific.
shell_clusters: Like $clusters, but only showing shell clusters, as defined above.
summary_stats: A DataFrame with information about the number of core, shell, and cloud clusters, as well as the total number of clusters.
random_seed: The last .Random.seed. Used for reproducibility purposes only.
dropped: A character vector with dropped organism names, and organism number identifier as names()
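
As a rough illustration of how the fields above relate to each other, the following base R sketch builds a count matrix from a toy gene table and splits clusters into core, shell, and cloud. The toy data, the variable names, and the `>=` comparison against core_level are assumptions made for this example. It is not pagoo's internal implementation, and the cloud rule is simplified to singletons only (the clonality check described above is omitted).

```r
# Toy gene table with the three columns the class expects: gene, org, cluster.
genes <- data.frame(
  gene    = c("g1", "g2", "g1", "g2", "g3", "g1", "g2"),
  org     = c("orgA", "orgA", "orgB", "orgB", "orgB", "orgC", "orgC"),
  cluster = c("c1", "c2", "c1", "c1", "c3", "c1", "c2"),
  stringsAsFactors = FALSE
)

# Rows are organisms, columns are clusters, cells are gene counts.
# orgB carries two genes in c1 (in-paralogs), so that cell is 2.
pan_matrix <- unclass(table(genes$org, genes$cluster))

# Percentage of organisms in which each cluster is present (count >= 1).
presence_pct <- colMeans(pan_matrix >= 1) * 100

core_level <- 95
is_core  <- presence_pct >= core_level
# Simplification (assumption): only singleton clusters count as cloud here.
is_cloud <- colSums(pan_matrix) == 1
is_shell <- !is_core & !is_cloud

print(pan_matrix)
print(data.frame(cluster = colnames(pan_matrix), is_core, is_shell, is_cloud))
```

Lowering core_level in this sketch moves clusters from shell to core, which is the same trade-off the $core_level field controls.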

Methods

Method `new()`

A basic PgR6 class constructor. It contains basic fields and subset functions to handle a pangenome.

Usage

PgR6$new(
  data,
  org_meta,
  cluster_meta,
  core_level = 95,
  sep = "__",
  verbose = TRUE,
  DF,
  group_meta
)

Arguments

data: A data.frame or DataFrame containing at least the following columns: gene (gene name), org (name of the organism to which the gene belongs), and cluster (group of orthologs to which the gene belongs). More columns can be added as metadata for each gene.
org_meta: (optional) A data.frame or DataFrame containing additional metadata for organisms. This data.frame must have a column named "org" with valid organism names (that is, they should match those provided in data, column org), and additional columns will be used as metadata. Each row should correspond to one organism.
cluster_meta: (optional) A data.frame or DataFrame containing additional metadata for clusters. This data.frame must have a column named "cluster" with valid cluster names (that is, they should match those provided in data, column cluster), and additional columns will be used as metadata. Each row should correspond to one cluster.
core_level: The initial core_level (that is, the percentage of organisms a cluster must be in to be considered part of the core genome). Must be a number between 85 and 100 (default: 95). You can change it later through the $core_level field once the object has been created.
sep: A separator. The default is '__' (two underscores). It will be used to create a unique gid (gene identifier) for each gene. gids are created by pasting org to gene, separated by sep.
verbose: logical. Whether to display progress messages when loading the class.
DF: Deprecated. Use data instead.
group_meta: Deprecated. Use cluster_meta instead.

Returns

An R6 object of class PgR6. It contains basic fields and methods for analyzing a pangenome.
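
A minimal sketch of preparing the inputs described above and calling the constructor. The toy tables and the "country" column are hypothetical. The call itself assumes that the pagoo package is installed and exports PgR6, so it is guarded and skipped otherwise.

```r
# Minimal input: one row per gene, with the three required columns.
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g2"),
  org     = c("orgA", "orgA", "orgB", "orgB"),
  cluster = c("c1", "c2", "c1", "c2"),
  stringsAsFactors = FALSE
)

# Optional organism metadata; the "org" column must match data$org.
org_meta <- data.frame(
  org     = c("orgA", "orgB"),
  country = c("UY", "AR"),  # hypothetical metadata column
  stringsAsFactors = FALSE
)

# How a gid is formed according to the description of `sep`:
# gene names repeat across organisms, but org + sep + gene is unique.
sep <- "__"
gid <- paste(data$org, data$gene, sep = sep)
print(gid)

# Build the object only if pagoo is available (assumption: PgR6 is exported).
if (requireNamespace("pagoo", quietly = TRUE)) {
  p <- pagoo::PgR6$new(
    data       = data,
    org_meta   = org_meta,
    core_level = 95,
    sep        = sep,
    verbose    = FALSE
  )
  print(p$summary_stats)
}
```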

Method `add_metadata()`

Add metadata to the object. You can add metadata to each organism, to each group of orthologs (cluster), or to each gene. Elements with missing data should be filled with NA (the dimensions of the provided data.frame must be coherent with the object data).

Usage

PgR6$add_metadata(map = "org", data)

Arguments

map: character identifying the metadata to map. Can be one of "org", "cluster", or "gid".
data: data.frame or DataFrame with the metadata to add. In each case, a column named after the value of map ("org", "cluster", or "gid") must exist, containing the identifier of each element. When adding gene (gid) metadata, each gene should be referenced by the name of the organism and the name of the gene as provided in the "data" data.frame, separated by the "sep" argument.

Returns

self invisibly, but with additional metadata.
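
The shape of the metadata tables can be sketched as follows. The "annotation" and "host" columns and all values are hypothetical; the identifier columns follow the rule above (named after map, with gids built from org, sep, and gene). The pagoo calls assume the package is installed and are skipped otherwise.

```r
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g2", "g1"),
  org     = c("orgA", "orgA", "orgB", "orgB", "orgC"),
  cluster = c("c1", "c2", "c1", "c2", "c1"),
  stringsAsFactors = FALSE
)
sep <- "__"

# Gene-level metadata: the identifier column is named "gid", and each gid is
# the organism name and the gene name joined by `sep`.
gene_meta <- data.frame(
  gid        = paste(data$org, data$gene, sep = sep),
  annotation = c("kinase", NA, "kinase", "transporter", NA),  # NA marks missing data
  stringsAsFactors = FALSE
)

# Organism-level metadata: the identifier column is named "org".
org_meta <- data.frame(
  org  = c("orgA", "orgB", "orgC"),
  host = c("human", "bovine", NA),
  stringsAsFactors = FALSE
)

if (requireNamespace("pagoo", quietly = TRUE)) {
  p <- pagoo::PgR6$new(data = data, sep = sep, verbose = FALSE)
  p$add_metadata(map = "org", data = org_meta)
  p$add_metadata(map = "gid", data = gene_meta)
  print(p$organisms)
  print(p$genes)
}
```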

Method `drop()`

Drop an organism from the dataset. This method lets you hide an organism from the real dataset, so that it is ignored in downstream analyses. All fields and methods will behave as if it didn't exist. For instance, if you decide to drop organism 1, the $pan_matrix field (see above) will not show it when called.

Usage

PgR6$drop(x)

Arguments

x: character or numeric. The name of the organism to drop, or its numeric id as returned in the $organisms field (see above).

Returns

self invisibly, but with x dropped. It isn't necessary to assign the result of the call to a new object, nor to overwrite the original, since R6 objects are mutable.

Method `recover()`

Recover an organism previously dropped with $drop() (see above). All fields and methods will take this organism into account again.

Usage

PgR6$recover(x)

Arguments

x: character or numeric. The name of the organism to recover, or its numeric id as returned in the $dropped field (see above).

Returns

self invisibly, but with x recovered. It isn't necessary to assign the result of the call to a new object, nor to overwrite the original, since R6 objects are mutable.
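
The in-place behaviour of $drop() and $recover() can be sketched as below. The toy data is hypothetical, and the pagoo calls assume the package is installed (they are skipped otherwise). No output is shown here because it depends on the installed version.

```r
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g2", "g1"),
  org     = c("orgA", "orgA", "orgB", "orgB", "orgC"),
  cluster = c("c1", "c2", "c1", "c2", "c1"),
  stringsAsFactors = FALSE
)
to_hide <- "orgC"

if (requireNamespace("pagoo", quietly = TRUE)) {
  p <- pagoo::PgR6$new(data = data, verbose = FALSE)

  # No reassignment: the object itself is modified.
  p$drop(to_hide)
  print(rownames(p$pan_matrix))  # the dropped organism should no longer be listed
  print(p$dropped)               # the dropped organism is tracked here

  # Bring it back; fields consider the organism again.
  p$recover(to_hide)
  print(rownames(p$pan_matrix))
}
```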

Method `write_pangenome()`

Write the pangenome data as flat tables (text). This is not the recommended way to save a pangenome, since you can lose information such as numeric precision, column classes (factor, numeric, integer), and the state of the object itself (e.g. dropped organisms or core_level), which hurts reproducibility. Use $save_pangenomeRDS for a more precise way of saving a pagoo object. Still, it is useful if you want to work with the data outside R; just keep the above in mind.

Usage

PgR6$write_pangenome(dir = "pangenome", force = FALSE)

Arguments

dir: The name of a non-existing directory in which to write the data files. Default is "pangenome".
force: logical. Whether to overwrite the directory if it already exists. Default: FALSE.

Returns

A directory with at least 3 files. "data.tsv" contains the basic pangenome data as provided to the data argument of the initialization method ($new(...)). "clusters.tsv" contains any metadata associated with the clusters. "organisms.tsv" contains any metadata associated with the organisms. The latter two files will contain a single column if no metadata was provided.

Method `save_pangenomeRDS()`

Save a pagoo pangenome object. This function provides a method for saving a pagoo object and its state into an RDS file. To load the pangenome, use the load_pangenomeRDS function in this package. It *should* be compatible between pagoo versions, so you could update pagoo and still recover the same pangenome. Even sep and core_level are restored unless the user provides those arguments to load_pangenomeRDS. Dropped organisms are also kept hidden, as if you were working with the original object.

Usage

PgR6$save_pangenomeRDS(file = "pangenome.rds")

Arguments

file: The name of the file to save. Default: "pangenome.rds".

Returns

Writes a list with all the information needed to restore the object by using the load_pangenomeRDS function, into an RDS (binary) file.
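
The two ways of saving described above can be compared with a short sketch. The toy data and the temporary paths are hypothetical; the pagoo calls assume the package is installed and that load_pangenomeRDS takes the file path as its first argument, and they are skipped otherwise.

```r
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g2", "g1"),
  org     = c("orgA", "orgA", "orgB", "orgB", "orgC"),
  cluster = c("c1", "c2", "c1", "c2", "c1"),
  stringsAsFactors = FALSE
)

# tempfile() returns paths that do not exist yet, which suits the `dir`
# argument of write_pangenome().
out_dir <- tempfile("pangenome_")
rds     <- tempfile("pangenome_", fileext = ".rds")

if (requireNamespace("pagoo", quietly = TRUE)) {
  p <- pagoo::PgR6$new(data = data, verbose = FALSE)

  # Flat text tables: portable, but the object state is not kept.
  p$write_pangenome(dir = out_dir)
  print(list.files(out_dir))

  # RDS: keeps the state needed to restore the object.
  p$save_pangenomeRDS(file = rds)
  p2 <- pagoo::load_pangenomeRDS(rds)
  print(p2$core_level)
}
```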

Method `clone()`

The objects of this class are cloneable with this method.

Usage

PgR6$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.
