-
Introduction
You can store, manage, reanalyse and download data in the EPI2ME platform using the EPI2ME Data Manager. The EPI2ME Data Manager is a collection of web-based tools, services and interfaces for managing the data associated with analyses uploaded to and run on the EPI2ME platform.
To create a new analysis, you need to provide Oxford Nanopore sequence files generated by the MinKNOW software as input and select an EPI2ME workflow. EPI2ME workflows are composed of several steps. Each step represents a logical unit of computation and/or analysis. Sequence data uploaded to the EPI2ME platform progress from one step to the next until all data have been processed (or you stop the analysis). The output of each step comprises one or more files that might include metadata, reference data, alignments (BAM), taxonomic classifications, basecalled sequences, etc. The files uploaded to EPI2ME and the collection of files output by each step of the workflow are each represented in EPI2ME as a reusable, sharable “Dataset”. Datasets are made available and managed in the EPI2ME Portal interface.
-
Data security
The EPI2ME platform is accredited to the ISO/IEC 27001 standard and your data will be encrypted in transit and at rest from the moment it is uploaded to the moment it is finally deleted. Datasets will not be stored in EPI2ME unless explicitly requested by the data owner. Datasets not explicitly stored are removed in accordance with our data retention policy. For full details about the security of your data in EPI2ME, please refer to our information governance document.
-
Location of datasets in EPI2ME Portal interface
Datasets are accessed and managed from the data manager on the EPI2ME Portal website, a collection of tools and interfaces that allow you to store and manage the files uploaded to EPI2ME or the files output by each step of a workflow. You can view your own and shared datasets from the Datasets tab in the Portal.
All datasets are temporarily available to view from the Analysis page while the analysis is running. After the analysis is complete, you have the option of saving the dataset to the EPI2ME cloud indefinitely. Only datasets that are stored in EPI2ME can be viewed in the Dataset dashboard. Storing datasets in EPI2ME is optional and currently free, but may incur costs in future.
Datasets not stored in EPI2ME will be deleted from the platform at whichever of the following comes first:
- Within 24 hours of the analysis being stopped
- In line with our data retention policy for analyses that are left to run indefinitely (see https://metrichor.com/ig.html#dataretention)
Datasets can also be deleted at any time.
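The "whichever is first" rule above can be sketched in a few lines of Python. This is an illustration only, not EPI2ME code: the function name and the `retention_deadline` parameter are hypothetical, and the actual retention period is defined by the data retention policy, not by this document.

```python
from datetime import datetime, timedelta
from typing import Optional

# From the text: unstored datasets are deleted within 24 hours of the analysis being stopped.
STOP_GRACE_PERIOD = timedelta(hours=24)


def deletion_deadline(stopped_at: Optional[datetime],
                      retention_deadline: datetime) -> datetime:
    """Return the latest time an unstored dataset could remain on the platform.

    `retention_deadline` is a hypothetical value derived from the data
    retention policy; its length is not specified here.
    """
    if stopped_at is None:
        # Analysis still running: only the retention policy applies.
        return retention_deadline
    # Whichever comes first: 24 hours after stopping, or the policy deadline.
    return min(stopped_at + STOP_GRACE_PERIOD, retention_deadline)


if __name__ == "__main__":
    stopped = datetime(2024, 1, 1, 12, 0)
    policy = datetime(2024, 1, 10, 0, 0)
    print(deletion_deadline(stopped, policy))  # 2024-01-02 12:00:00
```

Storing the dataset takes it out of this rule entirely, so a check like this only applies to datasets marked as "Expiring".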
-
There are three types of datasets:
- Source: the raw data uploaded to EPI2ME. Every time you upload files to EPI2ME for analysis, a Source dataset will be created.
- Output: files/sequences/metadata produced during each discrete step of an analysis. For example, a separate dataset is created for the barcoding step, and another for a step such as metagenomic classification.
- Reference: a special type of output produced by the FASTA Reference Upload workflow that can be used as a reference sequence database for custom alignment analyses.
-
Each dataset is represented by a tile with several items of information on it:
- Dataset type: Source, Output or Reference - described above
- Workflow name: the workflow run in EPI2ME which created the dataset
- Analysis: link to the analysis report
- Tags: these are entered during analysis set-up in the EPI2ME Agent, and can be edited in the Portal
- Created: the date on which the dataset was created
- Size: this changes dynamically as data is uploaded into EPI2ME or is analysed
- Stored in cloud or Expiring: if the dataset has not been permanently saved, it will be marked as "Expiring" until it is automatically deleted. Saved datasets are marked as "Stored in cloud"
- Owner: the person who ran the workflow that generated the dataset
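For readers who track datasets in their own scripts, the tile fields listed above map naturally onto a small record type. The sketch below is an assumption-laden illustration: the class, field names and example values are hypothetical and do not reflect an EPI2ME API.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class DatasetType(Enum):
    SOURCE = "Source"
    OUTPUT = "Output"
    REFERENCE = "Reference"


@dataclass
class DatasetTile:
    """Hypothetical record mirroring the items shown on a dataset tile."""
    dataset_type: DatasetType
    workflow_name: str
    analysis_link: str
    created: date
    size_bytes: int          # changes while data is uploaded or analysed
    owner: str
    stored: bool = False     # True once the dataset is saved to the cloud
    tags: List[str] = field(default_factory=list)

    def storage_label(self) -> str:
        # The two labels named in the text.
        return "Stored in cloud" if self.stored else "Expiring"


if __name__ == "__main__":
    tile = DatasetTile(
        dataset_type=DatasetType.REFERENCE,
        workflow_name="FASTA Reference Upload",
        analysis_link="https://example.org/analysis/123",  # placeholder URL
        created=date(2024, 1, 1),
        size_bytes=1_024,
        owner="example-user",
    )
    print(tile.storage_label())  # Expiring
```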
-
Clicking Go to dataset takes you to a new page with further information about the dataset, and options for dataset management.
There are several ways to manage a dataset:
- Share the dataset with all other EPI2ME users (including users outside your own account)
- Re-use the dataset for a new analysis in EPI2ME
- Store the dataset in the cloud
- Copy the dataset to another EPI2ME-enabled region
- Download the dataset contents
- Delete the dataset
-
Starting new analyses with datasets as input
Only datasets that are saved in EPI2ME can be reused to launch new analyses. Your data is uploaded to the EPI2ME platform once; then, as new versions of software become available or new analysis workflows are released, you can re-analyse your data by running new workflows on your existing datasets. You can only run analyses that are compatible with the dataset and its contents.
Setting up a new analysis from a dataset in the Portal mimics the analysis set-up in the EPI2ME Agent.
Please be aware that EPI2ME operates in multiple geographic regions, and analyses can only be run from datasets within the same EPI2ME-enabled operating region.
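The two restrictions described above (the dataset must be stored, and it must exist in the operating region you are working in) can be expressed as a simple check. This is a hedged sketch: the function and the region names are hypothetical, and workflow compatibility with the dataset contents is deliberately not modelled because this guide does not describe how it is determined.

```python
from typing import Set


def can_start_analysis(is_stored: bool,
                       dataset_regions: Set[str],
                       current_region: str) -> bool:
    """Return True if a new analysis could be launched from the dataset.

    Only the storage and region rules from the text are checked here.
    """
    return is_stored and current_region in dataset_regions


if __name__ == "__main__":
    # "region-a" and "region-b" are placeholder region names.
    print(can_start_analysis(True, {"region-a", "region-b"}, "region-a"))  # True
    print(can_start_analysis(True, {"region-b"}, "region-a"))              # False
```

If the region check fails, the dataset first has to be copied into the current region, or the operating region switched, as described in the following sections.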
Important note: analyses can be stopped from within the Portal before completion, and you will be given the option to store temporary datasets.
However, old versions of the EPI2ME Agent do not offer this functionality, and stopping an analysis in the Agent will automatically delete datasets in the EPI2ME Portal. To mitigate this, indicate in the Portal which datasets are to be stored before stopping the analysis, either in the Portal or in the Desktop Agent.
-
Switching to other EPI2ME-enabled regions
To switch to another EPI2ME-enabled region, change your preferred Operating region either from the profile menu or in your user profile (click the blue profile button > Profile option > Operating region option).
-
Copying datasets to/from other EPI2ME-enabled regions
Before copying an existing dataset to or from other regions, ensure that:
- You have the appropriate access to view and manage the selected dataset.
- The analysis that generated the dataset has been stopped.
- The dataset of interest has been stored in the EPI2ME platform.
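The three preconditions above amount to a checklist, which can be sketched as a function that reports any unmet requirement. The function name and its parameters are hypothetical; they stand in for whatever the Portal shows you about the dataset.

```python
from typing import List


def unmet_copy_requirements(has_access: bool,
                            analysis_stopped: bool,
                            is_stored: bool) -> List[str]:
    """List the preconditions from the text that are not yet satisfied."""
    problems = []
    if not has_access:
        problems.append("no access to view and manage the dataset")
    if not analysis_stopped:
        problems.append("the analysis that generated the dataset is still running")
    if not is_stored:
        problems.append("the dataset has not been stored in the EPI2ME platform")
    return problems


if __name__ == "__main__":
    issues = unmet_copy_requirements(has_access=True,
                                     analysis_stopped=False,
                                     is_stored=True)
    # An empty list means the dataset is ready to be copied.
    print(issues or "ready to copy")
```

Returning the list of failures, not a single boolean, makes it clear which condition needs attention before a copy is attempted.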
The actions panel for a dataset changes depending on whether that dataset is stored in your current operating region. If the dataset is in your region, you have access to a number of actions on that dataset, including a Copy dataset to button. If your operating region differs from that of the dataset, you are presented with a simpler actions panel and a Copy dataset here button. The two routes for copying datasets into other operating regions are described below.
If the dataset is in the current operating region
Click Copy dataset to, then wait for the dataset copy to finish successfully.
If the dataset is outside the current operating region
Click Copy dataset here, then wait for the dataset copy to finish successfully.
-
The location of each dataset, including any regions it has been copied to, is shown in the dataset details section of the page.
If a dataset is available in the current operating region, the region label is emphasised in bold.
Datasets can contain a very large number of files, so copying them into other regions takes time. You will be notified through the EPI2ME notification system when all files have been successfully copied, after which the usual actions (start analysis, download, share, etc.) become available.
-
Deleting datasets from EPI2ME-enabled regions
If you have copied a dataset into other EPI2ME-enabled regions, you can choose which operating regions to delete it from. To begin, click Delete dataset. You will then be presented with a panel listing the various copies (replicas). Replicas that are still being copied cannot be deleted and are distinguished by a Currently replicating label. As in the dataset details, the replica in the current operating region is emphasised in bold. Select the dataset replicas you want to delete and click Delete replicas.
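The rule that replicas still being copied cannot be deleted can be sketched as a filter over the listed replicas. This is an illustration only: the `Replica` type, its fields and the region names are hypothetical and are not part of EPI2ME.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Replica:
    """Hypothetical model of one copy of a dataset in one operating region."""
    region: str
    replicating: bool = False  # True while marked "Currently replicating"


def deletable_replicas(replicas: Iterable[Replica]) -> List[Replica]:
    """Return only the replicas that have finished copying."""
    return [r for r in replicas if not r.replicating]


if __name__ == "__main__":
    replicas = [
        Replica("region-a"),
        Replica("region-b", replicating=True),
        Replica("region-c"),
    ]
    print([r.region for r in deletable_replicas(replicas)])
    # ['region-a', 'region-c']
```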
-
Glossary of terms
EPI2ME: A cloud-based bioinformatics and analytics platform developed by Metrichor Ltd.
EPI2ME Portal: A web portal in which users can view and manage data held in EPI2ME.
Data manager: The collection of tools, services and interfaces for managing the output of EPI2ME workflows.
Dataset: A collection of files and folders representing the analysis output of each discrete step in an EPI2ME workflow.
Reference dataset: A dataset created by running the FASTA Reference Upload workflow that can be used in custom reference alignment workflows as the reference data.