ProteinQuest user guide
1. Introduction
1.1 With ProteinQuest you can
1.2 ProteinQuest basic version
1.3 ProteinQuest extended version
2. ProteinQuest dictionaries
3. Directions for use
3.1 Simple query
3.2 Advanced query
3.3 Combine dictionary terms with Boolean operators AND, OR, NOT
3.4 Loading a list
3.5 Identify a list of terms
3.6 How to identify extra data
3.7 Highlight data into the documents
3.8 Wizard
3.9 Query limits
3.10 Results bar
3.10.1 Enlarge, Filter, Clipboard, Network
3.11 Results list
3.11.1 Papers, Patents and Clinical Trials
3.12 Results analysis
3.12.1 Excel download of a list of terms
3.12.2 Excel download of a list of PMIDs
3.12.3 Excel download of a list of PMIDs, Year, Title, Authors, Journal, Volume, Pages, Notes
3.13 Graphs download
3.13.1 Heat Map download
3.13.2 Network download
3.13.2.1 Excel
3.13.2.2 NodeXL
3.13.2.3 Cytoscape
4. Tools
4.1 Saved query/Load query
4.2 PMID list
4.3 Network
4.3.1 How to generate a Network
4.3.1.2 Automatic Network selection
4.3.1.3 Select where to collect data
4.3.1.4 Select Nodes
4.3.1.5 Generate Network
4.3.1.6 Black and colored edges: two types of information
4.3.1.7 Advanced Network configuration
4.3.1.8 Set the values of occurrence, co-occurrence and Ef
4.4 The Heat Map
4.4.1 How to generate a Heat Map
4.4.2 How to download a Heat Map
5. ProteinQuest Case Studies
1. Introduction
ProteinQuest is a new platform for biomedical literature retrieval and analysis. This platform for
biodiscovery integrates data from scientific literature, data repositories and biological images.
Currently ProteinQuest holds more than 15 million indexed abstracts, 9 million images, 1.8 million
selected patents, 250,000 clinical trials and 10 billion binary relationships.
Literature information can be obtained easily by using two different query types: by entering free
keywords or by guided construction of a Boolean query using curated ontologies.
ProteinQuest searches both article abstracts and image captions, producing more specific and
comprehensive search results than other data mining platforms.
Query results can be as specific as users require. ProteinQuest performs an accurate search as it lets
you refine the field of interest by selecting specific dictionaries/ontologies such as miRNA, drugs,
Biological Processes, etc. Moreover, queries can be saved and reloaded whenever needed.
ProteinQuest can also be used to search patent abstracts and claims, and to analyze the resulting
information by means of all the dictionaries/ontologies available.
Additionally, ProteinQuest builds complex network models to extend the understanding of your
research. Networks generated by ProteinQuest reveal binding relationships between several types
of concepts and biological items, as well as between people, institutions, companies, etc.
1.1 With ProteinQuest you can:
- Easily understand and interpret literature information through an innovative graphical layout
that highlights key relationships and connections between objects included in several different
ontologies
- Mine for biological relationships between proteins/genes that are experimentally supported by one
or more techniques of your choice
- Prioritize target genes for biomarker discovery, drug development and repositioning
- Create powerful, interactive networks connecting genes or proteins to diseases, identify
relevant drugs and isolate sub-networks within biological fields
- Retrieve only clinically relevant information at any clinical stage of development
- Examine relevant experiments in the literature and compare your results to what others have
already found
- Track down collaborations among people or institutions working on a topic of your choice,
identifying the most relevant players in the field
ProteinQuest is available in two versions: basic and extended.
1.2 ProteinQuest basic version
The basic version is the right tool to search and explore PubMed papers and quickly get an answer
to your query.
With ProteinQuest basic version you can:
- Retrieve information from the abstracts of the entire PubMed collection (more than 15,000,000
records) and the captions of all free full-text papers (about 9,000,000 entries)
- Launch queries either to PubMed (simple search) or to our curated, internal database
(advanced search)
- Disambiguate entities using a semantic approach and a highly sophisticated proprietary
technology to reduce the number of false positive results, which common data mining tools are
unable to discriminate
- Obtain higher accuracy, precision and recall values compared to other tools
- Auto-complete query fields for an accurate search
- Automatically expand queries that include a reference term (e.g. a gene symbol) with all known
synonyms, and add disambiguation information for ambiguous terms, so that a single comprehensive
search can be performed
- Perform composite queries by inserting a list of terms, such as gene symbols, as search input
- Retrieve clinically relevant information at any clinical stage for drug development purposes
- Track down collaborations among people or institutions on a common topic
- Interrogate the scientific literature using free words or selecting terms from 9 different
dictionaries/ontologies
The table below highlights the main features of ProteinQuest’s basic version.
1.3 ProteinQuest extended version
This is the full version of ProteinQuest.
With ProteinQuest’s extended version you can:
- Retrieve relevant information from the abstracts of the entire PubMed collection (more than
15,000,000 records), the captions of all free full-text papers (about 9,000,000 entries), and both
patents and clinical trials (1.8 million selected patents, 250,000 clinical trials)
- Launch queries either to PubMed (simple search) or to our curated, internal database
(advanced search)
- Disambiguate entities using a semantic approach and a highly sophisticated proprietary
reasoner, avoiding the false positive results that common data mining tools are unable to
discriminate
- Obtain higher accuracy, precision and recall values compared to other tools
- Auto-complete query fields for an accurate search
- Automatically expand queries that include a reference term (e.g. a gene symbol) with all known
synonyms, and add disambiguation information for ambiguous terms, so that a single comprehensive
search can be performed
- Perform composite queries by inserting a list of terms, such as gene symbols, as search input
- Retrieve clinically relevant information at any clinical stage for drug development purposes
- Track down collaborations among people or institutions on a common topic
- Integrate PubMed information with patent and clinical trial data
- Interrogate the scientific literature using free words or selecting terms from 9 different
dictionaries/ontologies
- Process the results using Heat Maps and Networks
- Define pathways that also include activation, inhibition and binding information
The table below highlights the main features of ProteinQuest’s extended version.
2. ProteinQuest dictionaries
Dictionary groups: Molecules, Functions, Anatomy, Lab, Source
Dictionaries: Proteins, miRNA, Drugs, Substances, Protein families, Bio Processes, Disease,
Pathways, Body parts, Tissues, Cells, Cell parts, Organisms, Methods
Source dictionaries:
- Papers: Organizations, Nationality, Study type, Authors, Journals, Year
- Patents: Organizations, Inventors, USP Class, Year
- Trials: Organizations, Nationality, Status, Phase, Year
3. Directions for use
3.1 Simple query
To run a simple query, enter your keywords in the search box:
The selected terms will be searched, without any text processing, in both the abstracts and the MeSH
terms of PubMed papers.
Organized information is nevertheless available for each dictionary.
3.2 Advanced query
The selected terms will be searched in the abstracts, MeSH terms and captions of PubMed papers,
the abstracts and claims of US patents, and the summaries of worldwide clinical trials.
3.3 Combine dictionary terms with Boolean operators AND, OR, NOT
After the query has been set, it is possible to change the Boolean operator from OR to AND or from
AND to NOT simply by clicking on it.
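The effect of each operator can be pictured as an operation on sets of documents. The sketch below is only an illustration of standard Boolean semantics, not ProteinQuest's internal implementation; the document IDs and the combine helper are hypothetical.

```python
# Hypothetical document-ID sets matched by each dictionary term (illustrative data only).
docs_by_term = {
    "TNF": {101, 102, 103, 104},
    "IL6": {103, 104, 105},
    "psoriasis": {104, 106},
}


def combine(left, operator, right):
    """Combine two sets of document IDs with a Boolean operator."""
    if operator == "AND":
        return left & right  # documents that mention both terms
    if operator == "OR":
        return left | right  # documents that mention at least one term
    if operator == "NOT":
        return left - right  # documents that mention the left term but not the right one
    raise ValueError(f"unknown operator: {operator}")


tnf_or_il6 = combine(docs_by_term["TNF"], "OR", docs_by_term["IL6"])
tnf_and_il6 = combine(docs_by_term["TNF"], "AND", docs_by_term["IL6"])
tnf_not_psoriasis = combine(docs_by_term["TNF"], "NOT", docs_by_term["psoriasis"])

print(sorted(tnf_or_il6))
print(sorted(tnf_and_il6))
print(sorted(tnf_not_psoriasis))
```

Switching from OR to AND narrows the result set, while NOT removes the documents matched by the second term.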
3.4 Loading a list
- You can insert terms one at a time or you can load a list:
The file must be in .txt format.
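A term list can be prepared with any text editor or with a short script. The sketch below assumes a layout of one term per line, which this guide does not specify; the file name gene_list.txt and the write_term_list helper are hypothetical.

```python
import tempfile
from pathlib import Path


def write_term_list(terms, path):
    """Write unique, non-empty terms to a plain-text file, one per line (assumed layout)."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


# Illustrative gene symbols; blanks and duplicates are dropped before writing.
terms = ["IL6", "TNF", " STAT3 ", "", "IL6"]
out_file = Path(tempfile.gettempdir()) / "gene_list.txt"
written = write_term_list(terms, out_file)
print(written)
```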
3.5 Identify a list of terms
For each dictionary, it is possible to extract the specific matching terms identified in the papers:
Different lists can be extracted. Below you can see that we chose to highlight the miRNAs that have
been identified in the results.
3.6 How to identify extra data
If you want to visualize all the miRNAs described together with the ones in the query, just click
on Enlarge:
3.7 Highlight data into the documents
3.8 Wizard
Through the wizard button it is possible to obtain the results of an advanced query in a single click.
Through the wizard buttons it is also possible to obtain networks in a single click.
3.9 Query limits
- How to set limits:
3.10 Results bar
For each dictionary it is possible to visualize and export the elements identified in the results. Here
is the protein list, showing for each protein the number of documents (abstracts or captions) and the
number of images (captions of free full-text papers) in which it was found. Ef is the enrichment
factor, which relies on the frequency of the elements in the results.
Notably, the first protein in the list is TNFAIP3, as it is the most cited in the results.
The list can also be ordered by the highest number of images or by enrichment factor simply by
clicking on the top of the bar:
3.10.1 Enlarge, Filter, Clipboard, Network
- Enlarge: visualize all the concepts related to the query elements identified in the results
- Filter: restrict the query with an additional group of elements (checked in the box)
- Clipboard: select and save all documents and images of interest
Furthermore, it is possible to visualize all PMIDs and their corresponding titles. By clicking on a
title, the document will appear behind the clipboard, ready to be analyzed.
Within the clipboard you can CLEAR (erase) or SAVE the subset of selected documents. The
papers can be reloaded by selecting the saved clipboard.
- Network: visualize the biological relationships among the selected terms.
Inside ProteinQuest a network of at least 240 nodes can be represented.
The choice of nodes relies on the number of documents and the enrichment factor (Ef) of the most
connected terms.
3.11 Results list
3.11.1 Papers, Patents and Clinical Trials
The result obtained from a query corresponds to a subset of documents:
If you are interested in PubMed publications, select the Papers directory:
The title, affiliation, abstract, MeSH terms and figures of free full-text papers will be analyzed.
If you are interested in patents, select the Patents directory:
The title, affiliation and claims of the patents will be analyzed.
If you are interested in clinical trials, select the Trials directory:
The title, affiliation, summary and eligibility criteria of the clinical trials will be analyzed.
3.12 Results analysis
3.12.1 Excel download of a list of terms
The list of terms of each dictionary identified in the results can be exported to Excel:
3.12.2 Excel download of a list of PMIDs
Furthermore, the list of PMIDs in the results can be exported from ProteinQuest.
3.12.3 Excel download of a list of PMIDs, Year, Title, Authors, Journal, Volume, Pages, Notes
Note that each PMID is a link for downloading the corresponding paper.
In addition to the list of terms, you can also export a graph, a Heat Map or a network.
3.13 Graphs download
Here is a downloaded graph representing the number of documents and images of the most specific
biological processes of a query:
3.13.1 Heat Map download
Here is a downloaded Excel file of a Heat Map that represents the methods used for the analysis of
the genes identified in a specific query:
3.13.2 Network download
The Network can be downloaded in different formats:
3.13.2.1 Excel
The Excel file shows the main characteristics of the network generated in ProteinQuest:
the concept selected (Vertex), the occurrence (Label), the Ef (Tooltip) and the weight (co-occurrence).
Here is an example of the Excel file of a network generated from a query:
Vertex  Color          Shape   Size  Label   Tooltip   Type  Occurrences  Weight
IL6     254, 161, 0    circle  80    61 occ  Ef 167.9  Prot  61           167.86
LMNA    254, 161, 0    circle  80    61 occ  Ef 4160   Prot  61           4160.01
TNF     255, 135, 135  circle  60    31 occ  Ef 46.95  Prot  31           46.9531
IL1B    255, 152, 152  circle  50    16 occ  Ef 71.32  Prot  16           71.3205
NFKB1   255, 156, 156  circle  48    13 occ  Ef 97.91  Prot  13           97.9058
STAT3   255, 156, 156  circle  48    13 occ  Ef 236.7  Prot  13           236.75
IL8     255, 161, 161  circle  46    10 occ  Ef 78.47  Prot  10           78.4673
MAPK8   255, 163, 163  circle  45    9 occ   Ef 80.52  Prot  9            80.5248
RELA    255, 165, 165  circle  45    8 occ   Ef 266.3  Prot  8            266.275
TLR4    255, 165, 165  circle  45    8 occ   Ef 148.4  Prot  8            148.415
CASP3   255, 167, 167  circle  44    7 occ   Ef 35.22  Prot  7            35.2155
CCL2    255, 169, 169  circle  43    6 occ   Ef 70.26  Prot  6            70.2596
COX2    255, 169, 169  circle  43    6 occ   Ef 59.98  Prot  6            59.9779
PTGS2   255, 169, 169  circle  43    6 occ   Ef 60.86  Prot  6            60.8579
MAPK3   255, 171, 171  circle  43    5 occ   Ef 36.28  Prot  5            36.2841
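Once downloaded, a vertex table like the one above can be processed outside ProteinQuest. The sketch below assumes the sheet has been saved as tab-delimited text with the column names shown in the example; the load_vertices helper is hypothetical, and only a few columns and rows are reproduced. It ranks the vertices by the Weight column, which in the example carries the same values as the Ef.

```python
import csv
import io

# A few rows from the example table. Assumption: the Excel sheet was saved as
# tab-delimited text and keeps these column names.
rows = [
    ("Vertex", "Size", "Type", "Occurrences", "Weight"),
    ("IL6", "80", "Prot", "61", "167.86"),
    ("LMNA", "80", "Prot", "61", "4160.01"),
    ("TNF", "60", "Prot", "31", "46.9531"),
    ("STAT3", "48", "Prot", "13", "236.75"),
]
tsv_text = "\n".join("\t".join(row) for row in rows)


def load_vertices(handle):
    """Read the vertex table and convert the numeric columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    vertices = []
    for record in reader:
        vertices.append({
            "vertex": record["Vertex"],
            "occurrences": int(record["Occurrences"]),
            "weight": float(record["Weight"]),
        })
    return vertices


vertices = load_vertices(io.StringIO(tsv_text))

# Rank the vertices by weight, highest first.
by_weight = sorted(vertices, key=lambda v: v["weight"], reverse=True)
print([v["vertex"] for v in by_weight])
```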
3.13.2.2 NodeXL
Using NodeXL it is possible to edit the network obtained in ProteinQuest and prepare an image of
it.
3.13.2.3 Cytoscape
Using Cytoscape it is possible to edit your ProteinQuest network and to analyze it further through
its plugins (BiNGO, GeneMANIA, Reactome, NetworkAnalyzer, etc.).
4. Tools
Inside the Tool bar there are several functions:
4.1 Saved query/Load query
It is possible to save your query before you log out from ProteinQuest and reload it in the following
session.
4.2 PMIDs list
It is possible to export the list of PMIDs identified in the results.
4.3 Network
There are two possible network setting options: automatic and advanced.
An automatic selection will choose the interactions by relying on the number of documents and the Ef
among the most connected terms.
4.3.1 How to generate a Network
4.3.1.2 Automatic Network selection
For automatic network generation, do not select the advanced configuration option.
4.3.1.3 Select where to collect data
You must select where to collect data from: papers, patents or clinical trials. For PubMed papers,
you must also select whether terms should be collected from abstracts, images or both.
4.3.1.4 Select Nodes
The nodes can be limited to the query terms, so that only the interactions among them are shown
(restrict to query elements), or they can also include terms identified in the results that belong
to the same dictionary or to other dictionaries.
Since the edges correspond to the documents in which different nodes are described together,
you must select which interactions to visualize by checking one or more of the options
proposed.
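The idea of an edge as "documents in which two nodes are described together" can be sketched as a co-occurrence count. This is a simplified illustration and not ProteinQuest's actual procedure; the documents, the cooccurrence_edges helper and the min_cooccurrence threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the terms identified in each document (illustrative only).
doc_terms = {
    "doc1": {"IL6", "TNF", "STAT3"},
    "doc2": {"IL6", "TNF"},
    "doc3": {"TNF", "NFKB1"},
    "doc4": {"IL6", "STAT3"},
}


def cooccurrence_edges(doc_terms, min_cooccurrence=1):
    """Count, for each pair of terms, the documents in which both appear."""
    counts = Counter()
    for terms in doc_terms.values():
        # Sorting gives every pair a single canonical orientation.
        for pair in combinations(sorted(terms), 2):
            counts[pair] += 1
    # Keep only the pairs that reach the requested threshold.
    return {pair: n for pair, n in counts.items() if n >= min_cooccurrence}


edges = cooccurrence_edges(doc_terms, min_cooccurrence=2)
for (a, b), n in sorted(edges.items()):
    print(a, b, n)
```

Raising the threshold keeps only the pairs of terms that appear together in several documents, which yields a smaller network.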
4.3.1.5 Generate Network
Here is an example of a network automatically generated by ProteinQuest:
4.3.1.6 Black and colored edges: two types of information
- Black edges link to the specific papers in which the adjacent nodes are described together.
- Colored edges correspond to experimental data describing interactions, inhibitions, expression
regulation and enzymatic reactions.
It is possible to select which data to visualize first.
Other information related to the network is available in the bibliometric and protein pathway
network analyses.
4.3.1.7 Advanced Network configuration
The network can also be generated using the advanced configuration.
4.3.1.8 Set the values of occurrence, co-occurrence and Ef
In the advanced configuration, set the values of occurrence, co-occurrence and Ef.
The only size limits are those set by the user.
Here is an example of a network generated in ProteinQuest and visualized in Cytoscape:
4.4 The Heat Map
4.4.1 How to generate a Heat Map
Heat Maps can be generated by selecting the corresponding button under the “Tools” directory.
The Heat Map is a useful tool to explore the biological relationships among specific terms
identified in the query results.
It is possible to visualize where two terms are described together in papers, patents or clinical
trials. The steps necessary to generate a Heat Map follow.
The resulting Heat Map reports in each cell the number of co-occurrences of two terms in the
list of documents retrieved in the results. The intensity of the red color is proportional to the
number of hits in the cell, normalized to the total number of hits in its column.
The following Heat Map reports in each cell the number of documents in which each gene or
protein is described in a specific pathological context.
Furthermore, the numbers are linked to the corresponding documents.
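The column normalization behind the color intensity can be sketched as follows. The exact formula used by ProteinQuest is not given in this guide, so the code assumes the simplest reading: each cell is divided by the total of its column. The counts, the labels and the column_fractions helper are hypothetical.

```python
def column_fractions(counts):
    """Divide every cell by the total of its column (assumed normalization)."""
    n_cols = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cols)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
        for row in counts
    ]


# Hypothetical co-occurrence counts: rows are proteins, columns are diseases.
row_labels = ["IL6", "TNF", "STAT3"]
col_labels = ["arthritis", "psoriasis"]
counts = [
    [6, 1],
    [3, 3],
    [1, 0],
]

fractions = column_fractions(counts)
for label, row in zip(row_labels, fractions):
    print(label, row)
```

Under this reading, the values in each column sum to 1, so a cell is dark red when it holds most of the hits of its column.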
4.4.2 How to download a Heat Map
The Heat Map can be exported to Excel for further statistical analysis, such as cluster analysis,
Pearson’s correlation, etc. These analyses are very useful to identify, for example, biomarker
signatures and other biological information.
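As an illustration of such a downstream analysis, the sketch below computes Pearson's correlation between two rows of an exported Heat Map using only the Python standard library. The counts are hypothetical, and the layout (one row of counts per protein) is an assumption about how the exported sheet is read.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical Heat Map rows: document counts per protein across three diseases.
heat_map = {
    "IL6": [6, 1, 4],
    "TNF": [3, 3, 2],
    "STAT3": [12, 2, 8],
}

r_il6_stat3 = pearson(heat_map["IL6"], heat_map["STAT3"])
r_il6_tnf = pearson(heat_map["IL6"], heat_map["TNF"])
print(round(r_il6_stat3, 3), round(r_il6_tnf, 3))
```

Rows with a correlation close to 1 follow the same pattern across the columns, which is one simple way to look for candidate signatures.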
5. ProteinQuest Case Studies
[1] S. Polidoro et al., “Effects of bisphosphonate treatment on DNA methylation in osteonecrosis of
the jaw,” Mutat. Res., vol. 757, no. 2, pp. 104–13, Oct. 2013.
[2] T. Alberio et al., “Parkinson’s disease plasma biomarkers: an automated literature analysis
followed by experimental validation,” J. Proteomics, vol. 90, pp. 107–14, Sep. 2013.
[3] C. Zanini et al., “Medullospheres from DAOY, UW228 and ONS-76 cells: increased stem cell
population and proteomic modifications,” PLoS One, vol. 8, no. 5, p. e63748, Jan. 2013.
[4] A. Benso et al., “Reducing the complexity of complex gene coexpression networks by coupling
multiweighted labeling with topological analysis,” Biomed Res. Int., p. 676328, Jan. 2013.