5 Research products, page 1 of 1
- Research data . 2022 . Open Access . Authors: Dallago, Christian; Marquet, Céline; Rost, Burkhard . Publisher: Zenodo
Residue and sequence embeddings of the fly (Drosophila melanogaster) proteome (FlyBase for organism Drosophila melanogaster, downloaded on 2022.03.01), computed with bio_embeddings (bioembeddings.com) using the ProtT5 embedder at full precision (https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3). To open the embeddings file, please see this notebook. The embeddings are indexed by numbers according to the mapping file (mapping_file.csv) in this dataset. All of the following results share the same mapping (for instance, index "0" in the variation prediction results refers to the sequence "FBpp0304622").
Additionally:
  - Sequence-level predictions of subcellular localization in 10 classes using LA (https://www.biorxiv.org/content/10.1101/2021.04.25.441334v1)
  - Residue-level three-state secondary structure prediction (alpha, sheet or other) using models reported in the ProtTrans paper (https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3)
  - Residue-level prediction of conservation (in 9 states) and of variation effect (from 0 [no effect] to 1 [effect]) using VESPAl (https://doi.org/10.1007/s00439-021-02411-y)
Files included:
  - dmel-all-translation-r6.44.fasta --> FASTA-formatted sequences of Drosophila melanogaster from FlyBase.
  - mapping_file.csv --> A CSV file mapping the identifiers used in the following files (from 0 to 30737) to the identifiers in the FlyBase FASTA file (dmel-all-translation-r6.44.fasta).
  - DSSP3_fly_ProtT5Sec.fasta --> Secondary structure predictions in three states for each residue of each protein in dmel-all-translation-r6.44.fasta. "H" stands for Helix; "E" stands for Sheet; "C" stands for Other.
  - subcell_fly_LA_ProtT5.csv --> Subcellular location (10 states) and membrane-boundness (2 states) for each protein in dmel-all-translation-r6.44.fasta.
  - embeddings_file.h5 --> Per-residue embeddings of sequences in dmel-all-translation-r6.44.fasta. Each dataset in the .h5 file represents a protein sequence and contains a matrix of shape L x 1024, with L being the length of the protein sequence. Datasets are indexed using integers. The original sequence identifier (from the FASTA header) can be accessed through the "original_id" attribute. See https://docs.bioembeddings.com/v0.2.0/notebooks/open_embedding_file.html for information on how to open the file.
  - reduced_embeddings_file.h5 --> Per-sequence embeddings of sequences in dmel-all-translation-r6.44.fasta (obtained by mean-pooling the residue embeddings along the length dimension of the protein sequence). Each dataset in the .h5 file represents a protein sequence and contains a vector of size 1024 (so every sequence has the same dimension).
  - conspred_probs.h5 --> Per-sequence conservation probability (softmax) predictions for sequences in dmel-all-translation-r6.44.fasta in 9 classes. Each dataset in the .h5 file represents a protein sequence and contains a matrix of shape 9 x L, with L being the length of the protein sequence and 9 being the number of predicted conservation classes (index 0 = very variable; index 8 = very conserved).
  - vespal_SAVeffect_fly.zip --> Zipped .h5 file of per-sequence variation predictions for sequences in dmel-all-translation-r6.44.fasta on a scale from 0 (neutral) to 1 (effect). -1 indicates a WT substitution. Each dataset in the .h5 file represents a protein sequence and contains a matrix of shape 20 x L, with L being the length of the protein sequence and 20 being the number of possible residue substitutions, each with a predicted variation score. Amino acids are in the following order: "ALGVSREDTIPKFQNYMHWC", meaning that index 0 = substitution of the residue to "A", index 1 = substitution to "L", and so on.
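The row order given for the variation-effect matrices can be turned into a small lookup. The following is a minimal sketch, not code from the dataset authors: the function names, the 1-based position convention, and the toy matrix are assumptions made for illustration, and the matrix is represented as a plain list of 20 rows rather than an HDF5 dataset.

```python
# Row order of the 20 x L variation-effect matrix, as stated in the dataset description.
AA_ORDER = "ALGVSREDTIPKFQNYMHWC"
# Value the description uses to mark a substitution to the wild-type residue.
WILD_TYPE_MARKER = -1


def sav_score(matrix, position, mutant):
    """Return the predicted effect of mutating `position` to `mutant`.

    `matrix` is a 20 x L list of rows (assumed layout: one row per amino acid
    in AA_ORDER, one column per residue). `position` is 1-based, which is a
    convention chosen for this sketch, not something the dataset prescribes.
    """
    if len(matrix) != len(AA_ORDER):
        raise ValueError("expected 20 rows, one per amino acid in AA_ORDER")
    length = len(matrix[0])
    if not 1 <= position <= length:
        raise IndexError("position outside the sequence")
    row = AA_ORDER.index(mutant)  # raises ValueError for non-standard residues
    return matrix[row][position - 1]


def toy_matrix(sequence, fill=0.5):
    """Build a made-up 20 x L matrix for demonstration only (not real predictions)."""
    return [
        [WILD_TYPE_MARKER if aa == residue else fill for residue in sequence]
        for aa in AA_ORDER
    ]


demo = toy_matrix("MA")          # hypothetical two-residue sequence
print(sav_score(demo, 1, "A"))   # substitution M1A
print(sav_score(demo, 2, "A"))   # A2A is the wild-type residue
```

With the real file, each row of a dataset read from the unzipped .h5 file would take the place of the toy matrix, provided the stored orientation matches the 20 x L layout described above.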
- Research data . 2021 . Open Access . English . Authors: Divya M. Persaud; Robert Barnes . Publisher: Zenodo
This dataset comprises the unit map derived from 3D analysis of Sakarya Vallis in Gale crater, Mars. The units are named Package 1–7. These packages have been extrapolated from morpho-stratigraphic analysis of a HiRISE scene in PRo3D. They have been further extrapolated using underlying image data. Included here is a shapefile representing the marker bed (Milliken et al. 2010) in Gale crater. The "CDD" refers to the Central Debris Deposit, identified by Hughes (2021). The structural data represent dip measurements along the boundaries of these packages within the feature; the "sub-package" data represent layering within the packages. For more on how dip is calculated in PRo3D, see https://pro3d.space/. Finally, the profiles mark the locations where topographic profiles were extracted for constructing cross-sections, as discussed in the thesis Persaud (2022).
These data are intended to be displayed with the HiRISE ORI (https://doi.org/10.5281/zenodo.5808371) and CTX ORI mosaic (https://doi.org/10.5281/zenodo.5808357) over Sakarya Vallis, and over the basemap of the northwest of Aeolis Mons (https://doi.org/10.5281/zenodo.5808381).
Format: SHP, SHX, DBF, PRJ, QPJ
Projection: Equidistant cylindrical
Datum: Spheroid (r = 3396.190 km)
N.B. the PROJ4 definition of the projection is "+proj=eqc +lat_ts=0 +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +a=3396190 +b=3396190 +units=m +no_defs"
The first author is now at Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California. Contact: divya.m.persaud@jpl.nasa.gov
Part of this work was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract to NASA. Government sponsorship acknowledged.
References:
  - Barnes, R., S. Gupta, C. Traxler, T. Ortner, A. Bauer, G. Hesina, G. Paar, et al. 2018. "Geological Analysis of Martian Rover-Derived Digital Outcrop Models Using the 3-D Visualization Tool, Planetary Robotics 3-D Viewer - PRo3D." Earth and Space Science 5: 1–23. https://doi.org/10.1002/2018EA000374.
  - Fraeman, A. A., B. L. Ehlmann, R. E. Arvidson, C. S. Edwards, J. P. Grotzinger, R. E. Milliken, D. P. Quinn, and M. S. Rice. 2016. "The Stratigraphy and Evolution of Lower Mount Sharp from Spectral, Morphological, and Thermophysical Orbital Data Sets." Journal of Geophysical Research E: Planets 121 (9): 1713–36. https://doi.org/10.1002/2016JE005095.
  - Hughes, M. N. 2021. "Landscape Evolution at Endeavour and Gale Craters on Mars, and How Terrain Characteristics Correlate with Mineralogy on Lower Mount Sharp, Gale Crater." https://doi.org/10.7936/c6se-5895.
  - Milliken, R. E., J. P. Grotzinger, and B. J. Thomson. 2010. "Paleoclimate of Mars as Captured by the Stratigraphic Record in Gale Crater." Geophysical Research Letters 37 (4): 1–6. https://doi.org/10.1029/2009GL041870.
  - Persaud, D. M. 2021. Co-registered U. Arizona HiRISE DTM and ORI over Sakarya Vallis, Gale Crater, Mars [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5808371.
  - Persaud, D. M. 2021. Multi-Resolution Basemap of Northwest Aeolis Mons, Gale Crater, Mars [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5808381.
  - Thomson, B. J., N. T. Bridges, R. E. Milliken, A. M. Baldridge, S. J. Hook, J. K. Crowley, G. M. Marion, C. R. de Souza Filho, A. J. Brown, and C. M. Weitz. 2011. "Constraints on the Origin and Evolution of the Layered Mound in Gale Crater, Mars Using Mars Reconnaissance Orbiter Data." Icarus 214 (2): 413–32. https://doi.org/10.1016/j.icarus.2011.05.002.
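To make the projection parameters concrete, here is a minimal sketch of the spherical equidistant cylindrical formulas that the PROJ4 string describes (sphere radius 3396190 m, lat_ts = 0, lon_0 = 0, no false easting or northing). The function names are illustrative assumptions, the sample coordinate is arbitrary, and longitude wrapping is not handled; for real work a projection library would normally be used instead.

```python
import math

# Sphere radius in metres, taken from +a and +b in the PROJ4 string.
MARS_RADIUS_M = 3396190.0


def eqc_forward(lon_deg, lat_deg, radius=MARS_RADIUS_M, lat_ts_deg=0.0, lon_0_deg=0.0):
    """Project longitude/latitude (degrees) to x/y (metres) on a sphere."""
    x = radius * math.radians(lon_deg - lon_0_deg) * math.cos(math.radians(lat_ts_deg))
    y = radius * math.radians(lat_deg)
    return x, y


def eqc_inverse(x, y, radius=MARS_RADIUS_M, lat_ts_deg=0.0, lon_0_deg=0.0):
    """Recover longitude/latitude (degrees) from projected x/y (metres)."""
    lon_deg = lon_0_deg + math.degrees(x / (radius * math.cos(math.radians(lat_ts_deg))))
    lat_deg = math.degrees(y / radius)
    return lon_deg, lat_deg


# Arbitrary example point (not a coordinate taken from the dataset).
x, y = eqc_forward(137.5, -5.0)
print(x, y)
print(eqc_inverse(x, y))
```

Because lat_ts is 0, one degree of longitude and one degree of latitude both map to the same distance in metres, which is why the projected coordinates scale linearly with the angles.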
- Research data . 2021 . Open Access . English . Authors: Dumont, Bastien . Publisher: Zenodo
Tests for the TEI Critical Apparatus Toolbox, along with their outcomes and summaries. They relate to an article to be published in RIDE 14.
- Research data . 2021 . Open Access . Authors: Christian Dallago; Burkhard Rost . Publisher: Zenodo
Residue and sequence embeddings of the human proteome (SwissProt for organism Human, downloaded on 2021.06.09), computed with bio_embeddings (bioembeddings.com) using the ProtT5 embedder at full precision (https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3).
Additionally:
  - Sequence-level predictions of subcellular localization in 10 classes using LA (https://www.biorxiv.org/content/10.1101/2021.04.25.441334v1)
  - Residue-level three-state secondary structure prediction (alpha, sheet or other) using models reported in the ProtTrans paper (https://www.biorxiv.org/content/10.1101/2020.07.12.199554v3)
Files included:
  - human.fasta --> FASTA-formatted human sequences from SwissProt.
  - DSSP3_human_ProtT5Sec.fasta --> Secondary structure predictions in three states for each residue of each protein in human.fasta. "H" stands for Helix; "E" stands for Sheet; "C" stands for Other.
  - subcell_human_LA_ProtT5.csv --> Subcellular location (10 states) and membrane-boundness (2 states) for each protein in human.fasta.
  - embeddings_file.h5 --> Per-residue embeddings of sequences in human.fasta. Each dataset in the .h5 file represents a protein sequence and contains a matrix of shape L x 1024, with L being the length of the protein sequence. Datasets are indexed using integers. The original sequence identifier (from the FASTA header) can be accessed through the "original_id" attribute. See https://docs.bioembeddings.com/v0.2.0/notebooks/open_embedding_file.html for information on how to open the file.
  - reduced_embeddings_file.h5 --> Per-sequence embeddings of sequences in human.fasta (obtained by mean-pooling the residue embeddings along the length dimension of the protein sequence). Each dataset in the .h5 file represents a protein sequence and contains a vector of size 1024 (so every sequence has the same dimension).
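The mean-pooling step that turns per-residue embeddings into a per-sequence embedding can be sketched as follows. This is an illustrative pure-Python version under stated assumptions, not the bio_embeddings implementation: the function name is hypothetical, and the toy input uses 2 dimensions instead of 1024 to keep the example readable.

```python
def mean_pool(residue_embeddings):
    """Average an L x D matrix (list of L vectors) along the length dimension.

    Returns one vector of size D, so sequences of any length end up with
    embeddings of the same dimension.
    """
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    dim = len(residue_embeddings[0])
    totals = [0.0] * dim
    for vector in residue_embeddings:
        if len(vector) != dim:
            raise ValueError("all residue embeddings must have the same size")
        for i, value in enumerate(vector):
            totals[i] += value
    count = len(residue_embeddings)
    return [total / count for total in totals]


# Toy example: a hypothetical 3-residue protein with 2-dimensional embeddings.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(toy))
```

With arrays loaded from embeddings_file.h5, the same reduction would typically be a single mean over the first axis of the L x 1024 matrix.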
- Research data . 2019 . Open Access . Authors: Scott, Pat . Country: United Kingdom
This is the example MultiNest chain from arXiv:0909.3300, used in the pippi example, and referred to in the pippi paper and documentation.