More details on amino acids from Wikidata

From the previous blog post it was apparent that Wikidata could use some additional information concerning amino acids. I added some content, for example which triplet codes for which amino acid (the active L-forms only) and which amino acids are considered to be essential (and have to be taken in via the diet). There are also amino acids which are considered to be dispensable in the human diet, because they are synthesised in the body itself. I wondered in which pathways I could find these non-essential amino acids and ran a query for them in Wikidata:

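SELECT ?ID ?IDLabel (COUNT(DISTINCT ?PWID) AS ?count) WHERE {
 ?ID wdt:P279 wd:Q8066 .       # subclass of: amino acid
 ?ID wdt:P279 wd:Q44266770 .   # subclass of: the non-essential amino acid class
 ?PWID wdt:P31 wd:Q4915012 .   # instance of: biological pathway
 ?PWID wdt:P527 ?ID .          # the pathway has this amino acid as a part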
  
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?ID ?IDLabel
ORDER BY DESC(?count)

This revealed that there are 5 non-essential amino acids; however, only 4 of them can be found in multiple pathways from WikiPathways:

ID              IDLabel           count
wd:Q183290      L-Serine          15
wd:Q178450      L-Aspartic Acid   7
wd:Q218642      L-Alanine         4
wd:Q29519883    L-Asparagine      3

If I want to visualise this count in a graph, I could use, for example, the Bubble chart. But only 4 amino acids isn't a challenge for the SPARQL endpoint of Wikidata, so I will look for all amino acids (regardless of stereospecific form, or whether the amino acid is considered essential). Unfortunately, some of the bubbles are not large enough to display the whole name of the compound, I cannot add a title or description to this Figure (yet), and I can only download the Figure as an SVG image (which I know how to work with, but is hard to integrate with this blog...).

Presence of amino acids in WikiPathways pathways, which are listed in Wikidata [retrieved 01-12-2017].
I will dive into the option of visualising queries more often now, since this tells me more than a simple table... Perhaps I can customise the graph above, so that different colours show how essential an amino acid is ;) Use this link to find the interactive version of this visualisation, with clickable links in the bubbles to go to your selected amino acid's Wikidata page: http://tinyurl.com/y93cvlxr

Below is the query, to try this out yourself:

#defaultView:BubbleChart
SELECT ?ID ?IDLabel (COUNT(DISTINCT ?PWID) AS ?count) WHERE{
 ?ID wdt:P279 wd:Q8066 .
 ?PWID wdt:P31 wd:Q4915012 . 
 ?PWID wdt:P527 ?ID . 
  
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?ID ?IDLabel
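
If you would rather run the query from a script than from the query editor, something along these lines should work. This is only a minimal sketch: the helper names (build_url, fetch, parse_counts) and the User-Agent string are made up for this example, I am assuming the public endpoint at https://query.wikidata.org/sparql with format=json, and I use plain "en" for the labels because, as far as I know, [AUTO_LANGUAGE] is only filled in by the web interface. The #defaultView:BubbleChart line is left out, since it only affects the interactive view.

```python
import json
import urllib.parse
import urllib.request

# Assumed public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, minus the BubbleChart directive.
QUERY = """
SELECT ?ID ?IDLabel (COUNT(DISTINCT ?PWID) AS ?count) WHERE {
  ?ID wdt:P279 wd:Q8066 .
  ?PWID wdt:P31 wd:Q4915012 .
  ?PWID wdt:P527 ?ID .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?ID ?IDLabel
"""


def build_url(query, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return endpoint + "?" + params


def fetch(query):
    """Send the query and return the decoded JSON response (needs network)."""
    request = urllib.request.Request(
        build_url(query),
        # Hypothetical identifier; replace it with your own contact details.
        headers={"User-Agent": "amino-acid-pathway-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_counts(results):
    """Turn SPARQL JSON results into (label, count) pairs, largest first."""
    rows = [
        (binding["IDLabel"]["value"], int(binding["count"]["value"]))
        for binding in results["results"]["bindings"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling parse_counts(fetch(QUERY)) should then give a list of amino acid labels with their pathway counts, which can be handed to any plotting library if the built-in Bubble chart is too limited.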

Comments

  1. Intriguing bubble plot... btw, for each amino acid, you can see all stereo and charge forms with #Scholia, e.g. https://tools.wmflabs.org/scholia/chemical/Q183290

    If you look under 'Related Compounds' you will find other compounds with the same skeleton, but different charge and/or stereochemistry. In fact, you'll see 9 related compounds.

    In that table, you can see a lot of info in the InChIKey: 'UHFFFAOYSA' means no stereochemistry is defined, '-N' is neutral, and '-M' and '-O' are charged. All other IDs should be different.
