Why the mapping?
Once the mapping is complete, we will use it to help Wikiproteins contributors (who are not expert ontologists) stay consistent when they add new entries to the database. Users will be able to choose from Biotop classes to classify their entries. The semantic network defines which relationships may exist between entities of given semantic types, so users can be alerted when their edits appear contradictory. Secondly, mapping other classification systems to Biotop in the future will allow us to integrate new datasets into Wikiproteins seamlessly and to use a single ontology for all of them. An example would be SwissProt, which is admittedly narrow in scope but important.
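The consistency check described above could look roughly like the following sketch. It assumes the semantic network is available as a set of allowed (subject type, relation, object type) triples. The triples shown, the `ALLOWED` table and the function name `check_assertion` are illustrative assumptions, not content taken from the actual Semantic Network files or from Wikiproteins.

```python
# Hypothetical excerpt of licensed relations:
# (subject semantic type, relation, object semantic type).
# These triples are for illustration only.
ALLOWED = {
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
    ("Gene or Genome", "part_of", "Cell"),
}


def check_assertion(subject_type, relation, object_type, allowed=ALLOWED):
    """Return None if the triple is licensed, otherwise a warning string."""
    if (subject_type, relation, object_type) in allowed:
        return None
    return (
        f"Warning: '{subject_type} {relation} {object_type}' "
        "is not licensed by the semantic network"
    )


if __name__ == "__main__":
    # A licensed edit produces no warning; an unlicensed one does.
    print(check_assertion("Pharmacologic Substance", "treats", "Disease or Syndrome"))
    print(check_assertion("Cell", "treats", "Disease or Syndrome"))
```

A real check would also have to account for relations inherited from parent types, which this sketch leaves out.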
A good part of the Semantic Types (STs) have been mapped to a Biotop class. An Excel sheet is available on Google Docs. Of the 91 that remain, roughly half can be expressed once we determine which superclass to use. Among the rest, the STs that require the most work to express under Biotop are those classified by their function.
- We need to discuss our contribution to publications:
  - The MIE paper
- Could someone give feedback on the STs that have been mapped (“equivalences” sheet of the Google Document mentioned above)? Are they adequate?
- Determine the most important problematic STs and concentrate on those (see “remaining” worksheet). We can ditch the less important ones (i.e., those not biomedically related) if they are a disproportionate burden to map. I (László) should be able to run through these this week.
- Should we add the relationships of the SN to the mapping somehow?
- Use Pellet to classify the hierarchy (Ronald can help with this):
  - Find and resolve inconsistencies
  - Determine whether explicit mappings of child classes are necessary when parents have been made equivalent. Up to now I have created classes for every single UMLS ST where none existed in Biotop – maybe this is unnecessary.
  - If we can do this during November, in parallel with the mapping of the remaining types, then we’re still on schedule according to User:Laszlo/UMLS-Biotop/Plan.
- Clean up the issues mentioned below. How long this takes depends on how difficult they prove to be. The first two are the most pressing, because most pending classes depend on them.
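On the question of whether child classes need explicit mappings: if a parent ST has been declared equivalent to a Biotop class, every child ST is already implied to be a subclass of that Biotop class, so a reasoner can place it without a dedicated Biotop class. The sketch below illustrates that inference in plain Python. The hierarchy fragment is simplified, and the tables `ST_PARENT` and `EQUIVALENT` and the function `inferred_placement` are hypothetical names made up for the example, not the actual mapping.

```python
# Simplified, illustrative fragment of the ST hierarchy: child -> parent.
ST_PARENT = {
    "Human": "Mammal",
    "Mammal": "Vertebrate",
    "Vertebrate": "Animal",
    "Animal": "Organism",
    "Organism": None,
}

# STs that have an explicit equivalent in Biotop (assumed for illustration).
EQUIVALENT = {
    "Organism": "biotop:Organism",
    "Human": "biotop:Human",
}


def inferred_placement(st):
    """Return (axiom kind, Biotop class) for an ST, or None if nothing applies.

    An ST with its own equivalence keeps it; otherwise it is a subclass of
    the Biotop class equivalent to its nearest mapped ancestor.
    """
    if st in EQUIVALENT:
        return ("equivalentTo", EQUIVALENT[st])
    current = ST_PARENT.get(st)
    while current is not None:
        if current in EQUIVALENT:
            return ("subClassOf", EQUIVALENT[current])
        current = ST_PARENT.get(current)
    return None


if __name__ == "__main__":
    for st in ("Human", "Mammal", "Vertebrate"):
        print(st, inferred_placement(st))
```

Under these assumptions, Mammal and Vertebrate fall under biotop:Organism without any new Biotop classes. Whether that level of detail is sufficient for the mapping remains the open question.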
Issues with the mapping
- Of what class should ClinicalDrug be a child? This is a problem I have with many classes: Food, Finding… This is perhaps my biggest roadblock in mapping the remaining classes.
- How to express Function? Vitamin, Hormone, Antibiotic, Receptor, …
- biotop:Human is a direct child of biotop:Organism. What about the Vertebrate/Invertebrate distinction and the rest of the tree of life? Alga is currently in an inconsistent state (it’s both Plant and Protist, which are disjoint).
- Abstract and “Meta”-classes such as many children of Conceptual Entity may have to go somewhere under biotop:NonRealizableInformationEntity.
- Molecular/Amino Acid/Carbohydrate/Nucleotide Sequence can be either conceptual or concrete, depending on how you look at it. The text string "CGATATATAG" is related to a 10-basepair DNA molecule, it is its blueprint – but it’s not the same thing, is it? Is a sequence a realizableEntity?
- Creating new properties: what about ontological rigor here? Are there guidelines for this? Can we just add the proposed consistsOf?
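The Alga problem listed above is the kind of inconsistency a reasoner such as Pellet would report: a class that sits under two classes declared disjoint. The following is a minimal sketch of that check in plain Python over a toy hierarchy. The `PARENTS` and `DISJOINT` tables and both function names are assumptions for illustration; they stand in for the real ontology file and do not use Pellet itself.

```python
# Toy hierarchy: class -> set of direct parents (illustrative only).
PARENTS = {
    "Alga": {"Plant", "Protist"},
    "Plant": {"Organism"},
    "Protist": {"Organism"},
    "Human": {"Organism"},
    "Organism": set(),
}

# Pairs of classes declared disjoint.
DISJOINT = [("Plant", "Protist")]


def ancestors(cls, parents=PARENTS):
    """Collect all direct and indirect superclasses of cls."""
    seen = set()
    stack = list(parents.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def unsatisfiable_classes(parents=PARENTS, disjoint=DISJOINT):
    """List (class, a, b) where the class falls under both disjoint classes a and b."""
    problems = []
    for cls in sorted(parents):
        supers = ancestors(cls, parents) | {cls}
        for a, b in disjoint:
            if a in supers and b in supers:
                problems.append((cls, a, b))
    return problems


if __name__ == "__main__":
    for cls, a, b in unsatisfiable_classes():
        print(f"{cls} is under both {a} and {b}, which are disjoint")
```

On this toy input the only class flagged is Alga. Resolving it would mean either removing one of the two parents or relaxing the disjointness axiom.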