Last time we made the following comment: “An interesting point is that if it was possible to exclude the possibility that the molecule had a ring, then the correct result would jump from position 16 to position 2.”
This time we do exactly that sort of thing, based on the theory in chapter 8.
Structural criteria (stating that a given substructure is present or absent) that hold with probability over 95% are used to restrict the set of possibilities. In the example, this reduces the set of candidates from around 30 to around 3.
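The filtering step above can be sketched as follows. This is a minimal sketch with a hypothetical candidate representation; the chapter's actual classifiers and data are not reproduced here.

```python
# Hypothetical candidate set; each candidate lists the substructures it contains.
candidates = [
    {"name": "A", "substructures": {"ring", "OH"}},
    {"name": "B", "substructures": {"OH"}},
    {"name": "C", "substructures": {"ring", "NH2"}},
]

# Criteria the spectrum supports with >95% probability.
# ("ring", False) means "a ring is absent with >95% confidence".
criteria = [("ring", False), ("OH", True)]

def passes(candidate, criteria):
    """Keep a candidate only if it agrees with every confident criterion."""
    for substructure, must_be_present in criteria:
        if (substructure in candidate["substructures"]) != must_be_present:
            return False
    return True

survivors = [c["name"] for c in candidates if passes(c, criteria)]
print(survivors)  # only "B" satisfies both criteria
```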
A comparison is made between MOLGEN-MS and two competitors, ACD MS Fragmenter and MetFrag. They seem to be about the same in quality: some achieve a better relative ranking position (RRP) but also return more candidates.
Mass spectrometry yields more than just a spiky diagram; the diagram is produced in real time. To take this time element into account, we look at retention time.
Retention properties – retention time is the time from injection to observation and is used in chromatography. (There are two retention indices, but they build on the same idea.) There is some error when measuring retention time, so one approach is to allow a candidate molecule if its predicted retention time is within two standard deviations of the measurement.
Two other properties can be used similarly to retention time to exclude unlikely candidates:
Partitioning properties – a measure of how the molecule partitions, related to retention time.
Steric energy – molecules with too high a steric energy are so strained that they cannot exist.
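The two-standard-deviation exclusion rule for retention time can be sketched like this. The threshold k = 2 is from the text; the function name and the numbers in the example are made up for illustration.

```python
def within_tolerance(predicted_rt, measured_rt, sigma, k=2.0):
    """Keep a candidate if its predicted retention time lies within
    k standard deviations of the measured value (k = 2 in the chapter)."""
    return abs(predicted_rt - measured_rt) <= k * sigma

# Made-up example: measured retention time 12.4 min, sigma 0.3 min.
print(within_tolerance(12.9, 12.4, 0.3))  # |0.5| <= 0.6, so the candidate is kept
print(within_tolerance(13.2, 12.4, 0.3))  # |0.8| > 0.6, so it is excluded
```

The same shape of test works for partitioning properties and steric energy: predict the property, compare against a measurement or a plausibility bound, and drop candidates outside the tolerance.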
An alternative to the above approach is consensus scoring. The idea is to give candidates a combined rank instead of eliminating unlikely ones criterion by criterion. This requires a formula for combining the different criteria, which is provided. The formula has some disadvantages, but overall this seems like the more promising approach.
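The idea of consensus scoring can be illustrated as follows. The chapter's actual combination formula is not reproduced here; a simple average of per-criterion ranks, over made-up data, shows the principle of combining instead of eliminating.

```python
# Hypothetical ranks each criterion assigns to three candidates:
# (spectrum-match rank, retention-time rank, steric-energy rank).
ranks = {
    "A": (1, 5, 2),
    "B": (3, 1, 1),
    "C": (2, 4, 4),
}

def consensus_score(candidate):
    """Average rank across criteria; lower is better."""
    rs = ranks[candidate]
    return sum(rs) / len(rs)

# No candidate is discarded; they are all re-ranked by the combined score.
consensus = sorted(ranks, key=consensus_score)
print(consensus)  # ["B", "A", "C"]
```

Note that candidate B wins overall despite being only third on the spectrum criterion, which is exactly the behavior elimination-based filtering cannot produce.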
The example used tries to identify the molecules present in contaminated groundwater in Germany.
The chapter concludes by naming some ways to improve CASE studies in the future. It mentions that there is a long way to go before the process is automated.
The second conclusion concerns CASE with high-accuracy data. It is not yet efficient in practice, since the databases are in their infancy, but people are working on it.
The workflow in figure 9.9 is not guilty of the same lack of detail as the previous ones.
Chapters 7 and 8 also seemed like case studies.
Next up is a talk on Monday 14–16 in the Department of Statistics, where we discuss the contents of the entire book. Interested people are welcome.