Is this serious?

There are many web services that claim to tell which peptide best fits a given HLA type.

Indeed, the first step in creating a personalized vaccine should be to choose a peptide that will trigger a response from the immune system.

* One problem is that there is no such thing as “a response by the immune system”, because there are several immune systems.

* Another is that there is no such thing as simply “triggering a response”.
The same peptide could trigger an immune response against cancer cells, or it could trigger a response that actually promotes cancer.

* The immune system has mechanisms to block unwanted immune responses, the famous “checkpoints” that stalled progress in cancer immunology for decades. True, there are “checkpoint inhibitor” drugs, but using them is playing with fire.

* HLA typing is not even close to being “personalized”. HLA typing, as done by the industry, searches for fewer than 10 major HLA types.
HLA types are personal; they are “genetically highly variable”, and that is what makes each of us different from the next person. We evolved this way to make the spread of contagious diseases difficult.

* Cancer is a tissue disease before being a genetic disease. Something that held back cancer immunology for a long time was the fact that the immune system could not penetrate tumors (for complicated reasons).

* Even if we can find the best peptide, there will be many side effects. Current web services give a molecular answer to the question, not a medical one. What they actually answer is: “Which peptide docks best on a conventional HLA type?” There is nothing about the patient and her side effects, and side effects could kill the patient faster than her cancer.

* Molecular docking is an intractable problem for classical computation. Maybe quantum computers will change that, but for now there are no quantum computers. So how do these web services work? It is simple: they train a machine learning algorithm (the kind we use in our heart failure detector), and the algorithm can easily spit out some answer. Each academic who wrote such a service dismisses the other proposals, but is unable to prove that her ML algorithm is better. ML is a messy process that should be reserved for things that are not vital, or things that can be cross-checked many times. While its output could be called “an educated guess”, it is unethical to use it to create a vaccine.

* If you want an immunology-targeted treatment, you have to convince your doctor. It will be impossible to discuss it with her if there is no medical or scientific article that mentions how this peptide could be useful for that kind of cancer. As far as I know, no peptide/HLA docking web service does that.


Cancer vaccines

You may have read this article; it is about self-care by cancer patients who comb the scientific literature, find a description of a vaccine for their HLA type, and convince a medical service to inoculate them with it.

We, at Padirac Innovation, gave some thought to this and came up with an algorithm that takes side effects into account, because, as usual, it is not enough to find a cure: the cure must also be bearable for the patient.

As with Hjerte, our heart failure detector, we created a project on Hackaday to document our work in this area:

The easiest thing for us would be to start with a Java application, then soon move it to a website so that interested users can find it and use it as a service.

Toward a passive contactless Heart Failure detector

Today we used the audio extracted from the video in a post below. It was taken with a Logitech 270 webcam pressed against my finger, hence the strange pulsating red color.

The audio was truncated because at the beginning the camera was moved a bit harshly, and at the end the boiler used by my other half was creating increasing white noise.

We just used Hjerte 0.3 (available on this Hackaday page and on GitHub) to classify the heart sounds. It recognizes my heartbeat quite well, and seems to find convincing S1 and S2 sounds.

What are the take home points?

– There is no need for an ultrasound Doppler to record heart sounds, hence the “passive” in the title. This matters because some people dislike the idea of using ultrasound in mass-market devices.

– Heart sounds can be recorded on a finger!

– The Hjerte algorithm works even in adverse conditions (an ordinary microphone, lots of noise).

Now we have to use the video component of the file in conjunction with the audio part. An obvious use would be reliable heartbeat detection before starting the segmentation.

We dream of multiple webcams recording heart and lung sounds, and of integrating all that information!

Next developments

So, over the past year we developed a heart failure detector. For a proof of concept, it works quite well, and it is open source.
The making of it was documented on Hackaday:

We invite you to review this work, adapt it, and sell it in whatever format best fits your goals and capabilities. If you need help, just ask us via the contact form:

Now is the time to plan our next steps.
First, we observe that technology is often a bit arrogant when it comes to medicine.
Detecting that a heart has some problem at some point in time does not solve the patient’s problem. What is detected could be the sign of many disorders, and thus doesn’t tell what is wrong. The heart problem may be transient or linked to another condition. It may not exist at all. It is the role of the family doctor to capture the full picture, prioritize issues, and define treatments.

What we envision is a kind of tooling for family doctors, coupled to a physiology model. It is actually an extension of what doctors use today, not a revolution. Today doctors can browse conditions and remedies on their computer according to parameters they define.
What is missing is the data sources, which for now are provided through medical diagnosis, the realm of specialists.

Ideally, no new software would be installed on the doctor’s computer; she would securely access the physiology modeling tool through her browser. And this physiology modeling tool would access her data sources securely, without the doctor’s IT service having to install anything.

Having multiple convergent information

When it comes to studying heart sounds with algorithms, it is important not to be naive: technology is a tool, not an oracle. Mastering what we do is of the utmost importance, and having multiple pieces of information helps us gain confidence in the outcome of the algorithms.

One important piece of information for segmenting heartbeats in sounds is the heart rate, but it is much better to know approximately when each heartbeat begins.
Heartbeat events are easy to recognize with an ECG because there is a very distinctive figure, the “R” peak, which signals the contraction of the ventricles.
However, an ECG is difficult to sample, and it does not easily give much other usable information. For example, one has to interpret the deflection of waves, or figure out whether missing beats are normal or not. Surprisingly, our heart can skip beats, for example if we move during the sampling.
Phonocardiograms are easier to sample, and they give more trustworthy information.

However, automatically segmenting phonocardiograms is not an easy task either. There is a well-known challenge for heart specialists on this subject, PhysioNet/CinC, and there are also the Kaggle challenges for data scientists.

Phonocardiograms are usually obtained with Doppler ultrasound or with electronic stethoscopes. But they can be sampled with ordinary microphones as well, provided that ambient sounds are not too loud.

Since webcams are often equipped with microphones, and SBCs like the Raspberry Pi can use webcams out of the box, I was thinking of using photoplethysmography to at least get the heart rate. This is ongoing work, but the results are encouraging. I am using a Logitech 270 webcam.
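Here is a minimal sketch of the photoplethysmography idea, assuming the video has already been reduced to one mean red-channel value per frame at a known frame rate. The function name, frame rate and band limits are illustrative assumptions, not Hjerte code:

```python
import numpy as np

def ppg_heart_rate(red_means, fps=30.0, lo_bpm=40.0, hi_bpm=200.0):
    """Estimate the heart rate from per-frame mean red-channel values.

    A fingertip pressed on the lens modulates the red channel at the
    pulse frequency, so the dominant FFT peak inside a plausible band
    (40-200 bpm here) gives the heart rate.
    """
    x = np.asarray(red_means, dtype=float)
    x = x - x.mean()                          # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo_bpm / 60.0) & (freqs <= hi_bpm / 60.0)
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0                     # Hz -> beats per minute
```

On real frames the signal is much noisier, which is why this only yields a heart-rate estimate, not a segmentation.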

The results of this study are intended to be included in my Hjerte software package, which now runs on a Raspberry Pi.

About usefulness of early detectors

When one claims to design an early detector for some illness, it is important to think about early detection in healthcare in general: what are the benefits, the inefficiencies, and the new problems it creates?

As a personal anecdote about the usefulness of early detectors: I had a skin carcinoma that two doctors (my family doctor and the company MD) saw for years without reacting; one of them even asked me what it was. In the end, it was a cardiologist who told me it was probably a carcinoma and that I had to consult a specialist quickly.
MDs have to know what to make of the test results of those devices. For example, some medical organizations are starting to provide free genetic screening kits for some conditions [0]; as we know, some drugs work well for some genomes but less well for others, a concept a bit weird in itself but very fashionable at the moment.
But those kits do not work the same way, so their results are not comparable with each other: some may analyze the DNA in blood, while others may take a sample with a biopsy needle. Neither can claim to capture the full picture of a tumor’s mutations. In addition, a tumor’s genome evolves very quickly and is not homogeneous; it is as if many mutations were branching out rapidly from a common ancestor cell, so that some time later a tumor is the site of several unrelated mutations.
Tests sometimes provide conflicting or overlapping results for the same patient. Researchers at the University of California, San Diego, published a 168-patient study on discordance in early 2016 showing that there is overlap, as well as differences, between DNA analyses from tissue biopsies and blood samples.
Some tests even make drug suggestions; studies have shown that different commercial solutions may in some cases suggest different drugs, or fail to suggest drugs that an MD would have prescribed. Those commercial products need to improve, and doctors’ professional bodies need to develop guidelines to teach how to cope with these new tools.

Another issue is false negatives. The press recently reported an unfortunate case where a woman felt something was wrong with her baby in the last months of her pregnancy. She used a fetal Doppler and found a heartbeat; unfortunately, the baby was stillborn. It is possible that, had she not used her fetal Doppler, she would have gone straight to the hospital, which might have saved the baby.
False positives are another problem. As an older man, I am regularly reminded by the state health insurance to check my PSA; PSA (prostate-specific antigen) is a marker of prostate cancer. I am aware of the risk of cancer, but large studies, one in the US and two in Europe, concluded that for a thousand men who screen positive, one will probably be saved, while several dozen will suffer a severe degradation in their quality of life and health in general.

The testing process may also impose travel costs on the patient, lost time and income, discomfort, or even suffering, especially in women’s healthcare. Unnecessary biopsies and other medical procedures, for people who are wrongly diagnosed or whose cancer might never have spread, can also hasten health problems.
While early detectors might seem like a good idea in general, one problem is the anxiety they generate: even if everything is fine now, it does not mean everything will stay fine, so there is a constant urge to re-check. Even medical doctors can succumb to cognitive bias: when they find “something” in a mammography, they ask for more tests, which come back negative, but they nevertheless urge more frequent testing in the future, creating unnecessary anxiety for the patient [1].

What does all this mean for the designer of an early heart failure detector? Certainly that one should not make big, unwise claims. There is also a need to collaborate with practicing doctors, not only scientists.
At the same time, how do we attract people’s attention so that they use it and finance the R&D?


Refactoring and randomness test

The first usable versions of our feature detection code were full of hardwired constants and heuristics.
The code has now been modularized; it is spread across several methods with clean exit conditions.

We were proud that our design was able to look at each beat and heart sound, which is a far greater achievement than what ML code usually does. Something really interesting was how we used compression to detect heart sound features automatically in each beat.
Now we introduce something similar in spirit. Until now, our code was sometimes unable to find the correct heart rate when the sound file was heavily polluted by noise. Now we use a simple statistical test, akin to the standard deviation, to test the randomness of the beat distribution: if the beats are distributed at random, it means our threshold is too low, and we are detecting noise in addition to the signal.
This helped us improve our estimation of the heart rate.
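The idea can be sketched as follows; the coefficient-of-variation cutoff and the function name are illustrative assumptions, not the actual Hjerte constants:

```python
import numpy as np

def beats_look_random(beat_times, cv_cutoff=0.35):
    """Heuristic test of whether detected beats are just noise.

    Real heartbeats produce fairly regular inter-beat intervals (low
    coefficient of variation), while threshold crossings caused by
    noise are spread out at random (high coefficient of variation).
    """
    intervals = np.diff(np.sort(np.asarray(beat_times, dtype=float)))
    if len(intervals) < 3:
        return True   # too few events to trust the detection
    cv = intervals.std() / intervals.mean()
    return cv > cv_cutoff
```

If the test reports randomness, the detection threshold was too low and must be raised before estimating the heart rate.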

In an unrelated area, we also started to work on multi-HMMs, which means detecting several concurrent features. One idea we are toying with would be to use our compression trick at the beat level, whereas it is now used at the heart sound level. This is tricky and interesting in the context of a multi-HMM; indeed, it makes multi-HMMs more similar to unsupervised ML algorithms.

Multi HMM for heart sound observations

Up to now, feature detection has used something that I find funny, but it works really well. As we use Hidden Markov Models, we must create a list of “observations” from which the HMM infers a model (the hidden states). Creating trustworthy observations is really important, and it is a design decision that those observations are the “heart sounds” that cardiologists name S1, S2, etc.

To detect those events, we first have to find the heartbeats, then find sonic events within each of them. In CinC/PhysioNet 2016, they used an FFT to find the basic heart rate and, because an FFT cannot inform on heart rate variability, they computed various statistical indicators linked to it.
This is not a very good approach, because the main frequency of an FFT is not always the heart rate.
Furthermore, this approach is useless at the heartbeat level, let alone at the heart sound level. So what we did was detect heartbeats (which is harder than one might think), and from there, detect heart sounds.

A series of observations consisting only of four heart sound labels would not be useful at all; after all, the Sn+1 heart sound is simply the heart sound that comes after Sn. We needed to capture more information and somehow pre-classify the heart sounds.

This was done (after much effort) by computing a signature based, in essence, on a compressed heart sound. Compression is a much funnier thing than it might seem: to compress, one has to remove as much redundant information as possible, which means that a perfectly compressed signal can be used as a token for that signal, and logical operations can be performed with it.

Sometimes people in AI research fantasize that compression is the Holy Grail of machine learning because it would make feature detection automatic. We are far from thinking that: to compress, one has to understand how the information is structured, while automatic feature detection implies that we do not know its structure.

It is the same catch-22 problem that the Semantic Web met 10 years ago: it can reason on structured data but not on unstructured data, and the only real breakthrough would have been reasoning on unstructured data. That is why we now have unsupervised machine learning, with algorithms like Deep Forest. While CinC 2016 submissions used unsupervised ML heavily, we used run-length (RLL) compression to obtain a “signature” of each heart sound, and it works surprisingly well with our HMM.
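As an illustration of the signature idea, a heart sound can be quantized into a few amplitude levels and then run-length encoded. This is a sketch, not Hjerte’s actual encoder: the number of quantization levels and the assumed amplitude range are my own choices.

```python
import numpy as np

def rll_signature(sound, n_levels=4):
    """Run-length 'signature' of a heart sound.

    Samples (assumed normalized to [-1, 1]) are quantized into a few
    amplitude levels, and runs of identical levels are collapsed into
    (level, run length) pairs. The short resulting sequence acts as a
    compact token for the heart sound that an HMM can observe.
    """
    x = np.asarray(sound, dtype=float)
    levels = np.clip(((x + 1.0) / 2.0 * n_levels).astype(int), 0, n_levels - 1)
    signature = []
    run_level, run_len = int(levels[0]), 1
    for lvl in levels[1:]:
        if lvl == run_level:
            run_len += 1
        else:
            signature.append((run_level, run_len))
            run_level, run_len = int(lvl), 1
    signature.append((run_level, run_len))
    return signature
```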

The next step is to implement a multi-HMM approach, because there are other ways to pre-categorize a heart sound than its RLL signature; for example, the heart sound might be early or late, and that characteristic could be used to label it.
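A minimal sketch of what the observations for such a multi-HMM could look like; the field names and the 40 ms early/late cutoff are assumptions for illustration, not Hjerte’s actual design:

```python
def multi_hmm_observations(heart_sounds, late_cutoff_ms=40.0):
    """Build two parallel observation streams for a multi-HMM.

    Each detected heart sound is assumed to carry a compression-based
    class ('sig') and its timing offset within the beat ('offset_ms').
    One stream keeps the signature class, the other labels the sound
    early or late; a multi-HMM can then model both streams jointly.
    """
    sig_stream = [hs["sig"] for hs in heart_sounds]
    timing_stream = ["late" if hs["offset_ms"] > late_cutoff_ms else "early"
                    for hs in heart_sounds]
    return sig_stream, timing_stream
```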

Heart beat detection and segmentation

This is a quick description of our early heart failure detection algorithm for feature identification and segmentation.

1) A sound file consists basically of a float array and a sampling rate.
2) The sound is normalized in amplitude (though we can do without) and in sampling rate (2000 samples per second).
3) Contrary to what is done in PhysioNet 2016, there is no filtering or elimination of “spikes”.
4) The cardiac rhythm is detected through events that are roughly the S1 events. This is not trivial, as there are noises, spikes, “plops”, respiration, abnormal heart sounds, human speech and, in one case, even dog barks! Basically, there are two ways to achieve beat detection: one is to find the frequency of the envelope of the signal, the other is to compute an FFT. But both approaches are subjective, as they imply deciding beforehand what an acceptable heart rate is. What does not help is that the heart rate can roughly go from 30 to 250 beats per minute, so when we detect three or four main frequencies in that range in the FFT of the sound file, which one is the correct one? One cannot decide in advance. What we do is a two-step, indirect approach that fails quickly if we make a wrong guess, so it can converge quickly toward a result:
* In the first step, we try to estimate the heartbeat duration independently of the heart rate (because there are noise sources that make heart rate counting unreliable).
* We use this first estimate to make a better guess of the heart rate in the second step. If the guess gives obviously wrong results, we change the threshold used to detect the heart rate.

4-1) We seek the optimal detection level (threshold) enabling heart rate detection in this sound file.
4-1-1) We create a window on the sound file that is heavily low-pass filtered (currently 200Hz).
4-1-2) We register the times of downward crossings through the current threshold level.
4-1-3) If a convincing heart rate (not necessarily exact) is obtained, the detection level is kept; otherwise it is lowered and the procedure is repeated.
4-2) With this threshold, we try to obtain a cardiac rhythm with roughly the same procedure as above, but with a window filtered at 1000Hz and in a wide interval around the heart rate obtained in the first step.
4-2-1) We create a sound window which is low-pass filtered (currently 1000Hz).
4-2-2) A downward crossing through the threshold is detected; ad hoc margins are imposed between the start and the end of the event.
4-2-3) If a convincing cardiac rhythm is obtained, the heart rate is kept; otherwise the detection level is lowered and we start again.

So we have here an approach which is very progressive, yet delivers results in a short time. It is also quite insensitive to sound events that could derail the heartbeat counting: since the first step provides a good indication of where the real heartbeats are, a single spike may cause the closest S1 to go undetected, but we still keep a good idea of the beat duration.
5) In addition to S1, many other events are detected. A priori, we assume that these are the other events S2, S3, S4. Even numbers higher than four are useful for classifying unusual heart sounds.
6) These two Sx event detections are then reconciled:
6-1) The list of S1 events is made more reliable, which makes it possible to deduce the events S2, S3, S4.
6-2) A signature of the heartbeat is computed, acknowledging that there is more to a heartbeat than its time of arrival and duration. We tested several schemes, and decided to use a Huffman compression of the heartbeat. We also had the idea of using this for yet another kind of training-free feature detection, but it is not implemented at the moment for lack of resources.
7) From there one can either train an HMM or classify. At this point nothing is cardiac-specific anymore; it’s just an HMM.
8) The classification made by the HMM is interpreted, with a similarity score and comments on the Sx events.
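The two-step threshold search of step 4 can be sketched as follows. This is a simplified stand-in, not the Hjerte code: the low-pass filtering is omitted, and the acceptance test (plausible rate plus regular intervals) replaces the real heuristics.

```python
import numpy as np

def downward_crossings(x, level):
    """Sample indices where the signal crosses the level downward."""
    return np.flatnonzero((x[:-1] > level) & (x[1:] <= level))

def estimate_heart_rate(x, sample_rate=2000, start=0.9, step=0.1):
    """Lower the detection threshold until the crossings look like beats."""
    peak = np.max(np.abs(x))
    level = start * peak
    while level > 0:
        beats = downward_crossings(x, level)
        if len(beats) >= 3:
            intervals = np.diff(beats) / sample_rate          # seconds
            bpm = 60.0 / intervals.mean()
            regular = intervals.std() / intervals.mean() < 0.3
            if 30.0 <= bpm <= 250.0 and regular:
                return bpm                                    # accepted
        level -= step * peak                                  # try lower
    return None                                               # no convincing rate
```

The point of failing fast is visible here: an implausible or irregular candidate is rejected immediately and the threshold simply moves on.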

Another difference with PhysioNet 2016 is that they add a second approach, heart rate variability: they compute a lot of indicators, but the HMM does it more accurately, with a transition probability between internal states that can be explained, instead of a scalar computed over the whole file.

In a future version, I would like to work on the frequency peaks that can be identified with an FFT.
The general idea would be to look for what would “apparently” be harmonics, but would in fact indicate echoes along different paths.

Heart sound files analysis

Not much to show, but some news:
Sound files have problems that I did not anticipate. What I was expecting from the analysis of PhysioNet 2016 submissions was noise, spikes, weird amplitudes and similar distortions of the signal.
What I found was different: there is little noise once you filter a bit, and there are few spikes.
However, sometimes the signal is biased (more negative values than positive), and the signal also appears to have little in common with textbooks: I can easily detect S1 and S2 events, but it is difficult to find S3 and S4.

When you listen to the sounds, half of them sound weird; I am not a cardiologist, but I find it difficult to relate what I hear to a “textbook” heart sound.
This makes me think again about PhysioNet 2016: successful submissions were mainly about heavy filtering, dealing with spikes with sophisticated algorithms, and finding characteristics (features, in ML slang) that encompass the whole file, such as RR variability.

Clearly my approach is different: I focus on what identifies a heartbeat, which is entirely new. But I still plan to implement the RR variability analysis and tie it to my HMM classifier, which will become quite hybrid in the process.
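For reference, whole-file RR-variability indicators of the kind used in those submissions can be sketched in a few lines; SDNN and RMSSD are two classic examples (the function name is mine, not taken from any submission):

```python
import numpy as np

def rr_variability(beat_times_s):
    """Two classic RR-variability scalars from beat times in seconds.

    SDNN is the standard deviation of the RR intervals; RMSSD is the
    root mean square of successive interval differences. Both summarize
    the whole recording in a single number, which is exactly the
    limitation the per-beat HMM view tries to overcome.
    """
    rr = np.diff(np.asarray(beat_times_s, dtype=float))
    sdnn = float(rr.std())
    rmssd = float(np.sqrt(np.mean(np.diff(rr) ** 2)))
    return sdnn, rmssd
```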