A nice workshop


I was recently invited to give a talk about my R package missingHE at the R for HTA summer workshop 2021, and I would like to spend a few words describing my experience at this event, which was a first for me. Here you can see one of my tweets about the event (I will try to use Twitter more often in the future, promise!).

In general, I found it to be a very interesting conference, with a crazy number of people from all over the world who were interested, in one way or another, in the use and application of R for health economic assessment. To be honest, the fact that the workshop was fully online, for obvious reasons, may have encouraged more people to join than would have attended a standard in-person event. Even so, I was quite impressed by how many people engaged in the discussion and by the very interesting topics raised. I felt that my presentation was well received, and I had a ton of fun discussing some missing data issues with colleagues at the workshop, including Manuel Gomes and Gianluca Baio, whom I know very well from my time at UCL. However, missing data analysis can be really hard to grasp from a presentation alone, and I simply hope I gave some of the people in the audience an intuition for how important missing data assumptions are in any analysis.

Aside from my own topic, which I really love even though it does not appeal to everybody, I have to compliment the organisers of the event, which ran smoothly and featured some very interesting back-and-forth discussions between people from different places and positions who all share a common interest in HTA analyses. In particular, the discussion panel following my presentation was very interesting, as it opened up a “hot” debate about the use of R for HTA in industry. All the points discussed were valid, and I felt that the two main concerns holding people in the HTA industry back from using R in their analyses are:

  • Clients do not like going through R code in order to understand the model structure and results. They are used to a standardised way of reporting the model, often based on an Excel spreadsheet in which they can play around with the cells and see the results for themselves. Since they are unfamiliar with R, they do not want to spend extra time trying to figure out what the model looks like from the R code. However, they are well aware that Excel cannot perform any advanced statistical modelling and that such models are nowadays implemented in software such as R.

  • There is a general concern about the “quality” of the R packages available on CRAN or in the individual GitHub repositories of different developers. Many people drew a comparison with software such as Stata or SAS, which cannot be freely updated by users but has a reference body in charge of carrying out all the testing and checks before a new version or update is released to the public. By contrast, the argument goes, R packages are not subject to such checks, as the only requirement for a package to be uploaded to CRAN is that it does not crash when called.

I agree that both points are valid, and in fact they should be taken seriously in order to encourage people to use R more often. If I may offer my humble opinion, I would argue that having a “unique” and “validated” reference body for checking and updating the software routines is not necessarily an advantage in all situations. For example, if someone develops a new model which has not yet been implemented in any existing function, it is very unlikely that a new version of the software will be released simply to include this model. More likely, a certain amount of time will pass (so that more updates can be bundled together) before the new commands become available to everybody. In this case, software like R gives any developer the chance to implement their method in a new package which can be uploaded and made available to everybody in a very short time. Of course, it is also important that proper testing is carried out to minimise the number of bugs or issues that may arise from using these new functions. Although such problems occur more frequently within a software framework such as R, the free and open nature of the software allows anybody who encounters an issue with a package to contact the developer, point out the problem and ask for a solution, which may become available to everybody in a matter of days or even hours! I personally think that going back to a single controller checking the quality of all packages goes against the spirit of R as free software that anyone can use to create, update and extend packages and make them available to everybody, without the need to wait for an external and impartial controller to do the checks.
People will always find some issues or bugs in a new package and, over time, the more people use it, the more thoroughly its functions will be tested, which ensures a high-quality resource for any newcomer.

As for the “problem” with clients, I think this is a very delicate issue since, as in any private sector, it is important to match the needs of the client as closely as possible. At the moment, I think the standard approach is to implement the model in R and then “copy” the results into an Excel spreadsheet, giving the client some power to change certain inputs and see how this affects the results. This is of course very time-consuming and also frustrating at times. A possible solution would be to use Shiny, an R package for building web applications, which provides a user-friendly interface that allows people to play around with a model developed in R in a fashion that resembles the familiar Excel output clients are used to seeing. It is not perfect since, most of the time, the range of modifications clients are allowed to make is quite small, and it is not really possible to do any serious debugging without looking directly at the R code itself, which raises the same problem as above. Personally, I think it is only a matter of time until Excel outputs gradually disappear from the international landscape of HTA. This has already happened in most of academia, particularly in the UK, where statistical analyses are performed using statistical software. Soon, the need to implement and improve more complex models in order to comply with the regulations and guidelines of decision-makers will make the very idea of using Excel obsolete, and both consultants and clients will need to adapt to the new standard, which is likely to be statistical software such as R.

Maybe it is still not the time, but we are getting there, I am confident of this!