From an expert statistician’s tool kit: R vs Python programming language
VP Prasanth, Associate Director, Business Solutions
Jun 02, 2021

In recent years, leaders in the pharmaceutical industry have relied on data science to guide their decision making around vital drug development objectives. Data science experts have multiple layers of responsibility: exploring and evaluating data, developing and validating models, examining research patterns and generating data insights. After all of that, they have to communicate the results in a meaningful way that impacts decisions. Because this work requires key expertise, functional service providers (FSPs) can play a critical role in guiding and advancing these responsibilities.

In clinical research, sponsors and some service partners typically use legacy systems like SAS to analyze data, input results and generate insights. It is a conventional approach to data analysis. However, as paid software, SAS does not allow for open-source algorithms. In the last decade, it has gradually become apparent to industry stakeholders that taking advantage of historical data has benefits worthy of consideration for sponsors.

To best leverage data insights and ensure enhanced decision making and cost-effective process improvements, it is critical to know which tools experts in statistics, analytics and visualization need for success. A functional service provider (FSP) manager can help identify these tools per the project needs. Statisticians have been migrating to the open-source R and Python programming languages as a supplement to SAS software.

R and Python are gaining relevance as part of the drug discovery and development process. As open-source programming environments with robust and committed online communities, R and Python have become a powerful set of tools for statisticians and are used extensively within drug development, from molecule to market.


Analyzing high volume data

By integrating the data and applying useful algorithms, sponsors can benefit from the extensive data insights needed to make clinical trials more agile and adaptable, ultimately increasing productivity and efficiency to improve success rates. It is important to heighten these efficiencies by using sophisticated tech-enabled solutions and best-in-class analytics platforms to tweak, analyze and leverage large amounts of data.

Depending on the individual project needs, experienced FSP statisticians will ask the question, “What is best to use—R or Python?”

Usability differentiators

In data science, both R and Python are freely available and popular open-source programming languages. But there are key differences that those working in data analysis should consider. In a nutshell, Python can be considered superior as a programming tool for text analytics and mining of big data. Consider, for example, actigraphy data collection: this can easily generate huge volumes of data that need to be processed, particularly if one is using a functional statistics approach to the analysis. On the other hand, R provides more value for statistical analysis and visual data mining needs.
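As a rough illustration of the actigraphy example above, the following minimal Python sketch streams minute-level records and reduces them to hourly means without loading the whole file into memory. The column names (`timestamp`, `activity_count`), the sample values and the `hourly_mean_activity` helper are hypothetical, chosen only for illustration; a real device export would likely differ.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical minute-level actigraphy export (illustrative values only).
SAMPLE_CSV = """timestamp,activity_count
2021-06-01 08:00,12
2021-06-01 08:01,30
2021-06-01 08:02,18
2021-06-01 09:00,0
2021-06-01 09:01,4
"""

def hourly_mean_activity(lines):
    """Read rows one at a time and return {hour: mean activity count}."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
        hour = ts.replace(minute=0)  # truncate the timestamp to the hour
        totals[hour] += int(row["activity_count"])
        counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}

hourly = hourly_mean_activity(io.StringIO(SAMPLE_CSV))
for hour, mean in sorted(hourly.items()):
    print(hour.strftime("%Y-%m-%d %H:00"), round(mean, 1))
```

Because the function accepts any iterable of lines, the same code could be pointed at an open file handle for a large export instead of the in-memory sample.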

Integrating FSP experts who hold the right experience and technical training to know these programming languages well and understand how to maximize the benefits of each is extremely useful when working to secure key data points that will guide the decision-making process for sponsors and study teams.

Statisticians will need to be able to determine which language to use based on the problem under study or the key objectives of the analysis at hand. For example, if one needs statistical analysis rather than text analysis, the R programming language is ideal. Alternatively, if one needs to develop a web application or prefers a general-purpose programming language, Python is the preferred choice, given that it supports varying data formats. For web development, though, one might also want to consider Shiny apps, which allow a statistical application to be delivered quickly and easily via the browser. The table below (Fig. 1) outlines a simplified set of criteria a biostatistician could use to determine which software fits the problem at hand.

Criterion              | SAS      | R        | Python
-----------------------|----------|----------|---------
Cost                   | Paid     | Free     | Free
Learning               | Moderate | Moderate | Moderate
Data manipulation      | High     | High     | High
Analytic modelling     | High     | High     | Medium
Graphical capabilities | Low      | High     | Medium
Text processing        | Low      | Medium   | High
Deep learning support  | Low      | Medium   | High
Common usages          | High     | High     | Medium
Customer support       | High     | Medium   | Medium
Advancement in tool    | Low      | High     | Medium

Fig. 1 outlines a comparison of known features and benefits of each solution, based on Prasanth’s own experience.
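One way to put the criteria in Fig. 1 to work is to encode the ratings and rank the tools against the needs of a given project. The sketch below is only an illustration: the numeric scores (Low = 1, Medium = 2, High = 3), the equal weighting and the `rank_tools` helper are assumptions made here, not part of the original comparison. Only the feature rows rated Low, Medium or High are included.

```python
# Ratings copied from Fig. 1, in the order (SAS, R, Python).
TOOLS = ("SAS", "R", "Python")
RATINGS = {
    "Data manipulation":      ("High", "High", "High"),
    "Analytic modelling":     ("High", "High", "Medium"),
    "Graphical capabilities": ("Low", "High", "Medium"),
    "Text processing":        ("Low", "Medium", "High"),
    "Deep learning support":  ("Low", "Medium", "High"),
}

# Assumed scoring scheme: an equal-weight ordinal mapping.
SCORE = {"Low": 1, "Medium": 2, "High": 3}

def rank_tools(criteria):
    """Sum the scores over the chosen criteria and sort tools best-first."""
    totals = {tool: 0 for tool in TOOLS}
    for criterion in criteria:
        for tool, rating in zip(TOOLS, RATINGS[criterion]):
            totals[tool] += SCORE[rating]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# A text-mining, deep-learning project versus a modelling and graphics project.
print(rank_tools(["Text processing", "Deep learning support"]))
print(rank_tools(["Analytic modelling", "Graphical capabilities"]))
```

Under these assumptions, Python ranks first for the text and deep learning project, and R ranks first for the modelling and graphics project, which matches the guidance in the text. A real selection would also weigh cost, support and team experience.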

AI/ML problem solving

Generally, Python is the best fit for grabbing information from the web and is also good at handling unstructured data. Its libraries include popular tools like scikit-learn, Keras, and TensorFlow, which enable data scientists to develop sophisticated data models that plug directly into a production system. However, this does not necessarily mean the R programming language cannot perform these tasks.

With the growing R community and the increasing number of free libraries and packages available on CRAN (e.g., e1071, caret, h2o), R is now better equipped to handle problems that stem from AI/ML and deep learning. Furthermore, the R language is extensively used for the development of prototypes, especially after the introduction of integrated development environments (IDEs) such as RStudio, which have helped data experts create programs that are easier to navigate and use.

Additionally, similar to Python, R also has capabilities for reproducible research and reporting in the form of R Markdown and knitr.

Overall, experienced statisticians will usually prefer one tool over another depending on their current role and project needs. Staying focused on sponsors’ overall goals, FSP experts in this field are key contributors to keeping strategies agile, making somewhat complex tasks easier and less time consuming, and driving higher-quality outcomes while shortening timelines and reducing costs for sponsors.

For questions, training or to learn more about outsourcing biostatistical projects using R or Python programming language and other statistical tools, please contact GlobalFSPGTM@iqvia.com.
