Data Science Jobs in Pharma in 2023

The job market is tough right now.

Alice Walsh
2023-12-03

As many people know too well, it has not been a good year for jobs in pharma and biotech. As STAT News reported, it is hard out there for job seekers.

I have previously done some data mining for pharma job postings, so I went ahead and browsed more recent data to understand whether data science jobs might have been affected more or less than other roles.

My methods are imperfect, but I focused on top pharma companies and on job descriptions that mention data science and related skills.

Overall, jobs are down

I used three snapshots of job postings: November 2022, February 2023, and November 2023. These are the dates when I downloaded the job descriptions.

library(dplyr)     # data wrangling
library(gt)        # make nice tables
library(ggplot2)   # nice plots

# plot theme
theme_set(theme_minimal(base_family = "Avenir"))
# set seed
set.seed(1203)

data_files <- c("2022-11-03/combined_data2.csv",
                "2023-02-20/combined_data2.csv",
                "2023-11-21/combined_data2.csv")
jobs <- purrr::map_dfr(
  file.path("~/Documents/code/job_description_collector/data/", data_files), 
  read.csv, .id = "id")
date_map <- data.frame(
  id = as.character(seq_along(data_files)),
  date = dirname(data_files)
)
jobs <- jobs |> 
  left_join(date_map, by = "id")

# require distinct descriptions
jobs <- distinct(jobs,
                 date, company, title, description, .keep_all = TRUE) |> 
  add_count(date, company, name = "n_company")
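
To illustrate what the de-duplication step above does, here is a minimal sketch on a made-up data frame. The `toy_jobs` rows and the `location` column are hypothetical and are not taken from the real data; only the `distinct()` and `add_count()` calls mirror the code above.

```r
library(dplyr)

# Hypothetical postings: rows 1 and 2 share date, company, title, and description
toy_jobs <- data.frame(
  date = c("2022-11-03", "2022-11-03", "2022-11-03", "2023-11-21"),
  company = c("A", "A", "A", "A"),
  title = c("Data Scientist", "Data Scientist", "Chemist", "Data Scientist"),
  description = c("Uses R daily.", "Uses R daily.", "Organic chemistry.", "Uses R daily."),
  location = c("Boston", "Remote", "Boston", "Boston")
)

toy_dedup <- toy_jobs |>
  # keep the first row for each date/company/title/description combination
  distinct(date, company, title, description, .keep_all = TRUE) |>
  # attach the number of remaining postings per date and company
  add_count(date, company, name = "n_company")

toy_dedup
```

In this toy case the second row is dropped as a duplicate, so company A is left with two postings on the first date and one on the second.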

Looking at all jobs, we can see that postings decreased from February 2023 to November 2023.

jobs |> 
  distinct(date, n_company, company) |> 
  ggplot(aes(date, n_company, group = company)) + 
  geom_line() +
  geom_point(aes(color = company)) + 
  coord_cartesian(ylim = c(0, NA)) + 
  labs(title = "Total jobs per company",
       y = "Jobs", x = "Date")

There are few data science jobs

Next, I labeled jobs as “data jobs” if their descriptions mentioned R or “data science”. Most of the jobs are not “data jobs,” so to compare trends across job types, I normalized the counts to the first data point (November 2022).

Back in November of 2022, there were 232 jobs that mentioned R or “data science”. In November of 2023, that is down to only 66 jobs.

I also plotted “chemistry” and “sales” as comparisons. It seems that some types of roles have been hit harder than others!
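
As a small worked example of the normalization described above, the sketch below uses only the two counts quoted in the text (232 and 66). The February 2023 count is not stated, so it is left out, and the `data_counts` data frame is an illustrative stand-in for the real table.

```r
library(dplyr)

# Counts quoted in the post for jobs mentioning R or "data science".
# Only these two snapshots are stated in the text.
data_counts <- data.frame(
  job_type = "data",
  date = c("2022-11-03", "2023-11-21"),
  n = c(232, 66)
)

# Divide each count by the count at the baseline date within each job type
normalized <- data_counts |>
  group_by(job_type) |>
  mutate(fraction = n / n[date == "2022-11-03"]) |>
  ungroup()

normalized
```

With these numbers, the November 2023 value is 66 / 232, or roughly 28% of the November 2022 level, which corresponds to a drop of about 72%.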

jobs_w_type <- mutate(jobs, 
               job_type = case_when(
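                 # crude match for a standalone "R"; [^(Maurice)] and [^(&N.Ph)]
                 # are character classes (single characters), not excluded words
                 grepl("[^(Maurice)] R[\\., ][^(&N.Ph)]", description) |
                   grepl("[Dd]ata [Ss]cience", description) ~ "data",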
                 grepl("[Cc]hemistry", description) ~ "chemistry",
                 grepl("[Ss]ales", description) ~ "sales",
                 TRUE ~ "other")
) |> 
  count(job_type, date) |> 
  group_by(job_type) |> 
  mutate(fraction = n / n[date == "2022-11-03"]) 

jobs_w_type |> 
  ggplot(aes(date, fraction, group = job_type, color = job_type)) + 
  geom_line() +
  geom_point() + 
  coord_cartesian(ylim = c(0, NA)) + 
  scale_y_continuous(labels = scales::percent) + 
  scale_color_manual(values = c("#87B8DA", "firebrick4", "#87B8DA", "#87B8DA")) + 
  geom_text(aes(label = job_type), nudge_x = 0.1, hjust = 0,
            data = filter(jobs_w_type, date == "2023-11-21"),
            family = "Avenir") + 
  labs(title = "Jobs by type",
       y = NULL, x = "Date",
       color = "Job type") + 
  theme(legend.position = "none")
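
The pattern used above to detect mentions of R is a heuristic, so it can help to see how it behaves on a few strings. The sketch below applies the same pattern to made-up example strings; none of them come from the real job descriptions, and the comments describe only what the regular expression does for these inputs.

```r
# Same pattern as in the post; the example strings are hypothetical
r_pattern <- "[^(Maurice)] R[\\., ][^(&N.Ph)]"

examples <- c(
  "Experience with R, Python, and SQL",  # matched
  "Proficiency in R or Python",          # matched
  "Maurice R. Hilleman",                 # not matched: "e" precedes " R"
  "Works in R & D",                      # not matched: "&" follows "R "
  "Python or R, SQL"                     # not matched: "r" precedes " R"
)

matches <- grepl(r_pattern, examples)
data.frame(text = examples, matched = matches)
```

Because the bracketed parts are character classes, the pattern rejects any single listed character before or after the R. The last example suggests that some genuine mentions (here, "or R,") may be missed, which fits the caveat that the method is imperfect.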

Good luck out there job seekers! If anyone is looking for job search advice, feel free to reach out to see if I can help.

sessionInfo

pander::pander(sessionInfo())

R version 4.3.0 (2023-04-21)

Platform: aarch64-apple-darwin20 (64-bit)

locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: ggplot2(v.3.4.4), gt(v.0.10.0) and dplyr(v.1.1.2)

loaded via a namespace (and not attached): gtable(v.0.3.3), jsonlite(v.1.8.4), highr(v.0.10), compiler(v.4.3.0), Rcpp(v.1.0.10), tidyselect(v.1.2.0), xml2(v.1.3.4), jquerylib(v.0.1.4), scales(v.1.2.1), yaml(v.2.3.7), fastmap(v.1.1.1), R6(v.2.5.1), labeling(v.0.4.2), generics(v.0.1.3), knitr(v.1.42), tibble(v.3.2.1), pander(v.0.6.5), distill(v.1.6), munsell(v.0.5.0), bslib(v.0.4.2), pillar(v.1.9.0), rlang(v.1.1.1), utf8(v.1.2.3), cachem(v.1.0.7), xfun(v.0.39), sass(v.0.4.5), memoise(v.2.0.1), cli(v.3.6.1), withr(v.2.5.0), magrittr(v.2.0.3), digest(v.0.6.31), grid(v.4.3.0), rstudioapi(v.0.15.0), lifecycle(v.1.0.3), vctrs(v.0.6.4), downlit(v.0.4.2), evaluate(v.0.20), glue(v.1.6.2), farver(v.2.1.1), fansi(v.1.0.4), colorspace(v.2.1-0), purrr(v.1.0.2), rmarkdown(v.2.21), tools(v.4.3.0), pkgconfig(v.2.0.3) and htmltools(v.0.5.5)

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Walsh (2023, Dec. 3). Alice Walsh: Data Science Jobs in Pharma in 2023. Retrieved from https://awalsh17.github.io/posts/2023-12-03-job-report-2023/

BibTeX citation

@misc{walsh2023data,
  author = {Walsh, Alice},
  title = {Alice Walsh: Data Science Jobs in Pharma in 2023},
  url = {https://awalsh17.github.io/posts/2023-12-03-job-report-2023/},
  year = {2023}
}