The job market is tough right now.
As many people know too well, it has not been a good year for jobs in pharma and biotech. As STAT News reported, it is hard out there for job seekers.
I have previously done some data mining for pharma job postings, so I went ahead and browsed more recent data to understand whether data science jobs might have been affected more or less than other roles.
My methods are imperfect, but I focused on top pharma companies and on job descriptions that mention data science and related skills.
I used data from November 2022, February 2023, and November 2023 (the dates when I downloaded the job descriptions).
library(dplyr) # data wrangling
library(gt) # make nice tables
library(ggplot2) # nice plots
# plot theme
theme_set(theme_minimal(base_family = "Avenir"))
# set seed
set.seed(1203)
data_files <- c("2022-11-03/combined_data2.csv",
                "2023-02-20/combined_data2.csv",
                "2023-11-21/combined_data2.csv")
# read each snapshot; .id records the position of the file in data_files
jobs <- purrr::map_dfr(
  file.path("~/Documents/code/job_description_collector/data/", data_files),
  read.csv, .id = "id")
# map each position back to its download date (the folder name)
date_map <- data.frame(
  id = as.character(seq_along(data_files)),
  date = dirname(data_files)
)
jobs <- jobs |>
  left_join(date_map, by = "id")
# require distinct descriptions
jobs <- distinct(jobs,
                 date, company, title, description, .keep_all = TRUE) |>
  add_count(date, company, name = "n_company")
Looking at all jobs, we can see that postings decreased between February 2023 and November 2023.
jobs |>
  distinct(date, n_company, company) |>
  ggplot(aes(date, n_company, group = company)) +
  geom_line() +
  geom_point(aes(color = company)) +
  coord_cartesian(ylim = c(0, NA)) +
  labs(title = "Total jobs per company",
       y = "Jobs", x = "Date")
Next, I labeled jobs as “data jobs” if their descriptions mentioned R or data science. Most of the jobs are not “data jobs,” so to compare trends, I normalized the counts to the first data point (November 2022).
Back in November of 2022, there were 232 jobs that mentioned R or “data science”. In November of 2023, that is down to only 66 jobs.
I also plotted “chemistry” and “sales” as comparisons. It seems that some types of roles have been hit harder than others!
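To make the normalization concrete, here is a minimal sketch using only the two “data job” counts quoted above. The vector name `data_jobs` is hypothetical, and the February 2023 value is left out because the text does not state it.

```r
# Counts of "data jobs" quoted in the text, keyed by download date
# (February 2023 is omitted because the post does not report it)
data_jobs <- c("2022-11-03" = 232, "2023-11-21" = 66)

# Normalize every value to the first data point (November 2022)
fraction <- data_jobs / data_jobs[["2022-11-03"]]

# The November 2023 value is 66 / 232, roughly 28% of the starting count
round(fraction, 3)
```

The same division is what `n / n[date == "2022-11-03"]` does within each job type in the code that follows.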
jobs_w_type <- mutate(jobs,
  job_type = case_when(
    # standalone "R" followed by a period, comma, or space
    grepl("[^(Maurice)] R[\\., ][^(&N.Ph)]", description) |
      grepl("[Dd]ata [Ss]cience", description) ~ "data",
    grepl("[Cc]hemistry", description) ~ "chemistry",
    grepl("[Ss]ales", description) ~ "sales",
    TRUE ~ "other")
  ) |>
  count(job_type, date) |>
  group_by(job_type) |>
  # express each count as a fraction of the November 2022 count
  mutate(fraction = n / n[date == "2022-11-03"])
jobs_w_type |>
  ggplot(aes(date, fraction, group = job_type, color = job_type)) +
  geom_line() +
  geom_point() +
  coord_cartesian(ylim = c(0, NA)) +
  scale_y_continuous(labels = scales::percent) +
  # levels are alphabetical (chemistry, data, other, sales); highlight "data"
  scale_color_manual(values = c("#87B8DA", "firebrick4", "#87B8DA", "#87B8DA")) +
  geom_text(aes(label = job_type), nudge_x = 0.1, hjust = 0,
            data = filter(jobs_w_type, date == "2023-11-21"),
            family = "Avenir") +
  labs(title = "Jobs by type",
       y = NULL, x = "Date",
       color = "Job type") +
  theme(legend.position = "none")
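One caveat about the R-matching pattern used above: `[^(Maurice)]` is a negated character class, so it rejects any single preceding character from the set `( M a u r i c e )` rather than the word "Maurice". The sketch below illustrates this on a few invented strings and compares it with a hypothetical alternative that uses a negative lookbehind. The alternative is my assumption about the intent, not the method used for the counts in this post, and swapping it in would change those counts.

```r
# Invented example strings (not taken from the scraped job descriptions)
examples <- c(
  "Experience with Python or R, SQL is required.",
  "Proficiency in R, Python, and SAS.",
  "Recipient of the Maurice R. Hilleman award.",
  "Works with R & D teams."
)

# Pattern as written in the post: [^(Maurice)] is a negated character class,
# so any one of the characters ( M a u r i c e ) before " R" blocks the match
pattern_class <- "[^(Maurice)] R[\\., ][^(&N.Ph)]"

# Hypothetical alternative: a negative lookbehind rejects only the literal
# word "Maurice" before " R" (lookarounds need perl = TRUE in grepl)
pattern_lookbehind <- "(?<!Maurice) R[., ][^(&N.Ph)]"

class_hits <- grepl(pattern_class, examples)
lookbehind_hits <- grepl(pattern_lookbehind, examples, perl = TRUE)

# Side-by-side comparison of the two patterns
data.frame(examples, class_hits, lookbehind_hits)
```

In this sketch, the first string is missed by the character-class pattern because the "r" at the end of "or" is in the excluded set, while the lookbehind version catches it. Both patterns match the second string and skip the "Maurice R." and "R & D" strings.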
Good luck out there, job seekers! If anyone is looking for job search advice, feel free to reach out to see if I can help.
pander::pander(sessionInfo())
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: ggplot2(v.3.4.4), gt(v.0.10.0) and dplyr(v.1.1.2)
loaded via a namespace (and not attached): gtable(v.0.3.3), jsonlite(v.1.8.4), highr(v.0.10), compiler(v.4.3.0), Rcpp(v.1.0.10), tidyselect(v.1.2.0), xml2(v.1.3.4), jquerylib(v.0.1.4), scales(v.1.2.1), yaml(v.2.3.7), fastmap(v.1.1.1), R6(v.2.5.1), labeling(v.0.4.2), generics(v.0.1.3), knitr(v.1.42), tibble(v.3.2.1), pander(v.0.6.5), distill(v.1.6), munsell(v.0.5.0), bslib(v.0.4.2), pillar(v.1.9.0), rlang(v.1.1.1), utf8(v.1.2.3), cachem(v.1.0.7), xfun(v.0.39), sass(v.0.4.5), memoise(v.2.0.1), cli(v.3.6.1), withr(v.2.5.0), magrittr(v.2.0.3), digest(v.0.6.31), grid(v.4.3.0), rstudioapi(v.0.15.0), lifecycle(v.1.0.3), vctrs(v.0.6.4), downlit(v.0.4.2), evaluate(v.0.20), glue(v.1.6.2), farver(v.2.1.1), fansi(v.1.0.4), colorspace(v.2.1-0), purrr(v.1.0.2), rmarkdown(v.2.21), tools(v.4.3.0), pkgconfig(v.2.0.3) and htmltools(v.0.5.5)
For attribution, please cite this work as
Walsh (2023, Dec. 3). Alice Walsh: Data Science Jobs in Pharma in 2023. Retrieved from https://awalsh17.github.io/posts/2023-12-03-job-report-2023/
BibTeX citation
@misc{walsh2023data, author = {Walsh, Alice}, title = {Alice Walsh: Data Science Jobs in Pharma in 2023}, url = {https://awalsh17.github.io/posts/2023-12-03-job-report-2023/}, year = {2023} }