This method measures lifecycle open science on OSF by looking at open science practices linked to open science registrations.
Definitions
OSR: An OSF Registration that is jointly open, non-deprecated and authentic
Openness Criteria:
Public (open-1)
Not Embargoed (open-2)
Non-Deprecation Criteria:
Registered (nondeprecated-1)
Not Deleted (nondeprecated-2)
Not Retracted (nondeprecated-3)
Authenticity Criteria:
Not Spam (authentic-1)
LOS-Reg: An OSR that represents a lifecycle open science research project.
Lifecycle Openness Criteria:
Output - The OSR has at least 1 linked Open Practice Resource from among: “Data”, “Analytic Code”, “Materials” or “Supplements”.
Outcome - The OSR has at least 1 linked “Papers” Open Practice Resource.
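The criteria above can be read as a chain of logical conditions. The following is a minimal sketch of that reading, not the pipeline used for this report: the column names (`is_public`, `is_embargoed`, `n_outputs`, and so on) and the sample rows are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical registration records; every column name here is an assumption.
regs <- data.frame(
  node_id       = c("a1", "b2", "c3", "d4"),
  is_public     = c(TRUE, TRUE, TRUE, FALSE),
  is_embargoed  = c(FALSE, FALSE, FALSE, FALSE),
  is_registered = c(TRUE, TRUE, TRUE, TRUE),
  is_deleted    = c(FALSE, FALSE, FALSE, FALSE),
  is_retracted  = c(FALSE, FALSE, TRUE, FALSE),
  is_spam       = c(FALSE, FALSE, FALSE, FALSE),
  n_outputs     = c(2, 1, 1, 3), # linked Data, Analytic Code, Materials, Supplements
  n_papers      = c(1, 0, 1, 1)  # linked Papers
)

flags <- regs |>
  mutate(
    # OSR criteria
    open          = is_public & !is_embargoed,
    nondeprecated = is_registered & !is_deleted & !is_retracted,
    authentic     = !is_spam,
    is_osr        = open & nondeprecated & authentic,
    # Lifecycle openness criteria
    has_output    = n_outputs >= 1,
    has_outcome   = n_papers >= 1,
    is_los_reg    = is_osr & has_output & has_outcome
  )

flags |> select(node_id, is_osr, has_output, has_outcome, is_los_reg)
```

In this toy data, `c3` fails the non-deprecation criteria (retracted) and `d4` fails the openness criteria (not public), so neither counts as an OSR regardless of its linked resources.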
Results
See the code
# Packages
library(arrow)
library(ggplot2)
library(ggiraph)
library(glue)
library(gt)
library(dplyr)
library(lubridate)
library(plotly)
library(scales)
library(tidyr)
library(timetk)

# Modules
box::use(
  R / helpers[tidy_registry_names, tidy_template_names],
  R / plot[pivoter, factorizer, ts_prep],
  R / parameters[DATES, OUTPUTS, OUTCOMES]
)

OUTPUTS <- stringr::str_to_title(OUTPUTS)
OUTCOMES <- stringr::str_to_title(OUTCOMES)

# Local functions
table_helper <- function(tbl, title = "Lifecycle Open Science Registrations", interactive = TRUE, ...) {
  gtbl <- tbl |>
    gt() |>
    tab_header(
      title = title,
      subtitle = paste0("as of ", MOST_RECENT_CHR)
    ) |>
    fmt_number(columns = c(OSR:`LOS-Reg`), decimals = 0) |>
    fmt_percent(columns = `LOS-Reg / OSR`, decimals = 2) |>
    tab_footnote(
      footnote = md("*OSR: Open Science Registration*"),
      locations = cells_column_labels(columns = OSR)
    ) |>
    tab_footnote(
      footnote = md("*LOS-Reg: Lifecycle Open Science Registration*"),
      locations = cells_column_labels(columns = `LOS-Reg`)
    ) |>
    opt_footnote_marks("letters") |>
    opt_row_striping(row_striping = TRUE)

  if (interactive) {
    gtbl |>
      opt_interactive(...) |>
      opt_horizontal_padding(0)
  } else {
    gtbl
  }
}

tte_table <- function(tbl, title = "Time to Lifecycle Open Status (in days)", group_label = NULL, ...) {
  # Set title (note: the `title` argument is not used; the title is built here)
  TITLE <- "Time to Lifecycle Open Status (in days)"
  if (!is.null(group_label)) {
    TITLE <- paste0(TITLE, " by ", stringr::str_to_title(group_label))
  }

  # Subset data
  tbl <- tbl |>
    select(..., event, n, p, mean, p50, sd, min, max) |>
    filter(tolower(event) %in% c("output", "outcome", "lifecycle")) |>
    mutate(
      event = stringr::str_to_title(event),
      event = factor(event, levels = c("Output", "Outcome", "Lifecycle"))
    ) |>
    arrange(event) |>
    mutate(event = as.character(event))

  # Format table
  gtbl <- tbl |>
    gt(row_group_as_column = TRUE) |>
    tab_header(
      title = TITLE,
      subtitle = "Among all Open Science Registrations (OSR)"
    ) |>
    cols_label(
      event = "Event",
      n = "Count",
      p = "Percent",
      mean = "Mean",
      p50 = "Median",
      sd = "Std. Dev.",
      min = "Min",
      max = "Max"
    ) |>
    # Spanners
    tab_spanner(columns = c(n, p), label = "Prevalence") |>
    tab_spanner(columns = c(mean, p50, sd, min, max), label = "Distribution") |>
    # Value formatting
    fmt_number(columns = c(n, mean, p50, sd, min, max), decimals = 0) |>
    fmt_percent(columns = p, decimals = 1)

  # Return table
  gtbl
}

# Data sources
all_ts <- read_parquet(here::here("data/registration_tsmonthly.parquet")) |>
  filter(date >= "2022-08-01")

registry_ts <- read_parquet(here::here("data/registration_registries_tsmonthly.parquet")) |>
  filter(date >= "2022-08-01") |>
  mutate(registry = tidy_registry_names(registry))

template_ts <- read_parquet(here::here("data/registration_templates_tsmonthly.parquet")) |>
  filter(date >= "2022-08-01") |>
  mutate(template = tidy_template_names(template)) |>
  group_by(date, template) |>
  summarise_all(sum)

# Pivot for plots and assign labels
all_summary <- pivoter(all_ts) |> ts_prep()
registry_summary <- pivoter(registry_ts, registry) |> ts_prep()
template_summary <- pivoter(template_ts, template) |> ts_prep()

# Constants
MOST_RECENT_CHR <- max(all_ts$date)
MOST_RECENT <- ymd(MOST_RECENT_CHR)
TABLE_CRITERIA <- c("los_plan", "los_outputs", "los_outcomes", "los_complete")
TABLE_NAMES <- c("OSR", "OSR + Output(s)", "OSR + Outcome(s)", "LOS-Reg")
Today
Here is the current state of LOS research projects on the OSF as of 2025-10-01. For all of the summary tables below, the following fields are included:
<GROUP_NAME>: An optional grouping variable for seeing disaggregated results (e.g., by registry, registration template, etc.)
OSR: Number of Open Science Registrations (i.e., Open + Non-deprecated + Authentic)
OSR + Output(s): Number of Open Science Registrations with at least one linked output resource
OSR + Outcome(s): Number of Open Science Registrations with at least one linked outcome resource
LOS-Reg: Lifecycle Open Science Registrations - number of Open Science Registrations with at least one linked outcome and at least one linked output resource
LOS-Reg / OSR: Percentage of Open Science Registrations (OSR) that are Lifecycle Open Science Registrations (LOS-Reg)
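To make the field definitions concrete, here is a small sketch of how such a summary could be computed from per-registration flags. The input columns (`is_osr`, `has_output`, `has_outcome`), the registry labels, and the rows are assumptions for illustration; only the output column names follow the fields listed above.

```r
library(dplyr)

# Hypothetical per-registration flags (column names and values are assumptions).
flags <- data.frame(
  registry    = c("A", "A", "A", "B", "B"),
  is_osr      = c(TRUE, TRUE, TRUE, TRUE, FALSE),
  has_output  = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  has_outcome = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

summary_tbl <- flags |>
  filter(is_osr) |>                      # the starting sample is OSRs only
  summarise(
    .by = registry,                      # optional grouping variable
    OSR = n(),
    `OSR + Output(s)` = sum(has_output),
    `OSR + Outcome(s)` = sum(has_outcome),
    `LOS-Reg` = sum(has_output & has_outcome)
  ) |>
  mutate(`LOS-Reg / OSR` = `LOS-Reg` / OSR)

summary_tbl
```

Dropping the `.by` argument would give the ungrouped totals. The fifth row is excluded before counting because it is not an OSR, even though it has both an output and an outcome.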
# TODO: Compute date of OSR status from logged event tables, not just current status
reg_tbl <- read_parquet(here::here("data/registration_current.parquet")) |>
  filter(is_osr == 1) |>
  select(node_id, registry, registered_date)

reg_events <- read_parquet(here::here("data/registration_badges_time.parquet")) |>
  inner_join(reg_tbl, by = "node_id", relationship = "many-to-one")

#' Time to event summarizer
tte_summarizer <- function(tbl, t = time_to_event, x = event, ...) {
  # Totals
  df_n <- tbl |>
    summarize(
      .by = c(...),
      N = n_distinct(node_id)
    ) |>
    cross_join(distinct(tbl, {{ x }}))

  # Summary
  tbl |>
    summarise(
      .by = c({{ x }}, ...),
      n = sum(!is.na({{ t }})),
      mean = mean({{ t }}, na.rm = TRUE),
      p50 = median({{ t }}, na.rm = TRUE),
      sd = sd({{ t }}, na.rm = TRUE),
      min = min({{ t }}, na.rm = TRUE),
      max = max({{ t }}, na.rm = TRUE),
    ) |>
    left_join(df_n) |>
    mutate(p = n / N) |>
    select(..., {{ x }}, n, p, mean, p50, sd, min, max)
}
Unless otherwise stated, the starting sample for these analyses is all Open Science Registrations (OSR). As of 2025-10-01, there are 1,988 Open Science Registrations (OSR) in the OSF.
The following table summarizes how long it takes for an OSR to become fully lifecycle open. Each row corresponds to one of three milestones in the research project lifecycle:
The first linked output resource
The first linked outcome resource
The achievement of lifecycle open status
The “Prevalence” columns report the number and percentage of OSRs that have reached each milestone, while the “Distribution” columns provide summary statistics characterizing the center, spread, and range of time to each milestone (in days).
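As a rough illustration of how time to each milestone might be derived, the sketch below measures days from registration to the first linked output, the first linked outcome, and the later of the two. The report does not spell out the exact definition, so treating lifecycle status as the later of the two first-link dates is our assumption, as are the column names and the sample dates.

```r
library(dplyr)

# Hypothetical link events; column names and dates are illustrative assumptions.
links <- data.frame(
  node_id         = c("a1", "a1", "a1", "b2"),
  registered_date = as.Date(c("2023-01-01", "2023-01-01", "2023-01-01", "2023-03-01")),
  resource_type   = c("data", "papers", "materials", "papers"),
  linked_date     = as.Date(c("2023-02-10", "2023-06-01", "2023-01-20", "2023-02-15"))
)

# Days from registration to the first output and the first outcome
milestones <- links |>
  mutate(
    event = if_else(resource_type == "papers", "outcome", "output"),
    days  = as.numeric(linked_date - registered_date)
  ) |>
  summarise(.by = c(node_id, event), time_to_event = min(days))

# Assumed definition: lifecycle status is reached once both milestones exist,
# i.e., at the later of the two first-link times
lifecycle <- milestones |>
  summarise(
    .by = node_id,
    n_events = n_distinct(event),
    time_to_event = max(time_to_event)
  ) |>
  filter(n_events == 2) |>
  transmute(node_id, event = "lifecycle", time_to_event)

df_tte_sketch <- bind_rows(milestones, lifecycle)
df_tte_sketch
```

Under this definition a value is negative whenever a resource's link date precedes the registration date, as for `b2` here. That is one possible explanation for a negative minimum in a summary of this kind, though we have not verified it against the underlying data.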
See the code
df_summary <- tte_summarizer(df_tte, t = time_to_event, x = event)
tte_table(df_summary)
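Time to Lifecycle Open Status (in days)
Among all Open Science Registrations (OSR)
Column groups: Prevalence (Count, Percent); Distribution (Mean, Median, Std. Dev., Min, Max)

| Event     | Count | Percent | Mean | Median | Std. Dev. | Min  | Max   |
|-----------|-------|---------|------|--------|-----------|------|-------|
| Output    | 1,452 | 73.0%   | 449  | 355    | 442       | 0    | 3,270 |
| Outcome   | 892   | 44.9%   | 648  | 564    | 491       | −109 | 3,270 |
| Lifecycle | 356   | 17.9%   | 662  | 561    | 468       | 0    | 3,270 |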
The following interactive graphic provides the same information via a boxplot.
We can disaggregate our findings further by looking at the prevalence and distribution of time for individual resources (code, data, materials, supplements, and papers). As before, we present results via a table and an interactive boxplot.
The following table looks at differences in time to lifecycle open science milestones (first output, first outcome, and lifecycle open) by registry. To ease interpretation, rows with zero counts are omitted. Results can be sorted by any column, and the search box can be used to filter results.
See the code
df_summary_registry <- df_tte |>
  left_join(select(reg_tbl, node_id, registry), by = "node_id") |>
  tte_summarizer(t = time_to_event, x = event, registry) |>
  mutate(registry = tidy_registry_names(registry)) |>
  filter(n > 0) |>
  rename(Registry = registry)

tte_table(
  tbl = df_summary_registry,
  title = "Time to Lifecycle Open Status by Registry (in days)",
  group_label = "Registry",
  Registry
) |>
  opt_interactive(use_search = TRUE, use_pagination = FALSE) |>
  opt_horizontal_padding(0)
Time to Lifecycle Open Status (in days) by Registry
Among all Open Science Registrations (OSR)
Sequencing
There are several potential insights related to the sequencing of resource connections. The following table summarizes sequencing patterns for OSRs with at least two linked resources. The rows correspond to resource types (code, data, materials, supplements, and papers), while the columns correspond to sequencing order.
For instance, when looking within the “First” column, we see that Data is most frequently linked first (41%), followed by Papers (31%).
Among all Open Science Registrations (OSR) [n = 1,988]
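One way such a sequencing table could be built is sketched below: rank each registration's resource types by first link date, then compute the share of each type within each position. The column names, the tie-breaking rule, and the choice to take shares within each sequence position are assumptions on our part, not a description of the code behind the table.

```r
library(dplyr)

# Hypothetical link events; column names and dates are illustrative assumptions.
links <- data.frame(
  node_id       = c("a1", "a1", "a1", "b2", "b2", "c3"),
  resource_type = c("data", "papers", "code", "papers", "data", "data"),
  linked_date   = as.Date(c(
    "2023-02-01", "2023-05-01", "2023-03-01",
    "2023-01-10", "2023-04-01", "2023-02-02"
  ))
)

sequence_tbl <- links |>
  # First link date per registration and resource type
  summarise(.by = c(node_id, resource_type), first_linked = min(linked_date)) |>
  # Keep registrations with at least two linked resource types
  filter(n() >= 2, .by = node_id) |>
  # Position in the linking sequence (ties broken by row order)
  mutate(position = row_number(first_linked), .by = node_id) |>
  count(position, resource_type) |>
  # Share of each resource type within a sequence position
  mutate(share = n / sum(n), .by = position)

sequence_tbl
```

Here `c3` is dropped because it has only one linked resource type, and data and papers each account for half of the first-position links.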
Resource Sequencing and Latency
A third potential line of inquiry explores whether and how sequence order relates to overall latency times for resource connections of different types. Because few OSRs have four or five linked resources of any type, the following table summarizes latency times by resource type and sequence order.
We can also examine the efficiency of Lifecycle Open Science Registrations over time. By efficiency, we mean the ratio of Lifecycle Open Science Registrations to Open Science Registrations, that is, the percentage of Open Science Registrations that meet the criteria for Lifecycle Open Science.
We can also plot these percentages over time for each registry. Because of the disparity in the number of registrations between registries, we calculate the percentage of each registry's Open Science Registrations that meet the Lifecycle Open Science criteria rather than comparing raw counts. This is the LOS-Reg / OSR metric from the previous table.
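The efficiency metric reduces to a ratio of two counts per registry and month. The sketch below assumes a monthly table with one row per date and registry, and reads `los_plan` as the OSR count and `los_complete` as the LOS-Reg count, a mapping inferred from the parallel `TABLE_CRITERIA` and `TABLE_NAMES` vectors rather than confirmed. The registry labels and counts are made up.

```r
library(dplyr)

# Hypothetical monthly counts; values and the column-to-metric mapping are assumptions.
monthly <- data.frame(
  date         = as.Date(c("2025-09-01", "2025-09-01", "2025-10-01", "2025-10-01")),
  registry     = c("A", "B", "A", "B"),
  los_plan     = c(100, 10, 120, 12), # assumed: number of OSRs
  los_complete = c(15, 1, 24, 3)      # assumed: number of LOS-Regs
)

efficiency <- monthly |>
  mutate(los_reg_per_osr = los_complete / los_plan) |>
  arrange(registry, date)

efficiency
```

Expressed this way, a small registry (B, with 12 OSRs) can be compared directly with a large one (A, with 120), which raw counts would not allow.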