This vignette examines candidacy criteria for lifecycle open science among OSF Registrations.
Candidate registrations meet standards for openness, non-deprecation, and authenticity. Specifically, an open-science registration must be:
Open
Public (open-1)
Not embargoed (open-2)
Non-deprecated
Registered (nondeprecated-1)
Not deleted (nondeprecated-2)
Not retracted (nondeprecated-3)
Authentic
Not spam (authentic-1)
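As a minimal sketch of how the criteria above combine, assume each registration already carries one logical flag per sub-criterion. The flag names (`is_public`, `is_embargoed`, and so on) and the toy rows are hypothetical and are not taken from the OSF schema; base R is used only to keep the example dependency-free.

```r
# Toy registrations with one logical flag per sub-criterion.
# All column names and values here are hypothetical illustrations.
regs <- data.frame(
  id            = c("r1", "r2", "r3", "r4"),
  is_public     = c(TRUE, TRUE, FALSE, TRUE),     # open-1
  is_embargoed  = c(FALSE, TRUE, FALSE, FALSE),   # open-2 (negated below)
  is_registered = c(TRUE, TRUE, TRUE, TRUE),      # nondeprecated-1
  is_deleted    = c(FALSE, FALSE, FALSE, FALSE),  # nondeprecated-2 (negated below)
  is_retracted  = c(FALSE, FALSE, FALSE, TRUE),   # nondeprecated-3 (negated below)
  is_spam       = c(FALSE, FALSE, FALSE, FALSE)   # authentic-1 (negated below)
)

# Each main criterion is the conjunction of its sub-criteria.
regs$open           <- regs$is_public & !regs$is_embargoed
regs$not_deprecated <- regs$is_registered & !regs$is_deleted & !regs$is_retracted
regs$authentic      <- !regs$is_spam

# A candidate registration must satisfy all three main criteria.
regs$candidate <- regs$open & regs$not_deprecated & regs$authentic

regs[, c("id", "open", "not_deprecated", "authentic", "candidate")]
```

In this toy data only `r1` qualifies: `r2` is embargoed, `r3` is not public, and `r4` has been retracted.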
Operationalization
We can use the date variables from the registration dataset and relevant actions from the osf_nodelog table to generate a time series dataset of OSF Registrations.
From the osf_abstractnode table, we already know when the following actions occurred:
created
registered_date
deleted
To establish when a registration becomes:
public, we need to look for the made_public and made_private action types in the osf_nodelog table.
(un)embargoed, we need to look for corresponding actions with the embargo_ prefix in the osf_nodelog table.
retracted, we will look for corresponding actions with the retraction_ prefix in the osf_nodelog table.
spam, we look for the actions of type confirm_spam in the osf_nodelog table.
For every criterion of interest (i.e., public, non-embargoed, registered, non-deleted, non-retracted, non-spam), if we cannot find a corresponding action in the osf_nodelog table, we assume that the criterion is met at the time of creation (created).
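The fallback rule can be illustrated for the public criterion with a small sketch. It is not the implementation in registrations.r or registration-ts.r: the toy tables, their column names (`id`, `created`, `node_id`, `action`, `date`), and the helper `is_public_at()` are all hypothetical. The sketch also assumes that a registration with no visibility action on or before the date of interest counts as public since creation, and that otherwise the most recent action decides.

```r
# Toy stand-ins for the registration dataset and the osf_nodelog table.
# Table layouts and values are hypothetical illustrations.
registrations <- data.frame(
  id      = c("a", "b", "c"),
  created = as.Date(c("2024-01-05", "2024-02-10", "2024-03-01"))
)
nodelog <- data.frame(
  node_id = c("a", "a", "b"),
  action  = c("made_private", "made_public", "made_private"),
  date    = as.Date(c("2024-01-06", "2024-02-01", "2024-02-11"))
)

# Hypothetical helper: is registration `reg_id` public as of `as_of`?
is_public_at <- function(reg_id, as_of) {
  as_of   <- as.Date(as_of)
  created <- registrations$created[registrations$id == reg_id]
  if (as_of < created) return(FALSE)  # the registration did not exist yet

  log <- nodelog[
    nodelog$node_id == reg_id &
      nodelog$action %in% c("made_public", "made_private") &
      nodelog$date <= as_of,
  ]
  # No matching action: assume the criterion has held since creation.
  if (nrow(log) == 0) return(TRUE)

  # Otherwise the most recent visibility action decides the state.
  log <- log[order(log$date), ]
  tail(log$action, 1) == "made_public"
}

is_public_at("a", "2024-01-10")  # after made_private, before made_public
is_public_at("a", "2024-03-01")  # after made_public
is_public_at("c", "2024-03-15")  # no logged action at all
```

Evaluating such a helper once per month for each registration would yield one column of a monthly time series; the embargo, retraction, and spam criteria would follow the same pattern with their own action types.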
Implementation
See registrations.r and registration-ts.r for the bulk of the implementation details. For now, we just load the datasets we need.
# Packages
library(arrow)
library(dplyr)
library(lubridate)
library(tidyr)
library(purrr)
library(rlang)
library(timetk)
library(gt)

# Modules
box::use(
  R / connect[open_parquet],
  R / helpers[tidy_registry_names],
  R / plot[pivoter, factorizer, ts_prep]
)

# Data sources
all_ts <- read_parquet(here::here("data/registration_tsmonthly.parquet"))
registry_ts <- read_parquet(here::here("data/registration_registries_tsmonthly.parquet")) |>
  mutate(registry = tidy_registry_names(registry))

# Pivot for plots and assign labels
all_summary <- pivoter(all_ts) |>
  ts_prep()
registry_summary <- pivoter(registry_ts, registry) |>
  ts_prep()

# Constants
MOST_RECENT_CHR <- max(all_ts$date)
MOST_RECENT <- ymd(MOST_RECENT_CHR)
TABLE_CRITERIA <- c(
  "total", "open", "not_deprecated", "authentic",
  "open_notdep", "open_auth", "notdep_auth",
  "los_plan"
)
TABLE_NAMES <- c(
  "Total", "Open", "Non-deprecated", "Authentic",
  "Open + Non-deprecated", "Open + Authentic", "Non-deprecated + Authentic",
  "Open Science Registration"
)
Results
All Registrations
NOTE: All graphs are interactive. Click on the legend to toggle series on/off. You can also use the slider to scroll through the timerange. Zooming is also supported.
The following plots show the distribution of registrations along multiple (sub)criteria of interest. The total number of registrations (i.e., all registrations in the database without any filtering) is shown in black.
The first graph plots the distribution of registrations along the six sub-criteria listed at the beginning. A registration can count toward more than one criterion.
all_summary |>
  factorizer(criteria) |>
  plot_time_series(
    date, n,
    .color_var = criteria,
    .smooth = FALSE,
    .title = "OSF Registrations by Open Science Subcriteria"
  )
The next figure plots the distribution of registrations according to the three main criteria of interest (i.e., Open, Non-deprecated, Authentic). Plus signs (+) indicate registrations that jointly satisfy 2 or more criteria. The “Open Science Registration” line plots the number of registrations meeting all 3 criteria.
all_summary |>
  factorizer(criteria, 2) |>
  plot_time_series(
    date, n,
    .color_var = criteria,
    .smooth = FALSE,
    .title = "OSF Registrations by Open Science Criteria"
  )
The next graph is just a simplified version of the previous one, where we only show registrations that jointly satisfy 2 or more criteria. The “Total” line is included for reference.
all_summary |>
  factorizer(criteria, 3) |>
  plot_time_series(
    date, n,
    .color_var = criteria,
    .smooth = FALSE,
    .title = "OSF Registrations with 2+ Open Science Criteria"
  )
Here’s a summary table of OSF Registrations by Registry:
Total: Total number of OSF Registrations (no filtering at all)
Open: Number of registrations that are public and not embargoed
Open + Non-deprecated: Number of registrations that are Open and are registered, not deleted, and not retracted
Open + Authentic: Number of registrations that are Open and are not spam
Open Science Registration: Number of registrations meeting all three criteria (i.e., Open + Non-deprecated + Authentic)
OSR / Open: Percentage of Open (i.e., public and not embargoed) registrations that meet the Open Science Registration criteria
OSR / Total: Percentage of Total registrations that meet the Open Science Registration criteria
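The two percentage columns are plain ratios of the count columns. The sketch below recomputes them for the OSF and GFS rows, using the counts reported in the summary table; the data frame and its column names (`total`, `open`, `osr`, `osr_per_open`, `osr_per_total`) are illustrative and are not the names used to build the table.

```r
# Counts copied from the OSF and GFS rows of the summary table.
# Column names are illustrative, not those used in the vignette's code.
counts <- data.frame(
  registry = c("OSF", "GFS"),
  total    = c(297119, 569),  # Total
  open     = c(234543, 279),  # Open
  osr      = c(218624, 272)   # Open Science Registration
)

# OSR / Open: share of Open registrations that meet all three criteria.
counts$osr_per_open <- round(100 * counts$osr / counts$open, 1)

# OSR / Total: share of all registrations that meet all three criteria.
counts$osr_per_total <- round(100 * counts$osr / counts$total, 1)

counts
```

The results match the table: 93.2% and 73.6% for OSF, 97.5% and 47.8% for GFS. The gap between the two metrics for GFS reflects that fewer than half of its registrations are Open in the first place.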
registry_gtbl
Open Science Registrations by Registry
as of 2025-10-01

| Registry | Total | Open | Open + Non-deprecated | Open + Authentic | Open Science Registration | OSR / Open | OSR / Total |
|---|---|---|---|---|---|---|---|
| OSF | 297,119 | 234,543 | 222,326 | 230,291 | 218,624 | 93.2% | 73.6% |
| EGAP | 2,969 | 2,831 | 2,806 | 2,800 | 2,776 | 98.1% | 93.5% |
| Character Lab | 448 | 448 | 447 | 433 | 432 | 96.4% | 96.4% |
| GFS | 569 | 279 | 272 | 279 | 272 | 97.5% | 47.8% |
| DAM | 161 | 133 | 132 | 132 | 131 | 98.5% | 81.4% |
| Real World Evidence | 186 | 120 | 112 | 120 | 112 | 93.3% | 60.2% |
| DARPA ASIST | 112 | 78 | 77 | 74 | 73 | 93.6% | 65.2% |
| YOUth Study | 64 | 56 | 55 | 56 | 55 | 98.2% | 85.9% |
| OSF Data Archive | 64 | 39 | 36 | 38 | 35 | 89.7% | 54.7% |
| TWCF Consciousness Studies | 29 | 24 | 22 | 23 | 21 | 87.5% | 72.4% |
| Metascience | 21 | 17 | 15 | 17 | 15 | 88.2% | 71.4% |
| OSPD Workflow and Results Hub | 23 | 14 | 12 | 13 | 11 | 78.6% | 47.8% |
| Lifecycle Journal | 26 | 16 | 9 | 16 | 9 | 56.2% | 34.6% |
| State of North Carolina | 8 | 8 | 8 | 7 | 7 | 87.5% | 87.5% |
We can also plot these numbers over time for each registry. Because the number of registrations differs so much between registries, we calculate the percentage of registrations that meet the Open Science Registration criteria. These are the OSR / Open and OSR / Total metrics from the previous table.