An Empirical Validation of Open Source Repository Stability Metrics PDF Free Download

Name: An Empirical Validation of Open Source Repository Stability Metrics PDF
Author: Patrick R. Manning

1 / 11

0 views•11 pages

An Empirical Validation of Open Source Repository Stability Metrics PDF Free Download

An Empirical Validation of Open Source Repository Stability Metrics PDF free Download. Think more deeply and widely.

An Empirical Validation of Open Source Repository

Stability Metrics

1st Elijah Kayode Adejumo

Computer Science

George Mason University

Fairfax, USA

eadejumo@gmu.edu

2nd Brittany Johnson

Computer Science

George Mason University

Fairfax, USA

johnsonb@gmu.edu

Abstract—Over the past few decades, open source software has

been continuously integrated into software supply chains world-

wide, drastically increasing reliance and dependence. Because of

the role this software plays, it is important to understand ways to

measure and promote its stability and potential for sustainability.

Recent work proposed the use of control theory to understand

repository stability and evaluate repositories’ ability to return

to equilibrium after a disturbance such as the introduction of

a new feature request, a spike in bug reports, or even the

inﬂux or departure of contributors. This approach leverages

commit frequency patterns, issue resolution rate, pull request

merge rate, and community activity engagement to provide a

Composite Stability Index (CSI). While this framework has

theoretical foundations, there is no empirical validation of the CSI

in practice. In this paper, we present the ﬁrst empirical validation

of the proposed CSI by experimenting with 100 highly ranked

GitHub repositories. Our results suggest that (1) sampling weekly

commit frequency pattern instead of daily is a more feasible

measure of commit frequency stability across repositories and

(2) improved statistical inferences (swapping mean with median),

particularly with ascertaining resolution and review times in

issues and pull request, improves the overall issue and pull

request stability index. Drawing on our empirical dataset, we

also derive data-driven half-width parameters that better align

stability scores with real project behavior. These ﬁndings both

conﬁrm the viability of a control-theoretic lens on open-source

health and provide concrete, evidence-backed applications for

real-world project monitoring tools.

Index Terms—component, formatting, style, styling, insert

I. INTRODUCTION

Since the early introduction and broad acceptance of open

source software [1], concerns have been raised regarding the

long term sustainability [2]–[4] and stability [5], [6] of open

source innovation. Numerous techniques and suggestions have

been proposed to effectively maintain open source projects [7],

[8]. There have also been numerous efforts to understand

practical and actionable metrics [9], [10] that support improved

engagement to open source software. These metrics are espe-

cially beneﬁcial to community managers (maintainers). With

the rapid integration of open source software into diverse

workﬂows [11], it is imperative to evaluate all factors and

metrics that contribute to its sustainability and stability.

Most prior work aimed at supporting open source sustain-

ability has focused on facilitating continuous contributions.

Many research efforts with this goal have focused on investi-

gating and evaluating tool support for open source on-boarding

process [12], [13]. Tool support for on-boarding newcomers

ranges from various kinds of recommendation systems and

mentorship matching to gamifying contributions and interac-

tions [14]. A subset of the prior work on tool support has

focused on detecting and recommending good ﬁrst issues

for newcomers [15]–[17]. Recently, in light of the current

trends, large language models (LLMs) have been proposed as a

means to enhance the on-boarding experience through various

documentation transformation techniques [18], [19]. More

broadly, there have been numerous studies on understanding

trends in open source communities [20], including research

on learning culture, collective intelligence, motivation, inno-

vation, community structure, success, and virtual organization.

These studies have explored necessary support mechanisms

through knowledge sharing techniques and the utilization of

social Q&A websites [21], [22] for open source software

communities.

As suggested by prior work, livelihood and survival of the

open source ecosystem is a critical concern for all participants,

including software developers, end-users, and investors. Par-

ticipants require sufﬁcient knowledge about ecosystem health

and wellness, which in turn builds trust and attracts greater

recognition and investment [23]. Static repository health met-

rics provided by tools like CHAOSS [24] offer maintainers

access to various metrics such as the ratio of new and returning

contributors, new downloads, knowledge and artifact creation

patterns, mailing list responsiveness, bug ﬁx time, new patents,

usage patterns, and variations in contributor types. These

metrics provide valuable insights into repository health.

However, open source software systems are inherently dy-

namic [25], [26] and are often inﬂuenced by social behavior

[27], making it necessary to consider assessment or evaluation

from a dynamic perspective. Recently, Destefanis et al. [28]

introduced the Composite Stability Index (CSI) framework,

which utilizes control theory dynamics to ascertain repository

stability—that is, repositories’ ability to return to equilibrium

after disruptions. The framework considers key factors such

as commit frequency, issue resolution, pull request merging,

and community engagement patterns. However, its practicality

and feasibility have not yet been determined.

To this end, we carefully curated a dataset of 100 repos-

arXiv:2508.01358v1 [cs.SE] 2 Aug 2025

itories to investigate use of the CSI in practice. We found

that contrary to the daily contribution frequency pattern sug-

gested in the CSI framework, weekly aggregation proves

more practical among projects. We also found that the sta-

tistical inferences (mean) applied by the original CSI could

be improved for more practical application and feasibility,

particularly regarding issue resolution and pull request review

times. Finally, based on our empirical evaluations, we derived

and recommend a data-driven half-width parameter for the

triangular normalizer, which has a signiﬁcant inﬂuence on CSI

variance.

In the following sections II–VI, we present the related work,

methodology, results, and discussion.

II. BACKGROUND & RELATED WORK

To ground our empirical investigation, we ﬁrst survey

prior efforts on repository-level metrics, repository health,

and theoretical stability models. We then focus on recent

control-theoretic approaches to assessing repository stability,

particularly the Composite Stability Index (CSI), and address

the empirical validation on real-world projects.

A. Repository Level Metrics

Advancements in software engineering, both in industry and

research, have been driven by the ability to obtain and curate

data from software repositories [29]. GitHub’s REST API

enables researchers to extract meaningful data and uncover

hidden patterns in software development, leading to signiﬁcant

advances in software engineering [30]. Commit history, the

evolutionary record of changes to source code and other

artifacts in version control systems like GitHub [31] has been

instrumental in revealing various patterns. These patterns have

proven valuable for predicting potential fault-proneness of

classes in object-oriented software [32], analyzing developer

commit patterns as indicators of code quality and developer

expertize [33], and conducting commit impact analysis to

understand software quality evolution [34]. Numerous other

studies have demonstrated how important commit history met-

rics has been advancing software engineering research [35]–

[37].

GitHub utilizes pull requests to facilitate collaborative

development among developers working on or interested in

a project. This approach allows developers to work on project

features independently without committing every individual

change directly to the main codebase, thereby mitigating

frequent merge conﬂicts among team members. Once a sub-

stantial set of changes has been completed, the developer

creates a pull request, which is then reviewed by maintainers

or project owners/administrators who can approve and merge

the commits [38]. The signiﬁcance of pull-based software

development [39] has attracted considerable research attention,

particularly in developing efﬁcient methods for assigning ap-

propriate reviewers to review pull requests [40]. Additionally,

several studies have proposed strategies for how developers

and teams can maximize the beneﬁts of pull requests [41]–

[43].

Faults discovered by developers or users of a software

system are reported to maintainers or administrators through

issue tracking systems such as Jira and Bugzilla. Open

source software, particularly projects hosted on GitHub, utilize

built-in issue reporting and tracking systems. These systems

have signiﬁcantly improved the reporting, management, and

resolution of software issues [44]. Extensive research has

been conducted to enhance user experiences in issue tracking

systems and to develop tools and techniques that support better

reporting, management, and resolution of issue reports [45]–

[47].

Comments on commits, issues, and pull requests play a

vital role in revealing communication patterns and the overall

health of open source software projects. Prior studies have

proposed using comments to understand the emotions of users

and contributors in open source software development [48],

[49]. GitHub discussions have also proven effective in helping

project communities address errors and unexpected behaviors,

ultimately advancing project development [50]. The trends and

patterns derived from these repository metrics have proven

valuable in addressing problems and proposing diverse solu-

tions within open source software development and software

engineering research more broadly.

B. Repository Health and Sustainability

The health and sustainability of open source projects shows

provides insight to the reliability of the project which is

a critical factors that consumers and contributors consider

before adopting a project as a dependency or integrating into

workﬂow [51] The CHAOSS initiative [24] provides a curated,

visualizable dataset of health metrics such as conversion rate,

issue age, change request closure ratio and many other metrics

that are useful for determining project health. Crowston et

al. recommended that consumers also examine community

engagement patterns through project websites, mailing lists,

and list archives before choosing to adopt a particular open

source project [52]. A study by Schweik [53] found that

developers are motivated by multiple factors such as learning

opportunities, incentives, contributing to the public good, or

genuine passion for the project, all of which could be crucial

to project sustainability. Several other studies have proposed

techniques and revealed signiﬁcant ﬁndings regarding how

sustainability can be achieved in open source software through

efﬁcient code forking and improved patch contributions [4],

[54], [55].

C. Stability in Software Engineering

Software stability is deﬁned as a software’s resistance to

ripple effects caused by modiﬁcations, bugs, and other distur-

bances to the software [56], stability in software spread across

diverse areas in software engineering. From the perspective

of software maintenance processes [56], [57], stability is

used to understand the consequences of modiﬁcations, such

as maintenance costs and potential errors that could emerge

when developing maintenance plans. Stability has also been

considered in software evolution planning, where studies have

identiﬁed it as an important criterion for evaluating design

and making design decisions [58], [59]. There has been

considerable research into understanding how stability relates

to other areas, including software aging [60], software reuse

[61], incremental software development [62], and adaptation

[63], [64].

The concept of stability in control theory centers on how

systems evolve over time and react to disruptions. Stability

is achieved when a system, after being displaced from its

equilibrium position, demonstrates a natural tendency to re-

store itself to that equilibrium. Several studies have explored

the application of control theory to software engineering [65],

[66]. However, there was no uniﬁed framework for assessing

stability, especially considering the socio-technical [67] nature

of open source software repositories. Recently, Destefanis et

al. introduced a uniﬁed framework known as the Composite

Stability Index [28], which integrates key repository metrics

into a weighted framework. They proposed that the Composite

Stability Index consists of four major repository properties that

represent the state of a repository as:

R(t) = c(t), i(t), p(t), a(t)T,(1)

where the properties are,

•c(t): commit frequency function;

•i(t): issue resolution rate function;

•p(t): pull request merge rate function;

•a(t): activity engagement function.

Composite Stability Metric Deﬁnitions:The commit fre-

quency, issue resolution rate, pull request merge rate, and

activity engagement represent the fundamental dimensions of

repository activity and health [28].

The commit frequency function c(t)provides insights into

development patterns, showing the frequency of code changes

and ultimately revealing how active the project is through

commit frequencies and their timestamps. The issue resolution

function i(t)provides insights into how the project handles

and resolves problems or concerns raised. This is calculated

by analyzing issue creation and closure timestamps. The pull

request merge rate function p(t)provides insights into how

the project handles pull requests; it assesses code review and

integration processes by utilizing pull request creation and

entire lifecycle timestamps. The activity engagement function

a(t)provides overall insights into repository engagement

through comment activity and interactions.

The authors conceptualized the repository as a dynamical

system and provided detailed deﬁnitions for each component

as follows:

Commit Frequency Function

c(t) = NCt, t + ∆t

∆t,(2)

Where NCt, t + ∆trepresents the number of commit

within the interval t, t + ∆t, this data can be curated from

available commit history.

Issue Resolution Rate

i(t) = Nclosed

it, t + ∆t

Ntotal

i(t)·1

1 + Tresolution(t),(3)

Where, Nclosed

it, t + ∆trepresents the number of closed

issues within the interval t, t + ∆t,Ntotal

i(t)represents the

total issues and Tresolution(t)represents the average resolution

time for issues to be closed in the interval t, t + ∆t.

Pull Request Merge Rate

p(t) = Nmerged

pt, t + ∆t

Ntotal

p(t)·1

1 + Treview(t),(4)

Where, Nmerged

pt, t + ∆trepresents the number of pull

request merged within the interval t, t + ∆t,Ntotal

p(t)rep-

resents the total pull request, while Treview(t)represents the

average review time for pull request in the intervalt, t+∆t.

Activity Engagement Function

a(t) = Ncomments(t, t + ∆t)

Nissues(t) + Nprs(t)·Nactive users(t, t + ∆t)

Ntotal users(t),(5)

Where, Ncomments(t, t + ∆t)represents the number of com-

ments in the interval (t, t + ∆t),Nissues(t) = Ntotal

i(t)−

Nclosed

i(t), and Nprs(t) = Ntotal

p(t)−Nmerged

p(t)represent the

number of open issues and open pull requests at time t,

respectively. Nactive users(t, t + ∆t)represents users who have

interacted with the repository through comments, commits, or

pull requests within the interval (t, t + ∆t), and Ntotal users(t)

represents the total number of users who have interacted with

the repository.

These stability metrics capture the core facets of a reposi-

tory, but they require rigorous empirical evaluation to conﬁrm

their effectiveness. In this paper, we evaluate and discuss the

effectiveness of these metrics as stability indicators through

practical validation methods.

III. METHODOLOGY

Our study aims to empirically evaluate how well the

Composite Stability Index (CSI) framework proposed in

prior work [28] applies to real-world software repositories.

To assess the practical adequacy of the CSI equations

(Eqs. (2)–(5)), we pose the following research questions:

RQ1To what extent do widely adopted open-source repos-

itories satisfy the individual CSI criteria for commit

frequency, issue resolution, pull-request merging, and

activity engagement?

RQ2How does swapping outlier-sensitive measures and high-

frequency sampling for robust, window-bounded alterna-

tives affect stability thresholds?

RQ3How sensitive is the CSI classiﬁcation to variations in

the triangular-normaliser parameters (µk,σk) across real-

world repositories?

For each research question, we speciﬁed the analysis win-

dow parameter as 5 years. That is, in interval t, t + ∆t,

∆t= 5 years on each of the repositories we analyzed, which

we discuss next.

Repository Selection:To evaluate the practicality of the

CSI, we curated a sample of 100 GitHub repositories that

satisfy ﬁve objective characteristics:

i. Stars >10,000 : we believe a large watcher base signals

sustained interest [68] ensuring that activity metrics are

representative of projects people actually rely on.

ii. Forks >9,000 : Fork counts correlate with external

contribution and merge trafﬁc [69]; high values stress test

the pull-request component of CSI.

iii. Maturity ≥10 years old. We believe a decade of history

provides the longitudinal data needed to observe stability

trends and cushions against short-lived bursts of activity.

iv. Not for education. Projects such as books, programming

tutorials, and screencasts attract stars and discussion but

contain relatively little code evolution. Given our focus

on software development dynamics, we removed these

repositories from our CSI analysis.

v. Not archived. Any repository ﬂagged as archived was

removed during manual screening so that the ﬁnal dataset

contains only actively maintained projects.

We curated our dataset based on the inclusion criteria above

and now detail the data extraction pipeline and the analytical

procedures used to answer each research question.

A. Data Extraction Pipeline

Window deﬁnition (∆t= 5 years.

We utilized the following GitHub REST endpoints to cu-

rate relevant data: Commits: GET /repos/o/r/commits?since=

Issues & PRs: GET /search/issues Comments: GET /is-

sues/comments, GET/pulls/comments.

Caching & retries: we utilized 24 hour JSON cache per

endpoint call. We also implemented transient 4xx/5xx errors

trigger exponential back-off with up to ﬁve retries.

Reproducibility: All scripts, csv ﬁles and raw JSON

dumps are published: https://anonymous.4open.science/r/OSS-

stability-3C1F/

B. Operationalization of the Original CSI Baseline (RQ1)

To evaluate practically the original CSI framework, we

implemented the thresholds and metrics exactly as deﬁned in

[28].

Commit Stability: To determine commit stability based on

the commit frequency function in Equation (2), we evaluate

stability for each of the 100 repositories in our sample dataset

(Section III). A repository’s commit pattern is classiﬁed as

stable when:



dc(t)

dt 

≤αc,∀t∈[t0, t0+T],(6)

where αcsets an upper limit on the allowable rate of

change in commit frequency. If the rate of change exceeds

this threshold, the repository is no longer classiﬁed as stable.

The threshold αcis deﬁned as:

αc=σdaily commits

µdaily commits

≤0.5.(7)

Issue Management Stability: To determine stable issue

resolution patterns across our sample dataset, we ﬁrst apply

the issue resolution rate function (3), then utilized the issue

stability threshold:

i(t)≥βiand Tresolution(t)≤τi,∀t∈[t0, t0+T],(8)

where βiis the minimum acceptable issue resolution rate with

a threshold of 0.3, and Tresolution(t)is the average resolution

time with a maximum acceptable threshold of τi= 14 days.

Pull Request Processing Stability: To determine stable pull

request processing patterns across our sample dataset, we ﬁrst

apply the pull request merge rate function (4), then evaluate

the pull request processing threshold:

p(t)≥βpand Treview(t)≤τp,∀t∈[t0, t0+T],(9)

where βpis the minimum acceptable pull request merge rate

with a threshold of 0.4, and Treview(t)is the average review

time with a maximum acceptable threshold of τp= 5 days.

Community Engagement Stability: To determine stable

community activity engagement across our dataset, we ﬁrst

apply the activity engagement function (5), then evaluate the

community engagement stability threshold:

a(t)≥γaand Nactive users(t)

Ntotal users(t)≥δa,∀t∈[t0, t0+T],(10)

Where, γais the minimum acceptable activity ratio with a

threshold of 0.25, and δais the minimum acceptable active

user ratio with a threshold of 0.15.

C. Composite Stability Index (CSI)

To aggregate the four stability dimensions into a single

score, we utilize the Composite Stability Index introduced by

Destefanis et al. [28]:

CSI(t) = wcϕc(c(t))+wiϕi(i(t))+wpϕp(p(t))+waϕa(a(t)),

(11)

with the weight vector

W= [wc, wi, wp, wa] = [0.3,0.25,0.25,0.2].(12)

Triangular Normalizer: Each raw metric x∈ {c, i, p, a}

is mapped onto the unit interval by

ϕk(x) = 





1−|x−µk|

σk,if |x−µk| ≤ σk,

0,otherwise,

(13)

where µkis the target value and σkis the admissible deviation

for component k.

Target and tolerance values from the triangular normalizer

Table I deﬁned by Destefanis et al. [28] has implications as

following:

i. Commit pattern (ϕc). µc= 0.25 (i.e., we expect

roughly a coefﬁcient of variation of 0.25). A repository’s

measured cthat exactly matches 0.25 yields ϕc= 1. If c

lies anywhere between 0.00 and 0.50, then

ϕc(c) = 1 −|c−0.25 |

0.25 .

As soon as cexceeds 0.50 or falls below 0.00, ϕcbecomes

zero. In other words, if a project’s commit-frequency CV

strays beyond ±0.25 around 0.25, it no longer contributes

positively to the CSI.

ii. Issue management (ϕi). Here µi= 0.40 (target resolu-

tion rate) and σi= 0.10. Thus:

ϕi(i) = (1−|i−0.40 |

0.10 ,if 0.30 ≤i≤0.50,

0,otherwise.

If a repository closes issues at exactly 40% in the given

interval, ϕi= 1. If the rate dips to 30% or climbs to 50%,

ϕireaches 0, because it is at the edge of the admissible

band.

iii. Pull-request stability (ϕp). With µp= 0.50 and σp=

0.10, any merge-rate pin [0.40,0.60] yields a strictly

positive ϕp. For example, if p= 0.55, then

ϕp(0.55) = 1 −|0.55 −0.50 |

0.10 = 1 −0.5 = 0.5.

Once p < 0.40 or p > 0.60,ϕpcollapses to 0.

iv. Community engagement (ϕa). The required target activ-

ity ratio of µa= 0.35 and allow a deviation of σa= 0.10.

Hence, ϕa(a)is nonzero only when 0.25 ≤a≤0.45. At

a= 0.35,ϕa= 1. At the tolerance edges (a= 0.25 or

a= 0.45), ϕa= 0.

TABLE I

TARGET VALUES (µk)AND TOLERANCES (σk)USED BY THE CSI.

Component k µkσk

Commit pattern (ϕc)0.25 0.25

Issue management (ϕi)0.40 0.10

Pull-request stability (ϕp)0.50 0.10

Community engagement (ϕa)0.35 0.10

D. Estimator and Sampling Robustness (RQ2)

Statistical inferences play an important role in the CSI

framework, as demonstrated in Equations (3) and (4), where

average issue resolution rates and pull request review rates are

calculated. However, several studies have highlighted potential

robustness concerns with these approaches,such as skewness

in bug resolution [70] and commit intervals [71]. In these

cases, prior work recommends using medians as indicators

to mitigate outliers or measurement of atypical activity.

The commit history stability index speciﬁed in Equation (6)

utilizes daily commit patterns where the standard deviation

remains less than half of the mean. However, daily commit pat-

terns may be affected by timezone differences, as contributors

to open source software are often distributed across continents

[72] and may have full-time software development roles [73],

[74]. Weekly aggregation of commit history patterns offers a

more logical approach, as studies have shown that aggregating

high-frequency data reduces noise [75]. These considerations

motivate our investigation of weekly aggregation for commit

history stability thresholds and the adoption of robust statisti-

cal estimators throughout the CSI framework.

E. Sensitivity Analysis of CSI Classiﬁcation (RQ3)

In each of the CSI components- commit, issues, pull request,

and activity engagement-repositories must attain stability by

meeting the thresholds as deﬁned in equations (6)–(10) (Sec-

tion III-B) before their stability measures are evaluated by the

triangular-normalizer parameter. This ensures that the stability

measures are within a target µkand a tolerance measure σk

as shown in Table I before they contribute variance to the

CSI(t). However, these target values are proposed based on

experience of the authors [28] and have not been evaluated

to determine their feasibility. To answer RQ3, we performed

descriptive statistical analysis across repositories that meet the

stability threshold in their individual CSI components but do

not contribute variance to the CSI(t)due to their stability

measures being outside the target and tolerance measures.

This analysis shows the practicality of the target and tolerance

values proposed.

IV. RESULTS

In this section, we discuss the ﬁndings from our evaluation

of the original CSI thresholds [28] and the impact of threshold

variance.

A. Satisfaction of Original CSI Criteria (RQ1)

To answer RQ1, we applied the original CSI thresholds

to the our dataset of 100 repositories. Based on the commit

frequency function and daily commit stability threshold de-

ﬁned in equation (6) which emphasizes that the daily standard

deviation remains less than the mean, only 2% of our evaluated

repositories meet this criteria. The graph 1 shows details on

the daily commit frequency against the stability threshold.

0.433

0.274

3.112438776

0.5 0.5 0.5

0.5

1.5

2.5

3.5

NIXPKGS RUST AVERAGE (OTHERS)

Daily CV

DAILY CV THRESHOLD 0.5

Fig. 1. Daily CV: Stable Repos vs Aggregated Others

This ﬁnding shows that very few of the projects sustain

stable day-to-day commit rhythms without long idle spells or

sudden bursts of activity. The remaining 98% exhibit inher-

ently “bursty” or ad-hoc behavior. Since only 2 repositories

have ϕc>0, only these two repositories contribute variance

to the CSI calculation. This raises concern regarding the

suitability and practicality of a daily-level stability criterion

for most open-source repositories.

The issue resolution pattern was another important com-

ponent of the CSI framework. Our analysis revealed that 10

repositories from our dataset do not have the issue feature

enabled (Django, FFmpeg, Flink, Gitignore, Jenkins, Kafka,

Lantern, Laravel, Spark, and WordPress). We used the remain-

ing 90 repositories for issue resolution analysis. We found

that none of the projects in our dataset achieved the target

threshold of i(t)≥0.30, with the 95th percentile still an order

of magnitude lower at 0.018. Only three repositories (“lar-

avel/framework”, “home-assistant/core”, and “ﬂutter/ﬂutter”)

achieved the mean resolution times below the 14-day target.

However, they still failed to meet the i(t)threshold, demon-

strating that quick turnaround alone may be insufﬁcient when

the overall closure rate remains low.

Speciﬁcally, because the denominator Ntotal

iin Eq. (3) is the

cumulative number of issues ever reported, its large magnitude

drags the closure ratio toward zero, so most projects do not

meet the i(t)≥0.30 requirement. This shortcoming stems from

denominator drag: large, long-lived repositories accumulate

tens of thousands of issues, making the closed issues within

the ﬁve-year window comparatively negligible. Furthermore,

the median resolution time Tresolution(t)is 128 days, but

the distribution exhibits a heavy right tail extending beyond

four years, inﬂating the mean to 236 days. Table II shows

details of the descriptive statistics across 90 repositories. Since

ϕi= 0 for all 90 analyzed repositories, the issue component

contributes no variance to the CSI calculation. This suggests

that the current thresholds may require adjustment or that

alternative normalization approaches should be considered for

meaningful statistical inference.

TABLE II

DESCRIPTIVE STATISTICS FOR 90 PROJECTS WITH ISSUES ENABLED.

Statistic Closure-rate i(t)Mean resolution age (days)

Mean 0.004 236

Median 0.001 128

Std. dev. 0.008 238

95th perc. 0.018 670

Min / Max 0.0 / 0.066 3.5 / 1430

Applying the formulation of pull request merge rates in

Equation (4) across our 100-repository dataset, we observed

that 72% of the repositories had an average review time

of Treview(t)>5days. None of the repositories achieved

a merge rate p(t)≥0.4as deﬁned in Equation (9). This

pattern may also be attributed to the denominator drag effect,

where long-lived projects accumulate large numbers of pull

requests over their lifetime, making the number of merged pull

requests within the ﬁve-year evaluation window comparatively

negligible relative to the historical total. Since ϕp= 0 for all

analyzed repositories, the pull request merge rate component

contributes no variance to the CSI calculation. Table VI shows

the detailed repository distribution.

TABLE III

PULL REQUEST STABILITY ACROSS REPOSITORIES

Criterion Pass Fail

Merge-rate p(t)≥0.40 0 100

Mean review time Treview ≤5days 28 72

Both criteria simultaneously 0 100

From the activity engagement stability formulation, the

following conditions must be met:

a(t)≥0.25 and Nactive users(t)

Ntotal users(t)≥0.15 (14)

In our analysis, we found that 86% of our 100-repository

dataset met the activity ratio criteria and 95% met the active

user ratio. However, only three (3) repositories had ϕa>0.

only these three repositories were within the target (µk) and

tolerance (σk) values of 0.35 and 0.10, as shown in Table I.

These ﬁndings suggest that the activity engagement stability

index is highly applicable and practicable. However, given we

only observed variance to the CSI with a few repositories,

this suggests that the activity engagement target and tolerance

values could beneﬁt from statistical inferences and calibration.

B. Impact of Alternatives on Stability Thresholds (RQ2)

Findings from RQ1reveal the practical application of CSI

framework. However, as detailed in Section III-D, the CSI

framework could beneﬁt from an improved statistical inference

applicable to open source software repositories. To investigate

the impact of variance in commit stability patterns, we

utilized a weekly contribution pattern where we deﬁned the

threshold αcas:

αc=σweekly commits

µweekly commits

≤0.5.(15)

In our dataset of 100 repositories, we found 29 repositories

that had a weekly coefﬁcient of variation (CV) within the pro-

posed range ≤0.5. This suggests that the weekly aggregation

commit frequency pattern may or be more appropriate, as more

repositories are able to sustain a genuinely stable week-to-

week commit rhythm without long idle weeks or sudden burst

of activity in some weeks (over the last 5 years). Figure 2

shows the repositories with stable weekly commit patterns.

From the Table IV, the daily commit counts exhibit more

bursts than their weekly-aggregated counterpart, smoothing to

a seven-day window reduces the median CV by ≈47 % and

the mean by ≈52 % (Table IV). Although the median weekly

CV (0.621) still exceeds the CSI stability cut-off of 0.5, the

shift brings a substantial fraction of repositories below the

threshold.

Our ﬁndings regarding issue stability from RQ1suggests

that issue resolution times exhibit signiﬁcant right-skewed

0.1

0.2

0.3

0.4

0.5

0.6

ASPNETCORE

PANDAS

SYMFONY

ANT

-DESIGN

ELECTRON

KUBERNETES

FFMPEG

ETHEREUM

ASPNETCORE.DOCS

PDF.JS

RAILS

GRAFANA

NUMPY

AIRFLOW

DJANGO

ANGULAR

CORE

NIXPKGS

ELASTICSEARCH

GODOT

METASPLOIT-FRAMEWORK

RUST

BITCOIN

ARDUPILOT

MATERIAL

FLUTTER

SPARK

NODE

WEEKLY CV THRESHOLD

Fig. 2. Weekly CV: Stable Repositories

TABLE IV

CENTRAL-TENDENCY COMPARISON OF COMMIT-FREQUENCY

COEFFICIENTS OF VARIATION (CV) AT DAILY VERSUS WEEKLY

GRANULARITY.

Statistic Daily CV Weekly CV

Median 1.182 0.621

Mean 3.057 1.453

distributions across repositories, with outlier issues requiring

substantially longer resolution periods that distort mean-based

metrics. Furthermore, we observe that the total issue count

Ntotal

isigniﬁcantly inﬂuences the i(t)ratio calculation. To

mitigate the impact of these outliers, we employed the me-

dian resolution time medianTresolution(t)as a more robust

measure. This analysis revealed that 64 out of 90 evaluated

repositories (excluding 10 repositories with disabled issue

tracking) maintain a median issue resolution time of ≤14

days, indicating relatively efﬁcient issue management practices

across the majority of projects.

To address the dominance of the total issue count denomi-

nator, we reﬁned our approach by utilizing Ntotal

it, , t + ∆t

as the denominator instead. This modiﬁcation constrains the

issue closure ratio to the speciﬁed time window, yielding a

more realistic and temporally-bounded assessment. Following

these methodological calibrations, we observed that 22 out

of 90 repositories were within the threshold of i(t)≥0.30.

However, only 6 repositories had a ϕiscore >0, and only

for these repositories did we see contribution to variance in

the CSI calculation, as they fell within the target performance

(µk= 0.40) and tolerance bounds (σk= 0.10) respectively.

Details are shown in Table V

Our ﬁndings regarding pull request merge rates from

RQ1indicated that the pull request review time exhibit right-

skewed distributions across repositories, with some outlier pull

request requiring more review times that distort the mean-

based metrics. We also observed that the total pull request

count Ntotal

pinﬂuenced the p(t)ratio calculation. To mitigate

the impact of these outliers we employed the median review

time medianTreview(t)for a more robust measure. This

analysis revealed that 90% of the repositories had the median

review time within the 5 days threshold, indicating median

TABLE V

DESCRIPTIVE STATISTICS FOR ISSUE RESOLUTION ACROSS REPOSITORIES

AFTER CALIBRATIONS

Statistic Closure-rate i(t)Median resolution age (days)

Mean 0.233 66.800

Median 0.160 5.250

Std. dev. 0.259 199.899

95th perc. 0.808 381.275

Min / Max 0.001 / 0.967 0 / 1331.400

review measure is a relatively efﬁcient measure. To investigate

the dominance of the total pull request count denominator,

we reﬁned our approach by utilizing Ntotal

pt, , t + ∆tas

the denominator, i.e total number of pull request within the

window being evaluated (5 years). Following this modiﬁcation,

we observed that 34% of the repositories were within the

threshold of p(t)≥0.40. However, only 18% of the reposito-

ries had a ϕpscore >0and fell within the target performance

(µk= 0.50) and tolerance bounds (σk= 0.10). Therefore,

they were the only repositories that contributed variance to

the CSI calculation.

TABLE VI

PULL REQUEST STABILITY ACROSS REPOSITORIES AFTER CALIBRATION

Criterion Pass Fail

Merge-rate p(t)≥0.40 34 66

Median review time ≤5days 90 10

Both criteria simultaneously 34 66

As our investigation of activity engagement for RQ1

revealed that most repositories met the activity engagement

stability threshold hence, no activity engagement stability

metric was re-calibrated.

C. Variations in Triangular-Normalizer Parameters (RQ3)

Findings from RQ2provided insights on how statistical

inferences can improve the robustness of each component of

the CSI. This new improvement shifted repositories within

the stability thresholds particularly the issue resolution,pull

request, and activity engagement measures. However, for

many repositories these still do not contribute to the variance

of the CSI(t)because of the thresholds of the proposed

variances (I). Our ﬁndings suggest feasible target and tolerance

measures that reward more stable repositories and contribute

variance to CSI(t).

For issue resolution, ﬁndings from RQ2, revealed that 22

repositories were within the issue stability threshold. However,

only 6 repositories had direct impact on the CSI due to

the fact that they were not within the target and tolerance

measures proposed. Building on the empirical evidence from

the 22 repositories classiﬁed as stable, we determined a

more evidence-based target and tolerance µk&σkfor issue

management (ϕi). By taking the median and Median Absolute

Deviation (MAD) of i(t)across the stable repositories, we

found that target µk= 0.620 and tolerance σk= 0.221 are

feasible measures rewarding more 10 stable repositories that

now contribute variance to the CSI. Detailed analysis is shown

in Table VII.

TABLE VII

DESCRIPTIVE STATISTICS ISSUE RESOLUTION TARGET AND TOLERANCE

Statistic Value

min i(t)0.3190

max i(t)0.9667

median i(t)0.6204

MAD 0.1489

1.4826×MAD 0.2208

In our analysis of pull request merges for RQ2, we found

that 34 repositories were within the pull request merge rate

stability threshold. However, only 18 of these repositories

contributed variance to the CSI(t), again because of the

proposed target performance and tolerance bounds of 0.50 and

0.10, respectively. Building on the empirical evidence from the

34 stable repositories, we determined a more evidence-based

target and tolerance µkand σkfor pull-request stability (ϕp).

By evaluating the median and Median Absolute Deviation

(MAD) of p(t)across the stable repositories, we found target

µk= 0.562 and tolerance σk= 0.153 to be feasible measures,

resulting in 7 additional repositories being rewarded. These

ﬁndings match closely with the proposed target and tolerance

thresholds for pull-request stability I. Details are shown in

Table VIII.

TABLE VIII

DESCRIPTIVE STATISTICS PULL REQUEST MERGE RATE TARGET AND

TOLERANCE

Statistic Value

min p(t)0.4167

max p(t)0.9648

median p(t)0.5616

MAD 0.1035

1.4826×MAD 0.1534

Community Engagement: Our ﬁndings from RQ1revealed

that 86% of repositories meet the community engagement

stability threshold of having a minimum activity ratio threshold

of 0.25 and minimum acceptable active user ratio of 0.15.

However, only three repositories were within the target (µk)

and tolerance (σk) values of 0.35 and 0.10, respectively.

Building on the empirical evidence from the 86 repositories

that met the stability threshold, we evaluated the median

and Median Absolute Deviation (MAD) of a(t)across the

stable repositories to determine a more evidence-based target

and tolerance. We found target µk= 3.7056 and tolerance

σk= 3.2644 to be feasible measures, which resulted in 64

additional repositories being considered stable contributing

variance to CSI(t). Details are shown in the Table IX.

TABLE IX

DESCRIPTIVE STATISTICS ACTIVITY ENGAGEMENT TARGET AND

TOLERANCE

Statistic Value

min a(t)0.3313

max a(t)18.6530

median a(t)3.7056

MAD 2.2018

1.4826×MAD 3.2644

V. DISCUSSION

Our ﬁndings provide novel insights into stability in open

source software development, suggesting practical implica-

tions for measuring and improving open source project sta-

bility.

A. Control Theory & Stability

Our empirical analysis across 100 diverse open source

repositories provides the ﬁrst comprehensive validation of the

control-theory metaphor for software repositories, examining

four major components: commits, issues, pull requests, and

community engagement. While previous work has explored

the application of control theory to self-adaptive and self-

controlling software systems [65], [76], our ﬁndings offer

novel practical insights into computing control-theoretic sta-

bility within temporal windows in software repositories.

Prior research has extensively studied commit frequency

distributions [71], [77], establishing them as a vital component

of open source software dynamics. However, our ﬁndings

reveal new insights into the practical derivation of commit

frequency stability and demonstrate the applicability of consis-

tent daily and weekly commit rhythms. These patterns provide

actionable insights for maintainers and administrators of open

source projects to better understand repository activity cycles.

Research on issues in open source software has examined

various aspects, including user reporting behaviors [78] and

tools for improved issue resolution [79]. However, to our

knowledge, no study has evaluated a practical approach for

determining stability in issue resolution within temporal win-

dows. Our ﬁndings address this gap and provide valuable

insights for maintainers and administrators seeking to optimize

issue management processes such as automatically ﬂagging

when a project’s closing-rate drops below a healthy threshold

allowing early interventions. Additionally, our analysis reveals

a limitation in the overall CSI(t) calculation for repositories

with disabled issue features, as these repositories contribute

no variance to the overall CSI(t) score.

Previous studies have identiﬁed various factors inﬂuencing

pull request acceptance and merging, including program-

ming language, commit count, and ﬁle modiﬁcations [80].

Research has also shown that contributor awareness of inte-

gration times motivates continued project participation [81],

making pull request timing prediction an area of signiﬁcant

interest [82]. Our ﬁndings contribute practical insights into

stable pull request merge patterns across repositories, with im-

plications for both maintainers seeking to optimize workﬂows

and contributors evaluating potential project engagement based

on historical merge patterns.

Community engagement represents a critical factor in

open source software success. A comprehensive literature

review [83] revealed that most studies employ surveys and

questionnaires as primary research methodologies, identifying

key engagement factors including joining processes, contri-

bution barriers, motivation, retention, and abandonment. Our

work provides the ﬁrst practical, data-driven evaluation of

community engagement based on user activity ratios and inter-

action patterns in issue and pull request comments, offering

empirical evidence complementing existing survey-based re-

search approaches. For example, projects identiﬁed via surveys

as facing contribution barriers often show sharp drops in a(t)

around the time they introduced new contribution guidelines,

providing empirical validation of perceived hurdles. Similarly,

repositories with high survey reported retention scores consis-

tently demonstrate stable or increasing a(t)patterns over time,

while projects reporting abandonment issues exhibit declin-

ing engagement metrics that precede developer departure by

several months. This temporal analysis provide early warning

signals for community health issues that traditional survey

methods can only detect retrospectively.

B. Supporting Stability Monitoring in Practice

Existing tools such as CHAOSS [24] provide visualiza-

tion of project health metrics including closed issues, issue

age, pull request responsiveness, and new versus repeating

contributor ratios, offering valuable insights to maintainers

and administrators. Our validated ﬁndings on stable commit

frequency patterns, issue resolution times, pull request merge

rates, and community engagement can be integrated into the

CSI(t) metric and incorporated into continuous integration

architectures, browser extensions, or project health dashboards.

This integration enables teams to access stability metrics

that are not only theoretically grounded but also calibrated

to real-world repository behavior, which provides evidence

that research suggests can enhance developer conﬁdence and

trust [84]. Also, newcomer onboarding can be enhanced by

leveraging commit frequency patterns to recommend appro-

priate tasks. Projects can direct new contributors to actively

maintained code areas (indicated by high c(t)values) where

their contributions are more likely to be reviewed and merged

successfully.

VI. THREAT TO VALIDITY

We took a number of precautions to reduce the threats

to validity of our ﬁndings and insights. However, there still

remain aspects of our experimental design that could impact

the generalizability and validity of our insights.

A. Repository Size and Selection Constraints

As deﬁned in Section III, repositories in our sample dataset

were highly ranked and inﬂuential. This selection criterion

could limit the generalizability of our ﬁndings to younger or

less popular repositories. However, we deliberately focused on

repositories with broader societal impact to ensure our anal-

ysis captured stable, well-established projects that represent

meaningful patterns in the open source ecosystem.

B. Repository Types

Prior research [23] has acknowledged that different repos-

itory types (e.g., web applications, libraries, operating sys-

tems) may exhibit distinct stability measures due to varying

contribution patterns and development practices. We did not

categorize repositories by type in this study, which represents

a limitation that could be addressed in future work to provide

more nuanced stability assessments across different project

categories.

CONCLUSION

In this paper, we present ﬁndings from an experimental vali-

dation of the Composite Stability Index (CSI), a novel control-

theoretic framework for measuring software repository stabil-

ity. Through analysis of 100 highly ranked GitHub projects,

we demonstrated that real repositories exhibit equilibrium-

seeking behavior across commits, issues, pull requests, and

community engagement, mirroring Lyapunov-style stability

found in engineered systems.

Our ﬁndings reveal three key insights: First, repositories

demonstrate measurable stability patterns that align with con-

trol theory principles. Second, weekly commit rhythms provide

a more reliable stability measure compared to the daily commit

patterns originally proposed in the CSI framework. Third, we

derived and recommend a data-driven half-width parameter for

the triangular normalizer, which signiﬁcantly inﬂuences CSI

variance and improves the framework’s practical applicability.

These results not only afﬁrm the practicality of applying

a control-theoretic lens to repository stability assessment but

also provide the empirical foundation necessary for robust,

real-world adoption and improvement of the CSI framework.

REFERENCES

[1] A. Fuggetta, “Open source software—-an evaluation,” Journal of Sys-

tems and software, vol. 66, no. 1, pp. 77–90, 2003.

[2] V. Chang, H. Mills, and S. Newhouse, “From open source to long-

term sustainability: Review of business models and case studies,” in

Proceedings of the UK e-Science All Hands Meeting 2007. University

of Edinburgh/University of Glasgow (acting through the NeSC), 2007.

[3] I. Chengalur-Smith, A. Sidorova, and S. L. Daniel, “Sustainability of

free/libre open source projects: A longitudinal study,” Journal of the

Association for Information Systems, vol. 11, no. 11, p. 5, 2010.

[4] L. Nyman and J. Lindman, “Code forking, governance, and sustainability

in open source software,” Technology Innovation Management Review,

vol. 3, no. 1, 2013.

[5] S. Bouktif, H. Sahraoui, and F. Ahmed, “Predicting stability of open-

source software systems using combination of bayesian classiﬁers,” ACM

Transactions on Management Information Systems (TMIS), vol. 5, no. 1,

pp. 1–26, 2014.

[6] S. Raemaekers, A. Van Deursen, and J. Visser, “Measuring software

library stability through historical version analysis,” in 2012 28th IEEE

international conference on software maintenance (ICSM). IEEE, 2012,

pp. 378–387.

[7] M. Aberdour, “Achieving quality in open-source software,” IEEE soft-

ware, vol. 24, no. 1, pp. 58–64, 2007.

[8] K. Fogel, Producing open source software: How to run a successful free

software project. ” O’Reilly Media, Inc.”, 2005.

[9] M. Germonprez, J. E. Kendall, K. E. Kendall, L. Mathiassen, B. Young,

and B. Warner, “A theory of responsive design: A ﬁeld study of corpo-

rate engagement with open source communities,” Information Systems

Research, vol. 28, no. 1, pp. 64–83, 2017.

[10] M. Germonprez, G. J. Link, K. Lumbard, and S. Goggins, “Eight

observations and 24 research questions about open source projects: Illu-

minating new realities,” Proceedings of the ACM on Human-Computer

Interaction, vol. 2, no. CSCW, pp. 1–22, 2018.

[11] Ø. Hauge, C. Ayala, and R. Conradi, “Adoption of open source soft-

ware in software-intensive organizations–a systematic literature review,”

Information and Software Technology, vol. 52, no. 11, pp. 1133–1154,

2010.

[12] I. F. Steinmacher, “Supporting newcomers to overcome the barriers

to contribute to open source software projects,” Ph.D. dissertation,

Universidade de S˜

ao Paulo, 2015.

[13] S. Balali, U. Annamalai, H. S. Padala, B. Trinkenreich, M. A. Gerosa,

I. Steinmacher, and A. Sarma, “Recommending tasks to newcomers in

oss projects: How do mentors handle it?” in Proceedings of the 16th

International Symposium on Open Collaboration, 2020, pp. 1–14.

[14] I. Santos, K. R. Felizardo, I. Steinmacher, and M. A. Gerosa, “Software

solutions for newcomers’ onboarding in software projects: A systematic

literature review,” Information and Software Technology, p. 107568,

2024.

[15] C. Stanik, L. Montgomery, D. Martens, D. Fucci, and W. Maalej, “A

simple nlp-based approach to support onboarding and retention in open

source communities,” in 2018 IEEE international conference on software

maintenance and evolution (ICSME). IEEE, 2018, pp. 172–182.

[16] H. Horiguchi, I. Omori, and M. Ohira, “Onboarding to open source

projects with good ﬁrst issues: A preliminary analysis,” in 2021 IEEE

International Conference on Software Analysis, Evolution and Reengi-

neering (SANER). IEEE, 2021, pp. 501–505.

[17] I. Steinmacher, T. U. Conte, C. Treude, and M. A. Gerosa, “Overcoming

open source project entry barriers with a portal for newcomers,” in 2016

IEEE/ACM 38th International Conference on Software Engineering

(ICSE), 2016, pp. 273–284.

[18] E. K. Adejumo and B. Johnson, “Towards leveraging llms for reduc-

ing open source onboarding information overload,” in Proceedings of

the 39th IEEE/ACM International Conference on Automated Software

Engineering, 2024, pp. 2210–2214.

[19] R. Tufano, A. Mastropaolo, F. Pepe, O. Dabi´

c, M. Di Penta, and

G. Bavota, “Unveiling chatgpt’s usage in open source projects: A

mining-based study,” in 2024 IEEE/ACM 21st International Conference

on Mining Software Repositories (MSR). IEEE, 2024, pp. 571–583.

[20] M. R. Mart´

ınez-Torres and M. C. D´

ıaz-Fern´

andez, “Current issues

and research trends on open-source software communities,” Technology

Analysis & Strategic Management, vol. 26, no. 1, pp. 55–68, 2014.

[21] S. Sharma, V. Sugumaran, and B. Rajagopalan, “A framework for cre-

ating hybrid-open source software communities,” Information Systems

Journal, vol. 12, no. 1, pp. 7–25, 2002.

[22] B. Vasilescu, A. Serebrenik, P. Devanbu, and V. Filkov, “How social

q&a sites are changing knowledge sharing in open source software

communities,” in Proceedings of the 17th ACM conference on Computer

supported cooperative work & social computing, 2014, pp. 342–354.

[23] S. Jansen, “Measuring the health of open source software ecosystems:

Beyond the scope of project health,” Information and Software Technol-

ogy, vol. 56, no. 11, pp. 1508–1519, 2014.

[24] S. Goggins, K. Lumbard, and M. Germonprez, “Open source community

health: Analytical metrics and their corresponding narratives,” in 2021

IEEE/ACM 4th International Workshop on Software Health in Projects,

Ecosystems and Communities (SoHeal). IEEE, 2021, pp. 25–33.

[25] J. Wu, “Open source software evolution and its dynamics,” 2006.

[26] R. Vasa, “Growth and change dynamics in open source software sys-

tems,” Ph.D. dissertation, Ph. D. dissertation, Swinburne University of

Technology, Faculty of . . . , 2010.

[27] L. Yin, M. Chakraborti, Y. Yan, C. Schweik, S. Frey, and V. Filkov,

“Open source software sustainability: Combining institutional analysis

and socio-technical networks,” Proceedings of the ACM on Human-

Computer Interaction, vol. 6, no. CSCW2, pp. 1–23, 2022.

[28] G. Destefanis, S. Bartolucci, D. Graziotin, R. Neykova, and M. Ortu, “In-

troducing repository stability,” arXiv preprint arXiv:2504.00542, 2025.

[29] G. Gousios and D. Spinellis, “Ghtorrent: Github’s data from a ﬁrehose,”

in 2012 9th IEEE Working Conference on Mining Software Repositories

(MSR), 2012, pp. 12–21.

[30] E. Kalliamvakou, G. Gousios, K. Blincoe, L. Singer, D. M. German, and

D. Damian, “The promises and perils of mining github,” in Proceedings

of the 11th working conference on mining software repositories, 2014,

pp. 92–101.

[31] J. D. Blischak, E. R. Davenport, and G. Wilson, “A quick introduction

to version control with git and github,” PLoS computational biology,

vol. 12, no. 1, p. e1004668, 2016.

[32] C. Y. Chong and S. P. Lee, “Can commit change history reveal potential

fault prone classes? a study on github repositories,” in International

Conference on Software Technologies. Springer, 2018, pp. 266–281.

[33] Y. Wu, Y. Yang, Y. Zhao, H. Lu, Y. Zhou, and B. Xu, “The inﬂuence

of developer quality on software fault-proneness prediction,” in 2014

Eighth International Conference on Software Security and Reliability

(SERE), 2014, pp. 11–19.

[34] P. Behnamghader, R. Alfayez, K. Srisopha, and B. Boehm, “Towards

better understanding of software quality evolution through commit-

impact analysis,” in 2017 IEEE International Conference on Software

Quality, Reliability and Security (QRS), 2017, pp. 251–262.

[35] Y. Weicheng, S. Beijun, and X. Ben, “Mining github: Why commit

stops–exploring the relationship between developer’s commit pattern and

ﬁle version evolution,” in 2013 20th Asia-Paciﬁc Software Engineering

Conference (APSEC), vol. 2. IEEE, 2013, pp. 165–169.

[36] N. Tsantalis, M. Mansouri, L. M. Eshkevari, D. Mazinanian, and D. Dig,

“Accurate and efﬁcient refactoring detection in commit history,” in Pro-

ceedings of the 40th international conference on software engineering,

2018, pp. 483–494.

[37] A. Eliseeva, Y. Sokolov, E. Bogomolov, Y. Golubev, D. Dig, and

T. Bryksin, “From commit message generation to history-aware commit

message completion,” in 2023 38th IEEE/ACM International Conference

on Automated Software Engineering (ASE). IEEE, 2023, pp. 723–735.

[38] M. M. Rahman and C. K. Roy, “An insight into the pull requests

of github,” in Proceedings of the 11th Working Conference on

Mining Software Repositories, ser. MSR 2014. New York, NY, USA:

Association for Computing Machinery, 2014, p. 364–367. [Online].

Available: https://doi.org/10.1145/2597073.2597121

[39] G. Gousios, M. Pinzger, and A. v. Deursen, “An exploratory study of

the pull-based software development model,” in Proceedings of the 36th

International Conference on Software Engineering, ser. ICSE 2014.

New York, NY, USA: Association for Computing Machinery, 2014, p.

345–355. [Online]. Available: https://doi.org/10.1145/2568225.2568260

[40] Y. Yu, H. Wang, G. Yin, and C. X. Ling, “Reviewer recommender

of pull-requests in github,” in 2014 IEEE International Conference on

Software Maintenance and Evolution, 2014, pp. 609–612.

[41] J. Jiang, Q. Wu, J. Cao, X. Xia, and L. Zhang, “Recommending tags

for pull requests in github,” Information and Software Technology, vol.

129, p. 106394, 2021.

[42] M. A. Batoun, K. L. Yung, Y. Tian, and M. Sayagh, “An empirical

study on github pull requests’ reactions,” ACM Transactions on Software

Engineering and Methodology, vol. 32, no. 6, pp. 1–35, 2023.

[43] M. Wessel, J. Vargovich, M. A. Gerosa, and C. Treude, “Github actions:

the impact on the pull request process,” Empirical Software Engineering,

vol. 28, no. 6, p. 131, 2023.

[44] T. F. Bissyand´

e, D. Lo, L. Jiang, L. R´

eveill`

ere, J. Klein, and Y. L.

Traon, “Got issues? who cares about it? a large scale investigation of

issue trackers from github,” in 2013 IEEE 24th International Symposium

on Software Reliability Engineering (ISSRE), 2013, pp. 188–197.

[45] N. Bettenburg, S. Just, A. Schr¨

oter, C. Weiss, R. Premraj, and T. Zim-

mermann, “What makes a good bug report?” in Proceedings of the 16th

ACM SIGSOFT International Symposium on Foundations of software

engineering, 2008, pp. 308–318.

[46] C. Sun, D. Lo, S.-C. Khoo, and J. Jiang, “Towards more accurate

retrieval of duplicate bug reports,” in 2011 26th IEEE/ACM International

Conference on Automated Software Engineering (ASE 2011). IEEE,

2011, pp. 253–262.

[47] Y. Tian, C. Sun, and D. Lo, “Improved duplicate bug report identiﬁca-

tion,” in 2012 16th European conference on software maintenance and

reengineering. IEEE, 2012, pp. 385–390.

[48] E. Guzman, D. Az´

ocar, and Y. Li, “Sentiment analysis of commit

comments in github: an empirical study,” in Proceedings of the 11th

Working Conference on Mining Software Repositories, ser. MSR 2014.

New York, NY, USA: Association for Computing Machinery, 2014, p.

352–355. [Online]. Available: https://doi.org/10.1145/2597073.2597118

[49] G. Destefanis, M. Ortu, D. Bowes, M. Marchesi, and R. Tonelli, “On

measuring affects of github issues’ commenters,” in Proceedings of

the 3rd International Workshop on Emotion Awareness in Software

Engineering, ser. SEmotion ’18. New York, NY, USA: Association

for Computing Machinery, 2018, p. 14–19. [Online]. Available:

https://doi.org/10.1145/3194932.3194936

[50] H. Hata, N. Novielli, S. Baltes, R. G. Kula, and C. Treude, “Github

discussions: An exploratory study of early adoption,” Empirical Software

Engineering, vol. 27, pp. 1–32, 2022.

[51] V. R. S´

anchez, P. N. Ayuso, J. A. Galindo, and D. Benavides, “Open

source adoption factors—a systematic literature review,” IEEE Access,

vol. 8, pp. 94 594–94 609, 2020.

[52] K. Crowston and J. Howison, “Assessing the health of open source

communities,” Computer, vol. 39, no. 5, pp. 89–91, 2006.

[53] C. M. Schweik, “Sustainability in open source software commons:

Lessons learned from an empirical study of sourceforge projects,”

Technology Innovation Management Review, vol. 3, no. 1, 2013.

[54] B. D. Sethanandha, B. Massey, and W. Jones, “Managing open source

contributions for software project sustainability,” in PICMET 2010

Technology Management For Global Economic Growth. IEEE, 2010,

pp. 1–9.

[55] J. Gamalielsson and B. Lundell, “Sustainability of open source software

communities beyond a fork: How and why has the libreofﬁce project

evolved?” Journal of systems and Software, vol. 89, pp. 128–145, 2014.

[56] S. S. Yau and J. S. Collofello, “Some stability measures for software

maintenance,” IEEE Transactions on Software Engineering, no. 6, pp.

545–552, 1980.

[57] M. Mattsson, H. Grahn, and F. M˚

artensson, “Software architecture

evaluation methods for performance, maintainability, testability, and

portability,” in Second International Conference on the Quality of

Software Architectures, 2006, p. 18.

[58] H. Gomaa, Software modeling and design: UML, use cases, patterns,

and software architectures. Cambridge University Press, 2011.

[59] M. Jazayeri, “On architectural stability and evolution,” in Reliable Soft-

ware Technologies—Ada-Europe 2002: 7th Ada-Europe International

Conference on Reliable Software Technologies Vienna, Austria, June 17–

21, 2002 Proceedings 7. Springer, 2002, pp. 13–23.

[60] D. Cotroneo, R. Natella, R. Pietrantuono, and S. Russo, “A survey of

software aging and rejuvenation studies,” ACM Journal on Emerging

Technologies in Computing Systems (JETC), vol. 10, no. 1, pp. 1–34,

2014.

[61] W. B. Frakes and K. Kang, “Software reuse research: Status and future,”

IEEE transactions on Software Engineering, vol. 31, no. 7, pp. 529–536,

2005.

[62] T. Abbas and A. Ahsan, “Value based incremental software develop-

ment,” in 17th IEEE International Multi Topic Conference 2014. IEEE,

2014, pp. 155–160.

[63] M. Salehie and L. Tahvildari, “Self-adaptive software: Landscape and

research challenges,” ACM transactions on autonomous and adaptive

systems (TAAS), vol. 4, no. 2, pp. 1–42, 2009.

[64] C. Krupitzer, F. M. Roth, S. VanSyckel, G. Schiele, and C. Becker, “A

survey on engineering approaches for self-adaptive systems,” Pervasive

and Mobile Computing, vol. 17, pp. 184–206, 2015.

[65] A. Filieri, M. Maggio, K. Angelopoulos, N. d’Ippolito, I. Gerostathopou-

los, A. B. Hempel, H. Hoffmann, P. Jamshidi, E. Kalyvianaki, C. Klein

et al., “Software engineering meets control theory,” in 2015 IEEE/ACM

10th International Symposium on Software Engineering for Adaptive

and Self-Managing Systems. IEEE, 2015, pp. 71–82.

[66] S. Shevtsov, M. Berekmeri, D. Weyns, and M. Maggio, “Control-

theoretical software adaptation: A systematic literature review,” IEEE

Transactions on Software Engineering, vol. 44, no. 8, pp. 784–810, 2017.

[67] N. Ducheneaut, “Socialization in an open source software community:

A socio-technical analysis,” Computer Supported Cooperative Work

(CSCW), vol. 14, pp. 323–368, 2005.

[68] H. Borges and M. T. Valente, “What’s in a github star? understanding

repository starring practices in a social coding platform,” Journal of

Systems and Software, vol. 146, pp. 112–129, 2018.

[69] J. Jiang, D. Lo, J. He, X. Xia, P. S. Kochhar, and L. Zhang, “Why and

how developers fork what from whom in github,” Empirical Software

Engineering, vol. 22, pp. 547–578, 2017.

[70] E. Eiroa-Lledo, R. H. Ali, G. Pinto, J. Anderson, and E. Linstead,

“Large-scale identiﬁcation and analysis of factors impacting simple bug

resolution times in open source software repositories,” Applied sciences,

vol. 13, no. 5, p. 3150, 2023.

[71] C. Kolassa, D. Riehle, and M. A. Salim, “The empirical commit

frequency distribution of open source projects,” in Proceedings of the

9th international symposium on open collaboration, 2013, pp. 1–8.

[72] J. Wachs, M. Nitecki, W. Schueller, and A. Polleres, “The geography of

open source software: evidence from github,” Technological Forecasting

and Social Change, vol. 176, p. 121478, 2022.

[73] S. O. Alexander Hars, “Working for free? motivations for participating

in open-source projects,” International journal of electronic commerce,

vol. 6, no. 3, pp. 25–39, 2002.

[74] R. A. Ghosh, “Understanding free software developers: Findings from

the ﬂoss study,” Perspectives on free and open source software, vol. 28,

pp. 23–47, 2005.

[75] R. J. Hyndman and G. Athanasopoulos, Forecasting: principles and

practice. OTexts, 2018.

[76] M. M. Kokar, K. Baclawski, and Y. A. Eracar, “Control theory-based

foundations of self-controlling software,” IEEE Intelligent Systems and

Their Applications, vol. 14, no. 3, pp. 37–45, 1999.

[77] A. Alali, H. Kagdi, and J. I. Maletic, “What’s a typical commit? a

characterization of open source software repositories,” in 2008 16th

IEEE international conference on program comprehension. IEEE, 2008,

pp. 182–191.

[78] Z. Yang, C. Wang, J. Shi, T. Hoang, P. Kochhar, Q. Lu, Z. Xing,

and D. Lo, “What do users ask in open-source ai repositories? an

empirical study of github issues,” in 2023 IEEE/ACM 20th International

Conference on Mining Software Repositories (MSR). IEEE, 2023, pp.

79–91.

[79] W. Tao, Y. Zhou, Y. Wang, W. Zhang, H. Zhang, and Y. Cheng, “Magis:

Llm-based multi-agent framework for github issue resolution,” Advances

in Neural Information Processing Systems, vol. 37, pp. 51 963–51 993,

2024.

[80] D. M. Soares, M. L. de Lima J´

unior, L. Murta, and A. Plastino, “Accep-

tance factors of pull requests in open-source projects,” in Proceedings

of the 30th annual ACM symposium on applied computing, 2015, pp.

1541–1546.

[81] G. Gousios, M.-A. Storey, and A. Bacchelli, “Work practices and

challenges in pull-based development: The contributor’s perspective,”

in Proceedings of the 38th international conference on software engi-

neering, 2016, pp. 285–296.

[82] M. L. de Lima J´

unior, D. Soares, A. Plastino, and L. Murta, “Predict-

ing the lifetime of pull requests in open-source projects,” Journal of

Software: Evolution and Process, vol. 33, no. 6, p. e2337, 2021.

[83] R. Kaur, K. K. Chahal, and M. Saini, “Understanding community

participation and engagement in open source software projects: A

systematic mapping study,” journal of king saud university-computer

and information sciences, vol. 34, no. 7, pp. 4607–4625, 2022.

[84] B. Johnson, C. Bird, D. Ford, N. Forsgren, and T. Zimmermann, “Make

your tools sparkle with trust: The picse framework for trust in software

tools,” in 2023 IEEE/ACM 45th International Conference on Software

Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE,

2023, pp. 409–419.

0 views·11 pages

An Empirical Validation of Open Source Repository Stability Metrics PDF Free Download

An Empirical Validation of Open Source Repository Stability Metrics PDF free Download. Think more deeply and widely.

Uploaded by Patrick R. Manning on 2/11/2026

/11

100%