Translated (#28352)

* 1st h

* 2

* finished

* finished
CanYellow 2023-01-02 14:58:04 +08:00 committed by GitHub
parent c83005dd4a
commit 93a7b20dc1
2 changed files with 145 additions and 142 deletions


@@ -1,142 +0,0 @@
[#]: subject: "8 ideas for measuring your open source software usage"
[#]: via: "https://opensource.com/article/22/12/open-source-usage-metrics"
[#]: author: "Georg Link https://opensource.com/users/georglink"
[#]: collector: "lkxed"
[#]: translator: "CanYellow"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
8 ideas for measuring your open source software usage
======
Those of us who support open source project communities are often asked about usage metrics — a lot. The goal of these metrics is usually to demonstrate the software's importance as measured by its user base and awareness. We typically want to know: how many people use the software, how many installations are there, and how many lives are being touched.
To make a long story short: We cannot answer these questions directly.
Sorry to disappoint you if you were hoping for a definitive solution. No one has the perfect answers to questions about usage metrics. At least, no precise answers.
The good news is that there are approximations and alternative metrics that can satisfy your thirst for knowledge about the software's usage, at least partially. This article explores these alternatives including their benefits and shortcomings.
### Downloads
When you visit websites that offer software, you can often see how many times the software has been downloaded. An example that comes to mind is Firefox, which used to have a download counter. It was an impressive number and gave the impression that Firefox was a popular browser—which it was for a while.
However, individual behavior can directly impact the accuracy of this number. For example, when a person wipes their machine regularly, each rebuild incurs a separate download. To account for this reality, there needs to be a way to subtract a few dozen (maybe hundreds) downloads from the number because of that one person.
Not only can downloads overestimate usage, but they can also underestimate usage. For instance, a system administrator may download a new version of Firefox once to a flash drive and then install it on hundreds of devices.
Download metrics are easy to collect because you can log each download request on the server. The problem is that you don't know what happens to the software after it is downloaded. Was the person able to use the software as anticipated? Or did the person run into issues and abandon the software?
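As a sketch of what collecting this metric can look like, the following counts successful download requests per artifact from web server access log lines. The log lines, paths, and file names are invented for illustration; a real script would read your server's actual access log.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format; in practice these
# would be read from the web server's access log file.
LOG_LINES = [
    '203.0.113.5 - - [01/Dec/2022:10:00:00 +0000] "GET /downloads/tool-1.2.0.tar.gz HTTP/1.1" 200 123456',
    '203.0.113.5 - - [02/Dec/2022:09:30:00 +0000] "GET /downloads/tool-1.2.0.tar.gz HTTP/1.1" 200 123456',
    '198.51.100.7 - - [02/Dec/2022:11:15:00 +0000] "GET /downloads/tool-1.1.0.tar.gz HTTP/1.1" 200 120000',
    '198.51.100.7 - - [02/Dec/2022:11:16:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
]

# Match only successful GETs under the (hypothetical) /downloads/ path.
DOWNLOAD_RE = re.compile(r'"GET (/downloads/\S+) HTTP/[\d.]+" 200 ')

def count_downloads(lines):
    """Count successful (HTTP 200) download requests per artifact path."""
    counts = Counter()
    for line in lines:
        match = DOWNLOAD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

totals = count_downloads(LOG_LINES)
```

Note how the first two lines illustrate the overcounting problem from above: the same IP downloading the same artifact twice still counts as two downloads.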
For open source projects, you can consider a variety of download metrics, such as the number of binaries downloaded from:
- the project website
- package managers such as npm, PyPI, and Maven
- code repositories like GitHub, GitLab, and Gitee
You may also be interested in downloads of the source code because downstream projects are most likely to use this format (also read [How to measure the impact of your open source project][1]). Relevant download metrics include:
- The number of clones (source code downloads) from code repositories like GitHub, GitLab, and Gitee
- The number of archives (tar, zip) downloaded from the website
- The number of source code downloads through package managers like npm, PyPI, and Maven
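For clones from GitHub specifically, the REST traffic endpoint (`GET /repos/{owner}/{repo}/traffic/clones`, which requires push access and covers the last 14 days) reports both total and unique clone counts. Here is a minimal sketch of reducing such a response to the metrics above; the payload is an invented example following the documented shape, not a live API call.

```python
# Example response shape of GitHub's GET /repos/{owner}/{repo}/traffic/clones
# endpoint. The numbers are invented for illustration.
sample_response = {
    "count": 173,       # total clones in the window
    "uniques": 128,     # unique cloners in the window
    "clones": [
        {"timestamp": "2022-12-01T00:00:00Z", "count": 90, "uniques": 70},
        {"timestamp": "2022-12-02T00:00:00Z", "count": 83, "uniques": 58},
    ],
}

def summarize_clones(payload):
    """Reduce a traffic/clones payload to a few headline numbers."""
    busiest = max(payload["clones"], key=lambda day: day["count"])
    return {
        "total_clones": payload["count"],
        # Unique cloners filter out the retrying-CI problem discussed below,
        # so they sit a little closer to "users" than raw clone counts do.
        "unique_cloners": payload["uniques"],
        "busiest_day": busiest["timestamp"][:10],
    }

summary = summarize_clones(sample_response)
```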
Download metrics for source code are an even less reliable measure than binary downloads (although there is no research to demonstrate this). Just imagine that a developer wants to use the most recent version of your source code and has configured their build pipeline to always clone your repository for every build. Now imagine that an automated build process was failing and retrying to build, constantly cloning your repository. You can also imagine a scenario where the metric is lower than expected—say the repository is cached somewhere, and downloads are served by the cache.
**[ Related read [5 metrics to track in your open source community][2] ]**
In conclusion, download metrics are good proxies for detecting trends and providing context around current usage. We cannot define specifically how a download translates to usage. But we can say that an increase in downloads is an indicator of more potential users. For example, if you advertise your software and see that download numbers are higher during the campaign, it would be fair to assume that the advertisement prompted more people to download the software. The source and metadata of the download can also provide additional context for usage patterns. What versions of your software are still in use? What operating system or language-specific versions are more popular? This helps the community prioritize which platforms to support and test.
### Issues
As an open source project, you probably have an issue tracker. When someone opens an issue, two common goals are to report a bug or request a feature. The issue author has likely used your software. As a user, they would have found a bug or identified the need for a new feature.
Obviously, most users don't take the extra step of filing an issue. Issue authors are dedicated users, and we are thankful for them. By opening an issue they become non-code contributors, and they may go on to contribute code. A rule of thumb is that for every 10,000 users, you may get 100 who open an issue and one who contributes code. Depending on the type of user, these ratios may differ.
With regard to metrics, you can count the number of issue authors as a lower-bound estimation for usage. Related metrics can include:
- The number of issue authors
- The number of active issue authors (opened an issue in the last 6 months)
- The number of issue authors who also contribute code
- The number of issues opened
- The number of issue comments written
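The metrics in this list can be computed from a plain export of issue records. A sketch under assumed data shapes (the author names, dates, and record layout are invented; real trackers expose richer objects):

```python
from datetime import datetime, timedelta

# Hypothetical issue records, as might be exported from an issue tracker.
issues = [
    {"author": "alice", "opened": "2022-11-20"},
    {"author": "alice", "opened": "2022-03-01"},
    {"author": "bob",   "opened": "2021-07-15"},
    {"author": "carol", "opened": "2022-12-01"},
]
code_contributors = {"alice"}  # issue authors who also contributed code

def issue_author_metrics(issues, code_contributors,
                         today="2022-12-15", window_days=182):
    """Compute the issue-based usage metrics listed above."""
    cutoff = datetime.fromisoformat(today) - timedelta(days=window_days)
    authors = {i["author"] for i in issues}
    active = {i["author"] for i in issues
              if datetime.fromisoformat(i["opened"]) >= cutoff}
    return {
        "issue_authors": len(authors),               # lower bound on users
        "active_issue_authors": len(active),         # opened one in ~6 months
        "authors_also_contributing_code": len(authors & code_contributors),
        "issues_opened": len(issues),
    }

metrics = issue_author_metrics(issues, code_contributors)
```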
### User mailing lists, forums, and Q&A sites
Many open source projects have mailing lists for users, a forum, and presence on a Q&A site, such as Stack Overflow. Similar to issue authors, people who post there can be considered the tip of the iceberg of users. Metrics around how active a community is in these mailing lists, forums, and Q&A sites can also be used as a proxy for increasing or decreasing the user base. Related metrics can focus on the activity in these places, including:
- The number of user mailing list subscribers
- The number of forum users
- The number of questions asked
- The number of answers provided
- The number of messages created
### Call-home feature
To get accurate counts of users, one idea is to have your software report back when it is in use.
This can be creepy. Imagine a system administrator whose firewall reports an unexpected connection to your server. Not only could the report never reach you (it was blocked), but your software may be banned from future use.
A responsible way to implement a call-home feature is as an optional service that checks for updates and lets the user know to use the latest version. Another optional feature can focus on usage telemetry, where you ask the user whether your software may anonymously report back how it is used. When implemented thoughtfully, this approach lets users help improve the software through the way they use it. A user may take the attitude: "I often don't allow this kind of usage sharing, but for some software I do, because I hope the developers will make it better for me in the long term."
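A minimal sketch of the opt-in telemetry idea follows. The config flag, payload fields, and feature counters are all hypothetical; the point is that nothing is assembled or sent unless the user has explicitly agreed, and the payload carries no identifying information.

```python
import json
import platform

def build_telemetry_payload(app_version, feature_counts):
    """Assemble an anonymous usage report: version, OS family, and feature
    counters only -- no hostnames, usernames, or other identifiers."""
    return {
        "app_version": app_version,
        "os": platform.system(),
        "features_used": feature_counts,
    }

def maybe_report_usage(config, app_version, feature_counts):
    """Honor the opt-in: return the JSON body to send, or None."""
    if not config.get("telemetry_opt_in", False):  # default is OFF
        return None
    return json.dumps(build_telemetry_payload(app_version, feature_counts))
```

In a real implementation the returned body would be POSTed over HTTPS to a project-controlled endpoint, and the opt-in prompt would explain exactly what is collected.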
### Stars and forks
Stars and forks are features on social coding platforms like GitHub, GitLab, and Gitee. Users on these platforms can star a project. Why do they star projects? GitHub's documentation explains, "You can star repositories and topics to keep track of projects you find interesting and discover related content in your news feed." Starring is the equivalent of bookmarking and also provides a way to show appreciation to a repository maintainer. Stars have been used as an indicator of the popularity of a project. When a project has a big announcement that attracts considerable attention, the star count tends to increase. The star metric does not indicate the usage of the software.
Forks on these social coding platforms are copies of a repository made under a user's own account. Non-maintainers can make changes in their fork and submit them for review through a pull request. Forks reflect community size better than stars do. Developers may also fork a project to keep a copy they can access even after the original repository has disappeared. Because forks are part of the contribution workflow, the metric is a good indicator of the developer community. Forks do not typically indicate usage by non-developers, because non-developers usually do not create forks.
### Social media
Social media platforms provide gathering places for people with shared interests, including Facebook, Instagram, LinkedIn, Reddit, Twitter, and more. Using a social media strategy, open source projects can attract people with interest and affinity for their projects by setting up respective gathering spaces on these platforms. Through these social media channels, open source projects can share news and updates and highlight contributors and users. They can also be used to meet people who would not otherwise interact with your project.
We are hesitant to suggest the following metrics because they have no clear connection to actual usage of your software and often require analysis for positive, negative, and neutral sentiment. People may be excited about your project for many different reasons and want to follow it without actually using it. However, like other metrics already discussed, showing that you are able to draw a crowd in social media spaces is an indicator of the interest in your project overall. Metrics for different social media platforms may include:
- The number of followers or subscribers
- The number of messages
- The number of active message authors
- The number of likes, shares, reactions, and other interactions
### Web analytics and documentation
Website traffic is a useful metric as well. This metric is influenced more by your outreach and marketing activities than your number of users. However, we have an ace up our sleeve: our user documentation, tutorials, handbooks, and API documentation. We can see what topics on our website draw attention, including documentation. The number of visitors to the documentation would arguably increase with an increase in the number of discrete users of the software. We can therefore detect general interest in the project with visitors to the website and more specifically observe user trends by observing visitors to the documentation. Metrics may include:
- The number of website visitors
- The number of documentation visitors
- The duration visitors spend on your website or in documentation
### Events
Event metrics are available if you are hosting events around your project. This is a great way to build community. How many people submit abstracts to speak at your events? How many people show up to your events? This can be interesting for both in-person and virtual events. Of course, how you advertise your event strongly influences how many people show up. Also, you may co-locate your event with a larger event where people travel anyway, and thus, are in town and can easily attend your event. As long as you use a consistent event strategy, you can make a case that a rise in speaker submissions and attendee registrations are indicative of increasing popularity and user base.
You don't need to host your own event to collect insightful metrics. If you host talks about your project at open source events, you can measure how many people show up to your session focused on your project. At events like FOSDEM, some talks are specifically focused on updates or announcements of open source projects and the rooms are filled to the brim (like almost all sessions at FOSDEM).
Metrics you might consider:
- The number of attendees at your project-centric event
- The number of talks submitted to your project-centric event
- The number of attendees at your project-centric talks
### Conclusion about approximating usage of open source software
As we've illustrated, there are many metrics that can indicate trends around the usage of your software, and all are imperfect. In most cases, these metrics can be heavily influenced by individual behavior, system design, and noise. As such, we suggest that you never use any of these metrics in isolation, given the relative uncertainty of each one. But if you collect a set of metrics from a variety of sources, you should be able to detect trends in behavior and usage. If you have the means to compare the same set of metrics across multiple open source projects with commonalities—such as similar functionality, strong interdependencies, hosted under the same foundation, and other characteristics—you can improve your sense of behavioral baselines.
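One way to sketch this read-the-metrics-together advice is a simple agreement check: compute each metric's recent change and see whether most of them move in the same direction. The monthly series and the 5% threshold below are invented for illustration.

```python
def pct_change(series):
    """Percent change from the first half of a series to the second half."""
    mid = len(series) // 2
    before = sum(series[:mid]) / mid
    after = sum(series[mid:]) / (len(series) - mid)
    return (after - before) / before * 100

# Hypothetical monthly values for three of the metrics discussed above.
metric_series = {
    "downloads":     [1000, 1100, 1050, 1300, 1400, 1500],
    "issue_authors": [12, 14, 13, 16, 18, 17],
    "doc_visitors":  [800, 820, 790, 900, 950, 990],
}

changes = {name: pct_change(values) for name, values in metric_series.items()}
rising = [name for name, change in changes.items() if change > 5]

# No single metric is trustworthy alone, but if most rise together,
# a growing user base is a plausible reading.
consensus = len(rising) >= 2
```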
Note that in this overview, we've also chosen to highlight metrics that evaluate direct usage. As most software depends on a variety of other software packages, we would be remiss if we did not mention that usage and behavior can also be heavily impacted by indirect usage as part of a dependency chain. As such, we recommend incorporating the count of upstream and downstream dependencies as another layer of context in your analysis.
In closing, as the wielder of data and metrics, we encourage you to recognize the power and responsibility that you have for your stakeholders. Any metric that you publish has the potential to influence behavior. It is a best practice to always share your context—bases, sources, estimations, and other critical contextual information—as this will help others to interpret your results.
We thank the CHAOSS Community for the insightful conversation at CHAOSScon EU 2022 in Dublin, Ireland that sparked the idea for this blog post and to the CHAOSS Community members who reviewed and helped improve this article.
--------------------------------------------------------------------------------
via: https://opensource.com/article/22/12/open-source-usage-metrics
Author: [Georg Link][a]
Topic selection: [lkxed][b]
Translator: [译者ID](https://github.com/译者ID)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/georglink
[b]: https://github.com/lkxed
[1]: https://opensource.com/article/18/5/metrics-project-success
[2]: https://opensource.com/article/22/11/community-metrics


@@ -0,0 +1,145 @@
[#]: subject: "8 ideas for measuring your open source software usage"
[#]: via: "https://opensource.com/article/22/12/open-source-usage-metrics"
[#]: author: "Georg Link https://opensource.com/users/georglink"
[#]: collector: "lkxed"
[#]: translator: "CanYellow"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "
8 ideas for measuring your open source software usage
======
Those of us who support open source project communities are asked about usage metrics all the time. These metrics are usually meant to demonstrate the software's importance as measured by its user base and awareness. We typically want to know how many people use the software, how many installations there are, and how many lives it touches.
To make a long story short: we cannot answer these questions directly.
Sorry to disappoint you if you were hoping for a definitive solution. No one has perfect answers to questions about usage metrics; at least, no precise answers.
The good news is that there are approximations and alternative metrics that can at least partially satisfy your thirst for knowledge about the software's usage. This article explores these alternatives, including their benefits and shortcomings.
### Downloads
When you visit websites that offer software, you can often see how many times the software has been downloaded. An example that comes to mind is Firefox, which used to have a download counter. It was an impressive number and gave the impression that Firefox was a popular browser, which it was for a while.
However, individual behavior can directly impact the accuracy of this number. For example, when a person wipes their machine regularly, each rebuild incurs a separate download. To account for this reality, there needs to be a way to subtract a few dozen (maybe hundreds of) downloads from the total because of that one person.
Not only can downloads overestimate usage, but they can also underestimate it. For instance, a system administrator may download a new version of Firefox once to a flash drive and then install it on hundreds of devices.
Download metrics are easy to collect because you can log each download request on the server. The problem is that you don't know what happens to the software after it is downloaded. Was the person able to use it as anticipated, or did they run into issues and abandon it?
For open source projects, you can consider a variety of download metrics, such as the number of binaries downloaded from:
- the project website
- package managers such as npm, PyPI, and Maven
- code repositories like GitHub, GitLab, and Gitee
You may also be interested in downloads of the source code, because downstream projects are most likely to use this format (also read [How to measure the impact of your open source project][1]). Relevant download metrics include:
- The number of clones (source code downloads) from code repositories like GitHub, GitLab, and Gitee
- The number of archives (tar, zip) downloaded from the website
- The number of source code downloads through package managers like npm, PyPI, and Maven
Download metrics for source code are an even less reliable measure than binary downloads (although there is no research to demonstrate this). Just imagine that a developer wants the most recent version of your source code and has configured their build pipeline to clone your repository for every build. Now imagine an automated build that keeps failing and retrying, cloning your repository constantly. You can also imagine a scenario where the metric is lower than expected, say when the repository is cached somewhere and downloads are served from the cache.
**[ Related read: [5 metrics to track in your open source community][2] ]**
In conclusion, download metrics are good proxies for detecting trends and providing context around current usage. We cannot define precisely how a download translates to usage, but we can say that an increase in downloads indicates more potential users. For example, if you advertise your software and see higher download numbers during the campaign, it is fair to assume the advertisement prompted more people to download it. The source and metadata of a download can also provide context about usage patterns. Which versions of your software are still in use? Which operating system or language-specific builds are more popular? This helps the community prioritize which platforms to support and test.
### Issues
As an open source project, you probably have an issue tracker. When someone opens an issue, the two common goals are to report a bug or request a feature. The issue author has likely used your software; as a user, they found a bug or identified the need for a new feature.
Obviously, most users don't take the extra step of filing an issue. Issue authors are dedicated users, and we are thankful for them. By opening an issue they become non-code contributors, and they may go on to contribute code. A rule of thumb is that for every 10,000 users, you may get 100 who open an issue and one who contributes code. Depending on the type of user, these ratios may differ.
With regard to metrics, you can count the number of issue authors as a lower-bound estimate of usage. Related metrics include:
- The number of issue authors
- The number of active issue authors (those who opened an issue in the last six months)
- The number of issue authors who also contribute code
- The number of issues opened
- The number of issue comments written
### User mailing lists, forums, and Q&A sites
Many open source projects have a user mailing list, a forum, and a presence on a Q&A site such as Stack Overflow. As with issue authors, the people who post in these places can be considered the tip of the iceberg of users. Metrics around how active a community is on these mailing lists, forums, and Q&A sites can serve as a proxy for an increasing or decreasing user base. Related metrics can focus on the activity in these places, including:
- The number of user mailing list subscribers
- The number of forum users
- The number of questions asked
- The number of answers provided
- The number of messages posted
### Call-home feature
To get an accurate count of users, one idea is to have your software report back when it is in use.
This can be creepy. Imagine a system administrator whose firewall reports an unexpected connection to your server. Not only might the report never reach you (it was blocked), but your software may be banned from future use.
A responsible way to implement a call-home feature is as an optional service that checks for updates and lets the user know to use the latest version. Another optional feature can focus on usage telemetry, where you ask the user whether your software may anonymously report back how it is used. When implemented thoughtfully, this approach lets users help improve the software through the way they use it. A user may take the attitude: "I often don't allow this kind of usage sharing, but for some software I do, because I hope the developers will make it better for me in the long term."
### Stars and forks
Stars and forks are features of social coding platforms like GitHub, GitLab, and Gitee. Users on these platforms can star a project. Why do they star projects? GitHub's documentation explains: "You can star repositories and topics to keep track of projects you find interesting and discover related content in your news feed." Starring is the equivalent of bookmarking, and it also provides a way to show appreciation to a repository maintainer. Stars have been used as an indicator of a project's popularity, and when a project makes a big announcement that attracts considerable attention, its star count tends to rise. The star metric does not, however, indicate usage of the software.
Forks on these social coding platforms are copies of a repository made under a user's own account. Non-maintainers can make changes in their fork and submit them for review through a pull request. Forks reflect community size better than stars do. Developers may also fork a project to keep a copy they can access even after the original repository has disappeared. Because forks are part of the contribution workflow, the metric is a good indicator of the developer community. Forks do not typically indicate usage by non-developers, because non-developers usually do not create forks.
### Social media
Social media platforms, including Facebook, Instagram, LinkedIn, Reddit, Twitter, and others, provide gathering places for people with shared interests. With a social media strategy, open source projects can attract people with an interest in and affinity for the project by setting up gathering spaces on these platforms. Through these channels, projects can share news and updates and highlight contributors and users. They can also be used to meet people who would not otherwise interact with the project.
We are hesitant to suggest the following metrics, because they have no clear connection to actual usage of your software and often require analysis of positive, negative, and neutral sentiment. People may be excited about your project for many different reasons and follow it without actually using it. However, as with the other metrics already discussed, the ability to draw a crowd in social media spaces is an indicator of overall interest in your project. Metrics for different social media platforms may include:
- The number of followers or subscribers
- The number of messages
- The number of active message authors
- The number of likes, shares, reactions, and other interactions
### Web analytics and documentation
Website traffic is a useful metric as well. It is influenced more by your outreach and marketing activities than by your number of users. However, we have an ace up our sleeve: our user documentation, tutorials, handbooks, and API documentation. We can see which topics on our website draw attention, including the documentation. The number of visitors to the documentation should, arguably, rise with the number of distinct users of the software. We can therefore gauge general interest in the project from website visitors, and observe user trends more specifically through documentation visitors. Metrics may include:
- The number of website visitors
- The number of documentation visitors
- The time visitors spend on your website or in your documentation
### Events
Event metrics are available if you host events around your project, which is a great way to build community. How many people submit abstracts to speak at your events? How many people show up? This can be interesting for both in-person and virtual events. Of course, how you advertise your event strongly influences attendance, and you may co-locate your event with a larger one that people travel to anyway, making it easy for them to attend yours. As long as you follow a consistent event strategy, you can make the case that a rise in speaker submissions and attendee registrations indicates growing popularity and a growing user base.
You don't need to host your own event to collect insightful metrics. If you give talks about your project at open source events, you can measure how many people show up to a session focused on your project. At events like [FOSDEM][T1], some talks focus specifically on updates or announcements from open source projects, and the rooms are filled to the brim (as almost all sessions at FOSDEM are).
Metrics you might consider:
- The number of attendees at your project-centric event
- The number of talks submitted to your project-centric event
- The number of attendees at your project-centric talks
### Conclusion about approximating usage of open source software
As we've shown, many metrics can indicate trends around the usage of your software, and all of them are imperfect. In most cases these metrics can be heavily influenced by individual behavior, system design, and noise. Given the relative uncertainty of each one, we suggest that you never use any of these metrics in isolation. But if you collect a set of metrics from a variety of sources, you should be able to detect trends in behavior and usage. If you can compare the same set of metrics across multiple open source projects with commonalities, such as similar functionality, strong interdependencies, or hosting under the same foundation, you can improve your sense of behavioral baselines.
Note that in this overview we've chosen to highlight metrics that evaluate direct usage. Since most software depends on a variety of other packages, we would be remiss not to mention that usage and behavior can also be heavily affected by indirect usage as part of a dependency chain. We therefore recommend incorporating the count of upstream and downstream dependencies as another layer of context in your analysis.
Finally, as wielders of data and metrics, we encourage you to recognize the power and responsibility you hold toward your stakeholders. Any metric you publish has the potential to influence behavior. It is a best practice to always share your context (bases, sources, estimation methods, and other critical contextual information), as this will help others interpret your results.
We thank the [CHAOSS][T2] Community for the insightful conversation at CHAOSScon EU 2022 in Dublin, Ireland that sparked the idea for this article, and the CHAOSS Community members who reviewed and helped improve it.
--------------------------------------------------------------------------------
via: https://opensource.com/article/22/12/open-source-usage-metrics
Author: [Georg Link][a]
Topic selection: [lkxed][b]
Translator: [CanYellow](https://github.com/CanYellow)
Proofreader: [校对者ID](https://github.com/校对者ID)
This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)
[a]: https://opensource.com/users/georglink
[b]: https://github.com/lkxed
[1]: https://opensource.com/article/18/5/metrics-project-success
[2]: https://opensource.com/article/22/11/community-metrics
[T1]: https://fosdem.org/
[T2]: https://chaoss.community