Deriving an article-level metric from Impact Factors | 2022-02-10 | By: Saša Marcan

A simple method for deriving an article-level impact indicator from Journal Impact Factors (JIFs) is described here. I can’t vouch for its novelty, but reviewing the literature I didn’t find anything quite similar. Thanks mostly to Waltman and Traag I did learn about two proposals to fuse JIFs and citation counts/networks into hybrid impact indicators, namely by Abramo et al. and by Levitt and Thelwall, but those take a different approach.

The logic behind this indicator is quite simple too. We (mis)use JIFs to evaluate individual articles. We (mis)use citation counts to evaluate individual articles. Would we end up with something at least as useful if we took the JIFs of citing articles into account? I hope so.

The way it works is perhaps best represented visually in Figure 1. Basically, we look up citations to paper A (there are five of them here, labeled B–F) and list the citing papers’ JIFs. Based on that list we calculate paper A’s… let’s call it Citing Impact Factor (CIF). We can go with mean or median values, whichever is more appropriate. This is the most complicated and tedious part.

[Figure 1: paper A (JIF 3.103) and its five citing papers (B–F) with their JIFs; the resulting mean/median CIF and the CIF / JIF ratio are shown bottom-right.]
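For concreteness, here’s a minimal sketch of the calculation in Python. The individual citing JIFs from Figure 1 aren’t reproduced in the text, so the values below are hypothetical, chosen only to match the reported mean (2.209) and median (2.784):

```python
from statistics import mean, median

def citing_impact_factor(citing_jifs):
    """Mean and median CIF from the JIFs of papers citing a given paper."""
    return mean(citing_jifs), median(citing_jifs)

# Hypothetical JIFs for citing papers B-F (B being the 0 JIF paper),
# chosen to reproduce the mean/median reported for Figure 1.
citing_jifs = [0.0, 2.5, 2.784, 2.8, 2.961]

mean_cif, median_cif = citing_impact_factor(citing_jifs)
print(f"mean CIF   = {mean_cif:.3f}")    # 2.209
print(f"median CIF = {median_cif:.3f}")  # 2.784
```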

Once we know our paper’s CIF we compare it to its JIF and interpret the results. In this particular case we see that the mean CIF (2.209) is ~0.9 lower than the JIF (3.103), which looks bad at first glance, but that’s mostly owing to the 0 JIF paper B (should those be excluded? let me know). The median CIF (2.784) is more in line with the paper’s JIF, though we could say it’s slightly underperforming. We also see that our paper attracted 5 citations in the JIF calculation window, which is more than its JIF promised.

So what does this all tell us about our paper A? It’s obviously not exceptional, but it’s not bad either. It’s okay. In quantitative terms it overperforms: it scored 5 citations even though we expected ~3 based on its JIF. In qualitative terms its impact was limited: it hasn’t rippled through the literature higher up in the hierarchy, but it didn’t fizzle out completely either. We can characterize it as a small impact with a localized effect.

To boil it down to an even more intuitive heuristic we can calculate CIF / JIF (Figure 1, bottom-right). That way we end up with a single figure similar to the Article Influence Score: if it’s greater than 1, that’s a positive indication; if it’s less than 1, a negative one. Obviously this doesn’t work as intended for 0 JIF papers, where the ratio is undefined; better to stick with CIF alone for those.
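As a sketch, the heuristic with that zero-JIF guard might look like this (the function name is my own, not an established term):

```python
def cif_jif_ratio(cif, jif):
    """CIF / JIF heuristic: > 1 is a positive indication, < 1 a negative one.
    Undefined for papers in 0 JIF journals; interpret the CIF directly there."""
    if jif == 0:
        return None  # no meaningful ratio for 0 JIF venues
    return cif / jif

# Paper A from Figure 1: mean CIF 2.209, JIF 3.103
print(f"{cif_jif_ratio(2.209, 3.103):.3f}")  # 0.712, slightly underperforming
```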

That’s it. I picked the most boring example to demonstrate how the JIF can be conceived of as a promise of impact for a given paper, and its citation count + CIF as a manifestation of that impact. Given how elusive the measurement of impact is, CIF might be useful, even if only as an additional data point to consider alongside the ubiquitous JIF; it’s certainly no panacea metric.

The least boring examples in my experience with CIF so far are the edge cases: works appearing in publications that don’t or can’t have a JIF. Those are the ones that regularly get little or no recognition in research evaluation exercises, and CIF might help us gauge where such papers would fit in the established JIF-based hierarchy of journals. My favorite so far is a paper published in a 0 JIF journal, but with a mean CIF of 5.207 and a median CIF of a whopping 6.789 based on 24 citations! A veritable Q1 paper by that measure, yet most likely to be glossed over in research evaluation.

To wrap it up: a tentative list of advantages and disadvantages of CIF.

Advantages:

  • it’s based on JIF, a metric we’re all familiar with
  • it’s simple to calculate and easy to use (and WOS could conceivably automate its calculation for the WOS-indexed subset of works), with no extra weighting, normalization, etc. involved
  • it can be used to evaluate non-indexed content and discover hidden gems in the literature

Disadvantages:

  • it’s based on JIF, a metric we’re all familiar with
  • it’s useless for evaluation of top-JIF papers since their impact is more likely to ripple downwards through the journal hierarchy
  • it has no predictive ability since it takes time for a paper to accrue a meaningful number of citations for CIF calculation

References:

Abramo G, D’Angelo CA, Felici G. Predicting publication long-term impact through a combination of early citations and journal impact factor. Journal of Informetrics. 2019 Feb 1;13(1):32-49. doi: https://doi.org/10.1016/j.joi.2018.11.003

Levitt JM, Thelwall M. A combined bibliometric indicator to predict article impact. Information Processing & Management. 2011 Mar 1;47(2):300-8. doi: https://doi.org/10.1016/j.ipm.2010.09.005

Waltman L, Traag VA. Use of the journal impact factor for assessing individual articles: Statistically flawed or not? [version 2; peer review: 2 approved]. F1000Research. 2021;9:366. doi: https://doi.org/10.12688/f1000research.23418.2

Two models of scholarly communication | 2017-11-20 | By: Saša Marcan

Taking some liberties with Shannon and Weaver’s model of communication, I present two models of scholarly communication: our current one, and our final one.

State of the art

The current system of scholarly communication is represented by the model below.

[Figure: Current model of scholarly communication]

Description: Author (A) starts with a research idea (1). Upon defining a hypothesis and a research methodology, (A) collects an appropriate sample of research data (2). Analyzing (2) relative to (1) outputs scientific information (3). The whole bundle of (1), (2), and (3) is then encoded into a message to be transmitted: a draft paper (4).

(A) transmits the encoded message to a publication (P), which seeks out a peer reviewer (PR) to decode (4) and verify its integrity. Due to the noisy nature of encoding (1), (2), and (3) into a single literary form, decoding failures may distort the scientific information (3?).

Decoding failures may at this point be queried by (PR) and relayed via (P) back to (A) for further clarification. This back-and-forth is represented by the dotted-line process; the author’s re-encoding of (4) relative to the feedback received from (PR) yields a final paper (4′).

Upon receiving the final paper, a reader (R) faces the same prospect of decoding failures distorting (3?) as (PR) did a step before, but without the privilege of querying (A) for further clarification.

Moving forward

Two critical bottlenecks identified by the above model are the author’s encoding process (subject to a plethora of cognitive biases, conflicts of interest, etc.) and the dotted-line feedback process (which may take a long time, and is of questionable utility since the author’s feedback is itself another product of the author’s encoding). One other important implication of the above model is that both (PR) and (R) essentially engage in the same activity: decoding (4) and trying to derive (3) from it.

With those three pain points taken into consideration, the optimal system of scholarly communication — in terms of both efficiency and economy — is represented by the following repository-based model.

[Figure: Optimal model of scholarly communication]

Description: Author (A) starts with a research idea (1). The hypothesis must be reduced to a single statement that is either true or false relative to research data (2). Upon defining such a hypothesis, (A) may proceed to create a scholarly record (4) in a central repository (CR). Upon defining a research methodology, (A) collects an appropriate sample of (2). Analyzing (2) relative to (1) outputs scientific information (3), which is then encoded into (4) as a single bit (true/false).

(1) and (2) are captured in (A)’s institutional repository (IR), and persistently linked to (4). Each complete (4) ultimately contains: a central hypothesis defined in (1), persistent links to (1) and (2), and output (3) relative to them.

A reader (R) reviewing (4) is easily able to take a closer look at (1) and (2) in the (IR), and ultimately to dispute the validity of (3) and the integrity of (4) if warranted by methodological or interpretational issues. Feedback, represented by the dotted-line process, may be exchanged between (A) and (R) at any point after a (4) is created.

This model ensures that the ultimate size of each message — a scientific record — transmitted via (CR) is kept minimal: size of hypothesis + size of links to (1) and (2) + that 1 bit of (3).
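To make that concrete, here’s one possible shape for a (4) record. The field names, link scheme, and example hypothesis are my own illustrative assumptions, not part of the model as stated:

```python
from dataclasses import dataclass

@dataclass
class ScholarlyRecord:
    """One (4) record in the central repository (CR), per the model above."""
    hypothesis: str  # the single true/false statement derived from (1)
    idea_link: str   # persistent link to (1) in the author's IR
    data_link: str   # persistent link to (2) in the author's IR
    result: bool     # (3), encoded in a single bit

# Hypothetical example record
record = ScholarlyRecord(
    hypothesis="Compound X inhibits enzyme Y in vitro",
    idea_link="https://ir.example.org/ideas/123",
    data_link="https://ir.example.org/datasets/456",
    result=True,
)
```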