Introduction
Researching a topic and generating an academic paper is a nuanced skill. It can take months or years to produce and publish one, if it is ever published at all. What if there were a way to make this happen instantly? Artificial intelligence (AI) may offer a way to quickly analyse a research topic and generate an academic paper. There are many forms of AI; this editorial discusses natural language model-based AI, such as ChatGPT, and its potential ability to generate academic papers.
Natural language model-based AI, in particular ChatGPT, is generating new content and considerable controversy. This AI software is innovative: it generates, de novo, content that has a natural conversational flow. It can quickly answer questions and write poems, fan fiction and children’s books.1 ChatGPT has even passed the theory section of the United States Medical Licensing Examination without additional training or years of studying medicine.2
Language-based AI has already entered the scientific community. Nature reported that four manuscripts in preprint credit ChatGPT as an author.3 Also, an article reported that AI had been used to generate an academic paper.4
In this editorial, we discuss the pros and cons of AI for manuscript generation in sports and exercise medicine (SEM), generate an academic paper using AI and bypass AI-generation detection, and discuss potential concerns regarding natural language model-based AI. We aim to gain insight into how AI, in particular ChatGPT and similar language model-based AI, will impact the future of manuscript generation in SEM. To this end, we consider what an academic paper is, whether AI should write academic papers, what the issues are, what our stance should be on AI-generated texts and how we should deal with them.
What is an academic paper, and is AI capable of writing one?
An academic paper has a thesis and aims to persuade readers of its viewpoint using the best available evidence. Before such a paper can be written, its authors must carry out extensive research to reach an advanced and balanced understanding of the topic. Research is not merely collecting and presenting data, but applying investigative and critical thinking to generate quality, interesting and original work that improves the field.5 Papers are generally written by professionals, and all concepts introduced are referenced accurately.
We tested AI’s ability to generate two academic papers, essay 1 (online supplemental appendix 1) and essay 2 (online supplemental appendix 5). For essay 1, we entered the following request into ChatGPT: ‘Can you please write a paper about the pros and cons of using AI to write scientific manuscripts? Include Harvard referencing’. For essay 2, the request was: ‘Can you please write a short essay on the pros and cons of using AI in sports medicine? Include Harvard referencing’. Essay 1 was generated in a mere 47 s on 24 December 2022. Essay 2 was requested on 24 January 2023 and was generated in 1 min and 36 s. In comparison, the current editorial was started on 24 December 2022, and our final proof was submitted on 7 February 2023: a total of 45 days with a team of experienced authors.
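The timing comparison above can be checked with a few lines of date arithmetic. This is only an illustrative sketch: the dates and generation times are those stated in the text, the variable names are our own, and the elapsed calendar days are not a measure of actual working hours.

```python
from datetime import date

# Dates and generation times as stated in the editorial.
editorial_started = date(2022, 12, 24)
final_proof_submitted = date(2023, 2, 7)

# Elapsed calendar days between starting the editorial and submitting the proof.
editorial_days = (final_proof_submitted - editorial_started).days

# Generation times for the two AI-written essays, in seconds.
essay_1_seconds = 47
essay_2_seconds = 1 * 60 + 36

# Express the editorial's elapsed calendar time in seconds so that the
# three durations share one unit (calendar time, not hours actually worked).
editorial_seconds = editorial_days * 24 * 60 * 60

print(f"Editorial: {editorial_days} days ({editorial_seconds} s elapsed)")
print(f"Essay 1:   {essay_1_seconds} s")
print(f"Essay 2:   {essay_2_seconds} s")
```

The comparison is deliberately rough: it contrasts calendar time for a team of authors with wall-clock generation time for a single prompt.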
This experiment raises several questions about whether AI is capable of writing a quality academic paper: whether the content produced is original and interesting, whether the paper shows an advanced and balanced understanding of the topic, and whether it applies critical thinking to generate original thought or simply summarises existing knowledge on a topic. The bibliographies generated for the two essays (online supplemental appendices 1 and 5) were inaccurate: the listed authors and publication titles did not match. Because of these falsified references, such manuscripts would likely be desk rejected or rejected by diligent peer reviewers, although errata correcting references in many journals suggest that diligence is not universal. Regardless of whether or not it is capable of generating a quality manuscript, natural language-based AI models such as ChatGPT are now considered tools worth watching closely.
What are the issues?
Ethics and integrity concerns
An obvious and significant concern is plagiarism of original content. Although it did not involve ChatGPT, AI journalism has been known to commit extensive plagiarism.6 Other ethical concerns are also raised. For instance, should there be a threshold for how much AI-generated content is acceptable? In addition, the personification of AI-based tools such as ChatGPT may be an objectionable and debatable topic for a wider audience.
Also, is it ethical to have AI generate scientific papers? Multiple companies advertise that they will build an academic paper using AI. These companies can be found online via a quick Google search. It is important to ask whether SEM is better served by researchers who develop novel and interesting theses themselves, to advance the field and improve human health outcomes, than by researchers who use AI to generate such ideas.
Equity concerns
Currently, ChatGPT is free to use through the ChatGPT Research Preview. One major foreseeable problem is that ChatGPT and other similar AI might, on the strength of their publicity, turn into ‘prohibitively’ expensive subscription-based tools. This might lead to inequitable distribution of resources among researchers in SEM and other fields.
Accuracy concerns
In addition to extensive plagiarism, errors were substantial in the AI journalism mentioned above6 and in the references of our AI-generated academic papers (online supplemental appendices 1 and 5).
Furthermore, there are concerns about the completeness of the information. For example, one user asked ChatGPT for instructions on how to build a personal computer (PC). The information generated by ChatGPT missed critical steps, which may have rendered the PC useless.7
Potentially flawed AI detection
It can be difficult to discern the difference between AI-generated and original abstracts.8 A few tools, including GPTZero, GPT-2 Output Detector and AI Detector, currently exist to detect whether a text was generated by an AI language model. These tools report whether they judge the text to be ‘Real’ (human generated) or ‘Fake’ (AI generated), with their confidence expressed as a percentage. It is outside the scope of this editorial to explain the intricacies of this Real/Fake language model calculation.9
One concerning point for AI detection is that it can be bypassed by paraphrasing. We passed essay 1 (online supplemental appendix 1) and essay 2 (online supplemental appendix 5) through additional paraphrasing AI, producing rewritten manuscripts (online supplemental appendices 3 and 7). The ‘Real’ percentage reported by the GPT-2 Output Detector rose from 0.02% (online supplemental appendix 2) to 99.52% (online supplemental appendix 4) for essay 1, and from 61.96% (online supplemental appendix 6) to 99.98% (online supplemental appendix 8) for essay 2 (figure 1).
Novel protection methods may need to be created and implemented for AI detection. We did not test any other detectors for AI language models; however, our findings indicate that the GPT-2 Output Detector cannot be relied on alone for AI detection.
These ethical, equity, accuracy and detection concerns are potential threats to the integrity of scientific literature if AI-generated manuscripts are accepted without being thoroughly scrutinised.
What tools do we have to prevent this in the future? What can editorial boards and publishers do?
As academics and members of editorial boards, we are aware of several tools to flag potential plagiarism, such as Turnitin or iThenticate, which are already implemented by some scientific journals as part of standard screening procedures.10 However, manual checks by topic experts should continue to be enforced.
We can expect vendors to include novel software for the detection of AI-generated text in their offerings in the future. As mentioned above, our ability to change the ‘Real’ percentage for essay 1 from 0.02% to 99.52% using paraphrasing AI software is concerning. As part of the editorial review process and standard checks, an additional checkbox could remind editors to consider whether the text could potentially have been generated by a third-party tool rather than by the authors.
In current authorship guidelines, which many journals and publishers adhere to, AI text generation is implicitly excluded.11 Explicit bans on the use of AI text generation tools, which could open the door to future retraction of papers generated in this way, may become integrated into journal authorship guidelines, as they have at Springer Nature.12 One might expect similar moves from other publishers and editorial boards soon. Alternatively, there may be arguments for simple transparency in reporting the use of AI-based text generation tools.
One option to consider is to put articles behind a ‘free paywall’ and login so that AI cannot scrape the articles. This is a move that some publishers are considering and that academic organisations may consider, although it may seem contrary to open science principles.13
Conclusions
Natural language model-based AIs, such as ChatGPT, are tools to watch for generating natural conversational text for various manuscript contents in SEM. However, ethical, equity, accuracy and detection concerns associated with their use are potential threats to scientific integrity. Although our AI-generated essays would be rejected at BMJ Open Sport & Exercise Medicine (BOSEM) and any BMJ journal due to the falsified references alone, we still need to be aware of this threat to scientific integrity and protect our intellectual property in the field of SEM. BOSEM, scientific publishing companies and academic organisations need to be aware of this threat and may need to implement novel protection methods in the future.
Ethics statements
Patient consent for publication
Footnotes
Twitter @Sportmednews, @belavyprof, @Sharief_H, @LucaHespanhol, @evertverhagen, @DptAamir
Contributors NA, DLB, SMP and EV drafted the first version of this editorial. SH, ARM and LH reviewed and revised the draft. All authors approved the final version for publication.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests NA and DLB are senior editorial board members, SH, ARM and LH are associate editors, and EV is the editor-in-chief of BMJ Open Sport & Exercise Medicine.
Provenance and peer review Commissioned; internally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.