AI has gone through many discoveries and trends throughout its history. One recent development has been the transformer model, which has stood the test of time as a standard building block for domain-specific models. This has enabled and commoditized the advanced pre-trained models we see today.
The next logical step is to improve access. Previously, you had to spend a full day getting TensorFlow to work, or settle for primitive models in scikit-learn.
DALL-E sparked a revolution in image generation, and apps like Midjourney, PlaygroundAI, and Craiyon have brought it to regular people through web and mobile applications.
Generative Pre-trained Transformers (GPT, which I mentioned in my post on text generation) have now reached that stage with OpenAI’s release of ChatGPT and Microsoft’s Copilot. This post will demonstrate how ChatGPT can be used in a realistic application.

Using ChatGPT
I started out by selecting articles from Wikipedia’s list of 50,000 vital articles (out of 6.5 million articles at the time of writing). I prioritized articles with a high view-to-size ratio.
Generation tips
- Explain the task: I started out by saying “Expand upon the following text”
- Give context to the task: I then added “from a technical article about ‘Article Title'”
- Add qualifications: I found it was better to add things like “Be specific, include historical context and technical examples”. Otherwise, the output read like an elementary school report.
- Provide the original paragraph: I found that providing the full original paragraph grounded the model in the specifics of the article.
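Putting the tips together, the prompt can be sketched as a template. This is only a rough sketch; `article_title` and `original_paragraph` are hypothetical placeholders, not the exact wording I used every time:

```python
# Hypothetical placeholders standing in for a real article.
article_title = "Limonene"
original_paragraph = "Limonene is a colorless liquid hydrocarbon found in citrus peels."

prompt = (
    "Expand upon the following text "                                    # explain the task
    + "from a technical article about '{0}'. ".format(article_title)     # give context
    + "Be specific, include historical context and technical examples:"  # add qualifications
    + "\n\n" + original_paragraph                                        # provide the original paragraph
)
print(prompt)
```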
Limonene
Towards the top of my list was Limonene, an extract of citrus fruits. I wrangled with this example for a while, and it eventually mentioned Perillyl Alcohol, which can be made by modifying Limonene. I checked the Perillyl Alcohol article, and it turned out to be true, so I made a copy edit. Unfortunately, this edit was reverted due to an old discussion on the talk page.

Scallion
ChatGPT mentioned that scallions, ginger, and garlic are the “holy trinity” of Chinese cooking, which I remembered having heard before, so I added it:

Stepchild
The article for “stepchild” was depressingly small. It certainly offered no comfort to stepchildren looking for explanations. I was able to add some information on the sociological impacts and associations.

Sweetbread
Sweetbread is disgusting, but it was towards the top of my list. I was able to provide examples of sweetbreads:

Action fiction
I was able to give additional context on the literary genre, “Action fiction”.

Automated Editing
The next logical step is to automate the editing process. I could have automated making requests to ChatGPT using Selenium, but violating OpenAI’s terms is not something I would fess up to 😉
Setup
Load original Wikitext:
import re

original_text = open("wikitext_paragraph.txt").read()
Extract and remove references:
ref_pattern = "(<[ ]*ref.*?/(?:ref)?[ ]*>)"
references = re.findall(ref_pattern, original_text)
input_text = re.sub(ref_pattern, "", original_text)
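As a sanity check, the pattern behaves like this on a made-up snippet of wikitext (the article text and reference names are hypothetical):

```python
import re

# match <ref ...>...</ref> pairs and self-closing <ref ... /> tags
ref_pattern = "(<[ ]*ref.*?/(?:ref)?[ ]*>)"

sample = 'Limonene is a scent additive.<ref name="a">Smith 2001</ref> It is common.<ref name="b"/>'
refs = re.findall(ref_pattern, sample)
stripped = re.sub(ref_pattern, "", sample)
print(refs)      # both reference tags, in order
print(stripped)  # the prose with references removed
```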
Extract and remove links:
link_map = []
# capture the optional "Article|" prefix and the displayed text of each [[link]]
link_pattern = r"\[[ ]*\[[ ]*([^|\]]*\|)?[ ]*([^\]]+)[ ]*\][ ]*\]"
for article, alias in re.findall(link_pattern, input_text):
    if article:
        link_map.append((alias, "[[{0}{1}]]".format(article, alias)))
    else:
        link_map.append((alias, "[[{0}]]".format(alias)))
input_text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", input_text)
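Running the extraction on a made-up sentence (the links are hypothetical) shows what ends up in link_map and what the model actually sees:

```python
import re

link_pattern = r"\[[ ]*\[[ ]*([^|\]]*\|)?[ ]*([^\]]+)[ ]*\][ ]*\]"

# Hypothetical sentence with one plain link and one piped link.
sample = "Limonene occurs in [[citrus]] peel and in [[Perilla|perilla]] oil."
link_map = []
for article, alias in re.findall(link_pattern, sample):
    if article:
        # piped link: 'article' keeps its trailing '|', so this rebuilds [[Perilla|perilla]]
        link_map.append((alias, "[[{0}{1}]]".format(article, alias)))
    else:
        link_map.append((alias, "[[{0}]]".format(alias)))
plain = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", sample)
print(link_map)
print(plain)  # the link-free text that gets sent to ChatGPT
```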
Generate prompt:
with open("chatgpt_prompt.txt", "w+") as f:
    prompt = "Write the following text in a more concise way. "
    if link_map:
        prompt += "Be sure to mention:"
        for i, link in enumerate(link_map):
            if i == len(link_map) - 1 and len(link_map) > 1:
                prompt += " and"
            prompt += " \"{0}\"".format(link[0])
    prompt = "{0}:\n\n{1}".format(prompt, input_text)
    f.write(prompt)
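Repeated standalone with a made-up paragraph and link_map (file writing omitted), the prompt builder produces output like this:

```python
# Hypothetical paragraph and link_map to show what the prompt looks like.
link_map = [("citrus", "[[citrus]]"), ("perilla", "[[Perilla|perilla]]")]
input_text = "Limonene occurs in citrus peel and perilla oil."

prompt = "Write the following text in a more concise way. "
if link_map:
    prompt += "Be sure to mention:"
    for i, link in enumerate(link_map):
        # join the final phrase with "and" when there is more than one
        if i == len(link_map) - 1 and len(link_map) > 1:
            prompt += " and"
        prompt += " \"{0}\"".format(link[0])
prompt = "{0}:\n\n{1}".format(prompt, input_text)
print(prompt)
```

Listing the link phrases in the prompt nudges ChatGPT to keep them verbatim, so they can be turned back into links later.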
Prompt user to get ChatGPT output:
print("Paste contents of 'chatgpt_prompt.txt' into ChatGPT, and then paste the output into 'chatgpt_output.txt'")
input("Hit enter to continue:")
Load ChatGPT output:
chatgpt_output = open("chatgpt_output.txt").read()
Define a function to re-insert removed references:
from difflib import SequenceMatcher

def reinsert_text(original_text, new_text, removed_text):
    # find deletion sites in the diff and score each by how much it overlaps removed_text
    candidates = {}
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, original_text, new_text).get_opcodes():
        if tag == "delete":
            overlap = SequenceMatcher(None, removed_text, original_text[i1:i2])
            candidates[j1] = overlap.find_longest_match(0, len(removed_text), 0, i2 - i1).size
    if candidates:
        # re-insert at the deletion site with the best overlap
        i = sorted(candidates.items(), key=lambda x: x[1], reverse=True)[0][0]
    else:
        # fall back to appending at the end
        i = len(new_text) + 1
    return new_text[:i] + removed_text + new_text[i:]
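Here is the function in action on a made-up pair of texts ("Smith" is a hypothetical reference); the definition is repeated so the snippet runs on its own. The reference lands back after "liquid." even though the sentence following it was reworded:

```python
from difflib import SequenceMatcher

def reinsert_text(original_text, new_text, removed_text):
    # find deletion sites in the diff and score each by how much it overlaps removed_text
    candidates = {}
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, original_text, new_text).get_opcodes():
        if tag == "delete":
            overlap = SequenceMatcher(None, removed_text, original_text[i1:i2])
            candidates[j1] = overlap.find_longest_match(0, len(removed_text), 0, i2 - i1).size
    if candidates:
        i = sorted(candidates.items(), key=lambda x: x[1], reverse=True)[0][0]
    else:
        i = len(new_text) + 1
    return new_text[:i] + removed_text + new_text[i:]

original = "Limonene is a colorless liquid.<ref>Smith</ref> It smells of oranges."
summary = "Limonene is a colorless liquid. It smells like oranges."
restored = reinsert_text(original, summary, "<ref>Smith</ref>")
print(restored)
```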
Convert ChatGPT output back into Wikitext:
for link_text, link in link_map:
    if link_text in chatgpt_output:
        chatgpt_output = chatgpt_output.replace(link_text, link, 1)
    else:
        print("Could not link '{0}'. Skipping...".format(link_text))
for reference in references:
    chatgpt_output = reinsert_text(original_text, chatgpt_output, reference)
with open("wikitext_output.txt", "w+") as f:
    f.write(chatgpt_output)
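A minimal illustration of the link restoration, with a made-up output string: replace(..., 1) links only the first occurrence of each phrase, which matches Wikipedia's convention of linking a term once.

```python
# Hypothetical ChatGPT output and link_map.
link_map = [("citrus", "[[citrus]]")]
chatgpt_output = "Limonene is found in citrus peel, like all citrus extracts."
for link_text, link in link_map:
    if link_text in chatgpt_output:
        # only the first occurrence becomes a link
        chatgpt_output = chatgpt_output.replace(link_text, link, 1)
print(chatgpt_output)
```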
New task: Shortening articles
Wikipedia has a set of pages called “Special pages”. These are meant for Wikipedia maintainers, and the maintenance pages among them help surface problem articles.

Long pages is one of my favorites among the special pages. It lists the largest articles on Wikipedia, which are mostly listicles. The list articles can be very difficult to shorten, but there are often non-list articles available.

Presidency of Rodrigo Duterte
This was the first article within Long pages that was not a list article. I used the script we set up earlier to summarize the article in a semi-automated fashion. I could not just copy-paste the output into the article, because the re-insertion of references was not perfect.
It did very well at summarizing and combining points from the paragraphs I fed it. Overall, I was able to shorten the article by about 10,000 characters.

After summarizing about a third of the article, it became clear to me that the article did not have a neutral point of view: it cast Rodrigo Duterte in a very positive light. One of the primary editors of the article thanked me for my edits, and I stopped there.

Conclusion
ChatGPT was far from perfect, and I found myself having to fine-tune the output a good amount before it was suitable for adding to articles. Although it struggled to generate accurate new information, it was very good at making text more concise.
As these models are trained on text, including Wikipedia, this step completes a loop of positive reinforcement that will snowball into even better models in the future.