This article has been updated to reflect changes suggested by one of our readers. Thanks Andrew! Full code is at the bottom of the post.
New information, released in an instant, can synchronize the behavior of millions of people. The effect grows with the impact of the information and the suddenness of its release.
During such events, Wikipedia editors converge on related pages, shrinking the time between edits and thereby increasing the edit rate.
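As a concrete sketch of what "edit rate" means here (using synthetic timestamps, not real Wikipedia data), the rate is just the count of edits per unit time, and a burst shows up as a collapse in the inter-edit interval:

```python
import pandas as pd

# Synthetic edit timestamps: one edit every 12 hours, then a burst every 5 minutes.
calm = pd.date_range('2009-06-20', periods=10, freq='12h')
burst = pd.date_range('2009-06-25 22:00', periods=10, freq='5min')
edits = pd.Series(1, index=calm.append(burst))

# Inter-edit intervals; the edit rate is edits per hour.
gaps = edits.index.to_series().diff()
hourly_rate = edits.resample('1h').sum()

print(gaps.min())          # the interval collapses during the burst
print(hourly_rate.max())   # peak edits per hour
```

The same two quantities, computed on real revision timestamps, are what the plots below visualize.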
Celebrity deaths
On June 25, 2009, the news of Michael Jackson’s death overwhelmed Google, Twitter, Wikipedia, and other websites, knocking several of them offline.

The estimated intensity of a Hawkes process fit to Michael Jackson edit events.
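For context, a Hawkes process is self-exciting: each event bumps the instantaneous event rate, which then decays, so bursts beget bursts. A minimal sketch of the conditional intensity with a single exponential kernel (the baseline, excitation, and decay values here are illustrative, not the fitted ones):

```python
import numpy as np

def hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.0):
    """lambda(t) = mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    past = events[events < t]
    return mu + alpha * np.exp(-beta * (t - past)).sum()

events = np.array([1.0, 1.2, 1.3])    # a small burst of events
print(hawkes_intensity(0.5, events))  # before any events: baseline only
print(hawkes_intensity(1.4, events))  # elevated just after the burst
```

The fit in the full code below uses a sum of several exponential kernels with different decays, which lets the model capture both fast spikes and slower aftershocks.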

Here is a zoomed-in view of the event. The edits started coming in about 3 hours after he was pronounced dead.

This is the first edit to mention Michael Jackson having medical issues. It cited a news article posted 61 minutes before the Los Angeles Times confirmed his death.

Here is where the edit happens in the list. Note that prior edits arrived roughly once or twice a day; after the event, they all fall within the same hour.

Running the script on a collection of celebrities who died in 2014 yields a similar result: notably, edit events largely die off after each death.
Dates and holidays

For year pages, there is a peak during New Year’s and a smaller one at year-end.

This works for other scheduled events as well.
Terrorism and crime
Mass shootings typically get their own Wikipedia pages within a few hours of the start of the event, and the delay has generally decreased over the years.
Year | Event | Time to article creation
---|---|---
2017 | Las Vegas shooting | 84 minutes
2017 | Sutherland Springs church shooting | 158 minutes
2016 | Orlando nightclub shooting | 170 minutes
2012 | Sandy Hook Elementary School shooting | 210 minutes
2007 | Virginia Tech shooting | 241 minutes
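The delays above are just the gap between the event's reported start time and the timestamp of the article's first revision (which the MediaWiki API returns with `rvdir=newer`, `rvlimit=1`). A sketch of the arithmetic, with hypothetical timestamps for illustration:

```python
import pandas as pd

# Hypothetical times for illustration; real values come from news reports
# and the article's first revision via the MediaWiki API.
event_start = pd.Timestamp('2017-10-02 05:05', tz='UTC')
first_revision = pd.Timestamp('2017-10-02 06:29', tz='UTC')

delay = (first_revision - event_start).total_seconds() / 60
print('{0:.0f} minutes'.format(delay))
```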

Conclusion
In conjunction with Wikipedia’s detailed hourly pageviews, Wikipedia article edit rates yield high-quality, live information.
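The pageview counts mentioned here are served by the public Wikimedia REST v1 API. A sketch that builds the per-article daily pageviews URL (the request itself is omitted; the endpoint shape follows the Wikimedia REST API, but treat the parameter choices as assumptions):

```python
def pageviews_url(article, start, end,
                  project='en.wikipedia', access='all-access', agent='user'):
    """Build a Wikimedia REST v1 per-article daily pageviews URL.

    start/end are dates in YYYYMMDD form.
    """
    base = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article'
    return '{0}/{1}/{2}/{3}/{4}/daily/{5}/{6}'.format(
        base, project, access, agent, article.replace(' ', '_'), start, end)

url = pageviews_url('Michael Jackson', '20170101', '20170131')
print(url)
```

Combining these counts with the edit-rate intensities above gives two independent live signals for the same event.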
Full Code
import os
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import scipy.signal as signal
from tick.hawkes import HawkesSumExpKern

session = requests.Session()


def fetch_revisions(title):
    file_name = './revisions/{0}_revisions.csv'.format(title.replace(' ', '_'))
    if not os.path.isfile(file_name):
        os.makedirs('./revisions', exist_ok=True)
        parameters = {
            'action': 'query',
            'format': 'json',
            'continue': '',
            'titles': title,
            'prop': 'revisions',
            'rvprop': 'ids|flags|timestamp|user|userid|size|comment',
            'rvlimit': 'max'
        }
        print('Fetching revisions for {0}...'.format(title))
        revisions = []
        while True:
            print('\tFetched {0} revisions'.format(len(revisions)))
            wp_call = session.get('https://en.wikipedia.org/w/api.php', params=parameters)
            response = wp_call.json()
            for page_id in response['query']['pages']:
                revisions += response['query']['pages'][page_id]['revisions']
            if 'continue' in response:
                parameters['continue'] = response['continue']['continue']
                parameters['rvcontinue'] = response['continue']['rvcontinue']
            else:
                break
            time.sleep(3)
        # Format revisions as a pandas DataFrame indexed by timestamp
        revisions = pd.DataFrame(revisions).set_index('timestamp')
        revisions.index = pd.to_datetime(revisions.index)
        revisions = revisions.sort_index()
        # The API returns 'anon'/'minor' as empty strings when the flag is set
        # and NaN otherwise; convert them to proper booleans
        if 'anon' in revisions.columns:
            revisions['anon'] = ~revisions['anon'].astype(bool)
        if 'minor' in revisions.columns:
            revisions['minor'] = ~revisions['minor'].astype(bool)
        # Remove revisions whose size matches both neighbors (remove edit-wars)
        revisions = revisions.loc[revisions['size'].shift(1) != revisions['size'].shift(-1)]
        # Remove minor revisions
        if 'minor' in revisions.columns:
            revisions = revisions.loc[~revisions['minor']]
        # Remove edits by bots
        revisions = revisions.loc[~revisions['user'].str.contains(r'(?:Bot|bot|BOT)(?:[^A-Za-z]|$)', na=False)]
        # Collapse consecutive revisions by the same user: keep the first
        # revision's date and the last revision's size
        revisions.loc[revisions['user'] != revisions['user'].shift(1), 'size'] = \
            revisions.loc[revisions['user'] != revisions['user'].shift(-1)]['size'].values
        revisions = revisions.loc[revisions['user'] != revisions['user'].shift(1)]
        # Remove edits that change fewer than 10 bytes
        revisions = revisions.loc[revisions['size'].diff().fillna(np.inf).abs() >= 10]
        # Remove "page blank" edits (size changes by more than 90%)
        revisions = revisions.loc[revisions['size'].pct_change().fillna(0).abs() <= 0.9]
        revisions.to_csv(file_name)
    else:
        revisions = pd.read_csv(file_name, index_col=0, parse_dates=True)
    return revisions


def estimate_intensity(event_timestamps, event_titles):
    start_time = min([min(timestamps) for timestamps in event_timestamps])
    # Convert nanosecond timestamps to days since the first event
    event_timestamps = [(timestamps - start_time).astype(int) / 86400e9
                        for timestamps in event_timestamps]
    decays = [0.02, 0.01, 0.5]
    learner = HawkesSumExpKern(decays, penalty='elasticnet', elastic_net_ratio=0.8)
    learner.fit(event_timestamps)
    tracked_intensity, intensity_tracked_times = learner.estimated_intensity(
        event_timestamps, intensity_track_step=1 / 24.)
    estimated_intensity_index = start_time + pd.to_timedelta(intensity_tracked_times, unit='D')
    estimated_intensity = pd.DataFrame(np.vstack(tracked_intensity).T,
                                       index=estimated_intensity_index,
                                       columns=event_titles)
    return estimated_intensity


def main():
    titles = ['Michael Jackson']
    event_timestamps = []
    for title in titles:
        event_timestamps.append(fetch_revisions(title=title).index.values)
    estimated_intensity = estimate_intensity(event_timestamps, titles).resample('1H').mean()
    print('Filtering...')
    # High-pass the intensity with a low-order Butterworth filter to remove
    # the slow-moving baseline
    order = 2
    Wn = 0.001
    B, A = signal.butter(order, Wn, output='ba')
    for column in estimated_intensity.columns:
        estimated_intensity[column] -= signal.filtfilt(B, A, estimated_intensity[column].values)
    print('Plotting...')
    ax = estimated_intensity.plot(linewidth=0.7)
    ax.set_ylim([0, 100])
    plt.savefig('output.png', dpi=300)


if __name__ == '__main__':
    main()
Hi Nathan,
Very good article! Organized and easy to understand.
But when implementing, I met the error: Wrong number or type of arguments for overloaded function ‘new_ModelHawkesSumExpKernLeastSq’, which seemed to be a version problem. Do you know how to fix it? Thank you.
Looking forward to your reply,
Emily
Hi Emily,
Can you provide the Python version, tick version, and OS? Will look into this shortly.
Thanks,
Nathan