This article has been updated to reflect changes suggested by one of our readers. Thanks Andrew! Full code is at the bottom of the post.
New information, released in an instant, can synchronize the behavior of millions of people. The effect scales with the impact of the information and the suddenness of its release.
During such events, Wikipedia editors converge on the related pages, shrinking the time between edits and, equivalently, raising the edit rate.
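As a toy illustration (the timestamps below are made up, not real revision data), the edit rate is simply the reciprocal of the gap between consecutive edits:

```python
from datetime import datetime

# Hypothetical edit timestamps around a breaking-news event
edits = [datetime(2009, 6, 24, 12, 0),   # roughly daily before the event
         datetime(2009, 6, 25, 10, 0),
         datetime(2009, 6, 25, 22, 10),  # minutes apart afterwards
         datetime(2009, 6, 25, 22, 14),
         datetime(2009, 6, 25, 22, 19),
         datetime(2009, 6, 25, 22, 25)]

# Edit rate = 1 / gap between consecutive edits, in edits per hour
gaps_hours = [(b - a).total_seconds() / 3600 for a, b in zip(edits, edits[1:])]
rates = [1 / g for g in gaps_hours]
```

Before the event the rate is a fraction of an edit per hour; afterwards it jumps to double digits.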
Celebrity deaths
On June 25, 2009, news of Michael Jackson’s death sent so much traffic to Google, Twitter, Wikipedia, and other websites that they slowed to a crawl or went offline.

The estimated intensity of a Hawkes process fit to Michael Jackson edit events.
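For reference, a sum-of-exponentials Hawkes process models the event rate as a baseline intensity plus exponentially decaying excitation from every past event; this is the model that tick’s HawkesSumExpKern fits in the full code below. A minimal numpy sketch of the intensity, with made-up (not fitted) parameters:

```python
import numpy as np

def hawkes_intensity(t, events, mu, alphas, betas):
    """lambda(t) = mu + sum over past events t_i of
    sum_j alphas[j] * betas[j] * exp(-betas[j] * (t - t_i))"""
    past = events[events < t]
    if past.size == 0:
        return mu  # no past events: intensity is just the baseline
    dt = t - past  # elapsed time since each past event
    return mu + sum(a * b * np.exp(-b * dt).sum() for a, b in zip(alphas, betas))

# Made-up baseline, excitation weights, and decay rates (not fitted values)
events = np.array([1.0, 1.5, 1.6])
lam = hawkes_intensity(2.0, events, mu=0.1, alphas=[0.3, 0.2], betas=[0.5, 2.0])
```

Each event temporarily pushes the intensity above the baseline, which is why a burst of edits makes further edits (in the model) more likely.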

Here is a zoomed-in view of the event. The edits started coming in about 3 hours after he was pronounced dead.
This is the first edit to mention Michael Jackson having medical issues. It cited a news article posted 61 minutes before the Los Angeles Times confirmed his death.

Here is where the edit appears in the revision list. Note that earlier edits arrived every day or two; after the event, they all fall within the same hour.

Running the script on a collection of celebrities who died in 2014 yields similar results: edits cluster at the moment of death, with a notable lack of events afterward.
Dates and holidays

For year pages, there is a peak during New Year’s and a smaller one at year-end.

This works for other scheduled events as well.
Terrorism and crime
Mass shootings typically get their own Wikipedia article within a few hours of the event’s start, and that lag has shrunk over the years.
Year | Event | Article creation
---|---|---
2017 | Las Vegas shooting | 84 minutes
2017 | Sutherland Springs church shooting | 158 minutes
2016 | Orlando nightclub shooting | 170 minutes
2012 | Sandy Hook Elementary School shooting | 210 minutes
2007 | Virginia Tech shooting | 241 minutes
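Using only the numbers in the table above, a least-squares fit confirms the downward trend in article-creation lag:

```python
import numpy as np

# Article-creation lag in minutes, taken from the table above
years = np.array([2007, 2012, 2016, 2017, 2017])
minutes = np.array([241, 210, 170, 84, 158])

# Least-squares linear trend; a negative slope means articles appear sooner each year
slope, intercept = np.polyfit(years, minutes, 1)
```

With only five data points this is just a sanity check, not a forecast, but the slope comes out clearly negative (roughly ten minutes less lag per year).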

Conclusion
In conjunction with Wikipedia’s detailed hourly pageviews, Wikipedia article edit rates yield high-quality, live information.
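For example, pageview counts can be pulled from the Wikimedia REST API. The per-article endpoint below is real, but note it serves daily, not hourly, granularity; hourly per-article counts are published separately as raw dump files. The helper function name is my own:

```python
def pageviews_url(title, start, end, project='en.wikipedia'):
    # Wikimedia REST API, per-article pageviews endpoint (daily granularity)
    base = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article'
    return '{0}/{1}/all-access/all-agents/{2}/daily/{3}/{4}'.format(
        base, project, title.replace(' ', '_'), start, end)

url = pageviews_url('Michael Jackson', '20170101', '20170131')
# Fetch with e.g. requests.get(url, headers={'User-Agent': '...'}).json();
# the API asks clients to send a descriptive User-Agent header.
```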
Full Code
```python
import os
import time

import numpy as np
import pandas as pd
import requests
import scipy.signal as signal
import matplotlib.pyplot as plt
from tick.hawkes import HawkesSumExpKern

session = requests.Session()


def fetch_revisions(title):
    file_name = './revisions/{0}_revisions.csv'.format(title.replace(' ', '_'))
    if not os.path.isfile(file_name):
        parameters = {
            'action': 'query',
            'format': 'json',
            'continue': '',
            'titles': title,
            'prop': 'revisions',
            'rvprop': 'ids|flags|timestamp|user|userid|size|comment',
            'rvlimit': 'max'
        }
        print('Fetching revisions for {0}...'.format(title))
        revisions = []
        while True:
            print('\tFetched {0} revisions'.format(len(revisions)))
            wp_call = session.get('http://en.wikipedia.org/w/api.php', params=parameters)
            response = wp_call.json()
            for page_id in response['query']['pages']:
                revisions += response['query']['pages'][page_id]['revisions']
            if 'continue' in response:
                parameters['continue'] = response['continue']['continue']
                parameters['rvcontinue'] = response['continue']['rvcontinue']
            else:
                break
            time.sleep(3)

        # Format revisions as a pandas dataframe indexed by timestamp
        revisions = pd.DataFrame(revisions).set_index('timestamp')
        revisions.index = pd.to_datetime(revisions.index)
        revisions = revisions.sort_index()
        # The API returns empty strings for set flags; convert them to booleans
        if 'anon' in revisions.columns:
            revisions['anon'] = ~revisions['anon'].astype(bool)
        if 'minor' in revisions.columns:
            revisions['minor'] = ~revisions['minor'].astype(bool)

        # Remove revisions if the size is the same before and after the edit (edit wars)
        revisions = revisions.loc[revisions['size'].shift(1) != revisions['size'].shift(-1)]
        # Remove minor revisions
        if 'minor' in revisions.columns:
            revisions = revisions.loc[~revisions['minor']]
        # Remove edits by bots
        revisions = revisions.loc[~revisions['user'].str.contains(
            '(?:Bot|bot|BOT)(?:[^A-Za-z]|$)', na=False)]
        # Group consecutive revisions by the same user:
        # keep the date of the first revision and the size of the last
        revisions.loc[revisions['user'] != revisions['user'].shift(1), 'size'] = \
            revisions.loc[revisions['user'] != revisions['user'].shift(-1)]['size'].values
        revisions = revisions.loc[revisions['user'] != revisions['user'].shift(1)]
        # Remove edits that change fewer than 10 bytes
        revisions = revisions.loc[revisions['size'].diff().fillna(np.inf).abs() >= 10]
        # Remove "page blank" edits
        revisions = revisions.loc[revisions['size'].pct_change().fillna(0).abs() <= 0.9]
        revisions.to_csv(file_name)
    else:
        revisions = pd.read_csv(file_name, index_col=0, parse_dates=True)
    return revisions


def estimate_intensity(event_timestamps, event_titles):
    start_time = min([min(timestamps) for timestamps in event_timestamps])
    # Convert nanosecond timestamps to days since the first event
    event_timestamps = [(timestamps - start_time).astype(int) / 86400e9
                        for timestamps in event_timestamps]
    decays = [0.02, 0.01, 0.5]
    learner = HawkesSumExpKern(decays, penalty='elasticnet', elastic_net_ratio=0.8)
    learner.fit(event_timestamps)
    tracked_intensity, intensity_tracked_times = learner.estimated_intensity(
        event_timestamps, intensity_track_step=1 / 24.)
    estimated_intensity_index = start_time + pd.to_timedelta(intensity_tracked_times, unit='D')
    estimated_intensity = pd.DataFrame(np.vstack(tracked_intensity).T,
                                       index=estimated_intensity_index,
                                       columns=event_titles)
    return estimated_intensity


def main():
    titles = ['Michael Jackson']
    event_timestamps = []
    for title in titles:
        event_timestamps.append(fetch_revisions(title=title).index.values)
    estimated_intensity = estimate_intensity(event_timestamps, titles).resample('1H').mean()

    # Subtract a low-pass (Butterworth) baseline to highlight bursts
    print('Filtering...')
    order = 2
    Wn = 0.001
    B, A = signal.butter(order, Wn, output='ba')
    for column in estimated_intensity.columns:
        estimated_intensity[column] -= signal.filtfilt(B, A, estimated_intensity[column].values)

    print('Plotting...')
    ax = estimated_intensity.plot(linewidth=0.7)
    ax.set_ylim([0, 100])
    plt.savefig('output.png', dpi=300)


if __name__ == '__main__':
    main()
```
Hi Nathan,
Very good article! Organized and easy to understand.
But when implementing, I met the error: Wrong number or type of arguments for overloaded function ‘new_ModelHawkesSumExpKernLeastSq’, which seemed to be a version problem. Do you know how to fix it? Thank you.
Looking forward to your reply,
Emily
Hi Emily,
Can you provide the Python version, tick version, and OS? Will look into this shortly.
Thanks,
Nathan