A Python Script for Recursive Search-And-Replace
I wrote a little Python script to solve a find-and-replace problem.
I had a directory tree with several thousand files, about 2000 of which were static HTML (yeah, don't ask... I blame my predecessor for this), each containing the typical Google Analytics tracking code, e.g.:
var _gaq = _gaq || [];
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
Then my client started worrying about the privacy of their users and asked me to remove all these snippets. By the way, in Europe it will soon become illegal to use Google Analytics without asking visitors whether they consent to being tracked.
I wrote a regex that would capture this multi-line snippet easily.
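The exact pattern is not reproduced here, so the following is only a minimal sketch of a regex that would do the job. It assumes Python 3 and assumes the snippet lives in a single `<script>` element that mentions `google-analytics.com`. The name `SCRIPT_PATTERN` and the sample page are made up for the illustration.

```python
import re

# Hypothetical pattern: one <script> element whose body mentions
# google-analytics.com. The tempered dot (?:(?!</script>).) keeps the match
# from starting in an earlier, unrelated script block.
SCRIPT_PATTERN = re.compile(
    r"<script\b[^>]*>(?:(?!</script>).)*?google-analytics\.com.*?</script>\s*",
    re.DOTALL | re.IGNORECASE,
)

# Invented sample page: one script to keep, one tracking snippet to remove.
sample = """<html><head>
<script type="text/javascript">var keep = 1;</script>
<script type="text/javascript">
var _gaq = _gaq || [];
(function() {
  var ga = document.createElement('script');
  ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
})();
</script>
</head><body>Hello</body></html>"""

cleaned, count = SCRIPT_PATTERN.subn("", sample)
print(count)                          # number of snippets removed
print("google-analytics" in cleaned)  # is any tracking code left?
print("var keep = 1;" in cleaned)     # did the unrelated script survive?
```

`re.DOTALL` is what lets `.` cross line breaks, which matters because the snippet spans several lines.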
In addition, several HTML files had event handlers that called the GA tracking script, e.g.:
lations', 'Downloaded PDF', 'Crime and Punishment vol 2 (Japanese)']); ">
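The tag above is cut off, so the attribute holding the call is not visible. As a minimal sketch, assuming Python 3 and assuming the handler is an `onclick` attribute wrapping a single `_gaq.push([...])` call, a pattern along these lines would strip it. `EVENT_PATTERN`, the `href` and the sample link are invented for the example.

```python
import re

# Assumption: the handler is an onclick attribute holding one _gaq.push(...)
# call. [^"]* keeps the match inside the attribute value.
EVENT_PATTERN = re.compile(r'\s*onclick="\s*_gaq\.push\([^"]*\);\s*"', re.IGNORECASE)

# Invented sample link, modelled loosely on the fragment above.
sample = """<a href="book.pdf" onclick="_gaq.push(['_trackEvent', 'Books', 'Downloaded PDF', 'Some title (Japanese)']); ">Download</a>"""

cleaned = EVENT_PATTERN.sub("", sample)
print(cleaned)
```

The leading `\s*` also swallows the space in front of the attribute, so the remaining tag does not end up with a stray gap.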
The following script works fine and takes only a few seconds to traverse several thousand files.
Here is my script. I ran it on Python 2.5, where the "with" statement is only available through "from __future__ import with_statement", so I did without it.
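from os import walk
from os.path import join
from re import compile

# The pattern and handler definitions are missing from this listing;
# the three below are stand-ins (assumptions), not the originals.
script_pattern = compile(r'(?s)<script[^>]*>(?:(?!</script>).)*?google-analytics\.com.*?</script>')
event_pattern = compile(r'\s*onclick="\s*_gaq\.push\([^"]*\);\s*"')
def handler(error): print error  # report directories walk() cannot list

modified = []  # paths where the tracking snippet was removed
event = []     # paths where an event handler was removed
for root, dirs, files in walk('myfolder', onerror=handler):
    for name in files:
        path = join(root, name)
        file = open(path, 'r+b')
        all = file.read()
        fixed = script_pattern.sub('', all)
        if fixed != all: modified.append(path)
        cleaned = event_pattern.sub('', fixed)
        if cleaned != fixed: event.append(path)
        if cleaned != all:  # write-back reconstructed from the 'r+b' mode
            file.seek(0)
            file.write(cleaned)
            file.truncate()
        file.close()
for i in modified: print i
for i in event: print i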
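On a newer interpreter the file handling can lean on the "with" statement. The following is a minimal sketch of that variant, assuming Python 3. The function name `strip_patterns`, its `dry_run` flag and the demo marker are my own inventions for illustration, not part of the script above. It works on bytes, like the 'r+b' mode of the original, so no encoding has to be guessed.

```python
import os
import re
import tempfile

def strip_patterns(top, patterns, dry_run=False):
    """Remove every match of each compiled bytes pattern from files under top.

    Returns the paths that changed (or would change when dry_run is True).
    """
    changed = []
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as fh:      # closed automatically
                original = fh.read()
            fixed = original
            for pattern in patterns:
                fixed = pattern.sub(b"", fixed)
            if fixed != original:
                changed.append(path)
                if not dry_run:
                    with open(path, "wb") as fh:
                        fh.write(fixed)
    return changed

# Small self-check on a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "page.html")
with open(demo_file, "wb") as fh:
    fh.write(b"<p>keep</p><!--track-->")

marker = re.compile(rb"<!--track-->")   # stand-in for the real patterns
preview = strip_patterns(demo_dir, [marker], dry_run=True)  # touches nothing
applied = strip_patterns(demo_dir, [marker])                # rewrites the file
with open(demo_file, "rb") as fh:
    result = fh.read()
print(preview == applied, result)
```

Running with `dry_run=True` first gives a list of files that would be touched, which is a cheap safety net before rewriting a couple of thousand pages in place.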