Structural and fatal errors in the HTML report files
Some HTML report files have structural and/or fatal errors that must be fixed, either by editing the file manually or by running a fixing script.
Items that required manual editing
Game Id    Game                Report  Fix
---------  ------------------  ------  ----------------------------------------
200120059  20011013 NJD @ MTL  ES      Remove invalid html tags
200120123  20011023 BOS @ TOR  GS      Remove "2" throughout the file
200520094  20051019 NYI @ NYR  GS      Insert missing period for 15:35 penalty
200520094  20051019 NYI @ NYR  PL      Edited string with 00:28 penalty
200520233  20051109 PIT @ WPG  GS      Add missing TBODY closing tag
200520264  20051113 EDM @ CHI  GS      Fixed bogus on ice
200520305  20051119 CHI @ EDM  GS      Insert missing period for 5:13 penalty
200620071  20061014 DAL @ LAK  PL      Aligned period for event #1
200620892  20070219 PIT @ NYI  GS      Add missing TBODY closing tag
201320331  20131121 NYR @ DAL  PL      Remove invalid tag at the 7:50 event
201720463  20171211 FLA @ DET  PL      Fixed time -16:0-1
In addition, the following scripting actions were required:
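Removal of the nonbreaking space artifact chr(194)=Â (the first byte of the UTF-8 nonbreaking space, 0xC2 0xA0, which shows up as Â when the file is read as Latin-1)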
Handling of French texts and locale
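The nonbreaking space cleanup could be sketched in shell roughly as follows. This is only an illustration under stated assumptions, not the script actually used: it assumes the reports are UTF-8 encoded, so the nonbreaking space is the byte pair 0xC2 0xA0, and it replaces that pair with a plain space rather than deleting byte 194 alone. The function name strip_nbsp is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: replace the UTF-8 nonbreaking space (bytes 0xC2 0xA0)
# with a plain space. Reads stdin, writes stdout.
strip_nbsp() {
    # Build the two-byte sequence with octal escapes (302 = 0xC2, 240 = 0xA0).
    nbsp=$(printf '\302\240')
    # LC_ALL=C makes sed treat the input as raw bytes, not multibyte characters.
    LC_ALL=C sed "s/$nbsp/ /g"
}

# Example use (file names are placeholders):
#   strip_nbsp < GS.html > GS.clean.html
```

Replacing the whole byte pair, instead of stripping every byte 194, avoids damaging other characters whose UTF-8 encoding also starts with 0xC2 (for example « and »), which may matter for the French texts.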
In the mid-2000s the ES files contained corrupt HTML that prevented any reasonable parser from consuming them. A crude fix is as follows (I know, the regex for grep can be improved):
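# Find ES files with a table line whose width=100% lacks the closing '>'
grep -H table /misc/nhl/2*/*/*/ES.html |
grep 'width=100%' | grep -v 'width=100%>' |
cut -d: -f1 | sort -u |
while IFS= read -r f
do
    # Patch only the broken table lines, so already-closed tags do not get a second '>'
    sed -i '/table/{/width=100%>/!s/width=100%/width=100%>/;}' "$f"
done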
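Before running an in-place edit over the whole archive, the substitution can be tried on a single line. The sketch below wraps a slightly stricter variant of the sed expression in a filter that reads stdin, so no file is modified. The function name fix_table_tag and the sample lines are made up for illustration, and the real broken markup may look different.

```shell
#!/bin/sh
# Hypothetical filter: add the missing '>' after width=100% on table lines,
# leaving lines that already contain 'width=100%>' untouched.
fix_table_tag() {
    sed '/table/{/width=100%>/!s/width=100%/width=100%>/;}'
}

# Sample input: one broken line and one already correct line (both invented).
printf '%s\n' '<table border=0 width=100%' '<table border=0 width=100%>' |
fix_table_tag
```

Restricting the substitution to lines that match table and do not already contain width=100%> keeps the edit safe to run more than once on the same file.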