IndiaTimes EPG grabber


The Times of India group used to host an Indian EPG site at http://tvguide.indiatimes.com/. The Dabba1 grabber is based on pytvgrab.

PyTvGrab-lib is an XMLTV grabber library for Python. It extracts information from a web page and outputs it in the XMLTV format (version 0.5.15). The library provides helpers to generate the XML and to work with dates and times, an abstract grabber model, a powerful regular expression tool, and an easy-to-customize HTML parser.

I adapted pytvgrab to support EPG download from this site. The network download speed was comparable to dialup speeds, which is a big drawback for EPG download. So I made the following changes to the pytvgrab implementation to deal with this.



CustomizedParser

To parse the IndiaTimes web pages:
- Added a regular expression to ignore each <script> ... </script> sequence.
- Also added many regular expressions to ignore certain Unicode characters.
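
The two parser changes can be sketched roughly as follows. This is a minimal illustration in modern Python, not the actual pytvgrab code: the names SCRIPT_RE, strip_scripts, drop_chars and UNWANTED are hypothetical, and the characters in UNWANTED are only examples, since the characters the grabber really filters are not listed here.

```python
import re

# Matches <script ...> ... </script>, case-insensitively and across newlines.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)

# Hypothetical example set: no-break space, zero-width space, byte order mark.
UNWANTED = "\u00a0\u200b\ufeff"


def strip_scripts(html):
    """Return html with every <script>...</script> block removed."""
    return SCRIPT_RE.sub("", html)


def drop_chars(text, chars=UNWANTED):
    """Return text with each character in chars deleted."""
    return text.translate({ord(c): None for c in chars})


page = '<p>News</p><script type="text/javascript">var a = 1;</script><p>Film</p>'
print(drop_chars(strip_scripts(page)))
```

Removing the script blocks before the HTML parser sees the page avoids having the parser trip over markup-like text inside JavaScript strings.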


Checkpointing the grab-data

Reasons for checkpointing
- power failure
- program crash
- a stuck network connection that forces an abort (this is really an issue with the pytvgrab library)

How to checkpoint ?

EPG has three levels of data:
  • Main page - the updated .conf file serves as the checkpoint
  • Program list - (p1,p2,p3,p4) - these files can be cached
  • Program description - [ 1,2,3......] - checkpoint by caching and saving the list of the program items
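
As an illustration of the third level, saving the list of program items already grabbed could look like the sketch below. It is an assumption-laden example in modern Python, not the grabber's own code: the Checkpoint class, its JSON file format and the item ids are all hypothetical.

```python
import json
import os
import tempfile


class Checkpoint:
    """Remembers which program items were already grabbed (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.done = set()
        # Resume from an earlier run if a checkpoint file exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.done = set(json.load(fh))

    def mark_done(self, item_id):
        """Record item_id and save the checkpoint file immediately."""
        self.done.add(item_id)
        # Write to a temporary file and rename it, so that a crash or
        # power failure cannot leave a half-written checkpoint behind.
        dirname = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=dirname)
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(sorted(self.done), fh)
        os.replace(tmp, self.path)

    def pending(self, item_ids):
        """Return the items that still need to be downloaded, in order."""
        return [i for i in item_ids if i not in self.done]
```

After a restart, creating a Checkpoint with the same path and calling pending() on the full program list yields only the items that were not fetched before the interruption.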

Favorite channels

Channel configuration was modified to indicate that some channels are "favorite" and others are not.

- channel configuration:
y : want EPG but not a favorite
v : want EPG and is a favorite

For channels that are not favorites, no program items are grabbed, only the list of programs.
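
The effect of the two flags can be summarized in a small sketch. The function name grab_plan and the channel names are hypothetical, and treating any other flag as "do not grab" is an assumption, since only y and v are described here.

```python
def grab_plan(flag):
    """Map a channel flag to (grab_program_list, grab_program_items)."""
    if flag == "v":      # favorite: program list and program items
        return (True, True)
    if flag == "y":      # wanted, not a favorite: program list only
        return (True, False)
    return (False, False)  # assumption: any other flag means no EPG


# Hypothetical channels, only to show the mapping.
channels = {"Channel A": "v", "Channel B": "y", "Channel C": "n"}
for name, flag in channels.items():
    print(name, grab_plan(flag))
```

Skipping the per-program pages for non-favorite channels is what saves most of the download time on a slow link.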


Adding the option --parse

--parse xxx.html  : runs the HTML parser on a downloaded or local file. Useful for debugging.


Adding the option --cachefile

--cachefile  URL
   Shows the hashed cache file name for the URL and, if that file is
   present, parses the cached file and displays it.
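
The lookup behind such an option might be sketched as below. The hash function (MD5), the ".html" suffix, the "cache" directory and the function names are all assumptions made for illustration; the scheme the grabber actually uses is not described here.

```python
import hashlib
import os


def cache_path(url, cache_dir="cache"):
    """Return the cache file name for url (assumed scheme: MD5 hex digest)."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".html")


def show_cached(url, cache_dir="cache"):
    """Print the cache file name; return its contents, or None if missing."""
    path = cache_path(url, cache_dir)
    print(path)
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

The returned text would then be handed to the same HTML parser that --parse uses, which makes it possible to debug a single cached page without touching the network.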



Download dabbagrabber-0.1