Class NewsJob

java.lang.Object
  extended by CollectJob
      extended by NewsJob
All Implemented Interfaces:
org.quartz.Job
Direct Known Subclasses:
NewsJobAP, NewsJobBW, NewsJobCNN, NewsJobNYT, NewsJobReuters, NewsJobYahoo

public abstract class NewsJob
extends CollectJob

Basic news collection job, implementing basic configuration handling and collection routines.

Author:
fhogenboom

Field Summary
private static org.slf4j.Logger _log
           
private static String cur_date
           
protected  String file_ext
           
protected  String file_path
           
protected  Integer max_hash
           
protected  Integer max_retry
           
protected  Integer max_title
           
protected  LinkedList<Integer> mhm
           
protected  Integer mhm_size
           
private static LinkedList<String> mtm
           
private static int mtm_size
           
protected  Integer num_msgs
           
private static int num_out
           
private static int num_parts
           
protected  String rss_name
           
protected  String rss_src
           
protected  String txt_cls
           
protected  Integer zip_buffer
           
protected  Integer zip_files
           
private static boolean zipping
           
 
Fields inherited from class CollectJob
date, form_date, form_hday, form_time, hday, hour_stop, hour_strt, mnte_stop, mnte_strt, scnd_stop, scnd_strt, time, time_zone
 
Constructor Summary
NewsJob()
          Constructor.
 
Method Summary
private  void addToMemory(int hash)
          Adds a given hash code to the memory of an individual NewsJob component.
private  void addToMemory(String title)
          Adds a given title to the memory of all NewsJob components.
protected  void collect()
          Collects data from a specific source.
protected  boolean configureLocal(String job_key)
          Configures the NewsJob component by parsing a configuration file for local component elements.
private  String getAuthor()
          Retrieves the author of an RSS entry, here represented by the name of the RSS feed.
private  String getAuthor(com.sun.syndication.feed.synd.SyndEntry entry)
          Retrieves the author of an RSS entry.
private  String getDate()
          Retrieves the publishing date of an RSS entry, here represented by the current date and time.
private  String getDate(com.sun.syndication.feed.synd.SyndEntry entry)
          Retrieves the publishing date of an RSS entry.
private  String getLink(com.sun.syndication.feed.synd.SyndEntry entry)
          Retrieves the link to the full text of an RSS entry.
protected abstract  org.slf4j.Logger getLogger()
          Retrieves the Logger of the component.
private  String getText(com.sun.syndication.feed.synd.SyndEntry entry)
          Retrieves the full text of an RSS entry.
private  String getTitle(com.sun.syndication.feed.synd.SyndEntry entry)
          Retrieves the title of an RSS entry.
private  boolean inMemory(int hash)
          Checks whether a given hash code is stored in the memory of an individual NewsJob component.
private  boolean inMemory(String title)
          Checks whether a given title is stored in the memory of all NewsJob components.
protected  boolean loadLocal(org.quartz.JobExecutionContext context)
          Configures the NewsJob component by loading configurations for local component elements stored in a JobDataMap, which is available during runtime of the component.
protected abstract  String parseHTML(String url)
          Parses the HTML page of a specified link into plain text.
private  void saveToFile(com.sun.syndication.feed.synd.SyndEntry entry)
          Parses and saves an RSS entry to a file in XML format.
protected  void storeLocal(org.quartz.JobExecutionContext context)
          Stores the configuration for local component elements of the NewsJob component in a JobDataMap, which is available during runtime of the component.
private  void zipFiles()
          Zips output files whenever there is a date swap or the maximum number of files allowed in a single zip file (specified in the configuration file) is exceeded.
 
Methods inherited from class CollectJob
execute
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

max_hash

protected Integer max_hash

max_retry

protected Integer max_retry

max_title

protected Integer max_title

mhm_size

protected Integer mhm_size

num_msgs

protected Integer num_msgs

zip_buffer

protected Integer zip_buffer

zip_files

protected Integer zip_files

mhm

protected LinkedList<Integer> mhm

file_path

protected String file_path

file_ext

protected String file_ext

rss_name

protected String rss_name

rss_src

protected String rss_src

txt_cls

protected String txt_cls

zipping

private static boolean zipping

mtm_size

private static int mtm_size

num_parts

private static int num_parts

num_out

private static int num_out

mtm

private static LinkedList<String> mtm

cur_date

private static String cur_date

_log

private static org.slf4j.Logger _log

Constructor Detail

NewsJob

public NewsJob()
Constructor.

Method Detail

configureLocal

protected boolean configureLocal(String job_key)
Configures the NewsJob component by parsing a configuration file for local component elements.

Specified by:
configureLocal in class CollectJob
Parameters:
job_key - ID of the component.
Returns:
Boolean value indicating whether parsed configuration is valid.

loadLocal

protected boolean loadLocal(org.quartz.JobExecutionContext context)
Configures the NewsJob component by loading configurations for local component elements stored in a JobDataMap, which is available during runtime of the component.

Specified by:
loadLocal in class CollectJob
Parameters:
context - Execution context of the component.
Returns:
Boolean value indicating whether loaded configuration is valid.

storeLocal

protected void storeLocal(org.quartz.JobExecutionContext context)
Stores the configuration for local component elements of the NewsJob component in a JobDataMap, which is available during runtime of the component.

Specified by:
storeLocal in class CollectJob
Parameters:
context - Execution context of the component.
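
The store/load round trip described for storeLocal and loadLocal can be sketched as follows. This is a minimal illustration, not the NewsJob implementation: a plain java.util.Map stands in for the Quartz JobDataMap, and the class name LocalConfigSketch, the keys "max_retry" and "rss_src", and the validity rules are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a job's local configuration handling. */
class LocalConfigSketch {
    Integer maxRetry;
    String rssSrc;

    /** Stores the local configuration in the (stand-in) data map. */
    void storeLocal(Map<String, Object> dataMap) {
        dataMap.put("max_retry", maxRetry); // key name is an assumption
        dataMap.put("rss_src", rssSrc);     // key name is an assumption
    }

    /** Loads the local configuration; returns false if it is missing or invalid. */
    boolean loadLocal(Map<String, Object> dataMap) {
        Object retry = dataMap.get("max_retry");
        Object src = dataMap.get("rss_src");
        if (!(retry instanceof Integer) || !(src instanceof String)) {
            return false; // missing value or unexpected type
        }
        if ((Integer) retry < 0 || ((String) src).isEmpty()) {
            return false; // assumed validity rules
        }
        maxRetry = (Integer) retry;
        rssSrc = (String) src;
        return true;
    }

    public static void main(String[] args) {
        LocalConfigSketch stored = new LocalConfigSketch();
        stored.maxRetry = 3;
        stored.rssSrc = "http://example.org/feed";
        Map<String, Object> dataMap = new HashMap<>();
        stored.storeLocal(dataMap);

        LocalConfigSketch loaded = new LocalConfigSketch();
        System.out.println(loaded.loadLocal(dataMap));
    }
}
```

Returning a boolean rather than throwing mirrors the documented contract, in which loadLocal reports whether the loaded configuration is valid.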

collect

protected void collect()
Collects data from a specific source.

Specified by:
collect in class CollectJob

inMemory

private boolean inMemory(int hash)
Checks whether a given hash code is stored in the memory of an individual NewsJob component.

Parameters:
hash - Hash code of entry to be checked.
Returns:
Boolean value indicating whether hash is in own memory.

inMemory

private boolean inMemory(String title)
Checks whether a given title is stored in the memory of all NewsJob components.

Parameters:
title - Title of entry to be checked.
Returns:
Boolean value indicating whether title is in shared memory.

addToMemory

private void addToMemory(int hash)
Adds a given hash code to the memory of an individual NewsJob component.

Parameters:
hash - Hash code of entry to be added.

addToMemory

private void addToMemory(String title)
Adds a given title to the memory of all NewsJob components.

Parameters:
title - Title of entry to be added.
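
The inMemory and addToMemory pairs suggest a size-limited memory used to skip entries that were already collected. A minimal sketch of such a memory is given below. The class name BoundedMemory and the policy of evicting the oldest item once the capacity is exceeded are assumptions; the documentation only states that the memories are LinkedList instances with configured maximum sizes.

```java
import java.util.LinkedList;

/** Hypothetical size-limited memory for duplicate detection. */
class BoundedMemory<T> {
    private final LinkedList<T> items = new LinkedList<>();
    private final int capacity;

    BoundedMemory(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Checks whether the item is currently remembered. */
    synchronized boolean inMemory(T item) {
        return items.contains(item);
    }

    /** Remembers the item, dropping the oldest one if the capacity is exceeded. */
    synchronized void addToMemory(T item) {
        if (items.contains(item)) {
            return; // already remembered
        }
        items.addLast(item);
        if (items.size() > capacity) {
            items.removeFirst(); // assumed eviction policy: oldest first
        }
    }

    synchronized int size() {
        return items.size();
    }

    public static void main(String[] args) {
        BoundedMemory<Integer> hashes = new BoundedMemory<>(2);
        hashes.addToMemory("first title".hashCode());
        hashes.addToMemory("second title".hashCode());
        System.out.println(hashes.inMemory("first title".hashCode()));
    }
}
```

A per-component memory of hash codes would hold one such instance per job, while a memory of titles shared by all components would be a single static instance, which is why the methods are synchronized here.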

saveToFile

private void saveToFile(com.sun.syndication.feed.synd.SyndEntry entry)
Parses and saves an RSS entry to a file in XML format. The file path and extension are specified in the configuration file. The file name is constructed as date rss_name num_msgs, where date is the current date, num_msgs is the number of messages collected by the individual NewsJob component, and rss_name is the name of the RSS feed, as specified in the configuration file.

Parameters:
entry - RSS entry to be parsed and saved to file.
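
The file naming scheme described above can be sketched as follows. The class FileNameSketch and its method are hypothetical. The separator between the three name parts is not recoverable from this page, so it is passed in as a parameter, and the extension is assumed to be configured without a leading dot.

```java
/** Hypothetical helper that assembles an output file name from its parts. */
class FileNameSketch {
    /**
     * Joins date, feed name, and message count with the given separator
     * and appends the extension (assumed to be given without a dot).
     */
    static String buildFileName(String date, String rssName, int numMsgs,
                                String separator, String fileExt) {
        return date + separator + rssName + separator + numMsgs + "." + fileExt;
    }

    public static void main(String[] args) {
        // All argument values are illustrative only.
        System.out.println(buildFileName("2011-05-01", "reuters", 7, "_", "xml"));
    }
}
```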

zipFiles

private void zipFiles()
Zips output files whenever there is a date swap or the maximum number of files allowed in a single zip file (specified in the configuration file) is exceeded. The file path and extension of the files to be zipped are specified in the configuration file, as is the buffer size used for zipping. Zip files are stored in the same directory as the files they contain. After completion, the zipped files are removed. The zip file name is constructed as cur_date-num_parts, where cur_date represents the current date (which can be yesterday's date in case of a date swap) and num_parts represents the number of zip files that have been created by all NewsJob components.
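
The zip-then-remove step described above can be sketched with the standard java.util.zip classes. This is an illustration under assumptions, not the actual zipFiles code: the class ZipSketch, its method signature, and the choice to delete the source files only after the archive has been closed are all hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper that archives files and then removes the originals. */
class ZipSketch {
    /**
     * Writes all given files into zipFile using a copy buffer of bufferSize
     * bytes (must be positive), then deletes the source files.
     */
    static Path zipAndRemove(List<Path> files, Path zipFile, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        try (ZipOutputStream out = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(zipFile), bufferSize))) {
            for (Path file : files) {
                // Entries are stored under their plain file name.
                out.putNextEntry(new ZipEntry(file.getFileName().toString()));
                try (InputStream in = Files.newInputStream(file)) {
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                }
                out.closeEntry();
            }
        }
        // Remove the originals only after the archive was written and closed.
        for (Path file : files) {
            Files.delete(file);
        }
        return zipFile;
    }
}
```

Deleting after the archive is closed means a failure while zipping leaves the original files in place.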


getAuthor

private String getAuthor()
Retrieves the author of an RSS entry, here represented by the name of the RSS feed.

Returns:
String value representing the name of an RSS feed.

getAuthor

private String getAuthor(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the author of an RSS entry.

Parameters:
entry - RSS entry to be parsed.
Returns:
String value representing the author of an RSS entry.

getDate

private String getDate()
Retrieves the publishing date of an RSS entry, here represented by the current date and time.

Returns:
String value representing the current date and time.

getDate

private String getDate(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the publishing date of an RSS entry.

Parameters:
entry - RSS entry to be parsed.
Returns:
String value representing the publishing date of an RSS entry.

getLink

private String getLink(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the link to the full text of an RSS entry.

Parameters:
entry - RSS entry to be parsed.
Returns:
String value representing the link to the full text of an RSS entry.

getText

private String getText(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the full text of an RSS entry.

Parameters:
entry - RSS entry to be parsed.
Returns:
String value representing the full text of an RSS entry.

getTitle

private String getTitle(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the title of an RSS entry.

Parameters:
entry - RSS entry to be parsed.
Returns:
String value representing the title of an RSS entry.

getLogger

protected abstract org.slf4j.Logger getLogger()
Retrieves the Logger of the component.

Specified by:
getLogger in class CollectJob
Returns:
Logger used for logging system output.

parseHTML

protected abstract String parseHTML(String url)
Parses the HTML page of a specified link into plain text.

Parameters:
url - Link to be parsed.
Returns:
String value representing the full text.
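
Because parseHTML is abstract, each subclass supplies its own conversion from a page to plain text. The sketch below shows only the text-extraction half of such an implementation, operating on HTML that has already been downloaded. It is a deliberately crude, regex-based illustration under assumptions: the class PlainTextSketch is hypothetical, and a real subclass would likely use a proper HTML parser and source-specific rules.

```java
/** Hypothetical, simplified HTML-to-text conversion. */
class PlainTextSketch {
    static String toPlainText(String html) {
        String text = html
                // Drop script and style blocks including their content.
                .replaceAll("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>", " ")
                // Replace all remaining tags with a space.
                .replaceAll("(?s)<[^>]+>", " ")
                // Decode a few common entities; &amp; is handled last.
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
        // Collapse runs of whitespace and trim the result.
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<p>Hello <b>world</b></p>"));
    }
}
```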