java.lang.Object
  CollectJob
    NewsJob
public abstract class NewsJob extends CollectJob
Basic news collection job, implementing basic configuration handling and collection routines.
Field Summary | |
---|---|
private static org.slf4j.Logger | _log |
private static String | cur_date |
protected String | file_ext |
protected String | file_path |
protected Integer | max_hash |
protected Integer | max_retry |
protected Integer | max_title |
protected LinkedList<Integer> | mhm |
protected Integer | mhm_size |
private static LinkedList<String> | mtm |
private static int | mtm_size |
protected Integer | num_msgs |
private static int | num_out |
private static int | num_parts |
protected String | rss_name |
protected String | rss_src |
protected String | txt_cls |
protected Integer | zip_buffer |
protected Integer | zip_files |
private static boolean | zipping |
Fields inherited from class CollectJob |
---|
date, form_date, form_hday, form_time, hday, hour_stop, hour_strt, mnte_stop, mnte_strt, scnd_stop, scnd_strt, time, time_zone |
Constructor Summary | |
---|---|
NewsJob() | Constructor. |
Method Summary | | |
---|---|---|
private void | addToMemory(int hash) | Adds a given hash code to the memory of an individual NewsJob component. |
private void | addToMemory(String title) | Adds a given title to the memory of all NewsJob components. |
protected void | collect() | Collects data from a specific source. |
protected boolean | configureLocal(String job_key) | Configures the NewsJob component by parsing a configuration file for local component elements. |
private String | getAuthor() | Retrieves the author of an RSS entry. |
private String | getAuthor(com.sun.syndication.feed.synd.SyndEntry entry) | Retrieves the author of an RSS entry. |
private String | getDate() | Retrieves the publishing date of an RSS entry. |
private String | getDate(com.sun.syndication.feed.synd.SyndEntry entry) | Retrieves the publishing date of an RSS entry. |
private String | getLink(com.sun.syndication.feed.synd.SyndEntry entry) | Retrieves the link to the full text of an RSS entry. |
protected abstract org.slf4j.Logger | getLogger() | Retrieves the Logger of the component. |
private String | getText(com.sun.syndication.feed.synd.SyndEntry entry) | Retrieves the full text of an RSS entry. |
private String | getTitle(com.sun.syndication.feed.synd.SyndEntry entry) | Retrieves the title of an RSS entry. |
private boolean | inMemory(int hash) | Checks whether a given hash code is stored in the memory of an individual NewsJob component. |
private boolean | inMemory(String title) | Checks whether a given title is stored in the memory of all NewsJob components. |
protected boolean | loadLocal(org.quartz.JobExecutionContext context) | Configures the NewsJob component by loading configurations for local component elements stored in a JobDataMap, which is available during runtime of the component. |
protected abstract String | parseHTML(String url) | Parses the HTML page of a specified link into plain text. |
private void | saveToFile(com.sun.syndication.feed.synd.SyndEntry entry) | Parses and saves an RSS entry to a file using an XML format. |
protected void | storeLocal(org.quartz.JobExecutionContext context) | Stores the configuration for local component elements of the NewsJob component in a JobDataMap, which is available during runtime of the component. |
private void | zipFiles() | Zips output files whenever there is a date swap or the maximum number of files allowed in a single zip file (specified in the configuration file) is exceeded. |
Methods inherited from class CollectJob |
---|
execute |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected Integer max_hash
protected Integer max_retry
protected Integer max_title
protected Integer mhm_size
protected Integer num_msgs
protected Integer zip_buffer
protected Integer zip_files
protected LinkedList<Integer> mhm
protected String file_path
protected String file_ext
protected String rss_name
protected String rss_src
protected String txt_cls
private static boolean zipping
private static int mtm_size
private static int num_parts
private static int num_out
private static LinkedList<String> mtm
private static String cur_date
private static org.slf4j.Logger _log
Constructor Detail |
---|
public NewsJob()
Method Detail |
---|
protected boolean configureLocal(String job_key)
Configures the NewsJob component by parsing a configuration file for local component elements.
configureLocal in class CollectJob
Parameters: job_key - ID of the component.
Returns: Boolean value indicating whether parsed configuration is valid.

protected boolean loadLocal(org.quartz.JobExecutionContext context)
Configures the NewsJob component by loading configurations for local component elements stored in a JobDataMap, which is available during runtime of the component.
loadLocal in class CollectJob
Parameters: context - Execution context of the component.
Returns: Boolean value indicating whether loaded configuration is valid.

protected void storeLocal(org.quartz.JobExecutionContext context)
Stores the configuration for local component elements of the NewsJob component in a JobDataMap, which is available during runtime of the component.
storeLocal in class CollectJob
Parameters: context - Execution context of the component.

protected void collect()
Collects data from a specific source.
collect in class CollectJob
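
The storeLocal and loadLocal entries above describe a round trip of local settings through a JobDataMap. The following is a minimal sketch of that idea only, not the original implementation: a plain `java.util.Map` stands in for Quartz's string-keyed JobDataMap, and the class name `LocalConfigSketch`, the key names, and the choice of three fields are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a store/load round trip; not the original NewsJob code. */
final class LocalConfigSketch {
    String rssName;    // mirrors the field rss_name
    String rssSrc;     // mirrors the field rss_src
    Integer maxRetry;  // mirrors the field max_retry

    /** Copies the local settings into the map (the key names are assumed). */
    void storeLocal(Map<String, Object> dataMap) {
        dataMap.put("rss_name", rssName);
        dataMap.put("rss_src", rssSrc);
        dataMap.put("max_retry", maxRetry);
    }

    /** Reads the settings back; returns false if any is missing or has the wrong type. */
    boolean loadLocal(Map<String, Object> dataMap) {
        Object name = dataMap.get("rss_name");
        Object src = dataMap.get("rss_src");
        Object retry = dataMap.get("max_retry");
        if (!(name instanceof String) || !(src instanceof String) || !(retry instanceof Integer)) {
            return false;
        }
        rssName = (String) name;
        rssSrc = (String) src;
        maxRetry = (Integer) retry;
        return true;
    }

    public static void main(String[] args) {
        LocalConfigSketch first = new LocalConfigSketch();
        first.rssName = "example-feed";
        first.rssSrc = "http://example.org/rss";
        first.maxRetry = 3;

        // One execution stores its settings, a later one loads them again.
        Map<String, Object> dataMap = new HashMap<>();
        first.storeLocal(dataMap);

        LocalConfigSketch second = new LocalConfigSketch();
        System.out.println(second.loadLocal(dataMap) + " " + second.rssName);
    }
}
```

Returning a boolean from the load step matches the documented contract that the caller learns whether the loaded configuration is valid; what counts as valid in the real class is not stated here.
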
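private boolean inMemory(int hash)
Checks whether a given hash code is stored in the memory of an individual NewsJob component.
Parameters: hash - Hash code of entry to be checked.
Returns: Boolean value indicating whether hash is in own memory.

private boolean inMemory(String title)
Checks whether a given title is stored in the memory of all NewsJob components.
Parameters: title - Title of entry to be checked.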
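
The inMemory and addToMemory methods work on two memories: a hash memory owned by each component (the instance field mhm) and a title memory shared by all components (the static field mtm). The sketch below illustrates that split under stated assumptions: the class name `NewsMemorySketch` is hypothetical, max_hash and max_title are assumed to be capacity limits, and evicting the oldest element when a memory is full is an assumed policy that this page does not document.

```java
import java.util.LinkedList;

/** Hypothetical stand-in for the two memories of NewsJob; not the original code. */
final class NewsMemorySketch {
    // Shared by every instance, like the static field mtm (titles).
    private static final LinkedList<String> TITLE_MEMORY = new LinkedList<>();
    // Owned by one instance, like the instance field mhm (hash codes).
    private final LinkedList<Integer> hashMemory = new LinkedList<>();
    private final int maxHash;   // assumed meaning of max_hash: capacity of hashMemory
    private final int maxTitle;  // assumed meaning of max_title: capacity of TITLE_MEMORY

    NewsMemorySketch(int maxHash, int maxTitle) {
        if (maxHash < 1 || maxTitle < 1) {
            throw new IllegalArgumentException("capacities must be at least 1");
        }
        this.maxHash = maxHash;
        this.maxTitle = maxTitle;
    }

    boolean inMemory(int hash) {
        return hashMemory.contains(hash);
    }

    void addToMemory(int hash) {
        if (hashMemory.size() >= maxHash) {
            hashMemory.removeFirst(); // assumed policy: drop the oldest entry
        }
        hashMemory.addLast(hash);
    }

    boolean inMemory(String title) {
        synchronized (TITLE_MEMORY) {
            return TITLE_MEMORY.contains(title);
        }
    }

    void addToMemory(String title) {
        synchronized (TITLE_MEMORY) {
            if (TITLE_MEMORY.size() >= maxTitle) {
                TITLE_MEMORY.removeFirst(); // assumed policy: drop the oldest entry
            }
            TITLE_MEMORY.addLast(title);
        }
    }

    public static void main(String[] args) {
        NewsMemorySketch feedA = new NewsMemorySketch(2, 2);
        NewsMemorySketch feedB = new NewsMemorySketch(2, 2);
        feedA.addToMemory("Some headline");
        feedA.addToMemory("Some headline".hashCode());
        // The title is visible to both components, the hash only to feedA.
        System.out.println(feedB.inMemory("Some headline"));
        System.out.println(feedB.inMemory("Some headline".hashCode()));
    }
}
```

The shared title memory is guarded with `synchronized` because several jobs may run concurrently; whether the original class synchronizes access is not stated in this page.
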
private void addToMemory(int hash)
Adds a given hash code to the memory of an individual NewsJob component.
Parameters: hash - Hash code of entry to be added.

private void addToMemory(String title)
Adds a given title to the memory of all NewsJob components.
Parameters: title - Title of entry to be added.

private void saveToFile(com.sun.syndication.feed.synd.SyndEntry entry)
Parses and saves an RSS entry to a file using an XML format. The file name is built from date, rss_name and num_msgs, where date and num_msgs represent the current date and number of messages of an individual NewsJob component, and rss_name represents the name of an RSS feed, specified in the configuration file.
Parameters: entry - RSS entry to be parsed and saved to file.

private void zipFiles()
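Zips output files whenever there is a date swap or the maximum number of files allowed in a single zip file (specified in the configuration file) is exceeded. The zip file name has the form cur_date-num_parts, where cur_date represents the current date (which can be yesterday's date in case of a date swap) and num_parts represents the number of zip files that have been created by all NewsJob components.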
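
The zipping rule described for zipFiles (archive on a date swap or when the file limit is exceeded, and name the archive from cur_date and num_parts) can be sketched with `java.util.zip`. This is an illustration under assumptions, not the original code: the class name `ZipRotationSketch`, the method names, the `.zip` extension, the date format, and the reading of num_out as "files written since the last archive" are all hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch of the zip rotation rule; not the original NewsJob code. */
final class ZipRotationSketch {

    /** True on a date swap or when more files exist than one archive may hold. */
    static boolean shouldZip(String curDate, String today, int numOut, int zipFiles) {
        return !curDate.equals(today) || numOut > zipFiles;
    }

    /** Builds the archive name from cur_date and num_parts (the extension is assumed). */
    static String archiveName(String curDate, int numParts) {
        return curDate + "-" + numParts + ".zip";
    }

    /** Writes the given files into one archive inside dir and returns its path. */
    static Path zip(Path dir, String curDate, int numParts, List<Path> files) throws IOException {
        Path archive = dir.resolve(archiveName(curDate, numParts));
        try (OutputStream out = Files.newOutputStream(archive);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
        return archive;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("newsjob-sketch");
        Path a = Files.write(dir.resolve("a.xml"), "<entry/>".getBytes(StandardCharsets.UTF_8));
        Path b = Files.write(dir.resolve("b.xml"), "<entry/>".getBytes(StandardCharsets.UTF_8));
        // The stored date differs from today's date, so the pending files are archived.
        if (shouldZip("2024-01-01", "2024-01-02", 2, 100)) {
            System.out.println(zip(dir, "2024-01-01", 1, Arrays.asList(a, b)).getFileName());
        }
    }
}
```

Keeping the decision (`shouldZip`) separate from the I/O (`zip`) makes the date-swap case easy to check in isolation: the archive is named after the stored date, which is why it can carry yesterday's date.
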
private String getAuthor()
Retrieves the author of an RSS entry.
Returns: String value representing the name of an RSS feed.

private String getAuthor(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the author of an RSS entry.
Parameters: entry - RSS entry to be parsed.
Returns: String value representing the author of an RSS entry.

private String getDate()
Retrieves the publishing date of an RSS entry.
Returns: String value representing the current date and time.

private String getDate(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the publishing date of an RSS entry.
Parameters: entry - RSS entry to be parsed.
Returns: String value representing the publishing date of an RSS entry.

private String getLink(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the link to the full text of an RSS entry.
Parameters: entry - RSS entry to be parsed.
Returns: String value representing the link to the full text of an RSS entry.

private String getText(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the full text of an RSS entry.
Parameters: entry - RSS entry to be parsed.
Returns: String value representing the full text of an RSS entry.

private String getTitle(com.sun.syndication.feed.synd.SyndEntry entry)
Retrieves the title of an RSS entry.
Parameters: entry - RSS entry to be parsed.
Returns: String value representing the title of an RSS entry.

protected abstract org.slf4j.Logger getLogger()
Retrieves the Logger of the component.
getLogger in class CollectJob
Returns: Logger used for logging system output.

protected abstract String parseHTML(String url)
Parses the HTML page of a specified link into plain text.
Parameters: url - Link to be parsed.
Returns: String value representing the full text.
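
Because parseHTML is abstract, each concrete job has to supply its own conversion from an HTML page to plain text. The sketch below shows only the conversion step for markup that has already been downloaded; fetching the page behind the url is left out. The class name `HtmlTextSketch` and the method `toPlainText` are hypothetical, and the regex-based stripping is a deliberately crude assumption: it handles only a handful of entities and would not cope with malformed markup the way a real HTML parser does.

```java
/** Hypothetical sketch of an HTML-to-text step; not the original NewsJob code. */
final class HtmlTextSketch {

    /** Removes script/style blocks and tags, decodes a few entities, collapses whitespace. */
    static String toPlainText(String html) {
        String text = html
                // Drop script and style elements together with their content.
                .replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ")
                // Replace every remaining tag with a space so words do not run together.
                .replaceAll("(?s)<[^>]*>", " ")
                // Decode a small set of common entities; "&amp;" goes last to avoid double decoding.
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
                + "<body><h1>Title</h1><p>Hello &amp; goodbye</p></body></html>";
        System.out.println(toPlainText(html));
    }
}
```

A concrete subclass would more likely select the article body by a site-specific rule (the field txt_cls may play that role, though this page does not say so) before reducing it to text.
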