org.opencms.search.extractors
Class CmsExtractorMsWord
java.lang.Object
org.opencms.search.extractors.A_CmsTextExtractor
org.opencms.search.extractors.A_CmsTextExtractorMsOfficeBase
org.opencms.search.extractors.CmsExtractorMsWord
- All Implemented Interfaces:
- POIFSReaderListener, I_CmsTextExtractor
public final class CmsExtractorMsWord
- extends A_CmsTextExtractorMsOfficeBase
Extracts the text from an MS Word document.
- Since:
- 6.0.0
- Version:
- $Revision: 1.8 $
- Author:
- Alexander Kandzior
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getExtractor
public static I_CmsTextExtractor getExtractor()
- Returns an instance of this text extractor.
- Returns:
- an instance of this text extractor
extractText
public I_CmsExtractionResult extractText(InputStream in,
String encoding)
throws Exception
- Description copied from interface:
I_CmsTextExtractor
- Extracts the text and meta information from the document on the input stream, using the specified content encoding.
The encoding is a hint for the text extractor. If the value given is null,
the text extractor should try to determine the encoding itself.
- Specified by:
extractText in interface I_CmsTextExtractor
- Overrides:
extractText in class A_CmsTextExtractor
- Parameters:
in - the input stream for the document to extract the text from
encoding - the encoding to use, or null to let the extractor determine it
- Returns:
- the extracted text and meta information
- Throws:
Exception - if the text extraction fails
- See Also:
I_CmsTextExtractor.extractText(java.io.InputStream, java.lang.String)
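The calling pattern described above (obtain an extractor through the static factory, pass it an input stream plus an optional encoding hint, then read the result) can be sketched as follows. This is a minimal, self-contained illustration and does not use the OpenCms classes themselves: TextExtractor and ExtractionResult are hypothetical stand-ins that mirror only the documented extractText(InputStream, String) signature, getContent() is an assumed accessor on the result, and PlainTextExtractor is a toy implementation that decodes bytes rather than parsing a Word document. The UTF-8 fallback for a null encoding and the shared single instance are also assumptions made for the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExtractorUsageSketch {

    /** Hypothetical stand-in for I_CmsExtractionResult; getContent() is an assumption. */
    interface ExtractionResult {
        String getContent();
    }

    /** Hypothetical stand-in mirroring the documented extractText(InputStream, String). */
    interface TextExtractor {
        ExtractionResult extractText(InputStream in, String encoding) throws Exception;
    }

    /** Toy extractor that only decodes bytes; it is NOT a Word parser. */
    static final class PlainTextExtractor implements TextExtractor {
        // Assumption: the factory hands out one shared instance.
        private static final TextExtractor INSTANCE = new PlainTextExtractor();

        private PlainTextExtractor() {
        }

        /** Mirrors the static getExtractor() factory documented above. */
        static TextExtractor getExtractor() {
            return INSTANCE;
        }

        @Override
        public ExtractionResult extractText(InputStream in, String encoding) throws Exception {
            // The encoding is only a hint: fall back to UTF-8 (an assumption) when it is null.
            Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
            String text = new String(in.readAllBytes(), charset);
            return () -> text;
        }
    }

    /** Runs one extraction and wraps the broad checked Exception for the caller. */
    static String extract(byte[] document, String encoding) {
        try (InputStream in = new ByteArrayInputStream(document)) {
            return PlainTextExtractor.getExtractor().extractText(in, encoding).getContent();
        } catch (Exception e) {
            throw new IllegalStateException("text extraction failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // Explicit encoding hint.
        System.out.println(extract(latin1, "ISO-8859-1"));
        // Null hint: the extractor chooses the encoding itself.
        System.out.println(extract("plain".getBytes(StandardCharsets.UTF_8), null));
    }
}
```

Because extractText is declared with throws Exception, a caller has to handle the broadest checked type. The sketch wraps it in an unchecked exception at a single point so that the rest of the calling code stays free of checked-exception handling; real code might instead log the failure and skip the document.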