text
Parsing to XHTML
Fetching just certain bits of the XHTML
Custom Content Handlers
Extract phone numbers from content into the metadata
Streaming the plain text in chunks
Translation
Translation using the Microsoft Translation API
Language Identification
Additional Examples

Parsing

Tika provides...