Package backend.logic
Class OCRService
java.lang.Object
backend.logic.OCRService
Service class for performing Optical Character Recognition (OCR) on PDF and image files.
Extracts raw text, invoice amount, invoice date, and invoice category.
Constructor Summary
OCRService(): Initializes the Tesseract OCR engine with the specified data path and languages.
Method Summary
extractData(File file): Extracts invoice data (text, amount, date, category) from a file.
extractText(File file): Extracts the text content from a PDF or image file using OCR.
net.sourceforge.tess4j.ITesseract getTesseract()
Constructor Details
OCRService
public OCRService()
Initializes the Tesseract OCR engine with the specified data path and languages.
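The constructor's documentation mentions a data path and a set of languages but not their values. As a hedged sketch, the hypothetical `OcrConfig` class below (not part of `backend.logic`) shows one way such settings could be resolved before the engine is built: the tessdata directory is taken from the `TESSDATA_PREFIX` environment variable when it is set and otherwise from a fallback, and multiple languages are joined with `+`, the separator Tesseract uses for combined language models (for example `deu+eng`).

```java
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical holder for the OCR settings the constructor refers to.
 * Not part of backend.logic; names and defaults are illustrative.
 */
final class OcrConfig {
    private final Path dataPath;
    private final List<String> languages;

    OcrConfig(Path dataPath, List<String> languages) {
        if (languages == null || languages.isEmpty()) {
            throw new IllegalArgumentException("at least one language is required");
        }
        this.dataPath = dataPath;
        this.languages = List.copyOf(languages);
    }

    /**
     * Uses the value of TESSDATA_PREFIX (passed in by the caller) when it is set
     * and non-blank, otherwise the fallback directory.
     */
    static OcrConfig resolve(String tessdataPrefix, Path fallback, List<String> languages) {
        Path path = (tessdataPrefix == null || tessdataPrefix.isBlank())
                ? fallback
                : Path.of(tessdataPrefix);
        return new OcrConfig(path, languages);
    }

    String dataPathString() {
        return dataPath.toString();
    }

    /** Joins languages with '+', Tesseract's separator for combined models, e.g. "deu+eng". */
    String languageSpec() {
        return String.join("+", languages);
    }
}
```

With tess4j, the two strings would then be handed to `ITesseract#setDatapath` and `ITesseract#setLanguage`, for example after `OcrConfig.resolve(System.getenv("TESSDATA_PREFIX"), Path.of("tessdata"), List.of("deu", "eng"))`. The fallback path and language list here are assumptions, not values taken from `OCRService`.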
Method Details
extractText
Extracts the text content from a PDF or image file using OCR.
Parameters:
file - the file to process
Returns:
the extracted text
Throws:
IOException - if the file cannot be read
net.sourceforge.tess4j.TesseractException - if OCR fails
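`extractText` accepts either a PDF or an image, but the documentation does not say how the two are told apart. One possible approach, sketched below under that assumption, is to read the file's first bytes and compare them with well-known signatures: `%PDF` for PDF, `\x89PNG` for PNG, `FF D8 FF` for JPEG, and `II*\0` or `MM\0*` for TIFF. `InputKind` is a hypothetical helper; note that reading the header can itself throw `IOException`, which matches the documented failure mode.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

/** Hypothetical helper that classifies an OCR input by its leading "magic" bytes. */
final class InputKind {
    enum Kind { PDF, PNG, JPEG, TIFF, UNKNOWN }

    /** Reads up to 8 header bytes; throws IOException if the file cannot be read. */
    static Kind detect(File file) throws IOException {
        try (InputStream in = Files.newInputStream(file.toPath())) {
            return detect(in.readNBytes(8));
        }
    }

    static Kind detect(byte[] header) {
        if (startsWith(header, 0x25, 0x50, 0x44, 0x46)) return Kind.PDF;   // "%PDF"
        if (startsWith(header, 0x89, 0x50, 0x4E, 0x47)) return Kind.PNG;   // 0x89 "PNG"
        if (startsWith(header, 0xFF, 0xD8, 0xFF)) return Kind.JPEG;        // JPEG SOI marker
        if (startsWith(header, 0x49, 0x49, 0x2A, 0x00)                     // "II*\0" little-endian TIFF
                || startsWith(header, 0x4D, 0x4D, 0x00, 0x2A)) {           // "MM\0*" big-endian TIFF
            return Kind.TIFF;
        }
        return Kind.UNKNOWN;
    }

    /** Compares the header against a signature, treating bytes as unsigned. */
    private static boolean startsWith(byte[] header, int... signature) {
        if (header.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((header[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }
}
```

Because Tesseract works on raster images, a caller would typically render `PDF` inputs to images before OCR and pass image kinds straight through; rejecting `UNKNOWN` early gives a clearer error than a failure inside the OCR engine.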
extractData
Extracts invoice data (text, amount, date, category) from a file.
Parameters:
file - the file to process
Returns:
the extracted invoice object
Throws:
IOException - if reading the file fails
net.sourceforge.tess4j.TesseractException - if OCR fails
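The documentation does not describe how `extractData` derives the amount, date, and category from the OCR text. The sketch below shows one plausible approach based on regular expressions and keyword lookup. The class name `InvoiceFieldParser`, the amount keywords (`Total`, `Summe`, `Betrag`), the accepted date formats (`yyyy-MM-dd` and `dd.MM.yyyy`), and the category names and keywords are all hypothetical, and the result is not the service's actual invoice type.

```java
import java.math.BigDecimal;
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for the fields extractData returns besides the raw text.
 * Keywords, patterns and categories are illustrative assumptions only.
 */
final class InvoiceFieldParser {
    // A number following an amount keyword, e.g. "Total: 1.234,56 EUR" or "Summe 99,90".
    private static final Pattern AMOUNT = Pattern.compile(
            "(?i)(?:total|summe|betrag)[^0-9]{0,20}(\\d(?:[\\d.,]*\\d)?)");
    private static final Pattern ISO_DATE = Pattern.compile("\\b(\\d{4})-(\\d{2})-(\\d{2})\\b");
    private static final Pattern DOT_DATE = Pattern.compile("\\b(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})\\b");

    // Insertion order decides which category wins when several keywords match.
    private static final Map<String, List<String>> CATEGORIES = new LinkedHashMap<>();
    static {
        CATEGORIES.put("FUEL", List.of("tankstelle", "diesel", "benzin"));
        CATEGORIES.put("FOOD", List.of("restaurant", "bistro", "catering"));
        CATEGORIES.put("OFFICE", List.of("toner", "papier", "office"));
    }

    static Optional<BigDecimal> amount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? Optional.of(new BigDecimal(normalize(m.group(1)))) : Optional.empty();
    }

    /** Treats the last '.' or ',' as the decimal separator only if exactly two digits follow it. */
    static String normalize(String raw) {
        int sep = Math.max(raw.lastIndexOf('.'), raw.lastIndexOf(','));
        if (sep >= 0 && raw.length() - sep - 1 == 2) {
            return raw.substring(0, sep).replaceAll("[.,]", "") + "." + raw.substring(sep + 1);
        }
        return raw.replaceAll("[.,]", ""); // otherwise separators are thousands groupings
    }

    /** Returns the first ISO date if present, else the first dd.MM.yyyy date. */
    static Optional<LocalDate> date(String text) {
        Matcher iso = ISO_DATE.matcher(text);
        if (iso.find()) return toDate(iso.group(1), iso.group(2), iso.group(3));
        Matcher dot = DOT_DATE.matcher(text);
        if (dot.find()) return toDate(dot.group(3), dot.group(2), dot.group(1));
        return Optional.empty();
    }

    private static Optional<LocalDate> toDate(String y, String m, String d) {
        try {
            return Optional.of(LocalDate.of(Integer.parseInt(y), Integer.parseInt(m), Integer.parseInt(d)));
        } catch (DateTimeException e) {
            return Optional.empty(); // impossible date, e.g. an OCR misread like "31.02.2024"
        }
    }

    static String category(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, List<String>> e : CATEGORIES.entrySet()) {
            for (String keyword : e.getValue()) {
                if (lower.contains(keyword)) return e.getKey();
            }
        }
        return "OTHER";
    }
}
```

Returning `Optional` keeps OCR misreads (a missing total, an impossible date) distinguishable from valid values, so a caller could flag the invoice for manual review instead of storing a wrong figure. Amounts use `BigDecimal` to avoid binary floating-point rounding.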
getTesseract
public net.sourceforge.tess4j.ITesseract getTesseract()