Package backend.logic

Class OCRService

java.lang.Object
backend.logic.OCRService

public class OCRService extends Object
Service class for performing Optical Character Recognition (OCR) on PDF and image files. Extracts raw text, invoice amount, invoice date, and invoice category.
  • Constructor Details

    • OCRService

      public OCRService()
      Initializes the Tesseract OCR engine with the specified data path and languages.
  • Method Details

    • extractText

      public String extractText(File file) throws IOException, net.sourceforge.tess4j.TesseractException
      Extracts the text content from a PDF or image file using OCR.
      Parameters:
      file - the file to process
      Returns:
      the extracted text
      Throws:
      IOException - if the file cannot be read
      net.sourceforge.tess4j.TesseractException - if OCR fails
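Because extractText accepts both PDF and image files, a caller may want to reject unsupported inputs before running OCR. The following is a minimal sketch of such a check, not the logic OCRService uses: the class InputKindSketch, the enum Kind, and the list of accepted extensions are all hypothetical.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper; not part of backend.logic.OCRService.
class InputKindSketch {

    enum Kind { PDF, IMAGE, UNSUPPORTED }

    // Assumed set of image extensions; the formats OCRService really accepts are not documented here.
    private static final String[] IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"};

    // Classifies a file by its name only; the file content is never opened.
    static Kind classify(File file) {
        String name = file.getName().toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            return Kind.PDF;
        }
        for (String extension : IMAGE_EXTENSIONS) {
            if (name.endsWith(extension)) {
                return Kind.IMAGE;
            }
        }
        return Kind.UNSUPPORTED;
    }
}
```

A caller could skip files classified as UNSUPPORTED and pass only the others to extractText. Checking the extension is cheap but trusts the file name, so a mislabeled file would still reach the OCR engine.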
    • extractData

      public Invoice extractData(File file) throws net.sourceforge.tess4j.TesseractException, IOException
      Extracts invoice data (text, amount, date, category) from a file.
      Parameters:
      file - the file to process
      Returns:
      the extracted invoice object
      Throws:
      IOException - if file reading fails
      net.sourceforge.tess4j.TesseractException - if OCR fails
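This page does not say how extractData derives the amount and date from the recognized text. One common approach is pattern matching on the raw OCR output, sketched below under stated assumptions: the class InvoiceFieldSketch, both regular expressions, and the dd.MM.yyyy date format are hypothetical and are not taken from OCRService. Category detection is left out.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of backend.logic.OCRService.
class InvoiceFieldSketch {

    // Assumed amount format: digits, a '.' or ',' separator, exactly two decimals
    // (e.g. "49,90"). The lookarounds keep date fragments such as "12.03" in
    // "12.03.2024" from being read as an amount.
    private static final Pattern AMOUNT =
            Pattern.compile("(?<![\\d.,])(\\d+)[.,](\\d{2})(?![.,]?\\d)");

    // Assumed date format: dd.MM.yyyy
    private static final Pattern DATE = Pattern.compile("\\b(\\d{2}\\.\\d{2}\\.\\d{4})\\b");
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    // Returns the first amount-like token in the text, if any.
    static Optional<BigDecimal> findAmount(String text) {
        Matcher matcher = AMOUNT.matcher(text);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(new BigDecimal(matcher.group(1) + "." + matcher.group(2)));
    }

    // Returns the first date-like token that parses as a date, if any.
    static Optional<LocalDate> findDate(String text) {
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            try {
                return Optional.of(LocalDate.parse(matcher.group(1), DATE_FORMAT));
            } catch (DateTimeParseException e) {
                // The token looked like a date but could not be parsed (e.g. month 13); keep scanning.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String text = "Rechnung\nDatum: 12.03.2024\nGesamt: 49,90 EUR";
        System.out.println(findAmount(text));
        System.out.println(findDate(text));
    }
}
```

Taking the first match is a simplification. Real invoices often list several amounts (net, tax, gross), so a production version would likely anchor on keywords such as "Total" or pick the largest value.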
    • getTesseract

      public net.sourceforge.tess4j.ITesseract getTesseract()