This function takes an input file of one of several supported types and extracts textual information from it for further processing. It is the central API through which a wide variety of documents can be uploaded and decoded.
Use this API to decode and extract information from document types such as images, spreadsheets, or PDF documents.
Typically, this API is called by other functions to assist them. It can be called directly, but the output it generates is designed to be consumed and further refined. It may also take a noticeable time to process, depending on the type of file loaded.
Related API: gds.documentgen.decode.file analyses the file type (PDF, DOCX, PNG ...) but does not extract any content.
Try it now
<form method="POST" action="/gnap/M/buck" target='xxx' enctype="multipart/form-data"> <input type='hidden' name='f3_s' value='gds.documentgen.extract.file'> Input File <input type=file name='f110_s'> <input type='submit' value='Execute'> </form>
Select a file, such as Excel (xlsx), PDF, PNG, TIFF, JPG or BMP, and the API will extract and decode it. Results will open in a new window.
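The same request can be prepared from a script. The following is a minimal sketch rather than a definitive client: it builds, but does not send, a multipart/form-data POST that mirrors the form. The field names f3_s and f110_s and the path /gnap/M/buck come from the form itself; the host https://example.invalid, the helper name build_extract_request, the placeholder file bytes, and the absence of any authentication are assumptions made for illustration.

```python
import uuid
import urllib.request

API_NAME = "gds.documentgen.extract.file"


def build_extract_request(base_url, filename, file_bytes):
    """Build (but do not send) a multipart POST mirroring the HTML form."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        # Hidden field from the form: selects which API to run.
        delimiter,
        b'Content-Disposition: form-data; name="f3_s"',
        b"",
        API_NAME.encode(),
        # File field from the form: the document to decode.
        delimiter,
        f'Content-Disposition: form-data; name="f110_s"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        # Closing delimiter.
        delimiter + b"--",
        b"",
    ]
    return urllib.request.Request(
        base_url + "/gnap/M/buck",
        data=b"\r\n".join(parts),
        headers={"Content-Type": "multipart/form-data; boundary=" + boundary},
        method="POST",
    )


# Hypothetical host and placeholder bytes; substitute your own server and file.
request = build_extract_request("https://example.invalid", "invoice.png", b"\x89PNG...")
# To send it: urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to inspect the body before anything goes over the network.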
Input Specification
This API can decode the following filetypes:
xlsx, csv, images (png, tiff, jpeg, jfif, bmp, tga, gif ....)
eml, txt, pdf
Output Specification
The output of this API is a structure describing both the document and optionally its decoded meaning.
<DATS>
  <DecodeMethods>1</DecodeMethods>
  <DATS>...</DATS> - Array of FIFB blocks
  <DOCU>...</DOCU> - Document fields
  <DOCC>...</DOCC> - Cleaned document fields
  <GRID>...</GRID> - Layout information
</DATS>
Top Level Fields
Field# | Name | Description | Example(s) |
f120 | DecodeMethods | A bitmask of which techniques were used to extract the information from the document. | |
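Because DecodeMethods is a bitmask, several techniques can be reported in a single number. This page does not list which bit corresponds to which technique, so the sketch below only shows how to find which bit positions are set; the helper names and the sample value 5 are illustrative assumptions.

```python
def set_bits(mask):
    """Return the positions (0 = least significant) of the bits set in mask."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def uses_method(mask, bit):
    """True if the technique assigned to the given bit position was used."""
    return bool(mask & (1 << bit))


decode_methods = 5  # hypothetical value: binary 101
print(set_bits(decode_methods))  # [0, 2]
```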
This function allows you to upload an Excel spreadsheet; the system decodes it and returns a data packet with the contents. Excel is an advanced product, and the returned representation will not contain all of the spreadsheet's attributes and abilities; this function is primarily about extracting tables of information.
gds.documentgen.extract.spreadsheet
gds.documentgen.extract.file
This function allows you to upload a complete file and have the system attempt to decode the file and its contents, without updating any part of the system. You can upload XLSX, JPEG, PNG, TIFF and GIF files. If the document is an image, the system automatically calls OCR routines to read the document first.
The invoice at the left has been photographed, and the highlights mark the fields that gds.documentgen.extract.file will attempt to decode. For clarity, only some fields are shown.
Call Arguments
f110 | Actual contents of file. |
f112 | Request Processing Options |
f113 | Expected type of document. Setting this directs the system towards what type of document is expected. |
f114 | PhysKey referring to the document |
Return Data
Several arrays of information about the document are returned, but these are limited to what can be gleaned without reference to your data. Essentially, this function reads the document without applying any context awareness, such as locating product names. If you need context awareness, call retailmax.elink.utility.document.decode, which internally calls this function and then applies further analysis to decode the document.
DOCU & DOCC Structure
This structure describes a document in terms of items found on the page, such as "date" or "invoice number". The DOCU structure is identical to the DOCC; the only difference is that the DOCU contains raw data, while the DOCC contains data cleaned in the way a reasonable person might clean it. For example, DOCU will report a number as 17.BB, while the DOCC might convert this to 17.88. DOCC may also use information from other scanned documents to complete its information. For example, a GST number that scans as "88-45Be~$y" may be presented as "88-458-126" in the DOCC.
Field# | Name | Description | Example(s) |
f100 | | Holds Physkey when this record is stored in a database file | |
f101 | | Datetime this document was created | |
f108 | DocTypeCode | Number indicating the type of document we best believe this is. 100=Invoice, 101=Order Confirmation, 102=OCR Test Page, 103=ASN | 100 |
f109 | DocTypeName | Text version of f108. | Invoice |
f110 | InvoiceNumber | The reference number of an invoice | 0041 |
f111 | Date | The date on this document | 24/05/14 |
f112 | TaxIdNumber | GST registered number. This is supplied without formatting characters, but you should be prepared for this rule to not be honoured. | 27797318 |
f119 | | OCR quality score. A value from 0 to 100% indicating how well we think the OCR process worked. | |
f120 | GrandTotal | The grand total value of an invoice | |
f121 | Tax1 | GST total | |
f122 | SubTotal | Sub total | |
f123 | | Freight charge amount | |
f130 | | The primary email on this document. This is not designed to extract any random email but rather to identify what appears to be the author's email on this document. | |
f131 | | Telephone number | |
f132 | | Fax number | |
f133 | Issuer | Issuing party name, e.g. the name of the supplier on an invoice | |
f134 | PurchaseOrderNumber | Purchase order number. | |
f135 | | Comments or remarks found on the document. | |
f136 | | "Our Reference". Some documents have an "our reference" field in addition to the invoice number etc.; this field contains that additional value. | |
f137 | | Account number | |
f142 | | Website | |
f143 | | Mobile | |
f144 | | Contact name | |
f145 | | Address of document creator | |
f160 | DeliveryAddress | Address the document was sent to (street or postal, not email). Contains the "ship-to" address if both ship-to and bill-to addresses are specified. If only one address is present, this field contains the address. | |
f161 | | Bill-to address, but only loaded if f160 also has a value. If only one address is specified, it is stored in f160. | |
f300 | | GST rate. | 15 |
f301 | | GST number | 123-454-6789 |
f302 | | Due date for invoice | |
f310 | | Bank account number | |
f311 | | Bank name | |
f312 | | Bank branch | |
f313 | | Payment instructions text, if special instructions were present on the document | Please quote ABC123 on payment |
f350 | | MAF packhouse id (New Zealand) | PH531 |
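Since DOCC mirrors DOCU field for field, a consumer will often want the cleaned value when one exists and the raw value otherwise. The sketch below illustrates that pattern, together with a defensive clean-up of f112, whose description warns that formatting characters may slip through. It assumes each structure has already been parsed into a dictionary keyed by field number; that representation, the helper names, and the sample values are all assumptions, not part of this API.

```python
import re


def best_value(docu, docc, field):
    """Prefer the cleaned DOCC value; fall back to the raw DOCU value."""
    cleaned = docc.get(field)
    if cleaned not in (None, ""):
        return cleaned
    return docu.get(field)


def normalise_tax_id(value):
    """Strip everything that is not a digit from a tax id (f112)."""
    return re.sub(r"\D", "", value or "")


# Invented sample data in the assumed dictionary form.
docu = {"f112": "27-797-318", "f120": "17.BB", "f133": "Acme Ltd"}
docc = {"f112": "27-797-318", "f120": "17.88"}

print(best_value(docu, docc, "f120"))                     # 17.88
print(best_value(docu, docc, "f133"))                     # Acme Ltd
print(normalise_tax_id(best_value(docu, docc, "f112")))   # 27797318
```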
LINE Structure
This packet is a subtype of DOCU and DOCC. It holds the repeating lines on documents such as invoices and packing slips.
Field# | Name | Description | Example |
f200 | Pid | Product Id in POS if already known. | |
f204 | ActualQtyUnits | Actual quantity, ideally in units if possible | |
f240 | EachPrice | Each price (price of single qty) excluding tax | |
f241 | | Discount amount in money | |
f242 | | Discount percentage | |
f243 | RawNetPrice | Raw net price | |
f244 | | Raw line total | |
f245 | | Sale discount amount per each | |
f246 | | Sale discount amount per line total | |
f247 | | Final net line total | |
f260 | | Each price (price of single qty) including tax | |
f261 | | Discount amount in money | |
f262 | | Discount percentage | |
f263 | | Raw net price | |
f264 | | Raw line total | |
f265 | | Sale discount amount per each | |
f266 | | Sale discount amount per line total | |
f267 | | Final net line total | |
f300 | SupplierPartCode | Supplier's part code | |
f303 | | Item barcode. This is a retail-level barcode, not a trade unit | |
f304 | OrderQtyUnits | Order quantity in units, not outers | |
f306 | | Outer packing type in words, such as "CTN6" or "Carton 6". This text is not standardised and can change from supplier to supplier. | |
f308 | | Item name in supplier's terms | |
f309 | | Return date. Some invoices specify a return date per line item. This is the date by which unsold items should be returned | 12-feb-2015 |
f420 | RRP | RRP | |
f1830 | SupPartPhyskey | Physkey for supplier's part code. | |
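Because line values come from OCR, it can be worth cross-checking them before use. The sketch below compares quantity times each price against the raw line total. It assumes that f244 equals f204 multiplied by f240 (tax-exclusive, before discounts), which this page does not state, and that a line has been parsed into a dictionary of strings keyed by field number. The helper name and tolerance are illustrative.

```python
from decimal import Decimal


def line_total_matches(line, tolerance=Decimal("0.01")):
    """Check the ASSUMED relation f244 == f204 * f240 for one LINE packet.

    Returns None when any of the three fields is missing.
    """
    qty, each, total = line.get("f204"), line.get("f240"), line.get("f244")
    if None in (qty, each, total):
        return None  # not enough data to check
    expected = Decimal(qty) * Decimal(each)
    return abs(expected - Decimal(total)) <= tolerance


# Invented sample line in the assumed dictionary form.
line = {"f204": "6", "f240": "2.50", "f244": "15.00"}
print(line_total_matches(line))  # True
```

Decimal is used rather than float so that currency amounts compare exactly. A raw value that is not numeric, such as 17.BB, would raise decimal.InvalidOperation, so a check like this is better suited to values that have already been cleaned.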
AGNT Structure
This packet is a subtype of DOCU and is added by agent programs that created or altered this DOCU structure. The AGNT block allows programs to see which programs and versions supplied information.
Field# | Name | Description | Example |
f110 | Name | Short and friendly name of the Agent. | Suppliers_NZ |
f111 | BuildDate | Date and time the code was built. Ideally this should be automatically inserted using the preprocessor macros __DATE__ " " __TIME__ if these exist in the language used by the agent. | 23-mar-2015 11:09:34 |
f112 | Version | Single increasing number containing the version of code. This must be a single number, not 10.3 style version numbering. | 1283 |
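A consumer that records which agent supplied a value may want to validate these fields. The sketch below parses a BuildDate in the format of the example above and checks that Version is a single whole number. The helper names are hypothetical, and the assumption that every agent emits BuildDate exactly as in the example is not confirmed by this page. Month names are mapped by hand so the result does not depend on the system locale.

```python
from datetime import datetime

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def parse_build_date(text):
    """Parse a BuildDate such as '23-mar-2015 11:09:34' into a datetime."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hour, minute, second = (int(piece) for piece in time_part.split(":"))
    return datetime(int(year), MONTHS[month.lower()], int(day), hour, minute, second)


def is_valid_version(text):
    """True for a single whole number such as '1283', False for '10.3'."""
    return text.isascii() and text.isdigit()


print(parse_build_date("23-mar-2015 11:09:34"))  # 2015-03-23 11:09:34
print(is_valid_version("1283"))                  # True
print(is_valid_version("10.3"))                  # False
```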