Importing files (XLSX, CSV, XML & JSON) to the database using Oracle APEX

Introduction

In a recent project I was involved in, I was tasked with handling a data load that was to be performed by external system users. There are several interesting features in APEX and Oracle databases to assist with this type of requirement, such as the UTL_FILE package, the LOAD_DATA API, External tables, etc. Today, I will focus on a straightforward solution: the APEX_DATA_PARSER API.

Presentation of the APEX_DATA_PARSER

This API was introduced in APEX 19.1 and, in general, it simplifies importing files such as XLSX, CSV, JSON, and XML into the database. It also provides tools for analyzing the data in these files through an embedded file profile.

Data used in the demonstration:

Main utilities

  • Function DISCOVER: Returns the file profile, detailing its metadata and import settings.
select apex_data_parser.discover(
           p_content => f.blob_content,
           p_file_name=> f.file_name) as profile_json
from files f
where id = :id;

--Example of the profile returned for an XLSX file:
/*
{
   "file-type":1,
   "file-encoding":"AL32UTF8",
   "headings-in-first-row":true,
   "xlsx-worksheet":"sheet1.xml",
   "csv-enclosed":"\"",
   "force-trim-whitespace":true,
   "columns":[
      {
         "name":"NAME",
         "data-type":1,
         "data-type-len":100,
         "selector":"NAME",
         "is-json":false
      },
      {
         "name":"AGE",
         "data-type":2,
         "selector":"AGE",
         "is-json":false
      },
      {
         "name":"SECRET_IDENTITY",
         "data-type":1,
         "data-type-len":50,
         "selector":"SECRET_IDENTITY",
         "is-json":false
      }
   ],
   "parsed-rows":7
}
*/
  • Function GET_COLUMNS: Returns the columns of a profile (obtained with DISCOVER, above); very useful for validating whether the file meets expectations.
select *
  from table( apex_data_parser.get_columns(
                  apex_data_parser.discover(
                      p_content   => :BLOB_CONTENT,
                      p_file_name => :FILE_NAME ) ) );

  • Function GET_XLSX_WORKSHEETS: Returns the names and sequence of the sheets in an XLSX file. Particularly useful when we need to navigate between sheets in an XLSX workbook; a quick query is sketched below.
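To see what GET_XLSX_WORKSHEETS returns, it can be queried like any other table function. The sketch below is a minimal example, assuming the same FILES table used in the DISCOVER example above; each sheet it lists (such as sheet1.xml in the profile shown earlier) can then be handed to PARSE through its p_xlsx_sheet_name parameter.

--Minimal sketch, assuming the FILES table used in the DISCOVER example
select w.*
  from files f,
       table( apex_data_parser.get_xlsx_worksheets(
                  p_content => f.blob_content ) ) w
 where f.id = :id;

--A sheet listed here (e.g. 'sheet1.xml') can then be parsed explicitly:
--  apex_data_parser.parse( ..., p_xlsx_sheet_name => 'sheet1.xml', ... )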


APEX_DATA_PARSER.PARSE

This is the main function of the API, returning a generic table that delivers the data from the file.
Basic example of use:

--Assuming I have a FILES table with stored files
--The generic table returned by PARSE has 300 VARCHAR2 columns (COL001..COL300)
--and 10 CLOB columns
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
  FROM files f,
       TABLE(
             APEX_DATA_PARSER.parse(
               p_content   => f.blob_content,
               p_file_name => f.file_name,
               p_skip_rows => 1,
               p_max_rows  => 500 --default is 200
             )
           )
 WHERE f.file_name = :P8_FILE_NAME;

Query using only APEX resources:

Make sure the "file browser" page item's Storage Type is set to "Table APEX_APPLICATION_TEMP_FILES":

--Query consuming only APEX resources
--Remember: APEX_APPLICATION_TEMP_FILES is only populated when the page is submitted
--The file browser item needs Session State = Per Session (Persistent)
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
  FROM apex_application_temp_files f,
       TABLE(
             APEX_DATA_PARSER.parse(
               p_content   => f.blob_content,
               p_file_name => f.filename,
               p_skip_rows => 1,
               p_max_rows  => 500 --default is 200
             )
           )
 WHERE f.name = :PX_FILE; --file browser item, pointing to the file recorded in APEX_APPLICATION_TEMP_FILES

Practical Example

For this demonstration, I created a file import region above an employee report.

  • When selecting the file and clicking the “Import” button, we will consume employee data from the XLSX file, validating it and adding it to the employee table.

    File used:

    On the page, under the processes tab, I created a new process under the “processing” group:

Declare
    l_EXIST_EMPLOYEE integer default 0; 
    l_COUNT_RECORDS_BEFORE number;
    l_COUNT_RECORDS_AFTER  number;

    cursor l_cursor is
        select line_number line, col001, col002, col003, col004, col005, 
                    col006, col007, col008, col009, col010, col011, col012, col013, col014, col015, col016  
        from apex_application_temp_files f, 
           table( apex_data_parser.parse(
                      p_content                     => f.blob_content,
                      p_add_headers_row             => 'Y',
                      p_max_rows                    => 500,
                      p_skip_rows                   => 0,
                      p_store_profile_to_collection => 'PROFILE_COLLECTION', -- <OPTIONAL> Creates a collection to store the profile
                      p_file_name                   => f.filename ) ) p
        where f.name = :PX_FILE;

Begin

           /* perform all necessary validations */


  --Check how many records are currently in the table to later validate if the inserts were successful
    SELECT count(*) INTO l_COUNT_RECORDS_BEFORE FROM OEHR_EMPLOYEES;

    for x in l_cursor loop

        --Skip the header row (line 1) and fully empty rows (end of file / no data)
        if x.line > 1 and (x.col001 is not null or x.col002 is not null or x.col003 is not null) then

            --Check that the file contains the mandatory information in the expected format
            if x.col001 is null then
                RAISE_APPLICATION_ERROR(-20004,'The first name of the employee is null in line '||x.line);
            end if;

            --Simple test to check for duplicates
            select count(*)
              into l_EXIST_EMPLOYEE
              from OEHR_EMPLOYEES
             where upper(FIRST_NAME) = upper(x.col001)
               and upper(LAST_NAME)  = upper(x.col002)
               and upper(EMAIL)      = upper(x.col003);

            --Check if the employee already exists in the database
            if l_EXIST_EMPLOYEE > 0 then
                raise_application_error(-20003,'The employee '||x.col001||
                                        ' with email '|| x.col003 || ' already exists. Line: '||x.line );
            end if;

            INSERT INTO OEHR_EMPLOYEES
                    (FIRST_NAME,
                     LAST_NAME,
                     EMAIL,
                     HIRE_DATE,
                     JOB_ID)
            VALUES
                    (x.col001,
                     x.col002,
                     x.col003,
                     to_date(x.col004,'DD/MM/YYYY'),
                     x.col005);
        end if;
    end loop;

    SELECT count(*) INTO l_COUNT_RECORDS_AFTER FROM OEHR_EMPLOYEES;

    IF l_COUNT_RECORDS_AFTER = l_COUNT_RECORDS_BEFORE THEN
        RAISE_APPLICATION_ERROR(-20006,'Import failed. Please use the standard file template.');
    END IF;
End;
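A side note on the optional p_store_profile_to_collection parameter used in the cursor above: after the parse runs, the discovered file profile remains available in the named APEX collection and can be inspected later. Below is a minimal sketch, assuming the profile JSON is kept in the collection's first CLOB column:

--Assumption: the parser stores the profile JSON in CLOB001 of the named collection
select c.clob001 as profile_json
  from apex_collections c
 where c.collection_name = 'PROFILE_COLLECTION';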

With this in place, the data load works as expected:

Before clicking the Import button: the employees.xlsx file has already been uploaded

After clicking the Import button: the data from the employees.xlsx file has been correctly imported

Using the APEX_DATA_PARSER with JSON and XML

JSON

File:


[
{
  "squadName": "Super hero squad",
  "homeTown": "Metro City",
  "formed": 2016,
  "secretBase": "Super tower",
  "active": true,
  "members": [
    {
      "name": "Molecule Man",
      "age": 29,
      "secretIdentity": "Dan Jukes",
      "powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]
    },
    {
      "name": "Madame Uppercut",
      "age": 39,
      "secretIdentity": "Jane Wilson",
      "powers": [
        "Million tonne punch",
        "Damage resistance",
        "Superhuman reflexes"
      ]
    },
    {
      "name": "Eternal Flame",
      "age": 1000000,
      "secretIdentity": "Unknown",
      "powers": [
        "Immortality",
        "Heat Immunity",
        "Inferno",
        "Teleportation",
        "Interdimensional travel"
      ]
    }
  ]
},
{
  "squadName": "Super Villains squad",
  "homeTown": "Metro City",
  "formed": 2020,
  "secretBase": "Super Villain tower",
  "active": true,
  "members": [
    {
      "name": "Evil Man",
      "age": 29,
      "secretIdentity": "Evil Jukes",
      "powers": ["Radiation resistance", "Turning tiny", "Radiation blast"]
    },
    {
      "name": "Madame Lowercut",
      "age": 39,
      "secretIdentity": "Jane Wilson",
      "powers": [
        "Million tonne punch",
        "Damage resistance",
        "Superhuman reflexes"
      ]
    },
    {
    "name": "Eternal Evil Flame",
    "age": 1000000,
    "secretIdentity": "Known",
    "powers": [
        "Immortality",
        "Heat Immunity",
        "Inferno",
        "Teleportation",
        "Interdimensional travel"
      ]
    }
  ]
}
]

Parsing query:

  • By default, only the first JSON array found is consumed. To consume nested arrays, it is necessary to specify their path via p_row_selector (a contrast without the selector is sketched after the query below):
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
  FROM apex_application_temp_files f,
       TABLE(
             APEX_DATA_PARSER.parse(
               p_content         => f.blob_content,
               p_file_name       => f.filename,
               p_row_selector    => 'members',
               p_add_headers_row => 'N'
             )
           )
 WHERE f.filename = :PX_FILE_NAME;
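For contrast, here is a sketch of the same query without p_row_selector; as noted above, only the first JSON array found in the document is consumed, so the nested "members" arrays are not expanded into one row per member:

--Sketch: same parse without a row selector, consuming only the first JSON array
SELECT line_number, col001, col002, col003, col004
  FROM apex_application_temp_files f,
       TABLE(
             APEX_DATA_PARSER.parse(
               p_content   => f.blob_content,
               p_file_name => f.filename
             )
           )
 WHERE f.filename = :PX_FILE_NAME;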

XML

File:

<squads>
    <squad>
        <squadName>Super hero squad</squadName>
        <homeTown>Metro City</homeTown>
        <formed>2016</formed>
        <secretBase>Super tower</secretBase>
        <active>true</active>
        <member>
            <name>Molecule Man</name>
            <age>29</age>
            <secretIdentity>Dan Jukes</secretIdentity>
            <powers>Radiation resistance; Turning tiny; Radiation blast</powers>
        </member>
        <member>
            <name>Madame Uppercut</name>
            <age>39</age>
            <secretIdentity>Jane Wilson</secretIdentity>
            <powers>Million tonne punch; Damage resistance; Superhuman reflexes</powers>
        </member>
    </squad>
    <squad>
        <squadName>Super Villains squad</squadName>
        <homeTown>Metro City</homeTown>
        <formed>2020</formed>
        <secretBase>Super Villain tower</secretBase>
        <active>true</active>
        <member>
            <name>Evil Man</name>
            <age>29</age>
            <secretIdentity>Evil Jukes</secretIdentity>
            <powers>Radiation resistance; Turning tiny; Radiation blast</powers>
        </member>
        <member>
            <name>Madame Lowercut</name>
            <age>39</age>
            <secretIdentity>Jane Wilson</secretIdentity>
            <powers>Million tonne punch; Damage resistance; Superhuman reflexes</powers>
        </member>
    </squad>
</squads>

Parsing query:

  • By default, the first hierarchical structure is consumed; in this case <squads>, which yields two rows representing "Super hero squad" and "Super Villains squad" respectively. Since our goal is to retrieve only the members (the <member> tags), we pass the path to that structure:
SELECT line_number, col001, col002, col003, col004, col005, col006, col007, col008
  FROM apex_application_temp_files f,
       TABLE(
             APEX_DATA_PARSER.parse(
               p_content         => f.blob_content,
               p_file_name       => f.filename,
               p_row_selector    => '/squads/squad/member',
               p_add_headers_row => 'N'
             )
           )
 WHERE f.filename = :P8_FILE_NAME;

The result will be the same as seen in the JSON parse.

Conclusion

The APEX_DATA_PARSER API, available since version 19.1 of APEX, can greatly assist us in validating and importing data into our database, providing more convenience to the end user. Its powerful tools allow us to:

  • Navigate between sheets in an XLSX file through the GET_XLSX_WORKSHEETS function

  • Create a file profile through the DISCOVER function

  • Return the header value (columns) of the files through the GET_COLUMNS function

  • Consume the data contained in the file through the PARSE function

It's important to remember that this API has some limitations regarding the maximum amount of data imported, column sizes, and file sizes. Consult the documentation of your version for more details.