whedimer Just another WordPress weblog

4Mar/100

Importing XML to a Relational Database Part I

I've come to a project that requires some data in XML format to be imported into a relational database. My understanding of the XML files is that they may be in various standard formats, suggesting that they all are well defined and conforms to some schema. The goal is form a mapping from XML data to the database, so how does one accomplish this? There are XML importing tools for Microsoft Access, but I do not have personal resources to spend on part-time projects and so those are not an option. A brief inspection of those family of tools suggests that they mostly import the XML files by create table structures based on the XML (more specifically, its schema), which seems useless when the goal is to add to an existing database. Therefore, if adding the structure does not work, then just add the data to the existing table structure.

Stating the solution is always simpler than implementing it or working out all the details. Getting the data out and forming it into a new structure is fairly straight forward when the XML is known, but becomes a bit more difficult when constructing a generic solution that could work with any XML file and a wider range of database designs. The non-generic solution involves finding all 1-1 mappings between the XML and database fields, selecting the XML data, forming SQL insert/update queries in the appropriate order deemed by the particular structure of the database (such as inserting rows that may be used as foreign keys first), and performing any additional queries.

A generic solution does not know which order to perform the SQL queries, or what XML nodes to map, or the structure of the database or XML file. The generic solution takes in account situations where the XML has fewer or more fields that what the database requires without having to rewrite code and mundane insert queries. For example, and a set of XML files describe a set of metal parts for some machine. The database tracks whether the part is metal, plastic, or rubber. The XML files have no field describing the fact that they are made from metal, yet to be imported into the database they must have that property. One could simply alter the non-generic solution by changing the insert statement, or one could change some mapping markup. For example (not exactly what I am using):

<map>
<dbfield>part.partmaterial</dbfield>
<value>metal</value>
</map>

In this xml fragment it is clear that the field partmaterial of the table part is mapped to a static value metal. Then this mapping can be used for all metal parts and if a new part comes by that is known to be metal, then the same XML config file can use. This formatting style also makes it completely obvious what is being mapped to what, unlike an SQL insert query.

The solution I am pursuing is using a configuration file (or mapping file) to map particular XML text nodes to fields in a database and focusing on 1-1 mappings first and ignoring runtime constants (such as surrogate keys) until everything else works. There are limitations and does nothing about forming triggers or procedures based on the XML data. I don't plan on supporting them, so the problem is solved.

As of now the generic tool is not completed from end to end (from reading the mapping file, validating the mapping file, checking for mapping for semantic errors, forming and executing the SQL queries, defining runtime constants). In the future this post may be updated or new posted be created with additional information on the process.

Comments and constructive criticism are welcomed in the comments.

Part II

1Mar/100

PHP Lib Testing

Lately the PHP has gained a few units tests. I know usually the tests are done while the code is being developed and there is less to gain by writing tests after the fact, but the point it to help future development. The whole library is not completely cover by test cases, but it in progress. Currently, I received some consultant work to do and so increases the unit tests will have to wait a while.

29Oct/090

PHP lib

I created a PHP library for my Master's PILOT and had been debating on whether to release it to the public in a GPL license. I finally realized that there are many others PHP libraries out there with many more features and proven uses and so the chances of my library being used in any significant way is small. So why release it then? It might teach someone something and would increase the quality of the library with potential public eyes on it. I spent some time creating a script that automatically creates an tarball of the latests version from the repository and can be found at download.whedimer.com/snapshot/wlib_php. Stable releases will eventually come.

Criticism on the library is welcome so long as the language used is civil and specific towards a particular aspect (no vague or general criticisms please). Keep in mind I have little professional experience and so keep your expectations low.