whedimer Just another WordPress weblog

2Apr/102

Importing XML to a Relational Database Part II

In this second part (part I) focusing on how to import XML into a relational database I will provide details on a solution that I have developed.  I am not primarily interested in importing XML "databases", but more interested in importing a single record because once a single record is imported it is simpler to apply the same logic to more than one record. I will also not be concerned about generating tables based from XML schema or DTD because I assume that a database already exists. To do this I will be using PHP, xPath, and MySQL.

You can find a draft of the guide to the xml2db tool at: http://download.whedimer.com/doc/xml2db.pdf .

From the first part of this series, I mentioned that importing XML into a relational database can be done by mapping particular text nodes in the XML to particular fields in database tables using some sort of mapping. This is the approach I will be taking, using PHP to handle the parsing and implementation, xPath to handle the selection of XML nodes and MySQL as the database to import into.

The focus will be on creating a tool does the following in order:

  • reads and loads the mapping configuration
  • checks the mapping for validity
  • performs the mapping and imports the data

The overall tasks are:  reading the mapping file, creating a model of mapping, validating the model, and then use the model to execute insertions in the database. Before we can begin on the first step, reading and loading the mapping configuration, we must decide what sort of markup should be used to specify the mapping. In part one I used XML to show a basic mapping between an database field and a static value and will extend upon it further to include more data sources. To fully understand what kinds of mappings are required, we have to understand XML schema, such as the fact that a node's existence can be optional or a node might be empty.  I will not be explaining XML or how to write or understand XML schema or DTDs.

The Mapping File

A mapping file is a list of maps from some datasource to a single database field. The most common datasources are static values, and  XML node(s) in the file to import. There are some other less common sources, but lets focus on the common ones. So we create a mapping prototype.

Note: we will not use attributes of nodes due to various issues they raise.

<mapset>
   <map>
      <dbfield>table1.field1</dbfield>
      <value>static value</value>
   </map>
</mapset>



With this prototype and every revision to it, we will implement everything up to and including validation. We use PHP to load it into DOMDocument object, making sure it load correctly before moving on to the next step, getting values from the DOMDocument into our own model. What I did was to create a MapSet object that took a file and created a list of Map objects that included all the data required for a single mapping. The Map object has a type varaible based on the type of datasource it was given. An alternative would be create a separate object for each map type (or datasource) that extends a base Map object.

For this prototype, we just need the MapSet object to create Map and  configure them with the database field and the static value using the Map's accessor methods for those fields. More specifically the MapSet object gets DOMXPath from the DOMDocument and uses the xPath expression "/mapset/map" to get a NodeList, which is iterated upon to generate each Map object. When the MapSet has no more Map objects to create, then we pass MapSet to a checking function.

For now all the checking function does is to ensure the supplied database field exists, which can be done either by executing a select statement and checking for errors, or by first getting the list of tables, the fields for the tables mentioned in the database, store them, and then compared against them. We must check the mapping file because we do not want to come across a table or field that does not exist when executing SQL statements, and then tell the user that a partial import happened,  leaveing the database in an unknown state.

In the next part we'll extend the prototype to actually use data from the xml data file we want to import.

Filed under: Uncategorized 2 Comments
29Oct/090

PHP lib

I created a PHP library for my Master's PILOT and had been debating on whether to release it to the public in a GPL license. I finally realized that there are many others PHP libraries out there with many more features and proven uses and so the chances of my library being used in any significant way is small. So why release it then? It might teach someone something and would increase the quality of the library with potential public eyes on it. I spent some time creating a script that automatically creates an tarball of the latests version from the repository and can be found at download.whedimer.com/snapshot/wlib_php. Stable releases will eventually come.

Criticism on the library is welcome so long as the language used is civil and specific towards a particular aspect (no vague or general criticisms please). Keep in mind I have little professional experience and so keep your expectations low.