Geospatial Business Intelligence (GeoBI) DWG

The GeoBI DWG has published an initial white paper, and it is now time to take the process forward and determine what action to take. This is an interesting period in the development of the web, and there are opportunities for organisations to pull data from a variety of sources to add understanding to their business operations. Governments are releasing data based on URIs for use in Linked Data applications (including administrative boundaries), and the open source movement is doing similar things with DBpedia and, in our domain, GeoNames.

A key activity of the GeoBI working group is to define the scope and conceptual model for GeoBI, and how this fits within a problem statement and within the OGC as a standards body.

Problem Statement

Typically, location is treated as a “dumb” attribute in BI systems (coded sales zones as opposed to sales zones as geographic boundaries). Location is a key component in adding to understanding across time series data. Maps have been added to ERP solutions over the last 15 years with a degree of success, but without fully contributing to a deeper understanding. There are a number of reasons for this:

  • Key analytical tools (for example point-in-polygon, adjacency queries) are not in the hands of the end-users
  • Data assemblage, a difficulty for GIS professionals, is even more difficult for non-GIS practitioners where data originates from unstructured as well as structured sources
  • Non-GIS users do not understand the issues associated with co-ordinate geometry (nor should they need to)
  • Consistently repeating analyses over time and through staffing changes is hard. Certain data are organised by boundaries which are transient, impacting time series analysis as they change. GIS data comes with its own boundaries, such as tiles and layers, which non-GIS users do not understand (and there is no reason they should)
  • How do we build solid geographic data sets when we only have addresses or zip codes?
  • Semantic issues, such as “what is London King’s Cross station?”, are compounded by the different view non-GIS users have of what constitutes an explicit geographical description.
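
As an illustration of the first point in the list above, a point-in-polygon test is what turns a coded sales zone into a geographic one. The following is a minimal sketch only, assuming planar co-ordinates and a simple (non-self-intersecting) polygon. The function name, the zone outline and the customer locations are all made up for illustration.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: return True if point (x, y) lies inside polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly joined back to the first. Points lying exactly on an edge
    are not treated specially in this sketch.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x co-ordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Hypothetical sales zone (a 10 x 10 square) and two customer locations.
sales_zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
customers = {"A": (3, 4), "B": (12, 5)}
in_zone = {name: point_in_polygon(p, sales_zone) for name, p in customers.items()}
```

Here customer A falls inside the zone and customer B outside it. A production system would more likely rely on a spatial database or geometry library, and would need to deal with co-ordinate reference systems, which this sketch ignores.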

These problems are not unique to the GeoBI DWG, and there is overlap with the SDI DWG in proposing solutions to them. The GeoBI issues go beyond the SDI problem domain because of the confidentiality of certain statistical datasets that governments have collected on our behalf and the need to anonymise such data as a starting point.

User Stories

Almost all of the stories involve the use of sensors to track movement (this may just be a personal mobile phone).

Retail

We do Big Data. Amongst many things, we collect personal data on our customers’ shopping habits. We mine Facebook and Twitter for trend information.

We want to extend that capability to include location so that we can take a customer profile and target other customers in the same socio-economic group. One of the key things we need to understand is why a customer comes to our store as opposed to other stores in the area.

We need to maintain profit margins on home delivery. We need to combine routing with vehicle requirements and driver shift patterns in order to do this effectively.

Logistics

We need to combine routing with vehicle requirements and driver shift patterns in order to maximise return on capital invested.

When one of my trucks passes out of GPS coverage I don’t want cascading alarms when it returns to a signal coverage area.
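
One possible reading of this requirement is that a loss of coverage should produce a single event per gap, rather than one alarm for every position report that was missed. The sketch below illustrates that idea under stated assumptions: the function name, the 60-second reporting interval and the tolerance factor are hypothetical, and timestamps are plain seconds.

```python
def coverage_events(report_times, expected_interval_s=60, tolerance=1.5):
    """Return one (last_seen, next_seen) pair per coverage gap.

    report_times is a time-ordered list of timestamps (in seconds) at which
    position reports were received. Two consecutive reports further apart
    than expected_interval_s * tolerance are treated as one coverage gap.
    """
    events = []
    for prev, curr in zip(report_times, report_times[1:]):
        if curr - prev > expected_interval_s * tolerance:
            events.append((prev, curr))
    return events


# Hypothetical truck: reports every minute, then silent for ten minutes.
gaps = coverage_events([0, 60, 120, 720, 780])
```

In this example the ten-minute silence yields a single gap event, (120, 720), which an alerting layer could report once when the truck returns to coverage.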

Insurance

Our products are offered on the basis of driver age, gender, location, mileage (proportion of motorway vs non-motorway), etc. We want to move to dynamic charging for our products (like a smart grid), and for this we need sophisticated location information.

We need to enhance our fraud detection capabilities.

Banking

A customer makes a call to a call-centre and says, “I have a XYZ debit card. Where can I find the nearest ATM that dispenses cash free of handling charges?” How does the bank answer this question and not end up with a disgruntled customer? How do they associate the ATM with a co-ordinate and provide directions from the customer’s current co-ordinate?
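
Once every ATM carries a co-ordinate, the first part of this question reduces to a filtered nearest-neighbour search. The following is a rough sketch, assuming latitude/longitude in decimal degrees and straight-line (great-circle) distance rather than travel distance. The ATM records, their field names and the card scheme names are invented for illustration, and providing directions would need a separate routing service.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_free_atm(customer, atms, card_scheme):
    """Return the closest ATM that is free of charge for card_scheme, or None."""
    candidates = [atm for atm in atms if card_scheme in atm["free_for"]]
    if not candidates:
        return None
    lat, lon = customer
    return min(candidates,
               key=lambda atm: haversine_km(lat, lon, atm["lat"], atm["lon"]))


# Hypothetical data: the closest machine charges XYZ cardholders.
atms = [
    {"id": "atm-1", "lat": 51.5320, "lon": -0.1240, "free_for": {"ABC"}},
    {"id": "atm-2", "lat": 51.5290, "lon": -0.1270, "free_for": {"XYZ"}},
    {"id": "atm-3", "lat": 51.5074, "lon": -0.1278, "free_for": {"XYZ"}},
]
customer_location = (51.5308, -0.1238)
best = nearest_free_atm(customer_location, atms, "XYZ")
```

The filter is applied before the distance ranking, so the machine returned is the nearest one that is actually free for the caller’s card, not simply the nearest machine.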

We need to enhance our fraud detection capabilities with a sophisticated understanding of how our customers travel, so that we can freeze a credit card being used in a location where the real cardholder is unlikely to be. At the same time we need to avoid false positives, such as stopping our customer from using the card just because they’ve never been in that town before. For example, if an American has used their card in Paris, Munich and Copenhagen in the last few years, it’s not odd to see the card being used in Barcelona, but it may be odd to see a charge in Belarus.

Moving Forward – Towards a Test Bed

The work undertaken by the Australian Bureau of Statistics (ABS) on the Statistical Spatial Framework [Reference Ben?] could form the basis of a model for GeoBI and SDI. The five components are:

  • An agreed and consistent mechanism for geocoding personal and business data.
  • Location capable data management (unit record data stored with a geocode).
  • A common set of population density related geographic boundaries.
  • Statistical-Spatial Metadata interoperability.
  • Policies, practices, protocols & guidelines

    • Confidentiality and privacy methodologies
    • Data quality
    • Analysis approaches
    • Dissemination
    • Visualisation

 

Countries around the world have some, all or none of these components in place, and the level of geospatial maturity of government and organisations in each country varies considerably. A phased approach to establishing these components is therefore likely to be sensible when developing the Statistical Spatial Framework.

A standard boundary methodology based on mesh blocks has merit. Mesh blocks are based on a population-centric approach, so that the areas at each level in the hierarchy contain a similar number of people, regardless of the actual area covered by the polygon. Other concepts have been investigated in OWS-8, where a hexagonal grid based approach was developed by Pxys. While this may have merit and needs to be investigated further, it appears that this gridded approach will not easily cater for the changes in population density that occur in the real world. Consequently, there is likely to be considerable variation in population numbers within each level of the hexagon hierarchy.

Additionally, some refinement may be necessary to address the confidentiality constraints on personal data collected by government. Each hexagon could be assigned a triple. Aggregation could take place to protect confidentiality, with data published at the standard area appropriate to achieve this. Standard naming conventions could be associated with given levels of aggregation. This approach has merit in allowing organisations like the UN to compare countries’ needs on a consistent basis, as opposed to the basis set up by each jurisdiction itself.
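
The aggregation idea can be sketched as choosing the finest level of the hierarchy at which every published area meets a minimum count. This is an illustration under stated assumptions, not a proposed disclosure-control method: the cell identifiers use a hypothetical naming convention in which a parent’s identifier is a prefix of its children’s, and the counts and threshold are invented.

```python
from collections import defaultdict


def finest_safe_level(counts, min_count):
    """Find the finest aggregation level at which every area has >= min_count.

    counts maps finest-level cell ids (all the same length) to a count of
    people or records. Level k groups cells by the first k characters of
    their id. Returns (level, aggregated_counts), or None if even the
    coarsest level contains an area below min_count.
    """
    depth = len(next(iter(counts)))
    for level in range(depth, 0, -1):
        totals = defaultdict(int)
        for cell, n in counts.items():
            totals[cell[:level]] += n
        if all(n >= min_count for n in totals.values()):
            return level, dict(totals)
    return None


# Hypothetical counts per finest-level cell; "A11" is too small to publish.
cell_counts = {"A11": 3, "A12": 40, "A21": 25, "A22": 30, "B11": 12, "B12": 50}
release = finest_safe_level(cell_counts, min_count=10)
```

With a threshold of 10 the finest level fails because of cell A11, so the data would be published one level up, as A1 = 43, A2 = 55 and B1 = 62. Publishing at a single uniform level also avoids the situation where a suppressed cell can be recovered by subtracting its published siblings from a published parent.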

Once this grid has been established, could I easily pull a set of co-ordinates and then go and assemble other data? The other data could be pulled from other Linked Data sources and combined with the grid data. These include references to place names and boundaries (current and former). Rich sources include GeoNames and the UNSDI gazetteer.
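
If each grid cell is identified by a URI, combining data from different sources becomes a matter of matching on that identifier. The sketch below shows the idea with plain Python tuples standing in for triples. Every URI, predicate and value here is a placeholder under example.org and does not refer to any real GeoNames, UNSDI or statistical identifier.

```python
from collections import defaultdict

# Placeholder namespaces, for illustration only.
GRID = "http://example.org/grid/"
VOCAB = "http://example.org/vocab#"

# Triples of (subject, predicate, object) from two hypothetical sources.
grid_triples = [
    (GRID + "A1", VOCAB + "population", 43),
    (GRID + "A2", VOCAB + "population", 55),
]
gazetteer_triples = [
    (GRID + "A1", VOCAB + "placeName", "Exampleton"),
    (GRID + "A1", VOCAB + "formerBoundary", "http://example.org/boundary/1991/7"),
    (GRID + "A2", VOCAB + "placeName", "Sampleford"),
]


def merge_by_subject(*sources):
    """Group triples from several sources by subject URI.

    Returns {subject: {predicate: [objects, ...]}} so that everything known
    about one grid cell sits together regardless of where it came from.
    """
    merged = defaultdict(lambda: defaultdict(list))
    for triples in sources:
        for subject, predicate, obj in triples:
            merged[subject][predicate].append(obj)
    return {s: dict(props) for s, props in merged.items()}


combined = merge_by_subject(grid_triples, gazetteer_triples)
```

In practice the sources would be queried over the web and would rarely share identifiers so neatly, so some mapping between each source’s URIs and the grid cell URIs would be needed first.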