October 2008 - Posts

  • Spotfire 2.2 and Network Analytics

    Earlier this week, we announced the release of Spotfire 2.2, the latest update to the Spotfire platform.  It's always good to get an update to the platform into the market, and we've made some great strides with Spotfire this year (a topic for another post), but I'm particularly pleased with some of the things we've added in this release.

    Spotfire has historically (10+ years) been a leader in in-memory, interactive visualization, and given how much end users like being able to actually understand their data, it's not surprising that we've started to see other vendors adding some data visualization capabilities to their offerings.  Not-unrelatedly, we've started to get some questions about whether or not our core historical strengths were enough to continue differentiating ourselves from the rest of the market.

    Without getting into the other things that we're doing to marry user-driven analytics with predictive analytics, event-processing and other enterprise technologies, the 2.2 release of the Spotfire platform provides a great response to questions about how Spotfire is different.

    The two biggest additions are both new visual analysis tools:

    The 3-D Plot allows for the visualization of multiple dimensions on a single plot.  While you can add multiple dimensions to a 2-D dot plot with the use of color, shape or size, or by trellising multiple plots, it's not always easy to identify trends within groups, or across different plots.

    The 3-D plot addresses some of those challenges, and provides an understandable visual framework for displaying results from statistical techniques such as Principal Component Analysis (a dimension-reduction technique for highly multivariate data).

    It's also great for cases, such as the example shown here--measurements from the drill hole of an oil well--where the actual data are measurements made in three dimensions.

     

    Network Analytics, an extensible visualization tool for navigating and analyzing networks, is built entirely using the Spotfire public SDK, and it's something that I'm really excited about.

    Wearing my analysis-loving geek hat, I think that analysis of networks is going to be one of the hottest areas of data analysis, if not the hottest, in the not-too-distant future.  It's been used extensively for years in a few areas such as intelligence and other specialized fields, but its value is becoming more and more evident as everything becomes ever-more connected.

    For instance, I'm a member of a Harvard-sponsored working group on Food Safety (last meeting detailed here), and it's absolutely critical for the FDA to be able to quickly traverse the immense network of food suppliers when there is an outbreak of food-borne illness, not only to identify the source, but to quickly clear the suppliers whose products aren't at risk.

    That's not something that can be readily done with other types of visualization or analysis techniques.

    Similarly, social networking sites such as LinkedIn, Facebook, Twitter and others create networks, the analysis of which is interesting to many, and a real business opportunity for folks who would like to advertise to targeted groups of consumers.  Such networks are only going to proliferate in the future, and the ability to understand them will be key to decision making across industries and disciplines.

    Beyond those two items, there are a number of other improvements to the platform, but it's these two pieces that I'm really excited by.

  • Best Thing I Read Today

    "Analyzing data in aggregate is a crime against humanity."

    That's according to Avinash Kaushik, Analytics Evangelist (hey, cool title!) at Google.  He goes on to say:

    Bold statement, but the reality is that a “monolith” does not come to your website. Your site does not exist for a singular reason either. The core drivers of traffic are magnificently different for each core group of visitors.

    So your website’s really a mix of Visitor Sources, Visitor Behavior and your Desired Outcomes.

    When you look at all that in aggregate you get nothing. You think Average Time on Site means something. No! You think All Visits and Overall Conversion Rate gives you insights. Nyet! You think understanding Keywords without drilling down to each search engine will be awesome. Non!

    If you want to find actionable insights you need to segment your web analytics data.

    (emphasis mine)

    The only thing that I'd change is that his comments are applicable to all data, not just web analytics data.  If you want to find actionable insights, aggregations just won't cut it.  You've got to move beyond the cube.
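
    To make the point concrete, here is a minimal sketch of how an aggregate number can hide what the segments are doing. The visit records and traffic sources are invented for illustration; they are not from Kaushik's post or from any real site.

```python
from collections import defaultdict

# Hypothetical visit records: (traffic source, did the visit convert?)
visits = (
    [("search", True)] * 30 + [("search", False)] * 270
    + [("email", True)] * 45 + [("email", False)] * 55
    + [("display ads", True)] * 5 + [("display ads", False)] * 595
)


def conversion_rate(records):
    """Fraction of visits in `records` that converted."""
    records = list(records)
    return sum(converted for _, converted in records) / len(records)


def rates_by_segment(records):
    """Conversion rate computed separately for each traffic source."""
    segments = defaultdict(list)
    for source, converted in records:
        segments[source].append((source, converted))
    return {source: conversion_rate(rows) for source, rows in segments.items()}


overall = conversion_rate(visits)
by_source = rates_by_segment(visits)

print(f"overall: {overall:.1%}")
for source, rate in sorted(by_source.items(), key=lambda kv: -kv[1]):
    print(f"{source}: {rate:.1%}")
```

    In this made-up data the overall rate is 8%, which describes none of the three groups: email converts at 45%, search at 10%, and display ads at under 1%. The single aggregate figure would tell you nothing about where to act.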

     

  • Spreadsheets Don't Cause Problems?

    I'd suggest that anyone who thinks that it's not possible to cause all manner of trouble with uncontrolled spreadsheets read this:

    A formatting fubar involving an Excel spreadsheet has left Barclays Capital with contracts involving collapsed investment bank Lehman Brothers that it never meant to acquire.

    Working to a tight deadline, a junior law associate at Cleary Gottlieb Steen & Hamilton LLP converted an Excel file into a PDF format document. The doc was to be posted on a bankruptcy court's website before a midnight purchase offer deadline on 18 September, just four hours after Barclays sent the spreadsheet to the lawyers. The Excel file contained 1,000 rows of data and 24,000 cells.

    Some of these details on various trading contracts were marked as hidden because they were not intended to form part of Barclays' proposed deal. However, this "hidden" distinction was ignored during the reformatting process so that Barclays ended up offering to take on an additional 179 contracts as part of its bankruptcy buyout deal, Finextra reports.

     

    (HT: Andy Hayler)

     

  • Harvard Executive Session on Food Safety

    I was recently asked to participate in the Harvard University Executive Session on Food Safety--hosted by the Kennedy School of Government--dedicated to enhancing cooperation and data sharing between the various components of industry and the agencies responsible for preventing and responding to outbreaks of food-borne illness.  It was attended by senior people from the FDA and State health agencies, as well as leaders from all points in the food-supply chain.  I was invited to provide some insight about how analytics might be useful in tracing products back to their origins in the case of outbreaks, and how such outbreaks could be predicted and prevented.
    Interestingly, and perhaps unsurprisingly, the challenges aren't predominantly analytic, but related to data integration.  Think for a moment about what the FDA needs to go through to trace an outbreak:


    From a set of cases, they need to track down where those who are ill ate or bought their food, and from each of those locations, track the implicated food back along its supply chain to its source, looking for points at which multiple cases converge to identify the problem.
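
    As a rough sketch of that traceback, assuming the shipment records had already been pulled together into one place (which, as discussed below, is the hard part), the search is a walk back through a supplier graph followed by an intersection. Every name in the `suppliers_of` table is hypothetical, and this is only an illustration of the idea, not a description of how the FDA actually works.

```python
from collections import deque

# Hypothetical shipment records: receiver -> set of direct suppliers.
suppliers_of = {
    "Restaurant A": {"Distributor 1"},
    "Grocery B": {"Distributor 2"},
    "Restaurant C": {"Distributor 1", "Distributor 3"},
    "Distributor 1": {"Packer X", "Packer Y"},
    "Distributor 2": {"Packer Y"},
    "Distributor 3": {"Packer Y", "Packer Z"},
    "Packer X": {"Farm 1"},
    "Packer Y": {"Farm 2"},
    "Packer Z": {"Farm 3"},
}


def upstream(node, graph):
    """Every supplier reachable by walking back from `node` (breadth-first)."""
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for supplier in graph.get(current, ()):
            if supplier not in seen:
                seen.add(supplier)
                queue.append(supplier)
    return seen


def trace_back(case_locations, graph):
    """Split upstream suppliers into those shared by every case and the rest.

    Expects at least one case location.
    """
    reach = [upstream(location, graph) for location in case_locations]
    shared = set.intersection(*reach)   # points where all cases converge
    others = set.union(*reach) - shared  # candidates for being cleared
    return shared, others


shared, others = trace_back(
    ["Restaurant A", "Grocery B", "Restaurant C"], suppliers_of
)
print("Converging suppliers:", sorted(shared))
print("Not common to all cases:", sorted(others))
```

    With this toy data the three cases converge on Packer Y and Farm 2, while the remaining distributors, packers and farms are not common to every case. A real investigation would have to allow for more than one source and for incomplete records, so the strict intersection here is a simplifying assumption.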


    If you’ve got the data on who bought what from whom and when, it’s a pretty easy problem.  However, the required data don’t conveniently live in someone’s data repository, but are diffused across all points of the food-supply chain.  Based on some quick googling, it seems that there are roughly 1 million restaurants in the United States and nearly 200k grocery stores.  They are supplied by a vast and complex network of suppliers, distributors, wholesalers, shippers and producers.  There is no standard for keeping shipping records, nor any standard for describing which items are shipped; you wouldn’t believe how many varieties there are of a single vegetable, and how many more names those varieties go by.


    The challenge of being able to navigate this data is immense—literally millions of different silos of information, much of it stored only in paper documents such as invoices.  Being able to do it under the kind of time pressure the FDA faces when there is an outbreak of food-borne illness is tougher still.
    However, it is a tractable problem, and the session yesterday was a step towards a solution, and I’m looking forward to further sessions with the group.

  • Visit to the Boulder BI Braintrust

    On Friday, I visited the Boulder BI Braintrust with Spotfire's Sr. Director of Marketing, Mark Lorion.  Mark and I were invited to give the folks in the Braintrust an overview of Spotfire and get feedback from some of the brightest people in Business Intelligence and Data Warehousing.

    Though the weather wasn't the perfect Colorado blue sky and fall air that I, being a CO native, bragged to Mark about, the visit with the folks at the Braintrust more than made up for the rain.  It was great to have so many smart people together in a single room, and get their feedback on some of the things that we're doing and planning here at Spotfire.  One thing that the group found particularly interesting was the on-going integration of Spotfire with TIBCO's event-processing and BPM software, currently sold as Operations Analytics.  Richard Hackathorn blogged the meeting, and describes the integration:

    Operational BI is seeking advanced analytics that operate upon event streams. The gaps are quite apparent between mainstream BI sitting on top of the enterprise business data versus CEP (like TIBCO) sitting on top of the enterprise business processes. Spotfire can act as an integrating component that bridges those gaps. If Spotfire moves beyond the pixels-on-the-screen, its integration value will be based upon consuming data from and generating data to the BI infrastructure.

    It was great to see other people as excited about this as I am.  As I've mentioned before, I think that the way that BI becomes pervasive is to embed itself into business processes, and doing analytics on the event stream presents an obvious opportunity for such an integration.

    I also recorded a podcast with Claudia Imhoff, which you can find on the Braintrust's podcasts page.

  • What Does Increased Integration of R Mean?

     I was pointed to an interesting post on the growing prevalence of R support in statistical packages.

    In terms of OSS, we are seeing wholesale integration of R into such packages as Spotfire and SPSS.  SPSS it seems is even offering a menu system to access R routines! I’ve also heard rumors that SAS is demoing an R interface in their SAS/STAT Studio product.

    In my opinion, integrating R into each of these packages will have the effect of making statistical code and models portable across packages. This will eventually dilute the value of the packages statistically and make their value being evaluated on ability to manipulate and import data, connect to databases, and how effectively they put together their menu systems.

    On one hand, I wouldn't say that Spotfire has a "wholesale integration of R," but with the recent addition of S+ to the Spotfire platform, it's clear that our support for R is stronger than ever before.

    The point of the original comment stands.  But even with the growing stature of R, there's a whole lot of value packaged up in the "ability to manipulate and import data... and how effectively they put together their menu systems."  Offering end-users the ability to easily manipulate their data, and effectively interact with it, is the entire value proposition for several software vendors, and a big part of the value for others (including Spotfire).  Statistical and predictive analytics are becoming a bigger and more important part of Business Intelligence, but even though they comprise a relatively small part of most BI vendors' offerings, BI is still a multi-billion dollar market.

    In any case, if R becomes the de facto language for statistical modeling, I like Spotfire's chances of competing with SAS and other BI vendors on quality of user experience!


About this Blog

This blog's objective is to bring TIBCO closer to our customers, potential customers, analysts, partners, and employees. Please join the discussion and add smart comments frequently. The opinions expressed here are those of the individuals and not reviewed by anyone but the individual authors. While they are employed by TIBCO, neither TIBCO nor anybody else necessarily agrees with them.

Copyright 2000-2007 TIBCO Software Inc