The USA's healthcare.gov site and LAMP
The USA's health care exchange site, healthcare.gov, has had well-publicized initial woes.
The New York Times has said one of the problems was the government's choice of DBMS, namely MarkLogic. A MarkLogic employee has said that "If the exact same processes and analysis were applied to a LAMP stack or an Oracle Exa-stack, the results would have likely been the same."
I don't know why he picked Exastack for comparison, but I too have wondered whether things would have been different if the American government had chosen a LAMP component (MySQL or MariaDB) as a DBMS, instead of MarkLogic.
What is MarkLogic?
The company is a software firm, founded in 2001 and based in San Carlos, California. It has 250 employees. The Gartner Magic Quadrant classes it as a "niche player" in the Operational DBMS category.
The product is a closed-source XML DBMS. The minimum price for a perpetual enterprise license is $32,000 but presumably one would also pay for support, just as one does with MySQL or MariaDB.
There are 250 customers. According to the Wall Street Journal "most of its sales come from dislodging Oracle Corp."
One of the customers, since 2012 or before, is CMS (the Centers for Medicare & Medicaid Services), which is a branch of the United States Department of Health and Human Services. CMS is the agency that built the healthcare.gov online portal.
Is MarkLogic responsible for the woes?
Probably MarkLogic is not the bottleneck.
It's not even the only DBMS that the application queries. There is certainly some contact with other repositories during a get-acquainted process, including Oracle Enterprise Identity management -- so one could just as easily blame Oracle.
There are multiple other vendors. USA Today mentions Equifax, Serco, Optum/QSSI, and the main contractor, CGI Federal.
A particular focus for critics has been a web-hosting provider, Verizon Terremark. They have been blamed for some of the difficulties and will eventually be replaced by an HP solution. HP also has a fairly new contract for handling the replication.
Doubtless all the parties would like to blame the other parties, but "the Obama administration has requested that all government officials and contractors involved keep their work confidential".
It's clear, though, that the site was launched with insufficient hardware. Originally it was sharing machines with other government services. That's changed. Now it has dedicated machines.
But the site cost $630 million so one has to suppose they had money to buy hardware in the first place. That suggests that something must have gone awry with the planning, and so it's credible what a Forbes article is saying, that the government broke every rule of project management.
So we can't be sure because of the government confidentiality requirement, but it seems unlikely that MarkLogic will get the blame when the dust settles.
Is MarkLogic actually fast?
One way to show that MarkLogic isn't responsible for slowness would be to look for independent confirmation of its speed. The problem with that is MarkLogic's evaluator-license agreement, from which I quote:
MarkLogic grants to You a limited, non-transferable, non-exclusive, internal use license in the United States of America
[You must not] disclose, without MarkLogic's prior written consent, performance or capacity statistics or the results of any benchmark test performed on Software
[You must not] use the Product for production activity,
You acknowledge that the Software may electronically transmit to MarkLogic summary data relating to use of the Software
These conditions aren't unheard of in the EULA world, but they do have the effect that I can't look at the product at all (I'm not in the United States), and others can look at the product but can't say what they find wrong with it.
So it doesn't really matter that Facebook got 13 million transactions/second in 2011, or that the HandlerSocket extension for MySQL got 750,000 transactions/second with a lot less hardware. Possibly MarkLogic could do better. And I think we can dismiss the newspaper account that MarkLogic "continued to perform below expectations, according to one person who works in the command center." Anonymous accounts don't count.
So we can't be sure because of the MarkLogic confidentiality requirements, but it seems possible that MarkLogic could outperform its SQL competitors.
Is MarkLogic responsible for absence of High Availability?
High Availability shouldn't be an issue.
At first glance the reported uptime of the site -- 43% initially, 90% now -- looks bad. After all, Yves Trudeau surveyed MySQL High Availability solutions years ago and found even the laggards were doing 98%. Later the OpenQuery folks reported that some customers find "five nines" (99.999%) is too fussily precise so let's just round it to a hundred.
At second glance, though, the reported uptime of the site is okay.
First: The product only has to work in 36 American states and Hawaii is not one of them. That's only five time zones, then. So it can go down a few hours per night for scheduled maintenance. And uptime "exclusive of scheduled maintenance" is actually 95%.
Second: It's okay to have debugging code and extra monitoring going on during the first few months. I'm not saying that's what's happening -- indeed the fact that they didn't do a 500-simulated-sites test until late September suggests they aren't worrywarts -- but it is what others would have done, and therefore others would also be below 99% at this stage of the game.
So, without saying that 90 is the new 99, I think we can admit that it wouldn't really be fair to make a big deal about some LAMP installation that has higher availability than healthcare.gov.
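To put the percentages quoted above on a common footing, here is a minimal Python sketch that converts an availability figure into hours of downtime per year. The function name is my own, and the calculation assumes a plain 365-day year with no allowance for scheduled maintenance windows, so treat it as back-of-envelope arithmetic rather than anyone's official measurement.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, no maintenance windows excluded


def downtime_hours_per_year(availability_percent):
    """Return the hours per year a service is down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# The figures mentioned in this section: the site's reported uptime (43, 90),
# uptime exclusive of scheduled maintenance (95), the MySQL HA laggards (98),
# and "five nines" (99.999).
for pct in (43, 90, 95, 98, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% available -> {hours:.2f} hours down per year")
```

The gap between 90% and 98% looks small as a percentage but is several hundred hours a year, which is why the question of what counts as scheduled maintenance matters so much to the headline number.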
Is it hard to use?
MarkLogic is an XML DBMS. So its principal query language is XQuery, although there's a section in the manual about how you could use SQL in a limited way.
Well, of course, to me and to most readers of this blog, XQuery is murky gibberish and SQL is kindergartenly obvious. But we have to suppose that there are XML experts who would find the opposite.
What, then, can we make out of the New York Times's principal finding about the DBMS? It says:
"Another sore point was the Medicare agency’s decision to use database software, from a company called MarkLogic, that managed the data differently from systems by companies like IBM, Microsoft and Oracle. CGI officials argued that it would slow work because it was too unfamiliar. Government officials disagreed, and its configuration remains a serious problem."
-- New York Times November 23 2013
Well, of course, to me and to most readers of this blog, the CGI officials were right because it really is unfamiliar -- they obviously had people with experience in IBM DB2, Microsoft SQL Server, or Oracle (either Oracle 12c or Oracle MySQL). But we have to suppose that there are XML experts who would find the opposite.
And, though I think it's a bit extreme, we have to allow that it's possible the problems were due to sabotage by Oracle DBAs.
Yet again, it's impossible to prove that MarkLogic is at fault, because we're all starting off with biases.
Did the problem have something to do with IDs?
I suspect there was an issue with IDs (identifications).
It starts off with this observation of a MarkLogic feature: "Instead of storing strings as sequences of characters, each string gets stored as a sequence of numeric token IDs. The original string can be reconstructed using the dictionary as a lookup table."
It ends with this observation from an email written on September 27 2013 by a healthcare.gov worker: "The generation of identifiers within MarkLogic was inefficient. This was fixed and verified as part of the 500 user test."
Of course it's nice to see that it was fixed, but isn't it disturbing that a major structural piece was inefficient as late as September?
Hard to say. Too little detail. So the search for a smoking gun has so far led nowhere.
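For readers who haven't met the idea in the quoted feature description, storing a string as a sequence of token IDs plus a dictionary can be sketched in a few lines of Python. This is only an illustration of the general technique: the class name is invented, splitting on single spaces is my assumption, and nothing here is claimed to resemble MarkLogic's actual implementation or the identifier generation mentioned in the email.

```python
class TokenDictionary:
    """Toy dictionary encoder: each distinct token gets a small integer ID."""

    def __init__(self):
        self._id_by_token = {}   # token -> ID, used when encoding
        self._token_by_id = []   # ID -> token, the lookup table used when decoding

    def encode(self, text):
        """Return the list of token IDs for text, assigning new IDs as needed."""
        ids = []
        for token in text.split(" "):  # assumption: tokens are space-separated
            if token not in self._id_by_token:
                self._id_by_token[token] = len(self._token_by_id)
                self._token_by_id.append(token)
            ids.append(self._id_by_token[token])
        return ids

    def decode(self, ids):
        """Rebuild the original string from token IDs via the lookup table."""
        return " ".join(self._token_by_id[i] for i in ids)


dictionary = TokenDictionary()
encoded = dictionary.encode("to be or not to be")
print(encoded)
print(dictionary.decode(encoded))
```

Every write has to consult or extend the dictionary, so however the IDs are generated, that step sits on the hot path. That is one plausible reason an inefficient ID generator would show up in a load test, though the email gives too little detail to say more.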
Is it less reliable?
Various stories -- though none from the principals -- suggest that MarkLogic was chosen because of its flexibility. Uh-oh.
The reported quality problems are "one in 10 enrollments through HealthCare.gov aren't accurately being transmitted" and "duplicate files, lack of a file or a file with mistaken data, such as a child being listed as a spouse."
I don't see how the spousal problem could have been technical, but the duplications and the gone-missings point to: uh-oh, lack of strong rules about what can go in. And of course strong rules are something that the "relational" fuddy-duddies have worried about for decades. If the selling point of MarkLogic is in fact leading to a situation which is less than acceptable, then we have found a flaw at last. In fact it would suggest that the main complaints so far have been trivia.
This is the only matter that I think looks significant at this stage.
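As a concrete picture of what "strong rules about what can go in" means to the relational fuddy-duddies, here is a small sketch using Python's built-in sqlite3 module. The table and column names are invented for illustration and have nothing to do with the real healthcare.gov schema; the point is only that a primary key and NOT NULL constraints reject duplicates and missing values at the door.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        applicant_id INTEGER NOT NULL,
        plan_id      INTEGER NOT NULL,
        relationship TEXT NOT NULL
            CHECK (relationship IN ('self', 'spouse', 'child')),
        PRIMARY KEY (applicant_id, plan_id)   -- one row per applicant per plan
    )
""")


def try_insert(row):
    """Insert one (applicant_id, plan_id, relationship) row; report the outcome."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO enrollment VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert((1, 10, "self")))   # first copy of the row
print(try_insert((1, 10, "self")))   # duplicate file
print(try_insert((2, 10, None)))     # missing data
print(try_insert((3, 10, "spouse"))) # well-formed, whether or not it is true
```

Note what the last row shows: a child recorded as a spouse is a perfectly legal value, so no constraint will catch it. That fits my guess that the spousal problem was not technical, while the duplicates and the gone-missings are exactly what constraints exist to stop.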
How's that hopey-changey stuff working out for your Database?
The expectation of an Obama aide was: "a consumer experience unmatched by anything in government, but also in the private sector."
The result is: so far not a failure, and nothing that shows that MarkLogic will be primarily responsible if it is a failure.
However: most of the defence is along the lines of "we can't be sure". That cuts both ways -- nobody can say it's "likely" that LAMP would have been just as bad.
Submissions at Percona Live Santa Clara 2014 and Lightning talks
The call for participation at Percona Live MySQL Conference and Expo 2014 is now closed. There have been more than 320 submissions, and this will keep the review committee busy for a while.
One important point for everyone who has submitted: if you have submitted a proposal but haven’t included a bio in your account, do it now. If you don’t, your chances of being taken seriously are greatly reduced. To add a bio, go to your account page and fill in the Biography field. Including a picture is not mandatory, but it will definitely be appreciated.
Although the CfP is closed for tutorials and regular sessions, your chances of becoming a celebrity are not over yet. The CfP is still open for Lightning talks and Birds of a Feather sessions.
If you want to submit a lightning talk, you still have time until the end of January. Don’t forget to read the instructions and remember that lightning talks don’t give you a free pass, but a healthy 20% discount.
So far, I have received 16 proposals. Of these, 6 have been rated highly enough to guarantee acceptance (including mine, for which I have not voted). We still have 6 spots to fill (12 spots in total, 5 minutes each), and I’d rather fill them with talks that appeal to everyone on the committee than scrape the barrel of the mediocre ones. My unofficial goal is to have so many good submissions that I will have to withdraw my own talk. Thus, the potential number of available spots is 7. Please kick my talk off stage by submitting outstanding proposals!
Props to the MySQL Community Team
Enough negativity sometimes gets slung around that it’s easy to forget how much good is going on. I want to give a public thumbs-up to the great job the MySQL community team, especially Morgan Tocker, is doing. I don’t remember ever having so much good interaction with this team, not even in the “good old days”:
Advance notice of things they’re thinking about doing (deprecating, changing, adding, etc)
Heads-up via private emails about news and upcoming things of interest (new features, upcoming announcements that aren’t public yet, etc)
Solicitation of opinion on proposals that are being floated internally (do you use this feature, would it hurt you if we removed this option, do you care about this legacy behavior we’re thinking about sanitizing)
I don’t know who or what has made this change happen, but it’s really welcome. I know Oracle is a giant company with all sorts of legal and regulatory hoops to jump through, for things that seem like they ought to be obviously the right thing to do in an open-source community. I had thought we were not going to get this kind of interaction from them, but happily I was wrong.
(At the same time, I still wish for more public bug reports and test cases; I believe those things are really in everyone’s best interests, both short- and long-term.)
S**t sales engineers say
Here’s a trip down memory lane. I was just cleaning out some stuff and I found some notes I took from a hilarious MySQL seminar a few years back. I won’t say when or where, to protect the guilty.
I found it so absurd that I had to write down what I was witnessing. Enough time has passed that we can probably all laugh about this now. Times and people have changed.
The seminar was a sales pitch in disguise, of course. The speakers were singing Powerpoint Karaoke to slides real tech people had written. Every now and then, when they advanced a slide, they must have had a panicked moment. “I don’t remember this slide at all!” they must have been thinking. So they’d mumble something really funny and trying-too-hard-to-be-casual about “oh, yeah, [insert topic here] but you all already know this, I won’t bore you with the details [advance slide hastily].” It’s strange how transparent that is to the audience.
Here are some of the things the sales “engineers” said during this seminar, in response to audience questions:
Q. How does auto-increment work in replication? A: On slaves, you have to ALTER TABLE to remove auto-increment because only one table in a cluster can be auto-increment. When you switch replication to a different master you have to ALTER TABLE on all servers in the whole cluster to add/remove auto-increment. (This lie was told early in the day. Each successive person who took a turn presenting built upon it instead of correcting it. I’m not sure whether this was admirable teamwork or cowardly face-saving.)
Q. Does InnoDB’s log grow forever? A: Yes. You have to back up, delete, and restore your database if you want to shrink it.
Q. What size sort buffer should I have? A: 128MB is the suggested starting point. You want this sucker to be BIG.
There was more, but that’s enough for a chuckle. Note to sales engineers everywhere: beware the guy in the front row scribbling notes and grinning.
What are your best memories of worst sales engineer moments?
1. For the avoidance of doubt, it was NOT any of the trainers, support staff, consultants, or otherwise anyone prominently visible to the community. Nor was it anyone else whose name I’ve mentioned before. I doubt any readers of this blog, except for former MySQL AB employees (pre-Sun), would have ever heard of these people. I had to think hard to remember who those names belonged to.
5 great things about Markus Winand’s book SQL Performance Explained
1. Covers databases broadly
You may not have noticed, but there’s a whole spectrum of relational databases on offer. Of course in the database world, most get infatuated with one, and that becomes their bread & butter before long. Their life, their passion, their devotion. […]
The post 5 great things about Markus Winand’s book SQL Performance Explained appeared first on Scalable Startups.