Blog Archives

Welcome Microsoft

Hadoop differs from other open source projects in more than just its technology.

Hadoop is the first open source technology that created a big market. Unlike past open source successes such as Linux and MySQL, which brought low-cost alternatives to existing technology markets, Hadoop has led the creation of the multi-billion dollar big data analytics market. While a number of traditional, commercial software and hardware technologies have tried to address big data analytics, they are limited by high cost, a lack of linear scalability and an inability to process both unstructured and structured data.

Hadoop is revolutionizing the way analytics are done, so commercial interest has exploded. Virtually every large company has a Hadoop project, and smaller, innovative social media, gaming and Web 2.0 companies are leveraging the scalability and cost effectiveness of Hadoop to break new ground in interaction analytics. Every major hardware vendor now has some form of Hadoop offering to help meet the demand for more computing power and data storage. Even Oracle, which questioned the relevance of NoSQL databases over the last several years, announced a Hadoop offering a few weeks back.

This week, Datameer and Microsoft are announcing their partnership to bring Datameer’s end-user BI platform to Microsoft’s new Azure-based Hadoop offering. We are excited about this partnership for three reasons. First, Microsoft’s embrace of Hadoop as a key analytics platform expands the Hadoop ecosystem in a big way. Second, Microsoft made the spreadsheet the industry standard interface for basic number crunching, with over 500,000,000 Excel users. And third, given the success of the spreadsheet user interface for both Microsoft and Datameer, this partnership is a perfect match to bring Hadoop-based spreadsheet analytics to every business user.

We welcome Microsoft to the Hadoop community and look forward to working with them and our joint customers to connect people to the world’s data.

Stefan Groschupf

Posted in Announcements, Big Data Analytics Perspectives

How I hacked California’s largest healthcare provider – by mistake.

BTW, names have been omitted on advice of our lawyers.

It all started when I tried to sign up my then fiancé for a health care plan. After I created the account, the “request a quote” form turned out to be long: more than ten pages of questions about pre-existing health conditions. I did not have all of the information at my fingertips, but luckily there was a “save and continue later” button. A few days later I went back, started typing in the domain name, and my browser suggested the last URL I had visited on that domain. To my surprise, the form page with all my information opened. No redirect to a login page.

Interesting.

I copied and pasted the full URL into another browser and the page opened, no login required. Wow: personal info, credit card, health history, all there without a login. The URL had an id=55237 parameter. Now I was curious and concerned at the same time. I changed it to id=55236. OMG, it opened the page of a different person without any login. 55235, same thing. 44567, same thing. How could such sensitive data not be secured?

I went to their homepage looking for a number to call. When I finally got through their voice menu, the receptionist asked me for my customer number. “I don’t have a customer ID, but I would like to report a security problem on your website.” “Sir, without a customer number I can’t help you.” “Ahmm, seriously? Let me speak to your manager.” “Sir, without a customer ID I can’t put you through to my manager.” “No, you don’t understand. I managed to get access to all your customer data on your website and want to report this.” After endless minutes on hold (and me sweating that my personal and financial data was at risk), I hung up. There were other numbers listed, but all of them ended up in the call center. I tried sending messages to their technical management folks on LinkedIn. Thirty minutes later: nothing.

I found the number for the PR department. A woman picked up and I explained that someone technical needed to call me back immediately. Finally a young man from the privacy department called me. At first, he thought I was trying to pull a prank. “We are the largest healthcare provider in California, we take security seriously…” I managed to convince him to open a browser and gave him the full URL. There was a long silence as I had him change the ID to a few different numbers. He became very nervous and promised to call me back in a few minutes. The next call came from a VP. “I want to inform you that we took our complete website offline to investigate the security problem. All senior management is informed, including the CEO.” Nice. Finally they took me seriously. To their credit, they kept me in the loop over the next few weeks as they crunched through the log files to determine whether any IP address had accessed more than one customer page.

Interestingly enough, more than a year after this event, Datameer has become one of the leading big data cyber security solution providers. Today a significant number of our customers are using Datameer to analyze log files to identify abnormal server access patterns and perform security forensics. For example, we have customers that analyze their public websites, tracking pages and APIs to identify attacks early. A major private investment service provider is analyzing server logs from hundreds of servers in their service-oriented architecture to understand system interaction and behavior. Two of the leading anti-virus companies are using Datameer to identify threats early and understand how they spread by analyzing honeypot log files and virus scanner signals.
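For the curious, the kind of check the healthcare company ran on its logs (did any single IP request more than one customer id?) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the log lines are assumed to be in Common Log Format, the /quote path and the sample IPs are made up, and ips_with_multiple_ids is a hypothetical helper name. It is not how they, or Datameer, actually did it.

```python
import re
from collections import defaultdict

# Assumption: Common Log Format lines where the request URL carries an
# id=<number> query parameter. The /quote path used below is hypothetical.
LINE_RE = re.compile(r'^(?P<ip>\S+) .*?"(?:GET|POST) \S*[?&]id=(?P<id>\d+)')


def ips_with_multiple_ids(lines):
    """Return {ip: sorted ids} for every IP that requested more than one id."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # lines without an id parameter are ignored
            seen[match.group("ip")].add(int(match.group("id")))
    return {ip: sorted(ids) for ip, ids in seen.items() if len(ids) > 1}


# Made-up sample data for illustration only.
sample_log = [
    '10.0.0.1 - - [01/Jan/2011:10:00:00 -0800] "GET /quote?id=55237 HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2011:10:05:00 -0800] "GET /quote?id=55237 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2011:11:00:00 -0800] "GET /quote?id=55237 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2011:11:00:05 -0800] "GET /quote?id=55236 HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2011:11:00:09 -0800] "GET /quote?step=2&id=55235 HTTP/1.1" 200 512',
    '10.0.0.3 - - [01/Jan/2011:12:00:00 -0800] "GET /index.html HTTP/1.1" 200 128',
]

if __name__ == "__main__":
    print(ips_with_multiple_ids(sample_log))
```

An IP that reloads its own page many times is not flagged, since only distinct ids count. On real logs you would also want to account for shared IPs (offices, proxies) before calling anything an attack.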

BTW, my healthcare insurance application was rejected and the healthcare company is not yet a Datameer user.

Posted in Cyber Security | 1 Comment

Calendar heatmap – “Growing a company”

 

A data visualization is worth a thousand words.

1.) 2008, 10+ engineers in a small development company doing Hadoop projects.

2.) 2009, in November we founded Datameer and raised our Series A.

3.) 2010, developing the product, private beta, launching Datameer version 1.0 in winter.

4.) 2011, selling the product and doubling the company in less than 6 months.

5.) 2012, focus on execution and growth of the company with regular department meetings.

Posted in Company Culture

Welcome, Oracle. Seriously.

Welcome to the most exciting and important enterprise software market in the last three decades.

And congratulations on your first Hadoop appliance. Despite your original skepticism, putting real computing power in the hands of everybody is already improving the way people store, process and analyze data.  This will have a major impact on our society.

When we built the first Hadoop systems, we estimated that hundreds of thousands of businesses would justify the investment in Hadoop once they understood the benefits.

Next year alone, we project that over 10,000 new businesses will come to that understanding. Over the next decade, the growth of big data and Hadoop will continue in logarithmic leaps. Hadoop literacy is already becoming as fundamental a skill for computer engineers and data analysts as Java or SQL.

We look forward to responsible competition in efforts to distribute this global open source technology to the world. And we appreciate the magnitude of your commitment.

Because what we are doing is increasing social capital by enhancing organizational and business capabilities to gain new insights and better understand the world.

Welcome to the task.
The Hadoop Community.

:)

P.S. Here is the original.

Posted in Big Data Analytics Perspectives

Well hydrated

 

We at Datameer strongly believe that some of the most important answers to mankind’s challenges are hidden in data. We hope we can contribute to the research efforts of science and academia by building and making available an amazingly simple, yet powerful data analytics platform that enables analysts and scientists to focus on their research without having technology get in the way.

We also believe that lessening our own human impact is key to our future and requires many small steps. So here at Datameer, along with the obvious recycling, we have embraced a culture of low impact, sustainability and going green whenever possible.  To start, we installed energy saving light bulbs in our new offices and banned all Styrofoam containers from our daily lunch deliveries.

One more impactful change we recently made is to stop buying soda and water in plastic bottles. Instead, we provided all staff with beautiful stainless steel water bottles and added filters to our tap water. We used to generate hundreds of empty soda cans and plastic bottles each month, but since we made this minor change, we have been able to cut this waste down dramatically.

 
We’ve also purchased a great soda stream machine to enable us to invent our own organic house made sodas. Turns out, this not only had a green impact but also reduced our spending. On average, we used to spend more than $2 a day per person to supply bottled water and sodas. But after less than a month, the money saved already paid for the stainless steel water bottles and the soda stream machine.

With these simple changes, we will save more than $10,000 a year in our San Mateo office alone. More importantly, we are no longer creating trash in the environment or filling up our landfills with thousands of plastic bottles.

So, go and buy some stainless steel water bottles for your company!

Posted in Company Culture

Whose Hadoop is Bigger? Really…

 

Guest post from CEO Stefan.

Actually, Datameer contributed the most code to the Hadoop ecosystem.

After weeks of hard work we made a surprising discovery: 99% of the Hadoop ecosystem code was actually written by Datameer engineers.

We hired an external, independent, respected analyst firm that had five PhDs working on a new generation of algorithms that analyze the Hadoop ecosystem source code, Jira posts, emails and ideas contributed in verbal conversations.

The breakthrough was that we analyzed the object inheritance and the call stack to weight the importance of each line of code. We also took the mental stability of contributors into account. BTW, if you are still wondering what was in my coffee this morning, you don’t get my German sense of sarcasm.

Well….

When I joined the Nutch project in the early 2000s, I was known for voicing my strong points of view very loudly in the community. I guess I have lost some steam over the years: I have not even published a blog post in the last few years, and my Hadoop & Co. mailing list subscriptions are read-only.

But I felt I had to speak up about all this commotion around “my Hadoop is bigger than yours” currently lighting up the community.

I tried to take some wind out of this conversation over the last few months by using our product to analyze the Hadoop source code and present, in a very fun way, some Hadoop source code insights here. These analytics surfaced things like the longest email conversation for the smallest code change, the longest commit comment for the shortest change, and so on.

So now we find our partners and friends sparring over whose contribution is bigger. Frankly, this is all surprising to me, since we have so much more work to do to move Hadoop forward. Don’t get me wrong, we love Hadoop for what it is, but we can all agree that the code is still a work in progress, monolithic, difficult to test and concepts like inversion of control do not exist… I could go on for a while.

So actually I’m happy to announce that our own awesome engineering team is not responsible for this, but is instead focused on building a great analytics product on Hadoop that brings great value to our customers.

Here at Datameer we work hard but also make sure we have a good time, including sharing a laugh in the most stressful situations.

In that spirit we would love to contribute a laugh to the ongoing “civil war” in the Hadoop ecosystem.  To commemorate this epic discussion, we have designed a special t-shirt that we would love to share free with the community.

Just fill out the form and we will send you your own free shirt (one shirt per household, while supplies last).

Send me one of these cool “My Hadoop is Bigger Than Yours” t-shirts.

Ok, people, now back to work. Let’s build some great technology instead of arguing about lines of code.

P.S. We have some customers using DAS; their Hadoop is for sure bigger than yours. :)

Stefan Groschupf

Posted in Uncategorized | 3 Comments