NuCypher is using proxy re-encryption to lift more enterprise big data into the cloud

After spending time at a London fintech accelerator last year, enterprise database startup ZeroDB scrapped its first business plan and mapped out a new one. By January this year it had a new name: NuCypher. It was no longer going to try to persuade enterprises to switch out their Oracle databases — but rather to sell them on a specialized encryption layer to enhance their ability to perform big data analytics by tapping into the cloud. Its slogan: body armor for big data.

Today it’s launching an open source version of its general release product here at TechCrunch Disrupt New York. At this point, the almost 1.5-year-old startup is also running a handful of pilots with major banks, says co-founder MacLane Wilkison.

“It’s a combination of cloud and big data,” he says of the underlying drivers which the team reckons are creating a need for the technology. “Now all of a sudden you’re working in computing environments that are distributed across hundreds or thousands of machines, and that could be spanning both some on-prem, some private and even public cloud. And that sort of scenario presents a lot of new and different security challenges.”

Instead of building an open source end-to-end encrypted database, NuCypher is selling a proxy re-encryption platform for corporates with large amounts of sensitive data stored in encrypted databases to let them securely tap into the power of cloud computing. An idea that might need a bit of explaining to appreciate, but one that’s grounded in a genuine need — at least based on what NuCypher’s early banking partners are telling it.

On the competition front, Wilkison names HP-owned Voltage and Protegrity as the largest existing players in the space. He says both rely on tokenization of data, however, whereas NuCypher reckons proxy re-encryption technology offers greater security for certain types of data.

Unlike some other approaches to processing big data in the cloud, he emphasizes that NuCypher is not using tokenization to mask any data — arguing this is necessary for the target customers because certain types of data when masked with tokens can be vulnerable to statistical attacks.
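To make the statistical-attack concern concrete, here is a minimal Python sketch. It assumes a deterministic tokenizer (the same input always yields the same token), which is an illustrative assumption and not a description of any named vendor's product. The key, the `tokenize` helper and the sample data are all hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret held by the tokenization service.
SECRET = b"hypothetical-tokenization-key"

def tokenize(value: str) -> str:
    """Deterministic token: equal inputs always map to equal tokens."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]

# Made-up column of sensitive values with a skewed distribution.
diagnoses = ["flu"] * 6 + ["asthma"] * 3 + ["rare-condition"] * 1
tokens = [tokenize(d) for d in diagnoses]

# The attacker never sees SECRET, only the tokens plus public knowledge
# of which values are most common in the general population.
public_ranking = ["flu", "asthma", "rare-condition"]
token_ranking = [t for t, _ in Counter(tokens).most_common()]
guesses = dict(zip(token_ranking, public_ranking))

for token, guess in guesses.items():
    print(token, "is probably", guess)
```

Because the token column preserves how often each value occurs, ranking tokens by frequency is enough to line them up with the publicly known ranking in this toy data set. Randomized encryption does not leak that pattern, since equal plaintexts produce different ciphertexts.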

While proxy re-encryption is an existing area of cryptography, applying it to big data is what’s novel here, according to Wilkison, who says the tech has mostly been used in academia thus far. “We’re the only people that applied it to big data platforms like Hadoop and Spark,” he says. “As far as I know we’re the only one using proxy re-encryption in business.”

So while the team’s early ideas focused mostly on looking at data archiving and encryption to enable banks to make use of cloud storage, he says the business was pulled onto its current rails after banks asked if they could apply the encryption tech the team had been building for data archiving to big data cloud processing.

Safe to say, this mini pivot is a familiar story for enterprise startups. After all, who knows the business needs better than the target customers?

“When we originally started the company, my co-founder and I had built an open-source database and then an encrypted database that allows you to operate on encrypted data without sharing encryption keys with the database server… What the banks were particularly interested in was taking some of what we had built for that and applying it to more compute-heavy type of workloads,” says Wilkison.

“After a period of talking to customers… we took some of what we had built for that and made it into a more generalized encryption layer for different platforms — specifically for the big data space. So Hadoop, Kafka and Spark.”

So what is proxy re-encryption — aka NuCypher’s “secret sauce,” as Wilkison puts it — and why is the technique useful for banks?

“Proxy re-encryption is a set of encryption algorithms that allow you to transform encrypted data. Specifically… it allows you to re-encrypt data — so you have data that’s encrypted under one set of keys, you can re-encrypt the data without de-encrypting it first, so that now it’s encrypted under a second, different set of keys,” is how Wilkison explains it.

He gives the example of a person who has some encrypted files stored in Dropbox. If they want to share the files with someone else that could be achieved by downloading them, decrypting them with their key and then re-encrypting them with the public key of the person they want to share with. But obviously — at scale — that’s a pretty network-intensive and cumbersome process.

Even more naively, this person could just share their private encryption key with the person they want to share the file with. But then they’re abandoning all control of their security.

Clearly neither scenario is ideal for NuCypher’s target customers — with their vast lakes of sensitive, highly regulated data. This is where NuCypher reckons proxy re-encryption can step in to offer an edge.

“What I can do with proxy re-encryption that’s much more elegant and secure than either of those alternatives is I can basically delegate access to my encrypted data to someone else’s public key,” he adds.

The platform creates a re-encryption token off of the public key of the entity with whom its customer wants to share data. That token can then be uploaded to the cloud where the third party can access it, which in turn enables them to decrypt and access the data.

Wilkison says re-encryption tokens can be created and used to delegate access to “as many people as I like.”
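The delegation flow described above can be sketched in a few lines of Python. This is a toy, ElGamal-style illustration of the general technique, not NuCypher's implementation: the group parameters are far too small to be secure, the XOR keystream stands in for a real cipher such as AES, and every function name (`keygen`, `make_token`, `reencrypt` and so on) is hypothetical. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
import secrets

# Toy group: P = 2*Q + 1 with P and Q prime; G generates the subgroup of
# order Q. These sizes are for illustration only and are NOT secure.
P, Q, G = 2039, 1019, 4

def keygen():
    sk = secrets.randbelow(Q - 1) + 1            # secret exponent in [1, Q-1]
    return sk, pow(G, sk, P)                     # (secret key, public key)

def _hash_to_exponent(*elements):
    data = b"|".join(str(e).encode() for e in elements)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (Q - 1) + 1

def _xor_stream(shared, data):
    # Stand-in symmetric cipher: XOR with a SHA-256-derived keystream.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(f"{shared}|{i // 32}".encode()).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

def encrypt(pk, plaintext):
    r = secrets.randbelow(Q - 1) + 1
    capsule = pow(G, r, P)                       # travels with the ciphertext
    return capsule, _xor_stream(pow(pk, r, P), plaintext)

def decrypt(sk, capsule, body):
    return _xor_stream(pow(capsule, sk, P), body)

def make_token(sk_owner, pk_recipient):
    # Built from the owner's secret key and only the recipient's PUBLIC key.
    x = secrets.randbelow(Q - 1) + 1
    X = pow(G, x, P)
    d = _hash_to_exponent(X, pk_recipient, pow(pk_recipient, x, P))
    return sk_owner * pow(d, -1, Q) % Q, X

def reencrypt(token, capsule):
    # Run by the untrusted proxy: it transforms the capsule without ever
    # seeing the plaintext or either party's secret key.
    rk, X = token
    return pow(capsule, rk, P), X

def decrypt_reencrypted(sk_recipient, pk_recipient, new_capsule, X, body):
    d = _hash_to_exponent(X, pk_recipient, pow(X, sk_recipient, P))
    return _xor_stream(pow(new_capsule, d, P), body)

alice_sk, alice_pk = keygen()
bob_sk, bob_pk = keygen()
message = b"quarterly risk report"

capsule, body = encrypt(alice_pk, message)       # data encrypted for Alice
token = make_token(alice_sk, bob_pk)             # Alice delegates to Bob
new_capsule, X = reencrypt(token, capsule)       # proxy step in the cloud
recovered = decrypt_reencrypted(bob_sk, bob_pk, new_capsule, X, body)
print(recovered)
```

Note that the encrypted body is never touched: only the small capsule is transformed, which is why nothing has to be downloaded, decrypted and re-uploaded. Alice can issue a separate token for each recipient. In this toy version a proxy colluding with the recipient could recover Alice's key, one of several gaps a production scheme has to close.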

Ensuring compliance with regulations around the processing of sensitive data — data such as a bank or healthcare company might hold — is one key selling point for the platform.

He points to a regulation like HIPAA, which sets standards for protecting healthcare data, as one example where a lot of care is needed when handling data to ensure compliance. He also flags up the European Union’s incoming GDPR (General Data Protection Regulation), which ramps up penalties for violations of rules on processing citizens’ personal data, as another instance of data-centric laws creating data processing pain-points that NuCypher’s platform is setting out to fix.

Other data-laden target industries could include telecoms and insurance, though the team has kicked off by focusing on financial services, and the current pilot phase of the platform is with “major banks.”

Wilkison says there are essentially three main use-cases for the platform:

  • “cloud enablement”: giving target customers a way to move their on-premise Hadoop big data workloads to the public cloud and make use of services like AWS, particularly for “burst or transient workloads.” “What we do there is give them a way to keep their encryption keys in their own data centers, under their control so they can use the cloud to store and process data but they don’t necessarily have to trust the cloud with their encryption keys,” he adds.
  • “regulatory compliance”: currently NuCypher is working with customers in the U.S. and Europe needing to comply with regulations such as HIPAA, PCI, GDPR and PSD2.
  • “secure sharing of sensitive encrypted data”: with multiple third parties, be it a customer, partner, supplier or even a regulator. On this he also notes one of the benefits is that the system segregates the data and the encryption keys, which means, for example, a regulator could not subpoena the cloud provider in order to get their hands on the decrypted data. “It’s very important, particularly in financial services, for customers to have that segmentation between the data and the keys,” he adds.

Another benefit he notes is that NuCypher’s proxy re-encryption technology enables it to give customers the ability to manage access controls without needing to provide full access to the data — which means it can remove any single point of failure (i.e. via an admin who has to have full access control to all of the data).

“With NuCypher a hacker would have to hack into each node individually in order to get all the data,” he adds.

Given the complexities of the technology, customer education is clearly one of the big challenges, with Wilkison saying this boils down to explaining how the approach differs from standard encryption.

And on that front, he says one selling point for the platform is that the proxy re-encryption tech works with NIST-standardized encryption algorithms, which means NuCypher customers don’t have to abandon the tried and tested encryption algorithms they’re comfortable using, such as AES-256, in order to make use of the tech.

“That was one of the pieces that we added that took a pretty significant amount of research to develop for us — to get proxy re-encryption to work with things like ECIES, which is a standard elliptic curve, NIST-certified,” he notes. “So we can go to a customer and say, everything that we’re doing on a crypto level is very standardized, very well understood by industry. So they’re not having to rely on newly rolled crypto.”

NuCypher’s platform exists as an SDK and an encryption library, so its business model is licensing the software — it’s not hosting any data itself, confirms Wilkison; customers can install the software on premise, such as within an existing Hadoop deployment, or directly in the cloud on the infrastructure they’re managing.

Funding-wise, the team has raised a $750,000 seed round to date, from Valley investors including Base Ventures, NewGen Capital and some angels. It also went through Y Combinator last summer. Wilkison says it will be looking to raise again in Q3 this year.

How big do they reckon this market is? Wilkison says he’s hoping the current six to seven pilot customers of NuCypher will turn into “high double digit” or maybe “low triple digits” in a year’s time. But with those target large enterprises typically spending vast amounts of money on securely storing the sensitive data they’re entrusted with, there’s also a very sizeable incentive for them to shift some of that compute load into the cloud. And, potentially, a lot of money at stake if NuCypher can convince them to buy in.


Judges Q&A

Q: Can you talk a bit more about how far along you are with some of the early clients?
A: We’re in pilot stage right now. The bulk of our early customers are in financial services. We’re starting to get traction in healthcare and telcos as well. Pilot phase at this stage.

Q: Tell me a bit more about the competition.
A: There’s a couple of ways to look at this. One: the platforms that we support do have some native data protection built in. So Hadoop for example. These tend not to be robust enough for the types of enterprise customers that we’re working with. Other alternatives include data masking and tokenization. HP Voltage for example.

Q: You worked before at Morgan Stanley. Why did you leave a steady job with a nice salary on Wall Street and go into this kind of adventure?
A: Ultimately I wanted to get back to a more technical role, and actually start building a product in a company again – as opposed to building financial models and pitch decks.

Q: And this is the actual launch of the product?
A: We’re launching the open source version. We’ve had Hadoop available for a while. And then Kafka is launching as well.

Q: What did your mother say when you told her that you were leaving Morgan Stanley for this adventure?
A: She was supportive. Although maybe she didn’t quite understand what we were doing.

Q: Can you tell me more about the implementation? What does it look like as you deploy to enterprise – how do you get all of their existing data encrypted and how do you do key management?
A: On the key management side we actually integrate with hardware security modules – so at lots of banks we use HSMs from vendors like Thales or SafeNet.

For Hadoop we encrypt at the HDFS layer. And everything is transparent to applications running on top of Hadoop, so it doesn’t change the experience for someone running Hive queries for example.

And we also integrate with access control tools like Ranger and Sentry. So people can keep using the standard tools that they use.

Q: Is your business a classic SaaS model?
A: We’re not hosting anything. It’s not software as a service. We have term-based subscriptions, and then also a consumption-based model for cloud deployments.

Q: How do you intend to go to market? Sales force? Direct sales?
A: Some combination of direct sales, which we’ve done today, and then also the channel partners and big data vendors… and the cloud service providers as well, folks like Amazon and Microsoft.

Q: Who are your main competitors?
A: The data masking and tokenization companies are the ones we run into most regularly. Voltage, which is now part of HP. In Europe we see a company called Protegrity pretty frequently. And then as I mentioned before a lot of the underlying platforms will have some sort of protection tools natively.

Q: Do you run into people like Ciphercloud or Ionic?
A: Not so much anymore. We’re similar in some ways to them… we’re more focused on infrastructure like Hadoop and data platforms.

Q: How many people are you now?
A: We’re the two founders and then seven people total on the team.

Q: And how much money did you raise?
A: We’ve raised $750K so far from Y Combinator, NewGen Capital and Base Ventures.

Q: How long ago?
A: Last fall

Q: How hard would it be for your competitors to replicate the work that you’ve done?
A: Certainly it’s a lot easier now that it’s open source… That said we do have an open core approach so we have certain enterprise features that are still proprietary that are only available in the enterprise version. Additionally if the Hadoop vendors integrated what we’re doing natively into Hadoop that’s still just for Hadoop.

So NuCypher’s meant to be layered; it sits across all of the organization’s big data platforms. Right now that’s Hadoop, Kafka, Spark. In the future that could include some new SQL databases, and potentially structured databases as well.

Q: Judging from your experience with your colleague, how do you compare the American level of mathematics and physics to the Russian one?
A: The American approach is lacking. I’m hugely impressed. Not only is my co-founder Russian educated, and Russian born, a lot of our engineers are as well, so we’ve been very fortunate in that regard.