March 20, 2018

What’s new with BlockSci, Princeton’s blockchain analysis tool

Six months ago we released the initial version of BlockSci, a fast and expressive tool to analyze public blockchains. In the accompanying paper we explained how we used it to answer scientific questions about security, privacy, miner behavior, and economics using blockchain data. BlockSci has a number of other applications including forensics and as an educational tool.

Since then we’ve heard from a number of researchers and developers who’ve found it useful, and there’s already a published paper on ransomware that has made use of it. We’re grateful for the pull requests and bug reports on GitHub from the community. We’ve also used it to deep-dive into some of the strange corners of blockchain data. We’ve made enhancements including a 5x speed improvement over the initial version (which was already several hundred times faster than previous tools).

Today we’re happy to announce BlockSci 0.4.5, which has a large number of feature enhancements and bug fixes. As just one example, Bitcoin’s SegWit update introduces the concept of addresses that have different representations but are equivalent; other tools are confused by this and return incorrect (or at least unexpected) values for the balance held by such addresses. BlockSci handles these nuances correctly. We think BlockSci is now ready for serious use, although it is still beta software. Here are a number of ideas on how you can use it in your projects or contribute to its development.
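To make the equivalence issue concrete, here is a purely illustrative Python sketch (this is not BlockSci’s actual API, and the addresses and script identifiers below are invented): a tool that keys unspent outputs by address string undercounts a balance, while grouping by the underlying script recovers it.

```python
# Illustrative only: one underlying script (here "script_A") can have
# multiple address encodings, and keying by address string splits what
# is really a single balance across those encodings.

utxos = [
    {"address": "addr_legacy_form", "script": "script_A", "value": 3},
    {"address": "addr_segwit_form", "script": "script_A", "value": 2},
]

def naive_balance(utxos, address):
    # What a naive tool does: exact string match on the address.
    return sum(u["value"] for u in utxos if u["address"] == address)

def equiv_balance(utxos, address):
    # BlockSci-style equivalence (sketched): resolve the address to its
    # underlying script, then sum over *all* encodings of that script.
    scripts = {u["script"] for u in utxos if u["address"] == address}
    return sum(u["value"] for u in utxos if u["script"] in scripts)

print(naive_balance(utxos, "addr_legacy_form"))  # 3 (undercounts)
print(equiv_balance(utxos, "addr_legacy_form"))  # 5 (correct)
```

The real analysis is script-level rather than string-level, but the failure mode it avoids is exactly the one sketched here.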

We plan to release talks and tutorials on BlockSci, and improve its documentation. I’ll give a brief talk about it at the MIT Bitcoin Expo this Saturday; then Harry Kalodner and Malte Möser will join me for a BlockSci tutorial/workshop at MIT on Monday, March 19, organized by the Digital Currency Initiative and Fidelity Labs. Videos of both events will be available.

We now have two priorities for the development of BlockSci. The first is to make it possible to implement almost all analyses in Python with the speed of C++. To enable this we are building a function composition interface to automatically translate Python to C++. The second is to better support graph queries and improved clustering of the transaction graph. We’ve teamed up with our colleagues in the theoretical computer science group to adapt sophisticated graph clustering algorithms to blockchain data. If this effort succeeds, it will be a foundational part of how we understand blockchains, just as PageRank is a fundamental part of how we understand the structure of the web. Stay tuned!

New Jersey Takes Up Net Neutrality: A Summary, and My Experiences as a Witness

On Monday afternoon, I testified before the New Jersey State Assembly Committee on Science, Technology, and Innovation, which is chaired by Assemblyman Andrew Zwicker, who also happens to represent Princeton’s district.

On the committee agenda were three bills related to net neutrality.

Let’s quickly review the recent events. In December 2017, the Federal Communications Commission (FCC) rolled back the now-famous 2015 Open Internet Order, which required Internet service providers (ISPs) to abide by several so-called “bright line” rules: (1) no blocking lawful Internet traffic; (2) no throttling or degrading the performance of lawful Internet traffic; (3) no paid prioritization of one type of traffic over another; and (4) transparency about network management practices that may affect the forwarding of traffic. In addition to these rules, the FCC order also re-classified Internet service as a “Title II” telecommunications service—placing it under the jurisdiction of the FCC’s rulemaking authority—overturning the “Title I” information services classification that ISPs previously enjoyed.

The distinction of Title I vs. Title II classification is nuanced and complicated, as I’ve previously discussed. Re-classification of ISPs as a Title II service certainly comes with a host of complicated regulatory strings attached.  It also places the ISPs in a different regulatory regime than the content providers (e.g., Google, Facebook, Amazon, Netflix).

The rollback of the Open Internet Order reverted not only the ISPs’ classification as a Title II service, but also the four “bright line” rules. In response, many states have recently been considering and enacting their own net neutrality legislation, including Washington, Oregon, California, and now New Jersey. Generally speaking, these state laws are far less complicated than the original FCC order: they typically re-instate the FCC’s bright-line rules, but entirely avoid the question of Title II classification.

On Monday, the New Jersey State Assembly considered three bills relating to net neutrality. Essentially, all three bills amount to providing financial and other incentives to ISPs to abide by the bright line rules.  The bills require ISPs to follow the bright line rules as a condition for:

  1. securing any contract with the state government (which can often be a significant revenue source);
  2. gaining access to utility poles (which is necessary for deploying infrastructure);
  3. municipal consent (which is required to occupy a city’s right-of-way).

I testified at the hearing, and I also submitted written testimony, which you can read here. This was my first experience testifying before a legislative committee; it was an interesting and rewarding experience.  Below, I’ll briefly summarize the hearing and my testimony (particularly in the context of the other testifying witnesses), as well as my experience as a testifying witness (complete with some lessons learned).

My Testimony

Before I wrote my testimony, I thought hard about what a computer scientist with my expertise could bring to the table as a testifying expert. I focused my testimony on three points:

  • No blocking and no throttling are technically simple to implement. One of the arguments made by those opposed to the legislation is that state laws on blocking and throttling could become exceedingly difficult to implement, particularly if each state has its own laws. In short, the argument is that state laws could create a complex regulatory “patchwork” that is burdensome to implement. If we were considering a version of the FCC’s several-hundred-page Open Internet Order in each state, I might tend to agree. But the New Jersey bills are simple and concise: each is only a couple of pages. They basically say “don’t block or throttle lawful content,” with clear carve-outs for illegal traffic, attack traffic, and so forth. My comments focused on the simplicity of implementation: we need not fear a patchwork of laws if the default is a simple rule that prevents blocking and throttling. In my oral testimony, I added (mostly for color) that the Internet is, by the way, already a patchwork of tens of thousands of independently operated networks around the world, and that our protocols support carrying Internet traffic over a variety of physical media, from optical networks to wireless networks to carrier pigeon. I also took the opportunity to point out that ISPs are, in a relative sense, pretty good actors in this space right now, in contrast to content providers, who have regularly blocked access to content either for anti-competitive reasons or as a condition for doing business in certain countries.
  • Prioritization can be useful for certain types of traffic, but it is distinct from paid prioritization. Some ISPs have argued recently that prohibiting paid prioritization would prohibit (among other things) the deployment of high-priority emergency services over the Internet. Of course, anyone who has taken an undergraduate networking course will have learned about prioritization (e.g., Weighted Fair Queueing), as well as how prioritization (and even shaping) can improve application performance by ensuring that interactive, delay-sensitive applications such as gaming are not queued behind lower-priority bulk transfers, such as a cloud backup. Yet prioritization of certain classes of applications over others is a different matter from paid prioritization, whereby one customer might pay an ISP for higher priority over competing traffic. I discussed the differences at length. I also talked about how prioritization and paid prioritization could be viewed more generally: it’s not just about what a router does, but about who has access to what infrastructure. The bills address “prioritization” merely as a packet scheduling exercise—a router services one queue of packets at a faster rate than another queue. But there are plenty of other ways that some content can be made to “go faster” than others; one example is the deployment of content across a so-called Content Delivery Network (CDN)—a distributed network of content caches that are close to users. Some application or content providers may enjoy an unfair advantage (“priority”) over others merely by virtue of the infrastructure they have access to. Today’s laws—neither the repealed FCC rules nor the state bills—say nothing about this type of prioritization, which could be applied in anti-competitive ways. Finally, I talked about how prioritization is a bit of a red herring as long as there is spare capacity. Again, in an undergraduate networking course, we talk about resource allocation concepts such as max-min fairness, where every sender gets the capacity it requires as long as capacity exceeds total demand. Thus, it is also important to ensure that ISPs and application providers continue to add capacity, both in their networks and at the interconnects between their networks.
  • Transparency is important for consumers, but figuring out exactly what ISPs should expose, in a way that’s meaningful to consumers and not unduly burdensome, is technically challenging. Consumers have a right to know about the service that they are purchasing from their ISP, as well as whether (and how well) that service can support different applications. Disclosure of network management practices and performance certainly makes good sense on the surface, but here the devil is in the details. An ISP could be very specific in its disclosure. It could say, for example, that it has deployed a token bucket filter of a certain size, fill rate, and drain rate, and detail the places in its network where such mechanisms are deployed. This would constitute a disclosure of a network management practice, but it would be meaningless to consumers. On the other hand, other disclosures might be so vague as to be meaningless; a statement from the ISP that it might throttle certain types of high-volume traffic at times of high demand would not help a consumer figure out how particular applications might perform. In this sense, paragraph 226 of the Restoring Internet Freedom Order, which talks about consumers’ need to understand how the network is delivering service for the applications that they care about, is spot on. There’s only one problem with that provision: technically, ISPs would have a hard time doing this without direct access to the client or server side of an application. In short: transparency is challenging. To be continued.
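The max-min fairness concept from the prioritization discussion above can be sketched with the classic progressive-filling algorithm (a minimal illustration, not anything presented in the testimony):

```python
def max_min_fair(demands, capacity):
    """Progressive filling: satisfy the smallest demands first; senders
    whose demand exceeds the equal share split the remaining capacity."""
    alloc = {}
    remaining = capacity
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        fair_share = remaining / len(pending)
        sender, demand = pending[0]
        if demand <= fair_share:
            # Demand fits under the equal share: grant it in full.
            alloc[sender] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # No remaining demand fits: everyone left gets an equal share.
            for sender, _ in pending:
                alloc[sender] = fair_share
            pending = []
    return alloc

# When capacity (10) exceeds total demand (2+3+4 = 9), every sender gets
# exactly what it asked for -- prioritization is moot.
print(max_min_fair({"a": 2, "b": 3, "c": 4}, 10))  # {'a': 2, 'b': 3, 'c': 4}
# When demand exceeds capacity (6), small flows are satisfied and the
# larger flows split the remainder equally.
print(max_min_fair({"a": 2, "b": 3, "c": 4}, 6))   # {'a': 2, 'b': 2.0, 'c': 2.0}
```

The first case is the point made above: with spare capacity, fairness mechanisms never bind, which is why prioritization debates matter most when networks are congested.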
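The token-bucket disclosure example above can be made concrete. Here is a minimal sketch of the mechanism itself, with hypothetical parameters rather than any ISP’s actual configuration, showing why a raw parameter dump (size, fill rate) is precise yet unhelpful to a consumer:

```python
class TokenBucket:
    """Minimal token bucket: tokens accrue at fill_rate (bytes/sec) up
    to size; a packet of n bytes passes only if n tokens are available
    (otherwise it would be dropped or delayed)."""

    def __init__(self, size, fill_rate):
        self.size = size
        self.fill_rate = fill_rate
        self.tokens = size   # start with a full bucket
        self.last = 0.0

    def allow(self, nbytes, now):
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.size,
                          self.tokens + (now - self.last) * self.fill_rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

# Hypothetical parameters: a 1500-byte bucket refilled at 750 bytes/sec.
# The first full-size packet passes, an immediate second one is throttled,
# and after two seconds the bucket has refilled enough to pass another.
tb = TokenBucket(size=1500, fill_rate=750)
print(tb.allow(1500, now=0.0))  # True
print(tb.allow(1500, now=0.1))  # False
print(tb.allow(1500, now=2.1))  # True
```

Disclosing these two numbers is a complete description of the mechanism, yet it tells a consumer almost nothing about whether, say, video calls will work at 8pm, which is exactly the transparency gap described above.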

The Hearing and Vote

The hearing itself was interesting. Testifying witnesses opposing the bills included Jon Leibowitz, from Davis Polk (retained by Internet service providers), and a representative from US Telecom. The arguments against the bills were primarily legal and business-oriented. Essentially, the legal argument against the bills is that the states should leave this problem to the federal government. The arguments are (roughly) as follows: (1) the Restoring Internet Freedom Order pre-empts state law; (2) the Federal Trade Commission has this well in hand, now that ISPs are back in Title I territory (and as a former commissioner, Leibowitz knows well the types of authority that the FTC has to bring such cases, as well as the many cases it has brought against Google, Facebook, and others); (3) the state laws will create a patchwork of laws and introduce regulatory uncertainty, making it difficult for the ISPs to operate efficiently and creating uncertainty for future investment.

The arguments in opposition to the bills are orthogonal to the points I made in my own testimony. In particular, I disclaimed any legal expertise on pre-emption. I was, however, able to comment on whether I thought the second and third arguments held water from a technical perspective. While the second point about FTC authority is mostly a legal question, I understood enough about the FTC Act, and the circumstances under which the FTC brings cases, to comment on whether the bills in question technically give consumers more power than they might otherwise have with just the FTC rules in place. My perspective was that they do, although this point is a really interesting case of the muddy distinction between technology and the law: to really dive into arguments around it, it helps to know a bit about both. I was also able to comment on the “patchwork” assertion from a technical perspective, as I discussed above.

At the end of the hearing, there was a committee vote on all three bills. It was interesting to see both the voting process, and the commentary that each committee member made with their votes.  In the end, there were two abstentions, with the rest in favor. The members who abstained did so largely on the legal question concerning state pre-emption—perhaps foreshadowing the next round of legal battles.

Lessons Learned

Through this experience, I once again saw the value in having technologists at the table in these forums, where the laws that govern the future of the Internet are being written and decided on. I learned a couple of important lessons, which I’ve briefly summarized below.

My job was to bring technical clarity, not to advocate policy. As a witness, technically I am picking a side; in these settings, even when making technical points, one is typically doing so to serve one side of a policy or legal argument. Naturally, given my arguments, I registered as a witness in favor of the legislation.

However, and importantly: that doesn’t mean my job was to advocate policy. As a technologist, my role as a witness was to explain to the lawmakers technical concepts that could help them make better sense of the various arguments from others in the room. I steered clear of rendering legal opinions, and where my comments did rely on legal frameworks, I made it clear that I was not an expert in those matters, but was speaking on technical points within the context of the laws as I understood them. Finally, when figuring out how to frame my testimony, I consulted many people: the lawmakers, my colleagues at Princeton, and even the ISPs themselves. In all cases, I asked these stakeholders about the topics I might focus on, as opposed to asking what, specifically, I should say. I thought hard about what a computer scientist could bring to the discussion, and about ensuring that what I said was technically accurate and correct.

A simple technical explanation is of utmost importance. In such a committee hearing, advocates and lobbyists abound (on both sides); technologists are rare. I suspect I was the only technologist in the room. Most of the people in the room have jobs that require making arguments to serve a particular stakeholder. In doing so, they may muddy the waters, either accidentally or intentionally. To advance their arguments, some people may even say things that are blatantly false (thankfully that didn’t happen on Monday, but I’ve seen it happen in similar forums). Perhaps surprisingly, such discourse can fly by completely unnoticed, because the people in the room—especially the decision-makers—don’t have as deep an understanding of the technology as the technologists. Technologists need to be in the room, to shed light and to call foul—and, importantly, to do so using accessible language and examples that non-technical policy-makers can understand.


The Rise of Artificial Intelligence: Brad Smith at Princeton University

What will artificial intelligence mean for society, jobs, and the economy?

Speaking today at Princeton University is Brad Smith, President and Chief Legal Officer of Microsoft. I was in the audience and live-blogged Brad’s talk.

CITP director Ed Felten introduces Brad’s lecture by saying that the tech industry is at a crossroads. With the rise of AI and big data, people have realized that the internet and technology are having a big, long-term effect on many people’s lives. At the same time, we’ve seen increased skepticism about technology and the role of the tech industry in society.

The good news, says Ed, is that plenty of people in the industry are up to the task of engaging with these problems in a productive way. What the industry needs now, says Ed, is what Brad offers: a thoughtful approach to the challenges that our society faces, one that acknowledges the role of tech companies, seeks constructive solutions, and takes responsibility in a way that works across society. If there’s one thing we could do to help the tech industry cope with these questions, says Ed, it would be to clone Brad.

Imagining Artificial Intelligence in Thirty Years

Brad opens by mentioning the new book by his team: The Future Computed: Artificial Intelligence and its Role in Society. While writing the book, they realized that it’s not helpful to think about change in the next year or two. Instead, we should be thinking about periods of ten to thirty years.

What was life like twenty years ago? In 1998, people often began their day without anything digital. They would turn on a television, listen to the radio, and pull out a calendar. If you needed to call someone, you would use a landline phone to reach them. At the time, a common joke was about whether people could program their VCRs.

In 2018, the first thing that many people reach for is their phone. Even if you manage to keep your phone in another room, you’ll find yourself reaching for your phone or sitting down in front of your laptop. You now use those devices to find out what happened in the world and with your friends.

What will the world look like in 2038? By that time, Brad argues that we’ll be living with artificial intelligence. Digital assistants are already part of our lives, but they’ll be more common at that time. Rather than looking at lots of apps, we’ll have a digital assistant that will talk to us and tell us what the traffic will be like for us. Twenty years from now, you’ll probably have your digital assistant talking to you as you shave or put on your makeup in the morning.

What is Artificial Intelligence?

To understand what that means for our lives, we need to understand what artificial intelligence really is. Even today, computers can recognize people, and they can do more: they can make sense of someone’s emotions from their face. We’ve seen the same with the ability of computers to understand language, Brad says. Not only can computers recognize speech, they can also sift through knowledge, make sense of it, and reach conclusions.

In the world today, we read about AI and expect it all to arrive one day, says Brad. That’s not how it’s going to work: AI will become part of our lives piece by piece. He tells us about the BMW pedestrian alert, which allows a car to detect pedestrians, beep, signal to the driver, and apply the brakes. Brad also tells us about the Steno app, which records and transcribes audio. Microsoft now has a version of Skype that detects and auto-translates conversations, something they’ve now integrated with PowerPoint. Spotify, Netflix, and iTunes all use artificial intelligence to deliver suggestions for the next TV show. None of these systems works with 100% perfection, but neither do human beings. When evaluating an AI system, we need to ask when computers will become as good as a human being.

What advances make AI real? Microsoft, Amazon, Google, and others build data centers that span the area of many football fields. This enables companies to gather huge computational power and vast amounts of data. Because algorithms get better with more data, companies have an insatiable appetite for data.

The Challenges of Imagining the Future

All of this is exciting, says Brad, and could deliver huge promise for the world. But we can’t afford to look at this future with uncritical eyes. The world needs to make sense of the risks. As computers behave more like humans, what will that mean for real people? People like Stephen Hawking and Elon Musk are warning us about that future. But there is no crystal ball. For a long time, says Brad, I’ve admired futurists, but if a futurist gets something wrong, probably nobody remembers it. We may be able to discern patterns, but nobody has a crystal ball.

Learning from The History of the Automobile

How can we think about what may be coming? The first option is to learn from history– not because it repeats itself but because it provides insights. To illustrate this, Brad starts by talking about the transition from horses to automobiles. He shows us a photo of Bertha Benz, whose dowry paid for her husband Karl’s new business. One morning in 1888, she got up and left her husband a note saying that she was taking the car and driving the kids 70 kilometers to visit her mother. Before the day was over, she had to repair the car, but by the end of the day, they had reached her mother’s house. This stunt convinced the world that the automobile would be important to the future.

Next, Brad shows us a photo of New York City in 1905, with streets full of horses and hardly any cars. Twenty years later, there were no horses on the streets. The horse population declined and jobs involved in supporting them disappeared. These direct economic effects weren’t as important as the indirect effects. Consumer credit wasn’t necessarily connected to the automobile, but it was an indirect outcome. Once people wanted to buy cars, they needed a way to finance the cars. Advertising also changed: when people were driving past billboards at speed, advertisers invented logos to make their companies more recognizable.

How Institutions Evolve to Meet Technology & Economic Changes

The effects of the automobile weren’t all good. As the population of horses declined, farmers got smart and grew less hay. They shifted their acreage to wheat and corn, and prices plummeted. Once prices plummeted, farmers’ incomes plummeted. As the farmers fell behind on their loans, the rural banks foreclosed on them, leading to broad financial collapse. Many of the things we take for granted today come from that experience: the FDIC and insurance regulation, farm subsidies, and many other parts of our infrastructure. With AI, we need to be prepared for changes just as substantial.

Understanding the Impact of AI on the Economy

Brad tells us another story about how offices worked. In the 1980s, you handed someone a hand-written document and someone would type it for you. Between the 1980s and today, two big changes happened. First, secretarial staff went on the decline and the professional IT staff was born. Second, people realized that everyone needed to understand how to use computers.

As we think about how work will change, we need to ask what jobs AI will replace. To answer this question, let’s think about what computers can do well: vision, speech, language knowledge. Jobs involving decision-making are already being done by computers (radiology, call centers, fast food orders, auto drivers). Jobs involving translation and learning will also become automated, including machinery inspection and the work of paralegals. At Microsoft, the company used to have multiple people whose job was to inspect fire extinguishers. Now the company has devices that automatically record data on their status, reducing the work involved in maintaining them.

Some jobs are less likely to be replaced by AI, says Brad: anything that requires human understanding and empathy. Nurses, social workers, therapists, and teachers are more likely to be people who will use AI than be replaced by it. This may lead people to take on jobs that they take more satisfaction in doing.

Some of the most exciting developments for AI in the next five years will be in the area of disability. Brad shows us a project called “Seeing AI,” an app that describes a person’s surroundings using a phone camera. The app can read barcodes and identify food, identify currency bills, describe a scene, and read text in one’s surroundings. What’s exciting is what it can do for people. The project has already carried out 3 million tasks, and it’s getting better and smarter as it goes. This system could be a game changer for people who are blind, says Brad.

Why Ethics Will Be a Growth Area for AI

What jobs will AI create? It’s easier to think about the jobs AI will replace than the jobs it will create. When young people in kindergarten today enter the workplace, he says, the majority of jobs will be ones that don’t yet exist. Some of the new jobs will be ones that make AI work: computer science, data science, and ethics. “Ultimately, the question is not only what computers *can* do,” says Brad, “it’s what computers *should* do.” Within the ethics of AI, the fields of reliability/safety and privacy/security are well developed. Other important areas, such as fairness and inclusiveness, are less well developed. Two issues underlie all the rest. Transparency is important because people need to understand how these systems work.

AI Accountability and Transparency

Finally, one of the most important questions of our time is: how do we ensure the accountability of machines? Will we ensure that machines are accountable to people, and that those people are accountable to other people? Only with accountability will we be able to trust these systems.

What would it mean to create a Hippocratic oath for AI developers? Brad asks: what does it take to train a new generation of people to work on AI with that kind of commitment and principle in mind? These aren’t just questions for people at big tech companies. As companies, governments, universities, and individuals take the building blocks of AI and use them, AI ethics are becoming important to every part of society.

Artificial Intelligence Policy

If we are to stay true to timeless values, says Brad, we need to ask whether we want only ethical people to behave ethically, or everyone. That’s what law is for; AI will create new questions for public policy and the evolution of the law. That’s why skilling up for the future isn’t just about science, technology, engineering, and math: as computers behave more like humans, the social sciences and humanities will become even more important. That’s why diversity in the tech industry is also important, says Brad.

How AI is Transforming the Liberal Arts, Engineering, and Agriculture

Brad encourages us to think about disciplines that AI can make more impactful: AI is changing healthcare (cures for cancer), agriculture (precision farming), accessibility, and our environment. He concludes with two examples. First, Brad talks about the Princeton Geniza Lab, led by Marina Rustow, which is using AI to analyze documents that have been scattered all around the world; using AI, researchers are joining these digitized fragments. Engineering isn’t only for the engineers: everybody in the liberal arts can benefit from learning a little bit of computer science and data science, and every engineer is going to need some more liberal arts in their future. Brad also tells us about the AI for Earth project, which provides seed funds to researchers who work on the future of the planet. Projects include smart grids in Norway that make energy usage more efficient, a project by the Singaporean government to do smart climate control in buildings, and a project in Tasmania that supports precision farming, saving 30% on irrigation costs.

These examples give us a glimpse of what it means to prepare for an AI-powered future, says Brad. We’re also going to need to do more work: we may need a new social contract, because people are going to need to learn new skills and find new career pathways, and we will need to create new labor rules and protections and rethink the social safety net as these changes ripple throughout the economy.

Creating the Future of Artificial Intelligence

Where will AI take us? Brad encourages students to think about the needs of the world and what AI has to offer. It’s going to take a whole generation to think through what AI has to offer and create that future, and he encourages today’s students to seize that challenge.