Keynote: Kelsey Hightower (Google, Inc)
Kelsey Hightower has worn every hat possible throughout his career in tech, and enjoys leadership roles focused on making things happen and shipping software. Kelsey is a strong open source advocate focused on building simple tools that make people smile. When he is not slinging Go code, you can catch him giving technical workshops covering everything from programming to system administration.
Abstract:Redefining the cloud one container at a time.
Distributed systems have gone mainstream and are redefining the way we think about building and deploying applications. The container represents a modern abstraction for applications and provides the necessary constraints for building robust and self-aware systems on top of the core tenets of distributed computing. In the future the virtual machine will disappear and a logical computer will emerge that stretches across the datacenter and plugs into the global network over high speed links. Some will say this future is more than a decade away, but what if I told you it exists today and its source code was available on GitHub?
Bart Samwel (Google, Inc.)
Bart Samwel is one of the lead developers on the F1 team at Google. His primary focus is on the SQL query engine. Before joining Google he dabbled in a variety of CS research areas ranging from tree automata to active networking. As a software engineer in industry he has worked on systems ranging from automated business correspondence to fourth-generation RAD environments and GIS systems.
Abstract:F1 - The Distributed SQL Database Supporting Google's Ad Business
Large scale internet operations such as Google, Facebook, and Amazon manage amazing amounts of data. Doing so requires databases that are distributed across multiple servers or even multiple data centers, with high throughput, strong latency requirements, "five nines" of availability, and often with strict data consistency requirements. This talk starts by introducing relational SQL databases, NoSQL databases, and the current state of the art in such databases as deployed in industry. It then provides an introduction to Google F1, a SQL database based on Google's Spanner distributed storage system. F1 is used to store the data for AdWords, Google's search advertising product. F1 and Spanner represent a new, hybrid approach to distributed databases that combines the scalability and availability of NoSQL storage systems like Google's Bigtable and Amazon's DynamoDB, with the convenience and consistency guarantees provided by traditional SQL relational databases.
Scott Feinberg (The New York Times)
Scott is the API Architect for the New York Times, where he works on ways to improve microservices and APIs.
Abstract:Innovating Journalism at Scale
The New York Times has been innovating journalism with technology for the past 164 years. Since our first website launched in 1996, we’ve faced the unique challenges of running a high-traffic website from the moment we launched, something incredibly challenging for a company where digital started off as an afterthought. Without the open source tools available today, this required building web servers, ad networks, and caches all from scratch. The Times has been around a long time and will be around for the foreseeable future. This presents a unique problem: how do you build software to last decades? Any piece of software we build is immediately in the hands of hundreds of thousands of users and we have core pieces of our architecture that have lasted 10, even 20 years. We’ll talk about some of the mistakes we’ve made, the huge diversity of tools that go into the New York Times, and what we’ve done and are doing to continue to move fast while innovating journalism at scale. We’ll look at how we’re using microservices, continuous delivery, and API management to make our 164-year-old newsgathering machine continue to “enhance society by creating, collecting and distributing high-quality news and information” for the next century.
Dustin Whittle (AppDynamics)
Dustin Whittle is a Developer Evangelist at AppDynamics where he focuses on helping organizations manage application performance. Before joining AppDynamics, Dustin was CTO at Kwarter, a consultant at SensioLabs, and developer evangelist at Yahoo!. He has experience building and leading engineering teams and working with developers and partners to drive platform adoption. When Dustin isn't working he enjoys flying, sailing, diving, golfing, and travelling around the world. Find out more at dustinwhittle.com or follow him @dustinwhittle.
Abstract:Performance Testing Crash Course
The performance of your application affects your business more than you might think. Top engineering organizations think of performance not as a nice-to-have, but as a crucial feature of their product. Those organizations understand that performance has a direct impact on user experience and, ultimately, their bottom line. Unfortunately, most engineering teams do not regularly test the performance and scalability of their infrastructure. Dustin Whittle shares the latest performance testing tools and insights into why your team should add performance testing to the development process. Learn how to evaluate performance and scalability on the server-side and the client-side with tools like Siege, Bees with Machine Guns, Google PageSpeed, WBench, and more. Take away an understanding of how to automate performance and load testing and evaluate the impact it has on performance and your business.
Tugdual Grall (MapR)
Tugdual Grall is a Technical Evangelist at MapR, an open source advocate and a passionate developer. He currently works with the European developer communities to ease MapR, Hadoop and NoSQL adoption. Before joining MapR, Tug was Technical Evangelist at MongoDB and Couchbase. Tug has also worked as CTO at eXo Platform, and as Java EE product manager and software engineer at Oracle. Tugdual is Co-Founder of the Nantes JUG (Java User Group), which has held monthly meetings about the Java ecosystem since 2008. Tugdual also writes a blog available at https://tgrall.github.io/
Abstract:What and Why and How: Apache Drill!
The 1.1 release of Apache Drill does SQL on Hadoop, but with some big differences. The biggest difference is that Drill changes SQL from a strongly typed language into a late binding language without losing performance. This allows Drill to process complex structured data in addition to relational data. By dynamically generating code that matches the data types and structures observed in the data, Drill can be both agile and very fast. Drill also introduces a view-based security model that uses file-system permissions to control access to data at an extremely fine-grained level that makes secure access easy to control. These changes have huge practical impact when it comes to writing real applications. I will give several practical examples of how Drill makes it easier to analyze data with SQL from your Java application through a simple JDBC driver.
Chris Armstrong (Deis)
Chris Armstrong is a Deis open-source maintainer and leads the engineering team that works on Deis full-time. Previously, he built a SaaS company and wore the pager to support production infrastructure at an open data company. Infrastructure as code gives him warm fuzzy feelings.
Abstract:Deis: A distributed, highly-available PaaS built on CoreOS and Docker
Deis is an open-source Platform-as-a-Service inspired by Heroku and powered by CoreOS, Docker, and Ceph. Chris will demo Deis and discuss how it uses CoreOS and Docker to enable developers to deploy their applications to cloud providers or bare metal with just a 'git push'. This talk will also delve into the Deis internals, showing how we achieve a highly-available distributed platform.
Dominik Rüttimann (Protogrid)
Dominik Rüttimann is a software developer in the highly motivated and qualified Cloud Innovation Team of ATEGRA AG, Switzerland. He studied computer engineering at ETH Zurich. In his work as a Lotus Notes software engineer he got to know and appreciate an early blueprint of the NoSQL philosophy. As a mobile product developer, it’s his great pleasure to transfer these ideas from the NoSQL world into the architecture of distributed mobile applications. Since 2014 he has been responsible for the mobile strategy of the Protogrid development environment, the subject of this talk.
Abstract:Cloud Apps - Running Fully Distributed on Mobile Devices
The move towards the cloud and towards low powered mobile devices has led to a decline of rich clients, i.e. most apps today are heavily dependent on their home server and thus a stable Internet connection. There are some approaches for local caching of data; however, the concrete synchronization mechanism is often opaque or difficult for users to control. Developers usually need to use different data structures on mobile devices and their cloud servers, thus multiplying the code base involved. Meanwhile mobile chip architectures have caught up in performance such that the heavy reliance on the server has become more and more questionable. As an answer to this discrepancy, the Cloud Innovation Team of ATEGRA AG has been developing a fully distributed PaaS called Protogrid. It is based on CouchDB servers located in the cloud, on premise and even running directly on mobile devices. The Protogrid development environment supports Rapid Application Development, such that a workflow application can be created and deployed in a few minutes. Since all client logic is completely independent of the database schema, adaptation to new requirements during operation is no issue. All Protogrid Apps can be deployed on various platforms without any additional effort and they are usable offline with no loss of functionality. In particular, this talk will cover experiences and lessons learned during the implementation of Couchbase Lite and CouchDB replication on mobile clients as well as our innovative approach regarding the database schema in a NoSQL context.
Feike Steenbergen (Zalando)
Feike Steenbergen has been a Database Engineer at Zalando since September 2014, working on administering PostgreSQL clusters, performance analysis, and designing and building a highly available Postgres cluster. He also designs and gives trainings on Postgres. Prior to Zalando he was a database administrator working in Amsterdam and Groningen.
Abstract:Meet Spilo, Zalando’s Highly Available PostgreSQL Cluster
In recent years Zalando has adopted a decentralized setup for applications and databases. This has impacted our database engineers by transferring responsibility to small teams, each of which manages its own infrastructure. Decentralization is great for team autonomy, but can present challenges in terms of how to easily manage lots of PostgreSQL clusters. That’s why our team created Spilo: an open source HA-cluster (highly available PostgreSQL cluster). This talk will show how Spilo simplifies Postgres cluster management while preserving team autonomy. By building upon streaming replication, Spilo provides a set of clusters that require no human interaction for many administration tasks and most failure scenarios; takes care of managing the number of servers (adding and removing them); and creates backups. It implements our own version of Patroni (https://github.com/zalando/patroni): a process, derived from Compose’s Governor, that governs the Postgres cluster (promoting and demoting) and updates information in etcd (the distributed consensus key/value store created by CoreOS). I’ll explore the architecture of Patroni implemented with Spilo; a live demo will show some failovers as they occur. Finally, I’ll show how Spilo combines Patroni with cloud infrastructure architecture components (for example, AWS), adding autoscaling to run an HA-cluster and allowing AWS power users to create a new HA-cluster with very little effort. Spilo relies upon STUPS, Zalando’s open-source platform as a service (PaaS) for enabling multiple, autonomous teams to use AWS while remaining audit-compliant. By using Spilo and STUPS together, our engineers can create a new HA-cluster with just a few commands. After attending this talk the audience will understand how they can also use Spilo, Patroni and STUPS to manage their Postgres clusters more efficiently while working autonomously.
Héctor Fernández (Giant Swarm)
Héctor Fernández is a DevOps engineer at Giant Swarm, a German startup that offers a Simple Microservices Infrastructure to host microservices applications. At Giant Swarm, Héctor focuses on monitoring, performance and on improving Giant Swarm's infrastructure. In recent years, he worked as a DevOps engineer at ElasticBox Inc. and as a postdoctoral research scientist in the Software and Services Research and High Performance Computing groups at VU University Amsterdam. Héctor received his PhD in Computer Science from University of Rennes 1 in 2012 in the area of service-oriented computing in distributed infrastructures. Héctor’s current interests include microservices, cloud computing, monitoring and high performance in large scale infrastructures.
Abstract:Monitoring a Microservices Infrastructure
With the success of Linux containers, more and more organizations have started to pay special attention to a new architectural model for application development. This model is known as Microservices and describes the development of a single application as a suite of small services that run independently in containers. At Giant Swarm, we provide an infrastructure that enables users to apply this new model and easily manage their microservice-based applications. Nevertheless, using this new architectural model to build and offer an infrastructure for others does not come for free. The immaturity and unfamiliarity of the technologies involved, together with this new conceptual pattern of building applications, force the use of new or adapted monitoring mechanisms. At Giant Swarm, we think that an efficient monitoring system is the key to quickly understanding this new model and to offering a great user experience. Our monitoring infrastructure uses existing systems for monitoring, reporting, alerting and data analysis, which have been adapted for use in a microservices infrastructure.
Lucian Precup (Adelean)
Lucian studied distributed systems at university. Furthermore, by developing a virtual database on top of distributed environments, he was able to enhance his expertise in distributed algorithms. Lucian worked with the French National Institute for Research in Computer Science and Control (INRIA), Business Objects and SAP, where he developed real-time data integration software. He is currently CTO of Adelean (https://adelean.com) and delivers search engines, NoSQL and big data solutions for e-commerce, banking, and insurance.
Abstract:Joins in a distributed world
A lot of database related algorithms are more difficult to implement in a distributed environment. Quite often, the "distributed" version is far from the "classical" version: constraints are dropped (see the CAP theorem), only specific cases are supported (for example: the involved data needs to be co-located within the distributed system), etc. This talk focuses on joins. We start by presenting join implementations in "classical" relational databases, then we lead the audience through the challenges and solutions to make these functions available in a distributed environment. While we start with a theoretical point of view, we finish by giving real life examples from implementations in ETL systems (known for joining heterogeneous databases and therefore quite advanced in this area, but often not real-time) and some modern NoSQL databases (where most systems choose to offer fewer features with respect to joins).
Abstract:NoSQL meets Microservices
Just a few years ago all software systems were designed to be monoliths running on a single big and powerful machine. But nowadays most companies desire to scale out instead of scaling up, because it is much easier to buy or rent a large cluster of commodity hardware than to get a single machine that is powerful enough. In the database area scaling out is realized by utilizing a combination of polyglot persistence and sharding of data. On the application level scaling out is realized by microservices. In this talk I will briefly introduce the concepts and ideas of microservices and discuss their benefits and drawbacks. Afterwards I will focus on the point of intersection of a microservice based application talking to one or many NoSQL databases. We will try and find answers to these questions: Are there differences to a monolithic application? How to scale the whole system properly? What about polyglot persistence? Is there a data-centric way to split microservices?
Pablo Chacin (Sensefields)
I'm a practitioner and researcher in large-scale distributed systems, Service Oriented Architectures and Cloud Computing, with more than 20 years of experience in industry and academia. I received a PhD from the Polytechnic University of Catalonia and have participated in several European research projects. I'm a member and contributor of IASA, the International Association of Software Architects. I'm presently the CTO at Sensefields, a provider of traffic management solutions based on wireless sensor networks.
Abstract:A Tale of two microservices
Microservices is one of the hottest technology trends nowadays, and also one that provokes the most vehement arguments about what a microservice is and what the proper way to implement one is. Underlying these apparently irreconcilable arguments there is a fundamental difference in terms of the philosophy and architecture of microservices: should they be coarse level business components exposing an API or should they actually integrate all layers up to the UI? Should they connect by API call or should they be event based? In this talk we explore these two approaches, their differences and also the common concepts with the intention to clarify them and help in making sound design decisions.
Pierre Bittner (Scaled Risk)
Pierre is CTO of Scaled Risk, the next generation of software for a new banking industry. With 15 years of experience in IT, Pierre has successfully led strategic projects for large banking groups in the specific field of Capital Markets for over 8 years. He relies on his deep expertise in banking information systems, as well as in innovative technology, to respond efficiently to users' needs. Pierre graduated from Lille 1 University – Science and Technology in 2000.
Abstract:NoSQL in Financial Industry
Since its creation by Yahoo! in 2006 for web search, Hadoop implementations have never stopped evolving, with a strong focus today on stream processing and real-time analytics. Scaled Risk aims to accelerate the adoption of Hadoop in the finance industry. This talk explains how we leverage HBase to respond to requirements specific to Capital Markets:
- handling structured representations of trades with many fast-evolving models, which led us to the conception of a Dynamic Data Schema,
- a low-latency message bus for extremely fast trading analytics,
- data coherency and process repeatability in an externally consistent distributed system, supported by as-of-date and versioning mechanisms for regulatory requirements.
Rob Haswell (ClusterHQ)
Rob Haswell is founder and VP Product of ClusterHQ, the Container Data People. He has 12 years of experience in building distributed systems, administering advanced server deployments and using distributed storage. Inspired by the practical operational problems faced running web apps at scale, he started ClusterHQ, with an aim to become the standard in container data management.
Abstract:Running database containers using Marathon and Flocker
As microservices become more and more popular, we are encouraged to choose the right database for the job, resulting in an increase in the number of database processes in the cluster. Wouldn't it be great if we could use a Marathon manifest for our entire application, including these stateful database processes? The problem is that when a database process writes to disk, it turns that server into a pet where it was cattle before. This talk will introduce Flocker, talk about Docker plugins and finally demonstrate the two working together to achieve the seamless scheduling and migration of stateful database containers using Marathon.
Rotem Hermon (Gigya)
Rotem Hermon is VP Architecture at Gigya, and has been building and designing back-end systems for a long time now.
The actor model is a novel approach to writing concurrent software. It is based on the concept of small computational units communicating through asynchronous message passing, thus allowing concurrency and scalability while negating a lot of the problems of concurrent programming. Though the actor model gained some adoption with the Erlang language and the Akka framework, it remained a rather niche approach and has not become a commonly used practice. But this may be changing now with the introduction of “Virtual Actors” - a new abstraction for writing distributed applications. This abstraction was introduced with the Orleans framework by Microsoft and adapted to Java by EA with their Orbit framework. This talk will include a short introduction to the actor model. We will then explore the Virtual Actors model, how it’s different from the classic model, and why it makes distributed application programming a lot simpler.
Seán C McCord (CyCore Systems, Inc)
Seán C McCord has been implementing clustered Linux systems commercially since kernel version 1.3. Since 2004, he has run CyCore Systems, a boutique consulting agency out of Atlanta, which designs multi-realm, complex software systems for businesses of all sizes.
Abstract:Data persistence in containerspace
Distributing computing resources is only part of the problem. With the Container movement, managing persistent data is an increasingly important consideration of system engineering. Many new technologies work in tandem with your new container-oriented cluster. This talk will introduce some of the options for effectively managing persistent data, with insights for practical implementation for architects, developers, and devops.
Susan Potter (Lookout)
Susan is a distributed systems software engineer straddling technical operations and engineering to help make data and service infrastructure more operationally manageable at scale. Over the last sixteen years Susan has worked on large scale trading systems, multi-tenant service oriented architectures, continuous deployment and "big data" analytics products. Most recently Susan has been working at Lookout to build a service delivery pipeline that produces testable, verifiable infrastructures in AWS using a declarative and functional stack (Nix and Haskell).
Technical operations is plagued by an unhealthy infatuation with typically untested, imperative code that relies heavily on shared mutable state and is written in dynamically typed languages such as Ruby, Python, Bash, and - ugh - remember Perl? :) In an age where building reliable infrastructure to elastically scale applications and services is paramount to business success, we need to start rethinking the infrastructure engineer’s toolkit and guiding principles. This talk will take a look at applying various functional techniques to building and automating infrastructure. From functional package management and congruent configuration to declarative cloud provisioning, we’ll see just how practically these techniques, typically used in functional programming for applications, can be applied to build more robust and predictable infrastructures. While specific code examples will be given, the emphasis of the talk will be on guiding principles and functional design.
Uwe Friedrichsen (codecentric AG)
Uwe Friedrichsen has traveled the IT world for many years. As a fellow of codecentric AG he is always in search of innovative ideas and concepts. His current focus areas are resilience, scalability and the IT of (the day after) tomorrow. Often, you can find him at conferences sharing his ideas, or as the author of articles, blog posts, tweets and more.
Abstract:Microservices - stress-free and without increased heart-attack risk
A microservice is written quickly: reasonable scope, a small REST interface, nice and easy and way cooler than those fat web applications we did before. But is it really that easy? Well - no, not really! A single service is quite easy to manage, but that does not make the overall complexity go away. Instead of a few big web applications we now have lots of microservices - and to make sure that integration, operations and maintenance will not become a lottery game with increased heart-attack risk, it is crucial to consider a few things that were not (so) important for traditional web applications. Should I use REST or would event driven be the better choice? How can I make sure the service collaboration works as desired? With GUI or better without GUI? How can I guarantee availability and scalability in production? How to deploy best? How can I make sure that services are easily replaceable? How can I avoid service spaghetti? Those and many more questions will be answered in this session - to make sure the encounter with microservices will not become a health risk.
Stefan Edlich
Prof. Dr. Stefan Edlich is a senior lecturer at the Beuth University of Applied Science Berlin. He wrote two of the world’s first NoSQL books and twelve other IT books for publishers such as Apress, O’Reilly, Spektrum/Elsevier, Hanser and others. In 2008 he created the ICOODB conference series, which has run in Berlin, ETH Zürich and Frankfurt. Finally, he runs the NoSQL Archive and organizes NoSQL Events worldwide. The variety of topics that surrounds the work of Stefan Edlich makes him the perfect candidate to chair the distributed matters conference Program Committee.
Marc Planagumà
Marc Planagumà is lead data engineer and researcher at Barcelona Digital Technology Center (BDigital). His current focus of work is geospatial and near-realtime analytics on mobility and smart city scenarios. He has solid experience in building scalable systems for massive data processing research projects, dealing with cutting-edge challenges in knowledge engineering, data-mining, machine-learning and recommender systems. Marc is a NoSQL enthusiast used to dealing with highly complex data-driven scenarios, able to design and deploy polyglot persistence environments supporting challenging requirements such as highly scalable GIS capabilities, optimization of very large time-series for storage and access and graph processing. Marc’s vast experience in the field and the advantage of his close relation to the local audience make him an ideal candidate to chair the distributed matters @Barcelona conference Program Committee.
Frank Celler
As head of Dr. Celler Cologne Lectures, Frank Celler is the host of the distributed matters (previously NoSQL matters) conference series as well as of the NoSQL Cologne User Group. He has been working in the software business for 20 years and entered the world of NoSQL more than 13 years ago. Working for different companies, he discovered the potential of high-performance databases early on. Today he is passionate about promoting the importance of NoSQL to the world. Together with Stefan Edlich and Marc Planagumà, he chairs the distributed matters @Barcelona 2015 Program Committee to select the finest talks for the conference’s agenda.