Did you think about closing your Java stream? by Romain Manni-Bucau, 2018-09-26

Java streams are great for orchestrating processing flows, merging data coming from multiple sources, and so on. However, like any new API, they can hide some surprises. Let's see why java.util.stream.Stream has a close() method.
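To give a feel for why close() matters, here is a minimal sketch (not taken from the post itself). It assumes the usual case of a stream backed by a resource, here Files.lines, which holds an open file until the stream is closed. The class and method names (StreamCloseSketch, countNonBlankLines, closeHandlerRuns) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
public class StreamCloseSketch {

    // Files.lines keeps the file open until the stream is closed,
    // so the stream is wrapped in try-with-resources.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    // Any stream can carry cleanup logic through onClose; the handler
    // runs when close() is called, not when the terminal operation ends.
    static boolean closeHandlerRuns() {
        AtomicBoolean closed = new AtomicBoolean(false);
        try (Stream<String> stream = Stream.of("a", "b").onClose(() -> closed.set(true))) {
            stream.forEach(value -> { });
        }
        return closed.get();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("stream-close", ".txt");
        try {
            Files.write(file, Arrays.asList("first", "", "second"));
            System.out.println(countNonBlankLines(file)); // prints 2
            System.out.println(closeHandlerRuns());       // prints true
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Terminal operations such as count() or forEach() do not close the stream for you, which is why try-with-resources is the safer default whenever a stream may wrap a file, a socket, or another closeable source.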

How to select the best coder for your data with Apache Beam by Romain Manni-Bucau, 2018-09-19

Apache Beam's coder abstraction lets you switch between implementations without rewriting your pipeline. But how do you select your coder? Performance and disk space are likely the most important criteria, so let's see how to measure them.

Apache Beam: convert Row structure to an Avro IndexedRecord by Romain Manni-Bucau, 2018-09-12

We previously saw that Beam's Row structure lets you write generic transforms, but that relying on its serialization can be a bad bet. To illustrate how to switch from one format to another, this post shows how to convert a Row to an IndexedRecord.

Apache Beam and Row: a new Big Data record/serialization standard? by Romain Manni-Bucau, 2018-09-05

Handling data you don't know at compile time is a common concern of processing libraries. Apache Beam can't ignore it, since it lets you build portable pipelines for Big Data engines. Let's see how they started to solve that concern!

Get Maven dependencies locations without any effort by Romain Manni-Bucau, 2018-08-29

Finding the location of a Maven project dependency from its coordinates is not always easy or neat... until you know how to do it without any effort ;).