Big Data Consultant & Engineer at ASML (since Mar. 2020)
The Project
- Large amounts of data, especially measurements, are produced by litho systems
- Data must be transformed and loaded, and various KPIs are calculated at different aggregation levels
- Product transformation from a monolith to a microservice-based, scalable, distributed system
My Tasks
- Implement the basic DWH infrastructure based on Spark 3
- Run workshops for technical staff on Big Data technologies and tools
- Consult on architecture, performance, and best practices
Techniques Used
- Spark Structured Streaming, Spark SQL, Java 11, Kafka, Parquet
- Mesos/Marathon, DC/OS, Docker, microservices, Hive
- Git, JMS
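The KPI calculation at different aggregation levels mentioned above can be sketched in plain Java. This is an illustrative example only (hypothetical `fab`/`machine` names and an average-value KPI, without the Spark machinery), not the actual ASML code:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch: rolling the same KPI (here: an average measurement
// value) up through two aggregation levels, machine and fab.
public class KpiRollup {
    public static class Measurement {
        public final String fab, machine;
        public final double value;
        public Measurement(String fab, String machine, double value) {
            this.fab = fab; this.machine = machine; this.value = value;
        }
    }

    // KPI per machine, nested inside each fab.
    public static Map<String, Map<String, Double>> avgPerMachine(List<Measurement> ms) {
        return ms.stream().collect(Collectors.groupingBy(m -> m.fab,
                Collectors.groupingBy(m -> m.machine,
                        Collectors.averagingDouble(m -> m.value))));
    }

    // The same KPI one aggregation level up: per fab.
    public static Map<String, Double> avgPerFab(List<Measurement> ms) {
        return ms.stream().collect(Collectors.groupingBy(m -> m.fab,
                Collectors.averagingDouble(m -> m.value)));
    }
}
```

In the real system such group-bys run distributed via Spark SQL; the nesting of `groupingBy` collectors mirrors the aggregation hierarchy.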
Big Data Consultant & Engineer for AWS-based services (Nov. 2018 – Mar. 2020)
The Project
- Data (usage reports) from hundreds of different service providers is delivered
- Data from these providers must be converted, restructured, and aggregated
- The transformed data is loaded into a data warehouse
My Tasks
- Develop a Spark program handling the various input formats
- Run workshops for technical staff on Big Data technologies and tools
- Consult on architecture, performance, and code style
Techniques Used
- Spark 2, Spark SQL, Scala, YARN
- AWS (EMR, S3, Lambda, StepFunctions, RDS, CodePipeline)
- GitHub
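Handling many provider-specific input formats typically comes down to dispatching on the format and normalising everything into one common record shape. A minimal plain-Java sketch of that idea, with two hypothetical formats (`csv` and `kv`) standing in for the real providers, and no Spark involved:

```java
import java.util.*;
import java.util.function.Function;

// Illustrative sketch: one parser per provider format; every parser emits
// the same (field -> value) map, so downstream aggregation sees one shape.
public class ReportNormalizer {
    static final Map<String, Function<String, Map<String, String>>> PARSERS = Map.of(
        "csv", line -> {                      // e.g. "alice,42"
            String[] p = line.split(",");
            return Map.of("user", p[0], "units", p[1]);
        },
        "kv", line -> {                       // e.g. "user=bob;units=7"
            Map<String, String> m = new HashMap<>();
            for (String pair : line.split(";")) {
                String[] kv = pair.split("=");
                m.put(kv[0], kv[1]);
            }
            return m;
        });

    public static Map<String, String> normalize(String format, String line) {
        return PARSERS.get(format).apply(line);
    }
}
```

New providers then only add a parser entry; the rest of the pipeline is untouched.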
Big Data Consultant & DevOps Engineer at Smartclip AG (Apr. 2018 – Aug. 2018)
The Project
- Video advertising on TV and the internet
- Collect data about ad delivery and consumer behaviour
- Prepare huge amounts of data for many third-party ad hoc queries
My Tasks
- Identify and advise on performance issues in Spark, Flink, and other systems
- Advise on Big Data technologies
- Evaluate Druid, create test cases for the evaluation, and advise on Druid usage
Techniques Used
- Amazon AWS, Kubernetes, Docker
- Spark, Flink, Druid, Zookeeper
- Linux, Jira, Scala
Lead Big Data Developer at Merck KGaA (Oct. 2017 – Mar. 2018)
The Project
- Design and implement processing steps as the basis for pipeline construction
- Ensure metadata and data lineage are written to and stored in Apache Atlas
- Lead and educate two colleagues, advise management on architecture decisions
My Tasks
- Team leader and consultant for management and other Big Data projects
- Software design and architecture for Scala (Spark) programs
- Consultant on technical infrastructure across the lifecycle of Big Data projects
Techniques Used
- Hortonworks HDP 2.5, Apache Atlas, Ranger, HDFS, Oozie
- Scala, IntelliJ, SBT, Spark (1.6)
- Linux, Windows, BitBucket, Jira, AWS
DevOps Engineer at GfK SE, online market research (Apr. 2017 – Sept. 2017)
The Project
- Data from many different devices (PC, smartphone…) are sent to a collector cloud
- The ETL system periodically fetches the data, transforms it, and loads it into HDFS/Hive
- Some reports are already generated based on the cleaned data
My Tasks
- Operate and maintain all running ETL jobs
- Monitor for and investigate potential problems
- Fix bugs in the ETL software
Techniques Used
- Cloudera CDH4, CDH5, Hadoop, Hive, Oozie, Pig, HUE
- Icinga, Grafana 2, Eclipse, Gradle, Maven
- Linux, Git/Stash/Bitbucket, Confluence, Bamboo, Scrum
Developer at GfK SE, online market research (June 2016 – Mar. 2017)
The Project
- Data from different metering platforms has to be loaded into a data lake
- Before loading, the data has to be enriched, transformed, partially aggregated, and deduplicated
- Incoming and outgoing data has to conform to different schema versions
My Tasks
- Design a complex dataflow pipeline in Crunch and Beam
- Tune and test the pipeline in a real world scenario
- Manage and implement transition from Hadoop to Spark
Techniques Used
- Cloudera CDH5, Hadoop, Hive, Oozie, Spark, Crunch, Beam, Kite …
- Icinga, Graphite, Eclipse, Gradle
- Linux, Git/Stash, Confluence, Bamboo, Scrum
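One of the steps named above, deduplication, can be illustrated in a few lines of plain Java. This is a generic sketch with a hypothetical event shape (key, timestamp, payload), not the actual Crunch/Beam pipeline code: for every key, keep only the most recent version of the record.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch: deduplicate events per key, keeping the latest one.
public class Dedup {
    public static class Event {
        public final String key;
        public final long timestamp;
        public final String payload;
        public Event(String key, long timestamp, String payload) {
            this.key = key; this.timestamp = timestamp; this.payload = payload;
        }
    }

    // Collectors.toMap's merge function resolves duplicate keys:
    // the event with the higher timestamp wins.
    public static Map<String, Event> latestPerKey(List<Event> events) {
        return events.stream().collect(Collectors.toMap(
                e -> e.key, e -> e,
                (a, b) -> a.timestamp >= b.timestamp ? a : b));
    }
}
```

In a distributed pipeline the same logic appears as a group-by-key followed by a max-by-timestamp reduce.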
DevOps Engineer at GfK SE, online market research (Oct. 2015 – May 2016)
The Project
- Data from many different devices (PC, smartphone…) are sent to a collector cloud
- The ETL system periodically fetches the data, transforms it, and loads it into HDFS or Hive
- Some reports are already generated based on the cleaned data
My Tasks
- Operate and maintain all running ETL jobs
- Monitor for and investigate potential problems
- Fix bugs in the ETL software
Techniques Used
- Cloudera CDH4, Hadoop, Hive, Oozie, Pig
- Icinga, Grafana, Eclipse, Gradle
- Linux, Git/Stash, Confluence, Bamboo, Scrum
Evaluation, Design and Implementation of RTA Use Cases (May 2015 – Sept. 2015)
The Project
- Evaluation of big data streaming systems
- Decision support for choosing an RTA platform
- Design and build example use cases
My Tasks
- Evaluate different big data streaming systems (Spark, Flink, and others)
- Design and implement different use cases for evaluation and proof of concept
- Design and build a real time analytics platform
Techniques Used
- Flink, Kafka, Zookeeper
- Java 8, JUnit, Mockito, Maven, Eclipse
- Git, Confluence, JIRA, Scrum
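The basic building block of such real-time analytics use cases is windowed aggregation over an event stream. A minimal plain-Java sketch of a tumbling-window counter (illustrative only; the actual evaluation used Flink's window operators, omitted here):

```java
import java.util.*;

// Illustrative sketch: count events per fixed-size (tumbling) time window.
public class TumblingWindowCount {
    private final long windowMillis;
    private final Map<Long, Long> counts = new TreeMap<>();

    public TumblingWindowCount(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Assign each event timestamp to the start of its window and count it.
    public void add(long eventTimeMillis) {
        long windowStart = eventTimeMillis - (eventTimeMillis % windowMillis);
        counts.merge(windowStart, 1L, Long::sum);
    }

    public long countFor(long windowStart) {
        return counts.getOrDefault(windowStart, 0L);
    }
}
```

Stream processors add the hard parts on top of this: out-of-order events, watermarks, and state that survives failures, which is exactly what the platform evaluation compared.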
Development of an ERP system (Oct. 2014 – April 2015)
The Project
- Create a web application for management of customer contracts
- Manage accounting and billing
- Support internal workflows
My Tasks
- Database design
- Create web application for employees
Techniques Used
- Java 7, Tomcat 7, JPA 2, Hibernate, Spring 4 WebMVC, Spring Data
- DBUnit, JUnit
- Maven 2
Development of a CRM system (Oct. 2014 – May 2015)
The Project
- Import, integrate, and analyse customer data
- Create a web application for customers and employees
- For a financial institution
My Tasks
- Database design, import of CSV data, creation of an integrated view of the data
- Design and implement web applications
- Create reports for controlling
Techniques Used
- PHP 5.4, MySQL 5, Java 7, SQL
Stratosphere (research assistant) (2009 – 2014)
The Project
- Big Data analytics system (now Apache Flink) in the cloud
- Massively parallel data analysis, comparable to Apache Hadoop
- Complex ad hoc analysis programs on very large data sets
My Tasks
- Developed different components in the database core
- Successfully headed the design and development of a metadata collection framework
- Developed extensible, high-performance modules that combine query execution and metadata (esp. statistics) collection in a single pass
- Designed a distributed store with a central indexing component for very fast access to the metadata
- Planned and coordinated the work of 6 students working on the same project
Techniques Used
- Java 6 and Java 7, Maven, Jenkins
- Dataflow Language (Meteor), JSON
- SQL, Hadoop
- Libraries such as Kryo
Teacher, Database Principles & Big Data Systems (2010 – 2014)
The Project
- Educated students in the architecture, design, development, and programming of databases and in languages such as SQL, Meteor, Hive, Pig Latin, and AQL
- Gave well-received talks presenting Big Data systems such as Stratosphere, AsterixDB, Hadoop, and others
My Tasks
- Successfully managed and taught the course "Principles of Database Systems"
- Managed and taught the course "Big Data Systems"
- Managed and taught the course "Map/Reduce"
Techniques Used
- Entity-relationship models, relational algebra, SQL, JDBC
- Stratosphere, AsterixDB, Hadoop, IBM DB2
- Dataflow languages (Meteor, Hive, Pig Latin), JSON, XML
Data warehouse at a private insurance company (2007 – 2008)
The Project
- The insurance company introduced a new data warehouse system
- Highly complex SQL queries had to be re-developed (now multitenant)
- Key performance indicators were computed for the company
My Tasks
- Developed efficient, highly complex SQL queries over very large data sets
- Made the queries multitenant and robust
Techniques Used
- SQL
- MS Excel, MS Access
Java Developer in the financial sector (2001 – 2007)
The Project
- Web-based information system for cash logistics
- Generated cost-optimal proposals for filling ATMs with cash
- Complete handling of cash refill orders for ATMs (create, edit, issue, monitor, and close)
My Tasks
- Refactored a backend module computing proposals for filling ATMs with cash
- Extended, tuned, and modularised the backend module
- Was involved in the whole software development cycle
Techniques Used
- Java, SQL, Ant
- Hibernate, Struts
- DB design, DB tuning, programming triggers in PL/SQL on Oracle
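The refill-proposal idea can be illustrated with a very small Java sketch. Note the real backend module optimised against actual cost models; this hypothetical greedy heuristic only shows the core step of composing a refill amount from the available note denominations, largest first:

```java
import java.util.*;

// Illustrative greedy sketch (not the production algorithm): split a
// target refill amount into note denominations, preferring large notes.
public class RefillProposal {
    public static Map<Integer, Integer> propose(int amount, int[] denominations) {
        int[] d = denominations.clone();
        Arrays.sort(d);                       // ascending; we iterate backwards
        Map<Integer, Integer> notes = new LinkedHashMap<>();
        for (int i = d.length - 1; i >= 0; i--) {
            int n = amount / d[i];            // how many of this note fit
            if (n > 0) {
                notes.put(d[i], n);
                amount -= n * d[i];
            }
        }
        return notes;                         // remainder below the smallest note stays unallocated
    }
}
```

A cost-optimal planner would additionally weigh transport costs, cassette capacities, and forecast demand, which is where the real optimisation work was.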