Projects

Big Data Consultant & Engineer at ASML (since Mar. 2020)

The Project

  • Litho systems produce large volumes of data, especially measurements
  • The data must be transformed and loaded; various KPIs are calculated at different aggregation levels
  • The product is being transformed from a monolith into a scalable, microservice-based distributed system

My Tasks

  • Implement the basic DWH infrastructure based on Spark 3 (see the sketch below)
  • Run workshops for technical staff on big data technologies and tools
  • Consult on architecture, performance, and best practices

Techniques Used

  • Spark Structured Streaming, Spark SQL, Java 11, Kafka, Parquet
  • Mesos/Marathon, DC/OS, Docker, microservices, Hive
  • Git, JMS
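
A minimal Scala sketch of the kind of Spark 3 Structured Streaming job this project was built around, reading measurement events from Kafka and writing a windowed KPI as Parquet (the project itself used Java 11; topic, paths, and the payload schema here are hypothetical):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    object MeasurementKpiJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("measurement-kpi").getOrCreate()

        // Assumed payload schema for this sketch.
        val schema = StructType(Seq(
          StructField("machineId", StringType),
          StructField("ts", TimestampType),
          StructField("value", DoubleType)
        ))

        // Read raw measurement events from Kafka (broker and topic are placeholders).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "litho-measurements")
          .load()

        val parsed = raw
          .select(from_json(col("value").cast("string"), schema).as("m"))
          .select("m.*")

        // One example KPI: average value per machine in 10-minute windows.
        // The watermark makes windowed aggregation possible in append mode.
        val kpi = parsed
          .withWatermark("ts", "1 hour")
          .groupBy(window(col("ts"), "10 minutes"), col("machineId"))
          .agg(avg("value").as("avgValue"))

        // Persist the KPI table as Parquet (paths are placeholders).
        kpi.writeStream
          .format("parquet")
          .option("path", "/dwh/kpi/machine_avg")
          .option("checkpointLocation", "/dwh/checkpoints/machine_avg")
          .outputMode("append")
          .start()
          .awaitTermination()
      }
    }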

 

Big Data Consultant & Engineer for AWS-based services (Nov. 2018 – Mar. 2020)

The Project

  • Usage reports from hundreds of different service providers are delivered
  • Data from these providers must be converted, restructured, and aggregated
  • The transformed data is loaded into a data warehouse

My Tasks

  • Develop a Spark program that handles the various input formats (see the sketch below)
  • Run workshops for technical staff on big data technologies and tools
  • Consult on architecture, performance, and code style

Techniques Used

  • Spark 2, Spark SQL, Scala, YARN
  • AWS (EMR, S3, Lambda, StepFunctions, RDS, CodePipeline)
  • GitHub
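
A minimal Scala sketch of the format-dispatching idea behind that Spark program: each provider's delivery is normalised to a common schema before aggregation. Format names, column mappings, and paths are hypothetical:

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    object UsageReportLoader {
      // Normalise one provider's delivery to the common report schema.
      def load(spark: SparkSession, format: String, path: String): DataFrame =
        format match {
          case "csv" =>
            spark.read.option("header", "true").csv(path)
              .select(col("provider_id").as("providerId"),
                      col("usage").cast("long").as("units"))
          case "json" =>
            spark.read.json(path)
              .select(col("provider.id").as("providerId"),
                      col("metrics.units").cast("long").as("units"))
          case other =>
            throw new IllegalArgumentException(s"unknown input format: $other")
        }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("usage-report-loader").getOrCreate()
        val Array(format, in, out) = args

        // Aggregate per provider and write Parquet for the warehouse load step.
        load(spark, format, in)
          .groupBy("providerId")
          .agg(sum("units").as("totalUnits"))
          .write.mode("overwrite").parquet(out)
      }
    }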

 

Big Data Consultant & DevOps Engineer at Smartclip AG (Apr. 2018 – Aug. 2018)

The Project

  • Video advertising on TV and the internet
  • Collect data about ad delivery and consumer behaviour
  • Prepare huge amounts of data for third-party ad hoc queries

My Tasks

  • Identify and advise on performance issues in Spark, Flink, and other systems
  • Advise on big data technologies
  • Evaluate Druid, create test cases for the evaluation, and advise on Druid usage

Techniques Used

  • Amazon AWS, Kubernetes, Docker
  • Spark, Flink, Druid, ZooKeeper
  • Linux, Jira, Scala

 

Lead Big Data Developer at Merck KGaA (Oct. 2017 – Mar. 2018)

The Project

  • Design and implement processing steps as building blocks for pipeline construction
  • Ensure metadata and data lineage are written to and stored in Apache Atlas
  • Lead and educate two colleagues; advise management on architecture decisions

My Tasks

  • Team lead and consultant for management and other big data projects
  • Software design and architecture for Scala (Spark) programs (see the sketch below)
  • Consultant for the technical infrastructure across the lifecycle of big data projects

Techniques Used

  • Hortonworks HDP 2.5, Apache Atlas, Ranger, HDFS, Oozie
  • Scala, IntelliJ, SBT, Spark (1.6)
  • Linux, Windows, BitBucket, Jira, AWS
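
A minimal Scala sketch of the processing-step design: each step is a reusable building block that transforms a DataFrame and reports its lineage. The lineage hook is a hypothetical stand-in for the actual Apache Atlas client calls:

    import org.apache.spark.sql.DataFrame

    trait ProcessingStep {
      def name: String
      def transform(in: DataFrame): DataFrame

      def run(in: DataFrame, inputName: String, outputName: String): DataFrame = {
        val out = transform(in)
        recordLineage(name, inputName, outputName) // would write to Apache Atlas
        out
      }

      // Hypothetical stub; the project registered process entities in Atlas.
      private def recordLineage(step: String, in: String, out: String): Unit =
        println(s"lineage: $in --[$step]--> $out")
    }

    // Example step: drop rows missing the record key (column name hypothetical).
    object DropIncomplete extends ProcessingStep {
      val name = "drop-incomplete"
      def transform(in: DataFrame): DataFrame = in.na.drop(Seq("recordKey"))
    }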

 

DevOps Engineer at GfK SE, online market research (Apr. 2017 – Sept. 2017)

The Project

  • Data from many different devices (PC, smartphone, …) is sent to a collector cloud
  • The ETL system periodically fetches the data, transforms it, and loads it into HDFS / Hive
  • Initial reports are already generated from the cleaned data

My Tasks

  • Operate and maintain all running ETL jobs
  • Watch for potential problems
  • Fix bugs in the ETL software

Techniques Used

  • Cloudera CDH4, CDH5, Hadoop, Hive, Oozie, Pig, HUE
  • Icinga, Grafana 2, Eclipse, Gradle, Maven
  • Linux, Git/Stash/Bitbucket, Confluence, Bamboo, Scrum

 

Developer at GfK SE, online market research (June 2016 – Mar. 2017)

The Project

  • Data from different metering platforms has to be loaded into a data lake
  • Before loading, the data has to be enriched, transformed, deduplicated, and partially aggregated
  • Incoming and outgoing data has to conform to different schema versions

My Tasks

  • Design a complex dataflow pipeline in Crunch and Beam
  • Tune and test the pipeline in a real-world scenario
  • Manage and implement the transition from Hadoop to Spark (see the sketch below)

Techniques Used

  • Cloudera CDH5, Hadoop, Hive, Oozie, Spark, Crunch, Beam, Kite …
  • Icinga, Graphite, Eclipse, Gradle
  • Linux, Git/Stash, Confluence, Bamboo, Scrum
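
For the Hadoop-to-Spark transition, a simplified Scala sketch of the deduplication step (the production pipeline was written with Crunch and Beam; column names and paths are hypothetical):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    object DedupLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("dedup-load").getOrCreate()

        // Assumes records carry eventId, receivedAt, and day columns.
        val events = spark.read.parquet("/landing/events")

        // Keep only the latest record per event id: window-based deduplication.
        val latestPerId = Window.partitionBy("eventId").orderBy(col("receivedAt").desc)
        val deduped = events
          .withColumn("rn", row_number().over(latestPerId))
          .filter(col("rn") === 1)
          .drop("rn")

        // Partition by day so the data lake stays query-friendly.
        deduped.write
          .partitionBy("day")
          .mode("overwrite")
          .parquet("/datalake/events")
      }
    }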

 

DevOps Engineer at GfK SE, online market research (Oct. 2015 – May 2016)

The Project

  • Data from many different devices (PC, smartphone, …) is sent to a collector cloud
  • The ETL system periodically fetches the data, transforms it, and loads it into HDFS or Hive
  • Initial reports are already generated from the cleaned data

My Tasks

  • Operate and maintain all running ETL jobs
  • Watch for potential problems
  • Fix bugs in the ETL software

Techniques Used

  • Cloudera CDH4, Hadoop, Hive, Oozie, Pig
  • Icinga, Grafana, Eclipse, Gradle
  • Linux, Git/Stash, Confluence, Bamboo, Scrum

 

Evaluation, Design, and Implementation of Real-Time Analytics (RTA) Use Cases (May 2015 – Sept. 2015)

The Project

  • Evaluation of big data streaming systems
  • Decision support for choosing an RTA platform
  • Design and build example use cases

My Tasks

  • Evaluate different big data streaming systems (Spark, Flink, and others)
  • Design and implement various use cases for evaluation and proof of concept (see the sketch below)
  • Design and build a real-time analytics platform

Techniques Used

  • Flink, Kafka, ZooKeeper
  • Java 8, JUnit, Mockito, Maven, Eclipse
  • Git, Confluence, Jira, Scrum
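
A minimal sketch of such a streaming use case, here in Scala against a recent Flink 1.x API (the 2015 evaluation used Java 8 on an earlier Flink; the source and keys are placeholders):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    object RtaUseCase {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // Toy source for the sketch; the real use cases consumed Kafka topics.
        val events: DataStream[String] = env.socketTextStream("localhost", 9999)

        // Count events per key in tumbling one-minute windows.
        events
          .map(line => (line, 1))
          .keyBy(_._1)
          .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
          .sum(1)
          .print()

        env.execute("rta-use-case")
      }
    }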

 

Development of an ERP system (Oct. 2014 – Apr. 2015)

The Project

  • Create a web application for managing customer contracts
  • Manage accounting and billing
  • Support internal workflows

My Tasks

  • Database design
  • Create web application for employees

Techniques Used

  • Java 7, Tomcat 7, JPA 2, Hibernate, Spring 4 Web MVC, Spring Data
  • DbUnit, JUnit
  • Maven 2

 

Development of a CRM system (Oct. 2014 – May 2015)

The Project

  • Import, integrate, and analyse customer data
  • Create a web application for customers and employees
  • For a financial institution

My Tasks

  • Database design, import of CSV data, and creation of an integrated view of the data
  • Design and implement web applications
  • Create reports for controlling

Techniques Used

  • PHP 5.4, MySQL 5, Java 7, SQL

 

Stratosphere (research assistant) (2009 – 2014)

The Project

  • Big data analytics system (now Apache Flink) in the cloud
  • Massively parallel data analysis, comparable to Apache Hadoop
  • Complex ad hoc analysis programs on very large data sets

My Tasks

  • Developed various components in the database core
  • Successfully headed the design and development of a metadata collection framework
  • Developed extensible, high-performance modules that combine query execution and metadata (especially statistics) collection in a single pass (see the sketch below)
  • Designed a distributed store with a central indexing component for very fast access to the metadata
  • Planned and coordinated the work of six students on the same project

Techniques Used

  • Java 6 and Java 7, Maven, Jenkins
  • Dataflow Language (Meteor), JSON
  • SQL, Hadoop
  • Libraries such as Kryo
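
A plain-Scala sketch of the piggy-backed statistics idea: the operator's record iterator is wrapped so that statistics are collected while records stream through, with no second pass over the data (the actual framework was written in Java inside the Stratosphere core):

    object PiggybackStats {
      final class NumStats {
        var count = 0L
        var min = Double.MaxValue
        var max = Double.MinValue
        def update(v: Double): Unit = {
          count += 1
          min = math.min(min, v)
          max = math.max(max, v)
        }
      }

      // Wrap a record stream; statistics fill up as the consumer reads.
      def withStats(records: Iterator[Double], stats: NumStats): Iterator[Double] =
        records.map { v => stats.update(v); v }

      def main(args: Array[String]): Unit = {
        val stats = new NumStats
        val consumed = withStats(Iterator(3.0, 1.0, 2.0), stats).toList
        println(s"records=$consumed count=${stats.count} min=${stats.min} max=${stats.max}")
      }
    }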

 

Teacher, Database Principles & Big Data Systems (2010 – 2014)

The Project

  • Educated students in the architecture, design, development, and programming of databases and query languages such as SQL, Meteor, Hive, Pig Latin, and AQL
  • Gave well-received talks presenting big data systems such as Stratosphere, AsterixDB, and Hadoop

My Tasks

  • Successfully managed the course and taught students in "Principles of Database Systems"
  • Managed the course and taught students in "Big Data Systems"
  • Managed the course and taught students in "Map/Reduce"

Techniques Used

  • Entity-relationship models, relational algebra, SQL, JDBC
  • Stratosphere, AsterixDB, Hadoop, IBM DB2
  • Dataflow languages (Meteor, Hive, Pig Latin), JSON, XML

 

Data warehouse at a private insurance company (2007 – 2008)

The Project

  • The insurance company introduced a new data warehouse system
  • Highly complex SQL queries had to be redeveloped (now multi-tenant)
  • The queries computed key performance indicators for the company

My Tasks

  • Developed efficient, highly complex SQL queries over very large data sets
  • Made the queries multi-tenant and robust

Techniques Used

  • SQL
  • MS Excel, MS Access

 

Java Developer in the financial sector (2001 – 2007)

The Project

  • Web-based information system for cash logistics
  • Generated cost-optimal proposals for filling ATMs with cash
  • Complete handling of cash-refill orders for ATMs (create, edit, issue, monitor, and close)

My Tasks

  • Refactored a backend module that computes proposals for filling ATMs with cash
  • Extended, tuned, and modularised this backend module
  • Was involved in the whole software development cycle

Techniques Used

  • Java, SQL, Ant
  • Hibernate, Struts
  • Database design and tuning, programming triggers in PL/SQL on Oracle