Big Data Engineer at REWE digital (Mar. 2022 – Jun. 2025)
The Project
- Stock information from all warehouses streams into the system

- Stock levels are calculated in near real time (sketched below)

- In addition, various histories, measurements, and reports are calculated in a streaming fashion
 
My Tasks
- Design and implement new features
 
- Performance analysis and improvements
 
- Train colleagues in big data technologies
 
Techniques Used
- Spark Streaming, HBase, Kafka, Java (8, 11, 17), MapR, GCP
 
- Docker, Kubernetes (Rancher), Helm
 
- GitLab
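
A minimal sketch of the near-real-time stock aggregation described above, using Spark Structured Streaming's Kafka source; the broker address, topic name, and event fields are illustrative assumptions, and the production job wrote to HBase rather than the console:

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class StockStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("stock-stream").getOrCreate();

        // Assumed shape of a stock movement event
        StructType schema = new StructType()
                .add("warehouseId", DataTypes.StringType)
                .add("articleId", DataTypes.StringType)
                .add("quantityDelta", DataTypes.LongType);

        // Read stock events from Kafka (broker and topic are placeholders)
        Dataset<Row> events = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092")
                .option("subscribe", "stock-events")
                .load()
                .select(from_json(col("value").cast("string"), schema).alias("e"))
                .select("e.*");

        // Running stock level per warehouse and article
        Dataset<Row> stock = events
                .groupBy("warehouseId", "articleId")
                .agg(sum("quantityDelta").alias("stockLevel"));

        // The production job persisted to HBase; console output keeps the sketch self-contained
        stock.writeStream()
                .outputMode("update")
                .format("console")
                .start()
                .awaitTermination();
    }
}
```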
 
Big Data Consultant & Engineer at ASML (Mar. 2020 – Mar. 2022)
The Project
- Large volumes of data, especially measurements, are generated by lithography (Litho) systems

- Data must be transformed and loaded; various KPIs are calculated at different aggregation levels (sketched below)

- Transformation of the product from a monolith to a scalable, microservice-based distributed system
 
My Tasks
- Implement basic DWH infrastructure based on Spark 3
 
- Run workshops for technical staff on big data technologies and tools

- Consult on architecture, performance, and best practices
 
Techniques Used
- Spark Structured Streaming, Spark SQL, Java 11, Kafka, Parquet
 
- Mesos/Marathon, DC/OS, Docker, microservices, Hive
 
- Git, JMS
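
A minimal sketch of KPI calculation at different aggregation levels with Spark SQL, as mentioned above; the Parquet paths and column names are illustrative assumptions, and rollup() is one way to produce several aggregation levels in a single job:

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KpiAggregationSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("kpi-aggregation").getOrCreate();

        // Measurement data as written by the ingestion stage (path and columns assumed)
        Dataset<Row> measurements = spark.read().parquet("/data/measurements");

        // rollup() produces one result row per aggregation level:
        // (machine, lot), (machine) alone, and the grand total
        Dataset<Row> kpis = measurements
                .rollup(col("machine"), col("lot"))
                .agg(avg("value").alias("avgValue"),
                     count(lit(1)).alias("sampleCount"));

        kpis.write().mode("overwrite").parquet("/data/kpis");
    }
}
```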
 
 
Big Data Consultant & Engineer for AWS-based services (Nov. 2018 – Mar. 2020)
The Project
- Data (usage reports) from hundreds of different service providers are delivered
 
- Data from these providers must be converted, restructured, and aggregated
 
- Transformed data is loaded into a data warehouse
 
My Tasks
- Develop a Spark program handling the various input formats (sketched below)

- Run workshops for technical staff on big data technologies and tools

- Consult on architecture, performance, and code style
 
Techniques Used
- Spark 2, Spark SQL, Scala, YARN
 
- AWS (EMR, S3, Lambda, StepFunctions, RDS, CodePipeline)
 
- GitHub
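
A minimal sketch of normalizing heterogeneous provider formats in Spark, as described above; shown in Java for consistency with the other sketches (the original program was written in Scala), with paths and column names as illustrative assumptions:

```java
import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class UsageReportSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("usage-reports").getOrCreate();

        // Two of the many provider formats, normalized to one schema (paths and columns assumed)
        Dataset<Row> csvReports = spark.read()
                .option("header", "true")
                .csv("s3://reports/provider-a/")
                .select(col("provider"), col("units").cast("long").alias("units"));

        Dataset<Row> jsonReports = spark.read()
                .json("s3://reports/provider-b/")
                .select(col("provider"), col("usage_count").cast("long").alias("units"));

        // Union the normalized inputs and aggregate per provider for the warehouse load
        Dataset<Row> aggregated = csvReports
                .unionByName(jsonReports)
                .groupBy("provider")
                .agg(sum("units").alias("totalUnits"));

        aggregated.write().mode("overwrite").parquet("s3://reports/aggregated/");
    }
}
```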
 
 
Big Data Consultant & DevOps Engineer at Smartclip AG (Apr. 2018 – Aug. 2018)
The Project
- Video advertising on TV and the internet
 
- Collect data about ad delivery and consumer behaviour
 
- Prepare huge amounts of data for third-party ad hoc queries
 
My Tasks
- Identify and advise on performance issues in Spark, Flink, and other systems
 
- Advise on big data technologies
 
- Evaluate Druid, create test cases for it, and advise on Druid usage
 
Techniques Used
- Amazon AWS, Kubernetes, Docker
 
- Spark, Flink, Druid, Zookeeper
 
- Linux, Jira, Scala
 
 
Lead Big Data Developer at Merck KGaA (Oct. 2017 – Mar. 2018)
The Project
- Design and implement processing steps as a basis for pipeline construction (a step interface is sketched below)

- Ensure metadata and data lineage are written to and stored in Apache Atlas

- Lead and mentor two colleagues; advise management on architecture decisions
 
My Tasks
- Team leader and consultant for management and other big data projects

- Software design and architecture for Scala (Spark) programs

- Consultant on technical infrastructure across the lifecycle of big data projects
 
Techniques Used
- Hortonworks HDP 2.5, Apache Atlas, Ranger, HDFS, Oozie
 
- Scala, IntelliJ, SBT, Spark (1.6)
 
- Linux, Windows, BitBucket, Jira, AWS
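
A hypothetical sketch of the processing-step abstraction referenced above: each step declares the datasets it reads and writes, so a separate component can forward that lineage to Apache Atlas. All names are illustrative assumptions; this is not Atlas's actual client API, and a toy transformation stands in for a real Spark step:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Each step transforms its input and exposes lineage metadata that a separate
// component can push to Apache Atlas (the Atlas client itself is omitted here).
interface ProcessingStep<I, O> {
    O run(I input);
    List<String> inputDatasets();   // datasets read, for lineage tracking
    List<String> outputDatasets();  // datasets written, for lineage tracking
}

public class PipelineStepSketch {
    public static void main(String[] args) {
        // A toy step standing in for a real Spark transformation
        ProcessingStep<List<String>, List<String>> normalize =
                new ProcessingStep<List<String>, List<String>>() {
                    public List<String> run(List<String> input) {
                        return input.stream()
                                .map(s -> s.toLowerCase(Locale.ROOT))
                                .collect(Collectors.toList());
                    }
                    public List<String> inputDatasets()  { return Arrays.asList("raw_events"); }
                    public List<String> outputDatasets() { return Arrays.asList("clean_events"); }
                };

        System.out.println(normalize.run(Arrays.asList("A", "B")));
        System.out.println("lineage: " + normalize.inputDatasets()
                + " -> " + normalize.outputDatasets());
    }
}
```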
 
 
DevOps Engineer at GfK SE, online market research (Apr. 2017 – Sept. 2017)
The Project
- Data from many different devices (PC, smartphone…) are sent to a collector cloud
 
- The ETL system periodically fetches data, transforms and loads them to HDFS / Hive
 
- Some reports are already generated based on the cleaned data
 
My Tasks
- Operate and maintain all running ETL jobs

- Monitor for potential problems
 
- Fix bugs in the ETL software
 
Techniques Used
- Cloudera CDH4, CDH5, Hadoop, Hive, Oozie, Pig, HUE
 
- Icinga, Grafana 2, Eclipse, Gradle, Maven
 
- Linux, Git/Stash/Bitbucket, Confluence, Bamboo, Scrum
 
 
Developer at GfK SE online market research (June 2016 – Mar. 2017)
The Project
- Data from different metering platforms have to be loaded into a data lake

- Before loading, the data has to be enriched, transformed, partially aggregated, and deduplicated

- Incoming and outgoing data have to conform to different schema versions
 
My Tasks
- Design a complex dataflow pipeline in Crunch and Beam (sketched below)

- Tune and test the pipeline in a real-world scenario

- Manage and implement the transition from Hadoop to Spark
 
Techniques Used
- Cloudera CDH5, Hadoop, Hive, Oozie, Spark, Crunch, Beam, Kite …
 
- Icinga, Graphite, Eclipse, Gradle
 
- Linux, Git/Stash, Confluence, Bamboo, Scrum
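
A minimal sketch of such a dataflow pipeline in Apache Beam; the paths are placeholders, and a trivial string transform stands in for the real enrichment logic:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class MeteringPipelineSketch {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("ReadRaw", TextIO.read().from("/data/incoming/*.csv"))
         // Enrich/transform each record (a trivial stand-in for the real logic)
         .apply("Normalize", MapElements.into(TypeDescriptors.strings())
                 .via((String line) -> line.trim().toLowerCase()))
         // Drop records delivered more than once by the metering platforms
         .apply("Deduplicate", Distinct.<String>create())
         .apply("WriteClean", TextIO.write().to("/data/clean/part"));

        p.run().waitUntilFinish();
    }
}
```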
 
 
DevOps Engineer at GfK SE, online market research (Oct. 2015 – May 2016)
The Project
- Data from many different devices (PC, smartphone…) are sent to a collector cloud
 
- The ETL system periodically fetches the data, transforms and loads them to HDFS or Hive
 
- Some reports are already generated based on the cleaned data
 
My Tasks
- Operate and maintain all running ETL jobs

- Monitor for potential problems
 
- Fix bugs in the ETL software
 
Techniques Used
- Cloudera CDH4, Hadoop, Hive, Oozie, Pig
 
- Icinga, Grafana, Eclipse, Gradle
 
- Linux, Git/Stash, Confluence, Bamboo, Scrum
 
 
Evaluation, Design, and Implementation of RTA (Real-Time Analytics) Use Cases (May 2015 – Sept. 2015)
The Project
- Evaluation of big data streaming systems
 
- Decision support for choosing an RTA platform
 
- Design and build example use cases
 
My Tasks
- Evaluate different big data streaming systems (Spark, Flink, and others)
 
- Design and implement different use cases for evaluation and proof of concept (one is sketched below)

- Design and build a real-time analytics platform
 
Techniques Used
- Flink, Kafka, Zookeeper
 
- Java 8, JUnit, Mockito, Maven, Eclipse
 
- Git, Confluence, JIRA, Scrum
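
A minimal sketch of a streaming use case in Flink (windowed counting per key), written against the current DataStream API; the evaluated use cases consumed from Kafka, while a socket source keeps this sketch free of connector dependencies:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class RtaUseCaseSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: one event per text line
        DataStream<String> events = env.socketTextStream("localhost", 9999);

        // Count events per key in 10-second tumbling windows
        events.map(line -> Tuple2.of(line, 1))
              .returns(Types.TUPLE(Types.STRING, Types.INT))
              .keyBy(t -> t.f0)
              .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
              .sum(1)
              .print();

        env.execute("rta-use-case-sketch");
    }
}
```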
 
 
Development of an ERP system (Oct. 2014 – Apr. 2015)
The Project
- Create a web application for management of customer contracts
 
- Manage accounting and billing
 
- Support internal workflows
 
My Tasks
- Database design
 
- Create a web application for employees (a data-layer sketch follows below)
 
Techniques Used
- Java 7, Tomcat 7, JPA 2, Hibernate, Spring 4 WebMVC, Spring Data

- DbUnit, JUnit

- Maven 2
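
A minimal sketch of the data layer such a Spring Data / JPA application is built on; the Contract entity and its fields are illustrative assumptions, not the real domain model:

```java
import java.math.BigDecimal;
import java.util.List;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.springframework.data.jpa.repository.JpaRepository;

// Hypothetical contract entity; the real domain model was more elaborate.
@Entity
public class Contract {
    @Id
    @GeneratedValue
    private Long id;
    private String customerName;
    private BigDecimal monthlyFee;

    protected Contract() {} // no-arg constructor required by JPA

    public Contract(String customerName, BigDecimal monthlyFee) {
        this.customerName = customerName;
        this.monthlyFee = monthlyFee;
    }

    public Long getId() { return id; }
    public String getCustomerName() { return customerName; }
    public BigDecimal getMonthlyFee() { return monthlyFee; }
}

// Spring Data derives the query from the method name, so the contract and
// billing screens need no hand-written DAO code.
interface ContractRepository extends JpaRepository<Contract, Long> {
    List<Contract> findByCustomerName(String customerName);
}
```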
 
 
Development of a CRM system (Oct. 2014 – May 2015)
The Project
- Import, integrate, and analyse customer data

- Create a web application for customers and employees

- For a financial institution
 
My Tasks
- Database design, import of CSV data, and creation of an integrated view on the data (sketched below)

- Design and implement web applications
 
- Create reports for controlling
 
Techniques Used
- PHP 5.4, MySQL 5, Java 7, SQL
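
A minimal sketch of the CSV import via JDBC batch inserts; the file layout, table schema, and connection settings are illustrative assumptions:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvImportSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/crm", "user", "password");
             BufferedReader in = new BufferedReader(new FileReader("customers.csv"));
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO customer (external_id, name) VALUES (?, ?)")) {

            String line = in.readLine(); // skip the header row
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(";", -1); // assumed semicolon-separated
                ps.setString(1, fields[0]);
                ps.setString(2, fields[1]);
                ps.addBatch(); // batch the inserts for throughput
            }
            ps.executeBatch();
        }
    }
}
```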
 
 
Stratosphere (research assistant) (2009 – 2014)
The Project
- Big data analytics system (now Apache Flink) in the cloud

- Massively parallel data analysis, comparable to Apache Hadoop

- Complex ad hoc analysis programs on very large data sets
 
My Tasks
- Developed various components in the database core

- Successfully headed the design and development of a metadata collection framework

- Developed extensible, high-performance modules that combine query execution with metadata (especially statistics) collection in a single pass (sketched below)

- Designed a distributed store with a central indexing component for very fast access to the metadata

- Planned and coordinated the work of six students on the same project
 
Techniques Used
- Java 6 and Java 7, Maven, Jenkins
 
- Dataflow language (Meteor), JSON
 
- SQL, Hadoop
 
- Libraries such as Kryo
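
A simplified illustration of the idea behind the modules described above: statistics are gathered as a by-product while records stream through an operator, so no second pass over the data is needed. All names are illustrative, not Stratosphere's actual interfaces:

```java
import java.util.Arrays;
import java.util.Iterator;

// Wraps an upstream iterator and collects count/min/max while records pass through.
public class StatsCollectingIterator<T extends Comparable<T>> implements Iterator<T> {
    private final Iterator<T> upstream;
    private long count = 0;
    private T min = null;
    private T max = null;

    public StatsCollectingIterator(Iterator<T> upstream) {
        this.upstream = upstream;
    }

    @Override
    public boolean hasNext() {
        return upstream.hasNext();
    }

    @Override
    public T next() {
        T record = upstream.next();
        // Update statistics while passing the record through unchanged
        count++;
        if (min == null || record.compareTo(min) < 0) min = record;
        if (max == null || record.compareTo(max) > 0) max = record;
        return record;
    }

    public long count() { return count; }
    public T min()      { return min; }
    public T max()      { return max; }

    public static void main(String[] args) {
        StatsCollectingIterator<Integer> it =
                new StatsCollectingIterator<>(Arrays.asList(3, 1, 4, 1, 5).iterator());
        while (it.hasNext()) it.next(); // a downstream operator would consume here
        System.out.println(it.count() + " records, min=" + it.min() + ", max=" + it.max());
    }
}
```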
 
 
Teacher, Database Principles & Big Data Systems (2010 – 2014)
The Project
- Taught students the architecture, design, development, and programming of databases, and languages such as SQL, Meteor, Hive, Pig Latin, and AQL

- Gave talks presenting big data systems such as Stratosphere, AsterixDB, and Hadoop
 
My Tasks
- Successfully managed the course and taught students in "Principles of Database Systems"

- Managed the course and taught students in "Big Data Systems"

- Managed the course and taught students in "Map/Reduce" (the classic word count is sketched below)
 
Techniques Used
- Entity-relationship models, relational algebra, SQL, JDBC

- Stratosphere, AsterixDB, Hadoop, IBM DB2

- Dataflow languages (Meteor, Hive, Pig Latin), JSON, XML
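
The classic word-count example used to introduce the Map/Reduce model in such courses, in its standard Hadoop formulation: the mapper emits (word, 1) pairs and the reducer sums the counts per word:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emit (word, 1)
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get(); // sum counts per word
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```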
 
 
Data warehouse in a private insurance company (2007 – 2008)
The Project
- The insurance company introduced a new data warehouse system

- Highly complex SQL queries had to be redeveloped (now multitenant)

- Key performance indicators were computed for the company
 
My Tasks
- Developed efficient, highly complex SQL queries over very large data sets

- Made the queries multitenant and robust
 
Techniques Used
- SQL
 
Java Developer in the financial sector (2001 – 2007)
The Project
- Web-based information system for cash logistics

- Generated cost-optimal proposals for filling ATMs with cash

- Complete handling of cash refill orders for ATMs (create, edit, issue, monitor, and close)
 
My Tasks
- Refactored a backend module computing proposals for filling ATMs with cash (a toy sketch follows below)

- Extended, tuned, and modularised the backend module

- Involved in the whole software development cycle
 
Techniques Used
- Java, SQL, Ant
 
- Hibernate, Struts
 
- DB design, DB tuning, programming triggers in PL/SQL on Oracle
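
A toy illustration of the flavour of the proposal computation: choose note counts for a target amount, preferring large denominations. The real module optimised costs and handled forecasts and cassette constraints, none of which is modelled here:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FillProposalSketch {
    // Greedy note selection: as many large notes as fit, then smaller ones
    public static Map<Integer, Integer> propose(int amount, int[] denominations) {
        Map<Integer, Integer> proposal = new LinkedHashMap<>();
        for (int note : denominations) {          // denominations sorted descending
            proposal.put(note, amount / note);    // as many notes of this size as fit
            amount %= note;
        }
        return proposal;
    }

    public static void main(String[] args) {
        // e.g. refill 38,500 EUR using 100/50/20/10 EUR notes
        System.out.println(propose(38_500, new int[]{100, 50, 20, 10}));
    }
}
```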
 