Big Data Consultant & Engineer at ASML (since Mar. 2020)
The Project
- Large amounts of data, especially measurements, are produced by litho systems
- Data must be transformed and loaded, and various KPIs are calculated at different aggregation levels
- Product transformation from a monolith to a microservice-based, scalable, distributed system
My Tasks
- Implement the basic DWH infrastructure based on Spark 3
- Run workshops for technical staff on Big Data technologies and tools
- Consult on architecture, performance, and best practices
Techniques Used
- Spark Structured Streaming, Spark SQL, Java 11, Kafka, Parquet
- Mesos/Marathon, DC/OS, Docker, microservices, Hive
- Git, JMS
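The KPI calculation at different aggregation levels mentioned above can be sketched in plain Java. This is an illustrative example only (hypothetical `fab`/`machine` names and an average-value KPI, without the Spark machinery), not the actual ASML code:

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch: rolling the same KPI (here: an average measurement
// value) up through two aggregation levels, machine and fab.
public class KpiRollup {
    public static class Measurement {
        public final String fab, machine;
        public final double value;
        public Measurement(String fab, String machine, double value) {
            this.fab = fab; this.machine = machine; this.value = value;
        }
    }

    // KPI per machine, nested inside each fab.
    public static Map<String, Map<String, Double>> avgPerMachine(List<Measurement> ms) {
        return ms.stream().collect(Collectors.groupingBy(m -> m.fab,
                Collectors.groupingBy(m -> m.machine,
                        Collectors.averagingDouble(m -> m.value))));
    }

    // The same KPI one aggregation level up: per fab.
    public static Map<String, Double> avgPerFab(List<Measurement> ms) {
        return ms.stream().collect(Collectors.groupingBy(m -> m.fab,
                Collectors.averagingDouble(m -> m.value)));
    }
}
```

In the real system such group-bys run distributed via Spark SQL; the nesting of `groupingBy` collectors mirrors the aggregation hierarchy.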
Big Data Consultant & Engineer for AWS-based services (Nov. 2018 – Mar. 2020)
The Project
- Data (usage reports) from hundreds of different service providers is delivered
- Data from these providers must be converted, restructured, and aggregated
- The transformed data is loaded into a data warehouse
My Tasks
- Develop a Spark program handling the various input formats
- Run workshops for technical staff on Big Data technologies and tools
- Consult on architecture, performance, and code style
Techniques Used
- Spark 2, Spark SQL, Scala, YARN
- AWS (EMR, S3, Lambda, StepFunctions, RDS, CodePipeline)
- GitHub
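Handling many provider-specific input formats typically comes down to dispatching on the format and normalising everything into one common record shape. A minimal plain-Java sketch of that idea, with two hypothetical formats (`csv` and `kv`) standing in for the real providers, and no Spark involved:

```java
import java.util.*;
import java.util.function.Function;

// Illustrative sketch: one parser per provider format; every parser emits
// the same (field -> value) map, so downstream aggregation sees one shape.
public class ReportNormalizer {
    static final Map<String, Function<String, Map<String, String>>> PARSERS = Map.of(
        "csv", line -> {                      // e.g. "alice,42"
            String[] p = line.split(",");
            return Map.of("user", p[0], "units", p[1]);
        },
        "kv", line -> {                       // e.g. "user=bob;units=7"
            Map<String, String> m = new HashMap<>();
            for (String pair : line.split(";")) {
                String[] kv = pair.split("=");
                m.put(kv[0], kv[1]);
            }
            return m;
        });

    public static Map<String, String> normalize(String format, String line) {
        return PARSERS.get(format).apply(line);
    }
}
```

New providers then only add a parser entry; the rest of the pipeline is untouched.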
Big Data Consultant & DevOps Engineer at Smartclip AG (Apr. 2018 – Aug. 2018)
The Project
- Video advertising on TV and the internet
- Collect data about ad delivery and consumer behaviour
- Prepare huge amounts of data for many third-party ad hoc queries
My Tasks
- Identify and advise on performance issues in Spark, Flink, and other systems
- Advise on Big Data technologies
- Evaluate Druid, create test cases for the evaluation, and advise on Druid usage
Techniques Used
- Amazon AWS, Kubernetes, Docker
- Spark, Flink, Druid, Zookeeper
- Linux, Jira, Scala
Lead Big Data Developer at Merck KGaA (Oct. 2017 – Mar. 2018)
The Project
- Design and implement processing steps as the basis for pipeline construction
- Ensure metadata and data lineage are written to and stored in Apache Atlas
- Lead and educate two colleagues, advise management on architecture decisions
My Tasks
- Team leader and consultant for management and other Big Data projects
- Software design and architecture for Scala (Spark) programs
- Consultant on technical infrastructure across the lifecycle of Big Data projects
Techniques Used
- Hortonworks HDP 2.5, Apache Atlas, Ranger, HDFS, Oozie
- Scala, IntelliJ, SBT, Spark (1.6)
- Linux, Windows, BitBucket, Jira, AWS
DevOps Engineer at GfK SE, online market research (Apr. 2017 – Sept. 2017)
The Project
- Data from many different devices (PC, smartphone…) are sent to a collector cloud
- The ETL system periodically fetches the data, transforms it, and loads it into HDFS/Hive
- Some reports are already generated based on the cleaned data
My Tasks
- Operate and maintain all running ETL jobs
- Monitor for and investigate potential problems
- Fix bugs in the ETL software
Techniques Used
- Cloudera CDH4, CDH5, Hadoop, Hive, Oozie, Pig, HUE
- Icinga, Grafana 2, Eclipse, Gradle, Maven
- Linux, Git/Stash/Bitbucket, Confluence, Bamboo, Scrum
Developer at GfK SE, online market research (June 2016 – Mar. 2017)
The Project
- Data from different metering platforms has to be loaded into a data lake
- Before loading, the data has to be enriched, transformed, partially aggregated, and deduplicated
- Incoming and outgoing data has to conform to different schema versions
My Tasks
- Design a complex dataflow pipeline in Crunch and Beam
- Tune and test the pipeline in a real world scenario
- Manage and implement transition from Hadoop to Spark
Techniques Used
- Cloudera CDH5, Hadoop, Hive, Oozie, Spark, Crunch, Beam, Kite …
- Icinga, Graphite, Eclipse, Gradle
- Linux, Git/Stash, Confluence, Bamboo, Scrum
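One of the steps named above, deduplication, can be illustrated in a few lines of plain Java. This is a generic sketch with a hypothetical event shape (key, timestamp, payload), not the actual Crunch/Beam pipeline code: for every key, keep only the most recent version of the record.

```java
import java.util.*;
import java.util.stream.*;

// Illustrative sketch: deduplicate events per key, keeping the latest one.
public class Dedup {
    public static class Event {
        public final String key;
        public final long timestamp;
        public final String payload;
        public Event(String key, long timestamp, String payload) {
            this.key = key; this.timestamp = timestamp; this.payload = payload;
        }
    }

    // Collectors.toMap's merge function resolves duplicate keys:
    // the event with the higher timestamp wins.
    public static Map<String, Event> latestPerKey(List<Event> events) {
        return events.stream().collect(Collectors.toMap(
                e -> e.key, e -> e,
                (a, b) -> a.timestamp >= b.timestamp ? a : b));
    }
}
```

In a distributed pipeline the same logic appears as a group-by-key followed by a max-by-timestamp reduce.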
DevOps Engineer at GfK SE, online market research (Oct. 2015 – May 2016)
The Project
- Data from many different devices (PC, smartphone…) are sent to a collector cloud
- The ETL system periodically fetches the data, transforms it, and loads it into HDFS or Hive
- Some reports are already generated based on the cleaned data
My Tasks
- Operate and maintain all running ETL jobs
- Monitor for and investigate potential problems
- Fix bugs in the ETL software
Techniques Used
- Cloudera CDH4, Hadoop, Hive, Oozie, Pig
- Icinga, Grafana, Eclipse, Gradle
- Linux, Git/Stash, Confluence, Bamboo, Scrum
Evaluation, Design and Implementation of RTA Use Cases (May 2015 – Sept. 2015)
The Project
- Evaluation of big data streaming systems
- Decision support for choosing an RTA platform
- Design and build example use cases
My Tasks
- Evaluate different big data streaming systems (Spark, Flink, and others)
- Design and implement different use cases for evaluation and proof of concept
- Design and build a real time analytics platform
Techniques Used
- Flink, Kafka, Zookeeper
- Java 8, JUnit, Mockito, Maven, Eclipse
- Git, Confluence, JIRA, Scrum
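The basic building block of such real-time analytics use cases is windowed aggregation over an event stream. A minimal plain-Java sketch of a tumbling-window counter (illustrative only; the actual evaluation used Flink's window operators, omitted here):

```java
import java.util.*;

// Illustrative sketch: count events per fixed-size (tumbling) time window.
public class TumblingWindowCount {
    private final long windowMillis;
    private final Map<Long, Long> counts = new TreeMap<>();

    public TumblingWindowCount(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Assign each event timestamp to the start of its window and count it.
    public void add(long eventTimeMillis) {
        long windowStart = eventTimeMillis - (eventTimeMillis % windowMillis);
        counts.merge(windowStart, 1L, Long::sum);
    }

    public long countFor(long windowStart) {
        return counts.getOrDefault(windowStart, 0L);
    }
}
```

Stream processors add the hard parts on top of this: out-of-order events, watermarks, and state that survives failures, which is exactly what the platform evaluation compared.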
Development of an ERP system (Oct. 2014 – April 2015)
The Project
- Create a web application for management of customer contracts
- Manage accounting and billing
- Support internal workflows
My Tasks
- Database design
- Create web application for employees
Techniques Used
- Java 7, Tomcat 7, JPA 2, Hibernate, Spring 4 WebMVC, Spring Data
- DBUnit, JUnit
- Maven 2
Development of a CRM system (Oct. 2014 – May 2015)
The Project
- Import, integrate, and analyse customer data
- Create a web application for customers and employees
- For a financial institution
My Tasks
- Database design, import of CSV data, creation of an integrated view of the data
- Design and implement web applications
- Create reports for controlling
Techniques Used
- PHP 5.4, MySQL 5, Java 7, SQL
Stratosphere (research assistant) (2009 – 2014)
The Project
- Big Data analytics system (now Apache Flink) in the cloud
- Massively parallel data analysis, comparable to Apache Hadoop
- Complex ad hoc analysis programs on very large data sets
My Tasks
- Developed different components in the database core
- Successfully headed the design and development of a metadata collection framework
- Developed extensible, high-performance modules that combine query execution and metadata (esp. statistics) collection in a single pass
- Designed a distributed store with a central indexing component for very fast access to the metadata
- Planned and coordinated the work of 6 students working on the same project
Techniques Used
- Java 6 and Java 7, Maven, Jenkins
- Dataflow Language (Meteor), JSON
- SQL, Hadoop
- Libraries such as Kryo
Teacher, Database Principles & Big Data Systems (2010 – 2014)
The Project
- Educated students in the architecture, design, development, and programming of databases and in languages such as SQL, Meteor, Hive, Pig Latin, and AQL
- Gave well-received talks presenting Big Data systems such as Stratosphere, AsterixDB, Hadoop, and others
My Tasks
- Successfully managed and taught the course "Principles of Database Systems"
- Managed and taught the course "Big Data Systems"
- Managed and taught the course "Map/Reduce"
Techniques Used
- Entity-relationship models, relational algebra, SQL, JDBC
- Stratosphere, AsterixDB, Hadoop, IBM DB2
- Dataflow languages (Meteor, Hive, Pig Latin), JSON, XML
Data warehouse at a private insurance company (2007 – 2008)
The Project
- The insurance company introduced a new data warehouse system
- Highly complex SQL queries had to be re-developed (now multitenant)
- Key performance indicators were computed for the company
My Tasks
- Developed efficient, highly complex SQL queries over very large data sets
- Made the queries multitenant and robust
Techniques Used
- SQL
- MS Excel, MS Access
Java Developer in the financial sector (2001 – 2007)
The Project
- Web-based information system for cash logistics
- Generated cost-optimal proposals for filling ATMs with cash
- Complete handling of cash refill orders for ATMs (create, edit, issue, monitor, and close)
My Tasks
- Refactored a backend module computing proposals for filling ATMs with cash
- Extended, tuned, and modularised the backend module
- Was involved in the whole software development cycle
Techniques Used
- Java, SQL, Ant
- Hibernate, Struts
- DB design, DB tuning, programming triggers in PL/SQL on Oracle
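The refill-proposal idea can be illustrated with a very small Java sketch. Note the real backend module optimised against actual cost models; this hypothetical greedy heuristic only shows the core step of composing a refill amount from the available note denominations, largest first:

```java
import java.util.*;

// Illustrative greedy sketch (not the production algorithm): split a
// target refill amount into note denominations, preferring large notes.
public class RefillProposal {
    public static Map<Integer, Integer> propose(int amount, int[] denominations) {
        int[] d = denominations.clone();
        Arrays.sort(d);                       // ascending; we iterate backwards
        Map<Integer, Integer> notes = new LinkedHashMap<>();
        for (int i = d.length - 1; i >= 0; i--) {
            int n = amount / d[i];            // how many of this note fit
            if (n > 0) {
                notes.put(d[i], n);
                amount -= n * d[i];
            }
        }
        return notes;                         // remainder below the smallest note stays unallocated
    }
}
```

A cost-optimal planner would additionally weigh transport costs, cassette capacities, and forecast demand, which is where the real optimisation work was.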