Data-intensive Computing with Spark & Hadoop
Content: Data sets of increasing volume and complexity are often very difficult to process with ‘standard’ HPC or DBMS technology. Large-scale data processing is particularly popular in linguistics, data mining, machine learning, bioinformatics and the social sciences, but it is certainly not limited to those disciplines. Open-source frameworks such as Apache Spark and Hadoop were developed with this challenge in mind and can be of great benefit for data-intensive computing.
This workshop offers:
- Background: learn about the underlying concepts of Apache Spark & Hadoop
- Hands-on session: get experience with Spark in a Python notebook environment
- Optional: discuss your own data problem
Duration: 6 hours
Date and Time: Schedule 2018
Target group: Researchers who need to analyze large amounts of data.
Course Leader: Machiel Jansen / Haukur Pall Jonsson (SURFsara).