Information Modelling - Assignment 1 - Time Series Analysis with R

This Web page describes the 1st assignment in the module "Information Management in hydroinformatics systems" (semester 3) dealing with time series analysis using R scripts.

Programming Languages, Data Analysis Environment and Test Data

This assignment description uses R as a typical programming language and data analysis environment. Students are free to use any other language or environment suitable for the given tasks, such as Python, Java, C++, Matlab, Octave, SPSS, ...
Students participating in the courses on modelling the river Rhine are free to replace the given test data files with the provided HYMOG data files for the measurements at Ruhrort and Wesel.

General Objective

Modern sensor technology in combination with Internet/Web technology opens new opportunities for measurements and the related handling of "big data" in water-related projects. One important type of data is time series of scalar physical state variables such as temperature, humidity, discharge, radiation, precipitation, moisture, ...

Different sensor and data analysis systems use different data formats to store time series data. The most common traditional formats are ASCII formats, such as CSV-based structures for spreadsheet applications. A typical task in hydroinformatics projects is implementing tool(s) to read, process and analyse such time series data files.

The objective of the 1st assignment is to write R script(s) that read time series files of a specific format, pre-process and analyse the time series, and generate an analysis report including suitable diagrams/plots.

Test Data Files

This assignment provides three time series files with different physical state variables but the same format. The data were exported from the DWD Weste-XL service in CSV format.

The time series data files contain hourly values for the year 2010 from a measurement station near Cottbus (the geo-location is specified in the files).

Assignment Targets - Working Steps

The target of the 1st assignment is to write R script(s) that handle the provided test time series (all three data files).

The assignment work is structured in four parts:

  1. Reading the data files in R script
  2. Pre-Processing of the time series
  3. Analysis (basic) of the time series
  4. Reporting/Plotting of the time series

Reading time series data files

Please write suitable R script(s) to read the time series data from the CSV data files into a suitable R data structure (e.g. arrays/vectors, data.frame, zoo). Please choose suitable data types for the time/date information and for the scalar value information. The R data structure should be reduced to the time/date and scalar value information by extracting columns 3 and 4 of the CSV files.
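Since the assignment also allows Python, the reading step can be sketched with the Python standard library alone. This is a minimal sketch, not a reference solution: the semicolon delimiter, the single header row and the timestamp format %Y-%m-%d %H:%M are assumptions that must be checked against the actual DWD export, and read_series is a hypothetical helper name. In R, read.csv2() or read.table() combined with as.POSIXct() would play the same role.

```python
import csv
import io
from datetime import datetime

# ASSUMPTIONS (not taken from the DWD files): semicolon delimiter, one header
# row, timestamps like "2010-01-01 00:00" in column 3, values in column 4.
DELIMITER = ";"
TIME_FORMAT = "%Y-%m-%d %H:%M"


def read_series(text, delimiter=DELIMITER, time_format=TIME_FORMAT):
    """Return a list of (datetime, float) tuples built from columns 3 and 4."""
    series = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the (assumed) header row
    for row in reader:
        if len(row) < 4 or not row[2].strip():
            continue  # skip blank or truncated lines
        timestamp = datetime.strptime(row[2].strip(), time_format)
        # Tolerate decimal commas; missing-value markers are NOT handled here.
        value = float(row[3].strip().replace(",", "."))
        series.append((timestamp, value))
    return series


# Made-up sample lines for illustration only (not real DWD data).
SAMPLE = """station;variable;time;value
A;T;2010-01-01 00:00;-3.2
A;T;2010-01-01 01:00;-3.5
A;T;2010-01-01 02:00;-3.9
"""

series = read_series(SAMPLE)
print(series[0])
```

For a real file, the text can be read with open(path, encoding=...) and passed to read_series; the encoding of the export is another detail to verify.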

Pre-Processing

Please check all three time series with R script(s) for gaps or other irregularities, assuming a regular time series with a 1-hour time step.
Gaps and irregularities should be reported on the console.
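The gap check amounts to comparing each pair of consecutive timestamps with the expected 1-hour step. The following minimal Python sketch (check_regularity is a hypothetical name) distinguishes three cases: whole missing steps, duplicate or unordered timestamps, and off-grid steps. The timestamps in the example are made up.

```python
from datetime import datetime, timedelta

STEP = timedelta(hours=1)  # expected regular time step of the series


def check_regularity(timestamps, step=STEP):
    """Compare consecutive timestamps against the expected step and
    return a list of human-readable problem descriptions."""
    problems = []
    for previous, current in zip(timestamps, timestamps[1:]):
        delta = current - previous
        if delta == step:
            continue  # regular step, nothing to report
        if delta > step and delta % step == timedelta(0):
            missing = delta // step - 1  # number of skipped time steps
            problems.append(
                f"gap: {missing} missing step(s) between {previous} and {current}")
        elif delta <= timedelta(0):
            problems.append(
                f"duplicate or unordered timestamp: {previous} -> {current}")
        else:
            problems.append(
                f"irregular step of {delta} between {previous} and {current}")
    return problems


# Made-up timestamps: 03:00 is missing, 04:00 is duplicated, 04:30 is off-grid.
ts = [datetime(2010, 1, 1, h) for h in (0, 1, 2, 4, 4)]
ts.append(datetime(2010, 1, 1, 4, 30))

for problem in check_regularity(ts):
    print(problem)  # report on the console
```

As an additional plausibility check, a complete hourly series for 2010 (not a leap year) holds 365 × 24 = 8760 values. Whether the file starts at 00:00 or 01:00 on 1 January depends on the export's time-stamping convention, so the expected first and last timestamps should be taken from the data.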
Pre-process the three time series by determining the key information for the related scalar values: the value range (min and max) and the mean value.
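The key figures of each series can be computed in one pass over the (timestamp, value) pairs. This Python sketch (summarize is a hypothetical helper) also records when the extremes occur, which helps to spot implausible outliers. It assumes that missing values were converted to NaN during reading, which is an assumption rather than something the data files prescribe.

```python
import math
from datetime import datetime
from statistics import mean


def summarize(series):
    """Min, max (with their timestamps) and mean of a [(datetime, value), ...]
    series. NaN values (assumed to mark missing entries) are ignored."""
    valid = [(t, v) for t, v in series if not math.isnan(v)]
    if not valid:
        raise ValueError("series contains no valid values")
    t_min, v_min = min(valid, key=lambda item: item[1])
    t_max, v_max = max(valid, key=lambda item: item[1])
    return {
        "min": v_min, "time_of_min": t_min,
        "max": v_max, "time_of_max": t_max,
        "mean": mean(v for _, v in valid),
        "count": len(valid),
    }


# Made-up hourly values; the NaN stands for one missing measurement.
values = [-3.2, -3.5, float("nan"), -3.9, -2.8]
series = [(datetime(2010, 1, 1, h), v) for h, v in enumerate(values)]

stats = summarize(series)
print(stats)
```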

Analysis

Please analyse the time series month by month for 2010 by calculating the min, max and mean values for each month of 2010.
For the precipitation time series, additionally calculate the total precipitation sum for the related time window.
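Month-wise statistics are a group-by on (year, month) followed by the same min/max/mean calculation. The optional total sum covers the precipitation case. A minimal Python sketch (monthly_stats is a hypothetical name; the values are made up):

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_stats(series, with_sum=False):
    """Group a [(datetime, value), ...] series by (year, month) and compute
    min, max and mean per month; with_sum=True adds the total (precipitation)."""
    groups = defaultdict(list)
    for timestamp, value in series:
        groups[(timestamp.year, timestamp.month)].append(value)
    result = {}
    for key in sorted(groups):
        values = groups[key]
        stats = {"min": min(values), "max": max(values), "mean": mean(values)}
        if with_sum:
            stats["sum"] = sum(values)
        result[key] = stats
    return result


# Made-up hourly precipitation values around a month boundary.
rain = [
    (datetime(2010, 1, 31, 22), 0.5),
    (datetime(2010, 1, 31, 23), 1.0),
    (datetime(2010, 2, 1, 0), 0.0),
    (datetime(2010, 2, 1, 1), 2.5),
]

for (year, month), stats in monthly_stats(rain, with_sum=True).items():
    print(f"{year}-{month:02d}", stats)
```

One design point to check against the data: if a timestamp marks the end of the hourly interval rather than its start, the value stamped 00:00 on the 1st of a month belongs to the previous month. In that case, shift the timestamps by one hour before grouping.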
Transform the time series from hourly time steps to daily and weekly time series.
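The transformation to daily and weekly series can be sketched as a generic aggregation in which the grouping key and the aggregation function are parameters. Summing suits precipitation, while averaging suits state variables such as temperature. That choice is a common convention, not a requirement stated in the assignment. This Python sketch uses a hypothetical helper, resample, and ISO week numbers. Because 1 January 2010 is a Friday, the first three days of 2010 fall into ISO week 53 of 2009.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def resample(series, period, how):
    """Aggregate an hourly [(datetime, value), ...] series.

    period: "day" groups by calendar date, "week" by ISO (year, week) pair.
    how:    function applied to the values of each group, e.g. mean or sum.
    """
    groups = defaultdict(list)
    for timestamp, value in series:
        if period == "day":
            key = timestamp.date()
        elif period == "week":
            iso = timestamp.isocalendar()
            key = (iso[0], iso[1])  # ISO year and ISO week number
        else:
            raise ValueError(f"unknown period: {period!r}")
        groups[key].append(value)
    return [(key, how(values)) for key, values in sorted(groups.items())]


# Made-up series: 48 hourly values of 1.0 starting on Sunday, 2010-01-03.
start = datetime(2010, 1, 3)
hourly = [(start + timedelta(hours=i), 1.0) for i in range(48)]

daily_sum = resample(hourly, "day", sum)      # e.g. precipitation totals
weekly_mean = resample(hourly, "week", mean)  # e.g. temperature averages
print(daily_sum)
print(weekly_mean)
```

In R, the zoo package offers aggregate() for the same purpose, with a grouping vector derived from the time index.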

Reporting/Plotting

Please create suitable plots for the given time series.

Please report all results (numbers and plots) in a time series report. This can be done manually by copying and pasting them into an office package.
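Instead of copying numbers by hand, the tables can be written as CSV text that an office package imports directly. This Python sketch (stats_to_csv is a hypothetical name) writes monthly statistics in a dict keyed by (year, month). The semicolon delimiter is an assumption that often suits spreadsheets set up for a German locale. The values in the example are made up.

```python
import csv
import io


def stats_to_csv(monthly, delimiter=";"):
    """Render {(year, month): {"min": .., "max": .., "mean": .., ["sum": ..]}}
    as CSV text with one row per month; a missing "sum" stays empty."""
    columns = ["min", "max", "mean", "sum"]
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["month"] + columns)
    for (year, month), stats in sorted(monthly.items()):
        writer.writerow([f"{year}-{month:02d}"] + [
            f"{stats[c]:.2f}" if c in stats else "" for c in columns
        ])
    return out.getvalue()


# Made-up monthly statistics for illustration only.
monthly = {
    (2010, 2): {"min": 0.0, "max": 2.5, "mean": 1.25, "sum": 2.5},
    (2010, 1): {"min": -12.3, "max": 8.1, "mean": -1.5},
}

print(stats_to_csv(monthly))
```

The returned text can be saved with, for example, open("monthly_stats.csv", "w").write(...), where the file name is only an example.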

Examples implementing the different working steps with R scripts will be presented in the lectures.

Assignment Report

The assignment report contains the implemented R script and the report of the performed working steps.