This repository contains workshop material.
In this repository, I will walk through a series of exercises designed to make handling multiple files easier using Python dictionaries, tables and list comprehensions. I will provide examples, files and a series of commands for participants to explore. Throughout this workshop, and for pedagogical purposes, I will be using Jupyter notebooks.
If you have an SFU login account, you can skip installing Python and simply access one of SFU’s Jupyter servers.
Otherwise, make sure Python is installed on your local computer. Throughout these exercises, I will be using Python 3.6. One of the easiest ways to install Python locally is to download and install Anaconda. I strongly recommend Anaconda because it includes Jupyter Notebooks, Pandas and the other scientific packages used throughout this workshop.
Imagine your job is to extract specific information from a file A with over 600 entries. This information acts as an identifier. You use the identifiers from file A to extract coordinates and keywords from file B. What is the catch? File B has thousands of entries, and the coordinates and keywords in each entry act as another identifier. You then use the identifiers found in file B to extract specific subsets of data from at least a dozen other files (some with hundreds or thousands of lines). Once you have the data, you run open source software to generate results. Once you have results, you clean the output and plot your findings.
In this workshop I will take a subset of a dataset I have been working on and show how tabulating data and using list comprehensions, dictionaries and dataframes simplified the tasks mentioned above. Although the full project involves more than this, I will focus only on the parts that used these tools.
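To make the scenario above concrete, here is a minimal sketch of the lookup pattern, assuming two small tab-separated files. The file contents, the column layout, and names such as file_a_text, file_b_text and records_by_id are hypothetical stand-ins, not the workshop's actual data.

```python
import csv
import io

# Hypothetical stand-ins for file A and file B (tab-separated text).
# With real files, replace io.StringIO(...) with open("your_file.tsv").
file_a_text = "id_001\tsampleX\nid_003\tsampleY\n"
file_b_text = (
    "id_001\t100\t250\talpha\n"
    "id_002\t300\t480\tbeta\n"
    "id_003\t500\t720\tgamma\n"
)

# List comprehension: collect the identifier (first column) of each entry in file A.
ids_from_a = [row[0] for row in csv.reader(io.StringIO(file_a_text), delimiter="\t")]

# Dictionary comprehension: index file B by identifier so each lookup is one step
# instead of a scan through every line of file B.
records_by_id = {
    row[0]: {"start": int(row[1]), "end": int(row[2]), "keyword": row[3]}
    for row in csv.reader(io.StringIO(file_b_text), delimiter="\t")
}

# Keep only the file B entries whose identifier appears in file A.
selected = {
    identifier: records_by_id[identifier]
    for identifier in ids_from_a
    if identifier in records_by_id
}

for identifier, record in selected.items():
    print(identifier, record["start"], record["end"], record["keyword"])
```

Indexing file B once in a dictionary means file B is read a single time, however many identifiers file A contains. The same idea can be repeated for each of the downstream files.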
In this section I will provide a short description of each exercise.
Refresh your memory of for loops, if statements, reading files and storing content into a data structure.
Basic use of list comprehensions on small data structures and on file content.
Examples of how to use dictionaries to manipulate files and file content.
Short intro to dataframes, examples involving functions and plotting results.
Rundown of tools used throughout this workshop.
Challenge: Write a Python script using the tools we learned throughout this workshop.
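As a small preview of the first two exercises described above, the sketch below reads file content with a for loop and an if statement, then does the same filtering with a list comprehension. The content and variable names are hypothetical and only meant to illustrate the pattern.

```python
import io

# Hypothetical file content; with a real file, use open("your_file.txt") instead.
text = "# header\nid_001 12\nid_002 7\n\nid_003 30\n"

# Explicit for loop with an if statement: skip blank lines and comment lines,
# and store each remaining line as a list of fields.
rows_loop = []
with io.StringIO(text) as handle:
    for line in handle:
        line = line.strip()
        if line and not line.startswith("#"):
            rows_loop.append(line.split())

# The same filter and transformation written as a list comprehension.
with io.StringIO(text) as handle:
    rows_comp = [
        line.split()
        for line in handle
        if line.strip() and not line.lstrip().startswith("#")
    ]

# Both approaches build the same list of rows.
print(rows_loop == rows_comp)
```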
I thank Dr. Cedric Chauve for his encouragement and involvement in my development. This workshop was inspired by my learning experiences with him. I also thank him for letting me use a portion of the data we have been working on.