Python In Earth Science

Learn Python coding for the earth science program

10. round(x,n): Returns the floating point value of x rounded to n digits after the decimal point.

    In [200]: round(2.675, 2)
    Out[200]: 2.67

11. around(A,n): Returns the floating point array A rounded to n digits after the decimal point.

    In [201]: around(C, 2)
    Out[201]: array([[ 2.53,  2.56],
                     [ 5.37,  5.46]])

12. range([x],y[,z]): This function creates lists of integers in an arithmetic progression. It is primarily used in for loops. The arguments must be plain integers.

    • If the step argument (z) is omitted, it defaults to 1.
    • If the start argument (x) is omitted, it defaults to 0.
    • The full form returns a list of plain integers [x, x + z, x + 2*z, ..., y - z].
    • If step (z) is positive, the last element is the 'start (x) + i * step (z)' just less than 'y'.
    • If step (z) is negative, the last element is the 'start (x) + i * step (z)' just greater than 'y'.
    • If step (z) is zero, ValueError is raised.

    In [202]: range(10)
    Out[202]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    In [203]: range(1, 11)
    Out[203]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

    In [204]: range(0, 20, 5)
    Out[204]: [0, 5, 10, 15]


    In [205]: range(0, -5, -1)
    Out[205]: [0, -1, -2, -3, -4]

    In [206]: range(0)
    Out[206]: []

13. arange(x,y[,z]): This function creates arrays of integers in an arithmetic progression. The arguments behave as in range(), but an array is returned.

    In [207]: arange(10)
    Out[207]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

    In [208]: arange(1, 11)
    Out[208]: array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

    In [209]: arange(0, 20, 5)
    Out[209]: array([0, 5, 10, 15])

    In [210]: arange(0, -5, -1)
    Out[210]: array([0, -1, -2, -3, -4])

    In [211]: arange(0)
    Out[211]: array([], dtype=int64)

14. zip(A,B): Returns a list of tuples, where the i-th tuple contains the i-th element of each argument sequence. The returned list is truncated to the length of the shortest sequence. For a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.

    In [212]: zip(A, B)
    Out[212]: [(array([-2, 2]), array([2, 2])), (array([-5, 5]), array([5, 5]))]
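The truncation behaviour of zip() is easiest to see with two plain lists of unequal length. A minimal sketch follows; note that in Python 3 zip() returns an iterator, so it is wrapped in list() here (the list() call is harmless in Python 2 as well):

    a = [1, 2, 3, 4]
    b = ['x', 'y']
    print(list(zip(a, b)))  # [(1, 'x'), (2, 'y')] -- truncated to the shorter sequence
    print(list(zip(a)))     # [(1,), (2,), (3,), (4,)] -- a list of 1-tuples
    print(list(zip()))      # [] -- empty list with no arguments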


15. sort(): Sorts the array elements from smallest to largest.

    In [213]: D = array([10, 2, 3, 10, 100, 54])
    In [214]: D.sort()
    In [215]: D
    Out[215]: array([2, 3, 10, 10, 54, 100])

16. ravel(): Returns a flattened array. A 2-D array is converted to a 1-D array.

    In [216]: A.ravel()
    Out[216]: array([-2, 2, -5, 5])

17. transpose(): Returns the transpose of an array (matrix) by permuting the dimensions.

    In [217]: A.transpose()
    Out[217]: array([[-2, -5],
                     [ 2,  5]])

18. diagonal(): Returns the specified diagonal of an array.

    In [218]: A.diagonal()
    Out[218]: array([-2, 5])

4.4. MATRIX OPERATIONS

The linear algebra module of NumPy provides a suite of matrix calculations.

1. Dot product:

    In [219]:
        a = rand(3, 3)
        b = rand(3, 3)
        dot_p = dot(a, b)

    ✓where a and b are two arrays.


2. Cross product:

    In [220]:
        a = rand(3, 3)
        b = rand(3, 3)
        cro_p = cross(a, b)

    ✓where a and b are two arrays.

3. Matrix multiplication:

    In [221]:
        a = rand(2, 3)
        b = rand(3, 2)
        mult_ab = matmul(a, b)

    In [222]: shape(mult_ab)
    Out[222]: (2, 2)

4.5. STRING OPERATIONS

Let's assume a string s as,

    In [223]: s = 'sujan koirala'

1. split(): Splits a string. It takes a delimiter as an optional argument (the default is whitespace) and splits the string into a list of strings at the delimiter.

    In [224]: s.split()
    Out[224]: ['sujan', 'koirala']

    ✓blank space as the delimiter: creates a list with elements separated at the locations of blank spaces.

    In [225]: s.split('a')
    Out[225]: ['suj', 'n koir', 'l', '']

    ✓'a' as the delimiter: creates a list with elements separated at the locations of 'a'.

2. lower() and upper(): Change the string to lower case and upper case, respectively.


    In [226]:
        s = 'Sujan Koirala'
        s
    Out[226]: 'Sujan Koirala'

    In [227]: s.lower()
    Out[227]: 'sujan koirala'

    In [228]: s.upper()
    Out[228]: 'SUJAN KOIRALA'

3. count(): Counts the number of occurrences of a substring.

    In [229]: s.count('a')
    Out[229]: 3

    ✓There are 3 a's in the string s.

4. Replace a substring:

    In [230]: s2 = s.replace("Su", "Tsu")

5. List to string:

    In [231]:
        a_list = ['a', 'b', 'c']
        a_str = " and ".join(str(x) for x in a_list)
        a_str
    Out[231]: 'a and b and c'

4.6. OTHER USEFUL FUNCTIONS

1. astype('type code'): Returns an array with the same elements coerced to the type indicated by the type code in Table 3.1. It is useful for saving data as a particular type.

    In [232]: A.astype('f')
    Out[232]: array([[-2., 2.], [-5., 5.]])


2. tolist(): Converts the array to an ordinary list with the same items.

    In [233]: A.tolist()
    Out[233]: [[-2, 2], [-5, 5]]

3. byteswap(): Swaps the bytes of the array elements and returns the byteswapped array. If the first argument is True, the items are byteswapped in place. Supported byte sizes are 1, 2, 4, or 8. It is useful when reading data from a file written on a machine with a different byte order. For details on machine dependency (endianness), refer this. To convert data from big endian to little endian or vice versa, for example when the data were written on a big-endian machine, add byteswap() on the same line where 'fromfile' is used.
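As a minimal sketch of that pattern (the file name below is made up for illustration, and the file is assumed to hold big-endian float32 values while the reading machine is little-endian):

    import numpy as np

    # hypothetical binary file written as big-endian float32 on another machine
    fname = 'some_big_endian_data.bin'
    data = np.fromfile(fname, np.float32).byteswap()  # swap each value to the native byte order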


5 ESSENTIAL PYTHON SCRIPTING

Control statements, structure of a Python program, and system commands.


5.1. CONTROL FLOW TOOLS

This section briefly introduces common control statements in Python. Control statements are written in a block structure and do not have an end statement; the end of a block is indicated by indentation.

5.1.1. IF STATEMENT

The if statement is used to test a condition, which can have True or False values. An example if block is:

    In [234]:
        if x < 0:
            print x, 'is a negative number'
        elif x > 0:
            print x, 'is a positive number'
        else:
            print 'x is zero'

    ✓An if block can have zero or more elif clauses, and the else clause is also optional.

An if statement can also check whether a value exists within an iterable such as a list, tuple, array, or string.

    In [235]:
        a_list = ['a', 'd', 'v', 2, 4]
        if 'd' in a_list:
            print a_list.index('d')
    Out[235]: 1

    In [236]:
        str = 'We are in a python Course'
        if 'We' in str:
            print str
    Out[236]: We are in a python Course

5.1.2. FOR STATEMENT

The for statement iterates over the items of any sequence (a list or a string), and repeats the steps within the for loop.

    In [237]:
        words = ['cat', 'window', 'defenestrate']
        for _wor in words:
            print _wor, words.index(_wor), len(_wor)

A dictionary can be iterated over using its items(), keys(), or values() methods.


    In [238]:
        words = {1: 'cat', 2: 'window', 3: 'defenestrate'}
        for _wor in words.items():
            print _wor, len(_wor)
    Out[238]:
        (1, 'cat') 2
        (2, 'window') 2
        (3, 'defenestrate') 2

5.1.3. WHILE STATEMENT

The while statement repeats the block of code beneath it as long as its condition is True; the loop ends when the condition is no longer met.

    In [239]:
        count = 0
        while (count < 2):
            print 'The count is:', count
            count = count + 1
        print "Good bye!"
    Out[239]:
        The count is: 0
        The count is: 1
        Good bye!

5.1.4. BREAK AND CONTINUE

The break statement breaks out of the smallest enclosing for or while loop. The continue statement continues with the next iteration of the same loop.

    In [240]:
        for n in range(2, 10):
            for x in range(2, n):
                if n % x == 0:
                    print n, 'equals', x, '*', n/x
                    break
            else:
                # loop fell through without finding a factor
                print n, 'is a prime number'
    Out[240]:
        2 is a prime number
        3 is a prime number
        4 equals 2 * 2
        5 is a prime number
        6 equals 2 * 3
        7 is a prime number
        8 equals 2 * 4
        9 equals 3 * 3


    In [241]:
        for num in range(2, 10):
            if num % 2 == 0:
                print "Found an even number", num
                continue
            print "Found a number", num
    Out[241]:
        Found an even number 2
        Found a number 3
        Found an even number 4
        Found a number 5
        Found an even number 6
        Found a number 7
        Found an even number 8
        Found a number 9

5.1.5. RANGE

As shown in previous chapters and examples, range is used to generate a list of numbers from start to end at an interval step. In Python 2, range generates the whole list object, whereas in Python 3 it is a lazy range object that produces the values on demand instead of holding the whole list in memory.

5.2. PYTHON FUNCTIONS

Sections of a Python program can be put into an indented block (a container of code) defined as a function. A function can be called "on demand". The basic syntax is as follows:

    In [242]:
        def funcname(param1, param2):
            prod = param1 * param2
            print prod
            return

    In [243]: type(funcname(2, 3))
    Out[243]:
        6
        NoneType

• def identifies a function. A name follows the def.
• Parameters can be passed to a function. These parameter values are substituted before calculation.


• return: Returns the result of the function. In the above example, the return value is an empty NoneType object. If the return statement includes arguments, the result is passed back to the statement that called the function.

The default values of the parameters can also be set. If the function is called without arguments, the default values are used for the calculation. Here is an example.

    In [244]:
        def funcname(param1=2, param2=3):
            prod = param1 * param2
            return prod

    In [245]: funcname()
    Out[245]: 6

    In [246]: funcname(3, 4)
    Out[246]: 12

    In [247]: type(funcname(2, 3))
    Out[247]: int

For details on defining functions, refer here.

5.3. PYTHON MODULES

A module is a file containing Python definitions and statements. A module file (with a .py ending) provides a module named after the filename. This name is available as __name__ once the module is imported.


    In [248]:
        #!/Users/skoirala/anaconda/envs/pyfull/bin/python
        import numpy as np

        def lcommon(m, n):  # least common multiple of m and n
            if m > 0 and n > 0 and type(m) == int and type(n) == int:  # int means integer
                a = []  # an empty list
                for i in range(1, n+1):  # i runs from 1 to n
                    M = m*i
                    for k in range(1, m+1):
                        N = n*k
                        if M == N:  # M (= N) is a common multiple of m and n
                            a = np.append(a, M)  # collect the common multiples of m and n in list a
                return (min(a))  # return the minimum value in list a
            else:
                return ("error")

        def computeHCF(x, y):
            # choose the smaller number
            if x > y:
                smaller = y
            else:
                smaller = x
            for i in range(1, smaller+1):
                if ((x % i == 0) and (y % i == 0)):
                    hcf = i
            return hcf

• After a file which includes a function is created and saved, the function can be used as a module in the interactive shell within the same directory (the one containing the file) or in other files in that directory.

    ✓If the saved filename is samp_func.py, the function can be called from another program file in the same directory.

    In [249]:
        import samp_func as sf
        print sf.lcommon(3, 29)
    Out[249]: 87.0

    ✓If the program is run, you get 87.0, which is the least common multiple of 3 and 29.


The module file can also be run as a standalone program if the following block of code is added to its end.

    In [250]:
        if __name__ == "__main__":
            import sys
            lcommon(int(sys.argv[1]), int(sys.argv[2]))
            computeHCF(int(sys.argv[1]), int(sys.argv[2]))

Also, the variables defined in the module file can be accessed, as long as they are not defined within functions of the module.

    In [251]: somevariable = ['1', 2, 4]

    In [252]:
        import samp_func as sf
        print sf.somevariable
    Out[252]: ['1', 2, 4]

A list of all the objects from the module can be obtained by using dir() as

    In [253]: dir(samp_func)

5.4. PYTHON CLASSES

As Python is a fully object-oriented language, it provides classes, which allow you to create (instantiate) objects. A class is something that just contains structure: it defines how something should be laid out, but doesn't actually fill in the content. This is useful when a set of operations is to be carried out on several instances, and it provides a distinction between the objects created. The following is an example class taken from here.

    In [254]:
        import math

        class Point:
            def __init__(self, x, y):
                self.x = x
                self.y = y

            def __str__(self):
                return "Point (%d, %d)" % (self.x, self.y)

            def distance_from_origin(self):
                return math.sqrt(self.x**2 + self.y**2)

• It is customary to name classes using upper case (capitalized) letters.


• __init__ and self are critical to create an object.
• self is the object that will be created when the class is called.
• __init__ creates the object self and assigns the attributes x and y to it.

    In [255]:
        p1 = Point(1, 4)
        p2 = Point(2, 3)
        p1.x
    Out[255]: 1

    In [256]: p1.distance_from_origin()
    Out[256]: 4.123105625617661

    In [257]: p2.distance_from_origin()
    Out[257]: 3.605551275463989

Some simple and easy-to-understand examples of classes are provided in:

• http://www.jesshamrick.com/2011/05/18/an-introduction-to-classes-and-inheritance-in-python/
• https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/

5.5. ADDITIONAL RELEVANT MODULES

This section briefly introduces modules that are necessary for executing a Python program in the most efficient way.

5.5.1. SYS MODULE

This module provides access to some variables used or maintained by the Python interpreter.

• argv: Probably the most useful of the sys attributes. sys.argv is a list object containing the arguments passed when running a Python script from the command line. The first element is always the program name. Any number of arguments can be passed into the program as strings, e.g., sys.argv[1] is the second argument, and so on.


• byteorder: Remember byteswap? sys.byteorder provides info on the machine's endianness.

• path: The default search path of Python is stored in sys.path. If you have written modules and classes and want to access them from anywhere, you can add their path to sys.path as,

    In [258]: sys.path.append('path to your directory')

A full set of sys methods is provided here.

5.5.2. OS MODULE

This module provides a unified interface to a number of operating system functions. There are lots of useful functions for process management and file object creation in this module. Among them, the functions for manipulating files and directories are especially useful and are briefly introduced below. For details on the OS module, click here.

Before using the file and directory commands, it is necessary to import the os module as,

    In [259]:
        import os
        os.getcwd()

    ✓same as pwd in UNIX. Stands for present working directory and displays the absolute path to the current directory.

    In [260]: os.mkdir('dirname')

    ✓same as mkdir in UNIX. Makes a new directory. dirname can be an absolute or relative path to the directory you want to create.

    In [261]: os.remove('filename')

    ✓same as rm in UNIX. Removes a file.

    In [262]: os.rmdir('dirname')

    ✓same as rm -r in UNIX. Removes a directory.

    In [263]: os.chdir('dirpath')

    ✓same as cd in UNIX. Changes the directory to the location given by dirpath. dirpath can be absolute or relative.


    In [264]: os.listdir('dirpath')

    ✓same as ls in UNIX. Lists all files in the directory located at dirpath.

    In [265]: os.path.exists('filepath')

    ✓A very useful os function that checks whether a file exists. Returns True if it exists and False if not.

If you want to know more about these functions, follow this.

5.5.3. ERRORS AND EXCEPTIONS

In general, there are two types of errors in Python: syntax errors and exceptions. Syntax errors are caused by invalid Python syntax. Exceptions include ValueError, NameError, IOError, TypeError, and so on. A full list of the exceptions in Python is here. Python has built-in statements (try and except) to handle exceptions. An example is below:

    In [266]:
        while True:
            try:
                x = int(raw_input("Please enter a number: "))
                break
            except ValueError:
                print "Oops! That was no valid number. Try again..."

    ✓Tries to convert the input to int. If a non-numeric string is entered, int() raises a ValueError and the loop goes on to the next iteration.

The above is the simplest example, and there are many other, more "sophisticated" ways to handle exceptions here.
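To tie sys.argv, os.path.exists, and try/except together, here is a minimal sketch of a small command-line script; the script name check_file.py and the messages are made up for illustration:

    import os
    import sys

    try:
        fname = sys.argv[1]  # the first argument after the script name
    except IndexError:
        print('Usage: python check_file.py <filename>')
        sys.exit(1)

    if os.path.exists(fname):
        print('Found: ' + fname)
    else:
        print('No such file: ' + fname)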


6 ADVANCED STATISTICS AND MACHINE LEARNING

Using advanced statistics.


6.1. QUICK OVERVIEW

In this section, we will focus on a practical example to demonstrate the implementation of some advanced statistics, specifically machine learning algorithms, to perform gap filling of eddy covariance data. The idea is to take some gappy data, fill the holes using the meteorological variables associated with the missing values, and then compare the methods. It should be noted that we will not go into depth about the statistical methods themselves, but just give an example of their implementation. Indeed, in most cases we will use the default hyper-parameters, which is fine for an overview but bad practice in general. One should always try to understand a method when implementing it.

6.1.1. REQUIRED PACKAGES

This exercise will require the following packages (all should be available via "conda install ..."):

• numpy
• scipy
• pandas
• scikit-learn
• statsmodels
• netCDF4 (needs HDF5)

6.1.2. OVERVIEW OF DATA

Our sample dataset (provided by me) is a processed eddy covariance file, such as what you would find in the FLUXNET database. If you are unfamiliar with eddy covariance, don't panic; just think of it as a fancy weather station that measures not just the meteorological data, but also how things come and go from the ecosystem (such as water and carbon). The file is at half-hourly resolution, so it gives a value for each measured variable every half hour, or 48 points per day (17,520 points per year).


One problem with eddy covariance datasets is that they tend to have missing values, or gaps, due to equipment failures or improper measuring conditions. To fix this, we can predict the missing values, i.e., gap-fill the dataset. This particular dataset has about 40% of the data missing. As we are not the first to deal with gappy eddy covariance datasets, there is a current "standard" method involving sorting all the values into a look-up table, where values from a similar time span and similar meteorological conditions are binned, and the gaps are filled with the mean of the bin. We will try to fill the gaps using three statistical methods: a random forest, a neural network, and a multi-linear regression.

6.2. IMPORT AND PREPARE THE DATA

We will try to organize this project somewhat like you would a real project, which means we will have a number of ".py" files in our project, as well as our data files. So to start, find a nice cozy place in your file system (maybe somewhere like "Documents" or "MyPyFiles") and create a new folder (maybe called "AdvStat"). Into our nice, new, cozy folder, we can first copy the sample dataset, which should have the file extension ".nc". Now we can make three files, one named "Calc.py", one named "Regs.py", and one named "Plots.py". These files can be created and/or opened in the Spyder IDE to make things a bit easier to work with, or simply in your favorite text editor.

Now, starting in the "Calc.py" file, we can import numpy and netCDF4 to start us off. We will import the variables we are interested in and convert them into numpy arrays. Because the provided file has over 300 variables, we will create a dictionary containing only a subset of variables that we are interested in, based on a list, namely:

    IncludedVars = ['Tair_f', 'Rg_f', 'VPD_f', 'LE_f', 'LE',
                    'year', 'month', 'day', 'hour']

So to build our dictionary, we can start with an empty dictionary (remember {}) called "df". Then we can loop through our IncludedVars and use each item in the list as a key for df, and pair each key with a numpy array from the netCDF:

    df[var] = np.array(ncdf[var]).flatten()[48*365:]

You may notice two things: first, we not only turn our netCDF variable into a numpy array, but we also call "flatten".


This is because the netCDF has three dimensions (time, lat, lon), but as this is only one site, the lat and lon dimensions don't change, so we can just flatten the array to one dimension. Second, we are already slicing the data from 48*365 onwards. This is because the first year is only a partial year, so we not only have some gaps in the fluxes, but in all the data, which would mess us up a bit. Thankfully for you, I have been through this dataset and can tell you to skip the first year. Now, this netCDF is fairly well annotated, so if you would like more information on a variable, simply ask:

    ncdf[var]

Some highlights are that we will be trying to gap-fill the "LE" variable (latent energy, a measure of the water flux), which we can compare to the professionally filled version "LE_f". As this is a regression problem, we need to get things into an "X" vs "Y" format. For the X variables we will use the following:

    XvarList = ['Tair_f', 'Rg_f', 'VPD_f', 'year', 'month', 'day', 'hour']

With our list, we can then create a two-dimensional array in the form number-of-samples by number-of-features. We can do this by first creating a list of the arrays, then calling np.array and transposing. If we want to be fancy, we can do this in one line as:

    X = np.array([df[var] for var in XvarList]).T

and like magic we are all ready to go with the X vars. The Y variable is also easy: it is just equal to LE, which, if we remember, is stored in our dictionary as df["LE"]. However, we will do a little trick that will seem a bit silly, but will make sense later. Let's first store our Y variable name as a string, then set Y as:

    yvarname = "LE"
    Y = df[yvarname]

I promise this will come in handy. One final task is to figure out where the gaps are, but we will come to that in the next section. (A sketch of how "Calc.py" might look so far is given below.)
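As a rough sketch of "Calc.py" up to this point, under the assumption that the sample file is called "sample_fluxdata.nc" (the real file name will differ) and that the netCDF is opened with netCDF4.Dataset:

    # Calc.py -- data import (sketch; the file name is a placeholder)
    import numpy as np
    import netCDF4

    ncdf = netCDF4.Dataset('sample_fluxdata.nc')   # hypothetical file name

    IncludedVars = ['Tair_f', 'Rg_f', 'VPD_f', 'LE_f', 'LE',
                    'year', 'month', 'day', 'hour']

    df = {}                                        # empty dictionary
    for var in IncludedVars:
        # flatten (time, lat, lon) to 1-D and skip the partial first year
        df[var] = np.array(ncdf[var]).flatten()[48*365:]

    XvarList = ['Tair_f', 'Rg_f', 'VPD_f', 'year', 'month', 'day', 'hour']
    X = np.array([df[var] for var in XvarList]).T  # samples x features

    yvarname = "LE"
    Y = df[yvarname]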


6.3. SETTING UP THE GAPFILLERS

Now we can move on to our second Python file in our cozy folder: "Regs.py". This file will hold some of the important functions that help us complete our quest of gap filling.

The only package to import will be numpy. After the import, we can make a very simple function called "GetMask" that will find our gaps for us. As we extracted the data from the netCDF, all gaps are given the value -9999, so our function will simply return a boolean array where all gapped values are True. I tend to be a bit cautious, so I usually look for things such as:

    mask = (Y < -9000)

but you could easily say (Y == -9999). Don't forget to return our mask at the end of the function!

Now, so we don't forget, we can go ahead and use this function in our "Calc.py" file right away. First we need to tell "Calc.py" where to find "GetMask", so in "Calc.py" we simply

    import Regs

and we can set our mask as:

    mask = Regs.GetMask(df[yvarname])

Easy as that! Now, we will want to keep everything tidy, so go ahead and also save our mask into our dictionary (df) under a key like "GapMask".

Now, let's go back to "Regs.py" and make a second function. This function will take any of the machine learning algorithms that we will use from the SKLearn package and gap fill our dataset, so let's call it "GapFillerSKLearn"; it will take four input variables: X, Y, GapMask, and model. As this function will be a bit abstract, let's add some documentation, which will be a string right after we define the function. I have made an example documentation for our function here:


    def GapFillerSKLearn(X, Y, GapMask, model):
        """
        GapFillerSKLearn(X, Y, GapMask, model)

        Gap fills Y via X with model

        Uses the provided model to gap fill Y via the X variables

        Parameters
        ----------
        X : numpy array
            Predictor variables
        Y : numpy array
            Training set
        GapMask : numpy boolean array
            array indicating where gaps are with True
        model : sklearn regression object
            the regression model used to fill the gaps

        Returns
        -------
        Y_hat
            Gap filled Y as numpy array
        """

Now that the function is documented, we will never forget what it does, so we can move on to the actual function body. The reason we can write this function is that the SKLearn module organizes all of its regressions in the same way, so the method we call "model" works the same whether it is a random forest or a neural net. In all cases we fit the model as:

    model.fit(X[~GapMask], Y[~GapMask])

where we are fitting only where we don't have gaps. In this case the tilde (~) inverts the boolean array, making all Trues False and all Falses True, which in our case gives True at all indices where we have original data. Next we can build our Y_hat variable as an array of -9999 values, by first creating an array of zeros and then subtracting 9999. This way, if we mess up somewhere, we will see the affected values as -9999 in the output. Now, we can fill the gaps by making a prediction of the model with the X vars as

    Y_hat[GapMask] = model.predict(X[GapMask])

where we no longer use the tilde (~) because we want the gap indices. We can return our Y_hat at the end of our function and move back to our "Calc.py" file. (The assembled "Regs.py" is sketched below.)
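Putting the pieces of this section together, "Regs.py" might look roughly like this (a sketch assembled from the steps above, not the author's exact file; the Y_hat construction follows the zeros-minus-9999 idea just described):

    # Regs.py -- gap-finding and SKLearn gap-filling helpers (sketch)
    import numpy as np

    def GetMask(Y):
        """Return a boolean array that is True where Y is a gap (-9999)."""
        mask = (Y < -9000)
        return mask

    def GapFillerSKLearn(X, Y, GapMask, model):
        """Gap fill Y via X with the given SKLearn model; returns Y_hat."""
        model.fit(X[~GapMask], Y[~GapMask])         # train only on the original data
        Y_hat = np.zeros(Y.shape) - 9999            # start with -9999 everywhere
        Y_hat[GapMask] = model.predict(X[GapMask])  # predict only at the gaps
        return Y_hat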


6.4. ACTUALLY GAPFILLING

With our X, Y, mask, and filling functions built, we can actually do some calculations. For this, we will need to import some more packages, namely:

    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neural_network import MLPRegressor
    import statsmodels.api as sm

where our random forest (RandomForestRegressor) and neural network (MLPRegressor, or multi-layer perceptron) are from the SKLearn package and our linear model will be from the statsmodels package. As everything is set up, we can immediately call our SKLearn gap filler function as:

    df[yvarname+'_RF'] = Regs.GapFillerSKLearn(X, Y, mask, RandomForestRegressor())

and likewise for the MLPRegressor (just remember to change the df key!). Note that there are many, many options for both RandomForestRegressor and MLPRegressor that should likely be changed, but as this is a quick overview, we will just use the defaults. If you were to add options, such as increasing to 50 trees in the random forest, it would look like this:

    df[yvarname+'_RF'] = Regs.GapFillerSKLearn(X, Y, mask, RandomForestRegressor(n_estimators=50))

Unfortunately we cannot use the same function for the linear model, as statsmodels uses a slightly different syntax (note that SKLearn also has an implementation of linear models, but it's good to be well rounded). The statsmodels portion will look strikingly similar to our "GapFillerSKLearn" function, but with some key differences:

    X_ols = sm.add_constant(X)
    df[yvarname+'_OLS'] = Y
    model = sm.OLS(Y[~mask], X_ols[~mask])
    results = model.fit()
    df[yvarname+'_OLS'][mask] = results.predict(X_ols[mask])

Basically, we have to add another column to our array that acts as the intercept variable, then we run the same set of commands, but the pesky X's and Y's are switched in the fit command, making it too different to adapt to our "GapFillerSKLearn" function. Now, our script is basically done, and we can actually run it (in Spyder, just press F5).
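If you prefer symmetry, the statsmodels steps can also be wrapped in their own small helper. This is only a sketch (the function name GapFillerOLS is made up here, and it follows the same -9999 convention as GapFillerSKLearn above, whereas the in-line version keeps the original Y values outside the gaps):

    import numpy as np
    import statsmodels.api as sm

    def GapFillerOLS(X, Y, GapMask):
        """Gap fill Y via an ordinary least squares fit on X (sketch)."""
        X_ols = sm.add_constant(X)                  # add an intercept column
        results = sm.OLS(Y[~GapMask], X_ols[~GapMask]).fit()
        Y_hat = np.zeros(Y.shape) - 9999            # -9999 everywhere, as before
        Y_hat[GapMask] = results.predict(X_ols[GapMask])
        return Y_hat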


Depending on the speed of your computer, it may take a few seconds to run, more than you might want to wait over and over. Therefore, before we move on to the "Plots.py" file, it would be a good idea to save the data so we don't have to rerun the calculation every time. For this, we will use the "pickle" package. "pickle" does a nice job of saving Python objects as binary files, which Sujan loves, so after we import the package, we can dump our pickle with:

    pickle.dump(df, open(yvarname + "_GapFills.pickle", "wb"))

You can notice that we save the file with our yvarname in the name, which, you will see, can come in handy.

6.5. AND NOW THE PLOTS!

Now we can finally move on to our "Plots.py" file, where we will need the numpy, pandas, and pickle packages. To start, we will keep things simple and just do a comparison of each gap filling method to the standard "LE_f" from the data file. After comparing these, we will use a kernel density estimate to look at the distribution of our gap-filled values compared to the real, measured values. So in total we will have four figures. First, we will use the exact same mysterious trick that we have been using, where we set the yvarname:

    yvarname = "LE"

Again, mysterious, and it will be useful, I promise. Now we will need to load the data file we just created from "Calc.py", but this time, instead of using a dictionary, as the data is all neatly named and every vector is the same length, we can use the magic of Pandas! So as we load our pickle, we can directly convert it to a Pandas DataFrame with

    df = pd.DataFrame.from_dict(pickle.load(open(yvarname + "_GapFills.pickle", "rb")))

Now, in the Python or IPython console, you can explore "df" a little bit and see that it is a nice and orderly DataFrame, which R users will feel right at home in. And with this DataFrame, we can do much of our initial plotting directly, so we didn't even have to import Matplotlib.
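A few standard Pandas calls are handy for that first exploration; this is just a sketch using the DataFrame built above (gaps in the unfilled "LE" column still show up as -9999 here):

    print(df.shape)       # number of rows (half hours) and columns (variables)
    print(df.columns)     # the keys of our old dictionary, now column names
    print(df.describe())  # summary statistics for each column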


6.5.1. SCATTER PLOTS!

As we have three different methods to compare, we can write the plotting steps as a function so we avoid doing all that copying and pasting. Let's call our function "GapComp"; it will take the input variables df, xvar, yvar, and GapMask. The first thing we will do is make our scatter plot of the gap-filled values. Pandas comes with much of the plotting functionality built in, so the plot becomes one line:

    fig = df[GapMask].plot.scatter(x=xvar, y=yvar)

Notice that we use our boolean array "GapMask" to index the entire DataFrame; this is the magic of Pandas. Now, we could call it a day, but what fun is a scatter plot without some lines on it? So, we will add the results of a linear regression between our gap filling and "LE_f" using the "linregress" function from "scipy.stats" (go ahead and add it to the import list). "linregress" gives a nice output of a simple linear regression including all the standard stuff:

    slope, intercept, r_value, p_value, std_err = linregress(df[GapMask][xvar], df[GapMask][yvar])

Now that we have fit a model to our models, we can plot our line. We will need an x variable that spans our line, which we can build with the "numpy.linspace" command as

    x = np.linspace(df[GapMask][yvar].min(), df[GapMask][yvar].max())

And finally, we can plot our line with a nice label showing both our equation and the r^2 value with

    fig.plot(x, x*slope + intercept, label="y={0:0.4}*x+{1:0.4}, r^2={2:0.4}".format(slope, intercept, r_value**2))
    fig.legend()

And that finishes our function (a possible assembly is sketched below). We can now plot all of our models with a neat little for loop:

    for var in ["_RF", '_NN', '_OLS']:
        GapComp(df, yvarname + var, yvarname + "_f", df.GapMask)
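Collected into one place, "GapComp" might look roughly like this (a sketch built from the pieces above, assuming numpy and scipy.stats.linregress are imported at the top of "Plots.py"):

    import numpy as np
    from scipy.stats import linregress

    def GapComp(df, xvar, yvar, GapMask):
        """Scatter the gap-filled xvar against yvar and overlay a fitted line (sketch)."""
        fig = df[GapMask].plot.scatter(x=xvar, y=yvar)
        slope, intercept, r_value, p_value, std_err = linregress(df[GapMask][xvar],
                                                                 df[GapMask][yvar])
        x = np.linspace(df[GapMask][yvar].min(), df[GapMask][yvar].max())
        fig.plot(x, x*slope + intercept,
                 label="y={0:0.4}*x+{1:0.4}, r^2={2:0.4}".format(slope, intercept, r_value**2))
        fig.legend()
        return fig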


6.5.2. DISTRIBUTIONS WITH KDE

With the first three plots done, we can move on to our kernel density plots. Again, Pandas will make our lives easier: instead of "df.plot.scatter" we use "df.plot.kde". Remember, we want to compare both our gap filling techniques and "LE_f" with the distribution of the real dataset. We can start by plotting the filled datasets using only the filled values ("df.GapMask"). One fancy trick of Pandas is that you can pass a list of columns, and it will plot all of them. However, because of our mysterious magic trick with "yvarname", we have to build this list with a little loop, which looks like

    [yvarname + ending for ending in ("_RF", '_NN', '_OLS', "_f")]

Now we can pass this fancy list, either as a named variable, or in a one-liner if we are even fancier, to the command

    KDEs = df[df.GapMask][ThisFancyList].plot.kde()

where our plot is saved as the variable KDEs. Now, we have to plot our final KDE from the "LE" column, but we can no longer call it using "KDEs.plot" like we did for our line in the "GapComp" function. What we have to do instead is tell the "df.plot.kde" command which plot we want it in. For this, we pass the "ax=" argument like so

    df[~df.GapMask][[yvarname]].plot.kde(ax=KDEs)

and voilà, our plotting is complete! There, some advanced statistics, easy as cake.

6.6. BONUS POINTS!

For some bonus points, you can gap fill another variable called "NEE". NEE stands for net ecosystem exchange, and it measures how carbon comes and goes from the ecosystem. All you have to do is extract it from the netCDF (both NEE and NEE_f), then switch out all the times you reference LE (hint: we can finally use the magic trick).


7 DATA VISUALIZATION AND PLOTTING

An introduction to plotting using matplotlib and Bokeh


The first part of this chapter introduces plotting standard figures using matplotlib, and the second part introduces interactive plotting using Bokeh. For a comprehensive set of examples with the source code used to plot each figure using matplotlib, click here. For the same for Bokeh, click here.

7.1. PLOTTING A SIMPLE FIGURE

Read the data in the data folder using:

    In [267]:
        import numpy as np
        dat = np.loadtxt('data/FD-Precipitation_Ganges_daily_kg.txt')[:365]

First, a figure object can be defined. figsize is the figure size as a (width, height) tuple. The unit is inches.

    In [268]:
        from matplotlib import pyplot as plt
        plt.figure(figsize=(3, 4))

    In [269]:
        plt.plot(dat)
        plt.show()

There are several keyword arguments, such as color, style and so on, that control the appearance of the line object. They are listed here. The line and marker styles in matplotlib are shown in Table 7.1.

For axis labels and the figure title:

    In [270]:
        plt.xlabel('time')
        plt.ylabel('Precip', color='k', fontsize=10)
        plt.title('One Figure')

The axis limits can be set by using xlim() and ylim() as:

    In [271]:
        plt.xlim(0, 200)
        plt.ylim(0, 1e14)

    In [272]:
        plt.text(0.1, 0.5, 'the first text', fontsize=12, color='red', rotation=45, va='bottom')
        plt.text(0.95, 0.95, 'the second text', fontsize=12, color='green', ha='right', transform=plt.gca().transAxes)
        plt.figtext(0.5, 0.5, 'the third text', fontsize=12, color='blue')

The color and fontsize can be changed. For color, use color=<some color name> such as 'red', or color=<hexadecimal color code> such as '#0000FF'. For font size, use fontsize=<number> (number > 0).


Table 7.1: Line and marker styles

    Line styles:
        'solid'   (shorthand '-')
        'dashed'  (shorthand '--')
        'dotted'  (shorthand ':')

    Marker styles:
        'o'  circle
        'v'  triangle_down
        '^'  triangle_up
        '<'  triangle_left
        '>'  triangle_right
        's'  square
        'h'  hexagon
        '+'  plus
        'x'  x
        'd'  diamond
        'p'  pentagon

Also, grid lines can be turned on by using

    In [273]: plt.grid(which='major', axis='x', ls=':', lw=0.5)

To set the scale to log:

    In [274]: plt.yscale('log')

7.2. MULTIPLE PLOTS IN A FIGURE

Matplotlib has several methods to make subplots within a figure. Here is a quick example using the 'mainstream' subplot command (an alternative using plt.subplots is sketched below).

    In [275]:
        selVars = 'Precipitation Runoff'.split()
        nrows = 2
        ncols = 1
        plt.figure(figsize=(3, 4))
        for _var in selVars:
            dat = np.loadtxt('data/FD-' + _var + '_Ganges_daily_kg.txt')[:365]
            spI = selVars.index(_var) + 1
            plt.subplot(nrows, ncols, spI)
            plt.plot(dat)
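For comparison, the object-oriented interface does the same thing with plt.subplots, which returns the figure and an array of axes. A minimal sketch with the same two files (the data paths are the ones used above):

    import numpy as np
    from matplotlib import pyplot as plt

    selVars = ['Precipitation', 'Runoff']
    fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(3, 4))
    for ax, _var in zip(axs, selVars):
        dat = np.loadtxt('data/FD-' + _var + '_Ganges_daily_kg.txt')[:365]
        ax.plot(dat)          # each variable goes on its own axes
        ax.set_title(_var)
    plt.show()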


7.3. PLOT WITH DATES

The datetime module supplies classes for manipulating dates and times. This module comes in handy when calculating temporal averages, such as monthly means from a daily time series. When these date objects are combined with the dates functions of matplotlib, time series data can be plotted with the axis formatted as dates.

First import the necessary modules and functions. timeOp is a self-made module consisting of functions to convert daily data to monthly or yearly data.

    In [276]:
        import timeOp as tmop  # a self-made module to compute monthly data from daily data, considering calendar days and so on
        import numpy as np
        import matplotlib as mpl
        from matplotlib import pyplot as plt
        from matplotlib import dates
        import datetime
        dat1 = np.loadtxt('data/FD-Precipitation_Amazon_daily_kg.txt')

Now, date objects can be created using the datetime module. In the current file, the data are available from 1979-01-01 to 2007-12-31. Using these date instances, a range of date objects can be created with a step of dt, which is in turn a timedelta object from datetime.

    In [277]:
        sdate = datetime.date(1979, 1, 1)
        edate = datetime.date(2008, 1, 1)
        dt = datetime.timedelta(days=30.5)
        dates_mo = dates.drange(sdate, edate, dt)

Using the functions within the tmop module, monthly and yearly data are created.

    In [278]:
        dat_mo = np.array([np.mean(_m) for _m in tmop.day2month(dat1, sdate)])
        dat_y = np.array([np.mean(_y) for _y in tmop.day2year(dat1, sdate)])

Next up, we create the axes instances on which the plots will be made. These axes objects are the building blocks of all subplot-like objects in Python and form the basis for having as many subplots as one wants in a figure. An axes is defined by using the axes command with [lower left x, lower left y, width, height] as the argument. The coordinates and sizes are given relative to the figure, and thus they vary from 0 to 1.

    In [279]:
        ax1 = plt.axes([0.1, 0.1, 0.6, 0.8])
        ax1.plot_date(dates_mo, dat_mo, ls='-', marker=None)

    ✓While plotting dates, the plot_date function is used with the date range as the x variable and the data as the y variable.


Note that the sizes of the x and y variables should be the same. The axis is automatically formatted as years.

    In [280]:
        ax2 = plt.axes([0.75, 0.1, 0.25, 0.8])
        ax2.plot(dat_mo.reshape(-1, 12).mean(0))
        ax2.set_xticks(range(12))
        ax2.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], rotation=90)
        plt.show()

    ✓Sometimes it is easier to set the ticks and labels manually. In this case, the mean seasonal cycle is plotted normally, and the xticks are changed to look like dates. Remember that, with a proper date range object, this can also be achieved automatically with plot_date.

    ✓Matplotlib has a dedicated ticker module that handles the location and formatting of the ticks. Even though we don't go through the details, we recommend everyone to read and skim through the ticker page.

7.4. SCATTER PLOTS

Let's read the data and import the modules first:

    In [281]:
        import numpy as np
        from matplotlib import pyplot as plt
        dat1 = np.loadtxt('data/FD-Precipitation_Ganges_daily_kg.txt')[:365]
        dat2 = np.loadtxt('data/FD-Runoff_Ganges_daily_kg.txt')[:365]
        dat3 = np.loadtxt('data/FD-Evaporation_Ganges_daily_kg.txt')[:365]

Once the data are read, we can open a figure object and start adding things to it.

    In [282]:
        plt.figure(figsize=(3, 4))
        plt.scatter(dat1, dat2, facecolor='blue', edgecolor=None)
        plt.scatter(dat1, dat3, marker='d', facecolor='red', alpha=0.4, linewidths=0.7)
        plt.xlabel('Precip ($kg\ d^{-1}$)')
        plt.ylabel('Runoff or ET ($\\frac{kg}{d}$)', color='k', fontsize=10)
        plt.grid(which='major', axis='both', ls=':', lw=0.5)
        plt.title('A scatter')

    ✓scatter has slightly different names for the colors. The color of the marker and of the line around it can be set separately using facecolor and edgecolor, respectively. It also allows changing the transparency with the alpha argument. Note that the width of the line around the markers is set by linewidths (plural) and not linewidth as in plot.


    In [283]: plt.legend(('Runoff', 'ET'), loc='best')

7.5. PLAYING WITH THE ELEMENTS

Until now, it's been a dull and standard plotting library. But the figure comprises several instances, or objects, which can be obtained through various methods and then modified. This makes customization of a figure extremely fun. Here are some examples of what can be done.

• The ugly lines: The boxes around figures are stored as spines, which is a dictionary holding each boundary line and its properties. In the rem_axLine function of plotTools, you can see that the linewidth of some of the spines has been set to zero.

    In [284]:
        import plotTools as pt
        pt.rem_axLine()

• Getting the limits of the axis from the figure: use the gca() method of pyplot to get the x and y limits.

    In [285]:
        ymin, ymax = plt.gca().get_ylim()
        xmin, xmax = plt.gca().get_xlim()

• Let's draw that 1:1 line. Note that arrow takes a starting point and the x and y extents (dx, dy), not the end point.

    In [286]: plt.arrow(xmin, ymin, xmax - xmin, ymax - ymin, lw=0.1, zorder=0)

• A legendary legend: Here is an example of how flexible a legend object can be. It has a tonne of options and methods. Sometimes it becomes a manual calibration exercise.


    In [287]:
        leg = plt.legend(('Runoff', 'ET'), loc=(0.05, 0.914), markerscale=0.5, scatterpoints=4, ncol=2, fancybox=True, handlelength=3.5, handletextpad=0.8, borderpad=0.1, labelspacing=0.1, columnspacing=0.25)
        leg.get_frame().set_linewidth(0)
        leg.get_frame().set_facecolor('firebrick')
        leg.legendPatch.set_alpha(0.25)
        texts = leg.get_texts()
        for t in texts:
            tI = texts.index(t)
            # t.set_color(cc[tI])
        plt.setp(texts, fontsize=7.83)

7.6. MAP MAP MAP!

This section explains the procedure to draw a map using basemap and matplotlib.

7.6.1. GLOBAL DATA

Let's read the data that we will use to make the map. The data are stored as a big-endian plain binary file. It consists of float32 values, has an unknown number of time steps, and is at a spatial resolution of 1°.

    In [288]:
        import numpy as np
        datfile = 'runoff.1986-1995.bin'
        data = np.fromfile(datfile, np.float32).byteswap().reshape(-1, 180, 360)
        print(np.shape(data))

Once the data are read, a map object should first be created using the basemap module.

    In [289]:
        from mpl_toolkits.basemap import Basemap
        _map = Basemap(projection='cyl',
                       llcrnrlon=lonmin,
                       urcrnrlon=lonmax,
                       llcrnrlat=latmin,
                       urcrnrlat=latmax,
                       resolution='c')

1. Set the projection and resolution of the background map:

    ✓resolution: specifies the resolution of the map. 'c', 'l', 'i', 'h', 'f' or None can be used: 'c' (crude), 'l' (low), 'i' (intermediate), 'h' (high) and 'f' (full).

    ✓The longitude and latitude of the lower left and upper right corners can be specified by llcrnrlon, llcrnrlat, urcrnrlon and urcrnrlat:


    ✓llcrnrlon: longitude of the lower left hand corner of the desired map.
    ✓llcrnrlat: latitude of the lower left hand corner of the desired map.
    ✓urcrnrlon: longitude of the upper right hand corner of the desired map.
    ✓urcrnrlat: latitude of the upper right hand corner of the desired map.

In the current case, the latitudes and longitudes of the corners of the map are set to the following values (these must be defined before creating the Basemap object above):

    In [290]:
        latmin = -90
        lonmin = -180
        latmax = 90
        lonmax = 180

2. To draw coastlines, country boundaries and rivers:

    In [291]: _map.drawcoastlines(color='k', linewidth=0.8)

    ✓coastlines with black color and linewidth 0.8.

    In [292]: _map.drawcountries(color='brown', linewidth=0.3)

    ✓draws country boundaries.

    In [293]: _map.drawrivers(color='navy', linewidth=0.3)

    ✓draws major rivers.

3. To add longitude and latitude labels:

    In [294]:
        latint = 30
        lonint = 30
        parallels = np.arange(latmin + latint, latmax, latint)
        _map.drawparallels(parallels, labels=[1, 1, 0, 0], dashes=[1, 3], linewidth=.5, color='gray', fontsize=3.33, xoffset=13)
        meridians = np.arange(lonmin + lonint, lonmax, lonint)
        _map.drawmeridians(meridians, labels=[1, 1, 1, 0], dashes=[1, 3], linewidth=.5, color='gray', fontsize=3.33, yoffset=13)

    ✓arange: defines the arrays of latitudes (parallels) and longitudes (meridians) to be plotted over the map. In the above example, the parallels are drawn from 60°S to 60°N every 30°, and the meridians from 150°W to 150°E every 30°.


    ✓color: color of the parallels (meridians).
    ✓linewidth: width of the parallels (meridians). If you want to draw only the axis labels and don't want to draw the parallels (meridians) on the map, the linewidth should be 0.
    ✓labels: list of 4 values (default [0,0,0,0]) that control whether the parallels are labelled where they intersect the left, right, top or bottom of the plot. For example, labels=[1,0,0,1] will cause the parallels to be labelled where they intersect the left and bottom of the plot, but not the right and top.
    ✓xoffset: distance of the latitude labels from the vertical axis.
    ✓yoffset: distance of the longitude labels from the horizontal axis.

In the example program, the lines and ticks around the map are also removed by

    In [295]:
        import plotTools as pt
        pt.rem_axLine(['right', 'bottom', 'left', 'top'])
        pt.rem_ticks()

Now the data are plotted over the map object as:

    In [296]:
        from matplotlib import pyplot as plt
        fig = plt.figure(figsize=(9, 7))
        ax1 = plt.subplot(211)
        _map.imshow(np.ma.masked_less(data.mean(0), 0.), cmap=plt.cm.jet, interpolation='none', origin='upper', vmin=0, vmax=200)
        plt.colorbar(orientation='vertical', shrink=0.5)
        ax2 = plt.axes([0.18, 0.1, 0.45, 0.4])
        data_gm = np.array([np.ma.masked_less(_data, 0).mean() for _data in data])
        plt.plot(data_gm)
        data_gm_msc = data_gm.reshape(-1, 12).mean(0)
        pt.rem_axLine()
        ax3 = plt.axes([0.72, 0.1, 0.13, 0.4])
        plt.plot(data_gm_msc)
        pt.rem_axLine()
        plt.show()

    ✓A subplot can be combined with axes in a figure. In this case, the global mean of runoff and its mean seasonal cycle are plotted on axes ax2 and ax3, respectively.

7.6.2. CUSTOMIZING A COLORBAR


• To specify the orientation of the colorbar:

    In [297]: colorbar()

    ✓the default is a vertical colorbar on the right side of the main plot.

    In [298]: colorbar(orientation='horizontal')

    ✓makes a horizontal colorbar below the main plot.

• To specify the fraction of the total plot area occupied by the colorbar:

    In [299]: colorbar()

    ✓the default fraction is 0.15.

    In [300]: colorbar(fraction=0.5)

    ✓50% of the plot area is used by the colorbar.

• To specify the ratio of length to width of the colorbar:

    In [301]: colorbar(aspect=20)

    ✓length:width = 20:1.

Various other colormaps are available in Python. Figure 7.1 shows some commonly used colormaps and their names. More details on the options for colorbar can be found here.

Figure 7.1: Some commonly used colormaps

For a list of all the colormaps available in Python, click here.
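As a small sketch of how to look up the registered colormap names and apply one other than the 'jet' map used above (the dummy random data and the 'viridis' map, available in recent matplotlib versions, are only for illustration):

    from matplotlib import pyplot as plt
    import numpy as np

    print(plt.colormaps())                 # list the names of all registered colormaps
    demo = np.random.rand(20, 20)          # dummy data just to show the colormap
    plt.imshow(demo, cmap=plt.cm.viridis)  # pass any registered colormap via cmap=
    plt.colorbar(orientation='horizontal')
    plt.show()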

