International Institute of Information Technology, Pune
Department of Computer Engineering
Systems Programming & Operating Systems
Unit – III
Case Study: Overview of LEX and YACC
Prof. Deptii Chaudhari
Assistant Professor
Department of Computer Engineering
LEX & YACC
• What is Lex?
• Lex is a lexical analyser generator: from a set of patterns it builds a lexical analyser (also called a lexer or scanner).
• The lexer's main job is to break up an input stream into more usable elements, called tokens.
• Or, in other words, to identify the "interesting bits" in a text file.
• What is Yacc?
• Yacc is a parser generator: from a grammar it builds a parser that recognises the structure of the token stream.
• In the course of its normal work, the parser also verifies that the input is syntactically sound.
• YACC stands for "Yet Another Compiler Compiler". This is because this kind of analysis of text files is normally associated with writing compilers.
Deptii Chaudhari, Dept of Computer Engineering, Hope Foundation’s International Institute of Information Technology, I²IT P-14,Rajiv Gandhi Infotech Park 2
MIDC Phase 1, Hinjawadi, Pune – 411057 Tel - +91 20 22933441/2/3 | www.isquareit.edu.in | [email protected]
LEX Program Structure
Definitions Section
%{
C global variables, prototypes, comments
%}
%%
Production Rules Section
%%
User Subroutine Section (Optional)
• In the rules section, each rule is made up of two parts: a pattern and an action, separated by whitespace.
• The lexer that lex generates executes the action when it recognizes the pattern.
• The user subroutine section consists of any legal C code.
• Lex copies it to the C file after the end of the lex-generated code.
• Lex translates the Lex specification into a C source file called lex.yy.c, which we compile and link with the lex library using -ll.
• Then we can execute the resulting program to check that it works as expected.
Example
%{
#include <stdio.h>
%}
%%
[0123456789]+ printf("NUMBER\n");
[a-zA-Z][a-zA-Z0-9]* printf("WORD\n");
%%
• Running the Program
$ lex example_lex.l
$ gcc lex.yy.c -ll
$ ./a.out
Pattern Matching Primitives
Metacharacter   Matches
.               any character except newline
\n              newline
*               zero or more copies of the preceding expression
+               one or more copies of the preceding expression
?               zero or one copy of the preceding expression
^               beginning of line
$               end of line
a|b             a or b
(ab)+           one or more copies of ab (grouping)
"a+b"           literal "a+b" (C escapes still work)
[]              character class
Pattern Matching Examples
Expression      Matches
abc             abc
abc*            ab, abc, abcc, abccc, ...
abc+            abc, abcc, abccc, abcccc, ...
a(bc)+          abc, abcbc, abcbcbc, ...
a(bc)?          a, abc
[abc]           one of: a, b, c
[a-z]           any lowercase letter, a through z
[a\-z]          one of: a, -, z
[-az]           one of: -, a, z
[A-Za-z0-9]+    one or more alphanumeric characters
[ \t\n]+        whitespace
[^ab]           any character except a or b
[a^b]           one of: a, ^, b
[a|b]           one of: a, |, b
a|b             a or b
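Several rows of the table above can be checked from plain C, because for these simple cases lex patterns behave like POSIX extended regular expressions. The sketch below is not how lex itself works (lex compiles all patterns into one automaton). It only wraps the POSIX regcomp()/regexec() calls in a hypothetical helper named full_match(). Quoted patterns such as "a+b" and escapes inside brackets such as [a\-z] are lex-specific and may behave differently under POSIX rules, so they are left out.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper, not part of lex: returns 1 if the whole of `text`
   matches `pattern`, 0 if it does not, and -1 if the pattern is too long
   or fails to compile. */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap the pattern as ^( ... )$ so it must cover the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || n >= (int)sizeof anchored)
        return -1;

    /* REG_EXTENDED selects extended syntax (+, ?, |, grouping);
       REG_NOSUB says we only need a yes/no answer. */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

For example, full_match("a(bc)+", "abcbc") returns 1, while full_match("abc+", "ab") returns 0 because + requires at least one c.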
Operation of yylex()
• When lex compiles the input specification, it generates the C file lex.yy.c, which contains the routine int yylex(void).
• This routine reads the input and tries to match it against the token patterns specified in the rules section.
• On a match, the associated action is executed.
• Calling the yylex() function starts the process of pattern matching.
• Lex stores the matched string at the address pointed to by yytext.
• The matched string's length is kept in yyleng.
• The token's value is passed to the parser in the variable yylval. Lex does not fill it in automatically; the action must assign it.
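To make the roles of yytext and yyleng concrete, here is a hand-written C sketch of the kind of work a generated yylex() does for the earlier NUMBER/WORD example. It is an illustration under assumptions, not the code lex emits: the names scan_init, scan_next, tok_text and tok_len are hypothetical stand-ins (tok_text for yytext, tok_len for yyleng), and the sketch skips whitespace instead of echoing it as the default lex rule would.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; 0 marks end of input, as with yylex(). */
enum { TOK_END = 0, TOK_NUMBER = 256, TOK_WORD = 257, TOK_OTHER = 258 };

static const char *cursor;   /* next unread input character          */
char tok_text[64];           /* copy of the matched text (cf. yytext) */
size_t tok_len;              /* length of that text (cf. yyleng)      */

/* Point the scanner at a NUL-terminated input string. */
void scan_init(const char *input) { cursor = input; }

/* Return the next token code and fill tok_text / tok_len. */
int scan_next(void)
{
    const char *start;
    int kind;

    /* Simplification: blanks are skipped rather than echoed. */
    while (*cursor == ' ' || *cursor == '\t' || *cursor == '\n')
        cursor++;
    if (*cursor == '\0')
        return TOK_END;

    start = cursor;
    if (isdigit((unsigned char)*cursor)) {          /* [0-9]+ */
        while (isdigit((unsigned char)*cursor))
            cursor++;
        kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*cursor)) {   /* [a-zA-Z][a-zA-Z0-9]* */
        while (isalnum((unsigned char)*cursor))
            cursor++;
        kind = TOK_WORD;
    } else {                                        /* any other single character */
        cursor++;
        kind = TOK_OTHER;
    }

    /* Record the matched text; overly long tokens are truncated to fit. */
    tok_len = (size_t)(cursor - start);
    if (tok_len >= sizeof tok_text)
        tok_len = sizeof tok_text - 1;
    memcpy(tok_text, start, tok_len);
    tok_text[tok_len] = '\0';
    return kind;
}
```

After scan_init("abc 123"), the first call to scan_next() returns TOK_WORD with tok_text holding "abc" and tok_len equal to 3; the second returns TOK_NUMBER with "123". Each loop consumes as many characters as the pattern allows, which mirrors the longest-match behaviour of a lex-generated scanner.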
Lex program:
%{
/* Number of comments found so far. */
int com=0;
%}
%%
"/*"[^\n]+"*/" {com++;fprintf(yyout, " ");}
%%
int main()
{
    printf("Write a C program\n");
    /* Unmatched text is echoed to yyout, so it ends up in the file "output". */
    yyout=fopen("output", "w");
    yylex();
    printf("Comment=%d\n",com);
    return 0;
}
Sample run:
$ cc lex.yy.c -ll
$ ./a.out
Write a C program
#include<stdio.h>
int main()
{
int a, b;
/*float c;*/
printf("Hi");
/*printf("Hello");*/
}
Comment=2
$ cat output
#include<stdio.h>
int main()
{
int a, b;
printf("Hi");
}
Lex Predefined Variables
YACC
• YACC is a parser generator that takes an input file with an attribute-enriched BNF (Backus-Naur Form) grammar specification.
• It generates the output C file y.tab.c, which contains the function int yyparse(void) that implements the parser.
• This function automatically invokes yylex() every time it needs a token to continue parsing.
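The pull relationship between parser and lexer can be sketched in plain C. This is a hand-written illustration under assumptions, not yacc output: yacc generates a table-driven LALR parser, whereas the sketch uses a simple loop for the toy grammar "NUMBER ('+' NUMBER)*". The names next_token, parse_sum, token_value and parse_result are hypothetical; next_token plays the role of yylex(), token_value the role of yylval, and parse_sum the role of yyparse(), returning 0 on success and 1 on a syntax error.

```c
#include <ctype.h>

/* Hypothetical token codes; single characters stand for themselves. */
enum { T_END = 0, T_NUMBER = 258 };

static const char *src;   /* remaining input                         */
static int lookahead;     /* token currently being examined          */
int token_value;          /* value of the last NUMBER (cf. yylval)   */
int parse_result;         /* sum computed by the last successful parse */

/* Lexer (cf. yylex): returns one token per call. */
static int next_token(void)
{
    while (*src == ' ' || *src == '\t')
        src++;
    if (*src == '\0' || *src == '\n')
        return T_END;
    if (isdigit((unsigned char)*src)) {
        token_value = 0;
        while (isdigit((unsigned char)*src))
            token_value = token_value * 10 + (*src++ - '0');
        return T_NUMBER;
    }
    return (unsigned char)*src++;
}

/* Parser (cf. yyparse): asks the lexer for a token whenever it needs one. */
int parse_sum(const char *input)
{
    int total;

    src = input;
    lookahead = next_token();
    if (lookahead != T_NUMBER)
        return 1;                       /* a sum must start with a number */
    total = token_value;

    lookahead = next_token();
    while (lookahead == '+') {
        lookahead = next_token();
        if (lookahead != T_NUMBER)
            return 1;                   /* '+' must be followed by a number */
        total += token_value;
        lookahead = next_token();
    }
    if (lookahead != T_END)
        return 1;                       /* trailing input is a syntax error */

    parse_result = total;
    return 0;
}
```

Calling parse_sum("1 + 2 + 39") returns 0 and leaves 42 in parse_result, while parse_sum("1 +") returns 1. The lexer never runs ahead on its own: it advances only when the parser requests the next token.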
Structure of YACC Program
Definitions
%{
C global variables, prototypes, comments
%}
%%
Context-free grammar & action for each production
%%
Subroutines/Functions
arithmatic.l
%{
#include<stdio.h>
#include<stdlib.h>   /* atoi */
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {
    yylval=atoi(yytext);
    return NUMBER;
}
[\t] ;
[\n] return 0;
. return yytext[0];
%%
int yywrap()
{
    return 1;
}
How To Run:
$ yacc -d arithmatic.y
$ lex arithmatic.l
$ gcc lex.yy.c y.tab.c
$ ./a.out