Common Gateway Interface
Web Technologies
Piero Fraternali
Outline
• Architectures for dynamic content
publishing
– CGI
– Java Servlet
– Server-side scripting
– JSP tag libraries
Motivations
• Creating pages on the fly based on the user’s
request and from structured data (e.g.,
database content)
• Client-side scripting & components do not
suffice
– They manipulate an existing document/page; they do
not create a new one from structured content
• Solution:
– Server-side architectures for dynamic content
production
Common Gateway Interface
• An interface that allows the Web Server to launch
external applications that create pages dynamically
• A kind of «double client-server loop»
What CGI is/is not
• It is not
– A programming language
– A telecommunication protocol
• It is
– An interface between the web server and the applications that
defines some standard communication variables
• The interface is implemented through environment variables, a
universal mechanism present in all operating systems
• A CGI program can be written in any programming language
Invocation
• The client specifies in the URI the name of
the program to invoke
• The program must be deployed in a
specified location at the web server (e.g.,
the cgi-bin directory)
– http://my.server.web/cgi-bin/xyz.exe
Execution
• The server recognizes from the URI that
the requested resource is an executable
– Permissions must be set in the web server to
allow program execution
– E.g., the extensions of executable files must
be explicitly specified
• http://my.server.web/cgi-bin/xyz.exe
Execution
• The web server decodes the parameters
sent by the client and initializes the CGI
variables
• REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH,
CONTENT_TYPE
• http://my.server.web/cgi-bin/xyz.exe?par=val
Execution
• The server launches the program in a new
process
Execution
• The program executes and «prints» the
response on the standard output
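A minimal sketch of the program's side of this step, in C. The function names `write_response` and `handle_request` are hypothetical, and the "business logic" is only a placeholder; the point is the order of the output: header line, blank line, then the HTML.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints the response: header line, blank line, then the HTML body. */
void write_response(FILE *out, const char *method)
{
    /* Placeholder "business logic": say where the parameters travel. */
    const char *where = strcmp(method, "POST") == 0
                            ? "the request body"
                            : "the query string";

    fprintf(out, "Content-type: text/html\n\n");
    fprintf(out, "<html><body>\n");
    fprintf(out, "<p>Method %s: parameters arrive in %s.</p>\n", method, where);
    fprintf(out, "</body></html>\n");
}

/* Hypothetical entry point: reads the CGI variable and writes the page. */
int handle_request(FILE *out)
{
    const char *method = getenv("REQUEST_METHOD");
    if (method == NULL)
        method = "GET"; /* assumption, so the sketch also runs outside a server */
    write_response(out, method);
    return 0;
}
```

In a real CGI program, `main` would simply return `handle_request(stdout)`.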
Execution
• The server builds the response from the
content emitted to the standard output and
sends it to the client
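The execution steps above can be sketched from the server's side as follows. This is an illustration for POSIX systems, not how any particular web server is implemented: the name `run_cgi_get` is hypothetical, only three CGI variables are set, and the captured output is returned as-is, whereas a real server would also interpret the header lines before replying to the client.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs `program` as a CGI process for a GET request
 * and copies what it prints on standard output into `out`.
 * Returns the number of bytes captured, or -1 on error. */
long run_cgi_get(const char *program, const char *query_string,
                 char *out, size_t cap)
{
    int fds[2];
    if (cap == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();              /* new process for this request */
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                  /* child: becomes the CGI program */
        setenv("GATEWAY_INTERFACE", "CGI/1.1", 1);
        setenv("REQUEST_METHOD", "GET", 1);
        setenv("QUERY_STRING", query_string, 1);
        dup2(fds[1], STDOUT_FILENO); /* standard output now feeds the pipe */
        close(fds[0]);
        close(fds[1]);
        execl(program, program, (char *)NULL);
        _exit(127);                  /* reached only if exec failed */
    }

    close(fds[1]);                   /* parent keeps the read end */
    size_t used = 0;
    ssize_t n;
    while (used < cap - 1 &&
           (n = read(fds[0], out + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (long)used;
}
```

For example, calling `run_cgi_get("/usr/bin/env", "par=val", buffer, sizeof buffer)` on a system where that program exists captures an environment listing that includes the variables set above.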
Handling request parameters
• Client parameters can be sent in two ways
– With the HTTP GET method
• parameters are appended to the URL (1)
• http://www.myserver.it/cgi-bin/xyz?par=val
– With the HTTP POST method
• Parameters are inserted as an HTTP entity in the
body of the request (when their size is substantial)
• Requires an HTML form so that users can enter the
data that is sent in the body of the request
(1) The HTTP specification does not set a maximum URI length;
practical limits are imposed by web browser and server software
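With either method, an HTML form submitted with the default encoding delivers its values URL-encoded: a space becomes `+` and other special bytes become `%XX` (two hexadecimal digits). The sketch below shows one way a CGI program might decode a single value; `url_decode` and `hex_value` are hypothetical helper names, not part of the CGI interface.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller has checked isxdigit(c). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decodes `src` into `dst`, writing at most
 * cap - 1 bytes plus a terminating NUL. Returns the bytes written. */
size_t url_decode(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    while (*src != '\0' && n + 1 < cap) {
        unsigned char c = (unsigned char)*src;
        if (c == '+') {
            dst[n++] = ' ';          /* '+' stands for a space */
            src += 1;
        } else if (c == '%' &&
                   isxdigit((unsigned char)src[1]) &&
                   isxdigit((unsigned char)src[2])) {
            dst[n++] = (char)(hex_value((unsigned char)src[1]) * 16 +
                              hex_value((unsigned char)src[2]));
            src += 3;                /* skip '%' and both digits */
        } else {
            dst[n++] = (char)c;      /* ordinary character, copied as-is */
            src += 1;
        }
    }
    dst[n] = '\0';
    return n;
}
```

A malformed escape such as a trailing `%` is copied through unchanged, which is a design choice of this sketch rather than a requirement.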
HTML Form
<HTML>
<BODY>
<FORM
action="http://www.mysrvr.it/cgi-bin/xyz.exe"
method=post>
<P> Tell me your name:</P>
<P><INPUT type="text"
NAME="whoareyou"> </P>
<INPUT type="submit"
VALUE="Send">
</FORM>
</BODY>
</HTML>
Structure of a CGI program
1. Read the environment variables
2. Execute the business logic
3. Print the MIME header: "Content-type: text/html"
4. Print the HTML markup
Parameter decoding
• Read the variable REQUEST_METHOD
– GET: read the variable QUERY_STRING
– POST: read the variable CONTENT_LENGTH, then read
CONTENT_LENGTH bytes from the standard input
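The decoding flow above might look like this in C. The names `read_raw_params` and `read_cgi_params` are hypothetical; the first takes the variable values as arguments so it can be exercised without a web server, the second wires it to the real environment and standard input. The result is the raw, still URL-encoded parameter string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the raw parameter string into `buf`.
 * Returns 0 on success, -1 on a missing variable, an unsupported
 * method, a short read, or a buffer that is too small. */
int read_raw_params(const char *method, const char *query_string,
                    const char *content_length, FILE *in,
                    char *buf, size_t cap)
{
    if (method == NULL || cap == 0)
        return -1;

    if (strcmp(method, "GET") == 0) {
        /* GET: the parameters are in QUERY_STRING */
        if (query_string == NULL)
            query_string = "";
        if (strlen(query_string) >= cap)
            return -1;
        strcpy(buf, query_string);
        return 0;
    }

    if (strcmp(method, "POST") == 0) {
        /* POST: read exactly CONTENT_LENGTH bytes from the input stream */
        if (content_length == NULL)
            return -1;
        char *end;
        unsigned long len = strtoul(content_length, &end, 10);
        if (end == content_length || *end != '\0' || len >= cap)
            return -1;
        if (fread(buf, 1, len, in) != len)
            return -1;
        buf[len] = '\0';
        return 0;
    }

    return -1;
}

/* Wrapper that a CGI program would call: uses the CGI variables and stdin. */
int read_cgi_params(char *buf, size_t cap)
{
    return read_raw_params(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"),
                           getenv("CONTENT_LENGTH"), stdin, buf, cap);
}
```

Reading only CONTENT_LENGTH bytes matters because the program should not rely on an end-of-file arriving after the request body.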
CGI development
• A CGI program can be written in any programming language:
– C/C++
– Fortran
– PERL
– TCL
– Unix shell
– Visual Basic
• In case a compiled programming language is used, the
source code must be compiled
– Normally source files are in cgi-src
– Executable binaries are in cgi-bin
• If an interpreted scripting language is used instead, the source
files are deployed directly
– Normally in the cgi-bin folder
Overview of CGI variables
• Clustered per type:
– server
– request
– headers
Server variables
• These variables are always available, i.e.,
they do not depend on the request
– SERVER_SOFTWARE: name and version of
the server software
• Format: name/version
– SERVER_NAME: hostname or IP of the
server
– GATEWAY_INTERFACE: supported CGI
version
• Format: CGI/version
Request variables
• These variables depend on the request
– SERVER_PROTOCOL: name and version of the protocol
used for the request (e.g., HTTP/1.1)
• Format: protocol/version
– SERVER_PORT: port to which the request is
sent
– REQUEST_METHOD: HTTP request method
– PATH_INFO: extra path information
– PATH_TRANSLATED: translation of PATH_INFO
from virtual to physical
– SCRIPT_NAME: invoked script URL
– QUERY_STRING: the query string
Other request variables
• REMOTE_HOST: client hostname
• REMOTE_ADDR: client IP address
• AUTH_TYPE: authentication type used by
the protocol
• REMOTE_USER: username used during the
authentication
• CONTENT_TYPE: content type in case of
POST and PUT request methods
• CONTENT_LENGTH: content length
Environment variables: headers
• The HTTP headers contained in the request
are stored in the environment with the prefix
HTTP_
– HTTP_USER_AGENT: browser used for the
request
– HTTP_ACCEPT_ENCODING: encoding type
accepted by the client
– HTTP_ACCEPT_CHARSET: charset accepted by
the client
– HTTP_ACCEPT_LANGUAGE: language
accepted by the client
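The naming rule can be sketched in a few lines of C: the header name is upper-cased, each `-` becomes `_`, and `HTTP_` is prepended. The function name `header_to_env_name` is hypothetical and only illustrates the mapping a server applies.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: writes the CGI variable name for an HTTP request
 * header into `out`. Returns 0 on success, -1 if `out` is too small. */
int header_to_env_name(const char *header, char *out, size_t cap)
{
    static const char prefix[] = "HTTP_";
    size_t plen = sizeof prefix - 1;
    size_t hlen = strlen(header);

    if (plen + hlen + 1 > cap)
        return -1;

    memcpy(out, prefix, plen);
    for (size_t i = 0; i < hlen; i++) {
        unsigned char c = (unsigned char)header[i];
        out[plen + i] = (c == '-') ? '_' : (char)toupper(c);
    }
    out[plen + hlen] = '\0';
    return 0;
}
```

With this rule, the header `User-Agent` maps to `HTTP_USER_AGENT` and `Accept-Language` to `HTTP_ACCEPT_LANGUAGE`.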
CGI script for inspecting variables
#include <stdlib.h>
#include <stdio.h>

/* Print one variable. getenv() returns NULL when the variable is not set,
   and passing NULL to %s is undefined behavior, so substitute a marker.
   Values are printed as-is; a production script would HTML-escape them. */
static void print_var(const char *name){
    const char *value = getenv(name);
    printf("%s: %s<br>\n", name, value != NULL ? value : "(not set)");
}

int main (void){
    /* Header, then the blank line that separates it from the body */
    printf("Content-type: text/html\n\n");
    printf("<html><head><title>Request variables</title></head>");
    printf("<body><h1>Some request header variables:</h1>");
    fflush(stdout);
    print_var("SERVER_SOFTWARE");
    print_var("GATEWAY_INTERFACE");
    print_var("REQUEST_METHOD");
    print_var("QUERY_STRING");
    print_var("HTTP_USER_AGENT");
    print_var("HTTP_ACCEPT_ENCODING");
    print_var("HTTP_ACCEPT_CHARSET");
    print_var("HTTP_ACCEPT_LANGUAGE");
    print_var("HTTP_REFERER");
    print_var("REMOTE_ADDR");
    printf("</body></html>");
    return 0;
}
Example output (browser screenshot not reproduced)
Problems with CGI
• Performance and security issues in web server to
application communication
• When the server receives a request, it creates a new
process in order to run the CGI program
• This requires time and significant server resources
• A CGI program cannot interact back with the web server
• The process of the CGI program is terminated when
the program finishes
• No sharing of resources between subsequent calls (e.g., reuse
of database connections)
• No main memory preservation of the user’s session (database
storage is necessary if session data are to be preserved)
• Exposing to the web the physical path to an
executable program can breach security
References
• CGI reference:
– http://www.w3.org/CGI/
• Security and CGI:
– http://www.w3.org/Security/Faq/index.html
Complete example
1. First request
2. Resource retrieval
3. Response
4. Second request
5. Environment variables are set and the program is called
6. Response computation (Mult.cgi)
7. Response is sent
• Files involved: Form.html, Mult.c, Mult.cgi (Mult.c previously compiled into Mult.cgi)
The form (form.html)
<HTML>
<HEAD><TITLE>Multiplication form</TITLE></HEAD>
<BODY>
<FORM ACTION="http://www.polimi.it/cgi-bin/run/mult.cgi">
<P>Enter the two factors</P>
<INPUT NAME="m" SIZE="5"><BR/>
<INPUT NAME="n" SIZE="5"><BR/>
<INPUT TYPE="SUBMIT" VALUE="Multiply">
</FORM>
</BODY>
</HTML>
• The ACTION attribute holds the URL that is called; no METHOD is
given, so the form is submitted with GET and the values travel in
the query string
The script (mult.c)
#include <stdio.h>
#include <stdlib.h>
int main(void){
    char *data;
    long m,n;
    /* Print the response on standard output: header first, then a blank line */
    printf("%s%c%c\n", "Content-Type:text/html;charset=iso-8859-1",13,10);
    printf("<HTML>\n<HEAD>\n<TITLE>Multiplication result</TITLE>\n</HEAD>\n");
    printf("<BODY>\n<H3>Multiplication result</H3>\n");
    /* Retrieve the values from the environment variables */
    data = getenv("QUERY_STRING");
    if(data == NULL)
        printf("<P>Error! The form data could not be received.</P>\n");
    else if(sscanf(data,"m=%ld&n=%ld",&m,&n)!=2)
        printf("<P>Error! Invalid data. Both values must be numeric.</P>\n");
    else
        printf("<P>Result: %ld * %ld = %ld</P>\n",m,n,m*n);
    printf("</BODY>\n</HTML>\n");
    return 0;
}
Compilation and local test
• Compilation:
$ gcc -o mult.cgi mult.c
• Local test, setting by hand the environment variable that holds the
query string:
$ export QUERY_STRING="m=2&n=3"
$ ./mult.cgi
• Result:
Content-Type:text/html;charset=iso-8859-1

<HTML>
<HEAD>
<TITLE>Multiplication result</TITLE>
</HEAD>
<BODY>
<H3>Multiplication result</H3>
<P>Result: 2 * 3 = 6</P>
</BODY>
</HTML>
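The `sscanf` pattern `"m=%ld&n=%ld"` used by the script only matches when `m` comes first and `n` second. As a hedged alternative, the sketch below looks a parameter up by name wherever it appears in the query string; `get_long_param` is a hypothetical helper, and it does not check for numeric overflow.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds `name` in a "a=1&b=2" style string and
 * parses its value as a long. Returns 1 if the parameter is present
 * and numeric, 0 otherwise. */
int get_long_param(const char *query, const char *name, long *value)
{
    size_t name_len = strlen(name);
    const char *p = query;

    while (p != NULL && *p != '\0') {
        /* A pair matches only if the name is followed directly by '=' */
        if (strncmp(p, name, name_len) == 0 && p[name_len] == '=') {
            const char *text = p + name_len + 1;
            char *end;
            long parsed = strtol(text, &end, 10);
            /* The number must be non-empty and end at '&' or at the end */
            if (end == text || (*end != '\0' && *end != '&'))
                return 0;
            *value = parsed;
            return 1;
        }
        p = strchr(p, '&');          /* move to the next name=value pair */
        if (p != NULL)
            p++;
    }
    return 0;
}
```

With this helper the script could accept both `m=2&n=3` and `n=3&m=2`.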
Considerations on CGI
• Possible security problems
• Performance (overhead)
– creating and terminating processes takes time
– context switches take time
• CGI processes:
– are created at each invocation
– do not inherit process state from previous
invocations (e.g., database connections)
References
• CGI reference:
http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
• Security and CGI:
http://www.w3.org/Security/Faq/wwwsf4.html