This article gives a brief overview of how to extract information from survey files in Microsoft Word format using PERL and Win32::OLE.
1. Introduction
This article assumes the following environment:
- You are using a Microsoft Windows based system.
- You have Microsoft Word installed on the machine.
- You have PERL installed on the machine. (If not, you can download PERL for Windows from the ActiveState website.)
If you already have your survey completed, and you are looking for a way to extract the data, please go ahead and jump to section 3. Otherwise, continue reading through section 2.
2. Preparing Your Survey Files
Having your survey form fields labeled correctly will facilitate the data extraction process. In the example below, we have four check-boxes associated with a single survey question.

To label a check-box, double-click it to open the Check-box Properties dialog. Under the Field Settings group, you will see a parameter named Bookmark, which has a default value such as Check1. Change this to a more meaningful value, such as Q1_Computer, which will make much more sense when we extract the data. Repeat the same procedure for the rest of the check-boxes in your survey, and make sure that every label is unique.

Your survey file is now ready to be distributed to your audience. After you get your survey files back, continue with section 3, which shows the basics of extracting the data from them.
3. Extracting Data From Survey Files
In this section we will write a custom PERL script to extract the data from the survey files. Open your favorite text editor (vi, emacs, notepad, edit, etc.) and copy and paste the following code skeleton, or download survey.pl.
#
# @(#) survey.pl 1.0 03/13/04
#
# Copyright (c) 2004
# Ali Onur Cinar <cinar(a)zdo.com>
#
# License:
#
# Permission to use, copy, modify, and distribute this software and its
# documentation for non-commercial use and without fee is hereby granted
# provided that the above copyright notice appear in all copies and that
# both the copyright notice and this permission notice and warranty
# disclaimer appear in supporting documentation, and that the name of
# Ali Onur Cinar not be used in advertising or publicity pertaining to
# distribution of the software without specific, written prior permission.
#
use strict;
use warnings;
use Win32::OLE;
use Cwd;

my @questions =                                      # questions
(
    {
        'name'       => 'Question 1',                # question's name
        'maxAnswers' => 1,                           # 1 = single-answer question
        'noAnswers'  => 0,                           # surveys with no answer
        'choices'    =>                              # answer choices
        {
            'Q1_A' => 0,
            'Q1_B' => 0
        }
    },
    {
        'name'       => 'Question 2',
        'maxAnswers' => 2,
        'noAnswers'  => 0,
        'choices'    =>
        {
            'Q2_A' => 0,
            'Q2_B' => 0,
            'Q2_C' => 0,
            'Q2_D' => 0
        }
    }
);

my $cwd = getcwd();                                  # current directory
$cwd =~ s!/!\\!g;                                    # Windows path separators

for my $file (glob("*.doc"))                         # for each survey
{
    my $doc = Win32::OLE->GetObject("$cwd\\$file")   # open survey file
        or die Win32::OLE->LastError();

    for my $question (@questions)                    # for each question
    {
        my $nAnswers = 0;                            # init num of answers

        for my $choice (keys(%{$question->{'choices'}}))    # go through answers
        {
            if ($doc->FormFields($choice)->Result == 1)     # if choice is selected
            {
                $question->{'choices'}->{$choice}++; # add it to count
                $nAnswers++;                         # num of answers
            }

            # stop if max answers reached (numeric comparison)
            last if $nAnswers >= $question->{'maxAnswers'};
        }

        if ($nAnswers == 0)                          # if no answer found
        {                                            # increment the no
            $question->{'noAnswers'}++;              # answer count
        }
    }

    $doc->close();                                   # close survey file
}

for my $question (@questions)                        # for each question
{
    print $question->{'name'}."\n";                  # question name

    for my $choice (keys(%{$question->{'choices'}})) # for each choice
    {
        printf(" %-15s\t%-5d\n", $choice,            # answer count
            $question->{'choices'}->{$choice});
    }

    printf(" %-15s\t%-5d\n\n", "Not Answered",       # no answer count
        $question->{'noAnswers'});
}
The @questions structure given above defines two survey questions. The first, named Question 1, has two choices, Q1_A and Q1_B. The maxAnswers parameter defines how many of these choices can be selected together to form a single answer. At this point you don't have to worry about the rest of the parameters.
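To see how maxAnswers and the "Not Answered" count behave, here is a minimal sketch that applies the same counting logic to in-memory data instead of Word documents. The tally_survey helper and the made-up result hashes are assumptions for illustration only; they stand in for the values the script reads from each check-box. Choices are visited in sorted order here so the result is predictable, whereas survey.pl visits them in hash order.

```perl
use strict;
use warnings;

# One question that accepts at most one selected choice.
my @questions = (
    {
        name       => 'Question 1',
        maxAnswers => 1,
        noAnswers  => 0,
        choices    => { Q1_A => 0, Q1_B => 0 },
    },
);

# Hypothetical helper: update the counters from one survey, given a hash
# of bookmark name => check-box result (1 = checked, 0 = unchecked).
sub tally_survey {
    my ($questions, $results) = @_;

    for my $question (@$questions) {
        my $nAnswers = 0;

        for my $choice (sort keys %{ $question->{choices} }) {
            if (($results->{$choice} // 0) == 1) {
                $question->{choices}{$choice}++;
                $nAnswers++;
            }
            # Ignore further checked boxes once the limit is reached.
            last if $nAnswers >= $question->{maxAnswers};
        }

        $question->{noAnswers}++ if $nAnswers == 0;
    }
}

# Three made-up surveys: B checked, nothing checked, both checked.
tally_survey(\@questions, { Q1_A => 0, Q1_B => 1 });
tally_survey(\@questions, { Q1_A => 0, Q1_B => 0 });
tally_survey(\@questions, { Q1_A => 1, Q1_B => 1 });

printf("%s=%d\n", $_, $questions[0]{choices}{$_}) for qw(Q1_A Q1_B);
printf("Not Answered=%d\n", $questions[0]{noAnswers});
```

In the third survey both boxes are checked, but because maxAnswers is 1 only the first one visited is counted.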
In order to process your survey, you should define each of your questions and their possible choices here in this data structure. The choices should have the same names that you defined in Bookmark field while you were preparing the survey file in section 2.
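If your bookmarks follow a consistent naming scheme, the data structure can be generated instead of typed by hand. The sketch below assumes names of the form Q1_A, where the part before the first underscore identifies the question. The build_questions helper is hypothetical and not part of survey.pl, and it uses the prefix itself as the question name.

```perl
use strict;
use warnings;

# Hypothetical helper: group bookmark names such as "Q1_A" by the part
# before the first underscore and build one entry per question.
# $max_answers maps a prefix to its maxAnswers value (default 1).
sub build_questions {
    my ($max_answers, @bookmarks) = @_;
    my (%by_prefix, @order);

    for my $bookmark (@bookmarks) {
        my ($prefix) = $bookmark =~ /^([^_]+)_/
            or next;                          # skip names without a prefix
        push @order, $prefix unless $by_prefix{$prefix};
        $by_prefix{$prefix}{$bookmark} = 0;   # every count starts at zero
    }

    return map {
        +{
            name       => $_,
            maxAnswers => $max_answers->{$_} // 1,
            noAnswers  => 0,
            choices    => $by_prefix{$_},
        }
    } @order;
}

my @questions = build_questions({ Q2 => 2 },
                                qw(Q1_A Q1_B Q2_A Q2_B Q2_C Q2_D));

for my $question (@questions) {
    printf("%s: %d choices, maxAnswers %d\n", $question->{name},
           scalar(keys %{ $question->{choices} }), $question->{maxAnswers});
}
```

The resulting list has the same shape as the hand-written @questions array, so the rest of the script can use it unchanged.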
If you didn't prepare your survey file as described in section 2, open one of your survey files, go through the properties of each choice, and note down the name in the Bookmark field. These names should then be used to prepare the data structure.
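Collecting the bookmark names by hand gets tedious for long surveys. As a rough sketch, the names can also be listed through the same OLE interface the script uses. The list_form_fields helper is hypothetical, and the sketch assumes the Word object model's FormFields collection with its Count property and each field's Name and Type properties. In Word's field-type constants, 71 is a check-box and 70 is a text input.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);

# Hypothetical helper: return [name, type] pairs for every form field in
# one Word document. Win32::OLE is loaded on first use.
sub list_form_fields {
    my ($file) = @_;
    require Win32::OLE;

    (my $path = abs_path($file)) =~ s!/!\\!g;        # absolute Windows path
    my $doc = Win32::OLE->GetObject($path)
        or die Win32::OLE->LastError();

    my @fields;
    for my $i (1 .. $doc->FormFields->Count) {       # collection is 1-based
        my $field = $doc->FormFields($i);
        push @fields, [ $field->Name, $field->Type ];
    }

    $doc->Close(0);                                  # 0 = do not save changes
    return @fields;
}

# Print the fields of each document named on the command line.
for my $file (@ARGV) {
    print "$file\n";
    printf(" %-20s type %d\n", @$_) for list_form_fields($file);
}
```

Saved under a name of your choice, for example fields.pl, it would be run as perl fields.pl followed by the name of one survey file.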
When you are ready, copy all of the survey files and the survey.pl PERL script to the same directory. Start the program by typing:
c:\survey\> perl survey.pl
You should then get output similar to the following, depending on your survey:
Question 1
 Q1_B                 2
 Q1_A                 1
 Not Answered         1
Question 2
 Q2_A                 0
 Q2_B                 2
 Q2_C                 0
 Q2_D                 1
 Not Answered         0
As an example application, you can download a few test survey files as survey_docs.zip, together with the corresponding survey.pl.
With a minor modification to the code, the survey.pl PERL script can be enhanced to process form fields other than check-boxes, such as text fields that contain numbers.
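One possible shape for such a modification is sketched below. It assumes that the Result of a text field is the text the respondent typed, so the script would collect one string per survey for a given bookmark and then summarize them. The summarize_numbers helper and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize the raw Result strings collected from one
# numeric text field across all surveys. Blank or non-numeric entries are
# counted as not answered.
sub summarize_numbers {
    my @results = @_;

    # Keep only plain decimal numbers, ignoring surrounding whitespace.
    my @numbers = map {
        defined($_) && /^\s*(-?\d+(?:\.\d+)?)\s*$/ ? $1 : ()
    } @results;

    my $sum = 0;
    $sum += $_ for @numbers;

    return {
        answered    => scalar(@numbers),
        notAnswered => @results - @numbers,
        average     => @numbers ? $sum / @numbers : undef,
    };
}

# Made-up values, as they might be read from five survey files.
my $summary = summarize_numbers('34', ' 41 ', '', 'n/a', '27.5');

printf("answered=%d notAnswered=%d average=%.2f\n",
       $summary->{answered}, $summary->{notAnswered}, $summary->{average});
```

In survey.pl, the strings passed to the helper would be gathered inside the per-file loop, in the same place where the check-box results are read.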