MS Office Forum

What is a text file parser? How does it work, basically?

  Asked By: Marjorie    Date: Sep 29    Category: MS Office    Views: 737
  

Reading about this has given me an idea. As a family genealogist, I am
always interested in obituaries for the information they provide. Could a
text parser accept something like this as input:

[Copied from the Indianapolis Star, copyright Indianapolis Star]
Mardenna Johnson Hunter 96, passed away December 22, 2006. Mardenna had many
interests. She painted many works of art in oil and watercolor, and won
acclaim in a national historical art contest with one of her paintings.
Mardenna also studied genealogy. She wrote three books on early families
(Singletons, Pattons and Howland) and helped others with their family trees.
She conducted classes on how to trace your ancestry, with her most popular
talk being how to trace back to Adam and Eve (Hunter, Johnson and Related
Families). Mardenna was born at home June 23, 1910, in Indianapolis, to
Emsley W. Johnson Sr., attorney and founder of Speedway State Bank. Her
mother was Katherine Griffin Johnson, who was an elementary school
principal. Mardenna had one brother, the late Emsley W. Johnson Jr., Judge
of Superior Court Room 4. She graduated from Shortridge High School in 1928
and received a Bachelor's of Science degree in 1932 from Butler University.
She was a member of Kappa Alpha Theta sorority. Mardenna married Curtis
Hunter in 1937, and he preceded her in death in 2004. She was owner and
officer of Hunter Homes, Choice Properties, and General Hunter Apartments.
She also worked at Speedway State Bank. Mardenna traveled extensively,
visiting her ancestral home in Scotland, England and Wales. She also
traveled to Canada, Mexico, China, Central America and South America. She
was an 83 year member of Tabernacle Presbyterian Church. Her other
organization memberships included the Indiana Mayflower Society,
Indianapolis Propylaeum, People of Vision, Keep Indiana Beautiful, The Doers
Club, The Indiana Historical Society, Indianapolis Symphony Society, Women's
Department Club, Clowes Hall Women's Association and the Caroline Scott
Harrison Chapter, Daughters of the American Revolution. Mardenna is survived
by her son, Winston L. Hunter (Mary Beth); daughters, Virginia H. Browning
(Scott J.) and Diana J. Hunter; grandchildren, Charlton S. Browning, Audrey
E. Neucks, and Carson W. Hunter; and great-grandchildren, Taylor E.
Browning, Lauren M. Browning, Hunter Scott Browning, and Jacob J. Neucks.
Funeral services will be at 11 a.m. Thursday, December 28, 2006 at
Tabernacle Presbyterian Church, with visitation on from 4 p.m. to 7 p.m.
Wednesday, December 27, 2006 at Flanner and Buchanan Funeral Center - Broad
Ripple. In lieu of flowers, donations may be made to Tabernacle Presbyterian
Church, 418 E. 34 St., Indianapolis, IN 46205-3795. Burial will be in Crown
Hill Cemetery.
[End]

Then from that, “text mine” it for the following information:
Name: Mardenna Johnson Hunter
Age: 96
Death date: December 22, 2006
Given birth date: None
Est. birth date: 1910
Spouse Name: Not found
Brother: Emsley W. Johnson Jr.
Education: Butler University, 1932
Son: Winston L. Hunter (Wife of son: Mary Beth)
Daughter: Browning, Virginia H.
Daughter: Hunter, Diana J.
Grandchild: Browning, Charlton S.
Grandchild: Hunter, Carson W.
Grandchild: Neucks, Audrey E.
Ggrandchild: Browning, Hunter
Ggrandchild: Browning, Lauren M.
Ggrandchild: Browning, Taylor E.
Ggrandchild: Neucks, Jacob J.
Funeral service site: Flanner and Buchanan Funeral Center - Broad Ripple
Cemetery: Crown Hill Cemetery
Burial Date: December 27, 2006
Organizations: Indiana Mayflower Society, Indianapolis Propylaeum, People of
Vision, Keep Indiana Beautiful, The Doers Club, The Indiana Historical
Society, Indianapolis Symphony Society, Women's Department Club, Clowes Hall
Women's Association and the Caroline Scott Harrison Chapter, Daughters of
the American Revolution
Sorority: Kappa Alpha Theta


Does text parsing have the ability to do something like that? Wow, if a text
parser will do that, this is something I want to learn. This would be a
wonderful way of summarizing obit information to just the items of
genealogical interest. What should I do to learn about parsing? Gleaning
information like that would increase the accuracy of information a
genealogist types into his / her genealogy program.


3 Answers Found

 
Answer #1    Answered By: Cais Nguyen     Answered On: Sep 29

A text parser basically takes a continuous flow of text-based input
and breaks it down into pieces, or extracts pieces from it. The key to
that extraction is having recognizable (and consistent) delimiters or
patterns.
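
As a rough illustration of extraction by pattern, here is a minimal Python sketch. It assumes the obituary has been joined into one string, opens with "Name Age, passed away ...", and writes dates as "Month D, YYYY". The function name extract_basics and the field names are made up for this example, and other newspapers would need other patterns.

```python
import re

# Date written as "Month D, YYYY" (an assumed format; publications vary).
DATE = r"[A-Z][a-z]+ \d{1,2}, \d{4}"

def extract_basics(text):
    """Pull name, age, death date and birth date out of the text by pattern."""
    found = {}
    # Name and age come right before "passed away", the death date right after.
    m = re.search(
        rf"^(?P<name>.+?) (?P<age>\d{{1,3}}), passed away (?P<death>{DATE})",
        text,
    )
    if m:
        found["name"] = m.group("name")
        found["age"] = int(m.group("age"))
        found["death_date"] = m.group("death")
    # The first date that follows the word "born" is taken as the birth date.
    b = re.search(rf"\bborn\b.*?(?P<birth>{DATE})", text, flags=re.S)
    if b:
        found["birth_date"] = b.group("birth")
    return found

sample = (
    "Mardenna Johnson Hunter 96, passed away December 22, 2006. "
    "Mardenna was born at home June 23, 1910, in Indianapolis."
)
print(extract_basics(sample))
```

If a pattern is not found, that field is simply left out of the result, which is probably what you want when an obituary is worded differently.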

I think the biggest problem you'd have would be whether or not various
obituaries have a consistent enough pattern to extract the data,
especially across various publications. You could certainly look for
key phrases like "passed away" or "born" or "survived by" to parse the
data into pieces, then parse those apart with other delimiters. For
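example, once you found "survived by", a semicolon could be used to
delimit between each type of survivor, as in:

her son, Winston L. Hunter (Mary Beth)
daughters, Virginia H. Browning (Scott J.) and Diana J. Hunter
grandchildren, Charlton S. Browning, Audrey E. Neucks, and Carson W. Hunter

That last line could then be parsed by the comma delimiter, giving you:

grandchildren
Charlton S. Browning
Audrey E. Neucks
and Carson W. Hunter

Clean it up to remove the "and" and convert the first word,
"grandchildren", to "Grandchild" and you'd end up with:

Grandchild: Charlton S. Browning
Grandchild: Audrey E. Neucks
Grandchild: Carson W. Hunter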
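
The steps described here (find "survived by", split on semicolons, split on commas, then clean up) can be sketched in Python roughly as follows. This is only a sketch under assumptions: parse_survivors and the SINGULAR table are names invented for the example, the end of the sentence is guessed with a simple heuristic, and a differently worded obituary would likely need adjustments.

```python
import re

# Plural label in the text -> label wanted in the output (assumed mapping).
SINGULAR = {
    "son": "Son", "sons": "Son",
    "daughter": "Daughter", "daughters": "Daughter",
    "grandchildren": "Grandchild",
    "great-grandchildren": "Ggrandchild",
}

def parse_survivors(text):
    """Return (label, name) pairs from the "survived by" sentence."""
    # Step 1: keep only what follows the key phrase.
    _, found, rest = text.partition("survived by")
    if not found:
        return []
    # Rough end-of-sentence guess: a period after two lowercase letters,
    # so initials such as "L." or "J." do not end the sentence.
    rest = re.split(r"(?<=[a-z]{2})\.(?:\s|$)", rest, maxsplit=1)[0]

    results = []
    # Step 2: semicolons separate the types of survivor.
    for group in rest.split(";"):
        # Step 3: commas separate the label from the individual names.
        pieces = [p.strip() for p in group.split(",")]
        label_words = pieces[0].split()
        if not label_words:
            continue
        # "her son" -> "son", "and great-grandchildren" -> "great-grandchildren"
        label = label_words[-1].lower()
        label = SINGULAR.get(label, label.capitalize())
        for piece in pieces[1:]:
            # Step 4: drop a leading "and", and split "A and B" into two names.
            for name in re.split(r"\band\b", piece):
                name = name.strip()
                if name:
                    results.append((label, name))
    return results

sentence = (
    "Mardenna is survived by her son, Winston L. Hunter (Mary Beth); "
    "daughters, Virginia H. Browning (Scott J.) and Diana J. Hunter; "
    "grandchildren, Charlton S. Browning, Audrey E. Neucks, and Carson W. "
    "Hunter; and great-grandchildren, Taylor E. Browning, Lauren M. Browning, "
    "Hunter Scott Browning, and Jacob J. Neucks. Funeral services will be "
    "at 11 a.m."
)
for label, name in parse_survivors(sentence):
    print(f"{label}: {name}")
```

Spouse names in parentheses, such as "(Mary Beth)", are left attached to the survivor's name here; pulling them out would be one more parsing step.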

The number of variations on names may make it difficult to determine
exactly what the last name is. In this case, the pattern is
straightforward, but if you start adding "Jr." or "III" or two word
last names like "Le Clair", it gets more difficult.
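
To show why the name part gets harder, here is a small Python sketch that guesses the last name while allowing for suffixes and two-word surnames. The SUFFIXES and PARTICLES lists are assumptions and far from complete, split_name is a made-up helper, and "Marie Le Clair" is just an invented test name.

```python
# Assumed, incomplete lists; real data will need more entries.
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}
PARTICLES = {"le", "la", "de", "van", "von"}

def split_name(full_name):
    """Guess (last, given, suffix) from a name written in natural order."""
    words = full_name.replace(",", " ").split()
    suffix = ""
    # A trailing "Jr." or "III" is a suffix, not the last name.
    if words and words[-1].lower() in SUFFIXES:
        suffix = words.pop()
    if not words:
        return ("", "", suffix)
    last = [words.pop()]
    # Pull a particle such as "Le" into the last name, but always
    # leave at least one word for the given name.
    while len(words) > 1 and words[-1].lower() in PARTICLES:
        last.insert(0, words.pop())
    return (" ".join(last), " ".join(words), suffix)

for name in ["Charlton S. Browning", "Emsley W. Johnson Jr.", "Marie Le Clair"]:
    print(split_name(name))
```

Even with lists like these, some names will still be guessed wrong, so it is worth checking the results by eye before typing them into a genealogy program.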

However, I would suspect most of the algorithms you need already exist
on the net, since a routine like this has probably been needed before
for one thing or another.

 
Answer #2    Answered By: Jaspreet Kapoor     Answered On: Sep 29
 
Answer #3    Answered By: Elaine Stevens     Answered On: Sep 29

That is great. Thank you.

 