Category Archives: General Discussion

Posts about history in general!

GUI Project

After a couple of weeks of planning and development, I have finished the alpha build of my textual analysis tool. Here is the link to my GitHub repository. Feel free to clone it and give it a try!

 

https://github.com/CompSci0404/history396GraphicalProject

Exploration into Python

The last couple of weeks have centered on learning how programming can benefit historians conducting research. Python in particular shows promise for historical work. Python is a high-level programming language, meaning its code is written in terms close to human language and hides most of the details of how the computer carries out each instruction. Python stands out for being easy for beginners to pick up, because it was designed to be readable. Being easy to read and understand increases the likelihood that people will use it as a tool. That readability is what makes Python such a strong language: almost anyone can pick it up and begin programming.

My personal experience with Python stems back further than these last two weeks. I worked at a summer camp in the summer of 2018, where I was tasked with creating Python scripts. Coming from a predominantly C++ and Java background, it was a bit of a challenge to write code in a less rigid language such as Python. Where Python really shines compared to those languages is that variables do not need a declared data type. In many programming languages every variable must be declared with a data type, which is the kind of information the variable is allowed to store. That information can range from words and sentences, called strings, to integers and fractional numbers. Python values still have types, but the type belongs to the value rather than to the variable, so any variable can hold any kind of data. This flexibility leads to unique solutions to problems. Many of the Programming Historian lessons build on this approach, and a lot of them explore the concept of data manipulation. Humans do this all the time: when a person reads a book, they process the information in it and then manipulate it to form opinions. A computer can speed up this process of reading texts, which the Programming Historian lessons demonstrate. A specific example is the stylometry lesson, where the 85 Federalist Papers are processed by the computer and compared for distinctive writing styles. Having worked through these lessons, I feel the Programming Historian does an excellent job of explaining how to write code that works efficiently to complete large analysis jobs that would otherwise take historians a long time.
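To make the point about data types concrete, here is a minimal sketch of how one Python variable can hold different kinds of data in turn, and how a single list can mix them. The names `value`, `mixed` and `describe` are made up for illustration and do not come from any lesson.

```python
# A variable in Python is just a name; the value it points at carries the type.
value = "Federalist No. 10"   # a string
value = 10                    # now an integer
value = 10.5                  # now a fractional (floating point) number

# A single list can hold several kinds of data at once.
mixed = ["Madison", 1787, 3.5, ["Hamilton", "Jay"]]


def describe(items):
    """Return the type name of every item, in order."""
    return [type(item).__name__ for item in items]


print(describe(mixed))
```

In C++ or Java each of those reassignments would need a separately declared variable of the right type, which is the difference I noticed most when switching over.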

 

import os


class authorConstruction:
    """Maps each author to the list of papers in which their name appears."""

    def __init__(self, Authors, dataDir):
        self.DicOfAuthor = {}         # author -> list of all the papers for this author.
        self.listOfAuthors = Authors  # a list which will be gained from the view.

        # Collect every .txt file in the given directory.
        listOfFiles = [file for file in os.listdir(dataDir) if file.endswith(".txt")]

        for fileName in listOfFiles:
            singleFilePath = os.path.join(dataDir, fileName)

            # Open the file and read its text so it can be searched for authors.
            with open(singleFilePath) as singleF:
                text = singleF.read()

            # Assumption: a paper belongs to an author if the name appears in its text.
            for author in self.listOfAuthors:
                if author in text:
                    # setdefault creates the key the first time, then we append.
                    self.DicOfAuthor.setdefault(author, []).append(fileName)

    def returnKeySizeOfDictionary(self):
        return len(self.DicOfAuthor)

    def toString(self):
        return str(self.DicOfAuthor)


if __name__ == '__main__':
    # for testing of this class!
    ListOfAuthors = ["Madison", "Hamilton", "Jay"]
    dataDir = "C:/Schoolprojects/history396/history396GraphicalProject/Data/data"

    test = authorConstruction(ListOfAuthors, dataDir)

    print(test.returnKeySizeOfDictionary())

    for author in ListOfAuthors:
        # .get avoids a KeyError when an author matched no files.
        print(author, "and their writings:\n", test.DicOfAuthor.get(author, []))

    print(test.toString())

A Python script I wrote that searches through a directory of text files and records which files match each author.

Programming is a skill that historians should be pursuing, especially in the next century. Social history is the process of recovering the story of the everyday person, bringing their experiences to light to help paint a bigger picture of a certain time in history. In the early 1900s, this could be accomplished by finding primary sources. However, not everyone kept records, and the records that were kept had to withstand the test of time before historians could find and record them. Since the turn of the century, records have been kept digitally and stored in large databases that can be accessed online. It would take a single person hundreds of years to read through all the content that has been stored on the web. That is why it is important for historians to learn how to program. The only way to process huge amounts of data is to build algorithms designed for the job. Once data can easily be searched, historians will be able to paint a more accurate picture of the past. On a much smaller scale, this type of activity is clearly demonstrated in the Programming Historian lesson From HTML to List of Words (Part 1). This lesson explores taking a website's content, extracting the text from it, storing that text in a list, and then exploring the words the website contains. This type of searching and sorting allows historians to quickly work through the transcript of Benjamin Bowsey's 1780 criminal trial and extract information from it. Being able to interact with a piece of code to extract information from a historical document highlights the power that Python has.
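The idea behind that lesson can be sketched in a few lines. This is not the lesson's own code: it is my own minimal version using `html.parser` from Python's standard library, and the names `TextExtractor` and `html_to_words`, as well as the sample sentence, are made up for illustration rather than taken from the trial transcript.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_words(html):
    """Strip the markup from an HTML string and return lowercase words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Keep only the letters inside each word, then drop empty leftovers.
    words = ("".join(ch for ch in word if ch.isalpha()) for word in text.lower().split())
    return [word for word in words if word]


sample = "<html><body><p>The <b>prisoner</b> was indicted, and tried.</p></body></html>"
print(html_to_words(sample))
```

Once the text is a plain list of words, the usual list tools (counting, sorting, searching) can be applied to it directly.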

Another reason why historians should learn how to program is that the people best placed to think historically are historians. Software is built by programmers, and in most cases the person who built a tool is not a historian. This means that the majority of tools were not developed with a historical use in mind. Instead, a tool is developed for another audience and later adapted for use by historians. If historians began learning how to program, the tools they built would be better suited to the purposes of historical research.

Exploration of Historical GIS

Historical GIS is the practice of combining history with geography. Combining the two fields allows researchers to explore how maps change over a certain time period. Being able to explore the map as time progresses opens the door to new research questions. This is shown in Geoff Cunfer's article Scaling the Dust Bowl, where the author used GIS technology to challenge the accepted cause of the Dust Bowl. Originally the Dust Bowl was thought to have been caused by farmers overusing the land and plowing up the native grasslands that kept the soil in place. However, after some research and with the aid of Historical GIS software, Cunfer and his team discovered a different narrative. They found that throughout the period, the regions with the worst dust storms were the places that suffered major drought. New technology like this gives historians better tools to explore historical questions on a much larger scale.

One feature which gives Historical GIS some appeal is the ability to visually represent data. The lesson plan found on the Geospatial Historian website teaches readers how to represent data about the Great Plains in the 1930s. The data provided is a map of the Great Plains within the United States of America. The interesting feature of this map is the ability to change different aspects of it. Firstly, if the user wants to display different layers on the map, they can easily do so by selecting them on the right side of the screen. On the Great Plains map, the user can toggle states or counties on and off, or keep both on at once. The user can also color the map and arrange the map layers to make it easy to tell the counties and the states apart.

Representation of counties or states.

The data given to the reader is used later in the lesson. It is extracted from a zip file and can then be loaded into GIS software such as ArcMap to be represented visually. Being able to present data in an easy-to-read graphical picture really helps support the research being done. In ArcMap, the data is loaded from special files called geodatabase (.gdb) files, which contain data that ArcMap can use. In the ArcMap user interface, the data is shown in a table similar to Microsoft Excel. This data can be selected and manipulated to show different values on the map. For example, if the user wanted to display the number of women between the ages of 18 and 25, all they would need to do is select the fields in the gdb file that they want displayed on the map.
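Picking a field to display can be pictured in plain Python. This is only a sketch of the idea, not how ArcMap works internally: the table below stands in for the attribute table, and the county names, field names and numbers are all invented for illustration.

```python
# Made-up attribute table: one dictionary per county, one key per field.
# The county names and numbers are invented for illustration only.
attribute_table = [
    {"county": "County A", "total_population": 1200, "norwegian_born": 30},
    {"county": "County B", "total_population": 800, "norwegian_born": 95},
    {"county": "County C", "total_population": 1500, "norwegian_born": 410},
]


def select_field(rows, field):
    """Return {county name: value of the chosen field} for every row."""
    return {row["county"]: row[field] for row in rows}


def share_of_population(rows, field):
    """Express the chosen field as a percentage of each county's population."""
    return {
        row["county"]: round(100 * row[field] / row["total_population"], 1)
        for row in rows
    }


print(select_field(attribute_table, "norwegian_born"))
print(share_of_population(attribute_table, "norwegian_born"))
```

Whichever field is selected becomes the value that each county is colored by on the map.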

The one major flaw I found after using the software is the coloring of data and the presentation of colors. Different shades of red can be hard to tell apart, and at times this makes the data the researcher is trying to show harder to read. A good example is the map representing the number of Norwegians in the Great Plains of the United States in the 1930s. Using distinct colors rather than shades of one color would greatly improve the readability of the maps.

Norwegians located in the Great Plains, 1930s.

Esri is a company which specializes in GIS. Among its products is an advanced type of GIS software called ArcGIS Online. One of the features of ArcGIS Online is a built-in function called smart mapping, which suggests the best colors and models to include in the map. This produces an easier-to-read map compared to ArcMap. Similar to ArcMap, ArcGIS Online allows users to upload custom data to the mapping service. Esri also offers other mapping software which could prove useful to historians. Historians provide a narrative as to why a certain event happened in history. They work closely with people and archives to construct reasons why events happened. With Esri's Story Maps, historians can present maps in a storytelling format. Story Maps gives users the tools to incorporate research data and present the research in a format that is much easier to follow and understand. It is a form of public history, similar to that of a museum. In a museum, the data is presented on panels and is supported by the object it explains, which allows visitors to gain a much deeper understanding of the object being presented. Story Maps lets the creator input data and then support that data with images, maps, and other useful media to get their points across.

 

Bibliography

  1. Cunfer, Geoff. “Scaling the Dust Bowl.” In Placing History: How Maps, Spatial Data, and GIS Are Changing Historical Scholarship, edited by Anne Kelly Knowles, 95-121. Redlands: ESRI Press, 2008. http://esripress.esri.com/storage/esripress/images/133/knowles.pdf.
  2. Cunfer, Geoff. “ArcGIS Lesson 1: Mapping Great Plains Population.” Geospatial Historian: Open HGIS Lessons and Resources. Accessed March 9, 2019. https://geospatialhistorian.wordpress.com/lessons/arcgis-lesson-1-mapping-great-plains-population/.
  3. Esri. “Story Maps.” Accessed March 7, 2019. https://storymaps.arcgis.com/en/.

MALLET: A Text Analysis Tool

MALLET is an intriguing piece of software developed by Andrew McCallum, with the aid of several graduate students and staff. MALLET is an open source project, meaning that the tool can be used freely in research and commercial work so long as its users credit MALLET appropriately. MALLET is also a machine learning program. Machine learning is a branch of Artificial Intelligence, or AI for short. In plain terms, the program learns from the data it is given. As it learns from its mistakes, it readjusts its algorithm in order to give better and smarter outputs. MALLET lets users quickly process documents and turn the results into text-based files. On top of the text conversion, MALLET also supports the classification of data. Through the process of sequence tagging, MALLET allows users to tag certain data to help refine searching through large amounts of it.

One major weakness of MALLET is the process of setting up and using the tool. The way the download process is set up, it is apparent that the user following the tutorial needs to have a Computer Science background. For example, if a user clicks on the Download hyperlink, the website redirects them to a new page with information on how to download MALLET. A user who wants the Windows version downloads a zip file. The zip file then needs to be extracted, and the user needs to set an environment variable. This is the stage where the download tutorial becomes confusing. Where does a user find the environment variable, and what is an environment variable? An environment variable is a named setting stored by the operating system that programs can read, for example to find out which folder a tool was installed in. MALLET needs such a variable to point at the folder where it was extracted, and the user has to create it through the system settings or through command-line instructions. The tutorial on the website does not answer these fundamental questions for someone using MALLET for the first time. If a user wants to use MALLET for research and does not have a Computer Science background, it would be difficult for them to get the maximum benefit from it. This is a recurring theme on the MALLET website, as it is evident that the webpage is directed towards a Computer Science oriented audience.
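For readers who have never met an environment variable, here is a small sketch of how a program reads one. The helper `check_env_dir` is my own invention, and the variable name `MALLET_HOME` is the one I understand the Windows setup to expect, so treat that name as an assumption and check it against the MALLET instructions.

```python
import os


def check_env_dir(name):
    """Report whether environment variable `name` is set and points at a folder."""
    value = os.environ.get(name)  # None when the variable is not set
    if value is None:
        return f"{name} is not set"
    if not os.path.isdir(value):
        return f"{name} is set to {value!r}, but that folder does not exist"
    return f"{name} points at {value}"


# MALLET_HOME is an assumed variable name; see the note above.
print(check_env_dir("MALLET_HOME"))
```

Running this before starting MALLET gives a plain-language answer to whether the setup step worked, which is the kind of feedback the tutorial does not provide.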

Another weakness of MALLET is that it forces the use of the command line. The command line is very useful: it allows Computer Scientists to quickly perform tasks without having to go through graphical windows, which at times can be tedious. However, there is a reason why modern operating systems use graphical interfaces. It is simply easier to interact with a graphical menu than with a black window full of text that only responds to special commands. This feeds the concern that MALLET is not very user-friendly. It does not matter how advanced an algorithm is if a human cannot use it effectively. Graphics have been a major push in computing because they enable everyday people to interact with systems without having to worry about the complex operations being done under the hood of the computer.

One strength of MALLET is its use of machine learning algorithms to quickly process unlabeled text into usable data. For a digital historian this is huge, as it allows historians to input a large number of files and look for the topics that run through them. The term the MALLET website uses for this analysis of large quantities of unlabeled text is topic modeling. To MALLET, a topic is a cluster of words that frequently occur together across a large body of text. Topics make it possible to find documents that use similar groups of words, which connects the data together and displays similarities. This is important for historians because it lets them find similarities in large amounts of text very quickly. MALLET is designed to find this information quickly and effectively, and it offers specialized command-line options to improve the results. The code used for implementing optimization is interesting. To optimize a function, the code must be able to read and set the values of all of its parameters, and MALLET handles this by keeping track of every parameter inside the object being optimized. For topic models, the relevant term is hyperparameter optimization. Hyperparameter optimization lets the model adjust its own settings as it trains, so that some topics can be more prominent than others.

 

public class OptimizerExample implements Optimizable.ByGradientValue {

    // Optimizables encapsulate all state variables, 
    //  so a single Optimizer object can be used to optimize 
    //  several functions.

    double[] parameters;

    public OptimizerExample(double x, double y) {
        parameters = new double[2];
        parameters[0] = x;
        parameters[1] = y;
    }

    public double getValue() {

        double x = parameters[0];
        double y = parameters[1];

        return -3*x*x - 4*y*y + 2*x - 4*y + 18;

    }

    public void getValueGradient(double[] gradient) {

        gradient[0] = -6 * parameters[0] + 2;
        gradient[1] = -8 * parameters[1] - 4;

    }

    // The following get/set methods satisfy the Optimizable interface

    public int getNumParameters() { return 2; }
    public double getParameter(int i) { return parameters[i]; }
    public void getParameters(double[] buffer) {
        buffer[0] = parameters[0];
        buffer[1] = parameters[1];
    }

    public void setParameter(int i, double r) {
        parameters[i] = r;
    }
    public void setParameters(double[] newParameters) {
        parameters[0] = newParameters[0];
        parameters[1] = newParameters[1];
    }
}
Code from MALLET showing an Optimizable class that implements a quadratic function.
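To see what an optimizer does with those two methods, here is a small Python sketch that uses the same quadratic and the same gradient as the Java class. It uses plain gradient ascent with a fixed step size, which is my own simplification for illustration and not the algorithm MALLET's optimizers use. The function names, step size and iteration count are all assumptions.

```python
def get_value(x, y):
    """The same quadratic as getValue() in the Java example."""
    return -3 * x * x - 4 * y * y + 2 * x - 4 * y + 18


def get_value_gradient(x, y):
    """The same partial derivatives as getValueGradient()."""
    return (-6 * x + 2, -8 * y - 4)


def gradient_ascent(x, y, step=0.1, iterations=100):
    """Repeatedly move (x, y) a small step uphill along the gradient."""
    for _ in range(iterations):
        dx, dy = get_value_gradient(x, y)
        x += step * dx
        y += step * dy
    return x, y


best_x, best_y = gradient_ascent(0.0, 0.0)
print(round(best_x, 4), round(best_y, 4), round(get_value(best_x, best_y), 4))
```

Setting both parts of the gradient to zero by hand gives x = 1/3 and y = -1/2, and the loop settles on the same point. The optimizer only ever needs the value and the gradient, which is why the Java interface asks for exactly those two methods plus the parameter getters and setters.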

In the end, MALLET is a very powerful tool when it comes to conducting searches on vast amounts of data. It can prove very useful for quickly finding commonalities in data. However, the experience of actually using the program could and should be improved to make working with MALLET as easy as possible. In the computing world, there is always a struggle between complexity and ease of use. Major companies spend millions, sometimes even billions, trying to perfect the balance between user-friendly software and optimized software.

Bibliography

  1. McCallum, Andrew Kachites. “MALLET: A Machine Learning for Language Toolkit.” 2002. http://mallet.cs.umass.edu.
  2. Mimno, David, Charles Sutton, Gaurav Chandalia, and Al Hough. “MAchine Learning for LanguagE Toolkit.” MALLET. Accessed February 6, 2019. http://mallet.cs.umass.edu/about.php.

Review of a Digital History Project: Wearing Gay History

 

Wearing Gay History started out as a graduate student project at George Mason University. Eric Gonzaba took the initiative to digitize a collection of LGBT-themed t-shirts from several different archives. The website has a number of goals that it hopes to highlight through the digital publication of these t-shirts. The main goal of the project, however, is to bring awareness to the LGBT community in the hope of shedding light on a somewhat ignored history. The project hopes to bring visibility to LGBT groups outside LGBT hubs such as San Francisco and New York.

The founders contribute a large number of digital items, which are used to display an impressive number of LGBT-themed shirts. Shirts can be found through a slick user interface with a simple click of a t-shirt button near the top of the web page. If the user clicks on a t-shirt, they are presented with information about it, including a picture of the shirt and the geographical location where that shirt was worn. This is where the website really excels, as it shows where these shirts originate and gives the user a wider sense of the magnitude of the LGBT community around the world. However, after some digging around, it seems that the information displayed does not really give insight into LGBT history. For example, if a user clicks on the Roze Front 1982 Amersfoort t-shirt, the web page contains a picture of the shirt and very little information on its historical importance. Considering that one of the goals of this graduate project was to shed light on a hidden history of LGBT communities around the world, it would seem they dropped the ball when it comes to actually showing the historical importance of the shirts. After a couple of Google searches, it became apparent that the importance of this t-shirt was its connection to an event called Pink Saturday, a Dutch gay pride parade that takes place every year. One of the purposes of digital content is convenience: it lets users quickly gather information in one place, so having to search elsewhere for this context works against that purpose.

“Roze Front 1982 Amersfoort,” Wearing Gay History, accessed January 29, 2019, http://wearinggayhistory.com/items/show/4675.

The t-shirt map is a wonderful addition to the Wearing Gay History website. It is easy to find and easy to use. The use of a geographic information system (GIS) allows users to easily see which t-shirts belong to which areas around the world. Users can search for a t-shirt by clicking on shirts at the side of the map. The really interesting part of this feature is the ability to easily explore the map. Investigating the map revealed that eighteen of the shirts are located in Saskatchewan, which is really quite interesting considering how small Saskatchewan is compared to other places around the world. Mapping the geographic locations of the t-shirts really helps visualize the number of t-shirts around the world. It puts into perspective the vast number of LGBT communities that exist in different parts of the world.

 

“Dignity Canada Dignite,” Wearing Gay History, accessed January 29, 2019, http://wearinggayhistory.com/items/show/3853.

The t-shirt map is only one way to search for objects. The search engine on the website is also impressive, as it allows users to quickly search for different shirts. The user can refine their search with a number of different and precise conditions when looking for a specific item. The website also allows browsing by tag, with a list of different terms to help define and narrow a search. For example, if a user searches by the tag HIV & AIDS, which is the biggest tag to search by, the search displays a variety of different shirts. Examining the shirts by tag raises a question: why are there so many shirts that fit into the HIV & AIDS tag? A logical answer would be that HIV is a really big issue within the LGBT community. One of the reasons fashion exists is to distinguish ourselves from each other. The point of fashion is to be unique, and there is a reason the term "fashion statement" exists: it is to state a point that is being made. Homophobia is often tied to the stigma around HIV, and the shirts in the HIV section tell a story of trying to combat this negative perception of HIV and LGBT communities.

The Wearing Gay History website is an interesting history project which successfully brings the presence of LGBT communities around the world to its users' attention. The website has an impressive number of t-shirts recorded by means of digital technology. With the aid of GIS technology, the data recorded on this site could lead to new historical questions surrounding LGBT history.

 

 

 

“Action = Life,” Wearing Gay History, accessed January 29, 2019, http://wearinggayhistory.com/items/show/4649. HIV & AIDS themed t-shirt.

Bibliography

  1. Gonzaba, Eric Nolan. “Wearing Gay History.” Accessed January 28, 2019. http://wearinggayhistory.com/.