Thursday, January 18, 2018

Blog Questionnaire - 1/17/18

We recently finished our responses for the INSPIRE Shakespeare Blog Award. They helped us think about the impact that maintaining this blog, with all of our dedicated readers, has had on our research. We thought it would be interesting to share our answers with you all as well, especially since these are joint answers with input from both of us. Thank you all for accompanying us on this amazing journey, and we hope you enjoyed reading about our research!

What have you learned by blogging about your research?
Blogging about our research helped us thoroughly explore our topic while maintaining a strict schedule. We wanted to put out a post about once every week, so we mapped out our expected update schedule, which also let us schedule the sections of our paper we hoped to complete. These deadlines allowed us to comfortably finish our work without fear of falling behind. Additionally, we learned to collaborate as we divided blog posts and ideas between us. By dividing up the blog posts, we were able to ensure both of us shared an understanding of the ideas discussed, and to catch any misunderstandings or ideas we may have missed. Maintaining this blog also helped us think about how to present our research, especially in an informal, casual manner. This was especially useful when we approached potential mentors with our ideas. Finally, blogging helped us explore ideas, as we could evaluate what was important to put in our paper in light of the feedback we received from readers. The topics we were able to explore thoroughly on the blog were ones we knew we could also delve deeper into in our paper, allowing us to publish a more in-depth analysis.

Why do you think you deserve to win the INSPIRE Blog Award?
Blogging was a new experience for both of us, especially since neither of us has ever maintained a blog before or worked in journalism. We used the blog as a tool to grow as writers and learn how to present our work. We also used the blog as a networking tool. Not only did we reach out to other potential competitors (who hopefully were INSPIREd by our topic), we also presented our blog to our peers, our mentor, and others who gave insightful feedback. When creating the schedule of blog posts, we made sure that our blog holistically reflected our project, from milestone updates (https://hackingthemalware.blogspot.com/2018/01/placehold-11018.html) to simple techniques for avoiding social engineering that we found while researching (https://hackingthemalware.blogspot.com/2017/12/happy-holidays-online-shopping-safety.html). Furthermore, our blog served as something of an official public record of our project, an important artifact to maintain in the midst of rapid technological advancement, especially with AI. We would love to receive this award as a bittersweet finale to our project, but most importantly, for documenting our findings to impact science, technology, and society.

How has your research experience shaped your career or academic aspirations?
A large part of the reason we decided to try humanities research was to compare the experience to STEM research, which we are familiar with (https://hackingthemalware.blogspot.com/2017/11/introduction-113017.html). One of our surprises was realizing that our philosophy research was as time-consuming and intensive as STEM research, if not more so. Through this research, we also found many bridges between humanities and STEM, and focused on how these two fields are interconnected. For example, in our research, we learned that the network security industry primarily focuses on creating firewalls. However, 95% of malware attacks are based on social engineering, which attacks the users themselves, not their systems. Hence, we found it much more important to protect consumers from a social angle rather than a technical one, even designing a browser-based phishing identifier using deep learning to accomplish this.

Humanities research with such strong connections to cryptography and machine learning has certainly opened a new door for us in thinking about computer science. We both intend to pursue CS, and our interest is stronger than ever, especially with the implications we have learned with this research. However, we aspire to continue exploring the humanities side of CS and technology as well, as our experience has shown us that looking at problems like network security from a different angle (predominance of viruses vs. social engineering) can show what would have a greater impact on society and what is more important to create.



Once more, thank you all for supporting us! We hope you enjoyed this insight, as well as our blog.

-James and Sohini

Wednesday, January 17, 2018

"Examining Ethical Issues with Malware" Wrapup and Reflection - 1/7/17

As we're currently polishing our final paper, Sohini and I are both close to achieving a new personal milestone in our academic journey: writing our first humanities research report. Throughout the two months of our investigation into ethical issues with malware, not only have we developed a deeper understanding of and appreciation for the value humanities research brings, but also a new worldview for examining problems. Whereas both of us are used to crunching numbers and making graphs for research, our examination of ethics required extensive literature review, discussing ideas with each other, and asking those in our local communities for their opinions on our research topic to form a holistic evaluation.

Both of us found this research enjoyable, and a relaxing break from trying to model systems with challenging mathematical equations. This process also allowed us to gain a broader perspective on computer science and artificial intelligence; living in Silicon Valley, we are often caught up in the mentality that all types of technological innovation are beneficial, yet people in other places, even just outside the Bay Area, have starkly different views. Where people in Silicon Valley live off of innovation, those in rural areas may see it as a threat to their jobs and personal stability. From this research, we both learned an important lesson for our own future pursuits in computer science: the importance of ethics in governing ones and zeroes.

The most challenging part of our research was trying to come to an agreement on our differing views through debate. When we examined this topic through the lens of the Greek philosopher Pyrrho, the boundaries between good and evil suddenly became indistinguishable. Pyrrho states that good and evil can only be assessed on a relative scale from an observer's perspective. In this case, the side making and distributing the malware can be seen as good or evil. If the observer were on this side, they could easily make the argument that distributing malware is analogous to distributing capital and making profit as income, whereas an electronics consumer sees these hackers as a threat to cyberspace.

All in all, Sohini and I learned a vast amount about the implications malware has in our digital age, and how rapidly it might proliferate. With malware distribution techniques similar to those of a capital market, the future of malware in cyberspace can only be described as unpredictable and uncertain. However, by improving our current anti-virus and malware prevention software and systems, a safe interconnected world glimmers in the near future.

-James

Monday, January 8, 2018

The Research Process "Examining Ethical Issues with Malware" - 1/8/18

Examining Ethical Issues with Malware and Designing a Browser-Based Phishing Identifier using Deep Learning

This is the finalized title of our project. It's a comprehensive amalgamation of my personal interests - tech - with new areas I'm unfamiliar with - humanities. This is part one of two posts, where I will address our research process.

The first step we took after finalizing our topic was to create a very rough outline of the paper. Here's our first idea:

[REDACTED]
Of course, this has changed a considerable amount. Right after creating it, our first steps were to start completing some research pertaining to each section. I remember taking a red-eye flight to Boston (actually to visit MIT!) and instead of sleeping, just compiling links upon links of possible sites to pull information from. In fact, the end result was four entire pages of just links, single spaced. On the way back, I actually crawled through them, pulling information from the links. James and I compiled the information. We further solidified our sections by creating a set of tags for our research. Here are those tags:


The research we compiled, and later edited and shortened, now takes up nearly eight pages. It was interesting to learn about the beginnings of the Internet, and reading about the Morris worm, it seems incredible that one virus (launched out of MIT ;)) could infect 10% of all computers, a number that seems huge now. We found many more laws pertaining to malware, computer fraud, and phishing than we knew or expected there to be. I, personally, have been startled by the huge number of types of malware, enough to start taking a course on Internet safety (SecurityIQ at InfoSec Institute).

We condensed our research into three different viewpoints, and we explored case studies involving social engineering (see the next post). Finally, James and I discussed how we felt about malware and the ethics behind it, having studied and researched it thoroughly. We created a thesis, and then moved on to creating an identifier.

-Sohini

The Research Process "Designing a Browser-Based Phishing Identifier using Deep Learning" - 1/8/18

Examining Ethical Issues with Malware and Designing a Browser-Based Phishing Identifier using Deep Learning

This is the finalized title of our project. It's a comprehensive amalgamation of my personal interests - tech - with new areas I'm unfamiliar with - humanities. This is part two of two posts, where I will address our research process.

I have previously worked with network security and cryptography, taking a summer course and, later, a more math-based course on the subject, so I'm familiar with malware and its technical aspects. From Diffie-Hellman to ElGamal, the number theory behind cryptography has long intrigued me, but I was stunned when I learned that the vast majority of successful malware attacks come from social engineering. (This can be seen in both a positive and negative light - yes, firewalls are working, and our computers keep out intruders. But that means the attacks on us are the successful ones.) Social engineering, as the name indicates, is a type of attack that relies heavily on human interaction, trying to trick people into allowing malware in. Examples of this are phishing emails (think spam filters), and people can learn to avoid infecting their computers through courses and by learning how to identify potential attacks. I became interested in learning about this other side of network security - this human side.

As we've mentioned before in this blog, James and I met at a summer research program, where we worked together in a machine learning lab (specifically, computer vision). Here, I was first intrigued by the beauty of artificial intelligence. The term used to conjure complex, even intimidating, images of thousands of lines of code and huge, clunky GPUs. While the latter is certainly true - I used my GPU over the summer as a footrest - the charm of AI and machine learning comes, in my opinion, from its simplicity and its similarity to humans and the way we learn, which is most often trial and error. Just as we learn through our mistakes, machine learning teaches computers to become accurate by adjusting their parameters as they measure their amounts of error.

So if people could be taught how to avoid social-engineering-based malware, could computers be taught this as well? After all, both are rooted in trial and error. To research this connection further, we decided to look at trends in social engineering, and specifically two: word frequencies, and image to word count ratios. After looking through the literature, we found several papers addressing the most common words found in phishing emails, and several others discussing how social engineers coerced people into giving up their most valuable information. We also found PhishSim, offered through SecurityIQ at InfoSec Institute. It held a wide range of phishing email templates, which we then stripped down to uniform plain text and ran through a word frequency program.
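The two measurements described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program we actually ran: the function names `word_frequencies` and `image_to_word_ratio` and the sample text are made up for the example.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word appears in plain text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def image_to_word_ratio(num_images, text):
    """Images per word; returns 0.0 for empty text to avoid dividing by zero."""
    num_words = sum(word_frequencies(text).values())
    return num_images / num_words if num_words else 0.0

# Made-up template text, for illustration only.
sample = "Your account is locked. Verify your account now to restore account access."
freqs = word_frequencies(sample)
print(freqs.most_common(2))            # the two most frequent words
print(image_to_word_ratio(3, sample))  # 3 images over 12 words
```

A template with many words and no images gets a ratio near zero, which is the kind of signal the image to word count idea is meant to pick up.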

While creating our templates, we began to look up what machine learning model to use. We whittled the possibilities down to just two. The first was a Naive Bayes classifier, which assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. However, we realized that some words tend to appear together, like "free" and "money," but not "free" and "shipment." Our templates were also separated by type, an important variable that Naive Bayes would not consider. On the other hand, in a Logistic Regression, the outcome is measured with a dichotomous variable (only two possible outcomes). The goal is to find the best fit to describe the relationship between the variables. Our two outcomes would be phishing or not phishing, and we could input words (word frequencies) and numbers (image:word count) as characteristics of interest, which, combined, would teach a computer when to output what. We decided that the Logistic Regression would be the best fit for our identifier.
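As a rough sketch of what such a logistic regression could look like, here is a small version written from scratch in Python. Everything in it is an assumption made for illustration: the training rows are invented, the two features (a count of "account" and an image-to-word ratio) stand in for the characteristics described above, and plain stochastic gradient descent is just one simple way to fit the model.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Estimated probability that an email with these features is phishing."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(rows, labels, learning_rate=0.1, epochs=1000):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            # The error is how far the prediction is from the true label.
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Invented rows: [count of "account", image-to-word ratio].
rows = [[4.0, 0.0], [3.0, 0.1], [0.0, 0.5], [1.0, 0.4]]
labels = [1, 1, 0, 0]  # 1 = phishing, 0 = not phishing

weights, bias = train(rows, labels)
print(predict(weights, bias, [5.0, 0.0]) > 0.5)  # many "account"s, no images
print(predict(weights, bias, [0.0, 0.6]) > 0.5)  # no "account", image-heavy
```

Each pass nudges the weights in the direction that shrinks the error, which is the trial-and-error idea in miniature. A real identifier would need far more data and a held-out test set.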

Future steps are outlined in our paper - creating a data set based on the templates and then programming the actual logistic regression. The finished product would be our Browser-Based Phishing Identifier using Deep Learning.

Creating this identifier has been incredible, to me, because of the intersection of my interests. It bridges computer science, cryptography, and artificial intelligence, with an element of the humanities as well. Learning about new types of machine learning models (we used convolutional neural networks for computer vision) was a nostalgic callback to what I did over the summer, but also a strong step in continuing to learn about machine learning. I also learned to consider the human aspects, especially when creating this type of identifier. I had to learn to think like the user - what would set off alarms in a phishing email? For example, the image to word count ratio was an idea of mine I didn't see in other papers, but for me, seeing a marketing email advertising a product but no product images would be a huge red flag. Looking through promotional emails, most don't even include more than 10 words of text. Considering this human part was something I enjoyed as well, and I look forward to more projects in the humanities.

See the previous post.

-Sohini

Thursday, January 4, 2018

Using Deep Learning to Examine Browser-Based Malware - 1/4/18



We're almost done with our project! As we begin to wrap everything up, we are now deep into our final stage: using deep-learning-based software to determine the presence of browser-based malware. After examining numerous examples of browser-based phishing, we found that there was often a correlation between word frequency and the type of phishing. For example, if a site contained the word "account" numerous times, the phisher's intent would likely be to compromise your banking information. Consider this phishing site (Source: Google Images):
[Image: bank phishing site]

Here, the phisher attempts to recreate the Bank of America identity verification page, which would likely contain the word "account" numerous times. The same pattern shows up in emails (Source: Google Images):
[Image: bank phishing email]
Here, the word "account" appears the most once again. In this example, the phisher attempts to send an email to the victim impersonating Bank of America. In fact, the phisher even changes the email headers so that the victim would presume its legitimacy.

For our machine learning approach, we use a logistic classifier to determine whether or not a site has been modified for phishing. A logistic classifier's outcome is measured with a dichotomous variable (one with only two possible outcomes), and it uses the best-fitting model to describe the relationship between the dichotomous characteristic of interest and the set of independent variables. In this case, the dichotomous characteristic is whether or not a site has been modified for phishing, while the independent variables are the frequencies of each word.
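To make the idea concrete, here is a minimal Python sketch of how a logistic classifier turns word frequencies into a yes-or-no answer. The weights, the bias, and the function names are hypothetical and hand-picked purely for illustration; a real classifier would learn its weights from labeled examples.

```python
import math
import re
from collections import Counter

# Hypothetical hand-picked weights; a trained model would learn these.
WEIGHTS = {"account": 1.2, "verify": 0.9, "password": 0.8}
BIAS = -3.0

def phishing_probability(text):
    """Weighted sum of word counts, passed through the logistic function."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    z = BIAS + sum(weight * counts[word] for word, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def is_phishing(text, threshold=0.5):
    """Dichotomous outcome: True for phishing, False for not phishing."""
    return phishing_probability(text) >= threshold

suspicious = "Verify your account: your account is locked, confirm your account password"
harmless = "Lunch is at noon tomorrow"
print(is_phishing(suspicious), is_phishing(harmless))
```

The threshold of 0.5 is also an assumption; lowering it would flag more pages at the cost of more false alarms.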

While many may object that a phisher would make the impersonated site identical to the actual site, this is still a valid approach. Most phishers provide reassurance somewhere in their phishing site that the victim is doing everything "correctly". For example, the line the phisher wrote in the email above, "After a few clicks, just verify the information you entered is correct," is actually quite common. People who are not too familiar with phishing may be misled by this statement into thinking that they must retype their credentials multiple times because of heightened security measures after their account was supposedly compromised.

We're excited to see how our classifier will turn out! We'll update you shortly!

-James

Tuesday, January 2, 2018

Connotation of "Hacker" - 1/2/18

What image does the word "hacker" or "hacking" create in your mind? Perhaps you see someone desperately typing away at their computer, or think of the words "Access Granted."

Google Image Results for "hacker"


Hack first came to be associated with computers and machines at MIT itself. In a transcript from a meeting of the Tech Model Railroad Club in April of 1955, there is a quote that states: “Mr. Eccles requests that anyone working or hacking on the electrical system turn the power off to avoid fuse blowing.”

The word's connotation relative to machines started off positive. At MIT, the term simply meant working on a problem in a creative way.

In the 1960s, the definition expanded out of MIT to computer scientists and engineers in general. In fact, it held positive connotations, as evidenced by the definitions for "hacker" in the Jargon File (launched in 1975). Here are the eight definitions:



The majority of the definitions given here are approving, like 4. "A person who is good at programming quickly." But the negative connotation in 8 seems to have won out in the long run. Especially in the media and outside of the tech world, "hacker" is used to mean someone malicious. The first time the word "hacker" appears in Times, it reads: "Computer hackers often sell the stolen codes to other students for a few dollars."

The Computer Fraud and Abuse Act popularized this negative connotation as well, in a political sense (again outside of the tech world). It has been used in the prosecution of people like Julian Assange and Aaron Swartz.

However, its positive connotation lingers in tech culture, especially as a way for techies to identify one another. The word's contrasting meanings draw a sharp line between techies and those outside the tech world, and it will be interesting to see how the word continues to evolve.

-Sohini