Wednesday, November 28, 2012

Social Network Analysis & PageRank, HITS


Social Network Analysis

Over the past two weeks, we learned an effective method for finding patterns of interaction between actors in a social network: social network analysis (SNA).

Social network analysis is the mapping and measuring of relationships and flows between people, groups, organizations, computers, URLs, and other connected information/knowledge entities. The nodes in the network are the people and groups while the links show relationships or flows between the nodes. SNA provides both a visual and a mathematical analysis of human relationships. To understand networks and their participants, we evaluate the location of actors in the network. Measuring the network location is finding the centrality of a node. These measures give us insight into the various roles and groupings in a network -- who are the connectors, mavens, leaders, bridges, isolates, where are the clusters and who is in them, who is in the core of the network, and who is on the periphery.

Two core notions in network analysis are centrality and prestige—as we will see, both are used in the link analysis for the Web. Informally, a node (or page or actor or resource) is central if it is involved in many links.
To get a concrete sense of the different kinds of centrality, we look at a sociogram first:

We look at a social network -- the "Kite Network" above -- developed by David Krackhardt, a leading researcher in social networks. Two nodes are connected if they regularly talk to each other, or interact in some way. Andre regularly interacts with Carol, but not with Ike. Therefore Andre and Carol are connected, but there is no link drawn between Andre and Ike. This network effectively shows the distinction between the three most popular individual centrality measures: Degree Centrality, Betweenness Centrality, and Closeness Centrality.

Degree Centrality: Social network researchers measure network activity for a node by using the concept of degrees -- the number of direct connections a node has. In the kite network above, Diane has the most direct connections in the network, making hers the most active node in the network. She is a 'connector' or 'hub' in this network. Common wisdom in personal networks is "the more connections, the better." This is not always so. What really matters is where those connections lead to -- and how they connect the otherwise unconnected! Here Diane has connections only to others in her immediate cluster -- her clique. She connects only those who are already connected to each other.
Betweenness Centrality: While Diane has many direct ties, Heather has few direct connections -- fewer than the average in the network. Yet, in many ways, she has one of the best locations in the network -- she is between two important constituencies. She plays a 'broker' role in the network. The good news is that she plays a powerful role in the network; the bad news is that she is a single point of failure. Without her, Ike and Jane would be cut off from information and knowledge in Diane's cluster. A node with high betweenness has great influence over what flows -- and does not -- in the network. Heather may control the outcomes in a network. That is why I say, "As in Real Estate, the golden rule of networks is: Location, Location, Location."
Closeness Centrality: Fernando and Garth have fewer connections than Diane, yet the pattern of their direct and indirect ties allows them to access all the nodes in the network more quickly than anyone else. They have the shortest paths to all others -- they are close to everyone else. They are in an excellent position to monitor the information flow in the network -- they have the best visibility into what is happening in the network.
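
To make the three measures concrete, here is a minimal Python sketch that computes them on the kite network. The sociogram itself is not reproduced in this post, so the edge list below is an assumption: it is the Krackhardt kite graph as it is commonly published, with the same ten names. The function names are mine, and closeness is taken as (n - 1) divided by the sum of shortest-path distances, which is one common definition.

```python
from collections import deque

# Assumed edge list: the Krackhardt kite graph as commonly published.
KITE_EDGES = [
    ("Andre", "Beverly"), ("Andre", "Carol"), ("Andre", "Diane"),
    ("Andre", "Fernando"), ("Beverly", "Diane"), ("Beverly", "Ed"),
    ("Beverly", "Garth"), ("Carol", "Diane"), ("Carol", "Fernando"),
    ("Diane", "Ed"), ("Diane", "Fernando"), ("Diane", "Garth"),
    ("Ed", "Garth"), ("Fernando", "Garth"), ("Fernando", "Heather"),
    ("Garth", "Heather"), ("Heather", "Ike"), ("Ike", "Jane"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbours."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    """Number of direct connections of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def bfs_distances(graph, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_centrality(graph):
    """(n - 1) / sum of distances to all other nodes; assumes a connected graph."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs_distances(graph, v).values()) for v in graph}


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' accumulation over BFS trees."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


if __name__ == "__main__":
    kite = build_graph(KITE_EDGES)
    for name, measure in [("degree", degree_centrality),
                          ("closeness", closeness_centrality),
                          ("betweenness", betweenness_centrality)]:
        values = measure(kite)
        print(name, "->", max(values, key=values.get))
```

Under this assumed edge list, the sketch should agree with the description above: Diane leads on degree, Fernando and Garth tie on closeness, and Heather leads on betweenness.
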
Network Centralization: Individual network centralities provide insight into the individual's location in the network. The relationship between the centralities of all nodes can reveal much about the overall network structure.

A very centralized network is dominated by one or a few very central nodes. If these nodes are removed or damaged, the network quickly fragments into unconnected sub-networks. A highly central node can become a single point of failure. A network centralized around a well-connected hub can fail abruptly if that hub is disabled or removed. Hubs are nodes with high degree and betweenness centrality.

A less centralized network has no single points of failure. It is resilient in the face of many intentional attacks or random failures -- many nodes or links can fail while allowing the remaining nodes to still reach each other over other network paths. Networks of low centralization fail gracefully.
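
The contrast between the two kinds of network can be sketched with two toy graphs. The star and ring networks below are hypothetical examples of my own, not taken from the class material: the star is an extreme case of a centralized network, and the ring is a simple network with no single point of failure. The helper counts how many disconnected pieces remain after one node is removed.

```python
def star(n):
    """Node 0 is a hub linked to leaves 1..n-1 (a highly centralized network)."""
    graph = {i: set() for i in range(n)}
    for leaf in range(1, n):
        graph[0].add(leaf)
        graph[leaf].add(0)
    return graph


def ring(n):
    """Each node is linked to its two neighbours on a cycle (low centralization)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        j = (i + 1) % n
        graph[i].add(j)
        graph[j].add(i)
    return graph


def components_after_removal(graph, removed):
    """Count the connected components left once `removed` is taken out."""
    remaining = set(graph) - {removed}
    seen = set()
    count = 0
    for start in remaining:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w in remaining and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count


if __name__ == "__main__":
    print("star without hub:", components_after_removal(star(6), 0))
    print("star without a leaf:", components_after_removal(star(6), 3))
    print("ring without any node:", components_after_removal(ring(6), 2))
```

Removing the hub of the star leaves every leaf isolated, while removing any single node of the ring leaves the rest connected, which is the "fails abruptly" versus "fails gracefully" distinction in miniature.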

Next, here is a brief description of the basic notions of prestige.
Degree prestige: A page is prestigious if it receives many in-links. The idea is that actors who are prestigious tend to receive many nominations or choices.
Proximity Prestige: Proximity prestige considers how proximate an actor i is to the actors in its influence domain. Proximity is defined as closeness that focuses on distances to, rather than from, each actor.
Rank prestige: An important aspect that we have ignored so far in the notions of prestige is the importance of the pages that link to i. Surely, if i receives links from important pages, it is probably important itself.

I am very interested in ranking algorithms in information retrieval, such as PageRank and HITS. When analyzing social networks, we often deal with web pages, so many of the methods and ideas are similar to those of link analysis in information retrieval.
Link-based techniques for analyzing social networks enhance text-based retrieval and ranking strategies. As we shall see, social network analysis was well established long before the Web, in fact, long before graph theory and algorithms became mainstream computer science. Therefore, later developments in evolution models and properties of random walks, mixing rates, and eigensystems (Motwani & Raghavan, 1995) may make valuable contributions to social network analysis, especially in the context of the Web.
First I want to talk about PageRank: not the details of the algorithm, just some core ideas. While PageRank and HITS were first presented in the same year (1998), PageRank has emerged as the dominant link analysis model for web search and mining. Like rank prestige, PageRank looks at the number of in-links, together with the importance of these in-links. It yields a static ranking for each page that is computed off-line and does not depend on the search queries.
From the perspective of prestige, the following intuitions are used to derive the PageRank algorithm:
  • A link from a page pointing to another page is an implicit conveyance of authority to the target page. Hence, the more in-links a page i has, the more prestige it has.
  • Pages that link to page i have their own prestige scores. A page with a higher prestige score pointing to i is more important than a page with a lower prestige score pointing to i.
  • Since a page may point to many other pages, its prestige score should be shared among all the pages that it points to.
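
The three intuitions above can be sketched as a simple iterative computation. This is a minimal illustration rather than Google's actual implementation: the damping factor of 0.85, the uniform treatment of pages without out-links, the fixed number of iterations, and the four-page toy graph are all assumptions of mine and are not part of the description above.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute prestige scores for a dict: page -> list of out-links.

    damping=0.85 is a conventional choice and an assumption here.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page's prestige is shared among the pages it points to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical toy web: C receives links from A, B and D.
    toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(toy_links).items()):
        print(page, round(score, 4))
```

In the toy graph, C collects the most in-links and ends up with the highest score, A benefits from being linked to by the prestigious C, and D, which nobody links to, keeps only the baseline score.
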
In Google, the crawled graph is first used to pre-compute and store the PageRank of each page. Note that the PageRank is independent of any query or textual content. When a query is submitted, a text index is used to first make a selection of possible response pages. Then an undisclosed ranking scheme that combines PageRank with textual match is used to produce a final ordering of response URLs. All this makes Google comparable in speed, at query time, to conventional text-based search engines.

PageRank is an important ranking mechanism at the heart of Google, but it is not the only one: keywords, phrase matches, and match proximity are also taken into account, as is anchor text on pages linking to a given page. Search Engine Watch (www.searchenginewatch.com) reports that during some weeks in 1999, Google’s top hit to the query “more evil than Satan” returned www.microsoft.com, probably because of anchor text spamming. This embarrassment was fixed within a few weeks. The next incident occurred around November 2000, when Google’s top response to a rather offensive query was www.georgewbushstore.com. This was traced to www.hugedisk.com, which hosted a page that had the offensive query words as anchor text for a hyperlink to www.georgewbushstore.com.

Although the details of Google’s combined ranking strategy are unpublished, such anecdotes suggest that the combined ranking strategy is tuned using many empirical parameters and checked for problems using human effort and regression testing. The strongest criticism of PageRank is that it defines prestige via a single random walk uninfluenced by a specific query. A related criticism is of the artificial decoupling between relevance and quality, and the ad hoc manner in which the two are brought together at query time, for the sake of efficiency.

HITS differs from PageRank in an important way. HITS stands for Hypertext Induced Topic Search. Unlike PageRank, HITS depends on a query. When the user submits a query, HITS first expands the list of pages returned by the search engine and then produces two rankings of the expanded set of pages: authority ranking and hub ranking.

An authority is a page with many in-links; it may have good content on some topic, and hence it is linked to by many. A hub is a page with many out-links, and serves as a good organizer of the information on a certain topic. The idea behind HITS is that a good hub points to many authorities, and a good authority is pointed to by many good hubs.
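
The mutual reinforcement between hubs and authorities can be sketched as a short iteration. This is a minimal illustration under assumptions of my own: the fixed number of iterations, the normalization to unit length after each round, and the toy link graph with pages named hub1..hub3 and auth1..auth3 are all hypothetical.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores for a dict: page -> list of out-links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 1.0)
    for _ in range(iterations):
        # A good authority is pointed to by many good hubs.
        auth = dict.fromkeys(pages, 0.0)
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        # A good hub points to many good authorities.
        hub = {page: sum(auth[t] for t in links.get(page, [])) for page in pages}
        # Normalize so the scores stay bounded from round to round.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth


if __name__ == "__main__":
    # Hypothetical toy graph: three link-list pages pointing at three content pages.
    toy_links = {
        "hub1": ["auth1", "auth2"],
        "hub2": ["auth1", "auth2", "auth3"],
        "hub3": ["auth1"],
    }
    hub_scores, auth_scores = hits(toy_links)
    print("best hub:", max(hub_scores, key=hub_scores.get))
    print("best authority:", max(auth_scores, key=auth_scores.get))
```

In the toy graph, hub2 points to every authority and comes out as the best hub, while auth1 is pointed to by all three hubs and comes out as the best authority.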

How does HITS collect pages to be ranked? Given a search query q, HITS collects a set of pages as follows:
1. Submit the query to a search engine. Collect the t (typically, t = 200) highest ranked pages, which we assume to be highly relevant to q. This set is called the root set W.
2. Expand W by including any page pointed to by a page in W and any page that points to a page in W. The result is a larger set called S. As the set can be very large, the algorithm restricts its size by allowing each page in W to import at most m pages pointing to it into S. The set S is called the base set.
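
The two collection steps can be sketched as follows. This is a minimal illustration, assuming the link structure is already available as an in-memory dictionary; the function name, the `out_links` format, and the toy data are hypothetical. The search engine query of step 1 is not modeled: the root set is simply passed in.

```python
def build_base_set(root_set, out_links, m):
    """Expand the root set W into the base set S.

    root_set:  pages returned by the search engine (W).
    out_links: dict page -> list of pages it points to (assumed available).
    m:         maximum number of in-linking pages imported per root page.
    """
    # Invert the link structure once so in-links can be looked up per page.
    in_links = {}
    for source, targets in out_links.items():
        for target in targets:
            in_links.setdefault(target, []).append(source)

    base_set = set(root_set)
    for page in root_set:
        # Every page pointed to by a root page is included.
        base_set.update(out_links.get(page, []))
        # At most m of the pages pointing to a root page are included.
        base_set.update(in_links.get(page, [])[:m])
    return base_set


if __name__ == "__main__":
    # Hypothetical toy web: a, b and c all link to the root page r1.
    toy_out_links = {
        "r1": ["x"],
        "a": ["r1"],
        "b": ["r1"],
        "c": ["r1"],
        "z": ["y"],
    }
    print(sorted(build_base_set(["r1"], toy_out_links, m=2)))
```

With m = 2, only two of the three pages linking to r1 are imported, and pages unrelated to the root set (z and y) stay out of S.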

The entire process is generically called topic distillation. User studies (Chakrabarti et al., 1999) have shown that reporting hubs is useful over and above reporting authorities, because they provide useful annotations and starting points for users to start exploring a topic.

Tuesday, November 6, 2012

Some Answers of Class Material and Cloud Collaboration

In the last class, we used the reading “Social Cloud Computing: A Vision for Socially Motivated Resource Sharing” to practice individual work and collaborative work through Google Docs. First we read the article and answered the questions individually. Here are my answers:
Question 1. What is the definition of a Social Cloud?
Answer: A Social Cloud is a resource and service sharing framework utilizing relationships established between members of a social network.

Question 2. What are the possible applications of a Social Cloud?
Answer:
  • A Social Computation Cloud
  • A Social Storage Cloud
  • A Social Collaborative Cloud
  • A Social Cloud for Public Science
  • An Enterprise Social Cloud
Then we logged in to Google Docs and shared the article and our answers. Each of us used a different color to present our perspective in real time. One person posted a question, and the others answered below it. The two activities illustrate a difference in epistemic cognition: in the individual work, we thought through the questions ourselves, and in the group work that followed, we exchanged different perspectives and so understood the questions better.
We also learned the concept of cloud collaboration, which is attracting more and more attention in social networking.
Cloud collaboration defines a set of web-based technologies that allow business users to leverage the power of the cloud to collaborate, share files and work on projects in a secure, scalable and flexible online environment. Cloud collaboration makes content accessible at any time, irrespective of location, providing a significant competitive advantage for any organization that adopts it.

[Figure: cloud collaboration]

As shown in the figure, cloud technologies have undoubtedly become the most effective way of sharing files, but this is just the tip of the iceberg when it comes to the functionality and benefits of cloud collaboration. A comprehensive enterprise tool will allow users to share files, but also to collaborate, provide feedback, exchange comments and ideas, and work together to get things done in the most efficient way. Features such as file management, whiteboards and discussion boards are key elements of a true cloud collaboration tool and collectively make a powerful resource for any organization.
Perhaps the most exciting cloud collaboration offering from a product perspective is Google Apps, which includes Mail, Docs, Groups, Sites, and Video. While certainly lagging in some of the functionality delivered by Microsoft, Google continues to add new features at a blistering pace. The company has its own enterprise customer roster and has been actively promoting Google Apps through its Gone Google campaign.
While traditionalists claim that Google’s offerings lack the sophisticated capabilities of Exchange or Office, many see them as light years ahead on the collaboration side. Anyone who has jointly edited a Google Doc should be able to attest to that. And as the world seems to move away from the benefits of fancy font formatting to the speed and efficiency of easy sharing, Google might be in the best position to capitalize on the cloud collaboration race. But perhaps the dark horse is Google’s mobile strategy. Android and the Nexus One phone already appear to be more innovative than the Windows Mobile competition, and the integration with Google Apps could dramatically accelerate business adoption.

Monday, October 15, 2012

Social psychology and cognition

Drawing on what we have learned in the past two weeks, I will comment on social psychology and social cognition in relation to social media.
Social psychology is the part of psychology that studies human interaction. It studies how human thoughts, feelings and behaviors are influenced by the actual, imagined, or implied presence of others.
Social media has become popular with people all around the world. It gives people the easiest and most convenient way to interact with others. You can post whatever you want, including your pictures and videos. More and more people are becoming addicted to social media, which has significantly changed people’s lifestyles. From the point of view of social psychology, there are several reasons why most people seem to get hooked on social media.
1. Human beings are social animals.
People like to make new friends as often as they can. Most people today have their own social network account, so if you want to make friends, social networks would be the best place to find them.
2. Game applications
This is Facebook's strongest magnet. Games usually lower the age bracket of users and attract even children. These games are also designed to take up a lot of your time. Most games give a free gift every day, so gamers have to log in every day to keep the gifts from going to waste. These games are a business: gamers are given the option to pay real money for in-game items and privileges.
3. Advertisements
Because of the number of users all around the world, advertisers have an eye on these social network sites. Advertising on social network sites is very effective and, above all, you can advertise at very low cost. You can simply write a description of your product and post it for everyone to read. You can also add sample images or even advertising videos.
I also read an interesting article about what people look for in social media from a psychological standpoint. It lists seven “A”s:
1) To Be Acknowledged.
2) To Gain Attention.
3) To Be Approved Of.
4) To Be Appreciated.
5) To Be Acclaimed.
6) To Feel Assured.
7) To Be A Part Of.

Social cognition refers to the cognitive processes and structures that influence and are influenced by social behavior. Studies of social cognition focus on how cognition is affected by both wider and more immediate social contexts, and on how cognition affects our social behavior.
Social Cognitive Theory (SCT) describes learning in terms of the interrelationship between behavior, environmental factors, and personal factors. It also provides the theoretical framework for interactive learning used to develop both Constructivism and Cooperative Learning.
According to SCT, the learner acquires knowledge as his or her environment converges with personal characteristics and personal experience. New experiences are evaluated vis-a-vis the past; prior experiences help to subsequently guide and inform the learner as to how the present should be investigated.
Because SCT is based on understanding an individual’s reality construct, it is especially useful when applied to interventions aimed at personality development, behavior pathology, and health promotion.
Self-regulation:
Self-regulation is what allows a person to control his or her response or behavior when confronted with externally imposed stimuli. Feedback is an externally imposed control that works with a person’s self-regulatory capability in order to make adjustments to behavior. Online learning materials can use feedback techniques to reinforce behavioral change and help learners achieve self-efficacy. For example, when performing a task correctly, the learner can be advised that his or her performance is correct. Conversely, immediate corrective feedback can be given when needed. As the learner’s ability increases, the feedback can become more detailed and sophisticated, which allows the learner to refine and master the task. When learning to drive, for example, the student initially needs to get the vehicle on the road. As the student progresses, however, he or she needs to achieve specific speed limits and signaling requirements to achieve safe and efficient driving habits.

Self-efficacy:
Learning is a function of the extent to which individuals are able to reflect upon and internalize their own successes and failures. Self-efficacy is achieved when the learner identifies his or her ability to perform. Using interactivity in online learning provides a mechanism that allows the learner to apply knowledge accurately and reliably and therefore increase his or her confidence. For example, it is possible to read a book about driving a car, but it is not until the learner actually drives successfully that learning is complete. Interactive, online educational materials can provide extensive, repetitive practice until mastery – and thus self-efficacy – is achieved.



Monday, September 24, 2012

A Simple Overview and some feelings about Social Networking!

After the first two weeks of lectures, I have a clear understanding of the concepts of social media, social networking, and social computing, and of the differences among them.
Social media emphasizes people's participation and interaction; people use social media as a means to discover, read, and share news and information. It is a fusion of sociology and technology. Social networking emphasizes building social relationships over the web, and it is included in social media. Social computing focuses on the computing technologies that enable social networking; it takes us from social informatics to social intelligence. Social networking is facilitated by social computing technologies. In the past I mixed up the concepts of social media and social network, and I had not heard of social networking; now I understand these differences. I also learned about a social technology, "McKinsey", and its effect on people's behavior in social networks. I came across some examples and applications of social tasks, such as location-centered social interactions, open source, crowdsourcing, and social brainstorming. A social task is a broad view of collaborative work in a social network.

To be honest, I am not an internet addict. But I have a real sense of how social networking changes our daily life and widens our social relationships. In primary school, I phoned my friends every day. When I entered middle school, we began to use mobile phones and SMS to contact our friends; at the same time we used QQ and learned to write about our personal feelings on blogs. Xiaonei.com was popular when we were at university. We could communicate with friends in real time and easily see their current status, new posts, and new pictures. We could easily find old friends through the social network, even ones we had not seen for several years. Afterwards, Weibo came into our lives, and since the applications run on our mobile phones, we can see the latest news from our friends anytime and anywhere. Through Facebook, we can easily communicate with friends abroad. Social networking has really changed our lives a lot.

This post is just a simple overview of, and some feelings about, social networking. I would like to do further research, perhaps on security or privacy in social networks. If anyone is interested in this field, you are welcome to discuss it further!