The Nuvi Hub product is a dedicated Postgres database for an individual company or a set of companies belonging to the same account. This database contains all of the mentions collected by the monitors of the owning company or companies. Once configured, Nuvi Hub collects mentions from that point forward; previously collected data can be backfilled on request. The Nuvi Hub database is designed to return the most common queries quickly, and it is unopinionated: it exposes the raw data and leaves it to the client to decide how best to consume it. Nuvi Hub contains *only* data related to collected _mentions_; it does not contain other data stored by Nuvi, such as usage, companies, or monitors.
Architecture
The tables and all of their fields are documented below for quick reference. For details on data types, defaults, and indexes, connect to your Postgres instance and inspect the schema directly.
Authors
Each post has an author. This table represents the authors of the posts collected.
id: Unique id
username: The username of the author on the source network
network_id: Foreign key reference to the networks table indicating which network the post came from
real_name: When provided, this is the display name of the author
is_verified: A boolean indicating whether the author has a verified account on their network
influencer_score: A proprietary score from 1 to 100 indicating how influential the author is
followers_count: The number of followers this author has on this network
male_probability: The probability that this author is male (as inferred using a proprietary algorithm)
female_probability: The probability that this author is female (as inferred using a proprietary algorithm)
picture_url: A URL to the profile image of the author
profile_url: A URL to the profile of the author
bio: The biography provided by the user (if the network supports it)
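As an illustration of querying this table, the sketch below ranks verified authors by influencer_score, breaking ties by followers_count. It runs against an in-memory SQLite database standing in for Postgres, with only a few of the documented columns, assumed column types, and invented sample rows. The SQL itself should carry over to Postgres with `%s` in place of the `?` placeholder (for example via psycopg).

```python
import sqlite3


# Stand-in for the Postgres `authors` table: a subset of columns, assumed types.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE authors (id INTEGER PRIMARY KEY, username TEXT, "
        "network_id INTEGER, is_verified BOOLEAN, influencer_score INTEGER, "
        "followers_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?, ?, ?, ?, ?)",
        [  # invented sample rows
            (1, "alice", 1, True, 92, 50_000),
            (2, "bob", 1, False, 97, 80_000),
            (3, "carol", 2, True, 92, 120_000),
            (4, "dave", 2, True, 40, 900),
        ],
    )
    return conn


def top_verified_authors(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Verified authors ranked by influencer_score, ties broken by followers_count."""
    return conn.execute(
        """
        SELECT username, influencer_score, followers_count
        FROM authors
        WHERE is_verified
        ORDER BY influencer_score DESC, followers_count DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```

With the sample rows, `top_verified_authors(make_demo_db(), 2)` returns carol and then alice: bob has the highest score but is not verified.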
Networks
This short table lists each network that posts are collected from.
id: Unique id
name: The name of the network
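To report by network name rather than network_id, join posts against this table. A minimal sketch, using SQLite as a stand-in for Postgres with a trimmed posts table; the network names are made-up sample values, so query the real networks table for the actual list. A LEFT JOIN keeps networks with no posts. Because a post collected by two monitors is stored twice, these are row counts, not unique posts.

```python
import sqlite3


# Stand-in tables with only the columns this query needs.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE networks (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, network_id INTEGER)")
    # Sample network names, not necessarily the values Nuvi stores.
    conn.executemany(
        "INSERT INTO networks VALUES (?, ?)",
        [(1, "twitter"), (2, "facebook"), (3, "reddit")],
    )
    conn.executemany("INSERT INTO posts VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
    return conn


def posts_per_network(conn: sqlite3.Connection) -> list:
    """(network name, post row count), including networks with zero posts."""
    return conn.execute(
        """
        SELECT n.name, COUNT(p.id) AS post_count
        FROM networks AS n
        LEFT JOIN posts AS p ON p.network_id = n.id
        GROUP BY n.id, n.name
        ORDER BY post_count DESC, n.name
        """
    ).fetchall()
```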
Post_categories
Each post can be categorized into zero to many predetermined categories.
id: Unique id
post_id: The foreign key to the posts table
category: The plain text name of the category
Post_contents
This table represents the actual _content_ of a post and is separated from the posts table to minimize disk reads. It has a one-to-one relationship with posts.
id: Unique id
post_id: The foreign key to the posts table
content: The actual raw post content
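Because content lives in a separate one-to-one table, one reasonable pattern is to filter on posts and join post_contents only when the text is actually needed. The sketch below assumes SQLite as a stand-in for Postgres, timestamps stored as ISO-style strings (Postgres would use a timestamp type), and invented sample rows.

```python
import sqlite3


# Stand-in tables: posts carries metadata, post_contents carries the text.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, network_id INTEGER, "
        "post_created_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE post_contents (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(1, 1, "2024-01-02 10:00:00"), (2, 1, "2024-01-01 09:00:00"),
         (3, 2, "2024-01-03 08:00:00")],
    )
    conn.executemany(
        "INSERT INTO post_contents VALUES (?, ?, ?)",
        [(10, 1, "second post"), (11, 2, "first post"), (12, 3, "other network")],
    )
    return conn


def posts_with_content(conn: sqlite3.Connection, network_id: int) -> list:
    """Posts on one network, oldest first, with their raw content attached."""
    return conn.execute(
        """
        SELECT p.id, p.post_created_at, c.content
        FROM posts AS p
        JOIN post_contents AS c ON c.post_id = p.id
        WHERE p.network_id = ?
        ORDER BY p.post_created_at
        """,
        (network_id,),
    ).fetchall()
```

Queries that only count or aggregate posts can skip the join entirely, which is the point of keeping content in its own table.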
Post_emojis
All the emojis contained in collected posts are extracted into this table. Each post will have zero to many emojis associated with it.
id: Unique id
post_id: The foreign key to the posts table
emoji: The actual emoji character parsed out of the post
Post_entities
Entities are the concepts (nouns) found within each post. Each post may have zero to many entities.
id: Unique id
post_id: The foreign key to the posts table
entity: The word/entity found in the post
Post_hashtags
All the hashtags found within a post. There can be zero to many per post.
id: Unique id
post_id: The foreign key to the posts table
hashtag: The text of the hashtag (excluding the # symbol)
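The zero-to-many child tables (post_categories, post_emojis, post_entities, post_hashtags, post_tags, post_themes, and so on) can all be aggregated the same way through post_id. This sketch counts the distinct posts using each hashtag. Lowercasing the hashtag is an assumption about how you may want variants like "Nuvi" and "nuvi" grouped, not a documented property of the data. SQLite stands in for Postgres, and the rows are samples.

```python
import sqlite3


# Stand-in for the Postgres `post_hashtags` table with sample rows.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post_hashtags (id INTEGER PRIMARY KEY, post_id INTEGER, hashtag TEXT)"
    )
    conn.executemany(
        "INSERT INTO post_hashtags VALUES (?, ?, ?)",
        [(1, 1, "Nuvi"), (2, 1, "data"), (3, 2, "nuvi"), (4, 3, "data"), (5, 3, "NUVI")],
    )
    return conn


def top_hashtags(conn: sqlite3.Connection, limit: int = 10) -> list:
    """(lowercased hashtag, number of distinct posts using it), most used first."""
    return conn.execute(
        """
        SELECT lower(hashtag) AS tag, COUNT(DISTINCT post_id) AS post_count
        FROM post_hashtags
        GROUP BY lower(hashtag)
        ORDER BY post_count DESC, tag
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
```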
Post_tags
All the tags assigned to a post via auto-tagging. There can be zero to many per post.
id: Unique id
post_id: The foreign key to the posts table
tag: The text of the tag
Post_images
This table includes the URLs to all the images associated with a post. There may be zero to many per post.
id: Unique id
post_id: The foreign key to the posts table
image_url: The URL to the image on the network
Post_intents
This table includes text that represents the intent of the author of a post. For example, we may infer that an author intends to buy or sell something from the text. The highest-scoring intent predictions are included here. There may be zero to many intents for each post.
id: Unique id
post_id: The foreign key to the posts table
intent: The text of the intent ('buy', 'sell', etc.)
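To select posts that have a matching child row, such as a given intent, an EXISTS subquery returns each post once even when it has several matching rows, whereas a plain JOIN would repeat it. A sketch with SQLite standing in for Postgres and invented rows; `posts_with_any_intent` is a hypothetical helper name.

```python
import sqlite3


# Stand-in tables with sample rows; post 1 has both 'buy' and 'sell'.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE post_intents (id INTEGER PRIMARY KEY, post_id INTEGER, intent TEXT)"
    )
    conn.executemany("INSERT INTO posts VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO post_intents VALUES (?, ?, ?)",
        [(1, 1, "buy"), (2, 1, "sell"), (3, 3, "buy")],
    )
    return conn


def posts_with_any_intent(conn: sqlite3.Connection, intents: list) -> list:
    """Ids of posts with at least one of the given intents, each id listed once."""
    if not intents:  # Postgres rejects an empty IN (), so short-circuit
        return []
    # Only "?" placeholders are interpolated; intent values are bound parameters.
    placeholders = ", ".join("?" for _ in intents)
    sql = f"""
        SELECT p.id
        FROM posts AS p
        WHERE EXISTS (
            SELECT 1 FROM post_intents AS i
            WHERE i.post_id = p.id AND i.intent IN ({placeholders})
        )
        ORDER BY p.id
    """
    return [post_id for (post_id,) in conn.execute(sql, tuple(intents))]
```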
Post_mentions
This table includes all the references to other users within a post (e.g. an @nuvi mention in a tweet).
id: Unique id
post_id: The foreign key to the posts table
network_id: The network the post and mention refer to
username: The username that was mentioned
profile_url: The URL to the profile of the user
Post_scores
There are various NLP scores that can be given to a post. Currently we score sentiment, subjectivity, vulgarity, and tense (future, present, and past). This table includes those scores.
id: Unique id
post_id: The foreign key to the posts table
sentiment_score: A signed score for the author's sentiment. A positive value indicates the author was feeling positive and a negative value indicates the author was feeling negative; the magnitude is the probability of that sentiment.
subjectivity_score: The probability that the post content is subjective vs objective. This is similar to fact vs opinion.
vulgarity_score: The probability that the post content is vulgar. 0 is least vulgar, 1 is most vulgar.
future_tense_score: The probability that the post content is dealing with the future.
present_tense_score: The probability that the post content is dealing with the present.
past_tense_score: The probability that the post content is dealing with the past.
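Because sentiment_score is signed, a simple breakdown buckets posts by the sign of the score. Treating an exact 0 as neutral and a NULL score as unscored are assumptions of this sketch, not documented behavior. SQLite stands in for Postgres, with only the sentiment column created and sample values.

```python
import sqlite3


# Stand-in for the Postgres `post_scores` table, sentiment column only.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post_scores (id INTEGER PRIMARY KEY, post_id INTEGER, "
        "sentiment_score REAL)"
    )
    conn.executemany(
        "INSERT INTO post_scores VALUES (?, ?, ?)",
        [(1, 1, 0.8), (2, 2, -0.6), (3, 3, 0.1), (4, 4, 0.0), (5, 5, None)],
    )
    return conn


def sentiment_breakdown(conn: sqlite3.Connection) -> dict:
    """Count score rows by the sign of sentiment_score."""
    rows = conn.execute(
        """
        SELECT CASE
                 WHEN sentiment_score IS NULL THEN 'unscored'
                 WHEN sentiment_score > 0 THEN 'positive'
                 WHEN sentiment_score < 0 THEN 'negative'
                 ELSE 'neutral'
               END AS label,
               COUNT(*) AS post_count
        FROM post_scores
        GROUP BY label
        """
    ).fetchall()
    return dict(rows)
```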
Post_themes
Themes refer to significant phrases parsed out of the posts using NLP. Each post may have zero to many themes.
id: Unique id
post_id: The foreign key to the posts table
theme: The text phrase found in the post
Post_urls
Many posts contain links. This table stores all the URLs parsed out of each post. There may be zero to many per post.
id: Unique id
post_id: The foreign key to the posts table
url: The actual URL parsed from the post
Post_videos
This table includes the URLs to all the videos associated with a post. There may be zero to many per post.
id: Unique id
post_id: The foreign key to the posts table
video_url: The URL to the video on the network
Posts
The table containing all the metadata about a post (but not the raw content which is stored in post_contents).
id: Unique id which is linked to from most other tables.
social_source_uid: A unique id for the post from the social network. When another post is a reshare of, or comment on, this post, that post's parent_social_source_uid references this value.
network_id: The network the post was published on.
social_monitor_uid: The unique id of the social monitor created through Nuvi that this post belongs to. If the same post is collected by two different monitors, there will be two copies of it in this database, one for each monitor.
post_created_at: The date and time the post was published.
is_reshare: A boolean indicating whether this post is a retweet, share, etc. If it _is_, the parent_social_source_uid field will point to another post's social_source_uid indicating which post it is a reshare of.
is_comment: A boolean indicating whether this post is a comment on another post. If it _is_, parent_social_source_uid will also point to the original post.
is_matched_by_parent: A boolean indicating whether this post was matched by the monitor because of its _parent's_ content (when true) or its _own_ content (when false). For instance, a monitor with the keyword "happy birthday" may match a post containing those words (e.g. "Happy birthday, Lisa!"). A _comment_ on that post reading "We love you!" will also match, because of its _parent_ rather than its own text. This only happens if the corresponding option was enabled when the monitor was created.
parent_social_source_uid: When a post is a child of another post this field points to the parent post's social_source_uid.
author_id: A foreign key to the authors table which contains information about the author of the post.
activity_title: Some posts on some networks have titles. This field contains the text of that title.
activity_url: The original URL where this post can be found.
language: The two-character language code that this post was inferred to be written in.
region_code: When provided by the network, this two-character code indicates the region where the post was published.
country_code: When provided by the network, this two-character code indicates the country where the post was published.
latitude: When provided, this contains the latitude of the location where the post was published.
longitude: When provided, this contains the longitude of the location where the post was published.
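Two things about the posts table are easy to get wrong in SQL: resolving a reshare or comment to its parent, and counting unique posts when the same post was collected by more than one monitor. The sketch below handles both under stated assumptions. It treats social_source_uid as unique only within a network, so the self-join also matches on network_id. It matches on social_monitor_uid so that a parent stored once per monitor does not multiply the child row. The parent may be absent if no monitor collected it, hence the LEFT JOIN. SQLite stands in for Postgres, with a trimmed posts table and sample uids.

```python
import sqlite3


# Stand-in for the Postgres `posts` table with only the columns used here.
def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, social_source_uid TEXT, "
        "network_id INTEGER, social_monitor_uid TEXT, parent_social_source_uid TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?, ?, ?)",
        [
            (1, "p100", 1, "mon-a", None),    # original post, collected by monitor A
            (2, "p101", 1, "mon-a", "p100"),  # reshare of p100, also on monitor A
            (3, "p100", 1, "mon-b", None),    # same original, collected again by monitor B
            (4, "p102", 1, "mon-b", "p999"),  # comment whose parent was never collected
        ],
    )
    return conn


def children_with_parents(conn: sqlite3.Connection) -> list:
    """(child id, parent id or None) for every post that has a parent uid."""
    return conn.execute(
        """
        SELECT child.id, parent.id
        FROM posts AS child
        LEFT JOIN posts AS parent
          ON parent.social_source_uid = child.parent_social_source_uid
         AND parent.network_id = child.network_id
         AND parent.social_monitor_uid = child.social_monitor_uid
        WHERE child.parent_social_source_uid IS NOT NULL
        ORDER BY child.id
        """
    ).fetchall()


def unique_post_count(conn: sqlite3.Connection) -> int:
    """Count each (network_id, social_source_uid) once, ignoring per-monitor copies."""
    return conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT DISTINCT network_id, social_source_uid FROM posts
        ) AS unique_posts
        """
    ).fetchone()[0]
```

Without the social_monitor_uid condition, the reshare in row 2 would join to both copies of p100 and appear twice.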