    Invelos Forums->DVD Profiler: Contribution Discussion Page: 1... 6 7 8 9 10 ...15  Previous   Next
Credit Name Parsing
Author Message
Nexus the Sixth (DVD Profiler Unlimited Registrant, Star Contributor)
Contributor since 2002
Registered: March 13, 2007
Reputation: High Rating
Sweden Posts: 3,188
Quoting Addicted2DVD:
Quote:
Not sure I understand what is gotten from this? I mean how does it help matters at all? I mean parsing still would have to be done for the original 3 name fields.


Parsing would be totally optional. Just fill in the one-name field, the other three fields would still be available for those anal enough to bother about name parsing. 
First registered: February 15, 2002
hal9g (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Who is John Galt?
Registered: March 13, 2007
Reputation: High Rating
United States Posts: 6,635
Quoting Addicted2DVD:
Quote:
I don't know how practical this would be... but what I would like to see...

When updating a profile go strictly with as credited in the credits and not even have to worry about linking here.

But also have a separate contribution system/area for working on the actual linking of names. This way it is as straightforward as it possibly can be for contributing profiles. And yet a way to contribute to linking as well.

Yes it would be a separate contribution.... but I think it would make contributing easier for profiles... and then still have an easy... straightforward way to contribute to linking as well.


You're on the right track here, except it needs to be optionally available for contributors who want to do linking while contributing the profile itself.
Hal
hal9g (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Who is John Galt?
Registered: March 13, 2007
Reputation: High Rating
United States Posts: 6,635
Quoting DJ Doena:
Quote:
Quoting Addicted2DVD:
Quote:
Quoting T!M:
Quote:
Quoting DJ Doena:
Quote:
If you make the ID optional (see my post) you don't have that problem. You don't have linking either but you wouldn't have that anyway if you press "new ID" all the time.

Granted.


Question here though... if the ID is optional. What goes into the online when you contribute? If they have to have an ID for the online.. how would these ones go into the system with no ID?


Imagine the ID just like the birth year. You don't need the birth year to contribute a cast list, do you?
And you wouldn't need an ID to contribute either. And when someone else downloads the profile he gets it without ID.


Quoting hal9g:
Quote:
Quoting DJ Doena:
Quote:
I could imagine the following system to accommodate both sides:

...


This would mean you would have to be on-line and connected to the online database in order to do linking.  I'm not sure that's the best approach.


You could make a two-step approach. He could look it up locally (e.g. He already has Bruce Willis movies where Bruce Willis has an ID). And only if he doesn't find what he's looking for, he goes online.

The problem is that you can't have it all offline AND sufficient data at the same time. Sure, you could have a list with all the actors and their IDs. But even that would be a huge file. For example: IMDb lists over 4 million different people in cast & crew. Take 100 bytes for each and you get a file of roughly 380 megabytes from that alone. And with that you don't have a single movie association. Sure, you have two Kevin Smiths with two IDs but you still don't know which is the silent one.


If the file that needed to be downloaded locally was ONLY those cast and crew who have variants or "duplicate names/different person" in the online database, then that file would be significantly smaller than downloading the entire cast/crew list.  Having it local makes much more sense than having to go online to check each and every cast/crew entry, and will be much, much faster.
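To put rough numbers on this, here is a minimal Python sketch. The 4 million people and 100 bytes per entry come from the quoted post; the sample rows and the `ambiguous_subset` helper are made up for illustration and are not anything DVD Profiler actually provides. It only covers the "same name, different person" case, not name variants.

```python
from collections import defaultdict

# Figures taken from the quoted post: ~4 million people, ~100 bytes each.
PEOPLE = 4_000_000
BYTES_PER_ENTRY = 100


def full_list_size_mib(people=PEOPLE, bytes_per_entry=BYTES_PER_ENTRY):
    """Size of a flat name/ID list in MiB (1 MiB = 1,048,576 bytes)."""
    return people * bytes_per_entry / 2**20


def ambiguous_subset(people):
    """Keep only entries whose name is shared by more than one ID.

    `people` is a list of (person_id, name) tuples (hypothetical layout).
    """
    ids_by_name = defaultdict(set)
    for person_id, name in people:
        ids_by_name[name].add(person_id)
    return [(pid, name) for pid, name in people if len(ids_by_name[name]) > 1]


# Made-up sample: two different people named Kevin Smith, one Bruce Willis.
sample = [(1, "Kevin Smith"), (2, "Kevin Smith"), (3, "Bruce Willis")]
print(round(full_list_size_mib(), 1))  # size of the full list in MiB
print(ambiguous_subset(sample))        # only the two Kevin Smith rows remain
```

How much smaller the reduced file would really be depends on how many names in the online database are actually ambiguous, which this sketch does not know.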
Hal
T!M (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,667
Quoting DJ Doena:
Quote:
Imagine the ID just like the birth year. You don't need the birth year to contribute a cast list, do you?
And you wouldn't need an ID to contribute either. And when someone else downloads the profile he gets it without ID.

Yet when I download a profile with a cast entry that needs a birth year, but doesn't have one, then it merges with the "lowest" entry that I have in my database - no questions asked. That behaviour is a disaster, since "latching onto the lowest birth year" is just a guess, and is wrong 50% of the time. Since it happens without warning, this behaviour has actually caused much incorrect data being inadvertently submitted. So you'll understand that I wouldn't want this working the same way.

So how would you suggest this works when I download a profile which consists of 80 actors with no ID, while 70 of those actors are already listed in my database with the proper ID? Am I going to end up with double entries in my database for all those people, and will I have to perform manual merging/cleanup after each and every profile that I download? Or are they going to merge automatically? If they do, how will it do that correctly, and not make the same mistake as now happens with birth years? Will it ask me if I'm okay with each merge seventy times, once for each name? Will it provide me with some additional information about the different people involved so that I may make an informed choice? If so, wouldn't it be better if the contributor went through this, instead of everyone who downloads the profile? Because then it's one person doing the work, rather than, say, a hundred people downloading that profile all having to do the exact same thing, independent from each other... 
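To make the difference concrete, here is a small Python sketch of the two behaviours. It is only my reading of the description above, not the program's real code; the function names, the (name, birth year) layout and the sample years are all invented for illustration.

```python
def resolve_lowest(name, local_entries):
    """Behaviour as described above: silently pick the lowest birth year."""
    years = sorted(year for entry_name, year in local_entries if entry_name == name)
    return years[0] if years else None


def resolve_or_flag(name, local_entries):
    """Alternative: link only when unambiguous, otherwise flag for review."""
    years = sorted({year for entry_name, year in local_entries if entry_name == name})
    if not years:
        return ("new", None)          # nobody by that name locally
    if len(years) == 1:
        return ("linked", years[0])   # exactly one candidate, safe to link
    return ("needs_review", years)    # several candidates, do not guess


# Hypothetical local database with two different people sharing a name.
local = [("John Jones", 1950), ("John Jones", 1972), ("Bruce Willis", 1955)]
print(resolve_lowest("John Jones", local))
print(resolve_or_flag("John Jones", local))
```

The second function still leaves open who does the review: the contributor once, or every downloader separately.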
 Last edited: by T!M
Addicted2DVD (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 13, 2007
Reputation: Highest Rating
United States Posts: 17,318
Quoting hal9g:
Quote:
Quoting Addicted2DVD:
Quote:
I don't know how practical this would be... but what I would like to see...

When updating a profile go strictly with as credited in the credits and not even have to worry about linking here.

But also have a separate contribution system/area for working on the actual linking of names. This way it is as straightforward as it possibly can be for contributing profiles. And yet a way to contribute to linking as well.

Yes it would be a separate contribution.... but I think it would make contributing easier for profiles... and then still have an easy... straightforward way to contribute to linking as well.


You're on the right track here, except it needs to be optionally available for contributors who want to do linking while contributing the profile itself.


If that could be worked out... all the better!
Pete
DJ Doena (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: May 1, 2002
Registered: March 14, 2007
Reputation: Highest Rating
Germany Posts: 6,738
Quoting T!M:
Quote:
Quoting DJ Doena:
Quote:
Imagine the ID just like the birth year. You don't need the birth year to contribute a cast list, do you?
And you wouldn't need an ID to contribute either. And when someone else downloads the profile he gets it without ID.

Yet when I download a profile with a cast entry that needs a birth year, but doesn't have one, then it merges with the "lowest" entry that I have in my database - no questions asked. That behaviour is a disaster, since "latching onto the lowest birth year" is just a guess, and is wrong 50% of the time.

So how would you suggest this works when I download a profile which consists of 80 actors with no ID, while 70 of those actors are already listed in my database with the proper ID? Am I going to end up with double entries in my database for all those people, and will I have to perform manual merging/cleanup after each and every profile that I download? Or are they going to merge automatically? If they do, how will it do that correctly, and not make the same mistakes as is now happening with the birth years? Will it ask me if I'm okay with each merge seventy times, once for each name? Will it provide me with some additional information about the different people involved so that I may make an informed choice? If so, wouldn't it be better if the contributor went through this, instead of everyone who downloads the profile? Because then it's one person doing the work, rather than, say, a hundred people downloading that profile all having to do the exact same thing, independent from each other... 


The birth year comparison was just an analogy to show Pete that he doesn't have to link anything if he doesn't want to.

As I said in my original proposal, if an actor entry on a profile does not have an ActorID assigned, it does not link with anyone else. It just exists in the context of that one profile.

But you can assign an ActorID to it and you can contribute that information in the normal contribution process.
And when someone else downloads that profile he will of course get the ActorID as well. And if he already has profiles with actors with that ID, they'll link.

The entire "guesswork linking" will be abandoned.

Imagine the current Edit Cast screen. On the left side are only people with IDs. There is not a single non-ID entry on the left side. If you want to enter an entry to the movie without an ID, you simply press a button, type first, middle, last name & role and you have your entry - without ID. Contributable, but links to no one else.
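A minimal Python sketch of that rule, just to show the idea. The `CastEntry` class, the `linked_profiles` helper and the sample data are all hypothetical and not taken from the actual program: an entry links only through its ActorID, never through its name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CastEntry:
    profile: str
    name: str
    role: str
    actor_id: Optional[int] = None  # None means "not linked to anyone"


def linked_profiles(entry, all_entries):
    """Other profiles sharing this entry's ActorID.

    An entry without an ID links to nothing, even if the name is identical.
    """
    if entry.actor_id is None:
        return []
    return sorted({
        other.profile
        for other in all_entries
        if other.actor_id == entry.actor_id and other.profile != entry.profile
    })


# Made-up sample data.
entries = [
    CastEntry("Profile A", "Bruce Willis", "Lead", actor_id=1),
    CastEntry("Profile B", "Bruce Willis", "Lead", actor_id=1),
    CastEntry("Profile C", "Kevin Smith", "Clerk"),     # no ID
    CastEntry("Profile D", "Kevin Smith", "Director"),  # no ID, same name
]
print(linked_profiles(entries[0], entries))  # linked via ActorID 1
print(linked_profiles(entries[2], entries))  # no ID, so no guesswork linking
```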
Karsten
DVD Collectors Online

 Last edited: by DJ Doena
scotthm (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 20, 2007
Reputation: Great Rating
United States Posts: 2,847
Quoting hal9g:
Quote:
With the simple linking system, one person would have to contribute the link to her "new" name one time and all profiles in everyone's local db would be linked automatically!

What about when someone contributes a typo?  Then we'd have a "name" that doesn't exist in any film credit assigned to someone.  Do we need to worry about cleaning these up, and how would it be done?

---------------
Ace_of_Sevens (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: December 10, 2007
Reputation: High Rating
Posts: 3,004
Quoting DJ Doena:
Quote:
Another idea for the "Request new ID" system.

Imagine a new movie comes out. It has this new rising star actor in it - let's call him Karsten  - who has never played in a movie before.

The Blu-ray will be sold world-wide from Jan 31st. Now 15 different people want to create a profile for their locality. All of them look Karsten up and find nothing. We surely don't want that they create 15 IDs for the same guy, just because he wasn't in the system and contributions are open for voting for 6 days.


Having a profile for the film which is downloaded to each profile rather than a separate cast list for each locality should mostly fix this.

Here's a proposal which should deal with all the problems of previous proposals:
Go to a film based system. Have unique identifiers for each film which won't change. These can be simple, like a title and a year with an optional extra field to separate different cuts or movies with the same title and release year. We'd just need a consistent rule for how to enter TV titles. These entries would contain cast, crew, genres and film reviews.

Each DVD entry would have the release-specific info, plus a list of films, which would auto-populate their data into the correct fields. (We'd need some way to deal with genres like accessories, which are not really tied to a film)

Here's where I break from previous proposals. Each cast and crew member would be assigned a unique ID that does not change. This ID would not be arbitrary but would either be their current name in the DB plus a birth year or something with a 1:1 equivalence (this would probably be preferable to prevent confusion). This would make it easy to auto-convert as we don't need to worry about one person submitting Robin Wright Penn as ID 00125612 and another submitting 51157801. It could convert based on some rule that always changes Robin Wright Penn to the same thing. The good part is, when she does a few more movies and her common name goes back to Robin Wright, we don't need to change everything. The ID stays the same. Someone would just have to go into the DB and change her common name, which would then auto-populate to all the hundreds of profiles she's in. The ID itself wouldn't change. The correspondence to credited name is just to help with conversion, not something that needs to be adjusted if names change.

Ken would need to come up with a system to convert names to IDs and to implement it, as well as a system to search for commonly credited names (and a few other tools) to make sure future links were entered correctly, but these are not super-complicated. We would also need to fix all the people who are currently merged in the DB or who have incompletely entered dates of birth or common names, but we will need to do that with any system. It has the following advantages:

* Straight-forward conversion. Minimizes chances of accidental duplication of people.
* Can actually support properly linking all of a person's credits and no one else's in a much more universal and maintainable fashion than we have now.
* Minimizes contributor work.
* Most importantly for Ken, this should reduce contributions by 70%-80% after the initial conversion, saving lots of server work and man-hours for screeners.
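The conversion rule could look something like the Python sketch below. This is only an illustration under my own assumptions: the `person_key` function, the `|` separator, the way Robin Wright Penn is split into name fields and the birth years in the example are all invented, not how the database actually stores anything.

```python
def person_key(first, middle, last, birth_year=None):
    """Derive a stable ID from the name/birth-year combination now in the DB.

    The same input always yields the same ID, so two contributors converting
    the same person independently cannot end up with different IDs.
    """
    parts = [first.strip(), middle.strip(), last.strip(), str(birth_year or "")]
    return "|".join(parts)


# The common name is stored separately, keyed by the ID (hypothetical table).
common_names = {}

pid = person_key("Robin", "Wright", "Penn")
common_names[pid] = "Robin Wright Penn"

# Later her common name changes; only this one record is edited.
common_names[pid] = "Robin Wright"

print(pid)                # the ID is unchanged
print(common_names[pid])  # every profile that references the ID shows this

# A birth year separates two people with the same name (years are made up).
print(person_key("John", "", "Jones", 1950) != person_key("John", "", "Jones", 1970))
```

The point of the design is that the ID is derived once at conversion time and then frozen, while the displayed common name is free to change.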
hal9g (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Who is John Galt?
Registered: March 13, 2007
Reputation: High Rating
United States Posts: 6,635
Quoting scotthm:
Quote:
Quoting hal9g:
Quote:
With the simple linking system, one person would have to contribute the link to her "new" name one time and all profiles in everyone's local db would be linked automatically!

What about when someone contributes a typo?  Then we'd have a "name" that doesn't exist in any film credit assigned to someone.  Do we need to worry about cleaning these up, and how would it be done?

---------------


It would be nice to believe that the voting system/screeners would catch these, but realistically, I suspect some could get through.  Someone auditing these profiles would need to fix it later.  Ken could run a maintenance program periodically against the online database to find any entries in the "linking database" for which no corresponding entries exist in any online profile and then delete them from the "linking database".  This could potentially de-link someone locally when the updated "linking table" is downloaded, so a warning would have to accompany any deletions so the user would know it happened and be able to fix it locally.
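Something along these lines, as a rough Python sketch. The table layout, the function names, the ID value and the misspelled sample name are all made up for illustration; I have no idea how Ken would actually store this.

```python
def sweep_linking_table(linking_table, profile_names):
    """Split linking entries into kept and orphaned ones.

    linking_table: dict mapping a credited name to a person ID (hypothetical).
    profile_names: set of credited names found in at least one online profile.
    """
    kept = {name: pid for name, pid in linking_table.items() if name in profile_names}
    orphaned = {name: pid for name, pid in linking_table.items() if name not in profile_names}
    return kept, orphaned


def deletion_warnings(orphaned):
    """One warning per deleted link, so users can fix things locally."""
    return [
        f"Link removed: '{name}' (ID {pid}) no longer matches any online profile"
        for name, pid in sorted(orphaned.items())
    ]


# Made-up data: the third entry is a contributed typo that no profile uses.
linking_table = {
    "Robin Wright": 42,
    "Robin Wright Penn": 42,
    "Robin Wrihgt Penn": 42,
}
profile_names = {"Robin Wright", "Robin Wright Penn"}

kept, orphaned = sweep_linking_table(linking_table, profile_names)
for warning in deletion_warnings(orphaned):
    print(warning)
```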
Hal
Exiled (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: March 14, 2007
Reputation: High Rating
United Kingdom Posts: 79
This is the most exciting news in a long time! 

Firstly, I fully support the move to given name and family name. The flag for reverse name order will need to be per credit line to capture the film credits accurately, and contributable. A button or tick box to set or clear the flag for all credits in a profile will speed up the work for Asian profiles.

Secondly, the only way to fix the linking issues is to move to an online database that is film and people centric, as has been suggested by several posters already. No half-measures in release 3.8, please. I'd rather wait for a full overhaul in 4.0. Unicode support will have to wait for 5.0 

Here's how I imagine it would work:

1. Each film shall exist only once in the online db.
2. Each film can have one or more associated credit sets, to cater for credit variations. The initial key for the credit sets shall be the credit language. This will be sufficient for co-productions and other films with localised credits and animation dubbed in various languages, which I believe make up the vast majority of films with credit variants. Additionally, there could be 'Director's cut', 'TV version' and so on.
3. Each DVD profile shall link to a specific movie for production year, original title, genres and possibly production companies, and to a specific credit set for cast and crew.
4. It shall be possible for one DVD profile to link to several movies, for multi-feature discs.
5. When creating a new profile the user shall be prompted if the film exists in the online db (original title/production year) and given the choice to select this film and optionally download the data or create a new film entry - similar to the existing prompt when changing EAN/locality.
6. Each person shall only exist once in the online db. This also applies to people who have both cast and crew credits.
7. When editing a profile, the user shall have the option to link individual cast/crew with existing entries in the online db.
8. When a contribution is approved, cast/crew that have not already been linked by the contributor shall be linked with whatever entry in the online db has the most credits under the same name. I would be fine with the alternative as well: that they become new entries, but I feel the former option requires less subsequent administration and is also in line with the way DVDP works now.
9. Submissions that change movie data (apart from credits) shall be voted on by all users with profiles for the same film. Submissions that change credits, shall be voted on by all users with profiles using that specific credit set.
10. There shall be a way to combine and separate people in the online db. The way Librarything does this for authors is a good example of how this could work. This doesn't necessarily need to be available from DVDP but could be a web only function. Combine/Separate proposals should be up for voting by members with profiles containing the affected people.
11. A similar combine/separate function shall be available for films.
12. If the concept of common name is still needed for display purposes (I'm not convinced it is), the common name shall be determined automatically by the online db and changes propagated in the daily updates.
13. The unique keys for films and cast/crew shall be assigned automatically and shall not be visible to the user. The user shall not have to deal with IDs, roman numerals or similar. Instead, when selecting a film or person amongst several with the same or similar names, it shall be based on the production year and film credits when selecting a film and a film resume (what films he/she is credited in) for people.
14. There shall be a sub-forum for each film to discuss that particular film's credits. Each user shall only have access to the ones for the films that appear in his online collection.
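Point 8 is the one with real logic in it, so here is a minimal Python sketch of how I picture it. The `default_link` name, the (person ID, credited name) layout and the sample credits are purely illustrative assumptions.

```python
from collections import Counter


def default_link(credited_name, credits):
    """Point 8: choose the person with the most credits under this exact name.

    credits: list of (person_id, credited_name) pairs already in the online db.
    Returns None when nobody has that name, meaning a new person entry is needed.
    """
    counts = Counter(pid for pid, name in credits if name == credited_name)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up online db: person 1 has two credits as Kevin Smith, person 2 has one.
credits = [
    (1, "Kevin Smith"),
    (1, "Kevin Smith"),
    (2, "Kevin Smith"),
    (3, "Bruce Willis"),
]
print(default_link("Kevin Smith", credits))  # person 1 has the most credits
print(default_link("John Jones", credits))   # unknown name, needs a new entry
```

When two people have the same number of credits the choice is effectively arbitrary, which is where the combine/separate function in point 10 would have to pick up the slack.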

In terms of data migration, I would be in favour of not attempting to create the new film profiles automatically, but rather let them be created/submitted by conscientious contributors. Existing profiles can remain in the old structure, but any updates would force them to the new data model.

Apologies for the long post - I got a bit carried away 

Dag
T!M (DVD Profiler Unlimited Registrant, Star Contributor)
Profiling since Dec. 2000
Registered: March 13, 2007
Reputation: Highest Rating
Netherlands Posts: 8,667
Quoting Dag Ove:
Quote:
In terms of data migration, I would be in favour of not attempting to create the new film profiles automatically, but rather let them be created/submitted by conscientious contributors.

Starting from scratch, having to resubmit everything that has been established so far, doesn't exactly tie in with the condition Ken stated:

Quoting Ken Cole:
Quote:
We will not be throwing the baby out with the bath water.  Any replacement system must support the base functionality of the current system, and must maintain the linking work that has already been put into our database.

Additionally, I'm very much against the idea that "existing profiles can remain in the old structure, but any updates would force them to the new data model." Effectively, that would mean that in, say, five years' time, there are still going to be thousands of old-style profiles left, just like we're now still stuck with IMDb-mined profiles ported over from Intervocative that have never been touched since, including many that, once upon a time, were dropped in an incorrect locality, and are still quietly sitting there, messing up the CLT numbers. With your suggestion to leave them untouched until someone updates them, many of them are bound to never get updated to the new format... Far from ideal, IMHO.

Suffice to say that I'd like a new system to affect all profiles, not just the ones that are regularly updated.
 Last edited: by T!M
Exiled (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: March 14, 2007
Reputation: High Rating
United Kingdom Posts: 79
Quoting T!M:
Quote:
Quoting Dag Ove:
Quote:
In terms of data migration, I would be in favour of not attempting to create the new film profiles automatically, but rather let them be created/submitted by conscientious contributors.

Starting from scratch, having to resubmit everything that has been established so far, doesn't exactly tie in with the condition Ken stated:

Quoting Ken Cole:
Quote:
We will not be throwing the baby out with the bath water.  Any replacement system must support the base functionality of the current system, and must maintain the linking work that has already been put into our database.

Additionally, I'm very much against the idea that "existing profiles can remain in the old structure, but any updates would force them to the new data model." Effectively, that would mean that in, say, five years' time, there are still going to be thousands of old-style profiles left, just like we're now still stuck with IMDb-mined profiles ported over from Intervocative that have never been touched since, including many that, once upon a time, were dropped in an incorrect locality, and are still quietly sitting there, messing up the CLT numbers. With your suggestion to leave them untouched until someone updates them, many of them are bound to never get updated to the new format... Not ideal, IMHO.

Suffice to say that I'd like a new system to affect all profiles, not just the ones that are regularly updated.

Yes, good point!

What I meant was that instead of an automated process randomly picking one DVD profile to become the basis for the film-centric profile for that film or trying to merge the information in numerous profiles, I would prefer that actual contributors submit the initial profile based on an existing DVD profile that they know is good. This way, we would build upon existing work whilst introducing at least some level of quality assurance into the process.

For abandoned profiles there could be a grace period, i.e. after a certain period, profiles that have not been moved to the new data model through a contribution will undergo some kind of automated migration.

In any case, I'm not too fussed how migration is done, as long as we move to a film and people centric database. This would mean such a huge improvement in data quality!

Dag
Addicted2DVD (DVD Profiler Unlimited Registrant, Star Contributor)
Registered: March 13, 2007
Reputation: Highest Rating
United States Posts: 17,318
I don't know about anyone else... but at this point I would love to hear something more from Ken to see if he is willing to go in this direction. Everyone here is getting all excited about going to film based credits instead of the profile based. And I have yet to see Ken say he was even willing to do that at this point.

The only thing Ken has said about how to go forward is...

Quote:
We will not be throwing the baby out with the bath water.  Any replacement system must support the base functionality of the current system, and must maintain the linking work that has already been put into our database.  Ideally any replacement system will also improve upon the current system:


Can anyone (Ken or maybe another programmer?) say if film based credits would support the base functionality of the current system? As Ken said...

"Any replacement system must support the base functionality of the current system"

Since I know nothing about programming I personally have no idea. I would love to see it mind you... but at the same time I am wondering if we are getting away from what Ken requested. I just don't want to get too excited about something if we don't have a chance of seeing it any time soon.
Pete
Taro (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: February 23, 2009
Reputation: High Rating
Belgium Posts: 1,580
Quoting DJ Doena:
Quote:
As I said in my original proposal, if an actor entry on a profile does not have an ActorID assigned, it does not link with anyone else. It just exists in the context of that one profile.

But you can assign an ActorID to it and you can contribute that information in the normal contribution process.
And when someone else downloads that profile he will of course get the ActorID as well. And if he already has profiles with actors with that ID, they'll link.

The entire "guesswork linking" will be abandoned.

Imagine the current Edit Cast screen. On the left side are only people with IDs. There is not a single non-ID entry on the left side. If you want to enter an entry to the movie without an ID, you simply press a button, type first, middle, last name & role and you have your entry - without ID. Contributable, but links to no one else.

Lots of interesting ideas being bounced around. I like where DJ Doena is going, as it reconciles well with the worries Addicted2DVD had. So basically, if I sum up your idea:

- keep the existing contribution system, where only text is submitted (as-credited info, so to speak)
- possibility to assign IDs to those text fields later on, which are then downloaded and provide linking
- no more automated linking where identical names are auto-linked
- possibility to contribute IDs immediately when submitting online, for those contributors who want to

For the last point, we would need to have a different filtering system in DVDP: when I enter a name, it should show the existing variants, with their IDs and the possibility to check the other profiles they are credited in.

Only question I still have: how do we determine an actor is new and should be assigned a new ID to? That's something I haven't quite figured out with that system.


Also, I think it would be useful to have a second contribution system, where we can add new actor ID's, link ID's, unlink erroneously linked ID's etc. Here, we don't submit profiles but only actor/crew info.
DJ Doena (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Registered: May 1, 2002
Registered: March 14, 2007
Reputation: Highest Rating
Germany Posts: 6,738
Quoting Taro:
Quote:
Only question I still have: how do we determine an actor is new and should be assigned a new ID to? That's something I haven't quite figured out with that system.


As with common names: You have to know something about the actual person behind the name. How much you have to know depends on your optimism. If you have a John Jones as Sound Technician and there's a sound guy in the DB by that name you could simply assume it's the same guy. And if he's not the same guy then he's either too unimportant for anybody else to notice or someone else - who knows that there are two sound John Joneses - will separate them.
Karsten
DVD Collectors Online

hal9g (DVD Profiler Desktop and Mobile Registrant, Star Contributor)
Who is John Galt?
Registered: March 13, 2007
Reputation: High Rating
United States Posts: 6,635
Quoting Taro:
Quote:

Only question I still have: how do we determine an actor is new and should be assigned a new ID to? That's something I haven't quite figured out with that system.


If we need to add a new actor under this system, and have to submit a request for a new ID first, unless it can be done real-time, it could gum up the works.  Being able to do it real-time would help, but again, it means that you will need to be online in order to do cast/crew contributions.

Quote:
Also, I think it would be useful to have a second contribution system, where we can add new actor ID's, link ID's, unlink erroneously linked ID's etc. Here, we don't submit profiles but only actor/crew info.


I agree wholeheartedly.  This would allow people, like T!M, who are very gung-ho on fixing linking problems (that's a compliment, BTW), a way to do it in a focused way without having to worry about the rest of the fields in the profile.
Hal